CN116523388A - Data-driven quality modeling method based on industrial Internet platform - Google Patents


Info

Publication number
CN116523388A
CN116523388A CN202310408969.1A CN202310408969A CN116523388A CN 116523388 A CN116523388 A CN 116523388A CN 202310408969 A CN202310408969 A CN 202310408969A CN 116523388 A CN116523388 A CN 116523388A
Authority
CN
China
Prior art keywords
data
sample
model
representing
industrial internet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310408969.1A
Other languages
Chinese (zh)
Other versions
CN116523388B (en
Inventor
王峰
顾毅
熊亮
张莹
郑锦泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Xuelang Shuzhi Technology Co ltd
Original Assignee
Wuxi Xuelang Shuzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Xuelang Shuzhi Technology Co ltd
Priority to CN202310408969.1A
Publication of CN116523388A
Application granted
Publication of CN116523388B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a data-driven quality modeling method based on an industrial Internet platform, comprising the following steps: collecting data from different systems through the industrial Internet platform and aggregating the data in a unified manner; preprocessing the collected data; selecting auxiliary variables according to the process principle and process characteristics, and reducing their dimensionality by principal component analysis; constructing a key product quality prediction model based on a data-driven modeling strategy; and performing deviation correction and model parameter correction on the established prediction model. The method can greatly reduce a plant's dependence on measuring equipment and is of significance for improving product quality, saving energy, reducing consumption, and accelerating the digital transformation of enterprises. It can predict key indexes of chemical raw materials and products in real time, avoiding the long analysis times of some indexes and the difficulty or impossibility of measuring others, and thereby saves considerable time and resources.

Description

Data-driven quality modeling method based on industrial Internet platform
Technical Field
The invention relates to the field of industrial Internet, in particular to a data-driven quality modeling method based on an industrial Internet platform.
Background
In petrochemical processes, the simulation, control, and optimization of a system often rely on high-performance models. With intensifying market competition and stricter environmental requirements in recent years, enterprises urgently need to extract as much economic benefit as possible from limited resources. This places new demands on process control and optimization and makes modeling harder, particularly for strongly nonlinear, time-varying objects such as the physical and chemical parameters of chemical processes in continuous stirred tank reactors and the biological parameters of fermentation processes. The continuous stirred tank reactor (CSTR), for example, is widely used in polymerization chemistry; it is core equipment in chemical production and is also common in the dye, pharmaceutical reagent, food, and synthetic materials industries. Nevertheless, automatic control of the reactions inside it has progressed slowly, mainly because those reactions involve many interacting physical and chemical effects that make the process highly nonlinear and therefore very difficult to model.
In actual chemical production, owing to a lack of technical means and hardware, the core production system cannot feed back all required process parameters in real time, yet better control of the reaction requires data from within the reaction process. Variables such as temperature, pressure, liquid level, and volume are relatively easy to measure in real time, whereas parameters such as reactant concentration lack reliable online sensors and are costly to measure. As a result, many industrial production systems cannot rely on fault diagnosis and state detection to improve operational safety, which also causes considerable difficulty for product quality. Other factors, such as the temperature and the feedstock concentration within the reactor, may also be disturbed during production, which introduces uncertainty into the model.
The arrival of the big data and industrial Internet era has opened up new methods and pointed to a new direction for making the chemical industry intelligent, through algorithm research represented by data mining and machine learning. A data-driven quality modeling method based on an industrial Internet platform is more flexible and more closely tied to actual operation. By exploiting its strong learning and representation capabilities, it can fully mine the important information in historical data and establish accurate prediction models for key raw material and product quality indexes.
For the problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
To address the problems in the related art, the invention provides a data-driven quality modeling method based on an industrial Internet platform.
To this end, the invention adopts the following technical scheme:
A data-driven quality modeling method based on an industrial Internet platform, the method comprising the following steps:
S1, collecting data from different systems through the industrial Internet platform, and aggregating the data in a unified manner;
S2, preprocessing the collected data;
S3, selecting auxiliary variables according to the process principle and process characteristics, and reducing the dimensionality of the auxiliary variables by principal component analysis;
S4, constructing a key product quality prediction model based on a data-driven modeling strategy;
S5, performing deviation correction and model parameter correction on the established prediction model.
Further, the preprocessing of the collected data comprises the following steps:
S201, fusing and storing the collected data to obtain sample data;
S202, performing abnormal data rejection and filtering on the sample data, and normalizing the data.
Further, the collected data are fused according to a calculation formula in which:
$h_{1q}$ represents the data collected by the business system at time $t_{1q}$;
$h_{2q}$ represents the data collected by the production system at time $t_{2q}$;
$\varepsilon_{h1}$ represents the root mean square error of the collected data $h_{1q}$;
$\varepsilon_{t1}$ represents the root mean square error of the time $t_{1q}$;
$\varepsilon_{h2}$ represents the root mean square error of the collected data $h_{2q}$;
$\varepsilon_{t2}$ represents the root mean square error of the time $t_{2q}$;
$h_{q}$ represents the result of fusing the data collected by the business system and the production system at time $t_{q}$.
Further, abnormal data are rejected from the sample data by screening with the 3σ criterion, in the following steps:
Suppose an auxiliary variable $x$ in the sample data has $n$ observations $x_1, x_2, \ldots, x_n$. Calculate its mean $\bar{x}$ and standard deviation $\sigma$:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
If the value $x_i$ of the auxiliary variable in a sample satisfies
$$\left|x_i - \bar{x}\right| > 3\sigma$$
the sample is removed as an abnormal sample. The 3σ test is then applied in turn to the other auxiliary variables in the sample, and the samples that pass are placed in the modeling sample set;
Further, the sample data are filtered with a mean filter given by the following formula:
$$X(t) = \frac{X(t - T/2) + X(t - T/2 + T_c) + \cdots + X(t) + \cdots + X(t + T/2 - T_c) + X(t + T/2)}{T/T_c}$$
where $t$ represents the sampling time;
$T$ represents the filtering time constant;
$T_c$ represents the sampling period.
Further, the sample data are normalized to $[y_{\min}, y_{\max}]$ by the following formula:
$$y = (y_{\max} - y_{\min})\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$$
where $y_{\min}$ and $y_{\max}$ represent the lower and upper bounds of the normalization target;
$x_{\min}$ and $x_{\max}$ represent the lower and upper bounds of the values of the current variable.
Further, principal component analysis is calculated in the following steps:
1) Standardize the original sample data and form the standardized matrix:
Let $X = (X_1, X_2, \ldots, X_m)^T$ be an $m$-dimensional random vector, and let the $n$ samples be $x_k = (x_{k1}, x_{k2}, \ldots, x_{km})^T$ $(k = 1, 2, \ldots, n)$, where $T$ denotes matrix transposition. The samples form the sample matrix, which is standardized as follows. Sample mean:
$$\bar{x}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki}$$
Sample variance:
$$s_i^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(x_{ki} - \bar{x}_i\right)^2$$
Standardized data:
$$z_{ki} = \frac{x_{ki} - \bar{x}_i}{s_i}$$
where $i = 1, 2, \ldots, m$ and $k = 1, 2, \ldots, n$;
these form the standardized matrix $Z = (z_{ki})$;
2) Calculate the sample correlation coefficient matrix of the standardized matrix:
$$R = \left[r_{ij}\right]_{m \times m} = \frac{Z^T Z}{n-1}, \qquad r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} z_{ki}\,z_{kj}$$
where $r_{ij}$ represents the element in row $i$, column $j$ of the matrix $R$ $(i, j = 1, 2, \ldots, m)$;
3) Determine the principal components:
Solve the characteristic equation $|R - \lambda I_m| = 0$ of the sample correlation matrix $R$ to obtain $m$ eigenvalues, where $\lambda$ represents an eigenvalue and $I_m$ represents the identity matrix. Because $R$ is symmetric, the eigenvalues can be obtained by the Jacobi method. With the eigenvalues sorted in descending order, determine $p$ from
$$\frac{\sum_{j=1}^{p}\lambda_j}{\sum_{j=1}^{m}\lambda_j} \ge 0.85$$
so that the information utilization rate reaches at least 85%, which gives $p$ principal components. For each $\lambda_j$ $(j = 1, 2, \ldots, p)$, solve the system of equations $Rb = \lambda_j b$ to obtain the unit eigenvector $b_j$, where $b$ represents an eigenvector;
4) Convert the standardized index variables into principal components:
$$U_j = Z\,b_j, \qquad j = 1, 2, \ldots, m$$
where $U_1$ is called the first principal component, $U_2$ the second principal component, and $U_m$ the $m$-th principal component;
5) Comprehensively evaluate the $m$ principal components: their weighted sum gives the final evaluation value, where the weight of each principal component is its variance contribution rate.
Further, the key product quality prediction model based on the data-driven modeling strategy is constructed by selecting an algorithm from the machine learning algorithm library built into the industrial Internet platform and training it on the preprocessed data.
Further, the deviation correction of the prediction model comprises: while the model is running, correcting the model with new data by a deviation correction method based on the model prediction error, where the deviation correction method is calculated as follows:
$$\hat{y}_c(t) = \hat{y}(t) + K\left[y(t-1) - \hat{y}(t-1)\right]$$
where $\hat{y}_c(t)$ represents the corrected model output value at the current time;
$\hat{y}(t)$ represents the predicted value output by the model at the current time;
$K$ represents the correction coefficient;
$y(t-1)$ and $\hat{y}(t-1)$ represent the true value and the predicted value output by the model at the previous time;
$t$ represents the sampling time;
the correction coefficient is obtained by dividing the model error of the current period by the model error of the previous period:
$$K_i = \frac{Y(t_i) - Y_m(t_i)}{Y(t_i - t) - Y_m(t_i - t)}$$
where $Y(t_i)$ represents the data in the current period;
$Y_m(t_i)$ represents the average value of the predicted values in the current period;
$Y(t_i - t)$ represents the data in the previous period;
$Y_m(t_i - t)$ represents the median value of the predictions in the previous period;
$K = \mathrm{median}(K_i)$, that is, the correction coefficient is obtained as the median of the $K_i$.
Further, the model parameter correction of the prediction model comprises: optimizing the key parameters of the model with a genetic algorithm based on historical data, with the deviation between the model output value and the actual value as the optimization target to be minimized.
The beneficial effects of the invention are as follows:
1. Industrial data are collected through the industrial Internet platform, which resolves the data silos that exist in chemical enterprises and allows the value of data from different systems to be fully mined.
2. In the data-driven modeling method based on the industrial Internet platform, the built-in machine learning algorithm library contains dozens of mainstream algorithms, so the model can better adapt to frequent changes in operating conditions.
3. The method can greatly reduce a plant's dependence on measuring equipment and is of significance for improving product quality, saving energy, reducing consumption, and accelerating the digital transformation of enterprises.
4. The method can predict key indexes of chemical raw materials and products in real time, avoiding the long analysis times of some indexes and the difficulty or impossibility of measuring others, and thereby saves considerable time and resources.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention;
FIG. 2 is a flow chart of the data-driven modeling business in a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention;
FIG. 3 is a diagram of the industrial Internet platform technology architecture in a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention;
FIG. 4 is a configuration diagram of the industrial Internet platform in a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention.
Detailed Description
To further illustrate the embodiments, the invention provides accompanying drawings, which form part of the disclosure. They mainly illustrate the embodiments and, together with the description, explain the principles by which the embodiments operate. With reference to them, a person skilled in the art will recognize other possible implementations and advantages of the invention. Elements in the drawings are not drawn to scale, and like reference numerals generally designate like elements.
According to an embodiment of the invention, a data-driven quality modeling method based on an industrial Internet platform is provided.
The invention is further described below with reference to the drawings and the detailed description. As shown in FIGS. 1-4, a data-driven quality modeling method based on an industrial Internet platform according to an embodiment of the invention comprises the following steps:
S1, collecting data from different systems through the industrial Internet platform, and aggregating the data in a unified manner;
Specifically, the collected data comprise quality index data from the quality business system and real-time production data from the production system;
S2, preprocessing the collected data;
S3, selecting auxiliary variables according to the process principle and process characteristics, and reducing the dimensionality of the auxiliary variables by principal component analysis;
Specifically, principal component analysis is a widely used dimensionality reduction method. While retaining as much of the information in the data as possible, it replaces many random variables with a few mutually uncorrelated composite factors, which are in essence linear combinations of the original variables, and thereby explains the variance-covariance structure of the set of variables. The weight of each principal component is determined by its contribution rate, that is, objectively by the information in the data, which avoids the drawback of subjective weighting methods in which the weights are set by hand;
S4, constructing a key product quality prediction model based on a data-driven modeling strategy;
Specifically, a data-driven model is a process model built from a large amount of process data with a machine learning algorithm. The distributed control systems and laboratory information management systems of chemical enterprises supply massive real-time process data and laboratory analysis data, so a process model can be established by mining these data in depth with machine learning algorithms. A data-driven model requires little knowledge of the process mechanism in the training stage and, in the use stage, offers a small computational load, fast solution, and high accuracy within the range of the data on which it was built. Such models have performed well in a variety of process modeling tasks and have attracted wide attention from researchers;
S5, performing deviation correction and model parameter correction on the established prediction model.
In one embodiment, the preprocessing of the collected data comprises the following steps:
S201, fusing and storing the collected data to obtain sample data;
S202, performing abnormal data rejection and filtering on the sample data, and normalizing the data.
In one embodiment, the collected data are fused according to a calculation formula in which:
$h_{1q}$ represents the data collected by the business system at time $t_{1q}$;
$h_{2q}$ represents the data collected by the production system at time $t_{2q}$;
$\varepsilon_{h1}$ represents the root mean square error of the collected data $h_{1q}$;
$\varepsilon_{t1}$ represents the root mean square error of the time $t_{1q}$;
$\varepsilon_{h2}$ represents the root mean square error of the collected data $h_{2q}$;
$\varepsilon_{t2}$ represents the root mean square error of the time $t_{2q}$;
$h_{q}$ represents the result of fusing the data collected by the business system and the production system at time $t_{q}$.
In one embodiment, abnormal data are rejected from the sample data by screening with the 3σ criterion, in the following steps:
Suppose an auxiliary variable $x$ in the sample data has $n$ observations $x_1, x_2, \ldots, x_n$. Calculate its mean $\bar{x}$ and standard deviation $\sigma$:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
If the value $x_i$ of the auxiliary variable in a sample satisfies
$$\left|x_i - \bar{x}\right| > 3\sigma$$
the sample is removed as an abnormal sample. The 3σ test is then applied in turn to the other auxiliary variables in the sample, and the samples that pass are placed in the modeling sample set;
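As an illustration only, the 3σ screening can be sketched in Python as follows. The function names `three_sigma_mask` and `filter_samples` are hypothetical, the standard deviation is assumed to use the n - 1 (Bessel) denominator, and every auxiliary variable is tested against statistics computed once on the full sample set rather than re-estimated after each removal.

```python
import math

def three_sigma_mask(values):
    """Return True for each observation that passes the 3-sigma test."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation with the n - 1 denominator (an assumption).
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [abs(v - mean) <= 3 * sigma for v in values]

def filter_samples(samples):
    """Keep the samples in which every auxiliary variable passes the test.

    `samples` is a list of rows; each row holds one value per auxiliary variable.
    """
    columns = list(zip(*samples))
    masks = [three_sigma_mask(column) for column in columns]
    return [row for k, row in enumerate(samples) if all(mask[k] for mask in masks)]

# Hypothetical data: 20 normal rows (temperature, pressure) and one outlier row.
rows = [(10.0, 1.0)] * 20 + [(100.0, 1.0)]
modeling_set = filter_samples(rows)
```

Note that with few observations a single extreme value inflates σ enough that it can never exceed 3σ, so the test is only meaningful on reasonably large sample sets.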
In one embodiment, the sample data are filtered with a mean filter given by the following formula:
$$X(t) = \frac{X(t - T/2) + X(t - T/2 + T_c) + \cdots + X(t) + \cdots + X(t + T/2 - T_c) + X(t + T/2)}{T/T_c}$$
where $t$ represents the sampling time;
$T$ represents the filtering time constant;
$T_c$ represents the sampling period.
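A minimal sketch of such a centered mean filter is given below. It is an assumption-laden illustration, not the patented implementation: the function name `mean_filter` is hypothetical, the window is expressed in sample indices as T / (2·T_c) points on each side, the window is truncated at the ends of the series, and the sum is divided by the number of points actually inside the window.

```python
def mean_filter(x, T, Tc):
    """Centered moving average over a window of width T sampled every Tc."""
    half = int(round(T / (2 * Tc)))  # points on each side of the current sample
    filtered = []
    for i in range(len(x)):
        lo = max(0, i - half)            # truncate the window at the start
        hi = min(len(x), i + half + 1)   # truncate the window at the end
        window = x[lo:hi]
        filtered.append(sum(window) / len(window))
    return filtered

smoothed = mean_filter([1.0, 2.0, 3.0, 4.0, 5.0], T=2, Tc=1)
```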
In one embodiment, the sample data are normalized to $[y_{\min}, y_{\max}]$ by the following formula:
$$y = (y_{\max} - y_{\min})\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$$
where $y_{\min}$ and $y_{\max}$ represent the lower and upper bounds of the normalization target;
$x_{\min}$ and $x_{\max}$ represent the lower and upper bounds of the values of the current variable.
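The normalization can be illustrated with a short sketch. It reads the leading factor of the formula as the width y_max - y_min of the target range, which is an assumption about the intended formula; the function name `min_max_scale` is hypothetical.

```python
def min_max_scale(xs, y_min=0.0, y_max=1.0):
    """Linearly map the values of one variable onto [y_min, y_max]."""
    x_min, x_max = min(xs), max(xs)
    if x_max == x_min:
        # A constant variable has no range to rescale.
        raise ValueError("cannot normalize a constant variable")
    scale = (y_max - y_min) / (x_max - x_min)
    return [scale * (x - x_min) + y_min for x in xs]

scaled = min_max_scale([2.0, 4.0, 6.0], y_min=-1.0, y_max=1.0)
```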
In one embodiment, principal component analysis is calculated in the following steps:
1) Standardize the original sample data and form the standardized matrix:
Let $X = (X_1, X_2, \ldots, X_m)^T$ be an $m$-dimensional random vector, and let the $n$ samples be $x_k = (x_{k1}, x_{k2}, \ldots, x_{km})^T$ $(k = 1, 2, \ldots, n)$, where $T$ denotes matrix transposition. The samples form the sample matrix, which is standardized as follows. Sample mean:
$$\bar{x}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki}$$
Sample variance:
$$s_i^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(x_{ki} - \bar{x}_i\right)^2$$
Standardized data:
$$z_{ki} = \frac{x_{ki} - \bar{x}_i}{s_i}$$
where $i = 1, 2, \ldots, m$ and $k = 1, 2, \ldots, n$;
these form the standardized matrix $Z = (z_{ki})$;
2) Calculate the sample correlation coefficient matrix of the standardized matrix:
$$R = \left[r_{ij}\right]_{m \times m} = \frac{Z^T Z}{n-1}, \qquad r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} z_{ki}\,z_{kj}$$
where $r_{ij}$ represents the element in row $i$, column $j$ of the matrix $R$ $(i, j = 1, 2, \ldots, m)$;
3) Determine the principal components:
Solve the characteristic equation $|R - \lambda I_m| = 0$ of the sample correlation matrix $R$ to obtain $m$ eigenvalues, where $\lambda$ represents an eigenvalue and $I_m$ represents the identity matrix. Because $R$ is symmetric, the eigenvalues can be obtained by the Jacobi method. With the eigenvalues sorted in descending order, determine $p$ from
$$\frac{\sum_{j=1}^{p}\lambda_j}{\sum_{j=1}^{m}\lambda_j} \ge 0.85$$
so that the information utilization rate reaches at least 85%, which gives $p$ principal components. For each $\lambda_j$ $(j = 1, 2, \ldots, p)$, solve the system of equations $Rb = \lambda_j b$ to obtain the unit eigenvector $b_j$, where $b$ represents an eigenvector;
4) Convert the standardized index variables into principal components:
$$U_j = Z\,b_j, \qquad j = 1, 2, \ldots, m$$
where $U_1$ is called the first principal component, $U_2$ the second principal component, and $U_m$ the $m$-th principal component;
5) Comprehensively evaluate the $m$ principal components: their weighted sum gives the final evaluation value, where the weight of each principal component is its variance contribution rate.
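The calculation steps can be sketched in pure Python as below. This is a minimal illustration under stated assumptions: samples are rows and auxiliary variables are columns, the sample variance uses the n - 1 denominator, the eigen-decomposition uses a cyclic Jacobi rotation scheme, and the function names (`standardize`, `correlation_matrix`, `jacobi_eigen`, `pca`) and the example data are hypothetical. Only steps 1) to 4) are covered; the weighted evaluation of step 5) is omitted.

```python
import math

def standardize(samples):
    """Step 1: column-wise z-scores using the (n - 1) sample variance."""
    n, m = len(samples), len(samples[0])
    means = [sum(row[i] for row in samples) / n for i in range(m)]
    stds = [math.sqrt(sum((row[i] - means[i]) ** 2 for row in samples) / (n - 1))
            for i in range(m)]
    return [[(row[i] - means[i]) / stds[i] for i in range(m)] for row in samples]

def correlation_matrix(z):
    """Step 2: R = Z^T Z / (n - 1) for standardized data Z."""
    n, m = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(m)]
            for i in range(m)]

def jacobi_eigen(matrix, tol=1e-18, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    m = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(tau * tau + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(m):  # A <- A J (rotate columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(m):  # A <- J^T A (rotate rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(m):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(m)], v

def pca(samples, threshold=0.85):
    """Steps 3 and 4: keep the first p components reaching the threshold."""
    z = standardize(samples)
    eigvals, eigvecs = jacobi_eigen(correlation_matrix(z))
    m = len(eigvals)
    order = sorted(range(m), key=lambda j: eigvals[j], reverse=True)
    total = sum(eigvals)
    p, cumulative = 0, 0.0
    while p < m and cumulative / total < threshold:
        cumulative += eigvals[order[p]]
        p += 1
    basis = [[eigvecs[k][j] for k in range(m)] for j in order[:p]]
    scores = [[sum(zk * bk for zk, bk in zip(row, b)) for b in basis] for row in z]
    return [eigvals[j] for j in order], scores

# Two perfectly correlated variables collapse onto one principal component.
eigenvalues, scores = pca([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the ratio tested in `pca` is the cumulative variance contribution rate of the retained components.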
In one embodiment, the key product quality prediction model based on the data-driven modeling strategy is constructed by selecting an algorithm from the machine learning algorithm library built into the industrial Internet platform and training it on the preprocessed data.
Specifically, the data-driven model can use any of the dozens of mainstream algorithms in the machine learning algorithm library, such as an artificial neural network or a least squares support vector machine;
An artificial neural network is a mathematical model that performs distributed, parallel information processing by imitating the behavior of biological neural networks. Depending on the complexity of the system, the network processes information by adjusting the interconnections among a large number of nodes. An artificial neural network is self-learning and self-adaptive: from a set of corresponding input and output data provided in advance, it can analyze the underlying relationship between inputs and outputs and ultimately represent that relationship as a complex nonlinear function. Each input connection of a neuron has a synaptic connection strength, expressed as a connection weight, by which the incoming signal is scaled; every input therefore has an associated weight. The processing unit weights its inputs, sums the weighted values, and computes the output from the sum.
In an artificial neural network, the network's ability and efficiency in solving a problem depend largely on the activation function it uses, in addition to the network structure. The choice of activation function strongly affects the convergence speed of the network, and different practical problems call for different activation functions. Commonly used activation functions take the following forms:
Threshold function:
$$p = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where $p$ represents the dependent variable of the threshold function;
$x$ represents the independent variable of the threshold function;
This function is also commonly called a step function. When the step function is used as the activation function, the output of the neuron is either 1 or 0, reflecting excitation or inhibition of the neuron;
Linear function: $y = kx + b$
where $y$ represents the dependent variable of the linear function;
$x$ represents the independent variable of the linear function;
$k$ represents the slope of the linear function;
$b$ represents the intercept of the linear function;
This function can serve as the activation function of an output neuron when the output may take any value;
Logarithmic sigmoid function:
$$y = \frac{1}{1 + e^{-x}}$$
where $x$ represents the independent variable of the sigmoid function;
The output of the logarithmic sigmoid function lies between 0 and 1, so it is usually chosen when the output signal must lie in the range 0 to 1; it is the most widely used activation function in neurons;
Hyperbolic tangent sigmoid function:
$$y = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
where $x$ represents the independent variable of the hyperbolic tangent sigmoid function;
The hyperbolic tangent sigmoid function resembles a smoothed step function. It has the same shape as the logarithmic sigmoid function but is symmetric about the origin, with output between -1 and 1, so it is usually chosen when the output signal must lie in the range -1 to 1.
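For illustration, the four activation functions and a single neuron that weights, sums, and activates its inputs can be sketched as follows. The function names and the example weights are hypothetical, and the threshold function is assumed to output 1 at x = 0.

```python
import math

def threshold(x):
    """Step function: 1 for x >= 0 (assumed convention at 0), otherwise 0."""
    return 1.0 if x >= 0 else 0.0

def linear(x, k=1.0, b=0.0):
    """Linear activation y = k*x + b."""
    return k * x + b

def log_sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh_sigmoid(x):
    """Hyperbolic tangent (e^x - e^-x) / (e^x + e^-x)."""
    return math.tanh(x)

def neuron(inputs, weights, bias, activation):
    """Weight each input, sum the weighted values, then apply the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)

output = neuron([1.0, -1.0], [0.5, 0.5], 0.0, log_sigmoid)
```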
The least squares support vector machine (LSSVM) algorithm replaces the inequality constraints of the conventional support vector machine with equality constraints and uses the sum of squared errors as the training loss function. Solving the quadratic programming problem of the support vector machine is thereby converted into solving a system of linear equations, which speeds up the solution;
The LSSVM optimization problem can be described as follows:
$$\min_{\omega, b, e}\; L(\omega, e) = \frac{1}{2}\,\omega^{T}\omega + \frac{\gamma}{2}\sum_{i=1}^{n} e_i^{2} \qquad \text{s.t.}\quad y_i = \omega^{T}\varphi(x_i) + b + e_i,\quad i = 1, \ldots, n$$
where $L$ represents the loss function;
$\omega$ represents the weight vector;
$\gamma$ represents an adjustable parameter;
$e_i$ represents the error;
$x_i$ represents the input data;
$y_i$ represents the output data;
$\varphi(\cdot)$ represents the mapping function;
$b$ represents the bias;
$T$ represents transposition;
$i$ represents the index of the data ($i$ = 1 to $n$);
$n$ represents the total number of training data;
s.t. abbreviates "subject to" and introduces the constraint;
The optimization problem is solved by the Lagrangian method:
$$L(\omega, b, e, a) = \frac{1}{2}\,\omega^{T}\omega + \frac{\gamma}{2}\sum_{i=1}^{n} e_i^{2} - \sum_{i=1}^{n} a_i\left[\omega^{T}\varphi(x_i) + b + e_i - y_i\right]$$
The least squares support vector machine then takes the form
$$y(x) = \sum_{i=1}^{n} a_i\,k(x, x_i) + b$$
The invention adopts the radial basis function as the kernel function, where $k(x, x_i)$ is the kernel function and $a_i$ represents the Lagrange multiplier;
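The conversion to a linear system can be illustrated with the sketch below. It assumes the standard LSSVM regression formulation, in which eliminating ω and e_i from the Lagrangian conditions leaves one equation Σ a_i = 0 plus one equation b + Σ_j a_j k(x_i, x_j) + a_i/γ = y_i per training point. The Gaussian form of the radial basis kernel, its width parameter `sigma`, the default parameter values, and all function names are assumptions made for illustration.

```python
import math

def rbf(u, v, sigma):
    """Gaussian radial basis kernel (assumed form, with assumed width sigma)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Return the bias b and the Lagrange multipliers a_i."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # first row encodes sum(a_i) = 0
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    solution = solve(A, [0.0] + list(y))
    return solution[0], solution[1:]

def lssvm_predict(x, X, b, alphas, sigma=1.0):
    """Evaluate y(x) = sum_i a_i k(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, X))

# Hypothetical training data drawn from y = 2x + 1.
X_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_train = [1.0, 3.0, 5.0, 7.0, 9.0]
b, alphas = lssvm_fit(X_train, y_train, gamma=10.0, sigma=1.0)
```

In this formulation the training error of point i equals a_i/γ, so a larger γ fits the training data more tightly at the cost of less regularization.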
In one embodiment, the deviation correction of the prediction model comprises: while the model is running, correcting the model with new data by a deviation correction method based on the model prediction error, where the deviation correction method is calculated as follows:
$$\hat{y}_c(t) = \hat{y}(t) + K\left[y(t-1) - \hat{y}(t-1)\right]$$
where $\hat{y}_c(t)$ represents the corrected model output value at the current time;
$\hat{y}(t)$ represents the predicted value output by the model at the current time;
$K$ represents the correction coefficient;
$y(t-1)$ and $\hat{y}(t-1)$ represent the true value and the predicted value output by the model at the previous time;
$t$ represents the sampling time;
the correction coefficient is obtained by dividing the model error of the current period by the model error of the previous period:
$$K_i = \frac{Y(t_i) - Y_m(t_i)}{Y(t_i - t) - Y_m(t_i - t)}$$
where $Y(t_i)$ represents the data in the current period;
$Y_m(t_i)$ represents the average value of the predicted values in the current period;
$Y(t_i - t)$ represents the data in the previous period;
$Y_m(t_i - t)$ represents the median value of the predictions in the previous period;
$K = \mathrm{median}(K_i)$, that is, the correction coefficient is obtained as the median of the $K_i$.
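One possible reading of this correction is sketched below. It assumes that the corrected output is the current prediction plus K times the previous prediction error, and that each K_i is the ratio of a current-period error to the corresponding previous-period error, computed pointwise; the function names and the numbers are hypothetical.

```python
from statistics import median

def correction_coefficient(y_cur, pred_cur, y_prev, pred_prev):
    """Median of the ratios of current-period error to previous-period error."""
    ratios = []
    for yc, pc, yp, pp in zip(y_cur, pred_cur, y_prev, pred_prev):
        previous_error = yp - pp
        if previous_error != 0:  # skip points where the ratio is undefined
            ratios.append((yc - pc) / previous_error)
    return median(ratios)

def corrected_prediction(pred_now, y_prev, pred_prev, K):
    """Shift the current prediction by K times the previous prediction error."""
    return pred_now + K * (y_prev - pred_prev)

K = correction_coefficient(
    y_cur=[11.0, 12.0, 13.0], pred_cur=[10.0, 10.0, 10.0],
    y_prev=[12.0, 12.0, 12.0], pred_prev=[10.0, 10.0, 10.0],
)
corrected = corrected_prediction(pred_now=10.0, y_prev=9.0, pred_prev=8.0, K=0.5)
```

Using the median rather than the mean keeps a single period with an unusually small previous error from dominating the coefficient.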
In one embodiment, the predictive model making model parameter corrections includes: taking deviation between the model output value and the actual value as an optimization target, and optimizing key parameters of the model by adopting a genetic algorithm based on historical data, wherein the optimization target is as follows:
A genetic algorithm starts its search from a set of randomly generated initial solutions, called a population. Each individual in the population is a candidate solution to the problem, called a chromosome. These chromosomes evolve continuously over subsequent iterations, a process called inheritance. The genetic algorithm is implemented mainly through crossover, mutation, and selection operations. Crossover or mutation operations generate the next generation of chromosomes, called offspring. The quality of a chromosome is measured by its fitness. A certain number of individuals are selected from the previous generation and the offspring according to fitness to form the next population, which continues to evolve; after a number of generations the algorithm converges to the best chromosome, which is likely to be an optimal or suboptimal solution of the problem. Fitness is used in genetic algorithms to measure how close each individual in the population comes to the optimal solution during the optimization calculation. The function that measures an individual's fitness is called the fitness function, and its definition generally depends on the specific problem being solved.
The main operating procedure of the genetic algorithm, which uses three genetic operators (selection, crossover, and mutation), is as follows:
a. initialization: set the generation counter v = 0; set the maximum number of generations V; randomly generate H individuals as the initial population Q(0);
b. individual evaluation: calculate the fitness of each individual in the population Q(v);
c. selection operation: apply the selection operator to the population;
d. crossover operation: apply the crossover operator to the population;
e. mutation operation: apply the mutation operator to the population; after selection, crossover, and mutation, the population Q(v) yields the next-generation population Q(v+1);
f. termination check: if v ≤ V, set v = v + 1 and go to step b; if v > V, output the individual with the greatest fitness found during the evolution as the optimal solution and terminate the calculation.
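Steps a to f can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the real-valued encoding, binary tournament selection, arithmetic crossover, Gaussian mutation, and elitism are choices made for the example, and the linear toy model `y = p0 * x + p1` stands in for the prediction model whose key parameters are tuned. All function and parameter names are hypothetical.

```python
import numpy as np

def genetic_minimize(loss, bounds, pop_size=40, max_gen=60,
                     crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimize loss(params); fitness is treated as the negative of the loss."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    # a. initialization: random population within the parameter bounds
    pop = rng.uniform(low, high, size=(pop_size, len(bounds)))
    best, best_loss = None, np.inf
    for v in range(max_gen):
        # b. individual evaluation
        losses = np.array([loss(ind) for ind in pop])
        i_best = int(np.argmin(losses))
        if losses[i_best] < best_loss:
            best, best_loss = pop[i_best].copy(), float(losses[i_best])
        # c. selection: binary tournament, lower loss wins
        idx_a = rng.integers(0, pop_size, size=pop_size)
        idx_b = rng.integers(0, pop_size, size=pop_size)
        winners = np.where(losses[idx_a] <= losses[idx_b], idx_a, idx_b)
        parents = pop[winners]
        # d. crossover: arithmetic blend of consecutive parent pairs
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate:
                w = rng.random()
                children[i] = w * parents[i] + (1 - w) * parents[i + 1]
                children[i + 1] = w * parents[i + 1] + (1 - w) * parents[i]
        # e. mutation: Gaussian perturbation on a random subset of genes
        mask = rng.random(children.shape) < mutation_rate
        noise = rng.normal(0.0, 0.1 * (high - low), size=children.shape)
        pop = np.clip(children + mask * noise, low, high)
        pop[0] = best  # elitism: carry the best individual forward
    # f. termination: return the best individual seen during the evolution
    return best, best_loss

# Hypothetical historical data generated by y = 2x + 1.
x_hist = np.linspace(0.0, 1.0, 20)
y_hist = 2.0 * x_hist + 1.0

def mse(params):
    """Deviation between model output and actual values (mean squared error)."""
    return float(np.mean((params[0] * x_hist + params[1] - y_hist) ** 2))

best_params, best_loss = genetic_minimize(mse, bounds=[(0.0, 5.0), (0.0, 5.0)])
```

The loop runs a fixed number of generations, which corresponds to the counter v reaching the maximum V in step f.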
In summary, with the above technical solution, industrial data are collected through the industrial internet platform, which addresses the data-silo problem in chemical enterprises and allows the value of data from different systems to be fully exploited. In the data-driven modeling method based on the industrial internet platform, the built-in machine learning algorithm library contains dozens of mainstream algorithms, so the model can better adapt to frequent changes in working conditions. The method provided by the invention can greatly reduce a plant's need for measuring equipment and is significant for improving product quality, promoting energy saving and consumption reduction, and accelerating the digital transformation of enterprises. The method can also predict the key indexes of chemical raw materials and products in real time, avoiding problems such as lengthy testing and indexes that are difficult or impossible to measure, and thus saves a great amount of time and resources.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. A data-driven quality modeling method based on an industrial internet platform, the method comprising the steps of:
S1, collecting data from different systems based on an industrial internet platform, and aggregating the data in a unified manner;
S2, preprocessing the collected data;
S3, selecting auxiliary variables according to the process principle and process characteristics, and reducing the dimensionality of the auxiliary variables by principal component analysis;
S4, constructing a key product quality prediction model based on a data-driven modeling strategy;
S5, performing deviation correction and model parameter correction on the established prediction model.
2. The method for data-driven quality modeling based on an industrial internet platform according to claim 1, wherein the data preprocessing of the collected data comprises the steps of:
S201, fusing and storing the collected data to obtain sample data;
S202, performing abnormal data rejection and filtering on the sample data, and normalizing the data.
3. The data-driven quality modeling method based on an industrial internet platform according to claim 2, wherein the calculation formula for fusing the collected data is as follows:
wherein h is 1q Representing business systemIs unified at t 1q Data collected at the moment;
h 2q indicating that the production system is at t 2q Data collected at the moment;
ε h1 representing the acquired data h 1q Root mean square error of (a);
ε t1 indicating time t 1q Root mean square error of (a);
ε h2 representing the acquired data h 2q Root mean square error of (a);
ε t2 indicating time t 2q Root mean square error of (a);
h q representing the business system and the production system at t q And collecting the data fusion result at the moment.
4. The data-driven quality modeling method based on the industrial internet platform according to claim 2, wherein abnormal data in the sample data are rejected by screening with the 3σ criterion, comprising the following specific steps:
assuming that an auxiliary variable x in the sample data has n values forming the sequence $x_1, x_2, \ldots, x_i$ ($i = 1, 2, 3, \ldots, n$), the mean $\bar{x}$ and the standard deviation $\sigma$ of the sequence are calculated:
if the auxiliary variable x in a sample satisfies the following formula:
then the sample is removed as an abnormal sample; the 3σ check is applied in turn to the other auxiliary variables in the sample, and the samples that pass the screening are placed in the modeling sample set.
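The 3σ screening of claim 4 can be sketched as follows, assuming the usual form of the criterion: a value is abnormal when its distance from the variable's mean exceeds three standard deviations. Whether the standard deviation is divided by n or n - 1 is not recoverable from this text, so `ddof=1` is an assumption, as are the function name and the sample data.

```python
import numpy as np

def three_sigma_filter(samples, ddof=1):
    """Drop every sample (row) in which any auxiliary variable (column)
    lies more than three standard deviations from that variable's mean."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=ddof)
    abnormal = np.abs(samples - mean) > 3.0 * std
    keep = ~abnormal.any(axis=1)
    return samples[keep], keep

# Hypothetical data: 21 samples of two auxiliary variables, one gross outlier.
data = np.column_stack([np.full(21, 10.0), np.linspace(1.0, 2.0, 21)])
data[::2, 0] += 0.1   # small normal fluctuation in the first variable
data[7, 0] = 100.0    # abnormal reading
clean, keep = three_sigma_filter(data)
```

The mean and standard deviation here are computed once over all samples, so a single pass is used; repeating the screening on the cleaned set is a possible variation.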
5. The method of claim 2, wherein filtering the sample data comprises average-filtering the samples by the following equation:
$$X(t) = \frac{X(t - T/2) + X(t - T/2 + T_c) + \cdots + X(t) + \cdots + X(t + T/2 - T_c) + X(t + T/2)}{T/T_c}$$
wherein $t$ represents the sampling time;
$T$ represents the filtering time constant;
$T_c$ represents the sampling period.
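A centered moving-average filter in the spirit of claim 5 can be sketched as follows. The parameter `window_samples` plays the role of T / T_c. As an assumption that differs slightly from the claim, the sum is divided by the number of samples actually inside the window, so a constant signal passes through unchanged and the window simply shrinks at the edges of the series. The function name and the signal are hypothetical.

```python
import numpy as np

def mean_filter(x, window_samples):
    """Replace each sample by the mean of the samples within
    window_samples // 2 positions on either side of it."""
    x = np.asarray(x, dtype=float)
    half = window_samples // 2
    out = np.empty_like(x)
    for k in range(len(x)):
        lo = max(0, k - half)
        hi = min(len(x), k + half + 1)
        out[k] = x[lo:hi].mean()
    return out

# Hypothetical signal with one spike; window of T / T_c = 2 samples.
signal = np.array([1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0])
smoothed = mean_filter(signal, window_samples=2)
```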
6. The industrial internet platform-based data-driven quality modeling method of claim 2, wherein normalizing the data comprises normalizing the sample data to $[y_{\min}, y_{\max}]$ by the following formula:
$$y = (y_{\max} - y_{\min}) \cdot \frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$$
wherein $y_{\min}$ and $y_{\max}$ represent the lower and upper bounds of the normalization target;
$x_{\max}$ and $x_{\min}$ represent the upper and lower bounds of the current variable.
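The normalization of claim 6 is a linear rescaling of each variable's observed range onto the target interval. A minimal sketch follows; the function name is hypothetical, and returning the lower bound for a constant variable is an assumption made to avoid division by zero.

```python
import numpy as np

def min_max_normalize(x, y_min=0.0, y_max=1.0):
    """Map values from [x_min, x_max] linearly onto [y_min, y_max]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption: a constant variable is mapped to the lower bound.
        return np.full_like(x, y_min)
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

scaled = min_max_normalize([2.0, 4.0, 6.0], y_min=-1.0, y_max=1.0)
```

With these inputs the smallest value maps to -1, the midpoint to 0, and the largest to 1.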
7. The data-driven quality modeling method based on an industrial internet platform according to claim 1, wherein the principal component analysis method comprises the following steps:
1) Standardizing the original sample data to form a standardized matrix:
let the m-dimensional random vector be $X = (X_1, X_2, \ldots, X_m)^T$ and take n samples $X_i = (X_{i1}, X_{i2}, \ldots, X_{im})^T$ ($i = 1, 2, 3, \ldots, n$), where the superscript T denotes the matrix transpose; form the sample matrix and standardize it, using the sample mean:
sample variance:
the standardized data are:
wherein ($i = 1, 2, 3, \ldots, m$; $k = 1, 2, 3, \ldots, n$),
forming the standardized matrix $X(x_{ik})$;
2) Calculating the sample correlation coefficient matrix of the standardized matrix:
wherein $r_{ij}$ represents the element in row i, column j of the matrix R ($i, j = 1, 2, 3, \ldots, m$);
3) Determining the principal components:
solve the characteristic equation $|R - \lambda I_m| = 0$ of the sample correlation matrix R to obtain m eigenvalues, wherein $\lambda$ represents an eigenvalue and $I_m$ represents the identity matrix; because R is a symmetric matrix, the eigenvalues are obtained by the Jacobi method; the value of p is then determined so that the information utilization rate reaches 85% or more, giving p principal components; for each $\lambda_j$ ($j = 1, 2, 3, \ldots, p$), solve the system of equations $Rb = \lambda_j b$ to obtain the unit eigenvector, where b represents the set of eigenvectors;
4) Converting the standardized index variables into principal components:
wherein $U_1$ is called the first principal component, $U_2$ the second principal component, and $U_m$ the m-th principal component;
5) Comprehensively evaluating the m principal components: the m principal components are weighted and summed to obtain the final evaluation value, wherein each weight is the variance contribution rate of the corresponding principal component.
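Steps 1) to 5) of claim 7 can be sketched as follows. Several points are assumptions: `numpy.linalg.eigh` is used in place of the Jacobi method named in the claim, the 85% information utilization rate is read as the cumulative share of the eigenvalues, and the composite score is summed over the p retained components. Function names and test data are hypothetical.

```python
import numpy as np

def pca_reduce(samples, min_ratio=0.85):
    """Return component scores, variance contribution rates, composite score."""
    samples = np.asarray(samples, dtype=float)
    n, m = samples.shape
    # 1) standardize each variable (column) to zero mean and unit variance
    z = (samples - samples.mean(axis=0)) / samples.std(axis=0, ddof=1)
    # 2) sample correlation coefficient matrix
    r = z.T @ z / (n - 1)
    # 3) eigenvalues and unit eigenvectors, sorted in descending order
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    # smallest p whose cumulative contribution reaches min_ratio
    p = min(int(np.searchsorted(np.cumsum(ratio), min_ratio)) + 1, m)
    # 4) convert the standardized variables into principal components
    scores = z @ eigvecs[:, :p]
    # 5) weighted sum with the variance contribution rates as weights
    composite = scores @ ratio[:p]
    return scores, ratio[:p], composite

# Hypothetical data: x2 is a linear function of x1, x3 is uncorrelated with both.
t = np.linspace(0.0, 1.0, 50)
samples = np.column_stack([t, 2.0 * t + 1.0, np.cos(2.0 * np.pi * t)])
scores, ratio, composite = pca_reduce(samples)
```

In this example two of the three variables carry the same information, so two principal components are enough to pass the 85% threshold.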
8. The data-driven quality modeling method based on an industrial internet platform according to claim 1, wherein constructing the key product quality prediction model based on the data-driven modeling strategy comprises modeling with an algorithm from the machine learning algorithm library built into the industrial internet platform, in combination with the preprocessed data.
9. The method of claim 1, wherein performing deviation correction on the prediction model comprises: while the model is running, new data are used to correct the model, and a deviation correction method corrects the model output according to the model prediction error, wherein the deviation correction method is calculated as follows:
wherein the formula involves: the output value of the model after correction at the current moment;
the predicted value output by the model at the current moment;
the correction coefficient $K$;
the real value $y(t-1)$ at the previous moment and the predicted value output by the model at the previous moment;
and the sampling time $t$;
the correction coefficient is obtained by dividing the model error of the current period by the model error of the previous period:
wherein the model error of the current period is formed from $Y(t_i)$, the data within the current period,
and the average value of the predicted values in the current period;
the model error of the previous period is formed from $Y(t_i - t)$, the data in the previous period,
and $Y_m(t_i - t)$, the median value of the predicted values in the previous period;
$K = \mathrm{median}(K_i)$, that is, the correction coefficient is obtained by taking the median of the $K_i$.
10. The method of claim 1, wherein performing model parameter correction on the prediction model comprises: taking the deviation between the model output value and the actual value as the optimization target and, based on historical data, optimizing the key parameters of the model with a genetic algorithm, wherein the optimization target is as follows:
CN202310408969.1A 2023-04-17 2023-04-17 Data-driven quality modeling method based on industrial Internet platform Active CN116523388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310408969.1A CN116523388B (en) 2023-04-17 2023-04-17 Data-driven quality modeling method based on industrial Internet platform

Publications (2)

Publication Number Publication Date
CN116523388A true CN116523388A (en) 2023-08-01
CN116523388B CN116523388B (en) 2023-11-10

Family

ID=87391371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310408969.1A Active CN116523388B (en) 2023-04-17 2023-04-17 Data-driven quality modeling method based on industrial Internet platform

Country Status (1)

Country Link
CN (1) CN116523388B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012027683A (en) * 2010-07-23 2012-02-09 Nippon Steel Corp Quality prediction device, quality prediction method, program and computer readable recording medium
US20170061305A1 (en) * 2015-08-28 2017-03-02 Jiangnan University Fuzzy curve analysis based soft sensor modeling method using time difference Gaussian process regression
CN108647481A (en) * 2018-08-14 2018-10-12 华东理工大学 A kind of rotary kiln burning zone temperature flexible measurement method
US20190114550A1 (en) * 2017-10-13 2019-04-18 Siemens Aktiengesellschaft Method for computer-implemented determination of a data-driven prediction model
CN109657411A (en) * 2019-01-18 2019-04-19 华东理工大学 A kind of solvent deasphalting unit modeling and optimization method based on data-driven
CN110210687A (en) * 2019-06-13 2019-09-06 中南大学 A kind of Nonlinear Dynamic production process product quality prediction technique returned based on local weighted slow feature
CN110405343A (en) * 2019-08-15 2019-11-05 山东大学 A kind of laser welding process parameter optimization method of the prediction model integrated based on Bagging and particle swarm optimization algorithm
CN111291937A (en) * 2020-02-25 2020-06-16 合肥学院 Method for predicting quality of treated sewage based on combination of support vector classification and GRU neural network
CN111428201A (en) * 2020-03-27 2020-07-17 陕西师范大学 Prediction method for time series data based on empirical mode decomposition and feedforward neural network
WO2021063136A1 (en) * 2019-09-30 2021-04-08 江苏大学 Data-driven high-precision integrated navigation data fusion method
US20210233039A1 (en) * 2019-03-24 2021-07-29 Beijing University Of Technology Soft Measurement Method for Dioxin Emission Concentration In Municipal Solid Waste Incineration Process
CN113569993A (en) * 2021-08-27 2021-10-29 浙江工业大学 Method for constructing quality prediction model in polymerization reaction process
US20220147672A1 (en) * 2019-05-17 2022-05-12 Tata Consultancy Services Limited Method and system for adaptive learning of models for manufacturing systems
CN114997486A (en) * 2022-05-26 2022-09-02 南京工业大学 Effluent residual chlorine prediction method of water works based on width learning network
US20220340827A1 (en) * 2019-09-24 2022-10-27 China Petroleum & Chemical Corporation System and method for intelligent gasification blending
US20220373984A1 (en) * 2021-05-19 2022-11-24 Shandong University Hybrid photovoltaic power prediction method and system based on multi-source data fusion
CN115545321A (en) * 2022-10-14 2022-12-30 云南中烟工业有限责任公司 On-line prediction method for process quality of silk making workshop

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
信息科技: "数据驱动的复杂产品质量预测与质量规则挖掘方法研究", 《信息科技》 *
冯大春;鲁红;: "数据驱动技术在石化工业运行中的应用", 石油化工自动化, no. 06 *
李建刚: "集成即时学习软测量建模方法研究", 《工程科技Ⅰ辑》, no. 06 *
牟盛静, 中国优秀博硕士学位论文全文数据库 (博士), no. 03 *

Also Published As

Publication number Publication date
CN116523388B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
Li et al. Optimal selection of heterogeneous ensemble strategies of time series forecasting with multi-objective programming
US11650968B2 (en) Systems and methods for predictive early stopping in neural network training
CN111126575A (en) Gas sensor array mixed gas detection method and device based on machine learning
CN110571792A (en) Analysis and evaluation method and system for operation state of power grid regulation and control system
CN111091241A (en) BP neural network-based drug sales prediction and decision method and system
Monroy et al. Fault diagnosis of a benchmark fermentation process: a comparative study of feature extraction and classification techniques
CN111723523B (en) Estuary surplus water level prediction method based on cascade neural network
CN110059824A (en) A kind of neural net prediction method based on principal component analysis
CN111832703A (en) Sampling interval perception long-short term memory network-based process manufacturing industry irregular sampling dynamic sequence modeling method
US20140297573A1 (en) Method for quantifying amplitude of a response of a biological network
CN114819395A (en) Industry medium and long term load prediction method based on long and short term memory neural network and support vector regression combination model
CN116523388B (en) Data-driven quality modeling method based on industrial Internet platform
JPH06337852A (en) Time series prediction method by neural network
CN116662925A (en) Industrial process soft measurement method based on weighted sparse neural network
Hongjiu et al. Performance comparison of artificial intelligence methods for predicting cash flow
CN116109039A (en) Data-driven anomaly detection and early warning system
CN113151842B (en) Method and device for determining conversion efficiency of wind-solar complementary water electrolysis hydrogen production
CN115275977A (en) Power load prediction method and device
CN115293520A (en) Method for constructing structured multi-modal industrial process index estimation framework
Zambrano et al. Machine learning techniques for monitoring the sludge profile in a secondary settler tank
Lin et al. A deep learning-based customer forecasting tool
CN113449809A (en) Cable insulation on-line monitoring method based on KPCA-NSVDD
Lajoie et al. A data-driven framework to deal with intrinsic variability of industrial processes: An application in the textile industry
Cui et al. KPCA-ESN soft-sensor model of polymerization process optimized by biogeography-based optimization algorithm
US8812430B1 (en) Determining a confidence of a measurement signature score

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant