CN113743652B

CN113743652B - Sugarcane squeezing process prediction method based on depth feature recognition

Info

Publication number: CN113743652B
Application number: CN202110901473.9A
Authority: CN
Inventors: 蒙艳玫; 陈劼; 柳宏耀; 邱敏敏; 韦锦; 陆冠成; 董振; 李正源; 胡松杰; 吴雪; 张月; 李济钦
Original assignee: Guangxi University
Current assignee: Guangxi University
Priority date: 2021-08-06
Filing date: 2021-08-06
Publication date: 2022-03-11
Anticipated expiration: 2041-08-06
Also published as: CN113743652A

Abstract

The invention discloses a sugarcane squeezing process prediction method based on depth feature recognition, which comprises the following steps: first, collecting a plurality of groups of original data; second, removing abnormal data from the original data acquired in the first step and normalizing the remaining data to obtain normalized data; third, performing multi-stage screening on the normalized data obtained in the second step to obtain feature vectors that have a high degree of correlation with energy consumption and extraction rate and low redundancy; fourth, using a mixed chicken flock algorithm to search, within the candidate set of feature vectors screened in the third step, the effect of different feature combinations and model parameters on each single data-driven model, so as to obtain the parameter variables, energy consumption and extraction rate of each single model at its optimal performance; fifth, establishing the first-layer deterministic prediction output; and sixth, establishing a multi-model combined model to realize deterministic and probabilistic prediction of the extraction rate and the energy consumption. The method greatly improves the model fitting effect and the prediction precision, and solves problems such as the difficulty of measuring these indexes online.

Description

Sugarcane squeezing process prediction method based on depth feature recognition

Technical Field

The invention relates to the technical field of design optimization of sugarcane squeezing process procedures, in particular to a sugarcane squeezing process prediction method based on depth feature recognition.

Background

Sugarcane juice extraction is the first stage of sugar production, and the extraction rate and production energy consumption are two important indexes of this section; whether the sugarcane juice reaches the standard affects the smooth operation and economic benefit of the whole sugar production process. Owing to technical limitations, these indexes are currently obtained through off-line laboratory assays, which introduces a time lag, so the system cannot be adjusted promptly. Real-time monitoring of these indexes is therefore of positive significance for guiding process operation optimization and control.

With the development of artificial intelligence technology, black-box methods based on data-driven modeling have been widely applied in the sugar industry. At present, prediction and analysis of process indexes mainly rely on the traditional support vector machine, artificial neural networks, generalized dynamic fuzzy neural networks and the like. Given the spatio-temporal complexity of the energy consumption and extraction rate of a squeezing system, it is extremely difficult to find a single data-driven algorithm that is optimal under all conditions. One way to reduce the risk of poor prediction and improve overall accuracy is to integrate data-driven models with a variety of different input features. A further key step in building a data-driven model is determining the important feature inputs. Analysis of the whole squeezing system shows that many factors influence the extraction rate and energy consumption, and excessive feature inputs can cause problems such as poor generalization of the prediction model. The currently widely used dimensionality reduction method, PCA, reduces data dimensionality by transforming the features into new attribute combinations, but the transformed features may have no physical meaning, and irrelevant features can cause overfitting to the training data. It is therefore desirable to provide a multi-model integrated prediction framework with a feature recognition function to solve the above problems.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention aims to provide a sugarcane squeezing process prediction method based on depth feature recognition, so that the defects that an existing sugarcane squeezing process analysis system is hysteretic and difficult to integrate multiple models for prediction are overcome.

In order to achieve the aim, the invention provides a sugarcane squeezing process prediction method based on depth feature recognition, which comprises the following steps:

Step one, collecting a plurality of groups of original data from field equipment, wherein each group of original data comprises field detection data of indexes such as the sugarcane cutter current load, the primary current load and rotating speed, the secondary current load and rotating speed, the squeezer current load and rotating speed, the actual value and target value of the material level on the conveying belt, the permeate water flow and the permeate-water-to-sugarcane ratio;

Step two, removing abnormal data from all the original data of each index collected in step one, and rescaling the processed data to obtain normalized data;

Step three, performing multi-stage screening, based on mutual information, on all the normalized data of each index obtained in step two to obtain feature vectors that have a high degree of correlation with energy consumption and extraction rate and low redundancy;

Step four, using a mixed chicken flock algorithm to search, within the candidate set of feature vectors screened in step three, the effect of different feature combinations and model parameters on each single data-driven model, so as to obtain the parameter variables, energy consumption and extraction rate of each single model at its optimal performance;

Step five, taking the parameter variables at the optimal performance of each single model obtained in step four as input and the energy consumption and extraction rate as output, and establishing the first-layer deterministic prediction output;

Step six, using a Bayesian average model for ensemble learning on the prediction results of the single models, and establishing a multi-model combined model to realize deterministic and probabilistic prediction of the extraction rate and the energy consumption.

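In the second step, the processed data are rescaled according to

$$x_i' = \frac{2\,(x_i - x_{i\min})}{x_{i\max} - x_{i\min}} - 1$$

where $x_i$ is a data sample of feature i and $x_{i\max}$ and $x_{i\min}$ are the maximum and minimum over all samples of that feature, so that each item of data is scaled to [-1, 1] to obtain normalized data.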

Wherein, the multi-stage screening in the third step comprises the following steps: defining the degree of correlation between each feature and the prediction target and the redundancy among the features; dividing the normalized data into three subsets R1, R2 and R3 in descending order of correlation; dividing R2 into two parts, R1 and R2, according to the cumulative contribution index commonly used in principal component analysis; adding the feature with the highest correlation in R2 to R1, calculating the redundancy between the features in R1 and the R3 subset, and deleting the feature with the highest redundancy; repeating this operation until R2 is an empty set; and taking the screened subsets R1 and R1 as the result of the multi-stage screening.

Further, the method in step three for defining the degree of correlation between the features and the predicted target and the redundancy among the features is as follows. Based on information entropy theory, the mutual information of feature $f_j$ and predicted target $y_k$ is defined as:

$$I(f_j; y_k) = \sum_{i=1}^{n} p(f_{ij}, y_k)\,\log\frac{p(f_{ij}, y_k)}{p(f_{ij})\,p(y_k)}$$

In the formula, $f_{ij}$ is the jth feature of the ith sample; $y_k$ is the kth target output; n, m and s are the numbers of samples, features and predicted targets in the entire data set; $p(f_{ij})$ and $p(y_k)$ are the probabilities that $f_{ij}$ and $y_k$ occur in data set X; and $p(f_{ij}, y_k)$ is the probability that $f_{ij}$ and $y_k$ occur simultaneously in data set X. On this basis, the degree of correlation between feature $f_j$ and output target $y_k$ can be defined as:

$$\mathrm{Rel}(f_j, y_k) = \frac{I(f_j; y_k)}{\mathrm{Max}}$$

In the formula, Max is the maximum value of the mutual information between all features and the output target.

The mutual information between any two features $f_{j_1}$ and $f_{j_2}$ can be expressed as:

$$I(f_{j_1}; f_{j_2}) = \sum_{i=1}^{n} p(f_{ij_1}, f_{ij_2})\,\log\frac{p(f_{ij_1}, f_{ij_2})}{p(f_{ij_1})\,p(f_{ij_2})}$$

The redundancy of feature $f_j$ with the other features f in data set X is calculated as:

further, in step three, with the cumulative contribution index of 0.85 as a threshold, R2 is divided into two parts, R1 and R2.

Further, in step four, principal component analysis is introduced into the initialization of the mixed chicken swarm algorithm: principal component analysis is performed on the subset obtained in step three to obtain the initial variable constraint n for which the contribution degree exceeds 85%, n features are randomly selected as the input of the data-driven model, and the corresponding particle positions are set to 1 to implement binary coding.
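The initialization described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes that "contribution degree larger than 85%" means the cumulative explained-variance ratio of the principal components, and the names `pca_component_count` and `init_particle` are hypothetical.

```python
import numpy as np

def pca_component_count(X, threshold=0.85):
    """Smallest number of principal components whose cumulative
    explained-variance ratio exceeds `threshold` (assumed meaning of n)."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    n = int(np.searchsorted(ratio, threshold, side="right")) + 1
    return min(n, len(s))

def init_particle(n_features, n_selected, rng):
    """Binary position vector: 1 marks a selected feature, 0 an unused one."""
    position = np.zeros(n_features, dtype=int)
    chosen = rng.choice(n_features, size=n_selected, replace=False)
    position[chosen] = 1
    return position

# Illustrative data only: 424 samples and 12 screened features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(424, 12))
n_demo = pca_component_count(X_demo)
particle = init_particle(X_demo.shape[1], n_demo, rng)
```

Each particle would additionally carry the model parameters of the sub-model being tuned; that part of the encoding is not specified in enough detail here to sketch.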

Further, in step five, when establishing the first-layer deterministic prediction output, deep and shallow neural networks are combined, and the selected data-driven models comprise a twin support vector regression machine (TSVR), a kernel extreme learning machine (KELM) and a deep kernel extreme learning machine (DK-ELM).

Compared with the prior art, the invention has the following beneficial effects:

the method is mainly applied to control and optimization of process parameters in the sugarcane squeezing process, effectively identifies characteristic variables under the optimal performance of different data-driven models according to the idea of data-driven modeling and by combining a mutual information theory, a mixed chicken flock algorithm and a principal component analysis method, and provides a multi-model combined model integrated by several models to establish a prediction model for energy consumption and extraction rate in the sugarcane squeezing process. Compared with the traditional method, the method has the advantages that the model fitting effect and the prediction accuracy are greatly improved, and the problems that the indexes are difficult to measure on line and the like are solved. In addition, the invention can also realize interval prediction of the indexes, help decision makers to better understand the performance of the current squeezing process and make corresponding indexes for energy conservation and emission reduction of the system.

Drawings

Fig. 1 is a schematic step diagram of a prediction method of a sugar cane crushing process based on depth feature recognition according to the present invention.

Fig. 2 is a flow chart of depth feature recognition according to the present invention.

FIG. 3 is a diagram of a deep core extreme learning machine network architecture according to the present invention.

FIG. 4 is a graph of sub-model based and ensemble learned energy consumption and extraction rate prediction results according to the present invention;

In fig. 4, graphs a, b, c and d are coefficient of determination plots for predicting the extraction rate with the TSVR, KELM, DK-ELM and BMA models, respectively, using the original feature set as input; graphs e, f, g and h are the corresponding plots for predicting energy consumption with the original feature set as input. Graphs i, j, k and l are coefficient of determination plots for predicting the extraction rate with the TSVR, KELM, DK-ELM and BMA models, respectively, using the feature set obtained by depth feature extraction as input; graphs m, n, o and p are the corresponding plots for predicting energy consumption with the feature set obtained by depth feature extraction as input.

Fig. 5 is a multi-model integrated probabilistic predictive result graph according to the present invention, wherein a is an energy consumption probability predictive result graph and b is a probability predictive result graph of an extraction rate.

Detailed Description

The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

Fig. 1 to 5 show schematic diagrams of a depth feature recognition-based sugarcane crushing process prediction method according to a preferred embodiment of the invention, and the depth feature recognition-based sugarcane crushing process prediction method comprises the following steps:

Step one, field real-time data of the sugarcane squeezing process are acquired through the DCS (distributed control system) of the sugarcane squeezing field equipment, sampled once per minute, giving 424 groups of original data OD = {o₁(t), y₁(t), o₂(t), y₂(t)}, t = 1, 2, ..., N. Here o₁(t) and o₂(t) are the sets of 27 parameter variables that, based on domain knowledge, may influence energy consumption and extraction rate, respectively; they comprise field detection data for the current loads of 3 sugarcane cutters, the primary current load and rotating speed, the secondary current load and rotating speed, the current loads and rotating speeds of 6 squeezers, the actual and target material level values on the conveying belt, the permeate water flow and the permeate-water-to-sugarcane ratio. y₁(t) is the energy consumption per hour (kW·h) of the squeezing system, y₂(t) is the extraction rate (%) of the squeezing system, and t is the sampling period;

Step two, abnormal data are eliminated from all the original data of each index collected in step one, and the processed data are rescaled according to

$$x_i' = \frac{2\,(x_i - x_{i\min})}{x_{i\max} - x_{i\min}} - 1$$

so that each item of data is linearly normalized to the interval [-1, 1], giving the normalized data; in the formula, $x_i$ is a data sample of feature i, $x_{i\max}$ is the maximum over all samples of feature i, and $x_{i\min}$ is the minimum over all samples of feature i;
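As a minimal sketch of the rescaling in step two, the following Python function maps each feature column linearly onto [-1, 1] using that column's own minimum and maximum. The function name `scale_to_unit_interval` is hypothetical, and the handling of a constant column (mapped to -1) is an assumption, since the text does not cover that case; removal of abnormal data is not shown because no criterion is given for it.

```python
import numpy as np

def scale_to_unit_interval(X):
    """Linearly map each column of X to [-1, 1] using its own min and max."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Assumption: a constant column has zero span; use 1.0 to avoid
    # dividing by zero, which maps every value in that column to -1.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return 2.0 * (X - x_min) / span - 1.0

# Rows are samples, columns are features.
scaled = scale_to_unit_interval([[0, 10], [5, 20], [10, 30]])
```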

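Step three, the degree of correlation between the features and the predicted target and the redundancy among the features are defined. Based on information entropy theory, the mutual information of feature $f_j$ and predicted target $y_k$ is defined as:

$$I(f_j; y_k) = \sum_{i=1}^{n} p(f_{ij}, y_k)\,\log\frac{p(f_{ij}, y_k)}{p(f_{ij})\,p(y_k)}$$

where $f_{ij}$ is the jth feature of the ith sample, n is the number of samples, and $p(\cdot)$ denotes the probability of occurrence in data set X.

The mutual information between any two features $f_{j_1}$ and $f_{j_2}$ is expressed as:

$$I(f_{j_1}; f_{j_2}) = \sum_{i=1}^{n} p(f_{ij_1}, f_{ij_2})\,\log\frac{p(f_{ij_1}, f_{ij_2})}{p(f_{ij_1})\,p(f_{ij_2})}$$

The redundancy of feature $f_j$ with the other features f in data set X is calculated as: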

The normalized data are divided into three subsets R1, R2 and R3 in descending order of correlation, where the strongly correlated feature subset is R1, the weakly correlated subset is R2 and the uncorrelated subset is R3. For the division, thresholds δ₁, δ₂ ∈ [0, 1] with δ₁ < δ₂ are defined, following the cumulative contribution index commonly used in principal component analysis: when the correlation is greater than δ₂, the feature is stored in R1; when the correlation is greater than δ₁ and less than δ₂, the feature is stored in R2; the remaining features are stored in R3. A threshold δ₃ ∈ (0, δ₂) is set; in this experiment, taking the cumulative contribution index of 0.85 as the threshold, R2 is divided into two parts, R1 and R2. The feature with the highest correlation in R2 is added to R1, the redundancy between the features in R1 and the R3 subset is calculated, and the feature with the highest redundancy is deleted; this operation is repeated until R2 is an empty set, and the screened subsets R1 and R1 are taken as the result of the multi-stage screening;
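The relevance ranking and three-way division in step three can be illustrated with the Python sketch below. It rests on several assumptions that are not stated in the text: mutual information is estimated from a two-dimensional histogram (the `bins` parameter is hypothetical), the degree of correlation is the mutual information divided by its maximum over all features, and the threshold values are arbitrary illustrative numbers. The redundancy-based pruning is not sketched because its formula is not given.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(x; y) in nats for two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = p_xy > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def partition_by_relevance(X, y, delta1, delta2, bins=8):
    """Split feature indices into R1 (strong), R2 (weak) and R3 (uncorrelated)."""
    mi = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    rel = mi / mi.max()                     # normalized degree of correlation
    idx = range(X.shape[1])
    r1 = [j for j in idx if rel[j] > delta2]
    r2 = [j for j in idx if delta1 < rel[j] <= delta2]
    r3 = [j for j in idx if rel[j] <= delta1]
    return rel, r1, r2, r3

# Synthetic illustration: feature 0 equals the target, feature 1 is a noisy
# copy of it, and feature 2 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
X_demo = np.column_stack([x0, x0 + rng.normal(size=2000), rng.normal(size=2000)])
rel, r1, r2, r3 = partition_by_relevance(X_demo, x0.copy(), delta1=0.05, delta2=0.6)
```

Histogram estimates are biased upward for small samples, so in practice the bin count and thresholds would need to be chosen with the available 424 samples in mind.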

step four, searching the effect of different feature combinations and model parameters on the single data driving model from the feature vector candidate set obtained by screening in the step three by adopting a mixed chicken flock algorithm so as to obtain parameter variables, energy consumption and extraction rate of the single model under the optimal performance; the specific process is as follows:

1) Initializing parameters: the number of iterations is set to 150, the hierarchy is updated every 10 generations, the range of the model parameters is (0.01, 1024), and the proportions of cocks, hens and chicks are 0.15, 0.6 and 0.25, respectively; principal component analysis is performed on the subset X obtained in step three to obtain the initial variable constraint n for which the contribution degree exceeds 85%, n features in X are randomly selected, and the corresponding particle positions are set to 1 to implement binary coding;

2) Each particle position is traversed, the features whose variable is 1 are selected and input into the data-driven model, and the coefficient of determination R² of the model is calculated as:

$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ is the actual value of the ith of the m test samples, $\bar{y}$ is the average value of the samples, and $\hat{y}_i$ is the corresponding predicted value;

According to the coefficient of determination R², the individuals are divided into cock, hen and chick subgroups, and each subgroup is updated when the update conditions are met; the subgroup attributes comprise the selected feature variables and the model parameters of each sub-model, and the globally optimal and globally worst individuals are updated. When the update condition of the hierarchy is met, the chicken flock hierarchy is updated. The subgroups are updated according to their corresponding update formulas as follows:

i) For the cock subgroup, a forward and reverse learning mechanism is introduced: forward learning towards the globally optimal individual accelerates convergence, and when the globally optimal individual is found to be unchanged over many iterations, reverse learning towards the globally worst individual allows the search to jump out of a local optimum with a certain probability.

$$x_i^{t+1} = x_i^{t}\,\bigl(1 + \mathrm{Randn}(0, s^2)\bigr) + w_1\,\bigl(x_{best}^{t} - x_i^{t}\bigr)$$

$$x_i^{t+1} = x_i^{t}\,\bigl(1 + \mathrm{Randn}(0, s^2)\bigr) + w_2\,\bigl(x_{worst}^{t} - x_i^{t}\bigr)$$

In the formulas, Randn(0, s²) is a Gaussian distribution with mean 0 and standard deviation s²; $x_i^{t}$ is the position of the ith individual at the tth iteration and $x_i^{t+1}$ is its position at the (t+1)th iteration; $x_{best}^{t}$ is the globally optimal individual and $x_{worst}^{t}$ the globally worst individual at the tth iteration; w₁ and w₂ are the learning factors for forward and reverse learning, respectively; $f_i$ is the fitness of the ith individual and $f_k$ is the fitness of the kth individual.

ii) Each hen randomly selects a cock to follow, and its position is updated as:

$$x_i^{t+1} = x_i^{t} + S_1 \cdot rand \cdot \bigl(x_{r_1}^{t} - x_i^{t}\bigr) + S_2 \cdot rand \cdot \bigl(x_{r_2}^{t} - x_i^{t}\bigr)$$

In the formula, $x_i^{t}$ is the position of the ith individual at the tth iteration, $x_i^{t+1}$ is its position at the (t+1)th iteration, r₁ is the cock followed by the hen, r₂ is a cock or hen randomly selected from the whole flock, and r₁ ≠ r₂.

iii) Each chick forages by following its hen, and a parental guidance mechanism and an adaptive factor are introduced into the chick position update:

$$x_i^{t+1} = w\,x_i^{t} + l_1\,\bigl(x_m^{t} - x_i^{t}\bigr) + l_2\,\bigl(x_{r_1}^{t} - x_i^{t}\bigr)$$

With maximization of the coefficient of determination R² as the optimization target, the input features and model parameters of each data-driven model are output after multiple training runs.
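The fitness measure and the three position updates of step four can be written compactly in Python. This is a sketch under stated assumptions, not the patented procedure: positions are treated as continuous vectors (the mapping back to binary feature masks is not described), `s_sq` is passed directly as the standard deviation of the Gaussian as the text states, and all function and argument names are hypothetical.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination used as the fitness of a particle."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rooster_update(x, x_ref, w, s_sq, rng):
    """Cock update. Pass the global best as x_ref for forward learning,
    or the global worst for reverse learning."""
    noise = rng.normal(0.0, s_sq, size=x.shape)
    return x * (1.0 + noise) + w * (x_ref - x)

def hen_update(x, x_r1, x_r2, S1, S2, rng):
    """Hen update: follow cock r1 and a second, different flock member r2."""
    return x + S1 * rng.random() * (x_r1 - x) + S2 * rng.random() * (x_r2 - x)

def chick_update(x, x_m, x_r1, w, l1, l2):
    """Chick update with adaptive factor w and guidance terms l1 and l2."""
    return w * x + l1 * (x_m - x) + l2 * (x_r1 - x)

# One illustrative forward-learning step for a cock.
rng = np.random.default_rng(0)
x_new = rooster_update(np.array([0.2, 0.8]), np.array([1.0, 0.0]), 0.5, 0.1, rng)
```

In a full loop, each updated position would be decoded into a feature mask and model parameters, the sub-model retrained, and `r2_score` on the test samples used to rank individuals into the three subgroups.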

Step five, establishing a prediction model by taking the parameter variable under the optimal performance of the single model obtained in the step four as input and taking the energy consumption and the extraction rate of the sugarcane pressing process as output;

Preferably, the selected sub-models are a twin support vector regression machine, a kernel extreme learning machine and a deep kernel extreme learning machine. The twin support vector regression machine determines the ε-insensitive upper and lower bounds of the target regression function by solving for a pair of hyperplanes, which effectively improves the training speed of the model. The kernel extreme learning machine introduces a kernel function into the extreme learning machine, which effectively improves the training speed and generalization performance of the model. The deep kernel extreme learning machine is a deep neural network that extracts and mines effective features of the input data through a multi-layer architecture to achieve more accurate prediction. As shown in fig. 3, it mainly comprises two parts. The first part is formed by stacking n ELM-AE autoencoders; it extracts the data features and obtains the weight matrix of each layer.

$$H_{i+1} = g\bigl(\beta_i^{T} H_i\bigr)$$

In the formula, $H_i$ is the output of the ith layer, i ∈ [1, n], K(·) is a kernel function, g(·) is an activation function, and C is a regularization coefficient. The stacked network is traversed to calculate the weight matrices β₁, β₂, …, β_{n-1} until the output $H_n$ of the last hidden layer is obtained. In the second part, the output $H_n$ of the last hidden layer is taken as the input of the KELM and the target set Y as the output, and the weight matrix β between the hidden layer and the output layer is solved.

In the formula, k (·) represents a kernel function. The network output of the deep kernel extreme learning machine (DK-ELM) is:
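A minimal Python sketch of the second part (the KELM regression on the last hidden-layer output) is given here for illustration. It assumes the commonly used KELM closed form, in which the output weights solve (I/C + Ω)β = Y with Ω the kernel matrix over the training inputs, and it assumes an RBF kernel; neither choice is confirmed by this text, and the class name `KELM` and the parameter `gamma` are hypothetical. The ELM-AE stacking of the first part is not shown.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

class KELM:
    """Kernel extreme learning machine regressor (assumed standard form)."""

    def __init__(self, C=100.0, gamma=1.0):
        self.C = C          # regularization coefficient
        self.gamma = gamma  # RBF kernel width (assumption)

    def fit(self, H, Y):
        self.H_train = np.asarray(H, dtype=float)
        omega = rbf_kernel(self.H_train, self.H_train, self.gamma)
        n = omega.shape[0]
        # Solve (I / C + Omega) beta = Y for the output weights.
        self.beta = np.linalg.solve(np.eye(n) / self.C + omega,
                                    np.asarray(Y, dtype=float))
        return self

    def predict(self, H):
        K = rbf_kernel(np.asarray(H, dtype=float), self.H_train, self.gamma)
        return K @ self.beta

# Toy usage: in DK-ELM the input would be H_n, the last hidden-layer output.
H_demo = np.linspace(0.0, 3.0, 50)[:, None]
y_demo = np.sin(H_demo[:, 0])
model = KELM(C=1e3, gamma=1.0).fit(H_demo, y_demo)
```

Solving the linear system directly avoids forming an explicit matrix inverse, which is usually the more stable choice when the kernel matrix is close to singular.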

As shown in fig. 4, graphs a, b, c, e, f, g, i, j, k, m, n and o are the prediction results of the sub-models.

Step six, the sub-model predictions are integrated using a Bayesian average model to establish a multi-model combined model, calculated as y = w₁y₁ + w₂y₂ + w₃y₃, where the weights are estimated from the posterior model probabilities, which effectively addresses wrong model selection and model uncertainty. Assume that the given data set is D, y is the integrated prediction value, and the model space formed by the sub-models is M = {M₁, M₂, ..., M_k}; the posterior distribution of the integrated prediction value y is then:

$$p(y \mid D) = \sum_{i=1}^{k} p(y \mid M_i, D)\,p(M_i \mid D)$$

where $p(y \mid M_i, D)$ is the distribution of the predicted value of model $M_i$ given data D, and $p(M_i \mid D)$ is the posterior distribution of model $M_i$ given data D, which can be expressed as:

$$p(M_i \mid D) = \frac{p(D \mid M_i)\,p(M_i)}{\sum_{j=1}^{k} p(D \mid M_j)\,p(M_j)}$$

where $p(M_i)$ is the prior probability of model $M_i$, taken as a uniform distribution, i.e.

$$p(M_i) = \frac{1}{k}$$

and $p(D \mid M_i)$ is the marginal likelihood function of the model, which can be expressed as:

$$p(D \mid M_i) = \int p(D \mid \theta_i, M_i)\,p(\theta_i \mid M_i)\,d\theta_i$$

Here θ_i is the parameter vector of model M_i. The weights w and variance σ² of the BMA are estimated using the expectation-maximization (EM) algorithm. As shown in fig. 4, graphs (l) and (p) are the test-set coefficient of determination plots obtained with Bayesian average model integration when extraction rate and energy consumption, respectively, are the prediction targets; compared with the sub-model predictions in graphs i, j, k and graphs m, n, o, the integrated model has better generalization performance and higher accuracy.

After the weights and variances are obtained, Monte Carlo sampling can be used to determine the forecast uncertainty interval. The specific process is as follows: according to the weights w₁, w₂, …, w_k of the sub-models, set the cumulative probability w₀' = 0 and compute the cumulative probability of each sub-model in turn: w_k' = w_{k-1}' + w_k. Step 1, randomly generate a number u between 0 and 1; if w_{k-1}' ≤ u ≤ w_k', the kth model is selected. Step 2, randomly generate a predicted value from the probability distribution p(y | M_k, D) of the kth model. Steps 1 and 2 are repeated M times, where M is the total number of samples used to generate the uncertainty interval. After M sample values at any time sample point are obtained in this way, they are sorted from small to large, and the 90% forecast uncertainty interval of the Bayesian average model is the part between the 5% and 95% quantiles. Fig. 5 shows the probabilistic prediction results for energy consumption and extraction rate; probabilistic prediction gives the fluctuation range of the prediction result at a given confidence level, which helps decision makers better understand the performance of the current squeezing system and set corresponding targets for energy conservation and emission reduction of the system.
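The two-step sampling procedure above can be sketched in Python as follows. The sketch assumes that each sub-model's predictive distribution p(y | M_k, D) is Gaussian, centered on that sub-model's point prediction with the standard deviation estimated by EM; the text does not state the distribution family, so this is an assumption, and the function name `bma_interval` is hypothetical.

```python
import numpy as np

def bma_interval(weights, means, sigmas, n_samples=1000, rng=None):
    """Monte Carlo 90% uncertainty interval of a BMA forecast at one time point.

    weights : BMA weights w_1..w_k (should sum to 1)
    means   : point predictions of the k sub-models at this time point
    sigmas  : standard deviations of the assumed Gaussian predictive laws
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)

    cum = np.cumsum(weights)              # cumulative probabilities w'_1..w'_k
    u = rng.random(n_samples)             # step 1: uniform numbers in [0, 1)
    # Model k is selected when w'_{k-1} <= u < w'_k (zero-based index here).
    k = np.searchsorted(cum, u, side="right")
    k = np.minimum(k, len(weights) - 1)   # guard against round-off in cum[-1]
    draws = rng.normal(means[k], sigmas[k])   # step 2: sample the chosen model
    lower, upper = np.quantile(draws, [0.05, 0.95])
    return float(lower), float(upper)

# Illustrative numbers only: three sub-models predicting energy consumption.
lo_demo, hi_demo = bma_interval([0.2, 0.3, 0.5], [101.0, 99.0, 100.0],
                                [2.0, 1.5, 1.0], n_samples=5000,
                                rng=np.random.default_rng(0))
```

Repeating the call for every time point in the test set gives the band of the kind plotted in fig. 5.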

The method is mainly applied to control and optimization of process parameters in the sugarcane squeezing process, effectively identifies characteristic variables under the optimal performance of different data-driven models according to the idea of data-driven modeling and by combining a mutual information theory, a mixed chicken flock algorithm and a principal component analysis method, and provides a multi-model combined model integrated by several models to establish a prediction model for energy consumption and extraction rate in the sugarcane squeezing process. Compared with the traditional method, the method has the advantages that the model fitting effect and the prediction precision are greatly improved, the problems that the indexes are difficult to measure on line and the like are solved, meanwhile, the interval prediction of the indexes is realized, a decision maker can be helped to better know the performance of the current squeezing process, and corresponding indexes are made for the system for energy conservation and emission reduction.

The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A sugarcane squeezing process prediction method based on depth feature recognition is characterized by comprising the following steps:

step two, removing abnormal data from all the original data of each index collected in the step one, and carrying out scale change on the processed data to obtain normalized data;

step six, a Bayesian average model is adopted to integrate learning on the basis of the prediction results of each single model, a multi-model combination model is established, and deterministic prediction and probabilistic prediction of the extraction rate and the energy consumption are achieved;

scaling each item of data to [-1, 1] to obtain normalized data;

wherein the multi-stage screening in the third step comprises the following steps: defining the degree of correlation between each feature and the prediction target and the redundancy among the features; dividing the normalized data into three subsets R1, R2 and R3 in descending order of correlation; dividing R2 into two parts, R1 and R2, according to the cumulative contribution index commonly used in principal component analysis; adding the feature with the highest correlation in R2 to R1, calculating the redundancy between the features in R1 and the R3 subset, and deleting the feature with the highest redundancy; repeating this operation until R2 is an empty set; and taking the screened subsets R1 and R1 as the result of the multi-stage screening;

wherein the method in the third step for defining the degree of correlation between the features and the predicted target and the redundancy among the features is as follows: based on information entropy theory, the mutual information of feature f_j and predicted target y_k is defined:

in the formula, f_ij is the jth feature of the ith sample, y_k is the kth target output, n, m and s represent the numbers of samples, features and predicted targets in the entire data set, p(f_ij) represents the probability that f_ij occurs in data set X, and p(f_ij, y_k) represents the probability that f_ij and y_k occur simultaneously in data set X; on the basis of the above formula, the degree of correlation between feature f_j and output target y_k can be defined as:

the mutual information between any two features f_j1 and f_j2 can be expressed as:

in the fourth step, principal component analysis is introduced into the initialization of the mixed chicken swarm algorithm: principal component analysis is performed on the subset obtained in the third step to obtain the initial variable constraint n for which the contribution degree exceeds 85%, n features are randomly selected as the input of the data-driven model, and the corresponding particle positions are set to 1 to implement binary coding;

in the fifth step, when the first-layer deterministic prediction output is established, deep and shallow neural networks are combined, and the selected data-driven models comprise a twin support vector regression machine, a kernel extreme learning machine and a deep kernel extreme learning machine.

2. The sugarcane crushing process prediction method according to claim 1, characterized in that: in the third step, the cumulative contribution index 0.85 is taken as a threshold value, and R2 is divided into two parts, namely R1 and R2.