CN112862173B - Lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep belief echo state network
- Publication number: CN112862173B (application CN202110126626.7A)
- Authority
- CN
- China
- Prior art keywords: echo state network, sub-reservoir, self-organizing, deep belief
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A20/00—Water conservation; Efficient water supply; Efficient water use
- Y02A20/152—Water filtration
Abstract
The invention discloses a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep belief echo state network, belonging to the cross-disciplinary field of cyanobacterial bloom prediction and information science. The method screens the input and output variables with a mutual information criterion, constructs the structure of a deep belief echo state network, designs self-organizing mechanisms for the deep belief network and the echo state network respectively, and obtains a self-organizing deep belief echo state network model after structural optimization, so that lake and reservoir cyanobacterial blooms can be predicted effectively and subsequent treatment facilitated. The method fully learns the deep features of the training data, and its self-organizing mechanism dynamically adjusts the number of hidden-layer neurons and sub-reservoirs. It is suitable for lake and reservoir cyanobacterial bloom data containing outliers such as measurement noise, and improves the accuracy and robustness of the prediction results.
Description
Technical Field
The invention belongs to the cross-disciplinary field of cyanobacterial bloom prediction and information science, and in particular relates to a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep belief echo state network.
Background
A cyanobacterial bloom in lakes and reservoirs is a pollution phenomenon in which algae and plankton in eutrophic waters reproduce abnormally fast, so that a blue-green algal layer visible to the naked eye gathers at the surface and thickly covers the water. As municipal and industrial wastewater is continuously discharged into lakes and reservoirs, the concentration of nutrients such as nitrogen and phosphorus keeps rising, providing the environmental basis for bloom outbreaks. In general, factors such as water temperature, wind speed and nutrient levels influence the outbreak of cyanobacterial blooms, so these indicators can serve as the basis for targeted prediction, early warning and treatment. Because the bloom formation process is chaotic, time-series prediction is carried out with the chlorophyll-a concentration as the characteristic output variable and water temperature, nutrients and similar factors as the modeling input variables. Scholars in the environmental and biological fields have studied the formation mechanism of lake and reservoir cyanobacterial blooms extensively, including modeling environmental factors and plankton dynamics, and have captured the basic laws of bloom formation well. Although such mechanistic models are interpretable, bloom evolution is a complex and sensitive nonlinear dynamic process, and building a mechanistic model with satisfactory quantitative prediction accuracy from the existing body of research remains difficult.
As technology develops and data become increasingly accessible, data-driven methods based mainly on machine learning are attracting more and more attention in cyanobacterial bloom prediction. However, existing lake and reservoir cyanobacterial bloom prediction methods still fall short in prediction accuracy and robustness.
Disclosure of Invention
The invention provides a cyanobacterial bloom prediction method based on a self-organizing deep belief echo state network, aiming to overcome the insufficient accuracy and poor robustness of existing lake and reservoir cyanobacterial bloom prediction methods. After the input and output variables are determined, the structure of a deep belief echo state network is constructed, self-organizing mechanisms are designed for the deep belief network and the echo state network respectively, and a self-organizing deep belief echo state network model is obtained after structural optimization, so that lake and reservoir cyanobacterial blooms can be predicted effectively and treated in time.
The invention provides a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep belief echo state network, comprising the following four steps:
Step one, determining the input and output variables of the deep belief echo state network model;
According to domain knowledge, the characteristic variable of the lake and reservoir cyanobacterial bloom is taken as the output variable, and the influencing variables are screened from the candidate water quality variables as input variables on the basis of a mutual information criterion.
Step two, establishing the structure of the deep belief echo state network;
The structure comprises a deep belief network and an echo state network; in particular, the echo state network adopts a modular sub-reservoir structure, and its output weight matrix is solved with a robust loss function.
Step three, designing the self-organizing mechanism of the deep belief echo state network and optimizing the network;
After the structure is constructed, an importance index for the neurons is first defined, then self-organizing mechanisms are designed for the deep belief network and the echo state network respectively, and the network is trained and optimized to obtain the self-organizing deep belief echo state network model.
Step four, predicting based on the self-organizing deep belief echo state network model;
The trained self-organizing deep belief echo state network model is used to predict the cyanobacterial bloom.
Compared with other existing methods, the proposed method is both feasible and effective.
The invention has the following advantages:
1. The invention constructs a self-organizing deep belief echo state network model for lake and reservoir cyanobacterial bloom prediction that fully learns the deep features of the training data, thereby predicting blooms effectively.
2. The invention proposes a neuron importance index that measures the importance of neurons, serves as the basis for the self-organizing mechanism design, and aids the training and optimization of the deep belief echo state network.
3. The invention designs a self-organizing mechanism for the deep belief network and for the echo state network respectively, so that the model determines its own network structure during training, realizing dynamic adjustment of the number of hidden-layer neurons and sub-reservoirs.
4. The echo state network part solves its output weight matrix with a robust loss function, so the proposed model is suitable for lake and reservoir cyanobacterial bloom data containing outliers such as measurement noise, and the accuracy and robustness of the prediction results are improved.
Drawings
FIG. 1 is a flow chart of the lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep belief echo state network provided by the invention;
FIG. 2 is a flow chart of establishing the deep belief echo state network structure in the present invention;
FIG. 3 is a flow chart of establishing and training the self-organizing mechanism of the deep belief echo state network structure in the present invention;
FIG. 4A is a schematic diagram of the mutual information values between the output chlorophyll-a concentration and the lagged input variables in the embodiment of the present invention;
FIG. 4B is a schematic diagram of the mutual information values between the input variables and the chlorophyll-a concentration in the embodiment of the present invention;
FIGS. 5A, 5B and 5C are, respectively, the convergence curve of the number of hidden-layer neurons of the deep belief network within the self-organizing structure, the convergence curve of the reservoir size of the echo state network, and the convergence curve of the training error RMSE of the model during training in the embodiment;
FIG. 6 is a schematic comparison between the lake and reservoir cyanobacterial bloom prediction results of the invention and those of other conventional prediction methods in the embodiment;
FIG. 7 is a comparison of the lake and reservoir cyanobacterial bloom prediction results obtained after adding different proportions of outliers to the training data in the embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
The invention provides a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep belief echo state network, which comprises a deep belief network and an echo state network. To predict blooms well, the structure of the echo state network must be optimized effectively and the features of the input variables refined in a targeted way. The deep belief network is a deep neural network model based on an energy function; it can avoid poor local minima and performs well on time-series prediction problems. The method uses the unsupervised learning process of the deep belief network to extract deep features of the time-series input variables, then models these features with the echo state network to predict the chlorophyll-a concentration at the next time step. This improves the model's ability to process temporal information and facilitates cyanobacterial bloom prediction.
To address the structural design problem of the neural network, the invention defines a neuron importance index with the mutual information method, from which the importance index of hidden-layer neurons and the importance index of sub-reservoirs are defined respectively, and dynamic adjustment of the number of hidden-layer neurons and sub-reservoirs is realized through the designed self-organizing mechanisms. In addition, a robust loss function is used to solve the output weight matrix of the echo state network, improving its robustness. As a result, the proposed prediction method performs well and robustly on time-series data containing outliers such as measurement noise, is suitable for modeling and predicting real lake and reservoir cyanobacterial blooms, and can support prediction and early warning of bloom outbreaks.
The invention provides a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep belief echo state network; its flow is shown in FIG. 1 and mainly comprises the following four steps:
Step one, determining the input and output variables of the deep belief echo state network model;
To construct the self-organizing deep belief echo state network model, the input and output variables are first determined. In this embodiment, the output variable is the chlorophyll-a concentration, and the input variables are screened from the multiple water quality variables that influence bloom formation in lakes and reservoirs. The invention takes the mutual information method as the screening criterion. Mutual information measures the degree of interdependence between two variables and can describe their nonlinear correlation: the larger the mutual information value, the stronger the correlation. By computing the mutual information between each candidate water quality variable and the output variable, suitable input variables can be screened according to the required prediction accuracy, speed and other conditions. Here, a candidate water quality variable is selected as an input variable when its mutual information value with the output variable exceeds a set threshold (e.g., 0.2); otherwise it is eliminated. The screened input variables and the output variable then jointly participate in the training and prediction of the model.
Step two, establishing a structure of a deep confidence echo state network;
the self-organizing deep confidence echo state network model is composed of a deep confidence network based on limited Boltzmann machine stacking and a modular echo state network based on a sub-reserve pool. The deep confidence echo state network model firstly extracts deep features of input variables through a conventional deep confidence network. The limited Boltzmann machine is a basic unit forming a deep belief network, and comprises two layers of neurons, wherein one layer is a visible layer and is used for inputting variables; the other layer is a hidden layer for extracting deep features of the input variables. In particular, the deep confidence network part of the deep confidence echo state network is formed by stacking two limited Boltzmann machines. Specifically, as shown in fig. 2, the structure for establishing the deep confidence echo state includes the following steps:
inputting the input variable into a deep belief network, carrying out unsupervised learning through a contrast divergence method, and training the deep belief network to extract deep features of the input variable.
Inputting the deep features output by the hidden layer of the deep belief network into an echo state network, initializing the weight matrix of the deep features and the weight matrix of a sub-reserve pool by the echo state network, and collecting an internal state matrix.
The echo state network in the deep confidence echo state network is an echo state network based on a sub-reserve pool. The echo state network not only can meet the echo state characteristics, but also can reduce the complexity of parameter setting. The reserve pool in the echo state network without output feedback in the deep confidence echo state network comprises a plurality of sub reserve pools, and each sub reserve pool is mutually independent, so that the decoupling of partial neurons in the reserve pool is ensured.
Let the number of sub-reservoirs in the original reservoir be $N_{total}$ and let each sub-reservoir contain $n_{sub}$ neurons. The weight matrix $W^*_{res}$ of the reservoir formed by the $N_{total}$ sub-reservoirs is then block-diagonal:

$$W^*_{res} = \mathrm{diag}(W_1, W_2, \ldots, W_{N_{total}}) \qquad (1)$$

where $W_i$ ($1 \le i \le N_{total}$) is the weight matrix of the $i$-th sub-reservoir, generated by singular value decomposition, i.e. $W_i = U_i S_i V_i$. The diagonal matrix $S_i$ is generated randomly from a given singular value distribution, and each sub-reservoir matrix is fully connected internally. Since $n_{sub}$ is the size of every sub-reservoir, all sub-reservoir weight matrices are $n_{sub} \times n_{sub}$. $U_i = (u_{pk})$ and $V_i = (v_{pk})$ are two simultaneously generated random orthogonal matrices with $u_{pk}, v_{pk} \in (-1, 1)$, $p = 1, 2, \ldots, n_{sub}$, $k = 1, 2, \ldots, n_{sub}$.
The mathematical expressions of the sub-reservoir-based echo state network are:

$$x_i(n) = f_{res}\big(W^{in}_i u(n) + W_i\, x_i(n-1)\big) \qquad (2)$$

$$y(n) = W^{out} x(n) \qquad (3)$$

where $u(n)$ is the $K \times 1$ input vector at time $n$, i.e. the deep features extracted by the deep belief network at time $n$, and $K$ is the number of neurons in the last hidden layer of the deep belief network; $x_i(n)$ is the $n_{sub} \times 1$ state vector of the $i$-th sub-reservoir at time $n$, and $x(n)$ is the concatenation of the $N_{total}$ sub-reservoir states; $y(n)$ is the output value of the echo state network at time $n$. $W^{in}_i$ is the $n_{sub} \times K$ input weight matrix of the $i$-th sub-reservoir, and $W^{out}$ is the $1 \times (N_{total} \times n_{sub})$ output weight matrix. $f_{res}$ is the activation function of the reservoir neurons, taken as the sigmoid function.
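Equations (1) to (3) can be sketched as follows, assuming small illustrative sizes. The random orthogonal factors come from QR decompositions, and the singular values are drawn below 1 so the echo state property holds; both are implementation choices, not details fixed by the patent:

```python
import numpy as np

rng = np.random.default_rng(42)
N_total, n_sub, K = 3, 5, 4          # sub-reservoir count, size, input dimension

def random_orthogonal(n):
    # QR decomposition of a Gaussian matrix gives a random orthogonal factor
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def make_sub_reservoir():
    # W_i = U_i S_i V_i with singular values drawn in (0.1, 0.9)
    s = np.diag(rng.uniform(0.1, 0.9, n_sub))
    return random_orthogonal(n_sub) @ s @ random_orthogonal(n_sub)

# Block-diagonal reservoir weight matrix, eq. (1)
W_res = np.zeros((N_total * n_sub, N_total * n_sub))
for i in range(N_total):
    W_res[i*n_sub:(i+1)*n_sub, i*n_sub:(i+1)*n_sub] = make_sub_reservoir()

W_in = rng.uniform(-1.0, 1.0, (N_total * n_sub, K))   # stacked input weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update(x, u):
    # eq. (2) for all sub-reservoirs at once; the block-diagonal W_res
    # keeps the sub-reservoir states mutually decoupled
    return sigmoid(W_in @ u + W_res @ x)

x = np.zeros(N_total * n_sub)
for _ in range(10):                   # wash out the initial transient
    x = update(x, rng.uniform(-1.0, 1.0, K))
```

The readout of eq. (3) is then a single matrix product of a trained $1 \times (N_{total} \times n_{sub})$ weight vector with the concatenated state `x`.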
Here, to overcome the effect of the initial transient, the internal states from time $n_{min}+1$ to time $L_{train}$ are collected into the internal state matrix $H = [x(n_{min}+1), \ldots, x(L_{train})]^T$, whose corresponding desired output vector is $T = [t(n_{min}+1), \ldots, t(L_{train})]^T$, where $t(n_{min}+1)$ is the desired output value at time $n_{min}+1$.
In addition, to overcome the ill-conditioned solutions that outliers such as measurement noise may cause, and to improve the robustness of the prediction, the output weight matrix is solved iteratively with a robust loss function containing L2 regularization.
Initialize the iteration counter $k = 1$ for solving the output weight matrix and initialize the robust weight matrix as an identity matrix. In each iteration, compute the robust loss function and the robust residual scale estimate, update the robust weight matrix according to the robust weight function, and recompute the output weight matrix. The regularized robust loss function $E(k)$ and the output weight matrix $W^{out[k]}$ at iteration $k$ are:

$$E(k) = \sum_{n=n_{min}+1}^{L_{train}} \rho\!\left(\frac{\xi^{[k]}(n)}{s^{[k]}}\right) + C\,\big\|W^{out[k]}\big\|^2 \qquad (4)$$

$$W^{out[k]} = \big(H^T Q^{[k]} H + C I\big)^{-1} H^T Q^{[k]} T \qquad (5)$$

where $C$ is the regularization coefficient, $I$ is the $(N_{total} \times n_{sub}) \times (N_{total} \times n_{sub})$ identity matrix, $\|\cdot\|$ is the 2-norm, $\rho(\cdot)$ is the robust objective function, and $\xi^{[k]}(n) = T(n) - y^{[k]}(n)$ is the training error at time $n$ in iteration $k$. The residual robust scale estimate at iteration $k$ is

$$s^{[k]} = \frac{\mathrm{MAR}^{[k]}}{0.6745} \qquad (6)$$

where MAR is the median absolute deviation of the residuals.
$Q^{[k]}$ denotes the $(L_{train}-n_{min}) \times (L_{train}-n_{min})$ diagonal robust weight matrix with entries $w(\xi^{[k]}(n)/s^{[k]})$, where $w(\cdot)$ is the robust weight function. In the invention the Welsch function is taken as the robust weight function; the robust objective function $\rho(\cdot)$ and the robust weight function $w(\cdot)$ are then:

$$\rho(z) = \frac{k_{set}^2}{2}\Big(1 - \exp\!\big(-(z/k_{set})^2\big)\Big) \qquad (7)$$

$$w(z) = \exp\!\big(-(z/k_{set})^2\big) \qquad (8)$$

where $z$ is the function variable and $k_{set} = \mu k_{def}$, with $\mu$ a robustness coefficient chosen empirically; for the Welsch function, the coefficient is $k_{def} = 2.985$.
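The iterative robust solve of equations (5) to (8) can be sketched on synthetic data; the matrix sizes, the regularization constant, the fixed iteration count and the injected outlier are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic state matrix H and desired outputs T with one gross outlier
H = rng.uniform(-1.0, 1.0, (200, 6))
w_true = rng.uniform(-1.0, 1.0, 6)
T = H @ w_true
T[50] += 10.0                       # simulated detection-noise outlier

C = 1e-6                            # regularization coefficient
k_set = 1.0 * 2.985                 # k_set = mu * k_def with mu = 1 (Welsch)

# k = 1: robust weight matrix initialized as the identity
w_out = np.linalg.solve(H.T @ H + C * np.eye(6), H.T @ T)
for _ in range(20):
    xi = T - H @ w_out                                # residuals xi[k]
    mar = np.median(np.abs(xi - np.median(xi)))       # median absolute residual
    s = max(mar / 0.6745, 1e-12)                      # robust scale, eq. (6)
    q = np.exp(-(xi / (k_set * s)) ** 2)              # Welsch weights, eq. (8)
    Q = np.diag(q)
    w_out = np.linalg.solve(H.T @ Q @ H + C * np.eye(6),   # eq. (5)
                            H.T @ Q @ T)
```

Because the Welsch weight decays to zero for large scaled residuals, the outlying sample is effectively excluded after the first reweighting, while the inliers keep weights near one.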
Step three, designing the self-organizing mechanisms of the deep belief network and the echo state network and training them;
The invention designs a self-organizing mechanism and a corresponding training process for the deep belief network and for the echo state network respectively. That is, on the basis of step two, the hidden-layer neurons of the deep belief network and the sub-reservoirs of the echo state network are adjusted in every iteration of their respective training processes.
As shown in FIG. 3, for the hidden-layer neurons of the deep belief network, first initialize the iteration counter $k_1 = 1$, train the weight matrices of the deep belief network by contrastive divergence, and compute the importance index of the neurons in each layer. At any iteration $k_1$, the importance index $NI_j^l(k_1)$ of the $j$-th neuron in layer $l$ is defined as:

$$NI_j^l(k_1) = MI\big(v_j^l, h_j^l\big)\, MI\big(h_j^l, T\big) \qquad (9)$$

where $v_j^l$ and $h_j^l$ are the input and output of the $j$-th neuron of layer $l$, $MI(v_j^l, h_j^l)$ is the mutual information value between them, and $MI(h_j^l, T)$ is the mutual information value between $h_j^l$ and the desired output vector $T$. For the deep belief network part, the self-organizing process of the hidden-layer neurons includes splitting and pruning; the specific mechanism based on neuron importance is as follows.
(1) Splitting mechanism of hidden-layer neurons: at iteration $k_1$, the higher $NI_j^l(k_1)$, the more active the neuron is in processing information. The invention therefore splits the most active neuron in the hidden layer, i.e. the $j$-th neuron of layer $l$ is split into two neurons when it satisfies:

$$NI_j^l(k_1) = \max_{1 \le j' \le M_l(k_1)} NI_{j'}^l(k_1) \qquad (10)$$

where $M_l(k_1)$ is the total number of neurons in layer $l$ at iteration $k_1$.
(2) Pruning mechanism of hidden-layer neurons: when $NI_j^l(k_1)$ is low, the neuron processes information weakly and should be considered for deletion. The invention therefore defines an adaptive pruning threshold $NI_{th}^l(k_1)$ at iteration $k_1$ as:

$$NI_{th}^l(k_1) = \frac{\beta}{M_l(k_1)} \sum_{j'=1}^{M_l(k_1)} NI_{j'}^l(k_1) \qquad (11)$$

where $\beta \in (0, 1]$. The $j$-th neuron is then deleted when it satisfies $NI_j^l(k_1) < NI_{th}^l(k_1)$.
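The split-and-prune decision can be sketched as below. The adaptive threshold is taken here as β times the layer's mean importance, an assumption standing in for the patent's exact expression, and all names are illustrative:

```python
def split_and_prune(importance, beta=0.5):
    """One self-organizing step for a hidden layer.

    `importance` maps neuron index -> importance value NI; beta is in (0, 1].
    Returns the index of the split candidate (the most active neuron) and
    the list of indices falling below the adaptive pruning threshold.
    """
    most_active = max(importance, key=importance.get)          # split candidate
    ni_th = beta * sum(importance.values()) / len(importance)  # adaptive threshold
    to_prune = [j for j, ni in importance.items()
                if ni < ni_th and j != most_active]
    return most_active, to_prune
```

For example, with importances {0: 0.9, 1: 0.05, 2: 0.4, 3: 0.02} and β = 0.5, neuron 0 is marked for splitting and neurons 1 and 3 fall below the threshold and are pruned.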
After the iterative training of the number of hidden-layer neurons and the weight matrices of the deep belief network, the iterative training of the number of sub-reservoirs and the output weight matrix of the echo state network can be carried out. Take the output vector of the last hidden layer of the trained deep belief network as the input of the echo state network, initialize its iteration counter $k_2 = 1$, define the control parameter vector, and randomly generate temporary reservoir weights and temporary input weights of the same size as the original reservoir. The specific sub-reservoir screening and growing mechanisms of the echo state network are as follows:
(1) The screening mechanism of the child reserve pool: the invention defines the importance index of the ith sub-reserve pool in the reserve poolComprises the following steps:
whereinIs the input vector of the p-th neuron of the ith sub-reservoir,and the output vector of the p-th neuron of the ith sub-reservoir. Thus, training to k in iterations 2 Randomly generating temporary sub-reserves (1, 2, \8230; i, … ,i max (k 2 ) Sorting according to the size of the importance indexes:the invention defines the self-adaptive screening threshold as follows:
S_th(k_2) = NS'_sub(INT(α · i_max(k_2)))    (12)
where INT(·) is the integer function and NS'_sub is the sorted vector of sub-reservoir importance indexes. α ∈ (0, 1) is a user-defined control parameter that controls the degree of sub-reservoir screening in each cycle. The parameter may take several values, which together form a control parameter vector, subject to α_1 < α_2 < … < α_{N_α}, where N_α is the dimension of the control parameter vector.
The training goal of the echo state network is to minimize the robust loss function of Equation (4). To ensure that the screened reservoir performs at least as well as the sub-reservoir set before screening, at iteration step k_2 the ith sub-reservoir must satisfy the following condition:
and when the robust loss function E(k_2) of all sub-reservoirs satisfying the condition is less than or equal to the historical minimum of the robust loss function, those sub-reservoirs are retained as the new reservoir and the remaining sub-reservoirs are deleted. The screened sub-reservoirs are then taken as the temporary reservoir and the training error is calculated.
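A minimal sketch of this screening step, assuming the importance indexes have already been computed; the retention test against the historical loss minimum is omitted, and the indexing convention for the threshold of Equation (12) is an assumption:

```python
import numpy as np

def screen_subreservoirs(importances, alpha=0.5):
    """Sub-reservoir screening after Eq. (12): sort the i_max candidate
    sub-reservoirs by importance (descending) and keep those whose importance
    is at least the adaptive threshold S_th = NS'_sub(INT(alpha * i_max)).
    Returns the indices of the retained sub-reservoirs."""
    order = np.argsort(importances)[::-1]   # indices sorted by descending importance
    sorted_imp = importances[order]         # NS'_sub, the sorted importance vector
    i_max = len(importances)
    cut = max(1, int(alpha * i_max))        # INT(alpha * i_max), kept >= 1
    threshold = sorted_imp[cut - 1]         # S_th(k2), assuming 1-based indexing
    return order[sorted_imp >= threshold].tolist()

print(screen_subreservoirs(np.array([0.2, 0.9, 0.5, 0.7, 0.1]), alpha=0.6))
```

With α = 0.6 and five candidates, the three most important sub-reservoirs survive the screening.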
(2) Growing mechanism for sub-reservoirs: after screening, sub-reservoir growth is performed. The temporary reservoir is taken as the new reservoir and merged with a newly, randomly generated sub-reservoir; the output weight matrix of the merged echo state network is then:
where H_o is the state matrix corresponding to the reservoir after the screening mechanism completes and H_g is the state matrix corresponding to the grown reservoir; the identity matrix in the expression has a dimension determined by the total number of grown sub-reservoirs. Further, based on Equation (14), the updated mathematical expression of the merged output weight matrix is:
where I_o is an identity matrix of dimension (N_o × n_sub) × (N_o × n_sub), N_o is the number of sub-reservoirs after the screening mechanism completes, I_g is an identity matrix of dimension n_sub × n_sub, and I_L is an identity matrix of dimension (L_train − n_min) × (L_train − n_min).
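The recomputation of the output weights over the merged states can be sketched as follows. This is a plain L2-regularized least-squares readout; Equation (15) additionally carries the robust weight matrix and the block identity matrices above, which are omitted here, and the placement of the regularization coefficient C follows one common convention rather than the patent's exact formula:

```python
import numpy as np

def grown_output_weights(H_o, H_g, T, C=1e-7):
    """After merging a newly generated sub-reservoir, recompute the ESN output
    weights from the concatenated state matrix [H_o | H_g] by ridge regression.
    H_o: states of the screened reservoir, H_g: states of the grown sub-reservoir,
    T: desired outputs. The robust weighting of Eq. (15) is omitted for brevity."""
    H = np.hstack([H_o, H_g])                        # states of kept + grown pools
    gram = H.T @ H + C * np.eye(H.shape[1])          # regularized Gram matrix
    return np.linalg.solve(gram, H.T @ T)            # updated output weight matrix

rng = np.random.default_rng(0)
H_o = rng.standard_normal((50, 10))   # 50 time steps, 10 kept reservoir states
H_g = rng.standard_normal((50, 5))    # one grown sub-reservoir of size 5
T = rng.standard_normal((50, 1))
W = grown_output_weights(H_o, H_g, T)
print(W.shape)
```

The output weight matrix grows with the merged state dimension, here from 10 to 15 columns of reservoir state.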
The self-organizing deep confidence echo state network model is thus obtained.
Predicting based on the self-organizing deep confidence echo state network model;
Through the design of the self-organizing mechanism, the self-organizing deep confidence echo state network model automatically learns, during training, an appropriate number of hidden-layer neurons for the deep belief network and an appropriate number of sub-reservoirs for the echo state network, while simultaneously solving the weight matrices of each neural network. Feeding the input variables into the trained model yields a prediction of the characterization index of lake and reservoir cyanobacterial blooms, namely the chlorophyll-a concentration.
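The forward pass of the trained model described above can be sketched as follows. All weight shapes are illustrative, bias terms and the reservoir's state feedback over multiple time steps are simplified, and sigmoid activations are used for both the deep belief network layers and the reservoir (the sigmoid reservoir activation is stated in claim 1):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sdbesn_predict(u, dbn_weights, W_in, W_res, W_out, x_prev=None):
    """Forward pass of the trained model: the deep belief network extracts deep
    features from the input, the (block-diagonal) reservoir maps them to an
    internal state, and the output weights read off the chlorophyll-a
    prediction. Shapes are illustrative; biases are omitted."""
    h = u
    for W in dbn_weights:                    # DBN feature extraction (sigmoid layers)
        h = sigmoid(W @ h)
    if x_prev is None:
        x_prev = np.zeros(W_res.shape[0])    # zero previous reservoir state
    x = sigmoid(W_in @ h + W_res @ x_prev)   # reservoir state update
    return float(W_out @ x)                  # predicted chlorophyll-a

rng = np.random.default_rng(2)
dbn = [rng.standard_normal((7, 8)), rng.standard_normal((6, 7))]  # 8-7-6 front end
W_in = rng.uniform(-1, 1, (120, 6))          # input weights in [-1, 1]
W_res = 0.1 * rng.standard_normal((120, 120))
W_out = rng.standard_normal(120)
y = sdbesn_predict(rng.standard_normal(8), dbn, W_in, W_res, W_out)
print(isinstance(y, float))
```

The 8-7-6-120-1 shapes mirror the structure reported in the embodiment below, but the random weights here are placeholders for trained ones.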
The technical solution of the present invention is further illustrated by the following examples.
The first embodiment is as follows:
the embodiment provides a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep confidence echo state network, which specifically comprises the following implementation steps:
step one, determining an input variable and an output variable of a prediction model;
The data in this embodiment come from the water quality data set of West Falmouth Harbor, USA. The data set contains 6 water quality variables; Table 1 gives the abbreviation, unit and meaning of each variable in the data set.
TABLE 1 Water quality variables information
The sampling interval of the data is 20 minutes, and acquisition runs from 18:01 on June 6, 2017 to 13:21 on August 31, 2017, giving 2491 groups of data in total. To overcome the influence of redundant indices on the modeling effect, the experiment measures the correlation between each water quality variable and the output variable, the chlorophyll-a concentration, by their mutual information value. The experiment considers not only the correlation of the water quality variables but also the autoregressive character of the chlorophyll-a concentration time series. As seen in Fig. 4A, the mutual information value of the lagged chlorophyll-a variable gradually decreases as the lag time increases. Fig. 4B shows the mutual information values of the 5 water quality variables with the chlorophyll-a concentration at the next moment. The experiment selects the water quality variables whose mutual information value exceeds 0.2. The input variables of the self-organizing deep confidence echo state network are therefore the water temperature, salinity, oxygen saturation, specific conductivity, the chlorophyll-a concentration at the current moment and the chlorophyll-a concentrations at the three lagged moments, and the output variable is the chlorophyll-a concentration at the next moment. That is, there are 8 input variables and 1 output variable.
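The mutual-information screening can be sketched as follows. The patent does not state which MI estimator is used, so a simple histogram estimate is assumed here, and the variable names and synthetic series are purely illustrative:

```python
import numpy as np

def mutual_info(x, y, bins=16):
    """Binned (histogram) estimate of mutual information I(X;Y) in nats.
    The patent does not specify its MI estimator; this is one simple choice."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0  # avoid log(0) on empty bins
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def select_inputs(candidates, target, threshold=0.2):
    """Keep candidate variables whose MI with next-step chlorophyll-a exceeds
    the 0.2 threshold used in the embodiment."""
    return [name for name, series in candidates.items()
            if mutual_info(series, target) > threshold]

rng = np.random.default_rng(1)
chl = rng.standard_normal(2000)                 # stand-in chlorophyll-a series
candidates = {
    "water_temp": chl + 0.3 * rng.standard_normal(2000),  # informative (synthetic)
    "unrelated": rng.standard_normal(2000),               # independent noise
}
print(select_inputs(candidates, chl))
```

Only the informative synthetic variable clears the 0.2 threshold; the independent one is removed, mirroring the screening described above.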
Step two, establishing a structure of a deep confidence echo state network;
In the experiment for predicting cyanobacterial blooms in lakes and reservoirs, the self-organizing deep confidence echo state network discards the first 200 data points before collecting states into the state matrix; the training data length is 1600 and the test data length is 691. The hidden layers of the deep belief network part are initialized to 3-3, the number of training iterations is 50, the learning batch size is 50, the learning rate is 0.1, and β is 0.98. The elements of the input weight matrix of the echo state network are initialized in the range [-1, 1], the singular values of the diagonal matrix in the SVD are taken from [0.1, 0.99], the sub-reservoir size is 5, the regularization coefficient C is 1e-7, the robust coefficient μ is 1, the number of iterations for solving the output weight matrix is 15, the number of iterations of the reservoir self-organizing process is 50, and the control parameter vector is (0.5, 0.6, 0.7, 0.8, 0.9).
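Generating one sub-reservoir weight matrix W_i = U_i S_i V_i with singular values in [0.1, 0.99], as the parameters above prescribe, might look like this. Sampling the orthogonal factors via QR decomposition is an assumption; the patent only requires U_i and V_i to be random orthogonal matrices:

```python
import numpy as np

def make_subreservoir(n_sub=5, s_min=0.1, s_max=0.99, rng=None):
    """Generate one sub-reservoir weight matrix W_i = U_i S_i V_i: S_i is
    diagonal with singular values drawn from [s_min, s_max], and U_i, V_i are
    random orthogonal matrices (sampled here via QR, one common approach)."""
    rng = rng or np.random.default_rng(0)
    S = np.diag(rng.uniform(s_min, s_max, size=n_sub))   # prescribed singular values
    U, _ = np.linalg.qr(rng.uniform(-1, 1, (n_sub, n_sub)))
    V, _ = np.linalg.qr(rng.uniform(-1, 1, (n_sub, n_sub)))
    return U @ S @ V

W = make_subreservoir()
sv = np.linalg.svd(W, compute_uv=False)
print(bool(sv.max() < 1.0))
```

Because every singular value lies below 1, the spectral norm of each W_i is below 1, which helps guarantee the echo state property of the block-diagonal reservoir.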
Designing and optimizing a self-organizing mechanism and a training process of the deep confidence echo state network;
The self-organizing processes of the hidden layers and the reservoir in the self-organizing deep confidence echo state network are shown in Fig. 5A and Fig. 5B. In Fig. 5A, the neurons of the first hidden layer H1 and the second hidden layer H2 finally stabilize at 7 and 6, respectively, so the final hidden-layer structure is 7-6. During reservoir-size learning, the number of training iterations is set to 100. As shown in Fig. 5B, the reservoir size iteratively converges to 120 under the self-organizing mechanism, containing 24 sub-reservoirs in total. The structure of the self-organizing deep confidence echo state network in this experiment is therefore 8-7-6-120-1. Fig. 5C shows the convergence curve of the root mean square error (RMSE) during training; the training error finally converges to near its minimum value of 0.383.
Predicting based on the self-organizing deep confidence echo state network model;
FIG. 6 compares the lake and reservoir cyanobacterial bloom predictions of the self-organizing deep confidence echo state network with those of other echo state network methods. Relative to the other echo state network models, the self-organizing deep confidence echo state network (SDBMESN) provided by the embodiment of the invention effectively learns the evolution of cyanobacterial blooms in lakes and reservoirs. Table 2 lists the comprehensive training and test performance (neural network structure and RMSE index) of the basic echo state network (OESN), the regularized echo state network (RESN), the growing echo state network (GESN), the adaptive regularized echo state network (DRESN) and the deep confidence echo state network (DBESN). The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network thus achieves high prediction accuracy and good generalization. At the same time, the reservoir of the self-organizing deep confidence echo state network is smaller than those of the other echo state networks, giving it the simplest neural network structure. In each set of experiments the DBESN uses the same neural network structure as the self-organizing deep confidence echo state network; even so, with identical structures, the prediction performance of the DBESN remains lower than that of the proposed method.
The self-organizing mechanism not only simplifies the structure but also, during self-organization, retains the neurons and sub-reservoirs with relatively better performance, so that the neurons and reservoir of the self-organizing deep confidence echo state network achieve better prediction and a stronger ability to process dynamic information. The self-organizing deep confidence echo state network is therefore well suited to predicting cyanobacterial blooms in lakes and reservoirs.
TABLE 2 Cyanobacterial bloom prediction results and comparison of different methods
The self-organizing deep confidence echo state network of this embodiment takes the robust loss function as its objective function, which improves the robustness of time-series prediction against outliers such as monitoring noise. To verify this property, impulse-function noise in proportions of 10% to 40% was added to the training samples of the embodiment data set. The test results are shown in FIG. 7: the robustness of the self-organizing deep confidence echo state network of the embodiment is clearly superior to that of the other echo state networks.
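The Welsch weighting named in claim 1 is what produces this robustness: it smoothly downweights large residuals so that impulsive outliers barely influence the output-weight solution. The exact formulas appear only as images in the source, so the sketch below assumes the standard Welsch form with the document's constants (k_def = 2.985, robust coefficient μ = 1):

```python
import numpy as np

K_DEF = 2.985  # default tuning constant stated in claim 1

def welsch_rho(z, mu=1.0):
    """Welsch robust objective rho(z) with k_set = mu * K_DEF (standard form,
    assumed; the patent's formula is an image in the source)."""
    k = mu * K_DEF
    return (k**2 / 2.0) * (1.0 - np.exp(-((z / k) ** 2)))

def welsch_weight(z, mu=1.0):
    """Matching weight function w(z) = rho'(z) / z = exp(-(z/k)^2)."""
    k = mu * K_DEF
    return np.exp(-((z / k) ** 2))

# Small residuals keep weight near 1; large (outlier) residuals are
# driven toward zero weight.
print(round(float(welsch_weight(0.1)), 3), round(float(welsch_weight(10.0)), 3))
```

In the iterative solution of the output weights, these weights populate the diagonal robust weight matrix, so samples corrupted by impulse noise contribute almost nothing to the regularized least-squares fit.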
Claims (2)
1. A lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep confidence echo state network comprises the following steps:
determining an input variable and an output variable of a deep confidence echo state network;
the input variables take the mutual information method as the judgment criterion: when the mutual information value between a candidate water quality variable and the output variable is greater than the set threshold of 0.2, the candidate water quality variable is selected as an input variable; otherwise it is removed;
the output variable is the characteristic variable chlorophyll a concentration of the lake and reservoir cyanobacterial bloom;
the screened input variables and output variables participate in the training and prediction of the deep confidence echo state network;
step two, establishing a structure of a deep confidence echo state network;
the structure of the deep confidence echo state network comprises a deep belief network and an echo state network, wherein the echo state network adopts a modular reservoir structure; the method comprises the following specific steps:
2.1, adopting a restricted Boltzmann machine as the basic unit of the deep belief network, and extracting deep features of the input variables;
2.2, learning the deep layer characteristics and predicting the chlorophyll a concentration at the next moment by an echo state network;
the reservoir in the deep confidence echo state network structure comprises a plurality of sub-reservoirs, each sub-reservoir being independent; the number of sub-reservoirs is set to N_total, and the reservoir weight matrix formed by the N_total sub-reservoirs is a block diagonal matrix, namely:
wherein each block W_i is the weight matrix corresponding to the ith sub-reservoir, 1 ≤ i ≤ N_total;
W_i is generated by singular value decomposition, i.e. W_i = U_i S_i V_i;
the diagonal matrix S_i is randomly generated from a given singular value distribution, and the weight matrix inside each sub-reservoir is fully connected, p = 1, 2, …, n_sub, where n_sub is the size of the ith sub-reservoir;
U_i and V_i are two random orthogonal matrices generated simultaneously, where u_pk, v_pk ∈ (−1, 1), p = 1, 2, …, n_sub, k = 1, 2, …, n_sub;
to overcome the effect of the initial transient, the internal state matrix H = [x(n_min+1), …, x(L_train)]^T is collected from time n_min+1 to time L_train; its corresponding desired output vector is T = [t(n_min+1), …, t(L_train)]^T, where t(n_min+1) is the desired output value at time n_min+1;
the output weight matrix is solved iteratively using a robust loss function containing L2 regularization; the robust loss function E(k) with the regularization term at iteration step k and the solution of the output weight matrix are respectively:
wherein C is the regularization coefficient,
I is an identity matrix of dimension (N_total × n_sub) × (N_total × n_sub),
ρ(·) is the robust objective function, ξ^[k](n) = T(n) − y^[k](n) is the training error at time n at iteration step k, the residual robust scale estimate at iteration step k is computed from MAR, the median absolute deviation,
the robust weight matrix is of dimension (L_train − n_min) × (L_train − n_min), and w(·) is the robust weight function, taken as the Welsch function;
the robust objective function ρ (-) and the robust weighting function w (-) are respectively:
wherein z is the variable, k_set = μ·k_def, μ is the robust coefficient, and k_def = 2.985;
Designing a self-organization mechanism of the deep confidence echo state network and training the deep confidence echo state network;
the adjustment of the hidden-layer neurons of the deep belief network and of the sub-reservoirs in the echo state network is realized in each iteration of the respective training processes;
for each hidden-layer neuron of the deep belief network, the importance index of a neuron at iteration step k_1 is defined as follows:
(3.1) splitting mechanism for hidden-layer neurons: when the jth neuron of the l-th layer satisfies the following condition:
the jth neuron splits into two neurons, where the normalizing term in the condition is the total number of layer-l neurons at iteration step k_1;
(3.2) pruning mechanism for hidden-layer neurons: the adaptive pruning threshold at iteration step k_1 is defined as follows:
the self-organizing mechanism of the echo state network comprises sub-reservoir screening and growing mechanisms; specifically,
(3.3) screening mechanism for sub-reservoirs: the importance index of the ith sub-reservoir in the reservoir is defined as:
where the two quantities in the definition are the input vector and the output vector of the pth neuron of the ith sub-reservoir; at iteration step k_2, i_max(k_2) temporary sub-reservoirs {1, 2, …, i_max(k_2)} consistent with the structure of the original reservoir are randomly generated and sorted by the size of their importance indexes; the adaptive screening threshold is defined as follows:
S_th(k_2) = NS'_sub(INT(α · i_max(k_2)))    (12)
wherein INT(·) is the integer function, NS'_sub is the sorted vector of sub-reservoir importance indexes, and α ∈ (0, 1) is a user-defined control parameter;
when, at iteration step k_2, the ith sub-reservoir satisfies the following condition:
and the robust loss function E(k_2) of all such sub-reservoirs is less than or equal to the historical minimum of the robust loss function, these sub-reservoirs are retained and the remaining sub-reservoirs are deleted;
predicting based on the self-organizing deep confidence echo state network model;
predicting the cyanobacterial bloom by using the self-organizing deep confidence echo state network model;
the method is characterized in that:
the mathematical expression of the echo state network is as follows:
u(n) is the K × 1 input vector at time n, where K is the number of neurons in the last hidden layer of the deep belief network, i.e. u(n) is the deep feature of the deep belief network at time n;
y (n) is the output value of the echo state network at the moment n;
the input weight matrix of the ith sub-reservoir is of dimension n_sub × K, and the output weight matrix is of dimension 1 × (N_total × n_sub);
f_res is the activation function of the reservoir neurons, taken as a sigmoid function;
(3.4) growing mechanism for sub-reservoirs: the screened sub-reservoirs are merged with a newly, randomly generated sub-reservoir, and the weight matrix of the output vector of the merged echo state network is:
H_o is the state matrix corresponding to the reservoir after the screening mechanism completes;
H_g is the state matrix corresponding to the grown reservoir;
the identity matrix in the expression has a dimension determined by the total number of merged and grown sub-reservoirs;
I_o is an identity matrix of dimension (N_o × n_sub) × (N_o × n_sub);
N_o is the number of sub-reservoirs after the screening mechanism completes;
I_g is an identity matrix of dimension n_sub × n_sub;
I_L is an identity matrix of dimension (L_train − n_min) × (L_train − n_min).
2. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network according to claim 1, characterized in that: the input variables of the deep confidence echo state network structure further include the chlorophyll-a concentrations at the three lagged moments.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110126626.7A CN112862173B (en) | 2021-01-29 | 2021-01-29 | Lake and reservoir cyanobacterial bloom prediction method based on self-organizing deep confidence echo state network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112862173A CN112862173A (en) | 2021-05-28 |
CN112862173B true CN112862173B (en) | 2022-10-11 |
Family
ID=75986842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110126626.7A Active CN112862173B (en) | 2021-01-29 | 2021-01-29 | Lake and reservoir cyanobacterial bloom prediction method based on self-organizing deep confidence echo state network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112862173B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114282639B (en) * | 2021-12-24 | 2024-02-02 | 上海应用技术大学 | Water bloom early warning method based on chaos theory and BP neural network |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105510546B (en) * | 2015-12-27 | 2017-06-16 | 北京工业大学 | A kind of biochemical oxygen demand (BOD) BOD intelligent detecting methods based on self-organizing Recurrent RBF Neural Networks |
CN107506857B (en) * | 2017-08-14 | 2020-05-08 | 北京工商大学 | Urban lake and reservoir cyanobacterial bloom multivariable prediction method based on fuzzy support vector machine |
CN108416460B (en) * | 2018-01-19 | 2022-01-28 | 北京工商大学 | Blue algae bloom prediction method based on multi-factor time sequence-random depth confidence network model |
CN109886454B (en) * | 2019-01-10 | 2021-03-02 | 北京工业大学 | Freshwater environment bloom prediction method based on self-organizing deep belief network and related vector machine |
CN111860306B (en) * | 2020-07-19 | 2024-06-14 | 陕西师范大学 | Electroencephalogram signal denoising method based on width depth echo state network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||