CN112862173A

CN112862173A - Lake and reservoir cyanobacterial bloom prediction method based on self-organizing deep confidence echo state network

Info

Publication number: CN112862173A
Application number: CN202110126626.7A
Authority: CN
Inventors: 张慧妍; 胡博; 王小艺; 王立; 孙茜; 王昭洋
Original assignee: Beijing Technology and Business University
Current assignee: Beijing Technology and Business University
Priority date: 2021-01-29
Filing date: 2021-01-29
Publication date: 2021-05-28
Anticipated expiration: 2041-01-29
Also published as: CN112862173B

Abstract

The invention discloses a lake and reservoir cyanobacteria bloom prediction method based on a self-organizing deep confidence echo state network, and belongs to the technical field of cyanobacteria bloom prediction and information science cross fusion. The lake and reservoir cyanobacterial bloom forecasting method adopts a mutual information method to screen input variables and output variables, then constructs a structure of a deep confidence echo state network, designs self-organization mechanisms aiming at the deep confidence network and the echo state network respectively, and obtains a self-organization deep confidence echo state network model after optimizing the structure self-organization mechanisms so as to effectively forecast the lake and reservoir cyanobacterial bloom and facilitate subsequent lake and reservoir cyanobacterial bloom treatment. The method fully learns the deep characteristics of the training data, designs a self-organization mechanism for the deep confidence echo state network, realizes the dynamic adjustment of the number of the hidden layer neurons and the sub-reservoirs, is suitable for the lake and reservoir cyanobacterial bloom data containing abnormal values such as detection noise and the like, and can improve the precision and the robustness of a prediction result.

Description

Lake and reservoir cyanobacterial bloom prediction method based on self-organizing deep confidence echo state network

Technical Field

The invention belongs to the technical field of blue algae bloom prediction and information science cross fusion, and particularly relates to a lake and reservoir blue algae bloom prediction method based on a self-organizing deep confidence echo state network.

Background

The lake-reservoir cyanobacterial bloom refers to the pollution phenomenon that algae and plankton in eutrophic lakes and reservoirs are abnormally and rapidly propagated, and a large amount of blue-green algae layers visible to naked eyes are gathered on the surface layer of a water body and thickly cover the water surface. As urban and industrial wastewater is continuously discharged into lakes and reservoirs, the content of nutrient substances such as nitrogen, phosphorus and the like in the water body is higher and higher, which provides an environmental foundation for the outbreak of cyanobacterial bloom. Generally, factors such as water temperature, wind speed and nutrient substances influence the outbreak of the cyanobacterial bloom in lakes and reservoirs. Therefore, the indexes can provide basis for targeted prediction, early warning and treatment of the cyanobacterial bloom. The lake and reservoir cyanobacteria bloom generation process has a chaotic attribute, and the time series prediction is carried out by taking the chlorophyll a concentration as a representation output variable and taking the water temperature, nutrient substances and the like as modeling input variables. The scholars in the environmental and biological fields carry out extensive research on the formation mechanism of the lake and reservoir cyanobacteria bloom, including modeling on environmental factors and plankton dynamics, and have better embodiment on the basic law of the lake and reservoir cyanobacteria bloom generation. Although the mechanism model has good interpretability, the evolution of the lake and reservoir cyanobacterial bloom is a complex nonlinear dynamic process and has certain sensitivity, and the establishment of the mechanism model with ideal quantitative prediction precision based on the existing research accumulation is difficult. With the development of the technology, the accessibility of data is continuously improved, and the application of a data driving method mainly based on a machine learning algorithm in the field of blue algae bloom prediction is more and more concerned. But the existing lake and reservoir cyanobacterial bloom prediction method has great defects in the aspects of prediction precision and robustness.

Disclosure of Invention

The invention provides a blue algae water bloom prediction method based on a self-organization deep confidence echo state network, which aims to effectively solve the problems of insufficient precision and poor robustness of the existing lake and reservoir blue algae water bloom prediction method. And after determining the input variable and the output variable, constructing a structure of a deep confidence echo state network, respectively designing a self-organization mechanism aiming at the deep confidence network and the echo state network, and obtaining a self-organization deep confidence echo state network model after optimizing the structure self-organization mechanism so as to effectively predict the lake and reservoir cyanobacterial bloom and facilitate the subsequent lake and reservoir cyanobacterial bloom treatment.

The invention provides a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep confidence echo state network, which comprises the following four steps:

determining an input variable and an output variable of a deep confidence echo state network model;

determining the characterization variables of the lake and reservoir cyanobacterial bloom as output variables according to the domain knowledge, and screening out the influence variables of the lake and reservoir cyanobacterial bloom from the candidate water quality variables as input variables based on a mutual information method.

Step two, establishing a structure of a deep confidence echo state network;

the method comprises the steps of constructing a structure of a deep confidence echo state network, wherein the structure comprises the deep confidence network and the echo state network, and particularly, the echo state network adopts a modularized subreserve pool structure and adopts a robust loss function to solve an output weight matrix.

Designing a self-organization mechanism of the deep confidence echo state network and optimizing the deep confidence echo state network;

after the structure of the deep confidence echo state network is constructed, firstly, the importance index of a neuron is defined, then respective self-organization mechanisms of the deep confidence network and the echo state network are respectively designed, and the deep confidence echo state network is trained and optimized to obtain a self-organization deep confidence echo state network model.

Predicting based on the self-organizing deep confidence echo state network model;

and predicting the cyanobacterial bloom by using the self-organizing deep confidence echo state network model.

Compared with other methods in the prior art, the method provided by the invention has the advantages of feasibility and effectiveness.

The invention has the advantages that:

1. the invention constructs a self-organizing deep confidence echo state network model for forecasting the lake and reservoir cyanobacterial bloom, and can fully learn the deep characteristics of training data, thereby realizing the effective forecasting of the lake and reservoir cyanobacterial bloom.

2. The invention provides a neuron importance index for measuring the importance degree of neurons, and the neuron importance index is used as the basis of self-organizing mechanism design and is beneficial to training and optimizing the deep confidence echo state network.

3. The invention designs a self-organization mechanism for the deep belief network and the echo state network respectively, so that the deep belief echo state network model can automatically determine the network structure in the training process, and the dynamic adjustment of the number of hidden layer neurons and sub reserve pools is realized.

4. In the invention, the echo state network part utilizes a robust loss function to solve the output weight matrix. Therefore, the proposed self-organization deep confidence echo state network model is suitable for lake and reservoir cyanobacterial bloom data containing abnormal values such as detection noise and the like, and can improve the accuracy and robustness of a prediction result.

Drawings

FIG. 1 is a flow chart of a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep confidence echo state network provided by the invention;

FIG. 2 is a flow chart of establishing a deep confidence echo state network structure according to the present invention;

FIG. 3 is a flow chart of the establishment and training optimization of the self-organizing mechanism of the deep confidence echo state mesh structure in the present invention;

FIG. 4A is a diagram illustrating the mutual information between the chlorophyll-a concentration and the lag input variable outputted in the embodiment of the present invention;

FIG. 4B is a diagram illustrating the mutual information between the input variables and the chlorophyll-a concentration according to an embodiment of the present invention;

fig. 5A, 5B, and 5C are a convergence graph of the number of neurons in the hidden layer of the deep belief network in the self-organizing deep belief echo state network structure, a convergence graph of the size of the reservoir of the echo state network, and a training error RMSE convergence graph of the deep belief echo state network model in the training process of the embodiment, respectively;

FIG. 6 is a schematic diagram showing the comparison between the predicted result of cyanobacterial bloom in lakes and reservoirs and the result of other conventional prediction methods in the embodiment of the present invention;

FIG. 7 is a comparison diagram of the predicted result of cyanobacterial bloom in lakes and reservoirs obtained by adding abnormal values with different proportions to the training data in the embodiment of the present invention.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and examples.

The invention provides a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep confidence echo state network, wherein the self-organizing deep confidence echo state network comprises a deep confidence network and an echo state network, and in order to better predict lake and reservoir cyanobacterial bloom, the structure of the echo state network is required to be effectively optimized, and the characteristics of input variables are required to be subjected to targeted refining treatment. The deep belief network is a deep neural network model based on an energy function, can overcome the defect of local minimum and has good performance in the time series prediction problem. According to the method, the deep characteristics of the time sequence data in the input variables are extracted by using the unsupervised learning process of the deep confidence network, then the echo state network is adopted to model the deep characteristics and predict the chlorophyll a concentration at the next moment, so that the processing capacity of the self-organized deep confidence echo state network model on the time sequence information can be improved, and the blue-green algae bloom can be conveniently predicted.

In order to solve the optimization design problem of the neural network structure, the invention defines a neuron importance index by adopting a mutual information method, further defines the importance index of the hidden layer neuron and the importance index of the sub reserve pool respectively, and realizes the dynamic adjustment of the number of the hidden layer neuron and the sub reserve pool by designing a self-organization mechanism. In addition, the invention also adds a robust loss function for solving the output weight matrix of the echo state network so as to improve the robustness of the echo state network. Therefore, the prediction method provided by the invention has good prediction performance and good robustness on time sequence data containing abnormal values such as detection noise and the like, is suitable for modeling and prediction of practical lake and reservoir cyanobacterial bloom, and can provide prediction and early warning support for outbreak of the lake and reservoir cyanobacterial bloom.

The invention provides a lake and reservoir cyanobacteria bloom prediction method based on a self-organizing deep confidence echo state network, the flow of which is shown in figure 1, and the method mainly comprises the following four steps:

the self-organizing deep confidence echo state network model is constructed by respectively determining an input variable and an output variable of the deep confidence echo state network model. In this embodiment, the output variable is determined as the concentration of chlorophyll a, and the input variable needs to be screened from a plurality of water quality variables affecting the generation of blue-green algae in lakes and reservoirs. The invention takes a mutual information method as a judgment criterion for screening input variables. Mutual information is used as a method for measuring the degree of interdependence between two variables, and can describe the nonlinear correlation of the two variables. When the mutual information value between the variables is larger, the correlation between the variables is higher. By respectively calculating mutual information values of the candidate water quality variables and the output variables, proper water quality variables can be screened as input variables according to the conditions of prediction precision requirements, speed and the like. Here, when the mutual information value of the candidate water quality variable and the output variable is greater than a set threshold (e.g., 0.2), the candidate water quality variable is selected as the input variable, otherwise, the candidate water quality variable is eliminated. And the screened input variables and the screened output variables participate in the training and prediction of the deep confidence echo state network model together.

Step two, establishing a structure of a deep confidence echo state network;

the self-organizing deep confidence echo state network model is composed of a deep confidence network based on limited Boltzmann machine stacking and a modular echo state network based on a sub-reserve pool. The deep confidence echo state network model firstly extracts deep features of input variables through a conventional deep confidence network. The limited Boltzmann machine is a basic unit forming a deep belief network, and comprises two layers of neurons, wherein one layer is a visual layer and is used for inputting variables; the other layer is a hidden layer for extracting deep features of the input variables. In particular, the deep confidence network part of the deep confidence echo state network is formed by stacking two limited Boltzmann machines. Specifically, as shown in fig. 2, the structure for establishing the deep confidence echo state includes the following steps:

inputting the input variable into a deep belief network, carrying out unsupervised learning through a contrast divergence method, and training the deep belief network to extract deep features of the input variable.

Inputting the deep features output by the hidden layer of the deep belief network into an echo state network, initializing the weight matrix of the deep features and the weight matrix of a sub-reserve pool by the echo state network, and collecting an internal state matrix.

The echo state network in the deep confidence echo state network is an echo state network based on a sub-reserve pool. The echo state network not only can meet the echo state characteristics, but also can reduce the complexity of parameter setting. The reserve pool in the echo state network without output feedback in the deep confidence echo state network comprises a plurality of sub reserve pools, and each sub reserve pool is mutually independent, so that the decoupling of partial neurons in the reserve pool is ensured.

Setting the number of the neutron reserve tanks in the original reserve tank as N_totalN in each sub-reserve pool_subEach neuron then consists of N_totalWeight matrix W of reserve pool formed by individual sub reserve pools_* ^resIn a block diagonal matrix, i.e.,

wherein, W_i(1≤i≤N_total) And the weight matrix is corresponding to the ith sub-reserve pool. W_iGenerated by singular value decomposition, i.e. W_i＝U_iS_iV_i. Wherein, the diagonal matrix

Randomly generated by a given singular value distribution, and the interior of the sub-pool matrix is fullAnd (4) connecting. n is_subIs the size of the ith sub-reserve pool, i.e. all the sub-reserve pool weight matrixes in the invention are n_sub×n_subA dimension matrix.

Is two random orthogonal matrices generated simultaneously, where u_pk,v_pk∈(-1,1)，p＝1,2,…,n_sub，k＝1,2,…,n_sub。

The mathematical expression of the echo state network based on the sub-reserve pool is as follows:

wherein u (n) is an input vector at the time of n with K x 1 dimension, namely the deep-layer feature extracted by the deep belief network at the time of n, and K is the number of neurons of the last hidden layer of the deep belief network;

x_i(n) is 1 Xn_subState vector of ith sub reserve pool n moment of dimension; and y (n) is the output value of the echo state network at the moment n.

In order to input the weight matrix, the weight matrix is input,

is n_subThe input weight matrix of the ith sub-pool in xK dimension,

is 1 × (N)_total×n_sub) The output weight matrix of the dimension. f. of^resIs the activation function of the reserve pool neurons, and takes the sigmoid function.

Here, to overcome the effect of the initial transient, assume that n is the number n_minFrom time +1 to L_trainTime-of-day collection internal state matrix H ═ x (n)_min+1),…,x(L_train)]^TThe corresponding desired output vector is T ═ T (n)_min+1),…,t(L_train)]^T，t(n_min+1) is n_minThe desired output value at time + 1.

In addition, in order to overcome the ill-conditioned solution problem possibly caused by abnormal values including detection noise and the like and improve the robustness of prediction, the output weight matrix is iteratively solved by adopting a robust loss function including L2 regularization

Initializing the solving iteration times k of the output weight matrix to be 1, initializing the robust weight matrix to be a unit matrix, and calculating a robust loss function and residual robust scale estimation; and updating the robust weight matrix according to the robust weight function.

And initializing the solving iteration number k of the output weight matrix to be 1, initializing the robust weight matrix to be a unit matrix, calculating a robust loss function and residual robust scale estimation in the iteration process, updating the robust weight matrix according to the robust weight function, and calculating the output weight matrix. Combining the robust loss function E (k) of the regularization term and the output weight matrix from iteration to k

The solving results of (1) are respectively:

where C is the regularization coefficient and I is (N)_total×n_sub)×(N_total×n_sub) Identity matrix of dimension，

Is a 2-norm, ρ (-) is a robust objective function, ξ^[k](n)＝T(n)-y^[k](n) is the training error at the nth time instant of iteration to step k,

for the residual robust scale estimation from iteration to k steps, MAR is the median absolute deviation.

Is represented by (L)_train-n_min)×(L_train-n_min) The robust weight matrix of dimension, w (-) is the robust weight function. In the invention, a Welsch function is taken as a robust weight function, and a robust target function rho (-) and a robust weight function w (-) thereof are respectively as follows:

wherein z is a function variable, k_set＝μk_defMu is a robust coefficient, the robust weight function is selected according to experience, the Welsch function is selected as the robust weight function, and then the coefficient k_def＝2.985。

Designing a self-organization mechanism of the deep confidence network and the echo state network and training the deep confidence network and the echo state network;

the invention respectively designs a self-organization mechanism and a corresponding training process aiming at a deep confidence network and an echo state network. Namely, in the step two, on the basis of the step two, the adjustment of the hidden layer neuron of the deep confidence network and the neutron reserve pool of the echo state network is respectively realized in each iteration of the respective training process.

As shown in fig. 3, for deep belief networksFor each hidden layer neuron, first initialize the iterative training times k ₁1, training a weight matrix of the deep belief network according to a contrast divergence method, and calculating the importance index of each layer of neuron, wherein the process is iterated to the k-th arbitrary neuron₁Importance index of neurons in step

Is defined as:

wherein,

and

respectively, the input and output of the jth neuron of the l layer.

Is shown as

And

the mutual information value between the two information blocks,

is shown as

And the desired output vector T. For the deep belief network part, the self-organization process of hidden layer neurons includes splitting and deleting, and a specific self-organization mechanism based on neuron importance is shown as follows.

(1) The mechanism of neuronal cleavage in the hidden layer: when iterating to the k₁When the step is carried out,

the higher the neuron is, the more active it is processing information. The present invention therefore chooses to split the most active neurons in the hidden layer. That is, when the jth neuron of the l-th layer satisfies the following condition:

the jth neuron splits into two neurons,

is iterated to the k < th >₁Total number of layer I neurons in step (ii).

(2) Mechanism of pruning of hidden layer neurons: when in use

At lower, the neuron processes the information less strongly and should be considered to delete it. Thus, the present invention defines iterating through the kth₁The adaptive pruning threshold at step time is as follows:

wherein beta is (0,1)]. Then, according to the above formula, when the jth neuron satisfies the condition

Then the jth neuron is deleted.

After the number of neurons in the hidden layer of the deep belief network and the weight matrix are subjected to iterative training, the number of the sub-reserve pools of the echo state network and the output weight matrix can be subjected to iterative training. Taking the output vector of the last hidden layer of the trained deep belief network as the input of the echo state network, and initializing the iteration times k of the echo state network₂Defining the control parameter vector by user, randomly generating a temporary reserve pool weight and a temporary input weight which are consistent with the original reserve pool in size, and for 1The specific screening and growing mechanism of the sub-reserve pool of the echo state network is shown as follows:

(1) the screening mechanism of the child reserve pool: the invention defines the importance index of the ith sub-reserve pool in the reserve pool

Comprises the following steps:

wherein

Is the input vector of the p-th neuron of the ith sub-reservoir,

is the output vector of the p neuron of the ith sub-reservoir. Thus, training iteratively to k₂Randomly generating temporary sub-reserve pool with structure consistent with that of original reserve pool in step time

Sorting according to the size of the importance indexes:

the invention defines the adaptive screening threshold as follows:

S_th(k₂)＝NS′_sub(INT(αi_max(k₂))) (12)

wherein INT (-) is an integer function.

The sorted child pool importance vectors are used. And alpha epsilon (0,1) is a customized control parameter, namely different screening degrees of the control sub-reserve pool in each circulation. The parameter may take a plurality of values

Together forming a control parameter vector

But need to satisfy alpha₁＜α₂＜…＜α_Nα，N_αIs the dimension of the control parameter vector.

The training goal of the echo state network is to minimize the robust loss function of equation (4). In order to ensure that the performance of the reserve pool after screening can be kept or better than the sub-reserve pool set before screening, when the kth iteration₂The ith sub-reserve pool meets the following requirements:

and the robust loss function E (k) of all sub-pools that satisfy the condition₂) And when the value is less than or equal to the minimum value of the historical robust loss function, the sub-reserve pools are reserved as new reserve pools, and the rest sub-reserve pools are deleted. And taking the screened sub reserve pools as temporary reserve pools and calculating training errors.

(2) Growth mechanism of child reserve pool: after screening, the increase of the sub reserve pool is realized, the temporary reserve pool is used as a new reserve pool, and a new randomly generated sub reserve pool is merged, so that the output weight matrix of the merged echo state network is as follows:

wherein H_oFor the state matrix corresponding to the reserve pool after the screening mechanism is completed, H_gFor the state matrix corresponding to the growing reserve pool,

is the state matrix corresponding to the growing reserve pool.

Is composed of

An identity matrix of dimensions, wherein,

is the total number of growing child pools. Further, a combined output weight matrix may be derived based on equation (14)

The updated mathematical expression is:

wherein, I_oIs (N)_o×n_sub)×(N_o×n_sub) Identity matrix of dimension, N_oThe number of the child reserve pools after the screening mechanism is completed. I is_gIs n_sub×n_subAn identity matrix of dimensions. I is_LIs (L)_train-n_min)×(L_train-n_min) An identity matrix of dimensions.

And obtaining a self-organizing deep confidence echo state network model.

through the design of a self-organization mechanism, the self-organization deep confidence echo state network model can automatically learn and optimally design the proper number of neurons in the hidden layer of the deep confidence network and the number of the sub reserve pools of the echo state network in the training process, and meanwhile, the weight matrix solution corresponding to each neural network is realized. And inputting the input variable into the trained self-organizing depth confidence echo state network model, so that the characterization index of the lake and reservoir cyanobacterial bloom, namely the prediction of the chlorophyll a concentration, can be realized.

The technical solution of the present invention is further illustrated by the following examples.

The first embodiment is as follows:

the embodiment provides a lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep confidence echo state network, which comprises the following specific implementation steps:

step one, determining an input variable and an output variable of a prediction model;

the data in the examples were derived from the water quality data set of west fal-thao harbor, usa. The data set contains 6 water quality variables, and table 1 specifically shows the abbreviations, units and meanings of the individual variables in the data set.

TABLE 1 Water quality variables information

The sampling frequency of the data is 20 minutes, the acquisition time starts from 18 o 'clock 01 at 6 h in 2017 to 13 o' clock 21 at 31 h in 2017, 8 h in 2017, and 2491 groups of data are shared. In order to overcome the influence of the redundant indexes on the modeling effect, the experiment measures the correlation between the water quality variable and the output variable chlorophyll a concentration by using mutual information values. In the experiment, not only the correlation of the water quality variables but also the autoregressive characteristic of the time series of the chlorophyll a concentration are considered. As can be seen from fig. 4A, the mutual information value of the lag variable of chlorophyll a gradually decreases as the lag time increases. Fig. 4B shows the mutual information values of 5 water quality variables for the chlorophyll a concentration at the next time. The experiment selects a water quality variable with a mutual information value greater than 0.2. Therefore, the input variables of the self-organizing depth confidence echo state network are the water temperature, the salinity, the oxygen saturation, the specific conductivity, the chlorophyll a concentration at the same moment and the chlorophyll a concentration at the lag three moments, and the output variable is the chlorophyll a concentration at the next moment. That is, the number of input variables is 8, and the number of output variables is 1.

Step two, establishing a structure of a deep confidence echo state network;

in the experiment for predicting the cyanobacterial bloom in the lake and reservoir, the self-organizing deep confidence echo state network starts to collect data after running 200 data to form a state matrix, the length of training data is 1600, and the length of testing data is 691. Wherein, the neuron of the hidden layer of the deep belief network part is initialized to 3, the iterative training time is 50, and the learning batchThe size is 50, the learning rate is 0.1, and β is 0.98. The initialization range of the input weight matrix element of the echo state network is [ -1,1 [ -1 [ ]]Taking the singular value of the diagonal matrix in SVD to be [0.1,0.99 ]]The size of the sub-reserve pool is 5, the regularization coefficient C is 1e-7, the robust coefficient mu is 1, the iteration frequency of output weight matrix solving is 15, the iteration frequency of the self-organization process of the reserve pool is 50, and the control parameter vector is uniformly distributed

Is (0.5,0.6,0.7,0.8, 0.9).

Designing a self-organizing mechanism and a training process of the optimized deep confidence echo state network;

the self-organizing process of hidden layers and reserve pools in the self-organizing deep confidence echo state network is shown in fig. 5A and 5B. In FIG. 5A, the neurons of the first hidden layer H1 and the second hidden layer H2 are finally stable at 7 and 6, respectively, and thus the final hidden layer structure is 7-6. During the process of reservoir size learning, the iterative training time is set to 100. As shown in fig. 5B, the pool size iteratively converges to 120, containing a total of 24 child pools, based on the self-organizing mechanism. Therefore, the structure of the self-organizing deep confidence echo state network in the experiment is 8-7-6-120-1. Fig. 5C shows the convergence curve of Root Mean Square Error (RMSE) during training. The training error of the self-organizing deep confidence echo state network finally converges to be near the minimum value of 0.383.

FIG. 6 shows the comparison of the predicted results of cyanobacterial bloom in lakes and reservoirs by using the self-organizing deep confidence echo state network and the predicted results by using other echo state network methods. It can be seen that the self-organizing deep confidence echo state network (SDBMESN) provided by the embodiment of the invention can effectively learn the evolution rule of the cyanobacterial bloom in the lake and reservoir relative to other echo state network models. Table 2 shows the comprehensive performance of the basic echo state network (OESN), the Regularized Echo State Network (RESN), the Growing Echo State Network (GESN), the adaptive regularized echo state network (DRESN), and the deep confidence echo state network (DBESN) in training and testing, including neural network structure and RMSE indices. Therefore, the lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network has high prediction precision and good generalization capability. Meanwhile, the size of a reserve pool of the self-organizing deep confidence echo state network is smaller than that of other echo state networks, and the self-organizing deep confidence echo state network has the simplest neural network structure. Here, the DBESN for each set of experiments employs the same neural network structure as the self-organizing deep confidence echo state network. But under the condition of consistent neural network structure, the predicted performance of DBESN is lower than that of the self-organizing deep confidence echo state network provided by the invention. The self-organizing mechanism of the self-organizing deep confidence echo state network not only realizes structural simplification, but also reserves neurons and sub reserve pools with better relative performance in the existing neurons in the self-organizing process, so that the neurons and the reserve pools in the self-organizing deep confidence echo state network have better prediction effect, and the capability of the neurons and the reserve pools for processing dynamic information is further improved. Therefore, the self-organizing deep confidence echo state network is suitable for the prediction application of the lake and reservoir cyanobacterial bloom.

TABLE 2 blue algae bloom prediction experiment results and different method comparison

In the self-organizing deep confidence echo state network in the embodiment, the robustness loss function is used as the target function, so that the robustness of time series data prediction of abnormal values such as monitoring noise can be improved. To verify this feature, a 10% to 40% proportion of the pulse function was added to each training sample of the example dataset. The test results are shown in FIG. 7. It can be seen that the robustness of the self-organizing deep confidence echo state network of the embodiment of the invention is obviously superior to that of other echo state networks, and the robustness is better.

Claims

1. A lake and reservoir cyanobacterial bloom prediction method based on a self-organizing deep confidence echo state network is characterized by comprising the following steps: comprises the following steps of (a) carrying out,

determining an input variable and an output variable of a deep confidence echo state network;

the output variable is the chlorophyll a concentration of the characterization variable of the lake and reservoir cyanobacterial bloom, and the input variable is obtained by screening the water quality variables influencing the lake and reservoir cyanobacterial bloom by taking a mutual information method as a judgment criterion;

step two, establishing a structure of a deep confidence echo state network;

the structure of the deep confidence echo state network comprises a deep confidence network and an echo state network, wherein the echo state network adopts a modular reserve pool structure; the method comprises the following specific steps:

2.1, adopting a limited Boltzmann mechanism to form a basic unit of a deep confidence network, and extracting deep features of input variables;

2.2, learning the deep features and predicting the chlorophyll a concentration at the next moment by an echo state network;

designing a self-organization mechanism of the deep confidence echo state network and training the deep confidence echo state network;

firstly, defining importance indexes of neurons in the deep confidence network, then respectively designing respective self-organization mechanisms of the deep confidence network and the echo state network, and optimizing the structure of the deep confidence echo state network to obtain a self-organization deep confidence echo state network model;

2. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network as claimed in claim 1, wherein: the mutual information value of the input variable and the chlorophyll a concentration at the same moment is more than 0.2; the input variables also include the chlorophyll-a concentration at three moments later.

3. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network as claimed in claim 1, wherein: the deep belief network is formed by stacking two limited Boltzmann machines, each limited Boltzmann machine comprises two layers of neurons, and one layer is a visual layer and is used for being used as input of input variables of training data; the other layer is a hidden layer for extracting deep features as input variables of the training data.

4. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network as claimed in claim 3, wherein: the reserve pool in the echo state network comprises a plurality of sub reserve pools, and each sub reserve pool is independent; setting the number of the sub-reserve pools as N_totalEach is then composed of N_totalReserve pool weight matrix W formed by individual reserve pools_* ^resIs a block diagonal matrix, namely:

wherein each weight matrix element W_iI is more than or equal to 1 and less than or equal to N for the weight matrix corresponding to the ith sub-reserve pool_total；W_iGenerated by singular value decomposition, i.e. W_i＝U_iS_iV_iWherein a diagonal matrix

Randomly generated by a given singular value distribution, and the weight matrix inside the sub-reserve pool is fully connected,

n_subis the size of the ith sub-pool,

is two random orthogonal matrices generated simultaneously, where u_pk,v_pk∈(-1,1)，p＝1,2,…,n_sub,k＝1,2,…,n_subThen, the mathematical expression of the echo state network is:

x(n)＝f^res(W_* ^resx(n-1)+Wⁱⁿu(n)) (2)

y(n)＝W_* ^outx(n) (3)

wherein u (n) is an input vector of K multiplied by 1 dimension at n time, K is the number of neurons of the last hidden layer of the deep belief network, namely the deep features of the deep belief network at n time;

x_i(n) is 1 Xn_subState vector of ith sub reserve pool n moment of dimension; y (n) is the output value of the echo state network at the moment n,

as an input weight matrix, W_i ⁱⁿIs n_subInput weight matrix, W, of ith sub-pool of dimension xK_* ^outIs 1 × (N)_total×n_sub) Output weight matrix of dimension, f^resIs the activation function of the reserve pool neurons, and takes the sigmoid function.

5. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network as claimed in claim 4, wherein: the output weight matrix W_* ^outAdopting a robust loss function containing L2 regularization to iteratively solve output, combining the robust loss function E (k) of the regularization item from iteration to k step and outputting a weight matrix W_* ^out[k]The solving results of (1) are respectively:

W_* ^out[k]＝(H^TW_N ^[k]H+CI)^-1H^TW_N ^[k]T (5)

where C is the regularization coefficient and I is (N)_total×n_sub)×(N_total×n_sub) The identity matrix of the dimension(s),

2-norm, ρ (-) is a robust objective function,

for the training error iterated to the nth time of step k,

for residual robust scale estimation from iteration to k steps, MAR is the median absolute deviation,

to represent

A robust weight matrix of dimension, w (-) is a robust weight function, and the robust objective function ρ (-) and the robust weight function w (-) are respectively:

wherein z is a variable, k_set＝μk_defMu is a robust coefficient, k_def＝2.985。

6. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network as claimed in claim 5, wherein: the robust weight function is a Welsch function.

7. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network as claimed in claim 4, wherein: the echo state network further comprises the step of collecting an internal state matrix H, in particular, assumed from n_minFrom time +1 to L_trainTime-of-day collection internal state matrix H ═ x (n)_min+1),…,x(L_train)]^TThe corresponding desired output vector is T ═ T (n)_min+1),…,t(L_train)]^T，t(n_min+1) is n_minThe desired output value at time + 1.

8. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network as claimed in claim 5, wherein: the self-organization mechanism of the deep belief network in step three comprises a splitting mechanism of hidden layer neurons and a pruning mechanism of hidden layer neurons, and specifically,

for each hidden layer neuron of the deep belief network, iterate to kth₁Importance index of neurons in step

Is defined as:

wherein,

and

respectively the input and output of the jth neuron of the l-th layer,

is shown as

And

the mutual information value between the two information blocks,

is shown as

And the mutual information value between the desired output vector T,

(3.1) mechanism of cleavage of hidden layer neurons: when the jth neuron of the l-th layer meets the following condition:

the jth neuron splits into two neurons,

is iterated to the k < th >₁Total number of layer I neurons in step (ii);

(3.2) mechanism of pruning of hidden layer neurons: define iteration to kth₁The adaptive pruning threshold at step time is as follows:

wherein beta is (0,1)]When the jth neuron satisfies the condition

Then the jth neuron is deleted.

9. The lake and reservoir cyanobacterial bloom prediction method based on the self-organizing deep confidence echo state network as claimed in claim 8, wherein: the self-organization mechanism of the echo state network in step three comprises a sub-pool screening and growing mechanism, specifically,

(3.3) screening mechanism of child pools: defining importance index of ith sub-reserve pool in reserve pool

Comprises the following steps:

wherein

Is the input vector of the p-th neuron of the ith sub-reservoir,

for the output vector of the p-th neuron of the ith sub-reservoir, iteratively training to k₂Randomly generating i consistent with the structure of the original reserve pool in step_max(k₂) A temporary sub-reserve pool

Sorting according to the size of the importance indexes:

the adaptive screening threshold is defined as follows:

S_th(k₂)＝NS′_sub(INT(αi_max(k₂))) (12)

wherein INT (-) is an integer function,

for the sorted sub-reserve pool importance vectors, alpha is an element (0,1) which is a self-defined control parameter;

when iterating the k₂When the step is carried out, the ith sub-reserve pool meets the following requirements:

and robust loss function E (k) for all sub-pools₂) When the value is less than or equal to the minimum value of the historical robust loss function, the sub reserve pools are reserved, and the rest sub reserve pools are deleted;

(3.4) growth mechanism of child reserve pool: combining each screened sub reserve pool with a new randomly generated sub reserve pool, wherein the weight matrix of the output vector of the echo state network after combination is as follows:

to merge the state matrices corresponding to the grown pools,

is composed of

An identity matrix of dimensions, wherein,

the total number of the merged and increased child reserve pools;

further, an output weight matrix is obtained based on the formula (14)

The updated mathematical expression is:

wherein, I_oIs (N)_o×n_sub)×(N_o×n_sub) Identity matrix of dimension, N_oFor the number of child pools after completion of the screening mechanism, I_gIs n_sub×n_subIdentity matrix of dimension, I_LIs (L)_train-n_min)×(L_train-n_min) An identity matrix of dimensions.