CN115587538A

CN115587538A - Lake and reservoir cyanobacterial bloom prediction system based on SMRELM model

Info

Publication number: CN115587538A
Application number: CN202211248275.8A
Authority: CN
Inventors: 张慧妍; 刘明伟; 王小艺; 王立; 许继平; 孙茜; 王昭洋
Original assignee: Beijing Technology and Business University
Current assignee: Beijing Technology and Business University
Priority date: 2022-07-08
Filing date: 2022-10-12
Publication date: 2023-01-10

Abstract

The invention discloses a lake and reservoir cyanobacterial bloom prediction system based on an SMRELM model, characterized in that a cyanobacterial bloom time sequence characteristic information extraction module (100), an S-ELM model (200) and an ECM model (30) are added to traditional cyanobacterial bloom prediction. The prediction steps are as follows: the chlorophyll a concentration is used as the characterization index describing the formation of the cyanobacterial bloom and is used to construct a time sequence feature set; the chlorophyll a concentration time sequence is reconstructed through a sliding window, and the reconstructed input samples are used to train a manifold regularization extreme learning machine; according to a similarity condition, switching prediction of the cyanobacterial bloom is realized by combining the time sequence feature set with the manifold regularization extreme learning machine; and an error compensation model is established using improved fuzzy C-means clustering and a T-S fuzzy neural network to correct the result of the prediction model. The invention solves the problem that the traditional batch extreme learning machine has low precision in cyanobacterial bloom prediction.

Description

Lake and reservoir cyanobacterial bloom prediction system based on SMRELM model

Technical Field

The invention relates to the technical field of cyanobacterial bloom prediction, and in particular to a lake and reservoir cyanobacterial bloom prediction system based on an SMRELM model.

Background

Eutrophication of a water body is a pollution phenomenon in which, under the combined action of natural factors and human activities, the nutrients in the water body increase, causing excessive plant growth and a change in the ecological balance of the water body. The causes of eutrophication are complex and diverse and involve ecological, social, economic and other factors. Cyanobacterial bloom in lakes and reservoirs is a common ecological disaster in eutrophic lakes. The cyanobacteria produce toxins, and their death and decomposition deplete oxygen, so that the normal dissolved oxygen balance of the water body is broken, the water quality deteriorates further, human health is threatened, and serious economic loss and social impact result.

The paper "Early Warning Research on Cyanobacterial Bloom in the Suzhou Taihu Lake Drinking Water Source", published in December 2019 by the author Schroetian, introduces on page 12 the process of establishing a cyanobacterial bloom prediction model; the specific flow is shown in FIG. 1.

"Thoughts on the Formation Mechanism of Cyanobacterial Blooms in Large Shallow Eutrophic Lakes", by Kong Fanxiang and Gao Guang, was published in Acta Ecologica Sinica, volume 25, issue 3, in March 2005. It points out that, in the large shallow lakes of the middle and lower reaches of the Yangtze River, which have four distinct seasons and strong disturbance, the growth of cyanobacteria and the formation of bloom can be divided into four stages: dormancy, recovery, biomass increase (growth), and floating and aggregation. The physiological characteristics of the cyanobacteria and the dominant environmental factors differ in each stage. In winter, the dormancy of bloom-forming cyanobacteria is mainly influenced by low temperature and darkness. The recovery process in spring is mainly controlled by the temperature and dissolved oxygen at the surface of the lake sediment. The substances and energy required for photosynthesis and cell division determine the growth of bloom-forming cyanobacteria in spring and summer, and once suitable meteorological and hydrological conditions exist, the large populations of bloom-forming cyanobacteria accumulated in the water body float to the surface and aggregate to form visible bloom. Research on the formation mechanism of cyanobacterial bloom therefore needs to identify the trigger factors or specific factors of each main physiological stage leading to bloom formation, and to study in depth the physiological characteristics of the cyanobacteria at the different stages. Only then is it possible to gradually understand the formation mechanism of cyanobacterial bloom, predict each process of its occurrence, and seek more targeted control measures.

Current lake and reservoir cyanobacterial bloom prediction models are mainly divided into mechanism-driven models and data-driven models. A mechanism-driven model is based on the physical, biological and chemical factors of lake and reservoir cyanobacterial bloom: a set of differential equations is established and solved numerically, the causes of the bloom outbreak are analyzed, and the evolution rule and trend are predicted. Although a mechanism-driven model is more interpretable, starting it requires many parameters, such as water depth and underwater shear force, which makes it relatively complex in practical application. A data-driven model can mine the internal law of lake and reservoir cyanobacterial bloom evolution from accumulated time sequence data without prior knowledge, such as the specific bloom mechanism, during model establishment, and has therefore attracted wide attention in lake and reservoir cyanobacterial bloom prediction.

Disclosure of Invention

The technical problem to be solved by the invention is that existing lake and reservoir cyanobacterial bloom prediction methods have low precision. To this end, a lake and reservoir cyanobacterial bloom prediction model, namely the SMRELM model, is constructed by effectively mining the time sequence characteristics of lake and reservoir cyanobacterial bloom and combining them with error compensation.

The invention relates to a lake and reservoir cyanobacterial bloom prediction system based on an SMRELM model, which is improved in that:

a cyanobacterial bloom time sequence characteristic information extraction module (100) is arranged in the cyanobacterial bloom dynamic change trend module (10);

an S-ELM model (200) is arranged at the output end of the cyanobacterial bloom dynamic change trend module (10);

an ECM model (30) is arranged at the output end of the S-ELM model (200).

The invention discloses a lake and reservoir cyanobacterial bloom prediction system based on an SMRELM model, characterized in that a cyanobacterial bloom time sequence characteristic information extraction module, an S-ELM model and an ECM model are added to traditional cyanobacterial bloom prediction. The prediction steps are as follows: the chlorophyll a concentration is used as the characterization index describing the formation of the cyanobacterial bloom and is used to construct a time sequence feature set; the chlorophyll a concentration time sequence is reconstructed through a sliding window, and the reconstructed input samples are used to train a manifold regularization extreme learning machine; according to a similarity condition, switching prediction of the cyanobacterial bloom is realized by combining the time sequence feature set with the manifold regularization extreme learning machine; and an error compensation model is established using improved fuzzy C-means clustering and a fuzzy neural network to correct the result of the prediction model. The invention solves the problem that the traditional extreme learning machine has low precision in cyanobacterial bloom prediction.

The SMRELM model has the advantages that:

(1) The invention analyzes the chlorophyll a concentration information $CYB\_LD$ in the historical cyanobacterial bloom data, extracts from $CYB\_LD$ according to a set floating recovery threshold and a set continuous growth threshold, and obtains representative candidate time sequence characteristic information of the cyanobacterial bloom (namely the $CYB\_LD^{HX}$ information). Time sequence characteristics that are repeated in the $CYB\_LD^{HX}$ information are then eliminated according to distance similarity, yielding the prior time sequence characteristic information of the cyanobacterial bloom (namely the CYBTF information), which provides prior local shape characteristics for the invention.

(2) The invention divides the CYBTF information into comparison characteristic information (DB_CYB information) and prediction characteristic information (CS_CYB information). The trend similarity and the distance similarity between the DB_CYB information and the local prediction segment are calculated. When the threshold condition is met, the prediction characteristic is taken as the predicted value of the prediction model at the next moment; otherwise, the extreme learning machine predicts directly. Combining the prior local shape characteristics in this way improves the prediction accuracy of the extreme learning machine prediction model. In addition, a manifold regularization term is introduced to reduce the influence of random initialization on the extreme learning machine and to improve its generalization capability.

(3) The invention provides an improved fuzzy C-means clustering algorithm that incorporates the characteristics of subtractive clustering and considers both the intra-class compactness and the inter-class separation of fuzzy C-means clustering. The improved clustering is combined with a fuzzy neural network prediction model to obtain an improved error compensation model. The error compensation model is trained and optimized so as to compensate the prediction result of the S-ELM prediction model.

(4) The invention constructs a comprehensive prediction model based on time sequence characteristics and error compensation, and realizes effective prediction of lake and reservoir cyanobacterial bloom by extracting the time sequence characteristics of the lake and reservoir cyanobacterial bloom and fully utilizing error data.

Drawings

FIG. 1 is a flow chart of the establishment of a traditional cyanobacterial bloom prediction model.

FIG. 1A is a display picture of different chlorophyll a concentrations in the blue algae bloom after inversion.

Fig. 2 is a topology structure diagram of a conventional T-S fuzzy neural network.

FIG. 2A is a flow chart of the invention for constructing an MRELM model.

FIG. 3 is a structural block diagram of the lake and reservoir cyanobacterial bloom prediction system based on the SMRELM model.

FIG. 4 is a flow chart of blue algae bloom prediction in lakes and reservoirs by applying the SMRELM model of the invention.

Fig. 5 is a prediction graph of different distance similarity thresholds in an embodiment.

FIG. 6 is a comparison of the predicted result of cyanobacterial bloom in lakes and reservoirs with that of other prediction methods in the prior art.

FIG. 7 is a comparison of the predicted results of cyanobacterial bloom in lakes and reservoirs in the examples with the stability of each model of other existing prediction methods.

10. Cyanobacterial bloom dynamic change trend module
20. Chlorophyll a concentration prediction index module
30. ECM model
100. Cyanobacterial bloom time sequence characteristic information extraction module
200. S-ELM model

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples.

In the present invention, the switching prediction model is abbreviated as the S-ELM (Switching Extreme Learning Machine) model. The improved T-S fuzzy neural network is abbreviated as the ECM (Error Compensation Model) model, namely an error compensation model based on the fuzzy neural network. The SMRELM model is the abbreviation of the comprehensive prediction model, based on time sequence characteristics and error compensation, that is composed of the S-ELM model and the ECM model.

The ECM model of the invention is an improvement of the fuzzification layer within the T-S fuzzy neural network structure. For the T-S fuzzy neural network structure, refer to section 3.1 of "PM2.5 Prediction Research Based on T-S Fuzzy Neural Network", by Qiao Junfei, Cai Jie and Han Honggui, Control Engineering, volume 25, issue 3, March 2018. The structure of the T-S fuzzy neural network is shown in FIG. 2. The ECM model of the invention improves the fuzzification layer of the structure shown in FIG. 2 by generating the membership functions in a different way. The effects of the ECM model are that the number of Gaussian membership functions in the fuzzification layer no longer has to be set manually, and that the influence of outliers on the prediction precision is reduced.

According to the similarity switching condition, the S-ELM model decides whether to use the trained MRELM model or the prior time sequence characteristic information CYBTF of the cyanobacterial bloom, thereby improving the prediction precision of the chlorophyll a concentration in the cyanobacterial bloom.
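The switching decision can be sketched as follows. This is a minimal illustration only, not the patented procedure: this part of the text does not define the trend similarity or the distance similarity, so the sketch assumes that trend similarity means equal signs of the first differences and that distance similarity is the Euclidean distance. The function name `switching_predict` and its parameters are hypothetical, and `fallback_predict` stands in for the trained MRELM model.

```python
import numpy as np

def switching_predict(recent_segment, db_cyb, cs_cyb,
                      distance_threshold, fallback_predict):
    """Return (prediction, used_prior) for the next time step.

    db_cyb: list of prior comparison segments (DB_CYB).
    cs_cyb: the prediction feature that follows each segment (CS_CYB).
    """
    segment = np.asarray(recent_segment, dtype=float)
    best_index, best_distance = None, np.inf
    for index, pattern in enumerate(db_cyb):
        pattern = np.asarray(pattern, dtype=float)
        if pattern.shape != segment.shape:
            continue
        # Assumed trend similarity: every first difference has the same sign.
        same_trend = np.array_equal(np.sign(np.diff(pattern)),
                                    np.sign(np.diff(segment)))
        # Assumed distance similarity: Euclidean distance.
        distance = np.linalg.norm(pattern - segment)
        if same_trend and distance < best_distance:
            best_index, best_distance = index, distance
    if best_index is not None and best_distance <= distance_threshold:
        return float(cs_cyb[best_index]), True    # use the prior feature
    return float(fallback_predict(segment)), False  # use the learned model

# Made-up prior features and a trivial stand-in for the trained model.
db_cyb = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
cs_cyb = [4.0, 0.5]
fallback = lambda seg: float(seg[-1])

pred_a, used_a = switching_predict([1.1, 2.0, 3.1], db_cyb, cs_cyb, 0.5, fallback)
pred_b, used_b = switching_predict([5.0, 9.0, 20.0], db_cyb, cs_cyb, 0.5, fallback)
```

In the first call the recent segment is close to a stored rising pattern, so the stored prediction feature is returned; in the second call no stored pattern is within the threshold, so the fallback model is used.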

In the present invention, the manifold regularization extreme learning machine is abbreviated as the MRELM (Manifold Regularized Extreme Learning Machine) model.

FIG. 1A shows the different chlorophyll a concentrations in the cyanobacterial bloom after inversion. The chlorophyll a concentration values at the sampling time can be obtained from FIG. 1A, where they are represented by different shades of gray. Because the chlorophyll a concentration values differ, the invention performs a further feature extraction on the cyanobacterial bloom features (namely, by the added cyanobacterial bloom time sequence characteristic information extraction module 100). The extracted local shape features participate in the construction of the S-ELM model and are also the basis of the switching prediction decision.

In the invention, the four stages of the cyanobacterial bloom are newly defined by combining the four stages disclosed in "Thoughts on the Formation Mechanism of Cyanobacterial Blooms in Large Shallow Eutrophic Lakes", the chlorophyll a concentration value at which the cyanobacterial bloom forms, and the selection thresholds designed in the invention. The four stages are the floating recovery stage, the continuous growth stage, the outbreak stage and the apoptosis stage.

Referring to FIG. 1 and FIG. 3, first, the invention adds a cyanobacterial bloom time sequence characteristic information extraction module 100 to the cyanobacterial bloom dynamic change trend module 10; second, an S-ELM model 200 is added after the cyanobacterial bloom dynamic change trend module 10; third, the result is processed by the ECM model 30 before the predicted chlorophyll a concentration value is output.

In the invention, the historical cyanobacterial bloom analysis information in the cyanobacterial bloom dynamic change trend module 10 comprises two parts. One part is the historical cyanobacterial bloom analysis information used for constructing the S-ELM model, recorded as $CYB$, with $CYB = \{data_1, data_2, \ldots, data_\delta\}$. The other part is the historical cyanobacterial bloom analysis information used for constructing the ECM model, recorded as $CYB^{ECM}$, which contains $f$ entries.

The subscript $\delta$ is the total number of historical cyanobacterial bloom analysis information entries used for constructing the S-ELM model.

The subscript $f$ is the total number of historical cyanobacterial bloom analysis information entries used for constructing the ECM model.

$data_1$ denotes the 1st historical cyanobacterial bloom analysis information used for constructing the S-ELM model in the cyanobacterial bloom dynamic change trend module 10.

$data_2$ denotes the 2nd historical cyanobacterial bloom analysis information used for constructing the S-ELM model in the cyanobacterial bloom dynamic change trend module 10.

$data_\delta$ denotes the last historical cyanobacterial bloom analysis information used for constructing the S-ELM model in the cyanobacterial bloom dynamic change trend module 10.

The entries of $CYB^{ECM}$ are defined in the same way: its 1st, 2nd and last ($f$-th) entries denote the 1st, 2nd and last historical cyanobacterial bloom analysis information used for constructing the ECM model in the cyanobacterial bloom dynamic change trend module 10.

In the invention, each historical cyanobacterial bloom analysis information consists of sampling time and chlorophyll a concentration value. Namely:

The sampling time of $data_1$ is recorded as $time\_data_1$, and its chlorophyll a concentration value is recorded as $ld\_data_1$.

The sampling time of $data_2$ is recorded as $time\_data_2$, and its chlorophyll a concentration value is recorded as $ld\_data_2$.

The sampling time of $data_\delta$ is recorded as $time\_data_\delta$, and its chlorophyll a concentration value is recorded as $ld\_data_\delta$.

For the 1st, 2nd and last entries of $CYB^{ECM}$, the sampling time and the chlorophyll a concentration value are recorded in the same way.

In the present invention, the cyanobacterial bloom analysis information in $CYB$ and in $CYB^{ECM}$ is ordered by sampling time.

For the historical cyanobacterial bloom analysis information used for constructing the S-ELM model, the chlorophyll a concentration values obtained from $CYB$ are recorded as $CYB\_LD$ (for short, the concentration for constructing the S-ELM model), with $CYB\_LD = \{ld\_data_1, ld\_data_2, \ldots, ld\_data_\delta\}$.

For the historical cyanobacterial bloom analysis information used for constructing the ECM model, the chlorophyll a concentration values obtained from $CYB^{ECM}$ are recorded as $CYB^{ECM}\_LD$ (for short, the concentration for constructing the ECM model), which collects the chlorophyll a concentration values of the $f$ entries of $CYB^{ECM}$.

(I) Constructing the manifold regularization extreme learning machine model, recorded as the MRELM model

In the invention, the manifold regularization extreme learning machine refers to the method disclosed in section 2 of "Speech Recognition System Based on Manifold Regularization Extreme Learning Machine", by Xu Jiaming, Acta Automatica Sinica, September 2015.

In the present invention, the grid search method refers to the grid search method disclosed in section 2 of "Determination of Kernel Function Parameters of Support Vector Machine Based on Grid Search", by the author Waning, Journal of Ocean University of China, September 2005.

The specific construction steps of the MRELM model are as follows:

Construction step A: construct the network architecture of the MRELM model.

In the invention, the network architecture of the MRELM model comprises an input layer, a hidden layer and an output layer.

Construction step B: set the input layer information of the MRELM model.

First, the input layer of the MRELM model receives the concentration for constructing the S-ELM model, $CYB\_LD = \{ld\_data_1, ld\_data_2, \ldots, ld\_data_\delta\}$.

Second, the sliding window width of the MRELM model is set at the input layer and recorded as $H_w$. Its value range is $H_w = [3, 4, \ldots, \tau]$, where $\tau$ is the maximum sliding window width.

Third, according to $H_w$, the chlorophyll a concentrations in $CYB\_LD = \{ld\_data_1, ld\_data_2, \ldots, ld\_data_\delta\}$ are reconstructed at the input layer into the concentration-training sample set, recorded as $D\_ALL$, which pairs $LD\_DATA$ with $T\_DATA$.

$LD\_DATA$ denotes the training chlorophyll a concentration values and consists of $h$ concentration-training sample input sequences.

$T\_DATA$ denotes the sliding window chlorophyll a concentration values, with $T\_DATA = [T_1; T_2; \ldots; T_h]$.

The subscript $h$ is the total number of concentration-training sample input sequences in the MRELM model.

The rows of $LD\_DATA$ are the 1st, 2nd, ..., $h$-th concentration-training sample input sequences of the MRELM model.

$T_1$ denotes the 1st sliding window concentration of the MRELM model.

$T_2$ denotes the 2nd sliding window concentration of the MRELM model.

$T_h$ denotes the $h$-th sliding window concentration of the MRELM model.

For example, when the sliding window width is $H_w = 3$:

The 1st input sequence consists of $ld\_data_1$, $ld\_data_2$ and $ld\_data_3$, and $T_1 = ld\_data_4$.

The 2nd input sequence consists of $ld\_data_2$, $ld\_data_3$ and $ld\_data_4$, and $T_2 = ld\_data_5$.

The $h$-th input sequence consists of $ld\_data_{\delta-3}$, $ld\_data_{\delta-2}$ and $ld\_data_{\delta-1}$, and $T_h = ld\_data_\delta$.

Here $ld\_data_k$ denotes the chlorophyll a concentration value of the $k$-th historical cyanobacterial bloom analysis information $data_k$, for $k = 3, 4, 5, \delta-3, \delta-2, \delta-1, \delta$.

Then, when the sliding window width is $H_w = 3$, the concentration-training sample set is characterized as:

$$D\_ALL = \left[\begin{array}{ccc|c} ld\_data_1 & ld\_data_2 & ld\_data_3 & ld\_data_4 \\ ld\_data_2 & ld\_data_3 & ld\_data_4 & ld\_data_5 \\ \vdots & \vdots & \vdots & \vdots \\ ld\_data_{\delta-3} & ld\_data_{\delta-2} & ld\_data_{\delta-1} & ld\_data_\delta \end{array}\right]$$
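The sliding window reconstruction can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `build_sliding_window_samples` and the sample values are hypothetical and are not part of the patent text.

```python
import numpy as np

def build_sliding_window_samples(cyb_ld, window_width):
    """Reconstruct a 1-D concentration series into (inputs, targets).

    Row i of `inputs` holds `window_width` consecutive values and
    `targets[i]` is the value that immediately follows them.
    """
    series = np.asarray(cyb_ld, dtype=float)
    if series.ndim != 1:
        raise ValueError("cyb_ld must be one-dimensional")
    n_samples = series.size - window_width  # number of input sequences
    if window_width < 1 or n_samples < 1:
        raise ValueError("series is too short for this window width")
    inputs = np.stack([series[i:i + window_width] for i in range(n_samples)])
    targets = series[window_width:]
    return inputs, targets

# Made-up chlorophyll a concentrations, window width H_w = 3.
ld_data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
LD_DATA, T_DATA = build_sliding_window_samples(ld_data, window_width=3)
```

With six values and a window of three, the sketch yields three input sequences; the first is (1, 2, 3) with target 4, matching the pattern of the example in the text.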

Construction step C: set the number of neurons in the hidden layer of the MRELM model.

In the invention, the number of neurons in the hidden layer of the MRELM model is recorded as $L_h$, with value range $1 < L_h \le \kappa$, where $\kappa$ is the maximum number of hidden layer neurons. Each neuron of the hidden layer receives excitatory connections from all neurons of the input layer; that is, $LD\_DATA$ undergoes the feature mapping of the hidden layer neurons to give the output information of the hidden layer, recorded as $H^{out}$.

$H^{out}$ has $L_h$ components: the 1st, 2nd, ..., $L_h$-th component is the $(h \times 1)$-dimensional output value obtained by passing $LD\_DATA$ through the 1st, 2nd, ..., $L_h$-th neuron in the hidden layer.

Construction step D: construct the output of the MRELM model.

In the present invention, the manifold regularization extreme learning machine requires that the output space $H^{out}$ of $LD\_DATA$ in the hidden layer preserve the local geometry that $LD\_DATA$ has in the input layer. That is, if two training sample input sequences, for example the $(h-1)$-th and the $h$-th concentration-training sample input sequences of the MRELM model, are highly similar in the input layer, then their outputs in the hidden layer output space are also highly similar. This reduces the influence of randomness and improves the generalization performance of the manifold regularization extreme learning machine.

When the output layer of the MRELM model contains 1 neuron, the output information corresponding to that neuron is recorded as $LDD^{out}$, with $LDD^{out} = H^{out} \times \beta$, where $\beta$ denotes the weight between the hidden layer and the output layer.

The 1st, 2nd, ..., $h$-th entries of $LDD^{out}$ are the output values corresponding to the 1st, 2nd, ..., $h$-th concentration-training sample input sequences of the MRELM model.
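The forward pass described in construction steps C and D can be sketched as below. This is an illustrative sketch only: the sigmoid activation, the random input weights and biases (usual in extreme learning machines), and all variable names are assumptions, and the random `beta` is a placeholder for the weight that training would produce.

```python
import numpy as np

def hidden_layer_output(ld_data, input_weights, biases):
    """Map (h, H_w) input sequences to the (h, L_h) hidden output H_out."""
    z = ld_data @ input_weights + biases
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation (assumed)

rng = np.random.default_rng(0)
h, H_w, L_h = 5, 3, 4                 # samples, window width, hidden neurons
LD_DATA = rng.random((h, H_w))        # made-up input sequences
W = rng.standard_normal((H_w, L_h))   # random, fixed input weights
b = rng.standard_normal(L_h)          # random, fixed hidden biases
beta = rng.standard_normal((L_h, 1))  # placeholder for the learned weight

H_out = hidden_layer_output(LD_DATA, W, b)  # one column per hidden neuron
LDD_out = H_out @ beta                      # one output value per sequence
```

Each column of `H_out` is the (h × 1) output of one hidden neuron, and `LDD_out` holds one output value per input sequence.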

In the invention, the MRELM model is obtained through the construction steps A to D. The optimization of the MRELM model is shown in FIG. 2A.

Construction step E: optimize the MRELM model.

Construction step E101: based on the grid search method, set the training sample subsets, and record the number of divided training sample subsets as $b$.

Construction step E102: divide $D\_ALL$ into $b$ concentration-training sample subsets of the same size, recorded as $SUB\_D$, which pairs $BLD$ with $T\_BLD$.

$BLD$ denotes the sample time ordered chlorophyll a concentration values, with $BLD = [bld_1; bld_2; \ldots; bld_b]$.

$T\_BLD$ denotes the sample time ordered sliding window chlorophyll a concentration values, with $T\_BLD = [t\_bld_1; t\_bld_2; \ldots; t\_bld_b]$.

$bld_1$ denotes the 1st cyanobacterial bloom time sequence-concentration-training sample input subset divided based on the grid search method.

$bld_2$ denotes the 2nd cyanobacterial bloom time sequence-concentration-training sample input subset divided based on the grid search method.

$bld_b$ denotes the $b$-th cyanobacterial bloom time sequence-concentration-training sample input subset divided based on the grid search method.

$t\_bld_1$ denotes the 1st cyanobacterial bloom time sequence-sliding window concentration divided based on the grid search method.

$t\_bld_2$ denotes the 2nd cyanobacterial bloom time sequence-sliding window concentration divided based on the grid search method.

$t\_bld_b$ denotes the $b$-th cyanobacterial bloom time sequence-sliding window concentration divided based on the grid search method.

In the present invention, $\delta$, $H_w$ and $b$ are subject to a constraint that ensures the $b$ divided training sample subsets are the same size; the quantities in that constraint are all positive integers.

For example, when $H_w = 3$ and $b = 4$, $D\_ALL$ is divided based on the grid search method into the 1st, 2nd, 3rd and 4th training subsets $bld_1$, $bld_2$, $bld_3$ and $bld_4$, all of the same size. Each element of a subset is the chlorophyll a concentration value of the correspondingly indexed historical cyanobacterial bloom analysis information.

Construction step E103: set the training sample subset $SUB\_D_{Training}$ and the evaluation sample subset $SUB\_D_{Evaluation}$.

In the present invention, $SUB\_D_{Training}$ is used for learning in the MRELM.

In the present invention, $SUB\_D_{Evaluation}$ is used for evaluation in the MRELM after training.

In the present invention, $SUB\_D_{Evaluation}$ is the $b$-th cyanobacterial bloom time sequence-concentration-training sample input subset selected from $SUB\_D$; $SUB\_D_{Training}$ consists of all the other cyanobacterial bloom time sequence-concentration-training sample input subsets in $SUB\_D$ apart from $SUB\_D_{Evaluation}$.

For example, with $b = 4$, from $SUB\_D = [bld_1, t\_bld_1; bld_2, t\_bld_2; bld_3, t\_bld_3; bld_4, t\_bld_4]$ the 4th cyanobacterial bloom time sequence-concentration-training sample input subset $[bld_4, t\_bld_4]$ is selected and recorded as $SUB\_D_{Evaluation}$, i.e. $SUB\_D_{Evaluation} = [bld_4, t\_bld_4]$, and $SUB\_D_{Training} = [bld_1, t\_bld_1; bld_2, t\_bld_2; bld_3, t\_bld_3]$.

$bld_3$ denotes the 3rd cyanobacterial bloom time sequence-concentration-training sample input subset divided based on the grid search method.

$bld_4$ denotes the 4th cyanobacterial bloom time sequence-concentration-training sample input subset divided based on the grid search method.

$t\_bld_3$ denotes the 3rd cyanobacterial bloom time sequence-sliding window concentration divided based on the grid search method.

$t\_bld_4$ denotes the 4th cyanobacterial bloom time sequence-sliding window concentration divided based on the grid search method.
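The division into $b$ equal subsets with the last one held out for evaluation can be sketched as follows. This is a minimal illustration that assumes each subset is a contiguous, time-ordered block of samples; the function names and the sample data are hypothetical.

```python
import numpy as np

def split_into_subsets(inputs, targets, b):
    """Split paired samples into b equal, contiguous (time-ordered) subsets."""
    inputs = np.asarray(inputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    if b < 2 or len(inputs) % b != 0:
        raise ValueError("sample count must be a multiple of b (b >= 2)")
    size = len(inputs) // b
    return [(inputs[k * size:(k + 1) * size], targets[k * size:(k + 1) * size])
            for k in range(b)]

def hold_out_last(subsets):
    """Use the last subset for evaluation and all the others for training."""
    train_x = np.concatenate([x for x, _ in subsets[:-1]])
    train_t = np.concatenate([t for _, t in subsets[:-1]])
    eval_x, eval_t = subsets[-1]
    return (train_x, train_t), (eval_x, eval_t)

inputs = np.arange(24, dtype=float).reshape(8, 3)  # 8 made-up input sequences
targets = np.arange(8, dtype=float)                # 8 made-up targets
SUB_D = split_into_subsets(inputs, targets, b=4)
(train_x, train_t), (eval_x, eval_t) = hold_out_last(SUB_D)
```

With eight samples and $b = 4$, each subset holds two samples; six are used for training and the last two for evaluation.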

Construction step E104: with reference to formula 12 and formula 15 in "Speech Recognition System Based on Manifold Regularization Extreme Learning Machine", the two weighting coefficients in the MRELM model objective function are recorded as the first weighting coefficient $C_1$ and the second weighting coefficient $C_2$, and the weight is recorded as $\beta$. $SUB\_D_{Training}$ is put into the MRELM model for learning to obtain the weight $\beta$.
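Formulas 12 and 15 of the cited paper are not reproduced in this text, so the following is only a sketch under an assumed, commonly used form of the manifold regularized objective, $\frac{1}{2}\lVert\beta\rVert^2 + \frac{C_1}{2}\lVert H\beta - T\rVert^2 + \frac{C_2}{2}\beta^{T}H^{T}LH\beta$, whose minimizer is $\beta = (I + C_1 H^{T}H + C_2 H^{T}LH)^{-1} C_1 H^{T}T$. The fully connected Gaussian graph Laplacian $L$, the sigmoid activation and all function names are assumptions rather than the cited method.

```python
import numpy as np

def graph_laplacian(x, sigma=1.0):
    """Unnormalized Laplacian L = D - W with Gaussian weights (assumed)."""
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w

def train_mrelm(x, t, n_hidden, c1, c2, seed=0):
    """Solve the assumed manifold regularized ELM objective in closed form."""
    rng = np.random.default_rng(seed)
    w_in = rng.standard_normal((x.shape[1], n_hidden))  # random input weights
    bias = rng.standard_normal(n_hidden)                # random hidden biases
    h = 1.0 / (1.0 + np.exp(-(x @ w_in + bias)))        # hidden output H
    lap = graph_laplacian(x)
    a = np.eye(n_hidden) + c1 * h.T @ h + c2 * h.T @ lap @ h
    beta = np.linalg.solve(a, c1 * h.T @ t)
    return w_in, bias, beta

def predict_mrelm(x, w_in, bias, beta):
    """Apply the trained network to new input sequences."""
    return (1.0 / (1.0 + np.exp(-(x @ w_in + bias)))) @ beta

x = np.random.default_rng(1).random((10, 3))  # made-up input sequences
t = x.sum(axis=1)                             # made-up targets
w_in, bias, beta = train_mrelm(x, t, n_hidden=5, c1=1.0, c2=0.1)
pred = predict_mrelm(x, w_in, bias, beta)
```

The Laplacian term penalizes hidden outputs that differ for inputs that are close together, which is one way to express the local geometry requirement described in construction step D.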

Construction step E105: $bld_4$ in $SUB\_D_{Evaluation}$ is put into the trained MRELM model as the evaluation sample, and the model outputs the corresponding chlorophyll a concentration evaluation value, recorded as $LDD^{out}\_BLD_{Evaluation}$.

Construction step E106: calculate the root mean square error value between $t\_bld_4$ in $SUB\_D_{Evaluation}$ and $LDD^{out}\_BLD_{Evaluation}$, recorded as JFC.

In the present invention, a smaller JFC indicates a higher prediction accuracy of the MRELM model.
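For reference, the root mean square error takes the standard form below. The symbols $n$ (the number of samples in the evaluation subset), $t_i$ (the $i$-th sliding window concentration in $t\_bld_4$) and $\hat{y}_i$ (the $i$-th evaluation value output by the MRELM model) are introduced here for illustration and do not appear in the original text.

```latex
JFC = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(t_i - \hat{y}_i\right)^{2}}
```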

Construction step E107: set the weighting coefficients $C_1$, $C_2$ and calculate the root mean square error values under different parameter conditions.

The value ranges of the weighting coefficients are $C_1 = [10^{-8}, 10^{-7}, \ldots, \psi_1]$ and $C_2 = [10^{-8}, 10^{-7}, \ldots, \psi_2]$, where $\psi_1$ is the maximum value of the first weighting coefficient $C_1$ and $\psi_2$ is the maximum value of the second weighting coefficient $C_2$.

For each combination of values of $H_w$, $L_h$, $C_1$ and $C_2$, the sample subset $bld_b$ of $SUB\_D_{Evaluation}$ is taken as input, the MRELM model outputs the corresponding evaluation values, and the root mean square error value between these evaluation values and $t\_bld_b$ is calculated.

For example, with $b = 4$, the sample subset $bld_4$ of $SUB\_D_{Evaluation}$ is taken as input, and the root mean square error value between the MRELM model output and $t\_bld_4$ is calculated in turn for $H_w = 3$, $L_h = 2$, $C_1 = 10^{-8}$, $C_2 = 10^{-8}$; for $H_w = 3$, $L_h = 3$, $C_1 = 10^{-8}$, $C_2 = 10^{-8}$; and so on up to $H_w = \tau$, $L_h = \kappa$, $C_1 = \psi_1$, $C_2 = \psi_2$.

A construction step E108, selecting the minimum root mean square error value in the construction step E107, and marking as JFC _{Minimum size} (ii) a And the JFC _{Minimum size} Corresponding to H _w ，L _h ，C ₁ ，C ₂ And as the parameters of the MRELM model, the MRELM model is optimized to obtain the trained MRELM model.
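The parameter selection in construction steps E107 and E108 can be sketched as an exhaustive grid search. This is a minimal illustration, not the patented implementation: `evaluate` is a hypothetical callback standing in for "train an MRELM model with these settings and evaluate it on $bld_b$", and `toy_evaluate` is an arbitrary stand-in that is not an MRELM model. The parameter ranges are the ones quoted in Example 1.

```python
import itertools
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def grid_search(widths, hidden_sizes, c1_values, c2_values, evaluate):
    """Return (JFC_min, best_params) over all (H_w, L_h, C1, C2) combinations.

    `evaluate(h_w, l_h, c1, c2)` is a hypothetical callback: it must train a
    model with these settings and return (targets, predictions) for the
    evaluation subset.
    """
    best_jfc, best_params = math.inf, None
    for params in itertools.product(widths, hidden_sizes, c1_values, c2_values):
        targets, predictions = evaluate(*params)
        jfc = rmse(targets, predictions)
        if jfc < best_jfc:
            best_jfc, best_params = jfc, params
    return best_jfc, best_params

def toy_evaluate(h_w, l_h, c1, c2):
    """Toy stand-in for MRELM training (assumption, for illustration only):
    the error grows with the distance from an arbitrary optimum."""
    targets = [1.0, 2.0, 3.0]
    offset = (abs(h_w - 3) + abs(l_h - 10) / 5
              + abs(math.log10(c1) + 8) + abs(math.log10(c2) + 4))
    return targets, [t + offset for t in targets]

weights = [10.0 ** e for e in range(-8, 9)]          # 10^-8 ... 10^8
jfc_min, best = grid_search(range(3, 11), range(10, 41, 5),
                            weights, weights, toy_evaluate)
```

With a real model, `evaluate` would retrain for each combination, so the cost of the search grows with the product of the four range sizes.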

(II) Feature extraction of cyanobacterial bloom

In the invention, the evolution process of the cyanobacterial bloom refers to the four-stage theory of cyanobacterial bloom disclosed in section 2.1 of "Thoughts on the formation mechanism of cyanobacterial bloom in large shallow eutrophic lakes" by Kong Fanxiang, published in Acta Ecologica Sinica in March 2005.

In the invention, the method for extracting the characteristic information of the cyanobacterial bloom refers to the feature extraction method disclosed in section 4 of "Time series feature learning with labeled and unlabeled data" by Wang Haishuai, published in Pattern Recognition in December 2018.

In the invention, the cyanobacterial bloom time series feature information extraction module 100 is arranged in the cyanobacterial bloom dynamic change trend module 10. Its purpose is to extract, from the existing cyanobacterial bloom data in the dynamic change trend module 10, the time series features that are effective for the invention.

Feature extraction step 1: receive the chlorophyll a concentration information of the cyanobacterial bloom.

The chlorophyll a concentration information in the cyanobacterial bloom dynamic change trend module 10 is received and denoted as $CYB\_LD$, with $CYB\_LD = [ld\_data_1, ld\_data_2, \ldots, ld\_data_{\delta}]$.

Feature extraction step 2: set the chlorophyll a concentration threshold information.

The chlorophyll a concentration parameters set in the invention are as follows:

the critical value of the floating recovery stage of the cyanobacterial bloom is denoted as cu;

the critical value of the continuous growth stage of the cyanobacterial bloom is denoted as cv;

the outbreak threshold of the cyanobacterial bloom is denoted as cw.

In the invention, the chlorophyll a concentration in the floating recovery stage of cyanobacterial bloom in lakes and reservoirs generally behaves as follows: the chlorophyll a concentration value in the water body (for example $ld\_data_{\delta}$) is greater than or equal to cu and less than cv (i.e., $cu \le ld\_data_{\delta} < cv$). As $ld\_data_{\delta}$ gradually rises and becomes greater than or equal to cv (i.e., $ld\_data_{\delta} \ge cv$), the cyanobacterial bloom enters the continuous growth stage. Once $ld\_data_{\delta}$ is greater than or equal to the outbreak threshold cw (i.e., $ld\_data_{\delta} \ge cw$), the cyanobacterial bloom in the lake or reservoir is about to break out.

In the invention, the critical value cu of the floating recovery stage is determined from the maximum chlorophyll a concentration $ld\_data_{\max}$ in $CYB\_LD$ and the floating recovery stage adjustment coefficient $p_{\text{float}}$, i.e., $cu = p_{\text{float}} \times ld\_data_{\max}$. The critical value cv of the continuous growth stage is determined from $ld\_data_{\max}$ and the continuous growth stage adjustment coefficient $p_{\text{growth}}$, i.e., $cv = p_{\text{growth}} \times ld\_data_{\max}$, where

$p_{\text{growth}} = 1.76 \times p_{\text{float}}$.

In the present invention, the unit of the sampling time is the day. The chlorophyll a concentration is expressed in mg/L.

In the invention, the chlorophyll a concentration value at which cyanobacterial bloom forms is denoted as $ld\_data_{\text{form}}$, and $ld\_data_{\text{form}} = 0.01$ mg/L (see "Thoughts on the formation mechanism of cyanobacterial bloom in large shallow eutrophic lakes").
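The threshold rules of feature extraction step 2 can be sketched as follows. The numeric inputs are the values quoted in Example 1 ($ld\_data_{\max}$ = 29.35 mg/L, $p_{\text{float}}$ = 0.34, $p_{\text{growth}}$ = 0.6, cw = 40 mg/L); the function names and the stage labels are illustrative assumptions, in particular the label used for values below cu, which the text does not name.

```python
def stage_thresholds(ld_max, p_float, p_growth):
    """Return (cu, cv) with cu = p_float * ld_max and cv = p_growth * ld_max."""
    return p_float * ld_max, p_growth * ld_max

def bloom_stage(value, cu, cv, cw):
    """Classify one chlorophyll a concentration value against cu, cv, cw."""
    if value >= cw:
        return "outbreak"
    if value >= cv:
        return "continuous growth"
    if value >= cu:
        return "floating recovery"
    return "below floating recovery threshold"  # label is an assumption

cu, cv = stage_thresholds(ld_max=29.35, p_float=0.34, p_growth=0.6)
stages = [bloom_stage(v, cu, cv, cw=40.0) for v in (5.0, 12.0, 20.0, 40.0)]
```

Here cu evaluates to about 9.98 mg/L and cv to 17.61 mg/L, in line with the rounded 10 mg/L and 17.61 mg/L reported in Example 1.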

Feature extraction step 3: extract the feature information of the continuous growth stage.

In the invention, according to the chlorophyll a concentration interval $cv \le ld\_data_{\delta} < cw$, candidate cyanobacterial bloom feature information is extracted from $CYB\_LD$ and denoted as $CYB\_LD^{HX}$. $CYB\_LD^{HX}$ consists of the 1st, 2nd, ..., Q-th selected candidate cyanobacterial bloom feature information, where the subscript Q is the total number of candidate cyanobacterial bloom feature information items.

Feature extraction step 4: remove repeated cyanobacterial bloom feature information.

In the invention, repeated cyanobacterial bloom feature information in $CYB\_LD^{HX}$ is removed according to distance similarity, yielding the cyanobacterial bloom prior time series feature information, denoted as CYBTF, with $CYBTF = \{cyb_1, cyb_2, \ldots, cyb_W\}$. The subscript W is the total number of cyanobacterial bloom prior time series feature sequences.

In the present invention, distance similarity refers to the similarity between chlorophyll a concentrations obtained by calculating the Euclidean distance.
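One way to read feature extraction step 4 is as a greedy de-duplication by Euclidean distance. The sketch below makes two assumptions that the text does not state: only sequences of equal length are compared, and a hypothetical tolerance `tol` decides when two sequences count as repeats. The sample values are invented for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def remove_repeats(candidates, tol):
    """Keep a candidate only if no already-kept sequence of the same
    length lies within distance `tol` of it."""
    kept = []
    for seq in candidates:
        duplicate = any(
            len(seq) == len(k) and euclidean(seq, k) < tol for k in kept
        )
        if not duplicate:
            kept.append(seq)
    return kept

candidates = [
    [17.8, 19.2, 21.0],
    [17.9, 19.1, 21.1],        # near-copy of the first sequence
    [18.0, 22.5, 30.0],
    [17.8, 19.2, 21.0, 24.0],  # different length, so it is kept
]
cybtf = remove_repeats(candidates, tol=0.5)
```

The result depends on the order of the candidates, because the first member of each group of near-identical sequences is the one that survives.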

In the invention, each item of time series feature information in the cyanobacterial bloom prior time series feature information CYBTF consists of two parts, namely comparison feature information and prediction feature information.

$cyb_1$ denotes the 1st cyanobacterial bloom prior time series feature information; it consists of the 1st comparison feature information and the 1st prediction feature information.

$cyb_2$ denotes the 2nd cyanobacterial bloom prior time series feature information; it consists of the 2nd comparison feature information and the 2nd prediction feature information.

$cyb_W$ denotes the W-th cyanobacterial bloom prior time series feature information; it consists of the W-th comparison feature information and the W-th prediction feature information.

For example, $cyb_1 = [ld\_data_1, ld\_data_2, \ldots, ld\_data_{\varepsilon}]$, where ε is the total number of chlorophyll a concentration values belonging to $cyb_1$, i.e., the length of $cyb_1$. The 1st comparison feature information consists of the first ε-1 chlorophyll a concentration values in $cyb_1$, and the 1st prediction feature information is the last value $ld\_data_{\varepsilon}$ in $cyb_1$.

Likewise, for $cyb_2$, the subscripts are the identification numbers of the chlorophyll a concentration values belonging to $cyb_2$, and φ is their total number, i.e., the length of $cyb_2$. The 2nd comparison feature information consists of the first φ-1 chlorophyll a concentration values in $cyb_2$, and the 2nd prediction feature information is the last value $ld\_data_{\varphi}$ in $cyb_2$.

For $cyb_W$, the subscripts are the identification numbers of the chlorophyll a concentration values belonging to $cyb_W$, and ξ is their total number, i.e., the length of $cyb_W$. The W-th comparison feature information consists of the first ξ-1 chlorophyll a concentration values in $cyb_W$, and the W-th prediction feature information is the last value $ld\_data_{\xi}$ in $cyb_W$.
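The split of a prior feature sequence into its comparison part and its prediction part can be written in a couple of lines. The function name and the sample values are illustrative assumptions.

```python
def split_feature(cyb):
    """Split one prior feature sequence into (comparison, prediction):
    all values except the last, and the last value."""
    if len(cyb) < 2:
        raise ValueError("a feature sequence needs at least two values")
    return cyb[:-1], cyb[-1]

comparison, prediction = split_feature([17.8, 19.2, 21.0, 24.0])
```

For a sequence of length ε the comparison part has length ε-1, which is the length that the local prediction segment must have for the two to be compared.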

In the invention, the cyanobacterial bloom prior time series feature information CYBTF is obtained through feature extraction steps 1 to 4. CYBTF participates in the construction and prediction of the S-ELM model and provides effective prior local shape features of cyanobacterial bloom evolution.

In the invention, the MRELM model trained in part (I) and the cyanobacterial bloom prior time series feature information CYBTF together form the S-ELM model used for switching prediction. The chlorophyll a concentration in the cyanobacterial bloom dynamic change trend module 10 is set as the prediction output index of the prediction index module 20; that is, both the input variable and the output variable of the S-ELM model are the chlorophyll a concentration.

When the data at the next prediction time, $CYB\_LD_{\text{next}}$, is determined from the current data information $CYB\_LD_{\text{current}}$, the trend similarity sl and the distance similarity $D_{\text{Euclid}}$ between the local prediction segment pseg of the current chlorophyll a concentration and each item of comparison feature information in CYBTF must be calculated. When the threshold conditions are met, the corresponding prediction feature is used as the predicted value of the S-ELM model at the next time; otherwise, prediction switches to the trained MRELM model.

In the present invention, the local prediction segment pseg refers to the time series subsequence that is the nearest neighbor of the data at the next prediction time, $CYB\_LD_{\text{next}}$.

In the invention, taking the W-th comparison feature information as an example, the trend similarity sl in the S-ELM model is obtained by calculating the average absolute slope values $lp^{\text{comparison}}$ and $lp^{pseg}$ of the comparison feature information and of the local prediction segment pseg of equal length, and then calculating $|lp^{\text{comparison}} - lp^{pseg}|$.

The average absolute slope value is calculated according to formula (1), in which:

$lp^{\text{comparison}}$ denotes the average absolute slope value of the comparison feature information;

the length term denotes the length of the comparison feature information;

ft(a) denotes the a-th chlorophyll a concentration value of the comparison feature information;

ft(a + 1) denotes the (a + 1)-th chlorophyll a concentration value of the comparison feature information.
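Formula (1) itself is not reproduced in this text. The sketch below assumes the usual reading of the definitions above: with samples one day apart, the average absolute slope of a sequence of length n is the mean of $|ft(a+1) - ft(a)|$ over a = 1, ..., n-1. That form, and the function names, are assumptions rather than the patent's stated formula.

```python
def mean_abs_slope(values):
    """Mean of |values[a+1] - values[a]| over consecutive pairs
    (assumed form of formula (1), unit time step)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    steps = len(values) - 1
    return sum(abs(values[a + 1] - values[a]) for a in range(steps)) / steps

def trend_similarity(comparison, pseg):
    """sl = |lp_comparison - lp_pseg| for two equal-length sequences."""
    if len(comparison) != len(pseg):
        raise ValueError("sequences must have equal length")
    return abs(mean_abs_slope(comparison) - mean_abs_slope(pseg))

sl = trend_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Under this reading a smaller sl means more similar trends, which is consistent with the switching rule that falls back to the MRELM model when sl reaches its threshold.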

(III) Model switching condition

According to the set trend similarity threshold $sl_{\text{threshold}}$: when sl is greater than or equal to $sl_{\text{threshold}}$ ($sl \ge sl_{\text{threshold}}$), prediction switches to the trained MRELM model to predict the chlorophyll a concentration at the next time; when sl is less than $sl_{\text{threshold}}$ ($sl < sl_{\text{threshold}}$), the Euclidean distance $D_{\text{Euclid}}$ between the comparison feature information and the local prediction segment pseg of equal length is calculated.

If $D_{\text{Euclid}}$ is less than the set distance similarity threshold $td_{\text{threshold}}$ ($D_{\text{Euclid}} < td_{\text{threshold}}$), the corresponding prediction feature information is selected as the predicted value of the chlorophyll a concentration at the next time; if $D_{\text{Euclid}}$ is greater than or equal to $td_{\text{threshold}}$ ($D_{\text{Euclid}} \ge td_{\text{threshold}}$), prediction switches to the trained MRELM model to predict the chlorophyll a concentration at the next time.
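The switching rule can be sketched as a single function. Several points are assumptions: `mrelm_predict` is a hypothetical callback standing in for the trained MRELM model, the average absolute slope uses the mean of consecutive absolute differences, and when several prior features pass both thresholds the one with the smallest Euclidean distance is used, a tie-break the text does not specify.

```python
import math

def mean_abs_slope(values):
    """Mean absolute difference of consecutive values (assumed slope form)."""
    steps = len(values) - 1
    return sum(abs(values[a + 1] - values[a]) for a in range(steps)) / steps

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def s_elm_predict(pseg, cybtf, sl_threshold, td_threshold, mrelm_predict):
    """Return (predicted value, source), where source is 'feature' when a
    prior feature matched and 'mrelm' when the fallback model was used."""
    if len(pseg) < 2:
        raise ValueError("pseg needs at least two values")
    best = None
    for cyb in cybtf:
        comparison, prediction = cyb[:-1], cyb[-1]
        if len(comparison) != len(pseg):
            continue  # only equal-length sequences are compared
        sl = abs(mean_abs_slope(comparison) - mean_abs_slope(pseg))
        if sl >= sl_threshold:
            continue  # trend too different
        distance = euclidean(comparison, pseg)
        if distance < td_threshold and (best is None or distance < best[0]):
            best = (distance, prediction)
    if best is not None:
        return best[1], "feature"
    return mrelm_predict(pseg), "mrelm"

features = [[18.0, 19.0, 20.0, 22.0]]
last_value = lambda seg: seg[-1]   # toy fallback, not an MRELM model
hit = s_elm_predict([18.1, 19.0, 20.1], features, 0.2, 0.5, last_value)
miss = s_elm_predict([5.0, 5.0, 5.0], features, 0.2, 0.5, last_value)
```

In the first call the segment matches the stored feature in both trend and distance, so the stored prediction value is returned; in the second the flat segment fails the trend test and the fallback is used.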

In the invention, all predicted chlorophyll a concentration values output by the S-ELM model are denoted as $LD^{out}$. $LD^{out}$ consists of the 1st, 2nd, ..., f-th predicted chlorophyll a concentration values output by the S-ELM model, where the subscript f is the total number of predicted chlorophyll a concentration values output by the S-ELM model.

(IV) Constructing an improved error compensation model based on a fuzzy neural network, denoted as the ECM model

To further improve the accuracy of the prediction model, the invention uses the input information and the output information of the S-ELM model as the input information of the ECM model, which compensates the predicted chlorophyll a concentration value and thereby further improves its prediction accuracy.

In the invention, the T-S fuzzy neural network structure refers to section 3.1 of "PM2.5 prediction research based on T-S fuzzy neural network" by Qiao Junfei, Cai Jie and Han Honggui, published in Control Engineering, volume 25, number 3, March 2018.

In the invention, the fuzzy C-means algorithm refers to the fuzzy C-means clustering disclosed in the third section of "Mathematical Modeling" by Yang Guiyuan, Shanghai University of Finance and Economics Press, February 2015.

In the invention, subtractive clustering refers to the subtractive clustering disclosed in section 2.2 of "Prediction of peak discharge power of power battery based on ANFIS and subtractive clustering" by Sun Bingxiang, published in the Transactions of China Electrotechnical Society in February 2015.

Construction step 1: set the input layer information of the ECM model.

In the present invention, the first aspect of the input layer of the ECM model receives the historical cyanobacterial bloom analysis information $CYB^{ECM}$.

The second aspect receives the output information $LD^{out}$ of the corresponding S-ELM model.

In the third aspect, the chlorophyll a concentration values of $CYB^{ECM}$ are compared with $LD^{out}$ and the differences are calculated, denoted as BIA, with $BIA = [bia_1; bia_2; \ldots; bia_f]$.

In the fourth aspect, a training sample set for the ECM model is constructed from $CYB^{ECM}$, $LD^{out}$ and BIA, denoted as $X\_ALL = [X, BIA]$.

X denotes the training sample sequences input to the ECM model, with $X = [x_1; x_2; \ldots; x_f]$.

$x_1$ denotes the 1st training sample sequence of the input layer in the ECM model.

$x_2$ denotes the 2nd training sample sequence of the input layer in the ECM model.

$x_f$ denotes the f-th training sample sequence of the input layer in the ECM model.

$bia_1$ denotes the training sample error value corresponding to $x_1$, i.e., the difference between the measured chlorophyll a concentration at the next time and its predicted value.

$bia_2$ denotes the training sample error value corresponding to $x_2$.

$bia_f$ denotes the training sample error value corresponding to $x_f$.
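Building the ECM training set amounts to pairing each input sequence with its prediction error. The sketch assumes the error is taken as measured value minus predicted value, which matches the later step where the ECM output is added to the S-ELM output; the function name and the numbers are illustrative only.

```python
def build_training_set(inputs, measured_next, predicted_next):
    """Return (x_all, bia): each row of x_all is an input sequence with its
    error value appended, and bia[k] = measured_next[k] - predicted_next[k]
    (sign convention is an assumption)."""
    if not (len(inputs) == len(measured_next) == len(predicted_next)):
        raise ValueError("all arguments must have the same number of rows")
    bia = [m - p for m, p in zip(measured_next, predicted_next)]
    x_all = [list(x) + [b] for x, b in zip(inputs, bia)]
    return x_all, bia

x_all, bia = build_training_set(
    inputs=[[1.0, 2.0], [2.0, 3.0]],
    measured_next=[3.5, 4.0],
    predicted_next=[3.0, 4.5],
)
```

With this sign convention, adding a perfectly learned error value back to the S-ELM prediction recovers the measured concentration.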

Construction step 2: set the fuzzification layer information of the ECM model.

In the invention, the membership function of the fuzzification layer in the ECM model is a Gaussian membership function, denoted as Gfun(mea, psi). The mean mea is obtained from the cluster centers generated by the improved fuzzy C-means clustering, and the standard deviation psi is obtained by weighting the chlorophyll a concentration values in each row of samples in X with the membership values generated by the improved fuzzy C-means clustering algorithm.

In the invention, the information output by the fuzzification layer is obtained by applying the Gaussian membership function Gfun(mea, psi) to each input sequence of the input data X in the training sample set, and is denoted as $U = [u_1, u_2, \ldots, u_{c \times \tau}]$, where τ is the maximum value of the sliding window width and c is the number of clusters generated by the improved fuzzy C-means clustering algorithm, i.e., the number of rules. The membership value of each chlorophyll a concentration value in each training sample input sequence in X is calculated according to the determined membership function.

$u_1$ denotes the 1st information output by the fuzzification layer in the construction stage.

$u_2$ denotes the 2nd information output by the fuzzification layer in the construction stage.

$u_{c \times \tau}$ denotes the (c × τ)-th information output by the fuzzification layer in the construction stage.

In the invention, the improved fuzzy C-means clustering algorithm inputs X into a subtractive clustering algorithm and uses the resulting number of clusters and cluster centers as the initial values of the conventional fuzzy C-means clustering algorithm, which avoids the subjectivity of setting the number of clusters manually. It assigns different weights to the training sample input sequences according to their density values ω (the density value is calculated with reference to formula 7 in "Prediction of peak discharge power of power battery based on ANFIS and subtractive clustering"). It also improves the objective function of the conventional fuzzy C-means clustering algorithm by considering both intra-class compactness and inter-class separation.

The objective function J of the improved fuzzy C-means clustering algorithm provided by the invention uses the following symbols:

k denotes the identification number of a training sample input sequence.

c denotes the number of rules; i denotes the identification number of the i-th cluster and j denotes the identification number of the j-th cluster, where i and j are not the same cluster.

m denotes the fuzzy coefficient.

$g_{ik}^{m}$ denotes the m-th power of the membership value of the k-th training sample input sequence to the i-th class center.

$\omega_k$ is the density value of the k-th training sample input sequence and serves as its weight.

$\|x_k - v_i\|$ is the Euclidean distance from the k-th training sample input sequence to the i-th class center.

$x_k$ denotes the k-th training sample sequence of the input layer in the ECM model.

$v_i$ denotes the i-th class center.

$v_j$ denotes the j-th class center.

γ is a proportion coefficient.

$\|v_i - v_j\|$ is the Euclidean distance from the i-th cluster center to the j-th cluster center.

η is a regularization term coefficient.

In the present invention, the formula for solving $g_{ik}$ uses the following symbols:

$g_{ik}$ denotes the membership value of the k-th training sample input sequence to the i-th class center.

l denotes the identification number of the l-th cluster, where l, i and j are not the same cluster.

$\|x_k - v_l\|$ is the Euclidean distance from the k-th training sample input sequence to the l-th class center.

$v_l$ denotes the l-th class center.

$v_j$ denotes the j-th class center.

$\|v_l - v_j\|$ is the Euclidean distance from the l-th cluster center to the j-th cluster center.

In the present invention, a calculation formula is likewise given for the i-th class center $v_i$.

In the present invention, taking $x_f$ as an example, the output information of the fuzzification layer consists of the 1st, 2nd, ..., (c × τ)-th training information output after $x_f$ passes through the fuzzification layer.

Construction step 3: set the rule layer information of the ECM model.

In the invention, each rule in the rule layer of the ECM model is obtained by multiplying the output information of the fuzzification layer. For $x_f$, the output information of the rule layer consists of the output values obtained through the 1st, 2nd, ..., q-th of the c rules of the rule layer.

Construction step 4: set the normalization layer information of the ECM model.

In the invention, the output information of the normalization layer of the ECM model is the ratio of the output of each rule in the rule layer to the sum of the outputs of all rules. For $x_f$, the output information of the normalization layer consists of the output values obtained through the 1st, 2nd, ..., q-th of the c rules of the normalization layer.
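The fuzzification, rule and normalization layers can be sketched together. The exact form of Gfun(mea, psi) is not reproduced in this text, so the sketch assumes the common Gaussian $\exp(-(x - mea)^2 / (2\,psi^2))$, and it assumes each rule has one (mea, psi) pair per input position; all names and values are illustrative.

```python
import math

def gaussian_membership(x, mea, psi):
    """Assumed Gaussian membership: exp(-(x - mea)^2 / (2 * psi^2))."""
    return math.exp(-((x - mea) ** 2) / (2.0 * psi ** 2))

def rule_layer(sample, centers, widths):
    """One firing strength per rule: the product of the memberships of all
    values in the sample under that rule's (mea, psi) pairs."""
    strengths = []
    for center, width in zip(centers, widths):
        strength = 1.0
        for x, mea, psi in zip(sample, center, width):
            strength *= gaussian_membership(x, mea, psi)
        strengths.append(strength)
    return strengths

def normalization_layer(strengths):
    """Each firing strength divided by the sum of all firing strengths."""
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("all firing strengths are zero")
    return [s / total for s in strengths]

sample = [1.0, 2.0]
centers = [[1.0, 2.0], [3.0, 4.0]]   # two rules (c = 2), two inputs each
widths = [[1.0, 1.0], [1.0, 1.0]]
strengths = rule_layer(sample, centers, widths)
ra = normalization_layer(strengths)
```

The sample coincides with the first rule's centers, so that rule fires with strength 1 and dominates the normalized output.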

Construction step 5: set the defuzzification layer information of the ECM model.

In the present invention, the first aspect of the defuzzification layer of the ECM model receives the output information $RA^{x_f}$ of the normalization layer.

The second aspect receives the training sample input sequence of the ECM model, taking $x_f$ as an example.

The output of the defuzzification layer is calculated from $RA^{x_f}$, $x_f$ and the defuzzification layer parameters $MP = [mp_1; mp_2; \ldots; mp_{H_w}; p_{cot}]$ according to formula (4).

$mp_1$ is the parameter of the first chlorophyll a concentration of $x_f$ in the defuzzification layer.

$mp_2$ is the parameter of the second chlorophyll a concentration of $x_f$ in the defuzzification layer.

$mp_{H_w}$ is the parameter of the $H_w$-th chlorophyll a concentration of $x_f$ in the defuzzification layer.

$p_{cot}$ is the constant term in the defuzzification layer.

For $x_f$, the output of the defuzzification layer consists of the output values obtained through the 1st, 2nd, ..., q-th of the c rules of the defuzzification layer.

Construction step 6: set the output layer information of the ECM model.

In the present invention, the output information of the ECM model output layer for $x_f$ is obtained by summing the output information of the defuzzification layer for $x_f$.

Similarly, for the input data X in the training sample set, the corresponding output information of the ECM model output layer is denoted as $ME^{out}$. Its entries for $x_1$ and $x_2$ are obtained in the same way, by summing the output information of the defuzzification layer for $x_1$ and for $x_2$, respectively.
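Formula (4) is not reproduced in this text. A sketch of the last two layers under the usual T-S (Takagi-Sugeno) reading: each rule has its own parameter vector $[mp_1, \ldots, mp_{H_w}, p_{cot}]$, the rule's output is its normalized firing strength times the linear combination of the inputs plus the constant term, and the output layer sums over the rules. The per-rule parameter layout and all names are assumptions.

```python
def defuzzification_layer(sample, ra, rule_params):
    """Per-rule output: ra[q] * (sum_i mp_i * x_i + p_cot), where each row of
    rule_params is [mp_1, ..., mp_Hw, p_cot] (assumed layout)."""
    outputs = []
    for weight, params in zip(ra, rule_params):
        *mp, p_cot = params
        if len(mp) != len(sample):
            raise ValueError("one parameter per input value is required")
        linear = sum(m * x for m, x in zip(mp, sample)) + p_cot
        outputs.append(weight * linear)
    return outputs

def output_layer(defuzz_outputs):
    """ECM output for one sample: the sum of the per-rule outputs."""
    return sum(defuzz_outputs)

sample = [1.0, 2.0]                       # H_w = 2 in this toy case
ra = [0.75, 0.25]                         # normalized firing strengths
rule_params = [[1.0, 1.0, 0.0],           # rule 1: x1 + x2
               [0.0, 0.0, 4.0]]           # rule 2: constant 4
per_rule = defuzzification_layer(sample, ra, rule_params)
me_out = output_layer(per_rule)
```

In this toy case the two rules contribute 2.25 and 1.0, giving an ECM output of 3.25.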

Parameter optimization constructs a squared-error loss function from the error values BIA and the ECM model output information $ME^{out}$, and then updates the means and standard deviations of the Gaussian membership functions in the ECM model and the parameters of the defuzzification layer by gradient descent. When the set number of training iterations is reached, the sum of squared errors between BIA and $ME^{out}$ is calculated for each training iteration, and the parameters corresponding to the minimum sum of squared errors are selected as the final parameters of the ECM model, which completes the training of the ECM model.

In the invention, the S-ELM model and the ECM model form a comprehensive prediction model based on time series features and error compensation, denoted as the SMRELM model. The predicted chlorophyll a concentration value of the SMRELM model is obtained by summing the output information $LD^{out}$ of the S-ELM model and the output information $ME^{out}$ of the ECM model.

(V) Generation of membership functions

In the invention, the Gaussian membership function Gfun(mea, psi) is established by putting the training sample input set X into the improved fuzzy C-means clustering algorithm and then obtaining the mean and the standard deviation. The mean mea is obtained from the cluster centers $v_i$ generated by the improved fuzzy C-means clustering. The standard deviation psi is obtained by a weighted calculation in which the membership values g generated by the improved fuzzy C-means clustering algorithm serve as the weights of the chlorophyll a concentration values in each row of samples in X.

In the improved fuzzy C-means clustering algorithm, the calculation formulas of the cluster center $v_i$ and of the membership degree $g_{ik}$ are the same as those given in construction step 2.

(VI) Cyanobacterial bloom prediction using the SMRELM model

In the invention, the model that predicts cyanobacterial bloom by combining the S-ELM model and the ECM model is called the SMRELM model. FIG. 4 is a flow chart of lake and reservoir cyanobacterial bloom prediction by the SMRELM model.

Prediction step one: receive the test information.

In the invention, the test information of the cyanobacterial bloom is represented as a set and denoted as $TCYB = \{tdata_1, tdata_2, \ldots, tdata_{\sigma}\}$.

$tdata_1$ denotes the 1st cyanobacterial bloom analysis information used for prediction.

$tdata_2$ denotes the 2nd cyanobacterial bloom analysis information used for prediction.

$tdata_{\sigma}$ denotes the last cyanobacterial bloom analysis information used for prediction.

The subscript σ is the number of all cyanobacterial bloom analysis information items used for prediction, referred to as the test set total.

For convenience of description, $tdata_{\sigma}$ is also referred to as any cyanobacterial bloom analysis information used for prediction.

In the invention, each cyanobacterial bloom analysis information item used for prediction carries a sampling time and a chlorophyll a concentration value. Namely:

The sampling time of $tdata_1$ is denoted as $time\_tdata_1$.

The sampling time of $tdata_2$ is denoted as $time\_tdata_2$.

The sampling time of $tdata_{\sigma}$ is denoted as $time\_tdata_{\sigma}$.

The chlorophyll a concentration value of $tdata_1$ is denoted as $tld\_data_1$.

The chlorophyll a concentration value of $tdata_2$ is denoted as $tld\_data_2$.

The chlorophyll a concentration value of $tdata_{\sigma}$ is denoted as $tld\_data_{\sigma}$.

It follows that all chlorophyll a concentration values in the cyanobacterial bloom test information $TCYB = \{tdata_1, tdata_2, \ldots, tdata_{\sigma}\}$ are denoted as $TCYB\_LD = [tld\_data_1, tld\_data_2, \ldots, tld\_data_{\sigma}]$.

Prediction step two: apply the S-ELM model to predict the cyanobacterial bloom.

The local prediction segment pseg of the current chlorophyll a concentration is formed from the historical cyanobacterial bloom analysis information CYB and $CYB^{ECM}$ and the current data information $TCYB\_LD_{\text{current}}$ of the prediction stage. The trend similarity sl and the distance similarity $D_{\text{Euclid}}$ between pseg and each item of comparison feature information in the cyanobacterial bloom prior time series feature information CYBTF are then calculated. When the model switching condition is met, the corresponding prediction feature is used as the predicted value of the S-ELM model at the next time, the output information of the S-ELM model is denoted as $TLD^{out}$, and prediction step three is executed; otherwise, prediction switches to the trained MRELM model to predict the next time, and prediction step nine is executed.

Prediction step three: input layer information in the application of the ECM model.

In the present invention, the first aspect of the input layer of the ECM model receives the data information of the S-ELM model.

The second aspect receives the output information $TLD^{out}$ of the corresponding S-ELM model.

A test sample set for the ECM model is constructed from the data information of these two aspects and denoted as $TX = [tx_1, tx_2, \ldots, tx_s]^T$.

$tx_1$ denotes the 1st test sample sequence.

$tx_2$ denotes the 2nd test sample sequence.

$tx_s$ denotes the s-th test sample sequence.

Prediction step four: output information of the fuzzification layer in the application of the ECM model.

In the present invention, the output information of the fuzzification layer, $TU = [tu_1, tu_2, \ldots, tu_{c \times \tau}]$, is obtained by calculating the membership values of each input sequence in the test sample set TX with the Gaussian membership function Gfun(mea, psi).

$tu_1$ denotes the 1st information output by the fuzzification layer in the prediction stage.

$tu_2$ denotes the 2nd information output by the fuzzification layer in the prediction stage.

$tu_{c \times \tau}$ denotes the (c × τ)-th information output by the fuzzification layer in the prediction stage.

Taking $tx_s$ as an example, the output information of the fuzzification layer consists of the 1st, 2nd, ..., (c × τ)-th information output after $tx_s$ passes through the fuzzification layer in the prediction stage.

Prediction step five: output information of the rule layer in the application of the ECM model.

In the invention, the rule layer in the ECM model receives the output information of the fuzzification layer, and the output information of the rule layer is obtained by multiplying the output information of the fuzzification layer. For $tx_s$, it consists of the output values obtained through the 1st, 2nd, ..., q-th of the c rules of the rule layer.

Prediction step six: output information of the normalization layer in the application of the ECM model.

In the invention, the normalization layer in the ECM model receives the output information of the rule layer, and the output information of the normalization layer is the ratio of the output of each rule in the rule layer to the sum of the outputs of all rules. For $tx_s$, it consists of the output values obtained through the 1st, 2nd, ..., q-th of the c rules of the normalization layer.

Prediction step seven: output information of the defuzzification layer in the application of the ECM model.

In the present invention, the first aspect of the defuzzification layer in the ECM model receives the output information of the normalization layer.

The second aspect receives the test sample set of the ECM model, taking $tx_s$ as an example.

The output information of the defuzzification layer is calculated from the output information of the normalization layer, $tx_s$ and the parameters of the defuzzification layer according to formula (5). For $tx_s$, it consists of the output values obtained through the 1st, 2nd, ..., q-th of the c rules of the defuzzification layer.

The superscript T is the transpose symbol.

Prediction step eight: output information of the output layer in the application of the ECM model.

In the present invention, the output value of the ECM model output layer for $tx_s$ is obtained by summing the output information of the defuzzification layer for $tx_s$.

Similarly, the output information corresponding to the test sample set TX is denoted as $ALE^{out}$. Its entries for $tx_1$ and $tx_2$ are obtained in the same way, by summing the output information of the defuzzification layer for $tx_1$ and for $tx_2$, respectively.

Prediction step nine: final output information of the output layer of the SMRELM model;

In the invention, the final chlorophyll a concentration prediction value FINAL^out is obtained by summing the output information TLD^out of the S-ELM model and the output information ALE^out of the ECM model, that is, FINAL^out = TLD^out + ALE^out.
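The layer-by-layer computation of the ECM model in the prediction steps, followed by the final summation, can be sketched in Python as follows. This is a minimal sketch under assumptions, not the implementation of the invention: it assumes a standard first-order T-S consequent (normalized rule strength multiplied by a linear function of the input with a bias term) in place of formula (5), and the function name `ecm_forward`, the array shapes and all numeric values are hypothetical.

```python
import numpy as np

def ecm_forward(tx, mea, psi, mp):
    """Forward pass of a first-order T-S fuzzy network (assumed form).

    tx  : (n_samples, n_inputs) test samples
    mea : (c, n_inputs) Gaussian centers, one row per rule
    psi : (c, n_inputs) Gaussian standard deviations
    mp  : (c, n_inputs + 1) consequent parameters, bias first
    """
    # Fuzzy layer: Gaussian membership of every input in every rule.
    diff = tx[:, None, :] - mea[None, :, :]
    tu = np.exp(-0.5 * (diff / psi[None, :, :]) ** 2)      # (n, c, d)
    # Rule layer: product of the memberships belonging to one rule.
    rule = tu.prod(axis=2)                                  # (n, c)
    # Normalization layer: each rule divided by the sum over all rules.
    norm = rule / rule.sum(axis=1, keepdims=True)           # (n, c)
    # Defuzzification layer: normalized strength times linear consequent.
    tx1 = np.hstack([np.ones((tx.shape[0], 1)), tx])        # (n, d + 1)
    defuzz = norm * (tx1 @ mp.T)                            # (n, c)
    # Output layer: sum over the c rules.
    return defuzz.sum(axis=1)

# Hypothetical values: two test samples, two rules, constant consequents.
tx = np.array([[1.0, 2.0], [3.0, 1.0]])
mea = np.array([[1.0, 2.0], [3.0, 1.5]])
psi = np.ones((2, 2))
mp = np.array([[0.1, 0.0, 0.0], [0.2, 0.0, 0.0]])

ale_out = ecm_forward(tx, mea, psi, mp)   # error compensation ALE^out
tld_out = np.array([10.0, 12.0])          # hypothetical S-ELM output TLD^out
final_out = tld_out + ale_out             # FINAL^out = TLD^out + ALE^out
```

Because the normalized rule strengths sum to 1, the compensation value is a weighted average of the rule consequents. If all rule strengths underflow to zero for a sample far from every center, the normalization divides by zero, so a practical implementation would need to guard that case.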

Example 1

The data in this example come from the water quality data set of the great villa monitoring site. The data set is sampled every 4 hours, from 12:04 on June 20, 2009 to 10:08 on the 27th of a month in 2012, and contains 6342 groups of data. For the experiment, the chlorophyll a concentration values of each day are averaged and taken as the value of that day, so the sampling interval becomes 24 hours and 1086 groups of data are obtained. Of these, 900 groups are training data: the first 600 groups are used to construct the MRELM model and to extract the prior time sequence characteristic information of the cyanobacterial bloom, and the next 300 groups are used to test the prediction performance of the S-ELM model and to construct the ECM model. The remaining 186 groups are used for the final test of the SMRELM model.
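The daily averaging and the 600/300/186 split described above can be sketched as follows. This is an illustrative sketch only: the function names `daily_means` and `split_series` are hypothetical, and the timestamps and values are synthetic, not the data of this example. Readings are grouped by calendar day rather than in fixed blocks of six, since a real series may contain gaps.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def daily_means(timestamps, values):
    """Average all readings that fall on the same calendar day."""
    buckets = defaultdict(list)
    for ts, v in zip(timestamps, values):
        buckets[ts.date()].append(v)
    days = sorted(buckets)
    return days, [sum(buckets[d]) / len(buckets[d]) for d in days]

def split_series(daily, n_model=600, n_ecm=300):
    """Split into MRELM/feature data, S-ELM test/ECM data, and final test data."""
    return (daily[:n_model],
            daily[n_model:n_model + n_ecm],
            daily[n_model + n_ecm:])

# Synthetic 4-hourly readings starting at 12:04 on June 20, 2009.
start = datetime(2009, 6, 20, 12, 4)
timestamps = [start + timedelta(hours=4 * i) for i in range(12)]
values = [float(i) for i in range(12)]
days, daily = daily_means(timestamps, values)

# With 1086 daily values the split sizes match the text.
part_a, part_b, part_c = split_series(list(range(1086)))
```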

In the MRELM model, the sliding window width H_w takes values in [3, 4, …, 10], the number of hidden layer neurons takes values in [10, 15, …, 40], and the weighting coefficients C₁ and C₂ both take values in [10^-8, 10^-7, …, 10^8]. The root mean square error is calculated for each parameter combination; the optimum is a sliding window width of 3, 10 hidden layer neurons, and weighting coefficients C₁ = 10^-8 and C₂ = 10^-4.
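The size of this search space can be made concrete with a short enumeration. The sketch below only builds the parameter grid; the variable names are hypothetical, and the weighting coefficients are stored as base-10 exponents to avoid floating-point comparison issues.

```python
from itertools import product

window_widths = list(range(3, 11))        # H_w in [3, 4, ..., 10]
hidden_sizes = list(range(10, 41, 5))     # hidden neurons in [10, 15, ..., 40]
weight_exponents = list(range(-8, 9))     # C1, C2 = 10**e for e in [-8, ..., 8]

# Every combination (H_w, hidden neurons, exponent of C1, exponent of C2).
grid = list(product(window_widths, hidden_sizes,
                    weight_exponents, weight_exponents))

n_combinations = len(grid)                # 8 * 7 * 17 * 17
reported_optimum = (3, 10, -8, -4)        # H_w = 3, 10 neurons, C1 = 1e-8, C2 = 1e-4
```

Each tuple would be turned into model parameters with `c1 = 10.0 ** e1` and `c2 = 10.0 ** e2` before training.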

In the cyanobacterial bloom feature extraction experiment, ld_data_max is 29.35 mg/L, the adjustment coefficient of the floating recovery stage p_float is 0.34, the adjustment coefficient of the continuous growth stage p_growth is 0.6, and cw is 40 mg/L. From p_float, p_growth and ld_data_max, cu is calculated as 10 mg/L and cv as 17.61 mg/L. The length of the cyanobacterial bloom prior time sequence characteristic information takes values in [3, 4, …, 7]. According to cu and cv, the candidate cyanobacterial bloom characteristic information with a rising trend (CYB_LD^HX information) is calculated. Repeated cyanobacterial bloom characteristic sequences in the CYB_LD^HX information are then eliminated according to the distance similarity, which gives the prior time sequence characteristic information of the cyanobacterial bloom (CYBTF information).
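The two critical values follow directly from the adjustment coefficients and the maximum concentration. A small worked sketch, with the hypothetical helper name `bloom_thresholds`:

```python
def bloom_thresholds(ld_data_max, p_float, p_growth):
    """Return (cu, cv) as the coefficient times the maximum concentration."""
    cu = p_float * ld_data_max     # critical value of the floating recovery stage
    cv = p_growth * ld_data_max    # critical value of the continuous growth stage
    return cu, cv

cu, cv = bloom_thresholds(ld_data_max=29.35, p_float=0.34, p_growth=0.6)
# cu is about 9.979 mg/L, which the example reports as 10 mg/L;
# cv is about 17.61 mg/L.

# Claim 1 relates the two coefficients by p2 = 1.76 * p1.
p_growth_from_claim = 1.76 * 0.34  # about 0.598, close to the 0.6 used here
```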

In the S-ELM model, the trend similarity threshold is set to 0.1 and the distance similarity threshold takes values in [0, 0.05, …, 1.5]. Prediction is switched between the MRELM model and the CYBTF information according to the similarity threshold switching condition and the selected similarity thresholds. FIG. 5 shows the effect of different distance similarity thresholds on the prediction performance. As can be seen from FIG. 5, the optimal distance similarity threshold is 0.2.

In the ECM model, the improved fuzzy C-means clustering algorithm is applied to the latter 300 groups of data to obtain the Gaussian membership function Gfun(mea, psi). The improved algorithm avoids setting the number of Gaussian membership functions manually and reduces the influence of outliers in the test data. In the improved fuzzy C-means clustering algorithm, the fuzzy coefficient m is 2, the subtractive clustering parameter rad is 0.5, the specific gravity coefficient gamma is 0.04, and the regular term coefficient eta is 0.03.

FIG. 6 shows the lake and reservoir cyanobacterial bloom prediction results of the SMRELM model and compares them with the prediction results of other models. FIG. 7 is a box plot of the root mean square error over 10 experiments for each model, used to compare the stability of the models. Table 1 shows the overall test performance of the BP neural network, the dynamic recurrent neural network (ELMAN) and the SMRELM model, measured by root mean square error (RMSE), mean absolute percentage error (MAPE), normalized root mean square error (NRMSE) and correlation coefficient (R²). By comparison, the SMRELM model in this embodiment achieves the lowest RMSE (mean), MAPE and NRMSE and the highest R², and its stability is second only to ELMAN. The SMRELM model in this embodiment therefore has higher prediction accuracy in lake and reservoir cyanobacterial bloom prediction and is suitable for that application.

TABLE 1 Lake and reservoir cyanobacterial bloom prediction results and comparison of different methods

Model	RMSE (mean)	MAPE	NRMSE	R²
BP	1.3917	15.75%	0.3622	0.8721
ELMAN	1.3404	14.40%	0.3488	0.8796
SMRELM	1.3180	14.36%	0.3429	0.8825
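The four indexes in Table 1 can be computed as in the following sketch. The exact definitions used in the experiment are not stated here, so two choices are assumptions: NRMSE is normalized by the population standard deviation of the measured values, and R² is computed as the coefficient of determination. The sample arrays are hypothetical.

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mape(y, yhat):
    # Mean absolute percentage error in percent; y must not contain zeros.
    return float(100.0 * np.mean(np.abs((y - yhat) / y)))

def nrmse(y, yhat):
    # Assumption: RMSE divided by the standard deviation of the measurements.
    return rmse(y, yhat) / float(np.std(y))

def r2(y, yhat):
    # Assumption: coefficient of determination, 1 - SS_res / SS_tot.
    ss_res = float(np.sum((y - yhat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])        # hypothetical measured values
yhat = np.array([1.0, 2.0, 3.0, 5.0])     # hypothetical predictions
scores = (rmse(y, yhat), mape(y, yhat), nrmse(y, yhat), r2(y, yhat))
```

Lower RMSE, MAPE and NRMSE and higher R² indicate a better fit, which is the direction in which the SMRELM row of Table 1 compares with the other two models.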

Claims

1. A lake and reservoir cyanobacterial bloom prediction system based on an SMRELM model, comprising a cyanobacterial bloom dynamic change trend module (10) and a chlorophyll a concentration extraction index module (20), characterized in that it further comprises: a cyanobacterial bloom time sequence characteristic information extraction module (100), an S-ELM model (200) and an ECM model (30);

the cyanobacterial bloom time sequence characteristic information extraction module (100) is arranged in the cyanobacterial bloom dynamic change trend module (10);

the S-ELM model (200) is arranged in a chlorophyll a concentration extraction index module (20);

the ECM model (30) is arranged at the output end of the chlorophyll a concentration extraction index module (20);

the cyanobacterial bloom time sequence characteristic information extraction module (100) is used for extracting effective time sequence characteristics of the cyanobacterial bloom data in the cyanobacterial bloom dynamic change trend module (10);

first, receiving from the cyanobacterial bloom dynamic change trend module (10) the chlorophyll a concentration data CYB_LD used for constructing the S-ELM model;

second, setting the upper limit critical value of the floating recovery stage of the cyanobacterial bloom, recorded as cu_upper, with cu_upper = p₁ × ld_data_max, where p₁ denotes the first chlorophyll a concentration adjustment coefficient and ld_data_max denotes the maximum chlorophyll a concentration selected from CYB_LD;

setting the lower limit critical value of the continuous growth stage of the cyanobacterial bloom, recorded as cv_lower, with cv_lower = p₂ × ld_data_max, where p₂ denotes the second chlorophyll a concentration adjustment coefficient and p₂ = 1.76 × p₁;

extracting, according to the chlorophyll a concentration interval cv_lower ≤ ld_data_δ, the candidate cyanobacterial bloom chlorophyll a concentration characteristic information from CYB_LD, recorded as CYB_LD^HX;

removing repeated cyanobacterial bloom characteristic information from CYB_LD^HX according to the distance similarity to obtain the prior time sequence characteristic information of the cyanobacterial bloom, recorded as CYBTF;
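The removal of repeated characteristic sequences can be illustrated with a short sketch. It assumes that the distance similarity is the Euclidean distance between sequences of equal length and that a sequence counts as repeated when this distance does not exceed a threshold; the function name `remove_repeated`, the threshold value and the candidate sequences are all hypothetical.

```python
import numpy as np

def remove_repeated(sequences, threshold):
    """Keep a sequence only if no already kept sequence of the same
    length lies within the given Euclidean distance."""
    kept = []
    for seq in sequences:
        seq = np.asarray(seq, dtype=float)
        is_repeat = any(
            len(k) == len(seq) and np.linalg.norm(k - seq) <= threshold
            for k in kept
        )
        if not is_repeat:
            kept.append(seq)
    return kept

# Hypothetical candidate sequences standing in for CYB_LD^HX.
candidates = [
    [10.0, 12.0, 15.0],
    [10.1, 12.0, 15.0],        # within 0.1 of the first one, removed
    [11.0, 14.0, 17.0],
    [10.0, 12.0, 15.0, 18.0],  # different length, kept
]
cybtf = remove_repeated(candidates, threshold=0.5)
```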

the construction of the S-ELM model (200) comprises the following steps:

construction step A: constructing the network architecture of the MRELM model;

the network architecture of the MRELM model consists of an input layer, a hidden layer and an output layer;

construction step B: setting the input layer information of the MRELM model;

in a first aspect, the input layer of the MRELM model receives the chlorophyll a concentration data CYB_LD used for constructing the S-ELM model;

in a second aspect, the sliding window width of the MRELM model is set and recorded as H_w; the sliding window width takes values in H_w = [3, 4, …, τ], where τ is the maximum sliding window width;

in a third aspect, the chlorophyll a concentrations in CYB_LD are divided according to H_w into training sample sequences, recorded as LD_DATA;
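The division of a concentration series into training sample sequences with a sliding window can be sketched as follows. The sketch assumes one-step-ahead targets, that is, each window of H_w consecutive values is paired with the value that immediately follows it; the function name `sliding_window` and the sample series are hypothetical.

```python
import numpy as np

def sliding_window(series, h_w):
    """Return (inputs, targets): rows of h_w consecutive values and the
    value that follows each row."""
    series = np.asarray(series, dtype=float)
    n = len(series) - h_w
    if n <= 0:
        raise ValueError("series must be longer than the window width")
    inputs = np.stack([series[i:i + h_w] for i in range(n)])
    targets = series[h_w:]
    return inputs, targets

ld_data, ld_target = sliding_window([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], h_w=3)
# ld_data rows:  [1, 2, 3], [2, 3, 4], [3, 4, 5]
# ld_target:     [4, 5, 6]
```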

construction step C: setting the number of hidden layer neurons of the MRELM model;

the number of hidden layer neurons of the MRELM model is recorded as L_h and takes values in 1 < L_h ≤ κ, where κ is the maximum number of hidden layer neurons; each hidden layer neuron receives excitatory connections from all input layer neurons, that is, LD_DATA is mapped by the hidden layer neurons to obtain the hidden layer output information, recorded as H^out;

construction step D: constructing the output of the MRELM model;

manifold regularization means that the output space HID^out of LD_DATA in the hidden layer preserves the local geometry of the input layer: if one training sample sequence and another training sample sequence are highly similar in the input layer, they are also highly similar in the output space of the hidden layer, which reduces the influence of randomness and improves the generalization performance of the extreme learning machine;

when the output layer of the MRELM model comprises 1 neuron, the output information of that neuron is recorded as LDD^out, with LDD^out = H^out × β, where β denotes the weights between the hidden layer and the output layer;

the MRELM model is obtained through construction steps A to D;
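A manifold regularized extreme learning machine of this shape can be sketched as follows. The objective function is not reproduced in this passage, so the sketch assumes a commonly used form, 1/2·‖β‖² + C₁/2·‖Hβ − T‖² + C₂/2·βᵀHᵀLHβ, where L is a graph Laplacian built from a Gaussian similarity between input samples; the assignment of C₁ to the fitting term and C₂ to the manifold term, the sigmoid activation, and the names `train_mrelm`, `predict_mrelm` and `sigma` are assumptions for illustration.

```python
import numpy as np

def hidden_output(x, w, b):
    # Sigmoid feature mapping of the hidden layer (H^out).
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def train_mrelm(x, t, n_hidden=10, c1=1e2, c2=1e-4, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Random, untrained input weights and biases, as in any ELM.
    w = rng.uniform(-1.0, 1.0, size=(x.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    h = hidden_output(x, w, b)
    # Graph Laplacian from Gaussian similarity between input samples.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    s = np.exp(-d2 / (2.0 * sigma ** 2))
    lap = np.diag(s.sum(axis=1)) - s
    # Closed-form minimizer of the assumed objective.
    a = np.eye(n_hidden) + c1 * h.T @ h + c2 * h.T @ lap @ h
    beta = np.linalg.solve(a, c1 * h.T @ t)
    return w, b, beta

def predict_mrelm(x, w, b, beta):
    return hidden_output(x, w, b) @ beta   # LDD^out = H^out x beta

# Synthetic series, window width 3, one-step-ahead targets.
series = 5.0 + np.sin(np.linspace(0.0, 12.0, 200))
x = np.stack([series[i:i + 3] for i in range(len(series) - 3)])
t = series[3:]
w, b, beta = train_mrelm(x, t)
train_rmse = float(np.sqrt(np.mean((predict_mrelm(x, w, b, beta) - t) ** 2)))
```

Only β is solved for; the input weights stay random, which is why the manifold term is used to damp the effect of that randomness on similar samples.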

construction step E: optimizing the MRELM model;

construction step E101: setting the number of training sample subsets divided for the grid search method to b;

construction step E102: dividing D_ALL into b training sample subsets of the same size, where

BLD denotes the chlorophyll a concentration values ordered by sampling time, and BLD = [bld₁; bld₂; …; bld_b];

T_BLD denotes the sliding window chlorophyll a concentration values ordered by sampling time, and T_BLD = [t_bld₁; t_bld₂; …; t_bld_b];

bld₁ denotes the 1st cyanobacterial bloom time sequence training subset divided for the grid search method;

bld₂ denotes the 2nd cyanobacterial bloom time sequence training subset divided for the grid search method;

bld_b denotes the b-th cyanobacterial bloom time sequence training subset divided for the grid search method;

t_bld₁ denotes the 1st cyanobacterial bloom time sequence-sliding window concentration divided for the grid search method;

t_bld₂ denotes the 2nd cyanobacterial bloom time sequence-sliding window concentration divided for the grid search method;

t_bld_b denotes the b-th cyanobacterial bloom time sequence-sliding window concentration divided for the grid search method;

SUB_D_train is put into the MRELM model for learning;

SUB_D_eval is put into the trained MRELM model for evaluation;

SUB_D_eval is the b-th cyanobacterial bloom time sequence-concentration training sample subset selected from SUB_D; SUB_D_train consists of all cyanobacterial bloom time sequence-concentration training sample subsets in SUB_D other than SUB_D_eval;

construction step E104: recording the two weighting coefficients in the objective function of the MRELM model as the first weighting coefficient C₁ and the second weighting coefficient C₂, and the weight as β; putting SUB_D_train into the MRELM model for learning to obtain the weight β;

construction step E105: putting bld₄ of SUB_D_eval into the trained MRELM model as evaluation samples, and outputting the chlorophyll a concentration evaluation values corresponding to the evaluation samples, recorded as LDD^out_BLD_eval;

construction step E106: calculating the root mean square error between t_bld₄ of SUB_D_eval and LDD^out_BLD_eval, recorded as JFC;

in the invention, the smaller JFC is, the higher the prediction accuracy of the MRELM model;

construction step E107: setting the value ranges of the weighting coefficients C₁ and C₂ as C₁ = [10^-8, 10^-7, …, ψ₁] and C₂ = [10^-8, 10^-7, …, ψ₂], where ψ₁ is the maximum value of the first weighting coefficient C₁ and ψ₂ is the maximum value of the second weighting coefficient C₂;

for each combination of values of H_w, L_h, C₁ and C₂, taking the sample subset bld_b of SUB_D_eval as input, outputting the corresponding evaluation values from the MRELM model, and calculating the root mean square error between the evaluation values and t_bld_b;

construction step E108: selecting the minimum root mean square error value obtained in construction step E107, recorded as JFC_min, and taking the H_w, L_h, C₁ and C₂ corresponding to JFC_min as the parameters of the MRELM model, so that the MRELM model is optimized and the trained MRELM model is obtained;
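The selection procedure of construction steps E101 to E108 can be sketched as a hold-out grid search. To keep the sketch short and self-contained, a plain ridge regression stands in for the MRELM model, so only the subset division, the evaluation on the b-th subset and the selection of the minimum root mean square error correspond to the text; `fit_ridge`, `select_parameters`, the regularization values and the synthetic series are hypothetical. `np.array_split` yields nearly equal subsets when the sample count is not divisible by b.

```python
import numpy as np
from itertools import product

def make_windows(series, h_w):
    x = np.stack([series[i:i + h_w] for i in range(len(series) - h_w)])
    return x, series[h_w:]

def fit_ridge(x, t, lam):
    # Stand-in learner, NOT the MRELM model of the invention.
    a = x.T @ x + lam * np.eye(x.shape[1])
    return np.linalg.solve(a, x.T @ t)

def select_parameters(series, widths, lams, b=5):
    results = {}
    for h_w, lam in product(widths, lams):
        x, t = make_windows(series, h_w)
        x_parts, t_parts = np.array_split(x, b), np.array_split(t, b)
        # Train on the first b - 1 subsets, evaluate on the b-th subset.
        x_train = np.concatenate(x_parts[:-1])
        t_train = np.concatenate(t_parts[:-1])
        beta = fit_ridge(x_train, t_train, lam)
        jfc = float(np.sqrt(np.mean((x_parts[-1] @ beta - t_parts[-1]) ** 2)))
        results[(h_w, lam)] = jfc
    best_params = min(results, key=results.get)
    return best_params, results[best_params], results

series = 5.0 + np.sin(np.linspace(0.0, 20.0, 300))
best_params, jfc_min, results = select_parameters(
    series, widths=[3, 4, 5], lams=[1e-3, 1e-1, 10.0])
```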

the construction of the ECM model (30) comprises the following steps:

construction step 1: setting the input layer information of the ECM model;

in a first aspect, the input layer of the ECM model receives the historical cyanobacterial bloom analysis information CYB^ECM;

in a second aspect, it receives the output information LD^out of the corresponding S-ELM model;

in a third aspect, the chlorophyll a concentration values in CYB^ECM are compared with LD^out and the difference is calculated, recorded as BIA;

in a fourth aspect, a training sample set for the ECM model is constructed from CYB^ECM, LD^out and BIA, recorded as X_ALL = [X, BIA];

X denotes the training sample sequences input to the ECM model;

construction step 2: setting the fuzzy layer information of the ECM model;

the membership function of the fuzzy layer in the ECM model is a Gaussian membership function, recorded as Gfun(mea, psi); the mean mea is given by the cluster centers generated by the improved fuzzy C-means clustering, and the standard deviation psi is obtained by weighting the chlorophyll a concentration values in each row of samples in X, using the membership values generated by the improved fuzzy C-means clustering algorithm as the weights;
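How the Gaussian membership function parameters could be derived from a clustering result is sketched below. The exact weighting formula is not given in this passage, so the sketch assumes a membership-weighted standard deviation of the samples around each cluster center; `weighted_std`, the array shapes and the numeric values are hypothetical, and the cluster centers and memberships are taken as given rather than computed by the improved fuzzy C-means algorithm.

```python
import numpy as np

def gaussian_membership(x, mea, psi):
    """Gfun(mea, psi): Gaussian membership of x for mean mea and std psi."""
    return np.exp(-((x - mea) ** 2) / (2.0 * psi ** 2))

def weighted_std(x, centers, memberships):
    """Membership-weighted standard deviation around each center (assumed form).

    x           : (n, d) training samples
    centers     : (c, d) cluster centers, used as mea
    memberships : (c, n) membership of each sample in each cluster
    returns psi : (c, d)
    """
    diff2 = (x[None, :, :] - centers[:, None, :]) ** 2     # (c, n, d)
    w = memberships[:, :, None]                            # (c, n, 1)
    return np.sqrt((w * diff2).sum(axis=1) / w.sum(axis=1))

x = np.array([[0.0], [2.0]])             # hypothetical samples
centers = np.array([[1.0]])              # one hypothetical cluster center
memberships = np.array([[0.5, 0.5]])     # hypothetical memberships
psi = weighted_std(x, centers, memberships)
mu_at_center = gaussian_membership(1.0, centers[0, 0], psi[0, 0])
mu_at_two = gaussian_membership(2.0, centers[0, 0], psi[0, 0])
```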

the objective function of the improved fuzzy C-means clustering algorithm uses the following symbols:

h denotes the total number of training samples;

k denotes the identification number of a training sample;

c denotes the total number of clusters, that is, the number of rules;

i denotes the identification number of the i-th cluster, j denotes the identification number of the j-th cluster, and i and j are not the same cluster;

m denotes the fuzzy coefficient;

the membership term denotes the m-th power of the degree of membership g_ik of the k-th training sample in the i-th class center;

ω_k denotes the weight of the k-th training sample;

the distance term denotes the Euclidean distance from the k-th training sample to the i-th class center;

γ is the specific gravity coefficient;

η is the regular term coefficient;

v_i denotes the i-th class center;

solving the objective function gives the calculation formula of g_ik, in which

l denotes the identification number of the l-th cluster, i denotes the identification number of the i-th cluster, j denotes the identification number of the j-th cluster, and l, i and j are not the same cluster;

the distance term denotes the Euclidean distance from the k-th training sample to the l-th class center;

solving the objective function also gives the calculation formula of the i-th class center v_i, in which

x_k denotes the k-th training sample sequence of the input layer in the error compensation model;

v_j denotes the j-th class center;

construction step 3: setting the rule layer information of the ECM model;

each rule in the rule layer of the ECM model is obtained by multiplying the output information of the fuzzy layer;

construction step 4: setting the normalization layer information of the ECM model;

the output information of the normalization layer of the ECM model is the ratio of the output information of each rule in the rule layer to the sum of the output information of all rules;

construction step 5: setting the defuzzification layer information of the ECM model;

in a first aspect, the defuzzification layer of the ECM model receives the output information of the normalization layer; in a second aspect, it receives the input training sample sequences of the ECM model, taking x_f as an example;

the output information of the defuzzification layer is calculated from the output information of the normalization layer, x_f and the parameter MP of the defuzzification layer;

construction step 6: setting the output layer information of the ECM model;

the output information of the output layer of the ECM model for x_f is obtained by summing the output information of the defuzzification layer, and the output information of the ECM model output layer corresponding to the input data X in the training sample set is recorded as ME^out;

the error compensation model, that is, the ECM model, is obtained through construction steps 1 to 6.

2. The lake and reservoir cyanobacterial bloom prediction system based on the SMRELM model according to claim 1, characterized in that the steps for predicting the lake and reservoir cyanobacterial bloom comprise:

prediction step one: receiving test information;

the test information of the cyanobacterial bloom is recorded as TCYB;

from the time sequence formed by the historical cyanobacterial bloom analysis information CYB and CYB^ECM and the current data information TCYB_LD_current of the prediction stage, the local prediction segment pseg of the current chlorophyll a concentration is calculated; the trend similarity sl and the distance similarity D_Euclidean between each piece of comparison characteristic information in pseg and the cyanobacterial bloom prior time sequence characteristic information CYBTF are then calculated; when the model switching condition is met, the corresponding prediction feature is used as the prediction value of the S-ELM model at the next moment, the output information of the S-ELM model is recorded as TLD^out, and prediction step three is executed; otherwise, the trained MRELM model is used to predict the next moment, and prediction step nine is executed;
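The switching logic can be sketched as follows. The definitions of the two similarities are not given in this passage, so everything specific here is an assumption for illustration: the trend similarity is taken as the mean absolute difference between the first differences of the two segments, the distance similarity as their Euclidean distance, the switching condition as both values lying at or below their thresholds, and the prediction feature as the last value of the matching CYBTF sequence. `switch_predict`, `fallback` and the sample values are hypothetical.

```python
import numpy as np

def switch_predict(recent, cybtf, fallback, sl_max=0.1, d_max=0.2):
    """Return (prediction, source) for the next moment."""
    recent = np.asarray(recent, dtype=float)
    best = None
    for feat in cybtf:
        feat = np.asarray(feat, dtype=float)
        prefix = feat[:-1]                    # part compared with recent data
        if len(prefix) < 2 or len(prefix) > len(recent):
            continue
        pseg = recent[-len(prefix):]          # local prediction segment
        sl = np.mean(np.abs(np.diff(pseg) - np.diff(prefix)))  # assumed trend measure
        d = np.linalg.norm(pseg - prefix)                       # Euclidean distance
        if sl <= sl_max and d <= d_max and (best is None or d < best[0]):
            best = (d, float(feat[-1]))
    if best is not None:
        return best[1], "CYBTF"               # switching condition met
    return fallback(recent), "MRELM"          # otherwise use the trained model

cybtf = [[1.0, 2.0, 3.0, 4.0]]                       # hypothetical prior feature
persistence = lambda r: float(r[-1])                 # hypothetical stand-in for MRELM
hit = switch_predict([0.5, 1.0, 2.0, 3.05], cybtf, persistence)
miss = switch_predict([9.0, 9.5, 9.0, 9.2], cybtf, persistence)
```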

predicting the input layer information in the application of the ECM model;

in a first aspect, the input layer of the ECM model receives the data information of the S-ELM model;

in a second aspect, it receives the output information TLD^out of the corresponding S-ELM model;

a test sample set for the ECM model is constructed from the data information of these two aspects and recorded as TX;

the output information TU of the fuzzy layer is obtained by calculating the membership value of each input sequence in the test sample set TX with the Gaussian membership function Gfun(mea, psi);

predicting the output information of the rule layer in the application of the ECM model;

the rule layer in the ECM model receives the output information of the fuzzy layer, and the output information of the rule layer is obtained by multiplying the output information of the fuzzy layer;

the normalization layer in the ECM model receives the output information of the rule layer, and the output information of the normalization layer is the ratio of the output information of each rule in the rule layer to the sum of the output information of all rules;

in a first aspect, the defuzzification layer in the ECM model receives the output information of the normalization layer; in a second aspect, it receives the test sample set of the ECM model, taking tx_s as an example;

the output information of the defuzzification layer is calculated from the output information of the normalization layer, tx_s and the parameter MP^T of the defuzzification layer;

the output value of the output layer of the ECM model is obtained by summing the output information of the defuzzification layer;

the output information corresponding to the test sample set TX is recorded as ALE^out;

prediction step nine: final output information of the output layer of the SMRELM model;

the final chlorophyll a concentration prediction value FINAL^out is obtained by summing the output information TLD^out of the S-ELM model and the output information ALE^out of the ECM model.