CN114386710A - Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM
- Publication number
- CN114386710A (application number CN202210063970.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- chlorophyll
- model
- stl
- long
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a long-term forecasting method and system for lake cyanobacterial blooms based on STL-RF-LSTM. Long time series of monitoring data are collected for the lake's cyanobacterial bloom and its related environmental influence factors, and STL time series decomposition is applied to the bloom data to remove potential noise and obtain the long-term trend of the bloom. An RF model is then used to screen the key environmental factors of the bloom, which accounts for the nonlinear relationship between the environmental factors and the bloom while avoiding the influence of redundant data. Finally, an LSTM model establishes the correspondence between the bloom and its key environmental influence factors across earlier and later times, and the monitoring data of the key factors measured at the current time are substituted into this correspondence to obtain the future bloom forecast. Because the environmental factors at future times do not need to be forecast, the accumulation of forecasting errors is avoided and the forecasting accuracy for cyanobacterial blooms can be effectively improved.
Description
Technical Field
The invention relates to the field of lake water ecological environment management, in particular to a long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM.
Background
With global warming and the discharge of large amounts of nutrient salts such as nitrogen and phosphorus into lake water bodies, lake eutrophication is becoming increasingly severe. As a result, cyanobacterial blooms in lakes are breaking out more frequently and on a larger scale, which seriously affects the health of the lake's aquatic ecological environment and, in severe cases, even threatens human social life and economic production. Controlling lake cyanobacterial blooms is a complex systems engineering task: although large amounts of manpower and material resources have been invested, the treatment effect still takes a long time to appear. Long-term forecasting of the future trend, based on the historical pattern of the lake's cyanobacterial blooms, makes it possible to test how much the treatment measures invested earlier improve the long-term evolution of the blooms, and it is also an important means of ecological management of the lake water environment.
Under the interference of various uncertain factors, the time series of lake cyanobacterial blooms is often non-stationary, and feeding such data directly into a model reduces forecasting accuracy. When chlorophyll a is used to characterize the bloom, actual monitoring of chlorophyll a is affected by random factors such as weather and instrument precision, so the measured values contain noise that can mask the true pattern of change in the chlorophyll a sequence. In addition, cyanobacterial blooms in lakes are influenced by many environmental factors, but modeling the bloom with all of them leads to high data redundancy, so the key environmental influence factors need to be screened. The linear correlation coefficient used in traditional Pearson correlation analysis performs poorly on the nonlinear relationship between the bloom and its environmental influence factors. A new forecasting method for lake cyanobacterial blooms is therefore needed to achieve high-accuracy long-term bloom forecasting and to provide a reference for decisions in lake water environment management.
Disclosure of Invention
Purpose of the invention: The invention aims to provide a long-term forecasting method and system for lake cyanobacterial blooms based on STL-RF-LSTM. The method better removes the influence of noise from long-term chlorophyll a monitoring data and obtains the true underlying trend; it accounts for the nonlinear relationship between environmental factors and chlorophyll a, screens the environmental factors that strongly influence chlorophyll a, and avoids the influence of data redundancy; and it achieves long-term forecasting of the future chlorophyll a trend by considering the dependency between chlorophyll a and its environmental influence factors across earlier and later times.
Technical solution: To achieve this purpose, the invention adopts the following technical solution:
A long-term forecasting method for lake cyanobacterial blooms based on STL-RF-LSTM comprises the following steps:
(1) collecting multi-year, month-by-month long time series of chlorophyll a data and environmental influence factor data for the lake, wherein the environmental influence factor data comprise water temperature (WT), dissolved oxygen (DO), pH, potassium permanganate index ($\mathrm{COD_{Mn}}$), ammonia nitrogen ($\mathrm{NH_3}$-N), nitrate nitrogen ($\mathrm{NO_3}$-N), total nitrogen (TN), total phosphorus (TP) and phosphate ($\mathrm{PO_4^{3-}}$);
(2) performing STL decomposition on the month-by-month long time series of chlorophyll a to obtain its trend term, seasonal term and random term;
(3) based on the chlorophyll a trend term obtained from the STL decomposition, evaluating the importance of the related environmental factors with an RF (random forest) model to obtain the key environmental influence factors affecting the cyanobacterial bloom;
(4) dividing the monthly data in the training set by year and taking the data of the last year as the validation set; constructing LSTM models with backtracking step lengths of 1 to 12; using the LSTM models to establish, for each backtracking step length, the correspondence between the key environmental influence factors and chlorophyll a from the chlorophyll a trend term after STL decomposition and the monitoring data of the key environmental factors screened by the RF model; evaluating model accuracy to determine the optimal backtracking step length; and inputting the collected key environmental factor data used for the future forecast into the LSTM model built with the optimal backtracking step length to obtain the long-term forecast of the future cyanobacterial bloom.
Preferably, the STL decomposition of the long time series of chlorophyll a in step (2) comprises:
determining the sizes of the trend window and the seasonal window;
decomposing the chlorophyll a time series with the STL model into three additive components, expressed as:
$$Y_t = T_t + S_t + R_t$$
wherein $Y_t$ is the observed value of chlorophyll a at time $t$, and $T_t$, $S_t$ and $R_t$ are the trend term, the seasonal term and the random fluctuation term of the observed value, respectively.
Preferably, when the RF model is used to screen the key environmental factors in step (3), the relative importance of each environmental factor is calculated from the out-of-bag (OOB) data of the random forest sampling process, comprising:
determining the number $N$ of decision trees in the random forest, using each decision tree $l_n$ to predict its corresponding OOB data, and calculating the root mean square error $e_n$ on the OOB data;
keeping the other environmental factors of the OOB data unchanged, permuting only the sequence of values of the $i$-th environmental factor, and recalculating the root mean square error of the prediction, $\tilde{e}_n^{\,i}$;
computing the importance $\mu_n^i$ of the $i$-th environmental factor for each decision tree $l_n$, with the calculation formula:
$$\mu_n^i = \tilde{e}_n^{\,i} - e_n$$
repeating the above calculation steps until the whole random forest model has been traversed to obtain the importance $\mu_i$ of environmental factor $i$, with the calculation formula:
$$\mu_i = \frac{1}{N}\sum_{n=1}^{N}\mu_n^i$$
Preferably, in step (4), the MinMax method is used to normalize the chlorophyll a trend term data and the environmental influence factor data, transforming the data range to [0, 1].
Preferably, in step (4), within the training set, the environmental factors serve as the input sequence $X$ of the model and the chlorophyll a trend sequence serves as the output $Y$ of the model; for an LSTM model with backtracking step length $s$, the sequence values of the $i$-th component of the training-set input sequence $X$ are $x_{i,1}, x_{i,2}, \ldots, x_{i,m}$, where $m$ is the length of the time series, and the sequence values of the output sequence $Y$ are $y_{i,1+s}, y_{i,2+s}, \ldots, y_{i,m+s}$.
Preferably, in step (4), the accuracy evaluation indices of the LSTM model are the Nash-Sutcliffe efficiency coefficient (NSE), the root mean square error (RMSE) or the correlation coefficient ($R^2$), calculated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{j=1}^{t}(O_j - P_j)^2}{\sum_{j=1}^{t}(O_j - \bar{O})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{j=1}^{t}(O_j - P_j)^2}$$
$$R^2 = \frac{\left[\sum_{j=1}^{t}(O_j - \bar{O})(P_j - \bar{P})\right]^2}{\sum_{j=1}^{t}(O_j - \bar{O})^2\,\sum_{j=1}^{t}(P_j - \bar{P})^2}$$
wherein $O_j$ is the observed value of chlorophyll a in the test period, $P_j$ is the model forecast value of chlorophyll a in the test period, $\bar{O}$ is the mean of the observed values over the test period, $\bar{P}$ is the mean of the model forecast values over the test period, and $t$ is the length of the time series in the test set.
Based on the same inventive concept, a long-term forecasting system for lake cyanobacterial blooms based on STL-RF-LSTM comprises the following modules:
an acquisition module, configured to collect multi-year, month-by-month long time series of chlorophyll a data and environmental influence factor data for the lake, wherein the environmental influence factor data comprise water temperature (WT), dissolved oxygen (DO), pH, potassium permanganate index ($\mathrm{COD_{Mn}}$), ammonia nitrogen ($\mathrm{NH_3}$-N), nitrate nitrogen ($\mathrm{NO_3}$-N), total nitrogen (TN), total phosphorus (TP) and phosphate ($\mathrm{PO_4^{3-}}$);
an STL decomposition module, configured to perform STL decomposition on the month-by-month long time series of chlorophyll a to obtain its trend term, seasonal term and random term;
an RF screening module, configured to evaluate the importance of the related environmental factors with an RF model, based on the chlorophyll a trend term obtained from the STL decomposition, to obtain the key environmental influence factors affecting the cyanobacterial bloom;
an LSTM training and forecasting module, configured to divide the monthly data in the training set by year and take the data of the last year as the validation set; construct LSTM models with backtracking step lengths of 1 to 12; use the LSTM models to establish, for each backtracking step length, the correspondence between the key environmental influence factors and chlorophyll a from the chlorophyll a trend term after STL decomposition and the monitoring data of the key environmental factors screened by the RF model; evaluate model accuracy to determine the optimal backtracking step length; and input the collected key environmental factor data used for the future forecast into the LSTM model built with the optimal backtracking step length to obtain the long-term forecast of the future cyanobacterial bloom.
Based on the same inventive concept, the invention provides a computer system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the above long-term forecasting method for lake cyanobacterial blooms based on STL-RF-LSTM.
Beneficial effects: The method uses an STL decomposition model to remove the noise that random factors introduce into the long time series of lake cyanobacterial bloom data, yielding the true historical trend series of the bloom. It then uses an RF model to identify the key environmental influence factors of the bloom, avoiding the influence of data redundancy. Finally, the key environmental influence factor data at the current time are substituted into the trained model to obtain the future long-term trend of the lake's cyanobacterial bloom, providing a reference for decisions in lake water environment management. Compared with the prior art, the invention has the following beneficial effects:
1. The STL decomposition model is used to decompose the noisy long-term chlorophyll a sequence, which effectively extracts the long-term trend of chlorophyll a and prevents potential noise in the monitoring data from degrading the accuracy of long-term chlorophyll a forecasting.
2. Screening the key environmental factors of the cyanobacterial bloom with the RF model effectively accounts for the nonlinear relationship between the bloom and its environmental influence factors, identifies the key factors accurately, and reduces the influence of data redundancy on long-term chlorophyll a forecasting.
3. The LSTM model is used for long-term forecasting of chlorophyll a and considers the dependency between chlorophyll a and its environmental influence factors across earlier and later times. The optimal backtracking step length is determined by comparing the bloom forecasting performance under different backtracking step lengths, which gives the optimal forecasting model. The long-term forecast of the future bloom is obtained from the environmental factor monitoring data at the current time, so the future environmental factors do not need to be forecast, the accumulation of errors is reduced, and the accuracy of the forecast is further safeguarded.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 shows the chlorophyll a time series of one site and its STL decomposition in an embodiment of the present invention.
FIG. 3 compares the forecast results of the LSTM and ARIMA models in the training phase and the testing phase in an embodiment of the present invention.
FIG. 4 compares the accuracy of the LSTM and ARIMA models in the training phase and the testing phase in an embodiment of the present invention.
Detailed Description
The invention will be further explained with reference to the drawings and the specific embodiments.
As shown in FIG. 1, the long-term forecasting method for lake cyanobacterial blooms based on STL-RF-LSTM disclosed in this embodiment of the invention comprises the following steps:
(1) Collect multi-year, month-by-month long time series of chlorophyll a data for the lake together with its environmental influence factor data, including water temperature (WT), dissolved oxygen (DO), pH, potassium permanganate index ($\mathrm{COD_{Mn}}$), ammonia nitrogen ($\mathrm{NH_3}$-N), nitrate nitrogen ($\mathrm{NO_3}$-N), total nitrogen (TN), total phosphorus (TP) and phosphate ($\mathrm{PO_4^{3-}}$). In this embodiment, month-by-month chlorophyll a long time series were collected over 16 consecutive years, from 2000 to 2015, at three sites in Taihu Lake (Dapukou, Tuoshan and Jiaoshan), together with the corresponding long time series of the environmental influence factors listed above.
(2) Perform STL decomposition on the month-by-month long time series of chlorophyll a to obtain its seasonal term, trend term and random term. The specific steps are as follows:
(2.1) Determine the trend window and the seasonal window. The seasonal window is set to 11 and the trend window is selected automatically, which generally balances overfitting of the seasonality against allowing it to vary slowly over time.
(2.2) Substitute the chlorophyll a data into the STL model with the set parameters to obtain the trend term, seasonal term and random term of the chlorophyll a long time series. The STL decomposition of the long-term chlorophyll a series at Dapukou is shown in FIG. 2, where FIG. 2(b) is the long-term trend of chlorophyll a at Dapukou after decomposition.
STL decomposition is a time series decomposition method that uses robust locally weighted regression as its smoother. It reduces the importance of data feature screening provided that enough training data are available, combines the simplicity of traditional linear regression with the robustness of nonlinear regression, and decomposes data into a trend term, a seasonal term and a random term. Applying the STL model to a long-term chlorophyll a monitoring data set that contains noise and a clear seasonal pattern effectively extracts the historical long-term trend of chlorophyll a.
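The additive structure of the decomposition can be illustrated with a short sketch. This is not STL itself: it swaps in a classical additive decomposition (centred moving-average trend plus per-month seasonal means) as a simpler stand-in, so the loess smoothing, the robustness weights and the seasonal window of 11 are not represented. The function name `classical_additive_decompose` and the synthetic series are hypothetical, and an even period (12 for monthly data) is assumed.

```python
import math


def classical_additive_decompose(y, period=12):
    """Split y into trend T, seasonal S and remainder R with y = T + S + R.

    Simplified stand-in for STL (assumption, not the patented procedure):
    the trend is a centred 2 x period moving average and the seasonal term
    is the mean detrended value at each position within the period.
    """
    if period % 2 != 0 or len(y) < 2 * period:
        raise ValueError("expects an even period and at least two full periods")
    n, half = len(y), period // 2
    trend = [None] * n  # undefined for the first and last `half` points
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        # End points get weight 1/2 so the total weight equals `period`.
        trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    sums, counts = [0.0] * period, [0] * period
    for t in range(half, n - half):
        sums[t % period] += y[t] - trend[t]
        counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    offset = sum(means) / period  # centre the seasonal term on zero
    seasonal = [means[t % period] - offset for t in range(n)]
    remainder = [
        None if trend[t] is None else y[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


if __name__ == "__main__":
    # Synthetic 16-year monthly series: linear trend plus an annual cycle.
    series = [10 + 0.05 * t + 3 * math.sin(2 * math.pi * t / 12) for t in range(192)]
    T, S, R = classical_additive_decompose(series)
    print(round(T[6], 3), round(S[6], 3), round(R[6], 3))
```

Only the trend component would be passed on to the later screening and forecasting steps; a full STL implementation would replace the moving average and monthly means with iterated loess smoothers.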
(3) Based on the chlorophyll a trend term obtained from the STL decomposition, evaluate the importance of the related environmental factors with an RF (random forest) model to obtain the key environmental influence factors affecting the cyanobacterial bloom. The linear correlation coefficient used in traditional Pearson correlation analysis performs poorly on the nonlinear relationship between the bloom and its environmental influence factors. The invention instead uses the RF model, a Bagging ensemble algorithm built with decision trees as base learners, which can account for nonlinear relationships between variables, screen out from many environmental influence factors the key ones that strongly affect the bloom, and retain the useful information while removing redundant and irrelevant information. The specific steps are as follows:
(3.1) Set the number $N$ of decision trees in the random forest to 100, use each decision tree $l_n$ to predict its corresponding OOB data, and calculate the root mean square error $e_n$ on the OOB data.
(3.2) Keep the other environmental factors of the OOB data unchanged, permute only the sequence of values of the $i$-th environmental factor, and recalculate the root mean square error of the prediction, $\tilde{e}_n^{\,i}$.
(3.3) Calculate the importance $\mu_n^i$ of the $i$-th environmental factor for each decision tree, with the calculation formula:
$$\mu_n^i = \tilde{e}_n^{\,i} - e_n$$
Repeat the above calculation steps until the whole random forest model has been traversed to obtain the importance of environmental factor $i$:
$$\mu_i = \frac{1}{N}\sum_{n=1}^{N}\mu_n^i$$
Calculated with the above formulas, the cumulative contribution rate of seven environmental factors, namely water temperature, dissolved oxygen, pH, $\mathrm{COD_{Mn}}$, $\mathrm{NH_3}$-N, TN and TP, exceeds 90%; that is, these factors are highly important for forecasting chlorophyll a and are the key environmental factors influencing it.
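The permutation step in (3.1) to (3.3) can be sketched as follows. This is a minimal illustration under stated assumptions: the "trees" are plain Python callables rather than fitted decision trees, the importance is taken as the increase in RMSE after shuffling one feature (a common form of permutation importance, assumed here), and the names `permutation_importance` and `forest_importance` are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, targets, feature_index, rng):
    """Increase in RMSE for one tree after shuffling one feature column."""
    baseline = rmse(targets, [predict(r) for r in rows])
    column = [r[feature_index] for r in rows]
    rng.shuffle(column)  # break the link between this feature and the target
    permuted_rows = [
        r[:feature_index] + [v] + r[feature_index + 1:]
        for r, v in zip(rows, column)
    ]
    permuted = rmse(targets, [predict(r) for r in permuted_rows])
    return permuted - baseline


def forest_importance(trees, oob_sets, feature_index, rng):
    """Average the per-tree importance over all trees and their OOB samples."""
    scores = [
        permutation_importance(tree, rows, targets, feature_index, rng)
        for tree, (rows, targets) in zip(trees, oob_sets)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [[rng.random(), rng.random()] for _ in range(30)]
    targets = [2.0 * r[0] for r in rows]            # only feature 0 matters
    trees = [lambda r: 2.0 * r[0], lambda r: 2.0 * r[0]]  # stand-in "trees"
    oob_sets = [(rows[:15], targets[:15]), (rows[15:], targets[15:])]
    for i in range(2):
        print(i, forest_importance(trees, oob_sets, i, rng))
```

In this toy setup the unused feature scores zero because shuffling it cannot change any prediction, which is the behaviour the screening step relies on to discard redundant factors.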
(4) Based on the chlorophyll a trend data after STL decomposition and the monitoring data of the key environmental factors screened by the RF model, use the LSTM model to establish the correspondence between the two across earlier and later times, then input the key environmental factor data of the current time into the established LSTM model to obtain the long-term forecast of the future cyanobacterial bloom. The specific steps are as follows:
(4.1) Data normalization: the chlorophyll a data of each site after STL decomposition over the 16 consecutive years, and the seven key environmental influence factor series screened by the RF model, are normalized to the interval [0, 1] with the MinMax method, in the form:
$$x^* = \frac{x - x_{min}}{x_{max} - x_{min}}$$
where $x$ and $x^*$ are the chlorophyll a trend sequence values before and after normalization, and $x_{max}$ and $x_{min}$ are the maximum and minimum of the chlorophyll a trend sequence, respectively.
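The MinMax normalization can be written as a pair of small helpers. This is a sketch under the assumption that each series is scaled independently with its own minimum and maximum; the inverse function, which maps model outputs back to the original units, is an addition for illustration, and both function names are hypothetical.

```python
def minmax_scale(values):
    """Map values linearly onto [0, 1]; also return the min and max used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def minmax_inverse(scaled, lo, hi):
    """Undo minmax_scale, e.g. to express forecasts in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


if __name__ == "__main__":
    scaled, lo, hi = minmax_scale([2.0, 4.0, 6.0])
    print(scaled)                          # [0.0, 0.5, 1.0]
    print(minmax_inverse(scaled, lo, hi))  # [2.0, 4.0, 6.0]
```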
(4.2) Before the LSTM is used to establish the temporal correspondence between chlorophyll a and the key environmental influence factors, the data must be divided into a training set and a test set. Within the training set, the data are divided into k equal parts; the k-th part is taken as the validation set and the remaining k-1 parts are used for training. In this embodiment, the time series of chlorophyll a and its key environmental influence factors has length 192 (month-by-month data for 16 years), of which 180 points form the training set and 12 points form the test set; that is, the LSTM is used to forecast the cyanobacterial bloom trend over the next 12 months. The training set is divided by year into 15 parts, and the 15th part, i.e. the 169th to 180th data points, is taken as the validation set.
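The split described in (4.2) can be sketched with month numbers alone. The sketch assumes 1-based month indices and takes the fifteenth yearly fold to be the last 12 of the 180 training months; the function name `split_monthly_series` is hypothetical.

```python
def split_monthly_series(n_months=192, test_months=12, months_per_year=12):
    """Return 1-based month numbers for the training, validation and test parts."""
    months = list(range(1, n_months + 1))
    train_all = months[:-test_months]   # months 1..180 in the embodiment
    test = months[-test_months:]        # final year, held out for testing
    # Cut the training months into yearly folds; the last fold validates.
    folds = [
        train_all[k:k + months_per_year]
        for k in range(0, len(train_all), months_per_year)
    ]
    validation = folds[-1]
    training = [m for fold in folds[:-1] for m in fold]
    return training, validation, test


if __name__ == "__main__":
    training, validation, test = split_monthly_series()
    print(len(training), validation[0], validation[-1], test[0], test[-1])
    # 168 169 180 181 192
```

Keeping the split chronological, rather than shuffling months, preserves the temporal ordering that the LSTM is meant to learn from.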
(4.3) When the LSTM model is trained, the main parameter settings of the model are shown in Table 1:
TABLE 1 LSTM model parameter values
(4.4) Train the model separately for backtracking step lengths of 1 to 12 to determine the step length that gives the best forecasting accuracy. In the training set, the environmental factors serve as the input sequence $X$ of the model, where $X$ has seven components (the time series of the seven environmental factors), and the chlorophyll a trend sequence serves as the output $Y$ of the model. When the backtracking step length is 1, the sequence values of the $i$-th component of the training-set input sequence $X$ are $x_{i,1}, x_{i,2}, \ldots, x_{i,167}$, with sequence length 167, and the sequence values of the output sequence $Y$ are $y_{i,2}, y_{i,3}, \ldots, y_{i,168}$, also with sequence length 167. Substituting the input sequence $X$ and the output sequence $Y$ into the LSTM model and training it to a certain accuracy gives the correspondence between the seven key environmental influence factors and chlorophyll a for a backtracking step length of 1. The accuracy evaluation indices are NSE, RMSE and $R^2$, calculated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{j=1}^{m}(O_j - P_j)^2}{\sum_{j=1}^{m}(O_j - \bar{O})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}(O_j - P_j)^2}$$
$$R^2 = \frac{\left[\sum_{j=1}^{m}(O_j - \bar{O})(P_j - \bar{P})\right]^2}{\sum_{j=1}^{m}(O_j - \bar{O})^2\,\sum_{j=1}^{m}(P_j - \bar{P})^2}$$
where $m$ is the length of the time series used for training, $O_j$ is the observed value of chlorophyll a in the training period, $P_j$ is the model forecast value of chlorophyll a in the training period, $\bar{O}$ is the mean of the observed values over the training period, and $\bar{P}$ is the mean of the model forecast values over the training period. The accuracy evaluation results of the three sites in the training period are shown in FIG. 4 (a), (c) and (e).
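The pairing of inputs and outputs for a given backtracking step length in (4.4) amounts to shifting the target series forward by that many months. The sketch below shows only this alignment, not the LSTM itself; the function name `make_lagged_pairs` and the toy data are hypothetical.

```python
def make_lagged_pairs(features, target, step):
    """Pair the feature vector of month j with the target of month j + step.

    features: list of per-month feature vectors (one value per factor)
    target:   list of per-month chlorophyll a trend values
    """
    if len(features) != len(target):
        raise ValueError("features and target must cover the same months")
    if not 1 <= step < len(target):
        raise ValueError("step must be between 1 and len(target) - 1")
    inputs = features[:-step]   # months 1 .. n - step
    outputs = target[step:]     # months 1 + step .. n
    return inputs, outputs


if __name__ == "__main__":
    # 168 training months with a single toy factor equal to the month number.
    features = [[float(m)] for m in range(1, 169)]
    target = [float(m) for m in range(1, 169)]
    X, Y = make_lagged_pairs(features, target, step=1)
    print(len(X), X[0], Y[0], X[-1], Y[-1])  # 167 [1.0] 2.0 [167.0] 168.0
```

With 168 training months and a step of 1 this yields 167 pairs, matching the sequence lengths stated in the embodiment; larger steps shorten the usable sequence accordingly.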
(4.5) For the long-term forecast of the following year, the models with different backtracking step lengths trained in the training period are compared on the accuracy of their forecasts to determine the backtracking step length with the highest forecasting accuracy for each site. The accuracy evaluation indices are the Nash-Sutcliffe efficiency coefficient (NSE), the root mean square error (RMSE) and the correlation coefficient ($R^2$), calculated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{j=1}^{t}(O_j - P_j)^2}{\sum_{j=1}^{t}(O_j - \bar{O})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{j=1}^{t}(O_j - P_j)^2}$$
$$R^2 = \frac{\left[\sum_{j=1}^{t}(O_j - \bar{O})(P_j - \bar{P})\right]^2}{\sum_{j=1}^{t}(O_j - \bar{O})^2\,\sum_{j=1}^{t}(P_j - \bar{P})^2}$$
where $t$ is the length of the time series in the test set, $O_j$ is the observed value of chlorophyll a in the test period, $P_j$ is the model forecast value of chlorophyll a in the test period, $\bar{O}$ is the mean of the observed values over the test period, and $\bar{P}$ is the mean of the model forecast values over the test period. The accuracy evaluation results of the three sites in the test period are shown in FIG. 4 (b), (d) and (f).
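The three accuracy indices can be computed with a few lines of plain Python. The sketch uses the standard definitions of the Nash-Sutcliffe efficiency, the root mean square error and the squared Pearson correlation, which is an assumption about the exact form intended; the function names are hypothetical.

```python
import math


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def rmse(obs, pred):
    """Root mean square error in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and forecasts."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    forecast = [2.0, 3.0, 4.0, 5.0]  # right shape, constant bias of +1
    print(nse(observed, forecast), rmse(observed, forecast), r_squared(observed, forecast))
```

The biased example shows why the indices are reported together: the correlation stays at 1 because the shape is right, while NSE and RMSE both penalize the constant offset.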
FIG. 3 shows the long-term chlorophyll a forecast for the next year at the optimal backtracking step length of each site. The optimal backtracking step lengths for Dapukou, Jiaoshan and Tuoshan are 5, 2 and 1, respectively.
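Choosing the optimal backtracking step length is a small search over the candidates 1 to 12. The sketch below assumes the comparison uses a single score where higher is better (for example NSE on a held-out period); the text does not state which index decides, so this is an assumption. `score_for_step` is a hypothetical callable that would train an LSTM for the given step and return its score, and the scorer in the demo is a toy stand-in, not a trained model.

```python
def select_backtracking_step(score_for_step, steps=range(1, 13)):
    """Evaluate every candidate step and return the best one with all scores.

    score_for_step: callable mapping a step length to an accuracy score,
                    where a higher score means a better model (assumption).
    """
    scores = {s: score_for_step(s) for s in steps}
    best = max(scores, key=scores.get)  # first maximum wins on ties
    return best, scores


if __name__ == "__main__":
    # Toy scorer that peaks at step 5; a real one would fit and evaluate an LSTM.
    def toy_score(step):
        return 1.0 - abs(step - 5) / 10.0

    best, scores = select_backtracking_step(toy_score)
    print(best, scores[best])  # 5 1.0
```

Running the search once per site would produce one optimal step length per site, as in the embodiment.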
Compared with ARIMA, a conventional time series analysis model, the LSTM model of this method achieves higher chlorophyll a forecasting accuracy.
In summary, the method collects long-term monitoring data on lake cyanobacterial bloom and its related environmental influence factors, and applies STL time-series decomposition to the bloom data to remove potential noise and obtain the long-term trend of the bloom. Because the relationship between cyanobacterial bloom and its environmental influence factors is nonlinear, and conventional Pearson correlation analysis yields only linear correlation coefficients and performs poorly on nonlinearly related variables, an RF model is used to screen the key environmental factors of the bloom; this accounts for the nonlinear relationship between the environmental factors and the bloom while also avoiding redundant information in the data. Finally, an LSTM model establishes the correspondence between the bloom and earlier values of its key environmental influence factors, and the monitoring data of those factors measured at the current time are substituted into this correspondence to obtain the future bloom forecast. The environmental factors at future times therefore do not need to be forecast, which avoids the accumulation of forecast errors and can effectively improve the accuracy of the bloom forecast.
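The limitation of Pearson correlation mentioned above can be seen on a toy case. The sketch below uses made-up values, not lake data: y is completely determined by x, yet because the dependence is symmetric and nonlinear, the linear correlation coefficient is essentially zero. The function name `pearson` is illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


# x runs symmetrically from -2 to 2; y = x^2 depends on x exactly but not linearly.
x = [i / 10 for i in range(-20, 21)]
y = [v * v for v in x]

r_nonlinear = pearson(x, y)                   # close to 0 despite full dependence
r_linear = pearson(x, [3 * v + 1 for v in x])  # close to 1 for a linear relation
print(r_nonlinear, r_linear)
```

A screening step based only on this coefficient would discard x as irrelevant, which is the motivation given in the text for using a model-based importance measure instead.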
The invention couples the STL, RF, and LSTM models. It can accurately extract the long-term trend of the cyanobacterial bloom, identify the key environmental influence factors affecting the bloom, and, by taking into account the temporal correlation between the bloom and those factors, obtain more accurate long-term forecasts of future cyanobacterial bloom.
Based on the same inventive concept, the STL-RF-LSTM-based long-term forecasting system for lake cyanobacterial bloom provided by the embodiment of the invention comprises the following modules: an acquisition module for collecting multi-year monthly long-term chlorophyll a data and environmental influence factor data for the lake; an STL decomposition module for performing STL decomposition on the monthly long-term chlorophyll a series to obtain the trend, seasonal, and random components of chlorophyll a; an RF screening module for evaluating the importance of the related environmental factors with an RF model, based on the chlorophyll a trend component obtained from the STL decomposition, to obtain the key environmental influence factors affecting cyanobacterial bloom; and an LSTM training and forecasting module for dividing the monthly data in the training set by year and taking the last year's data as the validation set, constructing LSTM models with backtracking step lengths of 1 to 12, using the LSTM models to establish the correspondence between the key environmental influence factors and chlorophyll a at each backtracking step length from the STL-decomposed chlorophyll a trend data and the monitoring data of the key environmental factors screened by the RF model, evaluating model accuracy to determine the optimal backtracking step length, and inputting the collected key environmental factor data used for forecasting into the LSTM model built with the optimal backtracking step length to obtain the long-term forecast of future cyanobacterial bloom. For the specific implementation of each module, refer to the method embodiments above; the details are not repeated here.
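The output of the decomposition module is three additive series whose sum reproduces the monthly observations. The sketch below illustrates that additive form only. It is not STL: STL estimates the components by iterated LOESS smoothing, whereas this stand-in uses a classical centered moving average for the trend and per-month means for the seasonal component. The function name `additive_decompose` and the synthetic series are assumptions made for illustration.

```python
import math


def additive_decompose(y, period=12):
    """Classical additive decomposition (stand-in for STL): y = trend + seasonal + remainder.

    Trend is undefined (None) for the first and last period // 2 points.
    """
    n, half = len(y), period // 2
    if period % 2 or n < 2 * period:
        raise ValueError("need an even period and at least two full cycles")
    trend = [None] * n
    for j in range(half, n - half):
        w = y[j - half:j + half + 1]
        # Centered 2 x period moving average: the two end points get half weight.
        trend[j] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / period
    # Seasonal component: mean detrended value for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for j in range(n):
        if trend[j] is not None:
            buckets[j % period].append(y[j] - trend[j])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # shift so the seasonal terms average to zero
    seasonal = [means[j % period] - centre for j in range(n)]
    remainder = [None if trend[j] is None else y[j] - trend[j] - seasonal[j]
                 for j in range(n)]
    return trend, seasonal, remainder


# Synthetic three-year monthly series: linear trend plus an annual cycle.
y = [10 + 0.1 * j + 2 * math.sin(2 * math.pi * j / 12) for j in range(36)]
trend, seasonal, remainder = additive_decompose(y)
print(trend[12], seasonal[3], remainder[12])
```

On this noise-free series the recovered trend follows the line 10 + 0.1 j and the remainder is numerically zero. With real monitoring data the remainder would carry the irregular fluctuations that the method sets aside before modelling the trend.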
Based on the same inventive concept, the embodiment of the invention provides a computer system, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program is loaded into the processor to realize the long-term prediction method of lake cyanobacterial bloom based on STL-RF-LSTM.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
1. A lake cyanobacterial bloom long-term forecasting method based on STL-RF-LSTM is characterized by comprising the following steps:
(1) collecting multi-year monthly long-term chlorophyll a data and environmental influence factor data for the lake, wherein the environmental influence factor data comprise water temperature WT, dissolved oxygen DO, pH, permanganate index COD$_{\mathrm{Mn}}$, ammonia nitrogen NH$_3$-N, nitrate nitrogen NO$_3$-N, total nitrogen TN, total phosphorus TP, and phosphate PO$_4^{3-}$;
(2) performing STL decomposition on the monthly long-term chlorophyll a series to obtain the trend, seasonal, and random components of chlorophyll a;
(3) based on the chlorophyll a trend component obtained from the STL decomposition, evaluating the importance of the related environmental factors with an RF (random forest) model to obtain the key environmental influence factors affecting cyanobacterial bloom;
(4) dividing the monthly data in the training set by year and taking the last year's data as the validation set; constructing LSTM models with backtracking step lengths of 1 to 12; using the LSTM models to establish the correspondence between the key environmental influence factors and chlorophyll a at each backtracking step length, from the STL-decomposed chlorophyll a trend data and the monitoring data of the key environmental factors screened by the RF model; evaluating model accuracy to determine the optimal backtracking step length; and inputting the collected key environmental factor data used for forecasting into the LSTM model built with the optimal backtracking step length to obtain the long-term forecast of future cyanobacterial bloom.
2. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein the STL decomposition of the monthly long-term chlorophyll a series in step (2) comprises:
determining the sizes of the trend window and the seasonal window;
decomposing the chlorophyll a time series with the STL model into three additive components, expressed as:
$$Y_t = T_t + S_t + R_t$$
where $Y_t$ is the observed value of chlorophyll a at time t, and $T_t$, $S_t$, and $R_t$ are the trend, seasonal, and random fluctuation components of the observed value, respectively.
3. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein, when the RF model is used to screen the key environmental factors in step (3), the relative importance of each environmental factor is calculated using the out-of-bag (OOB) data from the random forest sampling process, comprising:
determining the number N of decision trees in the random forest, using each decision tree $l_n$ to predict its corresponding OOB data, and calculating the root mean square error $e_n$ on the OOB data;
keeping the other environmental factors of the OOB data unchanged, shuffling only the value sequence of the i-th environmental factor, and recalculating the prediction root mean square error $e_n^{i}$;
calculating the importance $\mu_n^{i}$ of the i-th environmental factor for each decision tree $l_n$, as follows:
$$\mu_n^{i} = e_n^{i} - e_n$$
repeating the above calculation until the whole random forest model has been traversed to obtain the importance $\mu^{i}$ of environmental factor i, as follows:
$$\mu^{i} = \frac{1}{N}\sum_{n=1}^{N}\mu_n^{i}$$
4. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein in step (4) the MinMax method is used to normalize the chlorophyll a trend data and the environmental influence factor data, transforming the data range to [0, 1].
5. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein in step (4) the environmental factors are taken as the model input sequence X and the chlorophyll a trend sequence is taken as the model output Y, and, for the LSTM model with backtracking step length s, the values of the i-th component of the training-set input sequence X are $x_{i,1}, x_{i,2}, \dots, x_{i,m}$, where m is the length of the time series, and the values of the output sequence Y are $y_{i,1+s}, y_{i,2+s}, \dots, y_{i,m+s}$.
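6. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein the LSTM model accuracy evaluation index in step (4) is selected from the Nash coefficient NSE, the root mean square error RMSE, or the correlation coefficient $R^2$, calculated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{j=1}^{t}\left(y_j^{\mathrm{obs}} - y_j^{\mathrm{pre}}\right)^2}{\sum_{j=1}^{t}\left(y_j^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}}\right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{j=1}^{t}\left(y_j^{\mathrm{obs}} - y_j^{\mathrm{pre}}\right)^2}$$
$$R^2 = \frac{\left[\sum_{j=1}^{t}\left(y_j^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}}\right)\left(y_j^{\mathrm{pre}} - \bar{y}^{\mathrm{pre}}\right)\right]^2}{\sum_{j=1}^{t}\left(y_j^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}}\right)^2 \sum_{j=1}^{t}\left(y_j^{\mathrm{pre}} - \bar{y}^{\mathrm{pre}}\right)^2}$$
where $y_j^{\mathrm{obs}}$ is the observed value of chlorophyll a in the test period, $y_j^{\mathrm{pre}}$ is the model-predicted value of chlorophyll a in the test period, $\bar{y}^{\mathrm{obs}}$ is the mean of the observed values in the test period, $\bar{y}^{\mathrm{pre}}$ is the mean of the model-predicted values in the test period, and t is the length of the time series in the test set.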
7. An STL-RF-LSTM-based long-term forecasting system for lake cyanobacterial bloom, characterized by comprising the following modules:
an acquisition module for collecting multi-year monthly long-term chlorophyll a data and environmental influence factor data for the lake, wherein the environmental influence factor data comprise water temperature WT, dissolved oxygen DO, pH, permanganate index COD$_{\mathrm{Mn}}$, ammonia nitrogen NH$_3$-N, nitrate nitrogen NO$_3$-N, total nitrogen TN, total phosphorus TP, and phosphate PO$_4^{3-}$;
an STL decomposition module for performing STL decomposition on the monthly long-term chlorophyll a series to obtain the trend, seasonal, and random components of chlorophyll a;
an RF screening module for evaluating the importance of the related environmental factors with an RF (random forest) model, based on the chlorophyll a trend component obtained from the STL decomposition, to obtain the key environmental influence factors affecting cyanobacterial bloom;
an LSTM training and forecasting module for dividing the monthly data in the training set by year and taking the last year's data as the validation set; constructing LSTM models with backtracking step lengths of 1 to 12; using the LSTM models to establish the correspondence between the key environmental influence factors and chlorophyll a at each backtracking step length, from the STL-decomposed chlorophyll a trend data and the monitoring data of the key environmental factors screened by the RF model; evaluating model accuracy to determine the optimal backtracking step length; and inputting the collected key environmental factor data used for forecasting into the LSTM model built with the optimal backtracking step length to obtain the long-term forecast of future cyanobacterial bloom.
8. The STL-RF-LSTM-based long-term forecasting system for lake cyanobacterial bloom as claimed in claim 7, wherein the STL decomposition module, when performing STL decomposition on the monthly long-term chlorophyll a series, carries out the following:
determining the sizes of the trend window and the seasonal window;
decomposing the chlorophyll a time series with the STL model into three additive components, expressed as:
$$Y_t = T_t + S_t + R_t$$
where $Y_t$ is the observed value of chlorophyll a at time t, and $T_t$, $S_t$, and $R_t$ are the trend, seasonal, and random fluctuation components of the observed value, respectively.
9. The STL-RF-LSTM-based long-term forecasting system for lake cyanobacterial bloom as claimed in claim 7, wherein, when the RF screening module uses the RF model to screen the key environmental factors, the relative importance of each environmental factor is calculated using the out-of-bag (OOB) data from the random forest sampling process, comprising:
determining the number N of decision trees in the random forest, using each decision tree $l_n$ to predict its corresponding OOB data, and calculating the root mean square error $e_n$ on the OOB data;
keeping the other environmental factors of the OOB data unchanged, shuffling only the value sequence of the i-th environmental factor, and recalculating the prediction root mean square error $e_n^{i}$;
calculating the importance $\mu_n^{i}$ of the i-th environmental factor for each decision tree $l_n$, as follows:
$$\mu_n^{i} = e_n^{i} - e_n$$
repeating the above calculation until the whole random forest model has been traversed to obtain the importance $\mu^{i}$ of environmental factor i, as follows:
$$\mu^{i} = \frac{1}{N}\sum_{n=1}^{N}\mu_n^{i}$$
10. A computer system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom according to any one of claims 1 to 6.
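The out-of-bag importance calculation recited in claims 3 and 9 can be sketched as follows. This is a minimal illustration under several assumptions: hand-written predictor functions stand in for trained decision trees, the data are synthetic, the per-tree importance is taken as the increase in root mean square error after shuffling one factor, and the name `permutation_importance` is hypothetical. The sign convention and the averaging over trees are the usual choice for permutation importance and may differ in detail from the patent's own formulas.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(trees, oob_sets, feature, rng):
    """Average over trees of (OOB RMSE with `feature` shuffled) - (baseline OOB RMSE)."""
    diffs = []
    for predict, (rows, targets) in zip(trees, oob_sets):
        base = rmse(targets, [predict(r) for r in rows])
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # disturb only this factor, keep the others fixed
        shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        permuted = rmse(targets, [predict(r) for r in shuffled])
        diffs.append(permuted - base)
    return sum(diffs) / len(diffs)


# Synthetic data: two factors, the target depends only on factor 0.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(30)]
targets = [3.0 * r[0] for r in rows]

# Two stand-in "trees" (plain functions, not fitted models), each with its own OOB subset.
trees = [lambda r: 3.0 * r[0], lambda r: 3.0 * r[0] + 0.1]
oob_sets = [(rows[:15], targets[:15]), (rows[15:], targets[15:])]

importance = [permutation_importance(trees, oob_sets, i, random.Random(2)) for i in range(2)]
print(importance)
```

Shuffling factor 0 degrades the predictions, so its importance is positive; shuffling factor 1 leaves every prediction unchanged, so its importance is zero. Ranking factors by this value and keeping the top ones corresponds to the screening step.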
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210063970.0A CN114386710A (en) | 2022-01-20 | 2022-01-20 | Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114386710A true CN114386710A (en) | 2022-04-22 |
Family
ID=81203013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210063970.0A Pending CN114386710A (en) | 2022-01-20 | 2022-01-20 | Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114386710A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115575363A (en) * | 2022-09-27 | 2023-01-06 | 北京航空航天大学 | Method and system for acquiring ecological influence mechanism |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104899653A (en) * | 2015-06-02 | 2015-09-09 | 北京工商大学 | Lake and reservoir cyanobacterial bloom prediction method based on expert system and cyanobacterial growth mechanism timing model |
CN109308544A (en) * | 2018-08-21 | 2019-02-05 | 北京师范大学 | Cyanobacterial bloom prediction method based on contrastive divergence and long short-term memory network |
Non-Patent Citations (2)
Title |
---|
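Zhang Yifei et al.: "Study on the variation characteristics of chlorophyll a concentration in Taihu Lake using a nonparametric seasonal decomposition model", Sichuan Environment * |
Hao Yuying et al.: "Surface water quality prediction based on RF-LSTM", Journal of Water Resources and Water Engineering * |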
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20220422 |