CN114386710A - Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM - Google Patents

Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM

Info

Publication number
CN114386710A
Authority
CN
China
Prior art keywords
data
chlorophyll
model
stl
long
Legal status
Pending
Application number
CN202210063970.0A
Other languages
Chinese (zh)
Inventor
陈求稳
陈诚
张建云
李港
何梦男
林育青
李夫健
胡维鑫
Current Assignee
Jiangsu Shouping Information Industry Co ltd
Nanjing Hydraulic Research Institute of National Energy Administration Ministry of Transport Ministry of Water Resources
Original Assignee
Jiangsu Shouping Information Industry Co ltd
Nanjing Hydraulic Research Institute of National Energy Administration Ministry of Transport Ministry of Water Resources
Application filed by Jiangsu Shouping Information Industry Co ltd, Nanjing Hydraulic Research Institute of National Energy Administration Ministry of Transport Ministry of Water Resources
Priority to CN202210063970.0A
Publication of CN114386710A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a long-term forecasting method and system for lake cyanobacterial blooms based on STL-RF-LSTM. Long time series of monitoring data for the cyanobacterial bloom and its related environmental factors are collected, and STL time series decomposition is applied to the bloom data to remove potential noise and obtain the long-term trend of the bloom. An RF model is then used to screen the key environmental factors of the bloom, which accounts for the nonlinear relation between the environmental factors and the bloom while avoiding the influence of redundant data. Finally, an LSTM model establishes the time-lagged correspondence between the bloom and its key environmental factors, and the monitoring data of the key environmental factors measured at the current time are substituted into this correspondence to obtain the future bloom forecast. Because the environmental factors at future times do not need to be forecast, the accumulation of forecast errors is avoided and the accuracy of the cyanobacterial bloom forecast can be effectively improved.

Description

Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM
Technical Field
The invention relates to the field of lake water ecological environment management, in particular to a long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM.
Background
With global warming and the discharge of large amounts of nutrients such as nitrogen and phosphorus into lake waters, lake eutrophication is becoming increasingly severe. Cyanobacterial blooms in lakes are breaking out more frequently and at larger scales, which seriously affects the health of the lake water ecological environment and, in severe cases, even threatens human social life and economic production. Controlling lake cyanobacterial blooms is a complex systems engineering task: although large amounts of manpower and material resources have been invested, the treatment effect takes a long time to appear. Long-term forecasting of the future trend based on the historical behavior of lake cyanobacterial blooms makes it possible to test how treatment measures taken earlier improve the long-term evolution of the bloom, and is also an important means of ecological management of the lake water environment.
Under the interference of various uncertain factors, the time series of lake cyanobacterial blooms is often non-stationary, and feeding such data directly into a model degrades forecast accuracy. When chlorophyll a is used to characterize the bloom, its monitoring is affected by random factors such as weather and instrument precision, so the measured values contain noise that can mask the true pattern of change in the chlorophyll a series. In addition, cyanobacterial blooms are influenced by many environmental factors, but modeling with all of them introduces high data redundancy, so the key environmental factors need to be screened. The linear correlation coefficient used in traditional Pearson correlation analysis performs poorly on the nonlinear relation between the bloom and its environmental factors. A new forecasting method for lake cyanobacteria is therefore needed to achieve high-accuracy long-term bloom forecasting and to provide a reference for decisions in lake water environment management.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a long-term forecasting method and system for lake cyanobacterial blooms based on STL-RF-LSTM that removes the influence of noise from the long-term chlorophyll a monitoring series and recovers its true underlying trend; that accounts for the nonlinear relation between the environmental factors and chlorophyll a, screens the environmental factors with a large influence on chlorophyll a, and avoids the influence of data redundancy; and that forecasts the future long-term chlorophyll a trend by taking into account the temporal dependence between chlorophyll a and its environmental factors.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the following technical scheme:
A long-term forecasting method for lake cyanobacterial blooms based on STL-RF-LSTM comprises the following steps:
(1) collecting long time series of monthly chlorophyll a data for the lake over multiple years, together with data on its related environmental factors, wherein the environmental factor data comprise water temperature WT, dissolved oxygen DO, pH, potassium permanganate index $\mathrm{COD_{Mn}}$, ammonia nitrogen $\mathrm{NH_3}$-N, nitrate nitrogen $\mathrm{NO_3}$-N, total nitrogen TN, total phosphorus TP and phosphate $\mathrm{PO_4^-}$;
(2) performing STL decomposition on the monthly chlorophyll a long time series to obtain the trend term, seasonal term and random term of chlorophyll a;
(3) based on the chlorophyll a trend term obtained from the STL decomposition, evaluating the importance of the related environmental factors with an RF (random forest) model to obtain the key environmental factors that influence the cyanobacterial bloom;
(4) dividing the monthly data in the training set by year and taking the data of the last year as the validation set; constructing LSTM models with backtracking step lengths of 1 to 12; using the LSTM models to establish the correspondence between the key environmental factors and chlorophyll a under each backtracking step length, based on the chlorophyll a trend term after STL decomposition and the monitoring data of the key environmental factors screened by the RF model; evaluating model accuracy to determine the optimal backtracking step length; and inputting the collected key environmental factor data used for forecasting into the LSTM model built with the optimal backtracking step length to obtain the long-term forecast of the future cyanobacterial bloom.
Preferably, the STL decomposition of the chlorophyll a long time series in step (2) comprises:
determining the sizes of the trend window and the seasonal window;
decomposing the chlorophyll a time series with the STL model into three additive components, expressed as:
$$Y_t = T_t + S_t + R_t$$
where $Y_t$ is the observed value of chlorophyll a at time $t$, and $T_t$, $S_t$ and $R_t$ are the trend term, seasonal term and random fluctuation term of the observed value, respectively.
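The additive structure above can be illustrated with a short sketch. It does not implement STL itself: as a plainly simpler stand-in it uses the classical centered moving-average decomposition (no LOESS smoothing, no robustness weights), which is enough to show how a monthly series splits into trend, seasonal and remainder terms that add back to the observation. The function name `additive_decompose` and the synthetic series are hypothetical.

```python
from statistics import mean

def additive_decompose(y, period=12):
    """Split y into trend, seasonal and remainder lists with y = T + S + R.

    Classical moving-average decomposition, used here as a simpler
    stand-in for STL (no LOESS smoothing, no robustness weights).
    """
    if period % 2 != 0:
        raise ValueError("this sketch assumes an even period, e.g. 12 months")
    n, half = len(y), period // 2
    # Trend T_t: centered 2 x period moving average, undefined at both ends.
    trend = [None] * n
    for t in range(half, n - half):
        w = y[t - half:t + half + 1]
        trend[t] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / period
    # Seasonal S_t: mean detrended value per calendar month, shifted to zero mean.
    by_month = [[] for _ in range(period)]
    for t in range(half, n - half):
        by_month[t % period].append(y[t] - trend[t])
    raw = [mean(values) for values in by_month]
    offset = mean(raw)
    seasonal = [raw[t % period] - offset for t in range(n)]
    # Remainder R_t: what is left after removing trend and seasonal terms.
    remainder = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, remainder

# Hypothetical monthly series: linear trend plus a zero-mean yearly pattern.
pattern = [3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2]
series = [10 + 0.1 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, remainder = additive_decompose(series)
```

In practice an STL implementation from a statistics library would replace this function; only the trend list would then be passed on to the screening and forecasting steps.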
Preferably, when the RF model is used to screen the key environmental factors in step (3), the relative importance of each environmental factor is calculated from the out-of-bag (OOB) data of the random forest sampling process, comprising:
determining the number N of decision trees contained in the random forest, using each decision tree $l_n$ to predict its corresponding OOB data, and calculating the root mean square error $e_n$ of the OOB data;
Keeping other environmental factors of the OOB data unchanged, only disturbing the characteristic value sequence of the ith environmental factor, and recalculating the predicted root mean square error
Figure BDA0003479381750000021
Computing the importance of the ith environmental factor for each decision tree $l_n$
Figure BDA0003479381750000022
The calculation formula is as follows:
Figure BDA0003479381750000031
repeating the calculation steps until the whole random forest model is traversed to obtain the importance mu of the environment factor i, wherein the calculation formula is as follows:
Figure BDA0003479381750000032
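The OOB procedure described above can be sketched as follows. The exact importance formula is not recoverable from this text, so the sketch assumes the common permutation-importance form: the importance of factor i for tree n is the increase in OOB RMSE after shuffling factor i, and the overall importance is the mean of that increase over all trees. The names `permutation_importance`, `toy_trees` and the toy data are hypothetical; plain callables stand in for trained decision trees.

```python
import math
import random

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(trees, oob_sets, n_features, seed=0):
    """trees[n] maps a feature row to a prediction; oob_sets[n] = (X_oob, y_oob)."""
    rng = random.Random(seed)
    importance = [0.0] * n_features
    for predict, (X, y) in zip(trees, oob_sets):
        e_n = rmse(y, [predict(row) for row in X])  # baseline OOB error of tree n
        for i in range(n_features):
            column = [row[i] for row in X]
            rng.shuffle(column)  # perturb only factor i, keep the others unchanged
            X_perm = [row[:i] + [v] + row[i + 1:] for row, v in zip(X, column)]
            e_n_i = rmse(y, [predict(row) for row in X_perm])
            # Assumed form: average RMSE increase over the N trees.
            importance[i] += (e_n_i - e_n) / len(trees)
    return importance

# Toy check: the stand-in "trees" use only feature 0, so feature 1 scores zero.
X_oob = [[float(k), float(k % 3)] for k in range(20)]
y_oob = [2.0 * row[0] for row in X_oob]
toy_trees = [lambda row: 2.0 * row[0]] * 3
importance = permutation_importance(toy_trees, [(X_oob, y_oob)] * 3, n_features=2)
```

A factor the trees never rely on keeps the same OOB error after shuffling, which is why its score stays at zero; this is the property that lets the screening step discard redundant factors.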
Preferably, in step (4), the MinMax method is used to normalize the chlorophyll a trend data and the environmental factor data, transforming the data range to [0,1].
Preferably, in step (4), in the training set the environmental factors are used as the input sequence X of the model and the chlorophyll a trend series is used as the output Y of the model; for an LSTM model with backtracking step length s, the values of the ith component of the training input sequence X are $x_{i,1}, x_{i,2}, \dots, x_{i,m}$, where m is the length of the time series, and the values of the output sequence Y are $y_{i,1+s}, y_{i,2+s}, \dots, y_{i,m+s}$.
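The pairing of inputs at time k with outputs at time k+s can be sketched in a few lines. This is only an illustration of the indexing, under the assumption that both series cover the same months; the helper name `make_lagged_pairs` and the placeholder data are hypothetical.

```python
def make_lagged_pairs(factors, chl_trend, s):
    """Pair factor rows x_1..x_m with trend values y_(1+s)..y_(m+s).

    factors: list of rows, one row of environmental factors per month.
    chl_trend: chlorophyll a trend value per month, same length as factors.
    s: backtracking step length in months.
    """
    if len(factors) != len(chl_trend):
        raise ValueError("factors and chl_trend must cover the same months")
    m = len(chl_trend) - s
    if m <= 0:
        raise ValueError("step length s must be shorter than the series")
    inputs = factors[:m]          # x_1 ... x_m
    targets = chl_trend[s:s + m]  # y_(1+s) ... y_(m+s)
    return inputs, targets

# Placeholder data: 168 months, seven factors per month.
factors = [[float(month)] * 7 for month in range(168)]
chl_trend = [float(month) for month in range(168)]
inputs, targets = make_lagged_pairs(factors, chl_trend, s=1)
```

With 168 months and s = 1 this yields 167 pairs, so each forecast uses only factors that have already been measured.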
Preferably, in step (4), the LSTM model accuracy evaluation indices are the Nash coefficient NSE, the root mean square error RMSE and the correlation coefficient $R^2$, calculated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{j=1}^{t}\left(y_j^{obs} - y_j^{pre}\right)^2}{\sum_{j=1}^{t}\left(y_j^{obs} - \bar{y}^{obs}\right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{j=1}^{t}\left(y_j^{obs} - y_j^{pre}\right)^2}$$
$$R^2 = \frac{\left[\sum_{j=1}^{t}\left(y_j^{obs} - \bar{y}^{obs}\right)\left(y_j^{pre} - \bar{y}^{pre}\right)\right]^2}{\sum_{j=1}^{t}\left(y_j^{obs} - \bar{y}^{obs}\right)^2 \sum_{j=1}^{t}\left(y_j^{pre} - \bar{y}^{pre}\right)^2}$$
where $y_j^{obs}$ is the observed value of chlorophyll a in the test period, $y_j^{pre}$ is the model forecast of chlorophyll a in the test period, $\bar{y}^{obs}$ is the mean of the observed values over the test period, $\bar{y}^{pre}$ is the mean of the model forecasts over the test period, and t is the length of the time series in the test set.
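The three indices can be computed as in the following sketch. It assumes their standard definitions: NSE as one minus the ratio of squared errors to the variance of the observations, RMSE as the root of the mean squared error, and $R^2$ as the squared Pearson correlation between observations and forecasts. The function names and the sample values are hypothetical.

```python
import math

def nse(obs, pred):
    """Nash coefficient: 1 is a perfect fit, 0 matches the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den

def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Squared Pearson correlation between observations and forecasts."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)

observed = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]  # forecast with a constant bias of +1
```

The biased forecast shows why the indices are reported together: its $R^2$ is still 1 because the shape is matched perfectly, while NSE and RMSE both expose the offset.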
Based on the same inventive concept, a long-term forecasting system for lake cyanobacterial blooms based on STL-RF-LSTM comprises the following modules:
the acquisition module is used for collecting long time series of monthly chlorophyll a data for the lake over multiple years, together with data on its related environmental factors, wherein the environmental factor data comprise water temperature WT, dissolved oxygen DO, pH, potassium permanganate index $\mathrm{COD_{Mn}}$, ammonia nitrogen $\mathrm{NH_3}$-N, nitrate nitrogen $\mathrm{NO_3}$-N, total nitrogen TN, total phosphorus TP and phosphate $\mathrm{PO_4^-}$;
the STL decomposition module is used for performing STL decomposition on the monthly chlorophyll a long time series to obtain the trend term, seasonal term and random term of chlorophyll a;
the RF screening module is used for evaluating the importance of the related environmental factors with an RF (random forest) model, based on the chlorophyll a trend term obtained from the STL decomposition, to obtain the key environmental factors that influence the cyanobacterial bloom;
the LSTM training and forecasting module is used for dividing the monthly data in the training set by year and taking the data of the last year as the validation set; constructing LSTM models with backtracking step lengths of 1 to 12; using the LSTM models to establish the correspondence between the key environmental factors and chlorophyll a under each backtracking step length, based on the chlorophyll a trend term after STL decomposition and the monitoring data of the key environmental factors screened by the RF model; evaluating model accuracy to determine the optimal backtracking step length; and inputting the collected key environmental factor data used for forecasting into the LSTM model built with the optimal backtracking step length to obtain the long-term forecast of the future cyanobacterial bloom.
Based on the same inventive concept, the invention provides a computer system, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program realizes the long-term prediction method of lake cyanobacterial bloom based on STL-RF-LSTM when being loaded to the processor.
Beneficial effects: the method uses an STL decomposition model to remove the noise that random factors introduce into the long time series of lake cyanobacterial bloom data, obtaining the true historical trend of the bloom; it then uses an RF model to identify the key environmental factors of the bloom, avoiding the influence of data redundancy; finally, it substitutes the key environmental factor data at the current time into the trained model to obtain the future long-term trend of the lake cyanobacterial bloom, providing a reference for decisions in lake water environment management. Compared with the prior art, the invention has the following beneficial effects:
1. The noisy long-term chlorophyll a series is decomposed with the STL decomposition model, which effectively recovers the long-term trend of chlorophyll a and avoids the influence of potential noise in the monitoring data on the accuracy of long-term chlorophyll a forecasting.
2. Screening the key environmental factors of the cyanobacterial bloom with the RF model accounts for the nonlinear relation between the bloom and its environmental factors, identifies the key environmental factors accurately, and reduces the influence of data redundancy on long-term chlorophyll a forecasting.
3. The LSTM model used for long-term chlorophyll a forecasting takes into account the temporal dependence between chlorophyll a and its environmental factors, and the optimal backtracking step length is determined by comparing the bloom forecasting performance under different backtracking step lengths, giving an optimal forecasting model. The long-term forecast of the future bloom is obtained from environmental factor monitoring data at the current time, so future environmental factors do not have to be forecast, error accumulation is reduced, and the accuracy of the forecast is further ensured.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 shows the chlorophyll a time series of one site and its STL decomposition according to an embodiment of the present invention.
FIG. 3 is a comparison of the training phase and the testing phase of the LSTM and ARIMA models in an embodiment of the present invention.
FIG. 4 is a comparison graph of the accuracy of the LSTM and ARIMA models during the training phase and during the testing phase in an embodiment of the present invention.
Detailed Description
The invention will be further explained with reference to the drawings and the specific embodiments.
As shown in FIG. 1, the long-term forecasting method for lake cyanobacterial bloom based on STL-RF-LSTM disclosed by the embodiment of the invention comprises the following steps:
(1) Collect long time series of monthly chlorophyll a data for the lake over multiple years, together with data on its environmental factors, including water temperature (WT), dissolved oxygen (DO), pH, potassium permanganate index ($\mathrm{COD_{Mn}}$), ammonia nitrogen ($\mathrm{NH_3}$-N), nitrate nitrogen ($\mathrm{NO_3}$-N), total nitrogen (TN), total phosphorus (TP), phosphate ($\mathrm{PO_4^-}$) and the like. In this embodiment, monthly chlorophyll a long time series were collected at three Taihu Lake sites, Dapukou, Tuoshan and Jiaoshan, for 16 consecutive years from 2000 to 2015, together with the corresponding long time series of the environmental factors: water temperature (WT), dissolved oxygen (DO), pH, potassium permanganate index ($\mathrm{COD_{Mn}}$), ammonia nitrogen ($\mathrm{NH_3}$-N), nitrate nitrogen ($\mathrm{NO_3}$-N), total nitrogen (TN), total phosphorus (TP), phosphate ($\mathrm{PO_4^-}$) and the like.
(2) Perform STL decomposition on the monthly chlorophyll a long time series to obtain the seasonal term, trend term and random term of chlorophyll a. The specific steps are as follows:
(2.1) Determine the trend window and the seasonal window. The seasonal window is set to 11 and the trend window is selected automatically by the model, which generally guards against overfitting the seasonal component while allowing it to vary slowly over time.
(2.2) Substitute the chlorophyll a data into the STL model with the parameters set above to obtain the trend term, seasonal term and random term of the chlorophyll a long time series. The STL decomposition of the chlorophyll a long time series at Dapukou is shown in FIG. 2, where (b) in FIG. 2 is the long-term trend of chlorophyll a at Dapukou after decomposition.
STL is a time series decomposition method that uses robust locally weighted regression as its smoother. Provided there is enough training data, it reduces the importance of feature screening, combines the simplicity of traditional linear regression with the robustness of nonlinear regression, and decomposes the data into trend, seasonal and random terms. Applied to a long-term chlorophyll a monitoring data set that contains noise and pronounced seasonality, the STL model can effectively extract the historical long-term trend of chlorophyll a.
(3) Based on the chlorophyll a trend term obtained from the STL decomposition, evaluate the importance of the related environmental factors with an RF (random forest) model to obtain the key environmental factors that influence the cyanobacterial bloom. The linear correlation coefficient used in traditional Pearson correlation analysis performs poorly on the nonlinear relation between the bloom and its environmental factors. The invention instead uses an RF model, a Bagging ensemble algorithm built with decision trees as base learners, which accounts for nonlinear relations between variables, screens out from many environmental factors the key ones with a large influence on the bloom, and retains the useful information while eliminating redundant and irrelevant information. The specific steps are as follows:
(3.1) Set the number N of decision trees in the random forest to 100, use each decision tree $l_n$ to predict its corresponding OOB data, and calculate the root mean square error $e_n$ of the OOB data.
(3.2) keeping other environmental factors of the OOB data unchanged, only disturbing the characteristic value sequence of the ith environmental factor, and recalculating the predicted root mean square error
Figure BDA0003479381750000061
(3.3) calculating the importance of the ith environmental factor to each decision tree
Figure BDA0003479381750000062
The calculation formula is as follows:
Figure BDA0003479381750000063
repeating the calculation steps until the whole random forest model is traversed to obtain the importance of the environment factor i:
Figure BDA0003479381750000064
According to the above formulas, the cumulative contribution rate of seven environmental factors, namely water temperature, dissolved oxygen, pH, $\mathrm{COD_{Mn}}$, $\mathrm{NH_3}$-N, TN and TP, exceeds 90%; these factors are therefore highly important for forecasting chlorophyll a and are the key environmental factors influencing chlorophyll a.
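One way to turn importance scores into a set of key factors by cumulative contribution is sketched below. The selection rule (rank by importance and keep factors until their cumulative share reaches the threshold) is an assumption, since the text only states that the seven retained factors together exceed 90%. The function name `select_key_factors`, the factor labels and the scores are hypothetical placeholders, not values from the embodiment.

```python
def select_key_factors(importance, threshold=0.90):
    """Keep the highest-ranked factors until their cumulative share reaches threshold."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        selected.append(name)
        cumulative += value / total
        if cumulative >= threshold:
            break
    return selected, cumulative

# Placeholder scores for four unnamed factors.
scores = {"f1": 5.0, "f2": 3.0, "f3": 1.5, "f4": 0.5}
selected, share = select_key_factors(scores)
```

Here the first three factors reach a 95% share, so the fourth is treated as redundant and dropped.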
(4) Based on the chlorophyll a trend data after STL decomposition and the monitoring data of the key environmental factors screened by the RF model, an LSTM model is used to establish the time-lagged correspondence between the two; the key environmental factor data at the current time are then input into the established LSTM model to obtain the long-term forecast of the future cyanobacterial bloom. The specific steps are as follows:
(4.1) Data normalization: the MinMax method is used to normalize the 16 consecutive years of STL-decomposed chlorophyll a data of each site and the data of the seven key environmental factors screened by the RF model to the interval [0,1], in the form:
$$x^* = \frac{x - x_{min}}{x_{max} - x_{min}}$$
where $x$ and $x^*$ are the chlorophyll a trend series values before and after normalization, and $x_{max}$ and $x_{min}$ are the maximum and minimum of the chlorophyll a trend series, respectively.
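A minimal sketch of MinMax scaling and its inverse is given below. The inverse is included on the assumption that model outputs would need to be mapped back to chlorophyll a units; the text does not describe that step. The function names and sample values are hypothetical.

```python
def minmax_scale(values):
    """Scale values to [0, 1]; also return the min and max needed to invert."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be MinMax scaled")
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [v * (hi - lo) + lo for v in scaled]

scaled, lo, hi = minmax_scale([2.0, 4.0, 6.0])
restored = minmax_inverse(scaled, lo, hi)
```

Each series (the chlorophyll a trend and every environmental factor) would be scaled with its own minimum and maximum.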
(4.2) When the LSTM is used to establish the time-lagged correspondence between chlorophyll a and the key environmental factors, the data must be divided into a training set and a test set. The training set is further divided into k equal parts, of which the k-th part is used as the validation set and the remaining k-1 parts are used for training. In this embodiment, the time series of chlorophyll a and its key environmental factors have length 192 (monthly data over 16 years), of which 180 points form the training set and 12 points form the test set, i.e. the LSTM is used to forecast the cyanobacterial bloom trend for the next 12 months. The training set is divided by year into 15 parts, and the 15th part, i.e. the 169th to 180th data points, is used as the validation set.
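The split described in this embodiment can be sketched with index arithmetic alone. The sketch assumes the series is ordered in time and that each yearly part holds 12 consecutive months, so the 15th part covers the 169th to 180th points when counting from 1. The function name `split_series` is hypothetical.

```python
def split_series(n_total=192, n_test=12, months_per_year=12):
    """Return zero-based indices for the fitting, validation and test blocks."""
    train_idx = list(range(n_total - n_test))
    test_idx = list(range(n_total - n_test, n_total))
    # Cut the training block into yearly parts; the last part validates.
    folds = [train_idx[k:k + months_per_year]
             for k in range(0, len(train_idx), months_per_year)]
    fit_idx = [i for fold in folds[:-1] for i in fold]
    val_idx = folds[-1]
    return fit_idx, val_idx, test_idx

fit_idx, val_idx, test_idx = split_series()
```

Keeping the blocks contiguous and in time order means the validation year always follows the data used for fitting, which avoids leaking future months into training.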
(4.3) When training the LSTM model, the main parameter settings of the model are shown in Table 1:
TABLE 1 LSTM model parameter values
Figure BDA0003479381750000071
(4.4) Models are trained with backtracking step lengths of 1 to 12 to determine the step length that gives the best forecast accuracy. In the training set, the environmental factors are the input sequence X of the model, where X has seven components (the time series of the seven environmental factors), and the chlorophyll a trend series is the output Y of the model. When the backtracking step length is 1, the values of the ith component of the training input sequence X are $x_{i,1}, x_{i,2}, \dots, x_{i,167}$ (length 167) and the values of the output sequence Y are $y_{i,2}, y_{i,3}, \dots, y_{i,168}$ (length 167). The input sequence X and output sequence Y are fed into the LSTM model and trained until a certain accuracy is reached, which yields the correspondence between the seven key environmental factors and chlorophyll a for a backtracking step length of 1. The accuracy evaluation indices are NSE, RMSE and $R^2$, calculated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{j=1}^{m}\left(y_j^{obs} - y_j^{pre}\right)^2}{\sum_{j=1}^{m}\left(y_j^{obs} - \bar{y}^{obs}\right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(y_j^{obs} - y_j^{pre}\right)^2}$$
$$R^2 = \frac{\left[\sum_{j=1}^{m}\left(y_j^{obs} - \bar{y}^{obs}\right)\left(y_j^{pre} - \bar{y}^{pre}\right)\right]^2}{\sum_{j=1}^{m}\left(y_j^{obs} - \bar{y}^{obs}\right)^2 \sum_{j=1}^{m}\left(y_j^{pre} - \bar{y}^{pre}\right)^2}$$
where m is the length of the time series used for training, $y_j^{obs}$ is the observed value of chlorophyll a in the training period, $y_j^{pre}$ is the model forecast of chlorophyll a in the training period, $\bar{y}^{obs}$ is the mean of the observed values over the training period, and $\bar{y}^{pre}$ is the mean of the model forecasts over the training period. The accuracy evaluation results of the three sites in the training period are shown in FIG. 4 (a), (c) and (e).
(4.5) For the long-term forecast of the following year, the models with different backtracking step lengths trained in the training period are compared on the accuracy of their forecasts to determine the backtracking step length with the highest forecast accuracy at each site. The accuracy evaluation indices are the Nash coefficient NSE, the root mean square error RMSE and the correlation coefficient $R^2$, calculated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{j=1}^{t}\left(y_j^{obs} - y_j^{pre}\right)^2}{\sum_{j=1}^{t}\left(y_j^{obs} - \bar{y}^{obs}\right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{j=1}^{t}\left(y_j^{obs} - y_j^{pre}\right)^2}$$
$$R^2 = \frac{\left[\sum_{j=1}^{t}\left(y_j^{obs} - \bar{y}^{obs}\right)\left(y_j^{pre} - \bar{y}^{pre}\right)\right]^2}{\sum_{j=1}^{t}\left(y_j^{obs} - \bar{y}^{obs}\right)^2 \sum_{j=1}^{t}\left(y_j^{pre} - \bar{y}^{pre}\right)^2}$$
where t is the length of the time series in the test set, $y_j^{obs}$ is the observed value of chlorophyll a in the test period, $y_j^{pre}$ is the model forecast of chlorophyll a in the test period, $\bar{y}^{obs}$ is the mean of the observed values over the test period, and $\bar{y}^{pre}$ is the mean of the model forecasts over the test period. The accuracy evaluation results of the three sites in the test period are shown in FIG. 4 (b), (d) and (f).
FIG. 3 shows the long-term chlorophyll a forecast for the following year at the optimal backtracking step length of each site. The optimal backtracking step lengths at Dapukou, Jiaoshan and Tuoshan are 5, 2 and 1, respectively.
Compared with the conventional time series analysis model ARIMA, the LSTM model of this method achieves higher chlorophyll a forecasting accuracy.
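The search over backtracking step lengths can be sketched as a loop that fits one model per step length and keeps the best-scoring one. To stay self-contained, the sketch replaces the LSTM with a plainly different stand-in, a one-variable least-squares fit, and scores each step by NSE on the last 12 pairs; only the selection loop reflects the procedure in the text. All names (`linear_lag_score`, `best_backtracking_step`, `signal`) and the synthetic data, in which the target lags the input by 3 months, are hypothetical.

```python
import math
from statistics import mean

def nse(obs, pred):
    mean_obs = mean(obs)
    return 1 - (sum((o - p) ** 2 for o, p in zip(obs, pred))
                / sum((o - mean_obs) ** 2 for o in obs))

def linear_lag_score(x, y, s, n_val=12):
    """Stand-in for training a model with step s: fit y[k+s] = a*x[k] + b."""
    inputs, targets = x[:len(x) - s], y[s:]
    fit_x, fit_y = inputs[:-n_val], targets[:-n_val]
    val_x, val_y = inputs[-n_val:], targets[-n_val:]
    mx, my = mean(fit_x), mean(fit_y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(fit_x, fit_y))
             / sum((a - mx) ** 2 for a in fit_x))
    intercept = my - slope * mx
    # Score on the held-out last n_val pairs.
    return nse(val_y, [slope * a + intercept for a in val_x])

def best_backtracking_step(x, y, score=linear_lag_score, steps=range(1, 13)):
    """Evaluate every step length and return the one with the highest score."""
    scores = {s: score(x, y, s) for s in steps}
    return max(scores, key=scores.get), scores

def signal(k):
    return math.sin(0.7 * k) + 0.3 * math.cos(1.9 * k)

# Synthetic series in which the target depends on the input 3 months earlier.
x = [signal(k) for k in range(180)]
y = [2 * signal(k - 3) + 1 for k in range(180)]
best_step, scores = best_backtracking_step(x, y)
```

Passing a different `score` callable, for example one that trains an LSTM and returns its validation NSE, would reuse the same loop unchanged.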
In conclusion, the method provided by the invention has the advantages that the long-term sequence cyanobacterial bloom and the monitoring data of the relevant environmental impact factors thereof in the lake are collected, STL time sequence decomposition is carried out on the cyanobacterial bloom data, the potential noise in the cyanobacterial bloom monitoring data is removed, and the long-term change trend of the cyanobacterial bloom is obtained; considering that the cyanobacterial bloom and the environmental impact factors are in a nonlinear relation, the traditional Pearson correlation analysis can only obtain linear correlation coefficients among variables, and the analysis effect is poor for the nonlinear variables, so that the RF model is adopted to screen the key environmental factors of the cyanobacterial bloom, and the influence of data information redundancy can be further avoided on the basis of effectively considering the nonlinear relation between the environmental factors and the cyanobacterial bloom; and finally, establishing a corresponding relation between the cyanobacterial bloom and the key environmental influence factors thereof in the time before and after by using an LSTM model, and substituting the monitoring data of the key environmental influence factors actually measured at the current time into the established corresponding relation to obtain a future cyanobacterial bloom forecasting result, so that the environmental factors at the future time are not required to be forecasted, the accumulation of forecasting errors is avoided, and the cyanobacterial bloom forecasting precision can be effectively improved. 
The invention couples the STL model, the RF model and the LSTM model, can accurately extract the long-term change trend of the cyanobacterial bloom, identifies the key environmental influence factors influencing the cyanobacterial bloom, and obtains more accurate future long-term forecasting results of the cyanobacterial bloom on the basis of considering the time relativity of the cyanobacterial bloom and the key environmental influence factors.
Based on the same inventive concept, the lake blue algae long-term forecasting system based on the STL-RF-LSTM provided by the embodiment of the invention comprises the following modules: the acquisition module is used for collecting long-time sequence chlorophyll a data and environmental influence factor data of the lake for years and months; the STL decomposition module is used for decomposing the long-time sequence data STL of the chlorophyll a month by month to obtain a trend item, a season item and random item data of the chlorophyll a; the RF screening module is used for performing importance evaluation on related environmental factors by using an RF model based on trend item data of chlorophyll a obtained after STL decomposition to obtain key environmental influence factors influencing cyanobacterial bloom; the LSTM training and forecasting module is used for dividing monthly data in a training set according to years, taking the data of the last year as a verification set, respectively constructing an LSTM model with a backtracking step length of 1-12, establishing a corresponding relation between a key environment influence factor and chlorophyll a under different backtracking step lengths by using the LSTM model according to chlorophyll a trend item data after STL decomposition and key environment factor monitoring data after RF model screening, and performing model precision evaluation to determine the optimal backtracking step length; and inputting the collected data of the key environmental factors for predicting the future into an LSTM model with the established optimal backtracking step length to obtain a long-term forecasting result of the future cyanobacterial bloom. For details of the specific implementation of each module, reference is made to the above method embodiments, which are not described again.
Based on the same inventive concept, an embodiment of the invention provides a computer system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial blooms.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A lake cyanobacterial bloom long-term forecasting method based on STL-RF-LSTM is characterized by comprising the following steps:
(1) collecting multi-year, month-by-month long time series of chlorophyll a and of its environmental influence factors for the lake, wherein the related environmental influence factors comprise water temperature WT, dissolved oxygen DO, pH, permanganate index CODMn, ammonia nitrogen NH3-N, nitrate nitrogen NO3-N, total nitrogen TN, total phosphorus TP and phosphate PO4-;
(2) applying STL decomposition to the monthly chlorophyll a long time series to obtain the trend term, seasonal term and random term of chlorophyll a;
(3) based on the chlorophyll a trend term obtained from the STL decomposition, evaluating the importance of the related environmental factors with an RF (random forest) model to obtain the key environmental influence factors affecting cyanobacterial blooms;
(4) dividing the monthly data in the training set by year, taking the last year of data as the validation set, constructing LSTM models with backtracking steps of 1 to 12, using the LSTM models to establish the relationship between the key environmental influence factors and chlorophyll a at each backtracking step from the STL-decomposed chlorophyll a trend data and the RF-screened key environmental factor monitoring data, and evaluating model accuracy to determine the optimal backtracking step; and inputting the collected key environmental factor data used for future prediction into the LSTM model built with the optimal backtracking step to obtain the long-term forecast of future cyanobacterial blooms.
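The year-based split and the search over backtracking steps 1 to 12 in step (4) can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: a one-variable least-squares line stands in for the LSTM, RMSE is used as the accuracy index, and the names `fit_line`, `select_step` and the synthetic series are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept; a simple stand-in for training an LSTM."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def select_step(factor, chl_trend, steps=range(1, 13), val_len=12):
    """Return (best_step, scores), where scores maps each step to validation RMSE."""
    split = len(chl_trend) - val_len  # the last 12 months form the validation set
    scores = {}
    for s in steps:
        # pair the factor at month t with the chlorophyll a trend at month t + s
        train_x = factor[: split - s]
        train_y = chl_trend[s:split]
        slope, intercept = fit_line(train_x, train_y)
        val_x = factor[split - s : len(factor) - s]
        val_y = chl_trend[split:]
        scores[s] = rmse(val_y, [slope * x + intercept for x in val_x])
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic 6-year monthly record: the trend follows the factor with a 3-month delay.
months = 72
factor = [math.sin(2 * math.pi * t / 12) + 0.01 * t for t in range(months)]
chl_trend = [0.0] * 3 + [2.0 * factor[t - 3] + 1.0 for t in range(3, months)]
best_step, scores = select_step(factor, chl_trend)
```

In this synthetic case the search recovers the built-in 3-month delay. With a real LSTM, the loop body would train one network per step and score it on the held-out year in the same way.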
2. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein the STL decomposition of the monthly chlorophyll a long time series in step (2) comprises:
determining the sizes of the trend window and the seasonal window;
decomposing the chlorophyll a time series with the STL model into three additive components, expressed as:
$Y_t = T_t + S_t + R_t$
wherein $Y_t$ is the observed chlorophyll a value at time $t$, and $T_t$, $S_t$ and $R_t$ are respectively the trend term, seasonal term and random fluctuation term of the observed value.
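The additive structure of the decomposition can be illustrated with a short sketch. Note the substitution: instead of the LOESS-based STL procedure, the code below uses a classical moving-average decomposition, which is simpler but yields the same three additive terms. The function names and the synthetic monthly series are hypothetical.

```python
import math

def centered_ma(y, period=12):
    """2 x period centered moving average; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        # the two end points get half weight so the window spans exactly one period
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend

def decompose_additive(y, period=12):
    """Split y into trend, seasonal and remainder so that y = T + S + R."""
    trend = centered_ma(y, period)
    # seasonal term: mean detrended value per calendar month, centered on zero
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(len(y))]
    remainder = [None if tr is None else obs - tr - s
                 for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder

# Synthetic 5-year monthly chlorophyll a series: linear trend plus annual cycle.
y = [10.0 + 0.1 * t + 3.0 * math.sin(2 * math.pi * t / 12) for t in range(60)]
trend, seasonal, remainder = decompose_additive(y)
```

Only the trend term would be passed on to the screening and forecasting steps. The first and last six months have no trend value here because the centered window does not fit; STL handles the series ends differently.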
3. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein, when the RF model is used to screen the key environmental factors in step (3), the relative importance of each environmental factor is calculated from the out-of-bag (OOB) data of the random forest sampling process, comprising:
determining the number N of decision trees in the random forest, using each decision tree $l_n$ to predict its corresponding OOB data, and calculating the root mean square error $e_n$ on that OOB data;
keeping the other environmental factors of the OOB data unchanged, shuffling only the value sequence of the i-th environmental factor, and recalculating the prediction root mean square error [Figure FDA0003479381740000011];
calculating the importance [Figure FDA0003479381740000012] of the i-th environmental factor for each decision tree $l_n$, with the calculation formula:
[Figure FDA0003479381740000021]
repeating the above steps until the whole random forest model has been traversed to obtain the importance μ of environmental factor i, with the calculation formula:
[Figure FDA0003479381740000022]
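The OOB shuffling procedure can be sketched as follows. The formulas themselves are images that are not reproduced in this text, so the code assumes a common form of permutation importance: the per-tree importance is taken as the increase in RMSE after shuffling, and the overall importance as the mean over the N trees. The two "trees" are hypothetical stand-in functions rather than trained decision trees, and all names and data are illustrative.

```python
import math
import random

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def permutation_importance(trees, oob_sets, X, y, feature, rng):
    """Mean increase in OOB RMSE over all trees after shuffling one feature."""
    gains = []
    for predict, oob in zip(trees, oob_sets):
        rows = [X[k] for k in oob]
        target = [y[k] for k in oob]
        e_n = rmse(target, [predict(r) for r in rows])
        # shuffle only the chosen feature; all other features stay in place
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:feature] + [v] + r[feature + 1:]
                    for r, v in zip(rows, shuffled)]
        e_n_i = rmse(target, [predict(r) for r in permuted])
        gains.append(e_n_i - e_n)  # ASSUMED per-tree importance (difference form)
    return sum(gains) / len(gains)  # ASSUMED overall importance (mean over trees)

# Hypothetical data: feature 0 drives the target, feature 1 is irrelevant.
X = [[float(k), float((7 * k) % 5)] for k in range(20)]
y = [2.0 * row[0] for row in X]
# Stand-ins for two fitted decision trees and their out-of-bag row indices.
trees = [lambda r: 2.0 * r[0], lambda r: 2.0 * r[0] + 0.5]
oob_sets = [list(range(0, 20, 2)), list(range(1, 20, 2))]

rng = random.Random(0)
imp_driver = permutation_importance(trees, oob_sets, X, y, 0, rng)
imp_noise = permutation_importance(trees, oob_sets, X, y, 1, rng)
```

Shuffling the irrelevant feature leaves every prediction unchanged, so its importance is zero, while shuffling the driving feature raises the error. Ranking factors by this score is what would separate key environmental factors from the rest.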
4. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein in step (4) the chlorophyll a trend data and the environmental influence factor data are normalized with the MinMax method, which maps the data range to [0, 1].
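MinMax normalization can be written in a few lines. The sketch below is illustrative only; the helper names and sample values are hypothetical, and keeping the fitted minimum and maximum so that forecasts can be mapped back to concentration units is an assumption about usage rather than something the claim states.

```python
def minmax_fit(values):
    """Return the (min, max) pair that defines the scaling."""
    return min(values), max(values)

def minmax_transform(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]

def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

# Hypothetical chlorophyll a trend values
chl = [4.0, 10.0, 7.0, 16.0]
lo, hi = minmax_fit(chl)
scaled = minmax_transform(chl, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
```

Each input series (chlorophyll a trend and every environmental factor) would get its own minimum and maximum.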
5. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein in step (4) the environmental factors are taken as the input sequence X of the model and the chlorophyll a trend sequence as the output Y of the model; for the LSTM model with backtracking step s, the sequence values of the i-th component of the training-set input sequence X are $x_{i,1}, x_{i,2}, \ldots, x_{i,m}$, where m is the length of the time series, and the sequence values of the output sequence Y are $y_{i,1+s}, y_{i,2+s}, \ldots, y_{i,m+s}$.
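The pairing of inputs at month k with outputs at month k + s can be shown with a small helper. This is a sketch with hypothetical names and toy data; it only builds the aligned training pairs and does not train a model.

```python
def make_lagged_pairs(X, y, s):
    """Align factor vectors at month k with the target at month k + s.

    X: list of monthly factor vectors, y: monthly target values of the same
    length, s: backtracking step in months.
    """
    if len(X) != len(y):
        raise ValueError("X and y must cover the same months")
    if not 1 <= s < len(y):
        raise ValueError("step must satisfy 1 <= s < len(y)")
    inputs = X[: len(X) - s]   # months 1 .. m
    targets = y[s:]            # months 1+s .. m+s
    return inputs, targets

# Toy record of 6 months with two factors per month
X = [[t, 10 * t] for t in range(1, 7)]
y = [100 + t for t in range(1, 7)]
inputs, targets = make_lagged_pairs(X, y, s=2)
```

With s = 2, the factors of month 1 are paired with the chlorophyll a trend of month 3, and so on, which leaves m = 4 usable pairs from 6 months of data.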
6. The STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom as claimed in claim 1, wherein the LSTM model accuracy evaluation index in step (4) is selected from the Nash coefficient NSE, the root mean square error RMSE or the correlation coefficient $R^2$, calculated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{k=1}^{t} \left(y_k^{o} - y_k^{p}\right)^2}{\sum_{k=1}^{t} \left(y_k^{o} - \bar{y}^{o}\right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{t} \sum_{k=1}^{t} \left(y_k^{o} - y_k^{p}\right)^2}$$
$$R^2 = \frac{\left[\sum_{k=1}^{t} \left(y_k^{o} - \bar{y}^{o}\right)\left(y_k^{p} - \bar{y}^{p}\right)\right]^2}{\sum_{k=1}^{t} \left(y_k^{o} - \bar{y}^{o}\right)^2 \sum_{k=1}^{t} \left(y_k^{p} - \bar{y}^{p}\right)^2}$$
wherein $y_k^{o}$ is the observed chlorophyll a value in the test period, $y_k^{p}$ is the model-predicted chlorophyll a value in the test period, $\bar{y}^{o}$ is the mean of the observed values over the test period, $\bar{y}^{p}$ is the mean of the model-predicted values over the test period, and t is the length of the time series in the test set.
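The three accuracy indices can be computed as in the sketch below. The formulas in this text are image placeholders, so the code assumes the standard definitions: Nash-Sutcliffe efficiency, root mean square error, and the squared Pearson correlation (which is consistent with the definitions mentioning the mean of the predicted values). Function names and sample values are hypothetical.

```python
import math

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse(obs, pred):
    """Root mean square error over the test period."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)

# Hypothetical test period: predictions are uniformly 0.5 too high
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
scores = {"NSE": nse(obs, pred), "RMSE": rmse(obs, pred), "R2": r2(obs, pred)}
```

The example shows why the indices are complementary: a constant bias leaves the correlation at 1 while NSE drops to 0.8 and RMSE equals the bias of 0.5.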
7. An STL-RF-LSTM-based long-term forecasting system for lake cyanobacterial bloom, characterized by comprising the following modules:
an acquisition module for collecting multi-year, month-by-month long time series of chlorophyll a and of its environmental influence factors for the lake, wherein the related environmental influence factors comprise water temperature WT, dissolved oxygen DO, pH, permanganate index CODMn, ammonia nitrogen NH3-N, nitrate nitrogen NO3-N, total nitrogen TN, total phosphorus TP and phosphate PO4-;
an STL decomposition module for applying STL decomposition to the monthly chlorophyll a long time series to obtain the trend term, seasonal term and random term of chlorophyll a;
an RF screening module for evaluating, with an RF (random forest) model, the importance of the related environmental factors based on the chlorophyll a trend term obtained from the STL decomposition, to obtain the key environmental influence factors affecting cyanobacterial blooms;
an LSTM training and forecasting module for dividing the monthly data in the training set by year, taking the last year of data as the validation set, constructing LSTM models with backtracking steps of 1 to 12, using the LSTM models to establish the relationship between the key environmental influence factors and chlorophyll a at each backtracking step from the STL-decomposed chlorophyll a trend data and the RF-screened key environmental factor monitoring data, and evaluating model accuracy to determine the optimal backtracking step; and for inputting the collected key environmental factor data used for future prediction into the LSTM model built with the optimal backtracking step to obtain the long-term forecast of future cyanobacterial blooms.
8. The STL-RF-LSTM-based long-term forecasting system for lake cyanobacterial bloom as claimed in claim 7, wherein the STL decomposition module, when applying STL decomposition to the monthly chlorophyll a long time series, performs:
determining the sizes of the trend window and the seasonal window;
decomposing the chlorophyll a time series with the STL model into three additive components, expressed as:
$Y_t = T_t + S_t + R_t$
wherein $Y_t$ is the observed chlorophyll a value at time $t$, and $T_t$, $S_t$ and $R_t$ are respectively the trend term, seasonal term and random fluctuation term of the observed value.
9. The STL-RF-LSTM-based long-term forecasting system for lake cyanobacterial bloom as claimed in claim 7, wherein, when the RF screening module uses the RF model to screen the key environmental factors, the relative importance of each environmental factor is calculated from the out-of-bag (OOB) data of the random forest sampling process, comprising:
determining the number N of decision trees in the random forest, using each decision tree $l_n$ to predict its corresponding OOB data, and calculating the root mean square error $e_n$ on that OOB data;
keeping the other environmental factors of the OOB data unchanged, shuffling only the value sequence of the i-th environmental factor, and recalculating the prediction root mean square error [Figure FDA0003479381740000031];
calculating the importance [Figure FDA0003479381740000032] of the i-th environmental factor for each decision tree $l_n$, with the calculation formula:
[Figure FDA0003479381740000033]
repeating the above steps until the whole random forest model has been traversed to obtain the importance μ of environmental factor i, with the calculation formula:
[Figure FDA0003479381740000034]
10. A computer system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the STL-RF-LSTM-based long-term forecasting method for lake cyanobacterial bloom according to any one of claims 1-6.
CN202210063970.0A 2022-01-20 2022-01-20 Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM Pending CN114386710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210063970.0A CN114386710A (en) 2022-01-20 2022-01-20 Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210063970.0A CN114386710A (en) 2022-01-20 2022-01-20 Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM

Publications (1)

Publication Number Publication Date
CN114386710A true CN114386710A (en) 2022-04-22

Family

ID=81203013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210063970.0A Pending CN114386710A (en) 2022-01-20 2022-01-20 Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM

Country Status (1)

Country Link
CN (1) CN114386710A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115575363A (en) * 2022-09-27 2023-01-06 北京航空航天大学 Method and system for acquiring ecological influence mechanism

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899653A (en) * 2015-06-02 2015-09-09 北京工商大学 Lake and reservoir cyanobacterial bloom prediction method based on expert system and cyanobacterial growth mechanism timing model
CN109308544A (en) * 2018-08-21 2019-02-05 北京师范大学 Cyanobacterial bloom prediction method based on contrastive divergence and long short-term memory network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
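张以飞 et al.: "Study on the variation characteristics of chlorophyll a concentration in Taihu Lake using a nonparametric seasonal decomposition model", 《四川环境》 (Sichuan Environment) *
郝玉莹 et al.: "Surface water quality prediction based on RF-LSTM", 《水资源与水工程学报》 (Journal of Water Resources and Water Engineering) *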

Similar Documents

Publication Publication Date Title
Van Griensven et al. Sensitivity analysis and auto-calibration of an integral dynamic model for river water quality
CN107885951B (en) A kind of Time series hydrological forecasting method based on built-up pattern
CN108876021B (en) Medium-and-long-term runoff forecasting method and system
US20120179373A1 (en) Method for measuring total phosphorus using multi-parameter water quality data
CN111582551A (en) Method and system for predicting short-term wind speed of wind power plant and electronic equipment
CN112288193A (en) Ocean station surface salinity prediction method based on GRU deep learning of attention mechanism
CN112381673B (en) Park electricity utilization information analysis method and device based on digital twin
CN110490366A (en) Runoff forecasting method based on variational mode decomposition and iterative decision tree
CN111695290A (en) Short-term runoff intelligent forecasting hybrid model method suitable for variable environment
CN112508299A (en) Power load prediction method and device, terminal equipment and storage medium
CN107679756B (en) Soil suitability evaluation method and device
CN114386710A (en) Long-term lake cyanobacterial bloom forecasting method and system based on STL-RF-LSTM
CN114357737B (en) Agent optimization calibration method for time-varying parameters of large-scale hydrologic model
CN114819407A (en) Dynamic prediction method and device for lake blue algae bloom
CN110782112B (en) Method and system for estimating greenhouse gas emission reduction potential in crop production
CN117035155A (en) Water quality prediction method
CN106404712A (en) Adaptive model correcting method and system based on GT-KF-PLC near infrared spectrum
CN116127833A (en) Wind power prediction method, system, device and medium based on VMD and LSTM fusion model
CN116091103A (en) Method, device, electronic equipment and medium for measuring and calculating periodic environment remediation
CN115618720A (en) Soil salinization prediction method and system based on altitude
CN114784795A (en) Wind power prediction method and device, electronic equipment and storage medium
CN114492944A (en) TLBO-Elman-based photovoltaic power station short-term power generation power prediction method and device and storage medium
US20100010847A1 (en) Technique that utilizes a monte carlo method to handle the uncertainty of input values when computing the net present value (npv) for a project
Chen et al. Uncertainty analysis of hydrologic forecasts based on copulas
Haida et al. Modelling daily Dissolved Oxygen Dynamics in the Sebou River (Morocco): Data-Centric Approaches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220422
