CN116611580A

CN116611580A - Ocean red tide prediction method based on multi-source data and deep learning

Info

Publication number: CN116611580A
Application number: CN202310739974.0A
Authority: CN
Inventors: 陈俊; 陈芳; 孟伟强; 姜乃祺; 石浩铭; 易天儒
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2023-06-21
Filing date: 2023-06-21
Publication date: 2023-08-18

Abstract

The invention relates to an ocean red tide prediction method based on multi-source data and deep learning. On the one hand, it combines remote sensing data, water quality buoy monitoring data, meteorological data and manual monitoring data; on the other hand, it adopts a CNN-LSTM deep learning method that predicts red tide from multiple time-series feature factors. The method comprises the following steps: a. constructing a data set by integrating and preprocessing environmental factor data; b. analyzing the correlation between each environmental factor and red tide with the Pearson coefficient, and the correlation between combinations of environmental factors and red tide with the multiple correlation coefficient; c. constructing a CNN-LSTM prediction model, training it, and tuning the parameters of the trained model; d. testing the model with different combinations of environmental factors and evaluating its performance. By combining remote sensing image data with water quality monitoring data, the method increases the amount of data and the complexity of the data set and improves the accuracy of red tide forecasting.

Description

Ocean red tide prediction method based on multi-source data and deep learning

Technical Field

The invention relates to the field of the marine ecological environment, in particular to marine water quality monitoring, and more specifically to a marine red tide prediction method based on multi-source data and deep learning.

Background

Red tide is a disaster that can severely damage the marine environment and poses potential safety hazards to coastal cities. Its occurrence disturbs the ecological balance and causes ocean pollution, so red tide should be monitored in real time as early as possible, its development fully understood, and its occurrence accurately predicted. Traditional empirical, statistical and numerical model analysis methods are increasingly difficult to adapt to large-scale data scenarios. Methods that estimate the probability of red tide through image detection, including machine learning algorithms such as k-nearest neighbor and random forest, the backscattering method and the chlorophyll anomaly method, have limited power to express features in large data volumes and complex scenarios. As a result, these traditional methods suffer from low prediction accuracy, low model complexity and limited applicability.

In recent years, with the development of deep learning, an increasing number of deep neural networks have been proposed and studied intensively in many fields, with remarkable results on a range of problems. To overcome the limited feature expressiveness of the machine learning methods described above, recent patents have combined red tide prediction with deep learning. Chinese patent publication No. CN112365093A discloses a multi-feature-factor red tide prediction model based on GRU deep learning, comprising preprocessing the collected data, analyzing the correlation between the influencing feature factors and red tide, constructing a GRU prediction model, training the model with multiple combinations of feature factors, and evaluating model performance. In operation, the combined feature factors serve as input variables and the trained GRU model outputs the probability of red tide occurrence. This model effectively applies deep learning, expresses features well and improves the accuracy of red tide occurrence prediction, but its data complexity and applicability are low and its prediction accuracy leaves room for improvement. As another example, Chinese patent publication No. CN112084716A discloses a red tide prediction and early warning method based on comprehensive eutrophication evaluation, comprising data acquisition and correlation processing, building a decision tree from the correlated data, and building and training a neural network prediction model. In use, a high-confidence data set and a low-confidence data set are fed into the neural network prediction model to obtain the final prediction result, and corresponding early warning information is issued according to that result. This method remedies some shortcomings of the prior art and improves the accuracy of predicting red tide development trends, but its training data set is small, the model converges slowly, and its precision still needs improvement.

At present, no prediction method is both adaptable to large-scale data scenarios and highly accurate. The invention therefore designs an ocean red tide prediction model based on multi-source data and deep learning, which predicts red tide through time-series analysis of multi-source data. The method achieves high prediction accuracy, is suitable for large-scale scenarios and supports a more complex model, which gives it practical significance and good application prospects.

Disclosure of Invention

The purpose of the invention is to provide an ocean red tide prediction method based on multi-source data and deep learning, which uses the Pearson coefficient to analyze the correlation between individual environmental factors and red tide occurrence, uses the multiple correlation coefficient to analyze the correlation between combinations of environmental factors and red tide occurrence, constructs a CNN-LSTM neural network model, and trains the model with different combinations of environmental factors, thereby establishing an ocean red tide prediction model based on multi-source data and deep learning.

To achieve the above purpose, the technical solution of the invention is as follows: an ocean red tide prediction method based on multi-source data and deep learning, comprising the following steps:

a. constructing a data set, integrating and preprocessing the data;

b. analyzing the correlation between the environmental factors and red tide with the Pearson coefficient, and the correlation between combinations of multiple environmental factors and red tide with the multiple correlation coefficient;

c. constructing a CNN-LSTM prediction model and training it with different combinations of environmental factors;

d. testing the trained CNN-LSTM prediction model to obtain prediction results, and evaluating its performance.

In an embodiment of the present invention, the step a specifically includes:

a1. extracting remote sensing data, and inverting the sea surface temperature and the chlorophyll concentration with the split-window algorithm of formula (1) and the band-ratio method of formula (2), respectively;

$$T_s = A_0 + A_1 T_{31} - A_2 T_{32} \tag{1}$$

where $T_s$ is the sea surface temperature, $T_{31}$ and $T_{32}$ are the brightness temperatures of bands 31 and 32, and $A_0$, $A_1$, $A_2$ are the split-window algorithm parameters;

$$\mathrm{Chl\text{-}a} = a\left(\frac{B_{NIR}}{B_{RED}}\right)^2 + b\,\frac{B_{NIR}}{B_{RED}} + c \tag{2}$$

where Chl-a is the chlorophyll a concentration, $B_{NIR}$ and $B_{RED}$ are the near-infrared and red band values, and $a$, $b$, $c$ are parameters to be determined;
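As a minimal sketch of step a1, assuming co-registered per-pixel arrays of band 31/32 brightness temperatures and of near-infrared and red band values, formulas (1) and (2) can be evaluated as below. The coefficient values are not given in this document: $A_0$, $A_1$, $A_2$ are sensor-specific, and the sketch assumes $a$, $b$, $c$ are solved by least squares against in-situ chlorophyll a measurements, which is one common way to calibrate a band-ratio model. All function names are illustrative.

```python
import numpy as np

def split_window_sst(t31, t32, a0, a1, a2):
    """Formula (1): T_s = A0 + A1*T31 - A2*T32 (brightness temperatures of bands 31/32)."""
    t31 = np.asarray(t31, dtype=float)
    t32 = np.asarray(t32, dtype=float)
    return a0 + a1 * t31 - a2 * t32

def band_ratio(b_nir, b_red):
    """Ratio B_NIR / B_RED used by formula (2)."""
    return np.asarray(b_nir, dtype=float) / np.asarray(b_red, dtype=float)

def fit_band_ratio(b_nir, b_red, chl_in_situ):
    """Assumed calibration step: solve a, b, c of formula (2) by least squares."""
    # np.polyfit returns coefficients highest power first: [a, b, c]
    a, b, c = np.polyfit(band_ratio(b_nir, b_red), np.asarray(chl_in_situ, dtype=float), deg=2)
    return a, b, c

def band_ratio_chl(b_nir, b_red, a, b, c):
    """Formula (2): Chl-a = a*r^2 + b*r + c with r = B_NIR / B_RED."""
    r = band_ratio(b_nir, b_red)
    return a * r**2 + b * r + c
```

The split-window coefficients would be passed in from a published or locally calibrated coefficient set; the quadratic fit is only as reliable as the set of matched satellite pixels and in-situ samples used to calibrate it.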

a2. processing missing values in the monitoring data by filling them with an interpolation method;
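A simple baseline for step a2 is linear interpolation in time, sketched below under the assumption that each buoy series has sorted timestamps and uses NaN for missing values. This is not the GAN-based imputation named in the detailed embodiment; it only illustrates the gap-filling step. Note that `np.interp` holds the first and last observed values constant for gaps at either end of a series.

```python
import numpy as np

def fill_gaps_linear(times, values):
    """Fill NaN gaps in a monitoring series by linear interpolation in time.

    times must be sorted ascending; gaps before the first or after the last
    observation are filled with that nearest observed value (np.interp clamps).
    """
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float).copy()
    missing = np.isnan(v)
    if missing.all():
        raise ValueError("series has no observed values")
    v[missing] = np.interp(t[missing], t[~missing], v[~missing])
    return v
```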

a3. integrating the data extracted from remote sensing with the monitored time-series environmental factor data, and normalizing it:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{3}$$

where $x$ is the original feature factor data, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature factor data, and $x'$ is the normalized data.
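Min-max normalization applied column-wise, one column per environmental factor, might look like this. Guarding constant columns against division by zero and returning the minima and maxima (so predictions can be mapped back to physical units) are additions for this sketch, not steps stated in the method; in practice the minima and maxima would usually be taken from the training split only.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each column (environmental factor) of x into [0, 1]."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    # Constant columns would divide by zero; map them to 0 instead
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span, x_min, x_max

def min_max_inverse(x_norm, x_min, x_max):
    """Undo min_max_normalize, e.g. to report predictions in original units."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return np.asarray(x_norm, dtype=float) * span + x_min
```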

In an embodiment of the present invention, the step b specifically includes:

b1. inputting the processed environmental factor data, including saturated dissolved oxygen, pH, chlorophyll a concentration, water temperature, salinity, turbidity, tide, the u and v components of wind speed, and air temperature;

b2. analyzing the correlation between each environmental factor and red tide occurrence with the Pearson coefficient, calculated as:

$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{4}$$

where $X$ is an environmental factor, $Y$ is red tide occurrence, $\operatorname{cov}(X, Y)$ is their covariance, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively;
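The Pearson coefficient and the screening of factors by correlation strength can be sketched as follows; the dictionary layout of the factors and the ranking by absolute value are illustrative assumptions. The centered dot-product form used here is algebraically equal to $\operatorname{cov}(X,Y)/(\sigma_X\sigma_Y)$ because the $1/n$ factors cancel.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson coefficient cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def rank_factors(factors, red_tide):
    """Rank factors by |r| with red tide occurrence, strongest first.

    factors: dict mapping a factor name to its time series (same length as red_tide)
    """
    scores = {name: pearson_r(vals, red_tide) for name, vals in factors.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Factors near the top of the ranking are the candidates retained for the combination analysis that follows.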

b3. analyzing the correlation between different combinations of environmental feature factors and red tide occurrence with the multiple correlation coefficient, calculated as:

$$R = \frac{\sum (y - \bar{y})(\hat{y} - \bar{\hat{y}})}{\sqrt{\sum (y - \bar{y})^2 \sum (\hat{y} - \bar{\hat{y}})^2}} \tag{5}$$

where $y$ is red tide occurrence, $\hat{y}$ is the value obtained by regressing $y$ on all environmental factors $x$ in the combination, and the sums run over all samples; that is, $R$ is the correlation between $y$ and its fitted value $\hat{y}$.
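One way to compute the multiple correlation coefficient, assuming an ordinary least-squares regression with an intercept (the regression type is not specified here), is to fit $\hat{y}$ with `numpy.linalg.lstsq` and correlate it with $y$. With a single factor, the result equals the absolute value of the Pearson coefficient, which is a useful sanity check.

```python
import numpy as np

def multiple_correlation(X, y):
    """Multiple correlation R between y and its OLS fit on the factor columns of X.

    X: (n_samples, n_factors) factor combination; a 1-D X is treated as one factor.
    Returns nan if the fitted values are constant (no linear signal at all).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # intercept + factors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    return float(np.corrcoef(y, y_hat)[0, 1])
```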

In an embodiment of the present invention, the step c specifically includes:

c1. forming various combinations of the processed environmental feature factors;
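Step c1 does not say how the combinations are chosen. Assuming every subset of at least two factors is a candidate, they can be enumerated with `itertools.combinations`; the factor names below are hypothetical identifiers for the ten factors listed in step b1. For ten factors this already yields 1013 subsets, which suggests why screening by correlation before training is useful.

```python
from itertools import combinations

# Hypothetical identifiers for the ten factors listed in step b1
FACTORS = [
    "saturated_do", "ph", "chl_a", "water_temp", "salinity",
    "turbidity", "tide", "wind_u", "wind_v", "air_temp",
]

def factor_combinations(factors, min_size=2, max_size=None):
    """Yield every combination of factors with min_size..max_size members."""
    max_size = len(factors) if max_size is None else max_size
    for k in range(min_size, max_size + 1):
        yield from combinations(factors, k)

if __name__ == "__main__":
    # Number of candidate input sets for the ten factors
    print(sum(1 for _ in factor_combinations(FACTORS)))  # → 1013
```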

c2. constructing a CNN-LSTM prediction model, in which the CNN mines local features of the environmental variable data, computed as in formula (6), and the LSTM mines the temporal dependencies of the environmental factor time series, computed by formulas (7) to (11):

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \tag{6}$$

where $M_j$ is the set of input maps, $x_j^{l}$ is the output of the $j$-th group of data in layer $l$, $x_i^{l-1}$ is the output of the $i$-th input data, $k_{ij}^{l}$ is the weight (convolution kernel) applied to the $i$-th input for the $j$-th group, $b_j^{l}$ is the additive bias of the $j$-th output map in layer $l$, and $f(\cdot)$ is the activation function; each output map is obtained by convolving the input maps with its own kernels;

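$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{7}$$

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{8}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{9}$$

$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{10}$$

$$h_t = O_t \odot \tanh(C_t) \tag{11}$$

In formula (9), $C_{t-1}$ is the memory cell of the previous time step and $\odot$ denotes element-wise multiplication.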

Here $f_t$ is the forget gate, $i_t$ the input gate, $O_t$ the output gate, $h_t$ the current cell output, $C_t$ the memory cell, and $x_t$ the input at time $t$; $\sigma$ is the sigmoid function, and $\tanh$ and $\sigma$ serve as the activation functions of the structure; $W_i$, $W_f$, $W_c$, $W_o$ are the recurrent weight matrices and $b_i$, $b_f$, $b_c$, $b_o$ are the corresponding bias terms;
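The LSTM recurrence can be written directly in NumPy. This sketch follows the standard LSTM, in which the memory cell is updated from its previous value through the forget and input gates; the weight layout (one matrix per gate acting on the concatenation $[h_{t-1}, x_t]$) and the zero initial state are assumptions for illustration, not the trained model of the invention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W: dict with keys 'f', 'i', 'c', 'o', each of shape (hidden, hidden + n_in)
    b: dict with the same keys, each of shape (hidden,)
    """
    z = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])        # (7) forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # (8) input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde        # (9) memory cell update
    o_t = sigmoid(W["o"] @ z + b["o"])        # (10) output gate
    h_t = o_t * np.tanh(c_t)                  # (11) cell output
    return h_t, c_t

def lstm_forward(xs, W, b, hidden):
    """Run the cell over a (T, n_in) sequence from a zero state; return the last h_t."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
    return h
```

In a CNN-LSTM, `xs` would be the sequence of feature vectors produced by the convolutional layers rather than the raw factors, and the final `h` would feed a dense output layer.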

c3. selecting the activation function and loss function of the CNN-LSTM prediction model, and setting its optimizer hyperparameters and model hyperparameters;

c4. fitting the model to the environmental parameters, confirming the training data, and repeatedly training and optimizing the CNN-LSTM prediction model.

In an embodiment of the present invention, the step d specifically includes:

d1. performing performance evaluation on the trained CNN-LSTM prediction model, and verifying the stability and prediction accuracy of the model;

d2. comparing and validating the optimized CNN-LSTM prediction model against different CNN-LSTM prediction models, against the same CNN-LSTM prediction model on different data sets, and with different combinations of environmental feature factors.

Compared with the prior art, the invention has the following beneficial effects: it provides an ocean red tide prediction method based on multi-source data and deep learning that uses a CNN-LSTM deep learning method to predict red tide from multiple feature factors. By combining remote sensing image data with water quality monitoring data, the method increases the amount of data, raises the complexity of the data set, speeds up model convergence, and improves the accuracy of red tide forecasting.

Drawings

FIG. 1 is a general flow chart of an implementation of the present invention.

FIG. 2 is a flowchart of sea surface temperature inversion with the split-window algorithm.

Detailed Description

The technical scheme of the invention is specifically described below with reference to the accompanying drawings.

Referring to FIGS. 1 and 2, one embodiment of the invention comprises the following steps:

a. constructing a data set, integrating and preprocessing the data;

b. analyzing the correlation between the environmental factors and red tide occurrence with the Pearson coefficient, retaining the environmental factors with relatively strong correlation, and analyzing the correlation between their combinations and red tide occurrence with the multiple correlation coefficient;

c. constructing a CNN-LSTM prediction model and training it with different combinations of environmental factors;

d. testing the trained CNN-LSTM prediction model and evaluating its performance.

Further, the step a specifically includes:

a1. Environmental factor data are extracted from red tide satellite remote sensing images, mainly by inversion with the following formulas: the sea surface temperature is inverted with the split-window algorithm (formula (1)), and the chlorophyll concentration with the band-ratio method (formula (2)).

$$T_s = A_0 + A_1 T_{31} - A_2 T_{32} \tag{1}$$

where $T_s$ is the sea surface temperature, $T_{31}$ and $T_{32}$ are the brightness temperatures of bands 31 and 32, and $A_0$, $A_1$, $A_2$ are the split-window algorithm parameters.

$$\mathrm{Chl\text{-}a} = a\left(\frac{B_{NIR}}{B_{RED}}\right)^2 + b\,\frac{B_{NIR}}{B_{RED}} + c \tag{2}$$

where Chl-a is the chlorophyll a concentration, $B_{NIR}$ and $B_{RED}$ are the near-infrared and red band values, and $a$, $b$, $c$ are parameters to be determined.

a2. Missing values in the water quality monitoring data are filled by interpolation, mainly using a GAN-based time-series imputation method.

a3. The data extracted from remote sensing are integrated with the monitored time-series environmental factor data and normalized.

Further, the step b specifically includes:

b1. Input the processed environmental factor data, including saturated dissolved oxygen, pH, chlorophyll a concentration, water temperature, salinity, turbidity, tide, the u and v components of wind speed, and air temperature, among others.

b2. Analyze the correlation between red tide occurrence and each environmental factor (saturated dissolved oxygen, pH, chlorophyll a concentration, water temperature, salinity, etc.) with the Pearson coefficient, calculated as:

$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{4}$$

where $X$ is an environmental factor, $Y$ is red tide occurrence, $\operatorname{cov}(X, Y)$ is their covariance, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.

Further, the step c specifically includes:

c1. Form various combinations of the processed environmental feature factors as model inputs, and split the data set into a training set, a validation set and a test set in the ratio 3:1:1.
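The 3:1:1 ratio is stated, but not how samples are assigned or how time-series inputs are shaped. Assuming a chronological split (so the model is never validated on data that precede its training data) and a fixed-length look-back window, the data preparation could be sketched as below; `lookback` is a hypothetical parameter, and the series must be longer than `lookback`.

```python
import numpy as np

def split_3_1_1(samples):
    """Split time-ordered samples 3:1:1 into train/validation/test without shuffling."""
    n = len(samples)
    n_train = n * 3 // 5
    n_val = n // 5
    return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]

def make_windows(features, target, lookback):
    """Turn a (T, n_factors) series into (T - lookback, lookback, n_factors) inputs.

    Each window of `lookback` past steps is paired with the target at the next step.
    """
    features = np.asarray(features, dtype=float)
    X = np.stack([features[t - lookback:t] for t in range(lookback, len(features))])
    y = np.asarray(target, dtype=float)[lookback:]
    return X, y
```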

c2. Construct a CNN-LSTM network model. The CNN mines local features of the environmental variable data, and the extracted feature information is finally flattened into a feature input usable by the LSTM. The LSTM computes the dependencies among the environmental factors over the time series and mines the temporal features of the data; its computation follows formulas (7) to (11):

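$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{7}$$

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{8}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{9}$$

$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{10}$$

$$h_t = O_t \odot \tanh(C_t) \tag{11}$$

In formula (9), $C_{t-1}$ is the memory cell of the previous time step and $\odot$ denotes element-wise multiplication.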

Here $f_t$ is the forget gate, $i_t$ the input gate, $O_t$ the output gate, $h_t$ the current cell output, $C_t$ the memory cell, and $x_t$ the input at time $t$; $\sigma$ is the sigmoid function, and $\tanh$ and $\sigma$ serve as the activation functions of the structure; $W_i$, $W_f$, $W_c$, $W_o$ are the recurrent weight matrices and $b_i$, $b_f$, $b_c$, $b_o$ are the corresponding bias terms.
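The CNN part of this step, formula (6) of the summary, can be sketched for one-dimensional environmental time series as below. Assumptions: each input map is one factor's series, $M_j$ contains all input maps, no padding is used, and ReLU stands in for the unspecified activation $f$. Like most deep-learning frameworks, the sketch applies kernels without flipping them (cross-correlation). Transposing the output to shape (time, channels) gives one feature vector per time step, which is one plausible way to feed the LSTM; the method itself only says the CNN features are flattened.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv1d_layer(x_prev, kernels, biases, act=relu):
    """x_j^l = f(sum_{i in M_j} x_i^{l-1} * k_ij^l + b_j^l) for 1-D input maps.

    x_prev:  (n_in, T) input maps from layer l-1
    kernels: (n_out, n_in, K) kernels k_ij^l
    biases:  (n_out,) additive biases b_j^l
    Returns (n_out, T - K + 1) output maps ('valid' mode, no padding).
    """
    x_prev = np.asarray(x_prev, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    n_out, n_in, k = kernels.shape
    t_out = x_prev.shape[1] - k + 1
    out = np.empty((n_out, t_out))
    for j in range(n_out):
        acc = np.full(t_out, float(biases[j]))
        for i in range(n_in):  # M_j taken as all input maps
            # np.correlate slides the kernel over the series without flipping it
            acc += np.correlate(x_prev[i], kernels[j, i], mode="valid")
        out[j] = act(acc)
    return out
```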

c3. Determine the activation function and loss function of the network model, and set its optimizer hyperparameters and model hyperparameters. The hyperparameters are selected by grid search: the range of each parameter is set so that the model still converges, and the best values are chosen within these ranges.

c4. The final hyperparameter values of the network model are determined through repeated training with the candidate parameter settings.
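Steps c3 and c4 describe a grid search over hyperparameters. A framework-independent sketch follows; `evaluate` is a hypothetical callback that would train the CNN-LSTM with a given setting and return its validation loss, and the grid keys and values are illustrative, not the values used in the invention.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Exhaustively try every combination in param_grid.

    param_grid: dict mapping a hyperparameter name to a list of candidate values
    evaluate:   callable(params) -> validation loss (lower is better)
    Returns (best_params, best_loss); ties keep the first combination tried.
    """
    names = list(param_grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = evaluate(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# Illustrative grid only; real ranges would be chosen so that training converges
EXAMPLE_GRID = {"filters": [16, 32], "lstm_units": [32, 64], "learning_rate": [1e-3, 1e-4]}
```

Each call to `evaluate` is a full training run, so the cost grows with the product of the list lengths; narrowing each range first, as step c3 describes, keeps the search affordable.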

Further, the step d specifically includes:

d1. Evaluate the performance of the trained network model and verify its stability and prediction accuracy, using the root mean square error (RMSE) and the mean absolute error (MAE) as metrics.
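The two metrics are $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_k (y_k - \hat{y}_k)^2}$ and $\mathrm{MAE} = \tfrac{1}{n}\sum_k |y_k - \hat{y}_k|$, where $y_k$ are observed values and $\hat{y}_k$ model predictions on the test set. A minimal NumPy version:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error; penalizes large errors more heavily."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error; in the same units as the target."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(e)))
```

Reporting both helps separate a model that is uniformly slightly off (similar RMSE and MAE) from one with occasional large misses (RMSE noticeably above MAE).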

d2. Compare and validate the optimized network model against different network models, against the same model on different data sets, and with different combinations of environmental feature factors.

The above is a preferred embodiment of the invention. All changes made according to the technical solution of the invention fall within its protection scope, provided the resulting functional effects do not exceed the scope of the technical solution.

Claims

1. An ocean red tide prediction method based on multi-source data and deep learning, characterized by comprising the following steps:

a. constructing a data set, integrating and preprocessing the data;

2. The ocean red tide prediction method based on multi-source data and deep learning according to claim 1, wherein the step a specifically comprises:

$$T_s = A_0 + A_1 T_{31} - A_2 T_{32} \tag{1}$$

$$\mathrm{Chl\text{-}a} = a\left(\frac{B_{NIR}}{B_{RED}}\right)^2 + b\,\frac{B_{NIR}}{B_{RED}} + c \tag{2}$$

3. The ocean red tide prediction method based on multi-source data and deep learning according to claim 1, wherein the step b specifically comprises:

4. The ocean red tide prediction method based on multi-source data and deep learning according to claim 1, wherein the step c specifically comprises:

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \tag{6}$$

where $M_j$ is the set of input maps, $x_j^{l}$ is the output of the $j$-th group of data in layer $l$, $x_i^{l-1}$ is the output of the $i$-th input data, $k_{ij}^{l}$ is the weight (convolution kernel) applied to the $i$-th input for the $j$-th group, $b_j^{l}$ is the additive bias of the $j$-th output map in layer $l$, and $f(\cdot)$ is the activation function; each output map is obtained by convolving the input maps with its own kernels;

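$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{7}$$

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{8}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{9}$$

$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{10}$$

$$h_t = O_t \odot \tanh(C_t) \tag{11}$$

In formula (9), $C_{t-1}$ is the memory cell of the previous time step and $\odot$ denotes element-wise multiplication.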

5. The ocean red tide prediction method based on multi-source data and deep learning according to claim 1, wherein the step d specifically comprises: