CN111160655A - Decision tree-based offshore red tide generation and red tide type prediction method - Google Patents

Publication number
CN111160655A
Authority
CN
China
Prior art keywords
red tide
decision tree
water body
offshore
data
Legal status
Pending
Application number
CN201911410770.2A
Other languages
Chinese (zh)
Inventor
吴璟瑜
商少平
李雪丁
林锐
贺志刚
郑祥靖
郭民权
曾银东
Current Assignee
FUJIAN MARINE FORECASTS
Xiamen University
Original Assignee
FUJIAN MARINE FORECASTS
Xiamen University
Application filed by FUJIAN MARINE FORECASTS and Xiamen University
Priority to CN201911410770.2A
Publication of CN111160655A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Abstract

A decision tree-based method for predicting offshore red tide occurrence and red tide type, in the field of offshore red tide prediction. The method comprises the following steps: 1) collating information on red tide events; 2) data extraction and quality control: retrieving water surface temperature and salinity observations that match each red tide event in time and location, extracting the daily minimum and maximum temperature and salinity, and applying the necessary quality control to these data; 3) building a training database: pairing the water body state of each day with the minimum and maximum surface temperature and salinity of the previous day, and using this database as training data for the decision tree prediction model; 4) building a decision tree-based offshore red tide forecasting model; 5) applying the model built in step 4) to forecast offshore red tide occurrence and red tide type. In addition to predicting whether a red tide will occur, the method predicts which type of red tide is likely to break out, which is of great significance for early prevention and emergency management of red tide events with serious social hazards.

Description

Decision tree-based offshore red tide generation and red tide type prediction method
Technical Field
The invention relates to the field of offshore red tide forecasting, in particular to a decision tree-based method for forecasting offshore red tide occurrence and red tide types.
Background
Offshore red tides are common marine disasters closely related to fishery production, coastal tourism, and public health. Existing red tide prediction relies mainly on monitoring environmental elements: biological (algal chlorophyll concentration, algal cell density, zooplankton grazing, etc.), chemical (nutrients, pH, dissolved oxygen (DO), etc.), hydrological (water temperature, salinity, tides, currents, waves, etc.), and meteorological (air temperature, air pressure, rainfall, wind speed, wind direction, sunlight, humidity, etc.). Red tide occurrence is then predicted from a single parameter or from multiple parameters.
Early prediction methods were mostly based on empirical thresholds or simple statistical methods such as multiple regression and discriminant analysis. For example, based on statistics of multiple red tide events in various Japanese sea areas, the Japanese scholar Adachi[1] proposed ranges of red tide organism concentration and population growth rate as criteria for judging red tide outbreaks; Lu et al.[2] proposed that a rise in chlorophyll a above 10 mg/m³ with a rapidly increasing trend indicates an imminent red tide; Lin and Lin[3] inferred from enclosure experiments a DIP threshold of 1.2 μmol/L for Skeletonema costatum red tides as a reference for red tide prediction; Wang et al.[4], from sampling at multiple stations in the Yangtze River estuary and adjacent waters, concluded that a day-night difference in dissolved oxygen of 5 mg/dm³ or more indicates red tide occurrence; the author of [5] proposed a transparency of 1.6 m as a trial red tide early-warning standard; Jiang Xinglong and Song Lirong[6] used cell density data for the dominant red tide algae in Quanzhou Bay together with physicochemical and biological water quality factors to build multi-parameter regression equations for the cell density of each dominant species by multivariate stepwise regression; Wu Yufang[7] used 24-hour continuous data from automatic marine water quality monitoring stations in the Xiamen sea area for 2005-2008, took the daily variation gradients of environmental elements and similar quantities as predictors, and built a 28 h chlorophyll a forecasting equation by stepwise regression.
With the rise of data mining, and given the abrupt, nonlinear, and complex nature of red tide formation, many machine learning algorithms have been applied to red tide prediction, including artificial neural networks, support vector machines (SVM), genetic algorithms, fuzzy logic, decision trees, and logistic regression. Marsili-Libelli[8] defined a series of fuzzy rules based on experimental observations and expert knowledge to predict algal blooms in the Orbetello lagoon, Italy, from the daily variation of parameters such as dissolved oxygen, pH, and water temperature; Muttil and Lee[9] used genetic programming to make next-day predictions of red tides in Hong Kong coastal waters; Lane et al.[10] used logistic regression with parameters such as water temperature, nutrients, and river input to predict toxigenic Pseudo-nitzschia blooms in Monterey Bay, California; González Vilas et al.[11] used an SVM with water temperature, salinity, and an upwelling index to predict Pseudo-nitzschia blooms in the coastal waters of Galicia, NW Spain.
There are many similar studies in China. For example, Xu et al.[12] used measured red tide data and contemporaneous hydrometeorological data from Haizhou Bay for 2004-2006 to derive, by logistic regression, threshold values for the environmental factors significantly related to red tide outbreaks, namely dissolved inorganic nitrogen (DIN), pH, salinity, water temperature, silicate, and wind speed; Zhang Cheng et al.[13] used an SVM to predict chlorophyll 7 days ahead from chlorophyll a, solar radiation, extinction coefficient, water temperature, pH, and other parameters; Su Xinhong et al.[14] used a BP neural network to model the nonlinear relationships between 219 red tide cases in the Fujian sea area and 5 meteorological factors (air temperature, precipitation, wind speed, air pressure, and sunshine), with training and prediction carried out separately for eastern, central, and southern Fujian.
Other, more complex studies start from the mechanism of red tide occurrence, coupling physical, chemical, and biological processes into ecological dynamic models to predict red tides. For example, Allen et al.[15] built a coupled ecological-hydrodynamic model to predict blooms on the northwest European continental shelf; McGillibrand et al.[16], considering the combined effects of spore distribution, cell growth and death, and hydrodynamics, simulated the 2006 Karenia mikimotoi bloom along the Scottish coast; Qiao et al.[17] built a six-component red tide ecological dynamic model for the Yangtze River estuary, studied the red tide control factors in that area, and simulated the whole process of red tide formation and dissipation; Li et al.[18] built a two-dimensional red tide ecological mathematical model coupling hydrodynamics and biodynamics and simulated the growth and decline of Phaeocystis in the Bohai Sea.
Although there have been many studies and applications of red tide prediction in China and abroad, they are limited to predicting whether a red tide occurs, or to a single red tide species; predicting which types of red tide may occur has rarely been reported. In fact, red tides are generally of diverse types, and different types differ in their ecological effects and social impacts. Taking Guangdong and Fujian provinces in southern China as examples, statistics of offshore red tides in Guangdong from 1980 to 2016 show that the common red tide species include Noctiluca scintillans, Phaeocystis globosa, Skeletonema costatum, and Scrippsiella trochoidea, of which Phaeocystis globosa can cause mass fish kills[19]; along the Fujian coast, up to 20 red tide species were recorded from 2001 to 2010, the most common being Prorocentrum donghaiense, Noctiluca scintillans, Skeletonema costatum, and Chaetoceros spp., while the toxic red tide species was Karenia mikimotoi[20]. If the occurrence of high-impact red tide species such as Karenia mikimotoi and Phaeocystis globosa could be predicted, prevention and response could be deployed early, which is of great social significance for reducing the resulting hazards.
The decision tree is an inductive learning technique in machine learning: an algorithm for classifying data or building prediction models that induces, from a set of unordered, irregular examples, a set of classification rules expressed as a tree structure[21,22]. C4.5 is one of the main decision tree algorithms. It selects attributes by the information gain ratio and addresses two shortcomings of the original ID3 algorithm: its inability to handle continuous attributes and its bias toward attributes with many values as splitting criteria. During tree construction, pruning is used to reduce overfitting, which improves the applicability of the algorithm[23,24]. After learning from historical data, C4.5 can classify or predict new data into two or more classes and has been widely applied in fields such as commerce, medicine, and remote sensing imagery[25-28].
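The gain-ratio criterion mentioned above can be illustrated with a short Python sketch (the patent gives no code for this part). It scores binary threshold splits of one continuous predictor, such as a previous-day Tmin series, and keeps the candidate midpoint with the highest gain ratio. This is a simplification for illustration only: full C4.5 chooses continuous thresholds by information gain with further corrections[24] and also handles pruning and missing values, none of which is shown. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting one continuous attribute at `threshold`
    (left branch: value < threshold; right branch: value >= threshold)."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    if not left or not right:
        return 0.0  # a degenerate split carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


def best_threshold(values, labels):
    """Evaluate midpoints between consecutive distinct values and return
    (threshold, gain_ratio) for the best one, or (None, 0.0) if none exists."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not candidates:
        return None, 0.0
    return max(((t, gain_ratio(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


# Hypothetical previous-day Tmin values and next-day states (0 normal, 1 red tide)
tmin = [20, 21, 22, 25, 26, 27]
state = [0, 0, 0, 1, 1, 1]
print(best_threshold(tmin, state))  # (23.5, 1.0)
```

Dividing the gain by the split information penalizes splits that merely fragment the data, which is the bias of plain information gain that C4.5 was designed to correct.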
In summary, offshore red tides occur in various types whose social impacts differ, yet existing red tide prediction addresses only whether a red tide occurs, not its type.
Reference documents:
[1] Adachi R. Red tide organisms and the actual state of red tides (赤潮生物と赤潮実態). Suisan Doboku (Fisheries Engineering), 1973, 9(1): 31-36.
[2] Lu D D, Göbel J, Wang C S, et al. Red tide biological monitoring and real-time red tide prediction in the sea area of Zhejiang. Donghai Marine Science, 2000, 18(2): 33-43.
[3] Lin Y, Lin R C. A study on the phosphorus threshold for triggering harmful diatom blooms. Oceanologia et Limnologia Sinica, 1999, 30(4): 391-396.
[4] Wang Z F, Zhang Q, Lü H Y, et al. A simple dissolved-oxygen model for red tide forecasting in the Changjiang (Yangtze River) estuary. Acta Oceanologica Sinica, 2000, 22(4): 125-.
[5] Preliminary study on transparency as an early-warning monitoring parameter for red tides. Marine Environmental Science, 2001, 20(1): 31-35.
[6] Jiang X L, Song L R. Regression equations for the cell density of dominant red tide algal species in Quanzhou Bay. Oceanologia et Limnologia Sinica, 2010, 41(3): 341-.
[7] Wu Y F. Establishment of a prediction equation for chlorophyll in the Xiamen sea area during the red tide high-incidence period and its application to a disastrous red tide prediction model. Marine Forecasts, 2012, 29(2): 39-44.
[8] Marsili-Libelli S. Fuzzy prediction of the algal blooms in the Orbetello lagoon. Environmental Modelling & Software, 2004, 19: 799-808.
[9] Muttil N, Lee J H W. Genetic programming for analysis and real-time prediction of coastal algal blooms. Ecological Modelling, 2005, 189: 363-376.
[10] Lane J Q, Raimondi P T, Kudela R M. Development of a logistic regression model for the prediction of toxigenic Pseudo-nitzschia blooms in Monterey Bay, California. Marine Ecology Progress Series, 2009, 383: 37-51, doi:10.3354/meps07999.
[11] González Vilas L, Spyrakos E, Torres Palenzuela J M, et al. Support Vector Machine-based method for predicting Pseudo-nitzschia spp. blooms in coastal waters (Galician rias, NW Spain). Progress in Oceanography, 2014, 124: 66-77.
[12] Xu, Zhang Y, Liu J T, et al. Research on thresholds of red tide environmental factors in Haizhou Bay based on logistic regression. Marine Science Bulletin, 2009, 28(3): 70-75.
[13] Zhang C, Chen Z, Xu Q, et al. Prediction model of chlorophyll a concentration in Meiliang Bay, Lake Taihu, based on support vector machine. Acta Scientiae Circumstantiae, 2013, 33(10): 2856-.
[14] Study on a red tide forecasting method for the Fujian sea area based on a BP neural network model. Journal of Fisheries of China, 2017, 41(11): 1744-1755.
[15] Allen J I, Smyth T J, Siddorn J R, Holt M. How well can we forecast high biomass algal bloom events in a eutrophic coastal sea? Harmful Algae, 2008, 8: 70-76, http://dx.doi.org/10.1016/j.hal.2008.08.024.
[16] McGillicuddy D J, Townsend D W, He R, et al. Suppression of the 2010 Alexandrium fundyense bloom by changes in physical, biological, and chemical properties of the Gulf of Maine. Limnol. Oceanogr., 2011, 56: 2411-2426, http://dx.doi.org/10.4319/lo.2011.56.6.2411.
[17] Qiao F L, Yuan Y L, Zhu M Y, et al. Red tide ecological dynamics model and red tide control factors in the Yangtze River estuary. Oceanologia et Limnologia Sinica, 2000, 31(1): 93-100.
[18] Li D M, et al. A two-dimensional red tide ecological mathematical model and its application in the Bohai Sea. Marine Sciences, 2010, 34(9): 87-93.
[19] Li L, Lü S, Cen J. Spatio-temporal variations of harmful algal blooms along the coast of Guangdong, Southern China during 1980-2016. Journal of Oceanology and Limnology, 2019, 37(2): 535-551, https://doi.org/10.1007/s00343-019-8088-y.
[20] Li X D. Analysis of the basic characteristics of red tides along the Fujian coast over the past 10 years. Environmental Science, 2012, 33(7): 2210-2216.
[21] Hunt E B, Marin J, Stone P J. Experiments in Induction. New York: Academic Press, 1966.
[22] Quinlan J R. Induction of decision trees. Machine Learning, 1986, 1(1): 81-106.
[23] Quinlan J R. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers, 1993: 27-48.
[24] Quinlan J R. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 1996, 4: 77-90.
[25] Hwang S, Nguyen Q, Lee P. Reproducibility of a regional geological map derived from geochemical maps, using data mining techniques: with application to Chungbuk province of Korea. Environ. Geol., 2005, 48: 569-578, https://doi.org/10.1007/s00254-005-1313-3.
[26] Polat K, Güneş S. A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems. Expert Systems with Applications, 2009, 36: 1587-1592.
[27] Wu W, Dasgupta S, Ramirez E E, et al. Classification accuracies of physical activities using smartphone motion sensors. J Med Internet Res, 2012, 14(5): e130, DOI: 10.2196/jmir.2208.
[28] Yongqiang, Chai Hong, Shao Peake, Sun Rongcheng, et al. Intelligent red tide detection in MODIS images based on a decision tree. Journal of Qingdao University (Natural Science Edition), 2012, 25(2): 47-52.
Disclosure of Invention
Red tides occur in diverse types with differing social impacts, and conventional methods do not predict red tide type. To address this, the invention provides a decision tree-based method for predicting offshore red tide occurrence and red tide type that uses the daily minimum and maximum surface temperature and salinity of the water body as predictors.
The invention comprises the following steps:
1) collating information on red tide events;
2) data extraction and quality control: retrieving water surface temperature and salinity observations that match the red tide events in time and location, extracting the daily minimum and maximum values of the temperature and salinity observations, and applying the necessary quality control to these data;
3) building a training database: pairing the water body state of each day with the minimum and maximum surface temperature and salinity of the previous day, and using this database as training data for the decision tree prediction model;
4) building a decision tree-based offshore red tide forecasting model;
5) applying the offshore red tide forecasting model built in step 4) to forecast offshore red tide occurrence and red tide type.
In step 1), the red tide event information includes the time, geographical location, and affected area from the beginning to the end of each red tide, and the dominant algal species during the red tide.
In step 2), data extraction and quality control may proceed as follows: retrieve water surface temperature and salinity observations that match the red tide events in time and location; in addition to data from the red tide period, data from a period before and after each red tide are needed to represent the normal (non-red tide) water body; extract the daily minimum and maximum values (Tmin, Smin, Tmax, Smax) of the temperature and salinity observations; and apply quality control to the temperature and salinity data.
In step 3), the water body state is either a normal water body or a particular type of red tide water body; red tide water bodies are classified according to the overall distribution of samples by dominant red tide algal species.
In step 4), the decision tree-based offshore red tide forecasting model may be built as follows: (1) build an initial decision tree and generate its view; (2) use cross-validation to test how the minimum number of samples per leaf node affects decision tree performance, and determine the optimal minimum; (3) set the minimum number of samples per leaf node to this optimal value according to the cross-validation error, build the optimized decision tree, and generate its view; (4) using the training data as test data, check the prediction accuracy of the optimized decision tree model as a preliminary evaluation of model performance.
The invention applies a decision tree, with the daily minimum and maximum surface temperature and salinity of the water body as predictors, to forecast offshore red tides and red tide species. For days before, during, and after multiple historical red tide events, the previous day's minimum and maximum surface temperature and salinity (Tmin, Smin, Tmax, Smax), together with the water body state N (N = 0, 1, 2, 3, ..., where 0 denotes a normal water body and 1, 2, 3, ... denote different kinds of red tide water bodies), form the training database; the model is built with the C4.5 decision tree algorithm and optimized by pruning to avoid overfitting. With this model, the current day's surface temperature and salinity can be used to predict whether a red tide will occur the next day and which algal species will dominate it. Taking historical red tide data for the Fujian coastal waters as an example, 7 water body states were defined: normal water body, Prorocentrum donghaiense red tide, Karenia mikimotoi red tide, Skeletonema costatum and other diatom red tides, Heterosigma akashiwo red tide, mixed two- or three-species red tide of Prorocentrum donghaiense/Karenia mikimotoi/Noctiluca scintillans, and other red tides. The prediction accuracy of the decision tree model for these states is 88.08% for normal water bodies and 69.07% for red tide species overall, including 71.70% for Karenia mikimotoi. In addition to predicting whether a red tide will occur, the invention predicts which red tide types are likely to break out, which is of great significance for early prevention and emergency management of red tide events with serious social hazards, such as toxic Karenia mikimotoi blooms.
Drawings
Fig. 1 is the initial decision tree for predicting red tide occurrence and red tide species in the Fujian coastal waters, where x1, x2, x3, and x4 denote the previous day's minimum temperature, minimum salinity, maximum temperature, and maximum salinity (Tmin, Smin, Tmax, Smax), respectively, and 0-6 correspond to the water body state types (Table 1). At every node, the left branch is taken when the value is less than the split threshold and the right branch when it is greater than or equal to it.
Fig. 2 shows the model's errors from 6 cross-validation runs versus the minimum number of samples per leaf node of the decision tree; the bold black line is the average of the 6 runs.
Fig. 3 is the optimized decision tree for predicting red tide occurrence and red tide species in the Fujian coastal waters, where x1, x2, x3, and x4 denote the previous day's minimum temperature, minimum salinity, maximum temperature, and maximum salinity (Tmin, Smin, Tmax, Smax), respectively, and 0-6 correspond to the water body state types (Table 1). At every node, the left branch is taken when the value is less than the split threshold and the right branch when it is greater than or equal to it.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
The decision tree-based method for predicting offshore red tide occurrence and type is described below, using the Fujian coastal waters as an example.
1. Collation of red tide event information
Collate the historical red tide event information for the study area, including the time, geographical location, and affected area from the beginning to the end of each red tide, and the dominant algal species during the red tide.
For the Fujian coastal waters, red tide information for 2007-2018 was collated from the "Fujian Province Marine Disaster Bulletin" issued annually by the Fujian Provincial Department of Ocean and Fisheries.
2. Data extraction and quality control
Retrieve water surface temperature and salinity observations that match the red tide events in time and location. In addition to data from the red tide period, data from before and after each red tide are needed to represent the normal (non-red tide) water body. Extract the daily minimum and maximum values (Tmin, Smin, Tmax, Smax) of the observations and apply quality control to the temperature and salinity data.
For the Fujian coastal waters, the daily minimum and maximum surface temperature and salinity were extracted for each relevant red tide event over the red tide period plus at least 7 days before and after it. A day's data were discarded when the matched surface temperature (T, °C) or salinity (S) satisfied T > 38 or T < 10 or S > 40 or S < 0; the temperature bounds were set with reference to the multi-year climatological monthly mean (April-September) MODIS Aqua remote-sensing SST for the Taiwan Strait provided on the NASA website. The temperature and salinity data come from real-time continuous observations (sampling interval 30 min or 1 h) by small buoys or fish-raft-based stations deployed along the Fujian coast by the Fujian Marine Forecasts station.
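The extraction and quality-control rules above can be sketched in Python. The sketch assumes observations arrive as (timestamp, T, S) tuples with T in °C, and reads the rule as discarding a whole day if any of its observations falls outside the bounds; the record layout and the name `daily_extremes` are hypothetical, and selecting the ±7-day window around each event is not shown.

```python
from collections import defaultdict
from datetime import datetime

# QC bounds quoted in the text; T assumed in deg C, S unitless
T_LOW, T_HIGH = 10.0, 38.0
S_LOW, S_HIGH = 0.0, 40.0


def daily_extremes(records):
    """records: iterable of (timestamp: datetime, temperature, salinity).

    Returns {date: (Tmin, Smin, Tmax, Smax)}. A day is dropped entirely if
    any observation violates T > 38, T < 10, S > 40 or S < 0 (one reading
    of the quality-control rule)."""
    by_day = defaultdict(list)
    for ts, t, s in records:
        by_day[ts.date()].append((t, s))
    result = {}
    for day, obs in sorted(by_day.items()):
        if any(not (T_LOW <= t <= T_HIGH and S_LOW <= s <= S_HIGH) for t, s in obs):
            continue  # discard this day's data
        temps = [t for t, _ in obs]
        sals = [s for _, s in obs]
        result[day] = (min(temps), min(sals), max(temps), max(sals))
    return result


# Hypothetical 30-min buoy records; the second day contains T = 39.0
records = [
    (datetime(2018, 5, 1, 0, 0), 22.5, 30.1),
    (datetime(2018, 5, 1, 0, 30), 23.0, 29.8),
    (datetime(2018, 5, 1, 1, 0), 22.8, 30.4),
    (datetime(2018, 5, 2, 0, 0), 39.0, 30.0),
    (datetime(2018, 5, 2, 0, 30), 23.1, 30.2),
]
print(daily_extremes(records))
```

The tuple order (Tmin, Smin, Tmax, Smax) follows the column order the text uses for x1-x4.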
3. Building a training database
Build a database pairing each day's water body state with the previous day's minimum and maximum surface temperature and salinity, and use it as the training data for the decision tree prediction model. In the water body state, a normal water body is represented by 0; red tide water bodies are classified according to the overall distribution of dominant red tide algal species and represented by 1, 2, 3, and so on.
For the Fujian coastal waters, the number of samples (days) for which the current day's water body state could be matched with the previous day's temperature and salinity data is N = 786. Apart from the normal water body state (0), each red tide species (or species combination) with more than 20 samples forms its own class, and samples from species below this threshold or with uncertain dominant algal species information are grouped as "other". This yields 7 water body classes; their breakdown in the training database of the Fujian red tide occurrence and red tide species prediction model is shown in Table 1.
TABLE 1
Water body state type | Number of samples | Water body state ID
Normal water body | 453 | 0
Prorocentrum donghaiense red tide | 131 | 1
Karenia mikimotoi red tide | 53 | 2
Skeletonema costatum and other diatom red tides | 38 | 3
Heterosigma akashiwo red tide | 26 | 4
Mixed (two- or three-species) Prorocentrum donghaiense/Karenia mikimotoi/Noctiluca scintillans red tide | 48 | 5
Other red tides | 37 | 6
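The pairing and class-grouping rules described above can be sketched as follows. The names `build_training_set` and `group_rare_classes` and the dict-based inputs are hypothetical. Days whose previous day has no valid extremes are skipped, which is an assumption about how unmatched days are handled.

```python
from collections import Counter
from datetime import date, timedelta


def build_training_set(extremes, states):
    """extremes: {date: (Tmin, Smin, Tmax, Smax)} after quality control.
    states: {date: water body state code} observed on each day.

    Pairs the PREVIOUS day's extremes (features) with the CURRENT day's
    state (label); returns (X, y)."""
    X, y = [], []
    for day in sorted(states):
        prev = day - timedelta(days=1)
        if prev in extremes:  # skip days without a valid previous day
            X.append(list(extremes[prev]))
            y.append(states[day])
    return X, y


def group_rare_classes(labels, min_count=20, normal="normal", other="other"):
    """Keep the normal class and every red tide class with more than
    `min_count` samples; map rarer or unknown (None) labels to `other`."""
    counts = Counter(labels)
    return [lab if lab == normal or (lab is not None and counts[lab] > min_count)
            else other
            for lab in labels]


# Hypothetical example: only 2018-05-02 has a valid previous day
extremes = {date(2018, 5, 1): (22.5, 29.8, 23.0, 30.4)}
states = {date(2018, 5, 2): 1, date(2018, 5, 4): 0}
X, y = build_training_set(extremes, states)
print(X, y)  # [[22.5, 29.8, 23.0, 30.4]] [1]
```

After grouping, the remaining string labels would be mapped to the integer codes of Table 1 before training.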
4. Establishing a decision tree-based red tide occurrence and red tide type prediction model
This can be implemented in MATLAB or other software; the MATLAB (R2014a) implementation consists mainly of the following steps:
Step one: build the initial decision tree and generate its view
ctree=ClassificationTree.fit(P_train,T_train);
view(ctree,'mode','graph');
Here P_train and T_train are the training data: P_train holds the previous day's temperature and salinity data, T_train the corresponding water body state of the current day, and ctree is the resulting initial decision tree. For the Fujian coastal waters, P_train is a 786 × 4 double matrix whose 4 columns correspond to Tmin, Smin, Tmax, and Smax; T_train is a 786 × 1 double vector of the corresponding water body states (0-6). The resulting initial decision tree is shown in Fig. 1.
Step two: test the effect of the minimum number of samples per leaf node on decision tree performance
As Fig. 1 shows, the initial decision tree is extremely complex; although its misclassification rate on the training samples is low, it is likely to overfit and generalize poorly in later use. Cross-validation is used to test how the minimum number of samples per leaf node affects the performance (error) of the decision tree and to determine the optimal minimum; pruning the tree in this way avoids the problem. Because the cross-validation partitions are random, this step can be run several times and the results averaged.
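For readers without MATLAB, step one can be approximated by the minimal Python tree learner below. It is a sketch, not the patent's model: MATLAB's ClassificationTree.fit grows a CART-style binary tree that by default splits on Gini's diversity index, so the sketch also uses Gini impurity, with the same left (<) / right (>=) branch convention as Figs. 1 and 3 and a `min_leaf` constraint analogous to 'minleaf'. The helper names and toy data are hypothetical, and MATLAB's other stopping rules and pruning options are omitted.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree(X, y, min_leaf=1):
    """Grow a binary tree; every split must leave >= min_leaf samples per side.
    Node = ("leaf", label) or ("split", feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or len(y) < 2 * min_leaf:
        return ("leaf", majority)
    best = None  # (weighted impurity, feature, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2
            left = [lab for row, lab in zip(X, y) if row[f] < thr]
            right = [lab for row, lab in zip(X, y) if row[f] >= thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, f, thr)
    if best is None or best[0] >= gini(y):
        return ("leaf", majority)  # no admissible split improves purity
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] < thr]
    ri = [i for i, row in enumerate(X) if row[f] >= thr]
    return ("split", f, thr,
            fit_tree([X[i] for i in li], [y[i] for i in li], min_leaf),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], min_leaf))


def predict(tree, row):
    """Follow the left branch when row[feature] < threshold, else the right."""
    while tree[0] == "split":
        _, f, thr, left, right = tree
        tree = left if row[f] < thr else right
    return tree[1]


# Hypothetical rows of (Tmin, Smin, Tmax, Smax) and next-day states
X = [[20, 30, 22, 31], [21, 30, 23, 31], [26, 28, 28, 29], [27, 28, 29, 29]]
y = [0, 0, 2, 2]
tree = fit_tree(X, y, min_leaf=1)
print(predict(tree, [22, 30, 22, 31]))  # 0
```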
For the Fujian coastal waters example, the code is as follows:
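leafs = 5:30;              % candidate minimum leaf sizes, step 1
N = numel(leafs);
err = zeros(1, N);         % preallocate cross-validation errors
for n = 1:N
    % 'crossval','on' uses 10-fold cross-validation by default
    t = ClassificationTree.fit(P_train, T_train, 'crossval', 'on', 'minleaf', leafs(n));
    err(n) = kfoldLoss(t); % misclassification rate on held-out folds
end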
The minimum number of samples per leaf node (leafs) was varied from 5 to 30 in steps of 1, based on the distribution of sample counts across the water body state types, and cross-validation produced the error err. The results are shown in Fig. 2: err generally increases as the minimum leaf size grows from 5 to 10, then reaches a relatively low value at 11. Taking the 6 cross-validation runs together, 11 appears to be the best choice of minimum leaf size for the Fujian coastal waters example.
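Step two can be sketched in Python as repeated k-fold cross-validation over candidate minimum leaf sizes, averaging the error curve over 6 random partitions (mirroring the 6 runs in Fig. 2). To keep the block short and self-contained, the learner is a hypothetical depth-1 tree (a single split) with a minimum-leaf constraint, standing in for a full tree; k = 10 matches MATLAB's default for 'crossval','on'. Taking the plain argmin of the averaged curve is one possible selection rule; the text instead settles on 11 after inspecting the curves.

```python
import random
from collections import Counter


def fit_stump(X, y, min_leaf):
    """Hypothetical stand-in learner: a depth-1 tree (one split) whose two
    branches must each hold at least `min_leaf` training samples."""
    min_leaf = max(1, min_leaf)  # keep both branches non-empty
    majority = Counter(y).most_common(1)[0][0]
    best_errors = sum(lab != majority for lab in y)
    best_rule = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] < thr]
            right = [lab for row, lab in zip(X, y) if row[f] >= thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            l_maj = Counter(left).most_common(1)[0][0]
            r_maj = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_maj for lab in left)
                      + sum(lab != r_maj for lab in right))
            if errors < best_errors:
                best_errors, best_rule = errors, (f, thr, l_maj, r_maj)
    if best_rule is None:
        return lambda row: majority  # no admissible split: predict majority
    f, thr, l_maj, r_maj = best_rule
    return lambda row: l_maj if row[f] < thr else r_maj


def kfold_loss(X, y, min_leaf, k=10, seed=0):
    """Misclassification rate over k random folds (each sample held out once)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    wrong = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_stump([X[i] for i in train], [y[i] for i in train], min_leaf)
        wrong += sum(model(X[i]) != y[i] for i in fold)
    return wrong / len(y)


def choose_min_leaf(X, y, candidates, runs=6, k=10):
    """Average the CV error over `runs` random partitions; return (best, curve)."""
    curve = [sum(kfold_loss(X, y, m, k, seed) for seed in range(runs)) / runs
             for m in candidates]
    return candidates[curve.index(min(curve))], curve


X = [[float(i)] for i in range(40)]          # hypothetical single predictor
y = [0 if i < 20 else 1 for i in range(40)]  # hypothetical states
best, curve = choose_min_leaf(X, y, [1, 5, 30])
print(best, curve)
```

With 40 samples and 10 folds, a minimum leaf size of 30 leaves no admissible split, so that candidate degrades to majority-class prediction; this shows how an overly large leaf size raises the cross-validation error.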
Step three: build the optimized decision tree and generate its view
According to the cross-validation error from step two, set the minimum number of samples per leaf node to the optimal value and generate the optimized decision tree.
For the Fujian coastal waters, the minimum number of samples is set to 11; the resulting optimized decision tree (OptimalTree) is shown in Fig. 3. The code is as follows:
OptimalTree = ClassificationTree.fit(P_train, T_train, 'minleaf', 11);  % each leaf holds at least 11 samples
view(OptimalTree, 'mode', 'graph');
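Step four: preliminary evaluation of model performance
Use the training data as test data and check the prediction accuracy of the optimized decision tree model; this resubstitution accuracy serves as a preliminary evaluation of model performance and tends to be more optimistic than accuracy on independent data.
For the Fujian coastal waters example:
T_test = predict(OptimalTree, P_train);  % predicted states for the training days
acc = mean(T_test == T_train);           % overall fraction of days classified correctly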
With the temperature and salinity data P_train of the training samples as input, the model outputs predictions T_test, which are compared with the actual water body states T_train; Table 2 shows the results of the decision tree-based Fujian red tide occurrence and red tide type prediction model on the training samples. The overall accuracy for water body state is 80.03%, with 88.08% for normal water bodies and 69.07% for samples with a dominant red tide species. Accuracy is lowest for class 6 (other red tides), which has few samples that mix several red tide types. Because accuracy is related to the number of samples, the model's predictive ability can be further improved by adding training samples and re-dividing the red tide species classes. Notably, accuracy for the toxic Karenia mikimotoi red tide reaches 71.70%.
TABLE 2
Water-body state | Number of samples | Prediction accuracy (%)
0 - Normal water body | 453 | 88.08
1 - Prorocentrum donghaiense red tide | 131 | 72.52
2 - Karenia mikimotoi red tide | 53 | 71.70
3 - Diatom red tide (e.g., Skeletonema costatum) | 38 | 65.79
4 - Haemakha red tide | 26 | 65.38
5 - Two- or three-species mixed red tide of Prorocentrum donghaiense / Karenia mikimotoi / Noctiluca sp. | 48 | 83.33
6 - Other red tides | 37 | 40.54
Total, red tide types (1-6) | 333 | 69.07
Total | 786 | 80.03
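The accuracies in Table 2 can be reproduced from actual and predicted state codes with a short helper. The sketch below assumes that a red tide sample counts as correct only when its exact type code is predicted, which is what the pooled 69.07% figure implies. `accuracy_table` is a hypothetical name.

```python
from collections import Counter


def accuracy_table(actual, predicted, normal=0):
    """Resubstitution accuracy (%) per water-body state, pooled over all red
    tide states (every code except `normal`), and overall. A prediction is
    counted as correct only if it matches the exact state code."""
    n = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    table = {c: 100.0 * hits[c] / n[c] for c in sorted(n)}
    red_n = sum(v for c, v in n.items() if c != normal)
    red_hits = sum(v for c, v in hits.items() if c != normal)
    table["red_tide_total"] = 100.0 * red_hits / red_n if red_n else float("nan")
    table["total"] = 100.0 * sum(hits.values()) / len(actual)
    return table
```

Back-computing from Table 2, the reported percentages correspond to 399 of 453 normal samples and 230 of 333 red tide samples being classified correctly (95 + 38 + 25 + 17 + 40 + 15). That gives 629/786 = 80.03% overall, which is consistent with the table.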
5. Model application
The code is as follows:
% Predict the next-day water-body state from the same-day temperature/salinity extremes
T_test=predict(OptimalTree,P_test);
where P_test contains the same-day minimum and maximum temperature and salinity of the water body to be predicted, in the same format as the training input, and T_test is the predicted water-body state for the next day. In the Fujian offshore example, T_test is an integer from 0 to 6: 0 means no red tide occurs, and 1 to 6 mean a red tide occurs, with the red tide types corresponding to Table 1.
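The input format described above can be sketched as two steps: reduce raw observations to daily extremes, then pair each day's extremes with the next day's state code to form training rows. This is a minimal sketch, not the authors' code. The feature order [T_min, T_max, S_min, S_max] is an assumption, the quality-control step is omitted, and the observation values in the usage lines are illustrative only.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_extremes(observations):
    """observations: iterable of (day, temperature, salinity) tuples, where day
    is a datetime.date. Returns {day: [t_min, t_max, s_min, s_max]}.
    The feature order is an assumption, not taken from the method."""
    by_day = defaultdict(list)
    for day, temp, sal in observations:
        by_day[day].append((temp, sal))
    features = {}
    for day, rows in by_day.items():
        temps = [t for t, _ in rows]
        sals = [s for _, s in rows]
        features[day] = [min(temps), max(temps), min(sals), max(sals)]
    return features


def pair_with_next_day(features, states):
    """Pair each day's features with the water-body state code (0-6) observed
    on the following day; days without a next-day state are skipped."""
    P, T = [], []
    for day in sorted(features):
        next_day = day + timedelta(days=1)
        if next_day in states:
            P.append(features[day])
            T.append(states[next_day])
    return P, T


if __name__ == "__main__":
    # Illustrative observations only, not data from the method
    obs = [(date(2019, 5, 1), 20.1, 30.2), (date(2019, 5, 1), 21.5, 29.8),
           (date(2019, 5, 2), 22.0, 30.0)]
    P, T = pair_with_next_day(daily_extremes(obs), {date(2019, 5, 2): 2})
    print(P, T)  # [[20.1, 21.5, 29.8, 30.2]] [2]
```

Applied to operational data, the same daily-extreme row computed for the current day would be passed to `predict` as P_test.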
Note that regional red tides usually occur on only a few days each year, and a single red tide type often dominates in a given year. Applying the model over a relatively long period of several years therefore also serves to test and verify its red tide type predictions.

Claims (5)

1. A decision-tree-based method for predicting offshore red tide occurrence and red tide type, characterized by comprising the following steps:
1) compiling the relevant information on red tide events;
2) data extraction and quality control: retrieving surface water temperature and salinity observations matched in time and place to the red tide event information, extracting the daily minimum and maximum of the temperature and salinity observations, and applying the necessary quality control to the temperature and salinity data;
3) establishing a training database: building a database in which each day's water-body state is matched with the maximum and minimum surface water temperature and salinity of the previous day, and using it as the training data of the decision tree prediction model;
4) establishing a decision-tree-based offshore red tide forecasting model;
5) applying the offshore red tide forecasting model established in step 4) to forecast offshore red tide occurrence and red tide type.
2. The decision-tree-based offshore red tide occurrence and red tide type prediction method of claim 1, wherein in step 1) the information related to red tide events includes the time, geographical location and area of the red tide from its beginning to its end, and the dominant algal species when the red tide occurs.
3. The decision-tree-based offshore red tide occurrence and red tide type prediction method of claim 1, wherein in step 2) data extraction and quality control proceed as follows: surface water temperature and salinity observations matched in time and place are retrieved according to the red tide event information; besides the data from the red tide period, data from a period before and after the red tide are also needed to represent the normal water body; the daily minimum and maximum of the temperature and salinity observations are extracted, and quality control is applied to the temperature and salinity data.
4. The decision-tree-based offshore red tide occurrence and red tide type prediction method of claim 1, wherein in step 3) the water-body state is either a normal water body or a water body with a certain type of red tide, and the red tide water bodies are classified according to the overall distribution of samples by dominant red tide algal species.
5. The decision-tree-based offshore red tide occurrence and red tide type prediction method of claim 1, wherein in step 4) the decision-tree-based offshore red tide forecasting model is established as follows: (1) establishing an initial decision tree and generating a view; (2) using cross-validation to test the effect of the minimum number of samples per leaf node on decision-tree performance, and determining the optimal minimum number; (3) setting the minimum number of samples per leaf node to the optimal value according to the cross-validation error results, establishing the optimized decision tree, and generating a view; (4) using the training data as test data to check the prediction accuracy of the optimized decision tree model as a pre-evaluation of model performance.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911410770.2A CN111160655A (en) 2019-12-31 2019-12-31 Decision tree-based offshore red tide generation and red tide type prediction method

Publications (1)

Publication Number Publication Date
CN111160655A true CN111160655A (en) 2020-05-15

Family

ID=70559999

Country Status (1)

Country Link
CN (1) CN111160655A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002888A (en) * 2018-06-27 2018-12-14 厦门市海洋与渔业研究所 A kind of red tide prewarning method
CN109856357A (en) * 2019-03-19 2019-06-07 广西科学院 A kind of short-term method for early warning of red tide based on buoy online monitoring data and purposes
US20190188611A1 (en) * 2017-12-14 2019-06-20 Business Objects Software Limited Multi-step time series forecasting with residual learning

Non-Patent Citations (1)

Title
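Chai Yongqiang et al.: "Decision tree-based intelligent red tide detection in MODIS imagery", Journal of Qingdao University (Natural Science Edition) *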

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN112084716A (en) * 2020-09-15 2020-12-15 河北省科学院地理科学研究所 Red tide prediction and early warning method based on eutrophication comprehensive evaluation
CN112926664A (en) * 2021-03-01 2021-06-08 南京信息工程大学 Feature selection and CART forest short-time strong rainfall forecasting method based on evolutionary algorithm
CN112926664B (en) * 2021-03-01 2023-11-24 南京信息工程大学 Feature selection and CART forest short-time strong precipitation prediction method based on evolutionary algorithm
CN114003590A (en) * 2021-10-29 2022-02-01 厦门大学 Quality control method for environmental element data of surface layer of ocean buoy
CN114003590B (en) * 2021-10-29 2024-04-30 厦门大学 Quality control method for ocean buoy surface environmental element data
CN114170139A (en) * 2021-11-09 2022-03-11 深圳市衡兴安全检测技术有限公司 Offshore sea area ecological disaster early warning method and device, electronic equipment and storage medium
CN115290572A (en) * 2022-10-08 2022-11-04 长春理工大学 Red tide polarization monitoring device based on active illumination and monitoring method thereof
CN115290572B (en) * 2022-10-08 2023-01-10 长春理工大学 Red tide polarization monitoring device based on active illumination and monitoring method thereof
CN116258896A (en) * 2023-02-02 2023-06-13 山东产研卫星信息技术产业研究院有限公司 Quasi-real-time red tide monitoring method based on space-space integration
CN116258896B (en) * 2023-02-02 2023-09-26 山东产研卫星信息技术产业研究院有限公司 Quasi-real-time red tide monitoring method based on space-space integration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200515