CN111160655A - Decision tree-based offshore red tide generation and red tide type prediction method - Google Patents
- Publication number
- CN111160655A (application number CN201911410770.2A)
- Authority
- CN
- China
- Prior art keywords
- red tide
- decision tree
- water body
- offshore
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Abstract
A decision tree-based method for predicting the occurrence and type of offshore red tides, relating to offshore red tide prediction. The method comprises the following steps: 1) compiling the relevant information on red tide events; 2) data extraction and quality control: retrieving water surface temperature and salinity observations matched in time and place to the red tide event information, extracting the daily minimum and maximum of the temperature and salinity observations, and applying the necessary quality control to the data; 3) establishing a training database: building a database in which the water body state on a given day is matched with the maximum and minimum surface temperature and salinity of the previous day, and using it as the training data for the decision tree prediction model; 4) establishing a decision tree-based offshore red tide forecasting model; 5) applying the model established in step 4) to forecasting offshore red tide occurrence and red tide type. The method predicts which red tide types are likely to break out while predicting whether a red tide will occur, which is of great significance for the early prevention and emergency management of red tide events that pose serious social hazards.
Description
Technical Field
The invention relates to the field of offshore red tide forecasting, in particular to a decision tree-based method for forecasting offshore red tide occurrence and red tide types.
Background
Offshore red tides are a common marine disaster closely tied to fishery production, coastal tourism and public health. Existing red tide prediction relies mainly on monitoring environmental factors: biological (algal chlorophyll concentration, algal cell density, zooplankton grazing, etc.), chemical (nutrients, pH, DO, etc.), hydrological (water temperature, salinity, tide, ocean current, waves, etc.) and meteorological (air temperature, air pressure, rainfall, wind speed, wind direction, sunshine, humidity, etc.), with red tide occurrence predicted from a single parameter or from multiple parameters.
Early prediction was mostly based on empirical thresholds or simple statistical methods such as multiple regression and discriminant analysis. For example, based on statistics of many red tide events in various Japanese sea areas, the Japanese scholar Adachi [1] proposed ranges of red tide organism concentration and population growth rate as criteria for judging a red tide outbreak; Lu et al. [2] held that a chlorophyll a content rising above 10 mg/m3 with a rapidly increasing trend indicates that a red tide is about to occur; Lin and Lin [3] inferred from enclosure experiments a DIP threshold of 1.2 μmol/L for Skeletonema costatum red tides, as a reference for forecasting such red tides; Wang et al. [4], from sampling observations at multiple sites in the Yangtze River estuary and adjacent sea area, considered that a day-night difference in dissolved oxygen of 5 mg/dm3 or more indicates the occurrence of a red tide; Jiao [5] proposed a transparency of 1.6 m as a trial standard value for red tide early warning; Jiang and Song [6] applied multivariate stepwise regression to data on the cell density of the dominant red tide algae species and on physicochemical and biological water quality factors in Quanzhou Bay to establish multi-parameter regression equations for the cell density of each dominant species; Wu [7] used 24 h continuous monitoring data from automatic marine water quality monitors in the Xiamen sea area for 2005-2008, took the daily variation gradients of environmental factors and similar quantities as predictors, and established a 28 h chlorophyll a forecasting equation by stepwise regression.
With the emergence of data mining, and given the abrupt, nonlinear and complex nature of red tide occurrence, many machine learning algorithms have been applied to red tide prediction, mainly artificial neural networks, support vector machines (SVM), genetic algorithms, fuzzy logic, decision trees and logistic regression. Marsili-Libelli [8] set up a series of fuzzy rules based on experimental observation and expert knowledge and predicted algal blooms in the Orbetello lagoon, Italy, from the daily variation of parameters such as dissolved oxygen, pH and water temperature; Muttil & Lee [9] built a model with a genetic algorithm for alternate-day prediction of red tides along the Hong Kong coast; Lane et al. [10] applied parameters such as water temperature, nutrients and river input to predict toxic Pseudo-nitzschia diatom blooms in Monterey Bay, California, by logistic regression; González Vilas et al. [11] used water temperature and salinity data together with an upwelling index to predict Pseudo-nitzschia red tides in Spanish coastal waters with an SVM.
There are many similar cases in China. Xu et al. [12] used measured red tide data and contemporaneous hydrological and meteorological data from Haizhou Bay for 2004-2006 to extract thresholds for environmental factors significantly related to red tide outbreaks, such as inorganic nitrogen (DIN), pH, salinity, water temperature, silicate and wind speed; Zhang Cheng et al. [13] used an SVM with parameters such as chlorophyll a, solar radiation, extinction coefficient, water temperature and pH to predict chlorophyll 7 days ahead; Su Xinhong et al. [14] applied a BP neural network model to establish nonlinear relationships between 219 red tide cases in the Fujian sea area and five meteorological factors (air temperature, precipitation, wind speed, air pressure and sunshine), with learning, training and prediction carried out separately for eastern, central and southern Fujian.
More complex studies start from the mechanism of red tide occurrence, coupling physical, chemical and biological processes to build ecological dynamics models for red tide prediction. For example, Allen et al. [15] established a coupled ecological-hydrodynamic model to predict algal blooms on the northwest European continental shelf; McGillibrand et al. [16] simulated the 2006 occurrence of Karenia mikimotoi along the Scottish coast, taking into account multiple factors such as spore distribution, cell growth and death, and hydrodynamics; Qiao et al. [17] established a six-component red tide ecological dynamics model for the Yangtze River estuary sea area, studied the model and the controlling factors of red tides there, and simulated the whole process of red tide formation and dissipation; Li Daming et al. [18] established a two-dimensional red tide ecological mathematical model combining hydrodynamics and biodynamics and simulated the formation and dissipation of Phaeocystis red tides in the Bohai Sea.
Although red tide prediction has been widely studied and applied at home and abroad, existing work is limited to predicting whether a red tide, or one particular red tide, will occur; prediction of which type of red tide may occur has rarely been reported. In fact, red tides are generally of diverse types, and different types differ in ecological effect and social impact. Taking Guangdong and Fujian provinces in southern China as examples, statistics on offshore red tides in Guangdong for 1980-2016 show that the common red tide species include Noctiluca scintillans, Phaeocystis globosa, Skeletonema costatum and Scrippsiella trochoidea, of which Phaeocystis globosa can cause mass fish mortality [19]; along the Fujian coast as many as 20 red tide species were recorded in 2001-2010, with Prorocentrum donghaiense, Noctiluca, Skeletonema costatum and Chaetoceros the most common, and Karenia mikimotoi the toxic red tide species [20]. If the occurrence of high-impact red tide species such as Karenia mikimotoi and Zostera marina could be predicted, so that prevention measures can be deployed early, this would be of great social significance for effectively reducing social hazards.
A decision tree is an inductive learning technique in machine learning: an algorithm for classifying data or building a prediction model that induces a set of classification rules, expressed as a tree structure, from an unordered and irregular set of examples [21,22]. C4.5 is one of the main decision tree algorithms. It selects attributes by information gain ratio, remedying two defects of the original ID3 algorithm: its inability to handle continuous attributes and its bias toward attributes with many distinct values as the splitting criterion. During tree construction, pruning is used to curb over-fitting, which improves the applicability of the algorithm [23,24]. Having learned from historical data, C4.5 can classify or predict new data into two or more classes well, and it is widely applied in fields such as commerce, medicine and remote sensing imagery [25-28].
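The gain ratio criterion mentioned above can be illustrated with a minimal Python sketch. It is a simplified illustration, not the full C4.5 procedure of [23,24]: it evaluates binary splits of one continuous attribute at midpoints between consecutive distinct values and omits pruning and C4.5's other refinements. The function names and the toy temperature values are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value < threshold` on one continuous attribute."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, gain ratio)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scored = ((t, gain_ratio(values, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1], default=(None, 0.0))


# Toy data (made up): previous-day minimum temperature and a 0/1 water body state.
toy_tmin = [18.0, 19.0, 24.0, 25.0]
toy_state = [0, 0, 1, 1]
threshold, ratio = best_threshold(toy_tmin, toy_state)
```

Dividing the information gain by the split information is what penalizes splits into many small branches, which is the bias of ID3 that the paragraph above describes.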
In summary, offshore red tides are of various types with different social impacts, but existing red tide prediction is limited to whether a red tide occurs and does not address the red tide type.
References:
[1] andaland-red tide organism と red tide entity aquatic product civil engineering, 1973,9(1):31-36.
[2] Luodidine, J.Gobel, Wangchun, et al, Red tide biological monitoring and real-time red tide prediction in the sea area of Zhejiang, east sea, 2000,18(2):33-43.
[3] Ri Yi, Lin Rong Cheng Xianchong triggered threshold study of harmful diatom bloom phosphorus in ocean and lake 1999,30(4):391-396.
[4] Wangquan, Zhang Qing, Luhai Yan, etc. Changjiang estuary dissolved oxygen red tide forecast simple mode, oceanographic report, 2000,22(4): 125-.
[5] The early-warning monitoring parameter of red tide is the preliminary research of transparency, ocean environmental science 2001,20(1):31-35.
[6] Jiangxinglong, Song Li Rong, Quanzhou Bay red tide algae dominant species cell density regression equation research, sea and lake marshes, 2010,41(3): 341-.
[7] Wuyufang, establishment of prediction equation of chlorophyll value in mansion sea area during high-incidence red tide, and application of prediction equation to research of disastrous red tide prediction mode, ocean prediction, 2012,29(2):39-44.
[8] Marsili-Libelli S. Fuzzy prediction of the algal blooms in the Orbetello lagoon. Environmental Modelling & Software, 2004, 19: 799–808.
[9] Muttil N and Lee J H W. Genetic programming for analysis and real-time prediction of coastal algal blooms. Ecological Modelling, 2005, 189: 363–376.
[10] Lane J Q, Raimondi P T, Kudela R M. Development of a logistic regression model for the prediction of toxigenic Pseudo-nitzschia blooms in Monterey Bay, California. Marine Ecology Progress Series, 2009, 383: 37-51, doi:10.3354/meps07999.
[11] González Vilas L, Spyrakos E, Torres Palenzuela J M, et al. Support Vector Machine-based method for predicting Pseudo-nitzschia spp. blooms in coastal waters (Galician rias, NW Spain). Progress in Oceanography, 2014, 124: 66–77.
[12] Xu, Zhang Ying, Liu Ji Tang, etc. based on Logistic regression, research on red tide environmental element threshold in Bay, Haizhou, oceanic advisory, 2009,28(3):70-75.
[13] Zhang Cheng, Chen Zhen, Xu Qiang, etc. the prediction model of chlorophyll a concentration in Taihu Meilianwan based on support vector machine, report of environmental science, 2013,33(10): 2856-.
[14] Study on red tide forecasting method of Fujian sea area based on BP neural network model, aquatic science and newspaper, 2017, 41 (11): 1744-1755.
[15] Allen J I, Smyth T J, Siddorn J R, Holt M. How well can we forecast high biomass algal bloom events in a eutrophic coastal sea? Harmful Algae, 2008, 8: 70–76, http://dx.doi.org/10.1016/j.hal.2008.08.024.2.
[16] McGillicuddy J, Townsend J D, He D W, et al. Suppression of the 2010 Alexandrium fundyense bloom by changes in physical, biological, and chemical properties of the Gulf of Maine. Limnol. Oceanogr., 2011, 56: 2411–2426, http://dx.doi.org/10.4319/lo.2011.56.6.2411.
[17] Qiaofang, Yuan-Shi, Zhumingyuan, etc. research on red tide ecological dynamics models and red tide control factors in estuary sea areas, oceans and lakes, 2000,31(1):93-100.
[18] Plum-buzz, forest-resold, songxia, and the like, a two-dimensional red tide ecological mathematical model and application thereof in Bohai sea, oceanographic science, 2010,34(9):87-93.
[19] Li L, Lü S, Cen J. Spatio-temporal variations of harmful algal blooms along the coast of Guangdong, Southern China during 1980–2016. Journal of Oceanology and Limnology, 37(2): 535-551, https://doi.org/10.1007/s00343-019-8088-y.
[20] Li Xueding, Fujian coastal near 10a red tide basic characteristic analysis, environmental science, 2012,33(7):2210-2216.
[21] Hunt E B, Marin J, & Stone P J. Experiments in Induction. New York: Academic Press, 1966.
[22] Quinlan J R. Induction of decision trees. Machine Learning, 1986, 1(1): 81-106.
[23] Quinlan J R. C4.5: Programs for machine learning. Morgan Kaufmann Publishers, San Mateo, CA, 1993: 27-48.
[24] Quinlan J. Improved Use of Continuous Attributes in C4.5. Journal of Artificial Intelligence Research, 1996, 4: 77-90.
[25] Hwang S, Nguyen Q, Lee P. Reproducibility of a regional geological map derived from geochemical maps, using data mining techniques: with application to Chungbuk province of Korea. Environ. Geol., 2005, 48: 569–578, https://doi.org/10.1007/s00254-005-1313-3.
[26] Polat K and Günes S. A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems. Expert Systems with Applications, 2009, 36: 1587–1592.
[27] Wu W, Dasgupta S, Ramirez E E, et al. Classification Accuracies of Physical Activities Using Smartphone Motion Sensors. J Med Internet Res, 2012, 14(5): e130, DOI: 10.2196/jmir.2208.
[28] The method comprises the steps of forever strong Chaihong, Shao Peake, Sunrong Cheng, and the like, an MODIS image red tide intelligent detection technology based on a decision tree, a university of Qingdao (Nature science edition), 2012,25(2), 47-52.
Disclosure of Invention
The invention aims to provide a decision tree-based method for predicting offshore red tide occurrence and red tide type, using the daily minimum and maximum surface temperature and salinity of the water body as predictors. It addresses the problem that offshore red tides are of diverse types with different social impacts, while existing methods do not predict the red tide type.
The invention comprises the following steps:
1) compiling the relevant information on red tide events;
2) data extraction and quality control: retrieving water surface temperature and salinity observations matched in time and place to the red tide event information, extracting the daily minimum and maximum of the temperature and salinity observations, and applying the necessary quality control to the temperature and salinity data;
3) establishing a training database: building a database in which the water body state on a given day is matched with the maximum and minimum surface temperature and salinity of the previous day, and using it as the training data for the decision tree prediction model;
4) establishing an offshore red tide forecasting model based on a decision tree;
5) applying the offshore red tide forecasting model established in the step 4) to offshore red tide generation and red tide type forecasting.
In step 1), the red tide event information includes the time from the beginning to the end of the red tide, its geographical location and area, and the dominant algae species when the red tide occurred.
In step 2), the specific steps of data extraction and quality control may be: retrieving water surface temperature and salinity observations matched in time and place to the red tide event information; besides data from the red tide period, data from a period before and after the red tide are also needed to represent normal (non-red tide) water; extracting the daily minimum and maximum of the temperature and salinity observations (Tmin, Smin, Tmax, Smax); and applying quality control to the temperature and salinity data.
In step 3), the water body state is either normal water or a particular type of red tide water; red tide water is classified according to the overall sample distribution of the dominant red tide algae species.
In step 4), the specific steps of establishing the decision tree-based offshore red tide forecasting model may be: (1) establishing an initial decision tree and generating a view; (2) testing, through cross-validation, the influence of the minimum number of samples per leaf node on the performance of the decision tree, and determining the optimal minimum number of samples; (3) setting the minimum number of samples per leaf node to the optimal value according to the cross-validation error results, establishing an optimized decision tree, and generating a view; (4) checking the prediction accuracy of the optimized decision tree model using the training data as test data, as a preliminary evaluation of model performance.
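Sub-steps (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's MATLAB implementation: the tree below splits on weighted entropy rather than the C4.5 gain ratio and is not pruned, and the number of folds (k=10) and repeats (6) are assumptions. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def grow(rows, min_leaf):
    """Grow a binary tree on rows of (features, label); each leaf keeps >= min_leaf samples."""
    labels = [y for _, y in rows]
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _ in rows})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [r for r in rows if r[0][j] < t]
            right = [r for r in rows if r[0][j] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * entropy([y for _, y in left])
                     + len(right) * entropy([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # no admissible split: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]
    _, j, t, left, right = best
    return {"feature": j, "threshold": t,
            "left": grow(left, min_leaf), "right": grow(right, min_leaf)}


def predict(tree, x):
    """Left branch: value below the threshold; right branch: greater than or equal."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree


def cv_error(rows, min_leaf, k=10, repeats=6, seed=0):
    """Misclassification rate averaged over `repeats` runs of k-fold cross-validation."""
    run_errors = []
    for r in range(repeats):
        shuffled = rows[:]
        random.Random(seed + r).shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        wrong = 0
        for i, test in enumerate(folds):
            train = [row for j, fold in enumerate(folds) if j != i for row in fold]
            tree = grow(train, min_leaf)
            wrong += sum(predict(tree, x) != y for x, y in test)
        run_errors.append(wrong / len(rows))
    return sum(run_errors) / repeats


def best_min_leaf(rows, candidates):
    """Return the candidate minimum leaf size with the lowest averaged CV error."""
    return min(candidates, key=lambda m: cv_error(rows, m))
```

Here each row would be ((Tmin, Smin, Tmax, Smax), state). A larger minimum leaf size gives a smaller tree, so scanning this one parameter trades training fit against over-fitting.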
The invention applies a decision tree, selects the daily minimum and maximum surface temperature and salinity of the water body as predictors, and develops a technique for forecasting offshore red tides and red tide species. The daily minimum and maximum surface temperature and salinity (Tmin, Smin, Tmax, Smax) on the days before and after a number of historical red tide events, together with the water body state N (N = 0, 1, 2, 3, ...; 0 represents normal water and 1, 2, 3, ... represent different kinds of red tide water), are used as the training database, and a model is built with the C4.5 decision tree algorithm and optimized by pruning to avoid over-fitting. With this model, whether a red tide will occur the next day, and its dominant algae, can be predicted from the current day's surface temperature and salinity. Taking historical Fujian offshore red tide data as an example, seven water body states were established: normal water, Prorocentrum donghaiense red tide, Karenia mikimotoi red tide, diatom red tide such as Skeletonema costatum, Akashiwo sanguinea red tide, two- or three-species red tide of Prorocentrum donghaiense/Karenia mikimotoi/Noctiluca scintillans, and other red tides. The prediction accuracy of the decision tree model for these states is 88.08% for normal water and 69.07% for red tide species, including 71.70% for Karenia mikimotoi. The invention predicts the red tide types likely to break out while predicting whether a red tide occurs. It is of great significance for the early prevention and emergency management of red tide events that pose serious social hazards, such as toxic Karenia mikimotoi.
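Accuracy figures of the kind quoted above can be computed per water body state from the true and predicted states. The Python sketch below is one plausible way to do so; the text does not state how the 69.07% figure for red tide species was computed, so treating it as the exact-class match rate over all red tide samples is an assumption. The function names are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true state that were predicted as that state."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == guess)
    return {state: correct[state] / total[state] for state in sorted(total)}


def red_tide_accuracy(y_true, y_pred):
    """Exact-class accuracy over samples whose true state is a red tide (state != 0).

    Assumption: a red tide sample counts as correct only if the predicted
    red tide class matches, not merely if any red tide is predicted.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return sum(t == p for t, p in pairs) / len(pairs)
```

Reporting accuracy per state matters here because normal water dominates the sample, so a single overall accuracy would hide how well the rarer red tide classes are predicted.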
Drawings
Fig. 1 shows the initial decision tree for predicting red tide occurrence and red tide species in the Fujian offshore area. x1, x2, x3 and x4 represent the minimum and maximum temperature and salinity of the previous day (Tmin, Smin, Tmax, Smax) respectively, and 0-6 correspond to the different water body state types (Table 1). At every node, the left branch corresponds to values less than the stated threshold and the right branch to values greater than or equal to it.
Fig. 2 shows the relationship between the errors from 6 cross-validations of the model and the minimum number of samples per leaf node of the decision tree; the bold black line is the average of the 6 results.
Fig. 3 shows the optimized decision tree for forecasting red tide occurrence and red tide species in the Fujian offshore area. x1, x2, x3 and x4 represent the minimum and maximum temperature and salinity of the previous day (Tmin, Smin, Tmax, Smax) respectively, and 0-6 correspond to the different water body state types (Table 1). At every node, the left branch corresponds to values less than the stated threshold and the right branch to values greater than or equal to it.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
The specific implementation of the decision tree-based method for predicting the occurrence and type of the offshore red tide is shown as follows, taking the prediction of the occurrence and type of the offshore red tide in Fujian province as an example.
1. Compiling information related to red tide events
Compile the historical red tide event information for the study area, including the time from the beginning to the end of each red tide, its geographical location and area, and the dominant algae species when the red tide occurred.
Taking the Fujian offshore area as an example, red tide information for 2007-2018 was compiled from the "Fujian Province Marine Disaster Bulletin" issued annually by the Fujian Provincial Department of Ocean and Fisheries.
2. Data extraction and quality control
Retrieve water surface temperature and salinity observations matched in time and place to the red tide event information. Besides data from the red tide period, data from before and after the red tide are also needed to represent normal (non-red tide) water. Extract the daily minimum and maximum of the temperature and salinity observations (Tmin, Smin, Tmax, Smax) and apply quality control to the temperature and salinity data.
Taking the Fujian offshore area as an example, for each red tide event the daily minimum and maximum surface temperature and salinity were extracted for the red tide period and for at least 7 days before and after it. A day's data were discarded when the surface temperature or salinity data matched to the red tide event had T > 38 or T < 10 or S > 40 or S < 0; the temperature range was set with reference to the multi-year climatological monthly mean (April to September) sea surface temperature (SST) of the Taiwan Strait from MODIS Aqua satellite remote sensing, as provided on the NASA website. The temperature and salinity data come from real-time continuous observation by small buoys or fish-raft platforms deployed along the Fujian coast by the Fujian Provincial Marine Forecasting Station (sampling interval 30 min or 1 h).
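The extraction and quality control described above can be sketched in Python as follows. The thresholds are those given in the text; dropping the whole day when any single reading is out of range is one reading of "discarding the data of the same day" and is an assumption, as are the function name and the input layout.

```python
from collections import defaultdict
from datetime import datetime

# Valid ranges from the text: a day is dropped if any reading has
# T > 38, T < 10, S > 40 or S < 0.
T_RANGE = (10.0, 38.0)
S_RANGE = (0.0, 40.0)


def daily_extremes(records):
    """records: iterable of (datetime, temperature, salinity) tuples.

    Returns {date: (Tmin, Smin, Tmax, Smax)} for days that pass quality control.
    """
    by_day = defaultdict(list)
    for timestamp, temp, sal in records:
        by_day[timestamp.date()].append((temp, sal))

    result = {}
    for day, readings in by_day.items():
        temps = [t for t, _ in readings]
        sals = [s for _, s in readings]
        t_ok = T_RANGE[0] <= min(temps) and max(temps) <= T_RANGE[1]
        s_ok = S_RANGE[0] <= min(sals) and max(sals) <= S_RANGE[1]
        if t_ok and s_ok:  # assumption: one bad reading invalidates the whole day
            result[day] = (min(temps), min(sals), max(temps), max(sals))
    return result
```

With a 30 min or 1 h sampling interval, each day contributes 24 to 48 readings, which this function reduces to a single four-value feature tuple.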
3. Building a training database
Establish a database that pairs each day's water body state with the maximum and minimum surface water temperature and salinity of the previous day, and use it as the training data for the decision tree prediction model. For the water body state, a normal water body is represented by 0; red tide water bodies are divided into classes according to the overall distribution of dominant red tide algal species and are represented by 1, 2, 3, and so on.
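The pairing of each day's state with the previous day's extremes can be sketched as follows. This is only an illustration: the names featDay, feat, stateDay, and state and all values are hypothetical, and days are assumed to be integer day numbers.

```matlab
% Hypothetical inputs (made-up values).
featDay = [10 11 12 14];            % days with QC-passed extremes
feat    = [24.1 31.5 25.3 32.0;     % rows: [Tmin Smin Tmax Smax] per featDay
           24.6 31.2 25.9 31.9;
           25.0 30.8 26.4 31.7;
           25.2 30.5 26.8 31.4];
stateDay = [11 12 13 16];           % days with a known water body state
state    = [0 1 1 0];               % 0 = normal, 1, 2, ... = red tide classes

% Keep only the days whose previous day has temperature and salinity extremes.
[tf, loc] = ismember(stateDay - 1, featDay);
P_train = feat(loc(tf), :);         % previous-day features
T_train = state(tf).';              % same-day water body state
```

In this toy input, day 16 is dropped because day 15 has no quality-controlled data, so three training samples remain.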
Taking Fujian offshore as an example, the number of samples (days) N for which the current day's water body state is matched with the previous day's temperature and salinity data is 786. The normal water body state is represented by 0. Each red tide species with more than 20 samples forms its own class, and samples that do not reach this threshold or whose dominant algal species is uncertain are grouped into an "other" class, giving 7 water body classes in total. Table 1 shows the classification of water body states in the training database of the Fujian offshore red tide occurrence and red tide type prediction model.
TABLE 1
Water body state type | Number of samples | Water body state identifier |
---|---|---|
Normal water body | 453 | 0 |
Prorocentrum donghaiense red tide | 131 | 1 |
Karenia mikimotoi red tide | 53 | 2 |
Diatom red tide (e.g., Skeletonema costatum) | 38 | 3 |
Akashiwo sanguinea red tide | 26 | 4 |
Two- or three-species red tide of Prorocentrum donghaiense / Karenia mikimotoi / Noctiluca scintillans | 48 | 5 |
Other red tides | 37 | 6 |
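The rule for assigning red tide classes (a species gets its own class only when it has enough samples, and everything else goes to "other") can be sketched as below. The species labels, the threshold value used for the toy data, and the choice to number classes in descending order of sample count are assumptions made for illustration; the Fujian example uses "more than 20 samples" and the numbering in Table 1.

```matlab
% Hypothetical dominant-species label for each red tide sample day.
species = {'A','A','A','B','A','C','B','A'};
minCount = 3;   % toy threshold; the Fujian example requires more than 20 samples

[names, ~, idx] = unique(species);
counts = accumarray(idx(:), 1);          % samples per species
[~, order] = sort(counts, 'descend');

classOf = zeros(numel(names), 1);
nextId = 1;
for k = order.'
    if counts(k) > minCount
        classOf(k) = nextId;             % species with enough samples gets its own class
        nextId = nextId + 1;
    end
end
classOf(classOf == 0) = nextId;          % remaining species share the "other" class
label = classOf(idx(:));                 % class identifier per sample (normal water stays 0 elsewhere)
```

Normal water body days are not part of this step; they are simply labeled 0 when the training database is assembled.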
4. Establishing a decision tree-based red tide occurrence and red tide type prediction model
This process can be implemented in MATLAB or other software. Written in MATLAB (2014a), it mainly comprises the following steps:
Step one: Establish an initial decision tree and generate a view
ctree=ClassificationTree.fit(P_train,T_train);
view(ctree,'mode','graph');
Here, P_train and T_train are the training sample data: P_train is the previous day's temperature and salinity data, T_train is the corresponding water body state of the current day, and ctree is the initial decision tree. Taking Fujian offshore as an example, P_train is a 786 × 4 double matrix whose 4 columns correspond to the parameters Tmin, Smin, Tmax, and Smax; T_train is a 786 × 1 double matrix holding the corresponding water body states (0-6). The generated initial decision tree is shown in FIG. 1.
Step two: Test the effect of the minimum number of samples per leaf node on decision tree performance
As FIG. 1 shows, the initial decision tree is extremely complex. Although its loss on the training samples is small, overfitting is likely to weaken its generalization in later use. Cross-validation is therefore used to test how the minimum number of samples per leaf node affects decision tree performance (error) and to determine the optimal minimum number of samples; pruning the tree with this setting avoids the problem. This step can be run several times to assess the average effect.
Taking Fujian offshore as an example, the code is as follows:
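leafs=5:30; % candidate minimum numbers of samples per leaf node
N=numel(leafs);
err=zeros(1,N); % cross-validation error for each candidate
for n=1:N
% 'crossval','on' builds a cross-validated tree (10-fold by default)
t=ClassificationTree.fit(P_train,T_train,'crossval','on','minleaf',leafs(n));
err(n)=kfoldLoss(t);
end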
The minimum number of samples per leaf node (leafs) is varied from 5 to 30 (step 1), a range chosen according to the sample distribution across water body state types, and cross-validation is performed to produce the cross-validation error err. The results are shown in FIG. 2: err generally increases as the minimum number of samples per leaf node rises from 5 to 10 and then drops to a relatively low value at 11. Taking the results of 6 cross-validation runs together, 11 is probably the best choice for the minimum number of samples per leaf node in the Fujian offshore example.
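One simple way to combine several cross-validation runs is to average err over the runs and take the candidate with the lowest mean. The sketch below assumes the errors of each run have already been collected row by row into a matrix; the name errRuns and all numbers are made up, and the text above does not state that a strict minimum of the mean was the selection rule actually used.

```matlab
% errRuns(r, n): cross-validation error of run r at candidate leafs(n) (made-up values).
leafs = 5:9;
errRuns = [0.30 0.31 0.29 0.33 0.34;
           0.32 0.30 0.28 0.31 0.35;
           0.31 0.32 0.30 0.32 0.33];

errMean = mean(errRuns, 1);     % average error per candidate across runs
[~, best] = min(errMean);       % position of the lowest mean error
bestLeaf = leafs(best);         % candidate minimum leaf size to use
```

Because each cross-validation run partitions the data randomly, averaging smooths out run-to-run fluctuations before a value is chosen.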
Step three: establishing an optimized decision tree and generating a view
Set the minimum number of samples per leaf node to the optimal value indicated by the cross-validation error results of step two, and generate an optimized decision tree.
Taking Fujian offshore as an example, the minimum number of samples is set to 11, and the generated optimized decision tree (OptimalTree) is shown in FIG. 3. The code is as follows:
OptimalTree=ClassificationTree.fit(P_train,T_train,'minleaf',11);
view(OptimalTree,'mode','graph');
Step four: Model performance pre-evaluation
The training data are used as test data to check the prediction accuracy of the optimized decision tree model; this accuracy serves as a preliminary evaluation of model performance.
Take Fujian offshore as an example:
T_test=predict(OptimalTree,P_train)
The temperature and salinity data P_train from the training samples are used as input, and the output prediction T_test is compared with the actual water body state T_train. Table 2 shows the prediction results of the decision tree-based Fujian offshore red tide occurrence and red tide type prediction model on the training samples. The overall prediction accuracy for the water body state is 80.03%; the accuracy for normal water bodies is 88.08%, and the accuracy for the dominant species of red tide samples is 69.07%. Accuracy is lowest for class 6 (other red tides), which has few samples and mixes several red tide types. Prediction accuracy is related to the number of samples, so the model's predictive ability can be further improved by adding training samples and reclassifying the red tide species. Notably, the prediction accuracy for the toxic Karenia mikimotoi red tide reaches 71.70%.
TABLE 2
State of water body | Number of samples | Prediction accuracy (%) |
---|---|---|
0-normal water body | 453 | 88.08 |
1-Prorocentrum donghaiense red tide | 131 | 72.52 |
2-Karenia mikimotoi red tide | 53 | 71.70 |
3-Diatom red tide (e.g., Skeletonema costatum) | 38 | 65.79 |
4-Akashiwo sanguinea red tide | 26 | 65.38 |
5-Two- or three-species red tide of Prorocentrum donghaiense / Karenia mikimotoi / Noctiluca scintillans | 48 | 83.33 |
6-Other red tides | 37 | 40.54 |
Total (red tide classes) | 333 | 69.07 |
Total | 786 | 80.03 |
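The per-class, red-tide-only, and overall accuracies of the kind reported in Table 2 can be computed from the actual and predicted states as sketched below. The vectors T_actual and T_pred are made-up stand-ins for T_train and T_test, so the resulting numbers are not those of Table 2.

```matlab
% Hypothetical actual and predicted water body states (made-up values).
T_actual = [0 0 0 0 1 1 1 2 2 2].';
T_pred   = [0 0 0 1 1 1 0 2 2 1].';

classes = unique(T_actual);
acc = zeros(numel(classes), 1);
for k = 1:numel(classes)
    inClass = (T_actual == classes(k));
    acc(k) = 100 * mean(T_pred(inClass) == classes(k));   % accuracy (%) within class k
end

isTide   = T_actual > 0;                                   % all red tide classes together
accTide  = 100 * mean(T_pred(isTide) == T_actual(isTide));
accTotal = 100 * mean(T_pred == T_actual);                 % overall accuracy (%)
```

The same arithmetic is consistent with Table 2: the class accuracies weighted by their sample counts give 230 correct out of 333 red tide samples (69.07%) and 629 correct out of 786 samples overall (80.03%).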
5. Model application
The code is as follows:
T_test=predict(OptimalTree,P_test)
P_test holds the current day's minimum and maximum temperature and salinity of the water body to be predicted, in the same data format as the training input, and T_test is the corresponding predicted water body state for the next day. Taking Fujian offshore as an example, the prediction T_test is a number from 0 to 6: 0 means no red tide occurs, 1 to 6 mean a red tide occurs, and the red tide types correspond to Table 1.
It should be noted that, because regional red tides usually occur within a limited number of days each year and a single red tide type often dominates in a given year, applying the model is itself a process of testing and verifying the model; for red tide type prediction, this process takes a relatively long period of several years.
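As a rough end-to-end sketch of training and application on synthetic data, the block below uses fitctree, which newer MATLAB releases with the Statistics and Machine Learning Toolbox recommend in place of ClassificationTree.fit. Everything in it is an assumption for illustration: the random data, the toy labeling rule, and the reuse of the minimum leaf size 11 from the Fujian example; it does not reproduce the patent's data or results.

```matlab
rng(1);                                        % reproducible synthetic data
n = 200;
P_train = zeros(n, 4);                         % columns: [Tmin Smin Tmax Smax]
P_train(:,1) = 20 + 8*rand(n,1);               % Tmin
P_train(:,2) = 28 + 4*rand(n,1);               % Smin
P_train(:,3) = P_train(:,1) + 2*rand(n,1);     % Tmax >= Tmin
P_train(:,4) = P_train(:,2) + rand(n,1);       % Smax >= Smin
T_train = double(P_train(:,1) > 24);           % toy rule: state 1 when Tmin > 24, else 0

% Train a tree with at least 11 samples per leaf node.
tree = fitctree(P_train, T_train, 'MinLeafSize', 11);

% Current-day extremes in the same column order as the training input.
P_test = [25.0 30.0 26.5 30.4];
T_test = predict(tree, P_test);                % predicted water body state for the next day
redTideExpected = T_test > 0;
```

In practice P_test would be built from the current day's quality-controlled observations, and a nonzero T_test would be looked up in Table 1 to obtain the predicted red tide type.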
Claims (5)
1. An offshore red tide occurrence and red tide type prediction method based on a decision tree is characterized by comprising the following steps:
1) compiling the relevant information on red tide events;
2) data extraction and quality control: searching for surface water temperature and salinity observation data matched in time and place according to the red tide event information, extracting the daily minimum and maximum values of the temperature and salinity observation data, and performing the necessary quality control on the temperature and salinity data;
3) establishing a training database: establishing a database of which the water body state of the day is matched with the maximum and minimum values of the surface temperature and salinity of the water body of the previous day, and taking the database as the training data of the decision tree prediction model;
4) establishing an offshore red tide forecasting model based on a decision tree;
5) applying the offshore red tide forecasting model established in the step 4) to offshore red tide generation and red tide type forecasting.
2. The method as claimed in claim 1, wherein in step 1), the information related to red tide events includes time, geographical location, area from beginning to end of red tide, dominant algae information when red tide occurs.
3. The decision tree-based offshore red tide occurrence and red tide species prediction method of claim 1, wherein in step 2), the data extraction and quality control steps are as follows: according to the red tide event information, surface water temperature and salinity observation data matched in time and place are searched for; in addition to the relevant data during the red tide, the relevant data for a period of time before and after the red tide are needed to correspond to the normal water body; the daily minimum and maximum values of the temperature and salinity observation data are extracted, and quality control is performed on the temperature and salinity data.
4. The decision tree-based offshore red tide occurrence and red tide species prediction method of claim 1, wherein in step 3), the water body state is either a normal water body or a red tide water body of a certain type; the red tide water bodies are classified according to the overall sample distribution of the dominant red tide algal species.
5. The decision tree-based offshore red tide occurrence and red tide category prediction method as claimed in claim 1, wherein in step 4), the specific steps of establishing the decision tree-based offshore red tide prediction model are: (1) establishing an initial decision tree and generating a view; (2) through cross-validation, testing the influence of the minimum number of samples per leaf node on the performance of the decision tree, and determining the optimal minimum number of samples; (3) setting the minimum number of samples per leaf node to the optimal value according to the cross-validation error result, establishing an optimized decision tree, and generating a view; (4) checking the prediction accuracy of the optimized decision tree model by taking the training data as test data, as a pre-evaluation of the model performance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911410770.2A CN111160655A (en) | 2019-12-31 | 2019-12-31 | Decision tree-based offshore red tide generation and red tide type prediction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111160655A true CN111160655A (en) | 2020-05-15 |
Family
ID=70559999
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911410770.2A Pending CN111160655A (en) | 2019-12-31 | 2019-12-31 | Decision tree-based offshore red tide generation and red tide type prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111160655A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112084716A (en) * | 2020-09-15 | 2020-12-15 | 河北省科学院地理科学研究所 | Red tide prediction and early warning method based on eutrophication comprehensive evaluation |
CN112926664A (en) * | 2021-03-01 | 2021-06-08 | 南京信息工程大学 | Feature selection and CART forest short-time strong rainfall forecasting method based on evolutionary algorithm |
CN114003590A (en) * | 2021-10-29 | 2022-02-01 | 厦门大学 | Quality control method for environmental element data of surface layer of ocean buoy |
CN114170139A (en) * | 2021-11-09 | 2022-03-11 | 深圳市衡兴安全检测技术有限公司 | Offshore sea area ecological disaster early warning method and device, electronic equipment and storage medium |
CN115290572A (en) * | 2022-10-08 | 2022-11-04 | 长春理工大学 | Red tide polarization monitoring device based on active illumination and monitoring method thereof |
CN116258896A (en) * | 2023-02-02 | 2023-06-13 | 山东产研卫星信息技术产业研究院有限公司 | Quasi-real-time red tide monitoring method based on space-space integration |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109002888A (en) * | 2018-06-27 | 2018-12-14 | 厦门市海洋与渔业研究所 | A kind of red tide prewarning method |
CN109856357A (en) * | 2019-03-19 | 2019-06-07 | 广西科学院 | A kind of short-term method for early warning of red tide based on buoy online monitoring data and purposes |
US20190188611A1 (en) * | 2017-12-14 | 2019-06-20 | Business Objects Software Limited | Multi-step time series forecasting with residual learning |
Non-Patent Citations (1)
Title |
---|
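Chai Yongqiang et al.: "Intelligent red tide detection technology for MODIS imagery based on decision trees" (基于决策树的MODIS影像赤潮智能检测技术), Journal of Qingdao University (Natural Science Edition) (《青岛大学学报(自然科学版)》) * |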
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112084716A (en) * | 2020-09-15 | 2020-12-15 | 河北省科学院地理科学研究所 | Red tide prediction and early warning method based on eutrophication comprehensive evaluation |
CN112926664A (en) * | 2021-03-01 | 2021-06-08 | 南京信息工程大学 | Feature selection and CART forest short-time strong rainfall forecasting method based on evolutionary algorithm |
CN112926664B (en) * | 2021-03-01 | 2023-11-24 | 南京信息工程大学 | Feature selection and CART forest short-time strong precipitation prediction method based on evolutionary algorithm |
CN114003590A (en) * | 2021-10-29 | 2022-02-01 | 厦门大学 | Quality control method for environmental element data of surface layer of ocean buoy |
CN114003590B (en) * | 2021-10-29 | 2024-04-30 | 厦门大学 | Quality control method for ocean buoy surface environmental element data |
CN114170139A (en) * | 2021-11-09 | 2022-03-11 | 深圳市衡兴安全检测技术有限公司 | Offshore sea area ecological disaster early warning method and device, electronic equipment and storage medium |
CN115290572A (en) * | 2022-10-08 | 2022-11-04 | 长春理工大学 | Red tide polarization monitoring device based on active illumination and monitoring method thereof |
CN115290572B (en) * | 2022-10-08 | 2023-01-10 | 长春理工大学 | Red tide polarization monitoring device based on active illumination and monitoring method thereof |
CN116258896A (en) * | 2023-02-02 | 2023-06-13 | 山东产研卫星信息技术产业研究院有限公司 | Quasi-real-time red tide monitoring method based on space-space integration |
CN116258896B (en) * | 2023-02-02 | 2023-09-26 | 山东产研卫星信息技术产业研究院有限公司 | Quasi-real-time red tide monitoring method based on space-space integration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200515 |