CN116362394A - Cointegration prediction method and system for marine algae growth pollution
- Publication number
- CN116362394A (application number CN202310301771.3A)
- Authority
- CN
- China
- Prior art keywords
- growth
- marine algae
- algae
- pollution
- marine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a cointegration prediction method and system for marine algae growth pollution, belonging to the technical field of algae pollution control. Drawing on established empirical theory of algae growth factors and relevant results from nonlinear complex systems theory, the invention performs statistical analysis and multiple linear regression prediction on real data related to algae growth, identifies the core factors affecting algae growth, and uses them to control marine algae pollution. The factors influencing marine algae growth can be obtained for the actual conditions of different sea areas, so that marine algae pollution can be controlled in a targeted manner. This addresses the prior-art problem of lacking an effective method for predicting the factors that influence marine algae pollution.
Description
Technical Field
The invention relates to the technical field of algae pollution control, and in particular to a cointegration prediction method and system for marine algae growth pollution.
Background
The statements in this section merely provide background related to the present invention and do not necessarily constitute prior art.
Under particular environmental conditions, the excessive reproduction and dense aggregation of certain marine algae (i.e., "harmful algal blooms") cause marine algae pollution. Current methods for assessing marine algae pollution include traditional periodic ship surveys and periodic manual observation from shore, both of which struggle to detect sudden changes in red tides. Other approaches monitor the dynamics of green tides three-dimensionally using multi-source satellite remote sensing, drones, ships, and shore-based vehicles; predict changes in green tides with ecological dynamics and drift prediction models; or predict changes in the scale of green tides from the long-term equilibrium and causal relationship between green tide coverage area and sea temperature.
All of the above predictions require a deep understanding of changes in algal abundance or density. However, algae growth has complex nonlinear relationships with sea temperature, the proportions of chemical substances, predation, and other factors. As a result, algae show different growth characteristics in different time periods, when their vertical distribution changes, and when climatic factors, wind speed, and temperature differ. In recent years, human fishing, sewage discharge, eutrophication of water bodies, and natural climate change have disrupted the ecological balance of aquatic ecosystems, leading to declining trophic levels and the decline of dominant populations.
Therefore, even though it is broadly understood that increased fertilizer use and global warming are likely causes of the worsening algal bloom situation, there is still no effective method for controlling marine algae pollution. Machine learning and similar algorithms have been used to classify the factors affecting algae growth, but they do not analyze those factors and so cannot support marine algae pollution control.
Summary of the Invention
To address the shortcomings of the prior art, the present invention provides a cointegration prediction method, system, electronic device, and computer-readable storage medium for marine algae growth pollution. Drawing on established empirical theory of algae growth factors and relevant results from nonlinear complex systems theory, it performs statistical analysis and multiple linear regression prediction on real algae growth data from different regions.
In a first aspect, the present invention provides a cointegration prediction method for marine algae growth pollution.
A cointegration prediction method for marine algae growth pollution, comprising:
acquiring data related to marine algae growth and preprocessing the data;
inputting the preprocessed data into a preset cointegration prediction model for processing to obtain the cointegration relationships among the variables that influence marine algae growth, so that marine algae growth pollution can be suppressed according to those relationships; wherein inputting the preprocessed data into the preset cointegration prediction model for processing comprises:
performing a stability analysis on the preprocessed data to obtain multivariate temporal dynamic growth trend curves for the algae;
normalizing the preprocessed data and, based on the multivariate temporal dynamic growth trend curves, determining the key influencing variable and the secondary influencing variables of algae growth pollution;
computing the correlation coefficient matrix between the key influencing variable and the secondary influencing variables and establishing the cointegration relationship between them; testing that cointegration relationship with the Jcitest method, and building a multiple linear regression model.
Further, performing the stability analysis on the preprocessed data to obtain the multivariate temporal dynamic growth trend curves specifically comprises:
based on the preprocessed data, analyzing the marine algae growth data for which the number of marine algae pollution occurrences or the polluted area shows a decreasing trend, to obtain the multivariate temporal dynamic growth trend curves.
Further, determining the key influencing variable and the secondary influencing variables of algae growth pollution based on the multivariate temporal dynamic growth trend curves comprises:
based on the multivariate temporal dynamic growth trend curves, taking the chlorophyll a concentration of the algae as the key influencing variable of algae growth pollution;
performing correlation analysis on the basic marine water quality data within the marine algae growth data to obtain highly correlated secondary influencing variables.
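As an illustration of the correlation analysis described above, the following minimal Python sketch normalizes each series and keeps the candidate variables whose Pearson correlation with chlorophyll a is strong. The function names, the 0.5 threshold, and the sample numbers are assumptions made for this example; the patent states neither a threshold nor the underlying data.

```python
import math

def min_max_normalize(values):
    """Scale a series to the range [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def screen_secondary_variables(key, candidates, threshold=0.5):
    """Keep candidates whose |r| with the key variable reaches the threshold.

    The threshold is a hypothetical choice, not a value from the patent.
    """
    key_norm = min_max_normalize(key)
    kept = {}
    for name, series in candidates.items():
        if len(set(series)) < 2:
            continue  # a constant series carries no correlation information
        r = pearson(key_norm, min_max_normalize(series))
        if abs(r) >= threshold:
            kept[name] = r
    return kept

# Made-up illustrative observations (not data from the patent).
chl_a = [1.2, 1.8, 2.5, 3.1, 3.9, 4.4]
candidates = {
    "ammonia_nitrogen": [0.10, 0.14, 0.21, 0.24, 0.33, 0.35],
    "salinity": [31.0, 30.2, 31.5, 30.1, 31.4, 30.6],
}
kept = screen_secondary_variables(chl_a, candidates)
print(sorted(kept))
```

Min-max scaling does not change a Pearson coefficient, so the normalization here only mirrors the order of steps in the described method; it matters later, when regression coefficients are compared across variables with different units.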
Preferably, secondary influencing variables whose correlation is too low are deleted, and the Dickey-Fuller test is used to analyze the stationarity of the processed data;
Q-Q plot analysis is used to judge whether the core influencing variables follow a normal distribution.
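The two checks named above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the patent's implementation: the Dickey-Fuller regression below includes a constant but no lagged differences (so it is not the augmented form), the value -2.86 is the approximate large-sample 5% critical value for that case, and the Q-Q check is reduced to the correlation between the sorted sample and normal quantiles. All function names and the simulated series are hypothetical.

```python
import math
import random
from statistics import NormalDist

def dickey_fuller_stat(y):
    """t-statistic on rho in: dy_t = alpha + rho * y_{t-1} + e_t."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(lagged)
    mean_x, mean_d = sum(lagged) / n, sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, dy))
    rho = sxy / sxx
    alpha = mean_d - rho * mean_x
    ssr = sum((d - alpha - rho * x) ** 2 for x, d in zip(lagged, dy))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return rho / std_err

def qq_correlation(sample):
    """Correlation between sorted sample values and standard normal quantiles.

    Values close to 1 mean the Q-Q plot is close to a straight line.
    """
    n = len(sample)
    ordered = sorted(sample)
    theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_o, mean_t = sum(ordered) / n, sum(theory) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(ordered, theory))
    var_o = sum((o - mean_o) ** 2 for o in ordered)
    var_t = sum((t - mean_t) ** 2 for t in theory)
    return cov / math.sqrt(var_o * var_t)

# Simulated series (assumptions for illustration only).
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(300)]        # stationary
walk, level = [], 0.0
for shock in noise:                                  # random walk: non-stationary
    level += shock
    walk.append(level)
skewed = [rng.expovariate(1.0) for _ in range(300)]  # clearly non-normal

CRITICAL_5PCT = -2.86  # approximate asymptotic value, constant-only case
stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
print("noise stationary at 5%:", stat_noise < CRITICAL_5PCT)
print("walk stationary at 5%:", stat_walk < CRITICAL_5PCT)
print("Q-Q correlation, normal vs skewed:", qq_correlation(noise), qq_correlation(skewed))
```

A series is treated as stationary when its statistic falls below the critical value. In practice a statistics package with augmented lags and tabulated critical values would be the more reliable choice.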
Further, the multiple linear regression model is expressed as

$$Y = 1.63X_1 + 4.6516X_2 + 3.6104X_3 - 0.91236X_4 + 1.0207X_5 - 2.563X_6 + 0.52605X_7 + 0.33509X_8 + 0.85703X_9 - 0.023949$$

where $Y$ is the chlorophyll a concentration, $X_1$ is the five-day biochemical oxygen demand, $X_2$ is the ammonia nitrogen content, $X_3$ is the temperature, $X_4$ is the salinity, $X_5$ is the pheopigment content, $X_6$ is the silicon content, $X_7$ is the dissolved oxygen content, $X_8$ is the total nitrogen content, and $X_9$ is the total phosphorus content;
or,
the multiple linear regression model is expressed as

$$Y = -0.023616X_1 + 1.4527X_2 - 0.29186X_3 + 0.055603X_4 + 0.13686$$

where $Y$ is the silicon-to-phosphorus ratio, $X_1$ is the water-column mean chlorophyll a concentration, $X_2$ is silicate, $X_3$ is dissolved inorganic nitrogen, and $X_4$ is the nitrogen-to-phosphorus ratio.
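To make the two regression models concrete, the sketch below stores the published coefficients and evaluates each model for a given input vector. The coefficients are taken from the equations above; the dictionary layout, the `predict` helper, and the test inputs are assumptions for illustration. Because the method normalizes the data before modelling, the inputs are presumably normalized values rather than raw measurements, which the text does not state explicitly.

```python
# Coefficients copied from the two regression equations in the text.
CHL_A_MODEL = {
    "intercept": -0.023949,
    # Order: BOD5, ammonia nitrogen, temperature, salinity, pheopigment,
    # silicon, dissolved oxygen, total nitrogen, total phosphorus.
    "coefficients": [1.63, 4.6516, 3.6104, -0.91236, 1.0207,
                     -2.563, 0.52605, 0.33509, 0.85703],
}
SI_P_MODEL = {
    "intercept": 0.13686,
    # Order: water-column mean chlorophyll a, silicate,
    # dissolved inorganic nitrogen, nitrogen-to-phosphorus ratio.
    "coefficients": [-0.023616, 1.4527, -0.29186, 0.055603],
}

def predict(model, x):
    """Evaluate intercept + sum(coefficient_i * x_i) for one observation."""
    coefficients = model["coefficients"]
    if len(x) != len(coefficients):
        raise ValueError(f"expected {len(coefficients)} inputs, got {len(x)}")
    return model["intercept"] + sum(c * xi for c, xi in zip(coefficients, x))

# Hypothetical inputs: all predictors at zero, then all at one.
print(predict(CHL_A_MODEL, [0.0] * 9))
print(predict(CHL_A_MODEL, [1.0] * 9))
print(predict(SI_P_MODEL, [1.0] * 4))
```

Reading the signs gives the qualitative picture: in the first model salinity and silicon content enter negatively while the other predictors enter positively, and in the second model silicate has by far the largest coefficient.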
Further, preprocessing the marine algae growth data comprises:
constructing a marine algae growth dataset and determining whether the missing values in the dataset affect the prediction of marine algae growth pollution;
if so, filling the missing values by interpolation; if not, deleting the missing values;
deleting non-numeric data that is irrelevant to the prediction of marine algae growth pollution.
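The preprocessing steps above might look like the following in Python. This is a sketch under assumptions: the table is a dictionary of equal-length columns, missing values are represented by `None`, interpolation is linear in row order, "deleting the missing values" is read as dropping the affected rows, and the caller supplies the list of columns whose gaps matter for prediction. None of these details are specified in the patent.

```python
def interpolate_missing(series):
    """Linearly interpolate None gaps; gaps at the edges copy the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled

def preprocess(table, needed):
    """Drop non-numeric columns, interpolate `needed` columns, drop remaining gaps.

    table:  dict mapping column name -> list of values (None marks a gap)
    needed: columns whose missing values are judged to affect the prediction
    """
    numeric = {
        name: column for name, column in table.items()
        if all(v is None or isinstance(v, (int, float)) for v in column)
    }
    for name in needed:
        numeric[name] = interpolate_missing(numeric[name])
    n_rows = len(next(iter(numeric.values())))
    keep = [r for r in range(n_rows)
            if all(column[r] is not None for column in numeric.values())]
    return {name: [column[r] for r in keep] for name, column in numeric.items()}

# Hypothetical sample table.
table = {
    "station": ["A", "A", "A", "A"],          # non-numeric, dropped
    "chl_a": [1.0, None, 3.0, 4.0],           # gap matters, interpolated
    "salinity": [30.0, 31.0, None, 33.0],     # gap judged unimportant, row dropped
}
clean = preprocess(table, needed=["chl_a"])
print(clean)
```

Interpolating before dropping rows keeps as many observations of the key variable as possible, which matters for short monitoring records.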
Further, the marine algae growth data includes chlorophyll concentration, nutrient concentrations in the sea water, and sea water quality data.
In a second aspect, the present invention provides a cointegration prediction system for marine algae growth pollution.
A cointegration prediction system for marine algae growth pollution, comprising:
a data processing module configured to acquire data related to marine algae growth and preprocess the data;
a cointegration relationship prediction module configured to input the preprocessed data into a preset cointegration prediction model for processing to obtain the cointegration relationships among the variables that influence marine algae growth, so that marine algae growth pollution can be suppressed according to those relationships; wherein inputting the preprocessed data into the preset cointegration prediction model for processing comprises:
performing a stability analysis on the preprocessed data to obtain multivariate temporal dynamic growth trend curves for the algae;
normalizing the preprocessed data and, based on the multivariate temporal dynamic growth trend curves, determining the key influencing variable and the secondary influencing variables of algae growth pollution;
computing the correlation coefficient matrix between the key influencing variable and the secondary influencing variables and establishing the cointegration relationship between them; testing that cointegration relationship with the Jcitest method, and building a multiple linear regression model, so that the proportions of the secondary influencing variables can be adjusted according to the cointegration relationship to suppress marine algae growth pollution.
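The cointegration check in the last step can be illustrated with a deliberately simpler technique than the one named in the text. The name Jcitest matches MATLAB's `jcitest` function, which implements the Johansen cointegration test for several series at once; the sketch below instead uses a two-variable, residual-based (Engle-Granger style) check: regress one series on the other, then apply a Dickey-Fuller regression to the residuals. The simulated series, the function names, and the approximate 5% critical value of -3.34 for two variables are assumptions for illustration.

```python
import math
import random

def ols_residuals(y, x):
    """Residuals of the least-squares fit y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))
    alpha = mean_y - beta * mean_x
    return [b - alpha - beta * a for a, b in zip(x, y)]

def df_stat_no_constant(u):
    """t-statistic on rho in: du_t = rho * u_{t-1} + e_t."""
    lagged = u[:-1]
    du = [b - a for a, b in zip(u[:-1], u[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, du)) / sxx
    ssr = sum((d - rho * v) ** 2 for v, d in zip(lagged, du))
    std_err = math.sqrt(ssr / (len(lagged) - 1) / sxx)
    return rho / std_err

def random_walk(rng, n):
    """Cumulative sum of standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

rng = random.Random(42)
x = random_walk(rng, 400)
y_coint = [2.0 * v + rng.gauss(0, 0.5) for v in x]  # shares x's stochastic trend
y_indep = random_walk(rng, 400)                     # unrelated random walk

EG_CRITICAL_5PCT = -3.34  # approximate asymptotic value for two variables
stat_coint = df_stat_no_constant(ols_residuals(y_coint, x))
stat_indep = df_stat_no_constant(ols_residuals(y_indep, x))
print("cointegrated pair passes:", stat_coint < EG_CRITICAL_5PCT)
print("independent pair passes:", stat_indep < EG_CRITICAL_5PCT)
```

When the residuals are stationary, the two series move together in the long run and a regression between them is meaningful; when they are not, a good-looking fit between two trending series can be an artifact.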
In a third aspect, the present invention provides an electronic device.
An electronic device, comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, perform the steps of the above cointegration prediction method for marine algae growth pollution.
In a fourth aspect, the present invention provides a computer-readable storage medium.
A computer-readable storage medium for storing computer instructions, wherein the computer instructions, when executed by a processor, perform the steps of the above cointegration prediction method for marine algae growth pollution.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The technical solution provided by the present invention is based on the complex nonlinear dynamics of algae growth and the characteristics of actual algae growth data, and takes into account the regional level of economic development. It builds a multiple linear regression model with temporal dynamic comparison on real algae growth data. This helps clarify how human activity relates to algae pollution and identifies the key factors behind marine algae growth pollution, so that those factors can be regulated to inhibit the growth of marine algae and remediate the pollution.
Brief Description of the Drawings
The accompanying drawings, which form a part of the present invention, are provided for further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it.
Fig. 1 is a schematic flowchart provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the multivariate temporal dynamic growth trend curves of algae in a sea area with an average level of economic development, provided by an embodiment of the present invention, where (a) shows the temporal trend curve of water nutrients and (b) shows the temporal trend curve of chlorophyll a;
Fig. 3 is a schematic diagram of the correlation between the silicon-to-phosphorus ratio and other related data in a sea area with an average level of economic development, provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the cointegration test between the silicon-to-phosphorus ratio and other related data in a sea area with an average level of economic development, provided by an embodiment of the present invention, where (a) is a scatter plot of predicted against actual values with the regression confidence interval, (b) is a normality test plot of the residuals between predicted and actual values, and (c) is a scatter plot of fitted values against residuals;
Fig. 5 is a schematic diagram of the multivariate temporal dynamic growth trend curves of algae in a sea area with a high level of economic development, provided by an embodiment of the present invention, where (a) shows the temporal trend curve of chlorophyll a and (b) shows the temporal trend curve of water nutrients;
Fig. 6 is a schematic diagram of the correlation coefficients among variables such as water nutrients and chlorophyll a in a sea area with a high level of economic development, provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the cointegration test between chlorophyll a and the secondary influencing variables in a sea area with a high level of economic development, provided by an embodiment of the present invention, where (a) shows the actual and fitted chlorophyll a concentrations, (b) is the residual plot, (c) is the probability distribution of the residuals, and (d) is a scatter plot of model predictions against existing statistical data with the regression confidence interval.
Detailed Description
It should be noted that the following detailed description is exemplary and intended to provide further explanation of the present invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the invention belongs.
It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the present invention. As used here, unless the context clearly indicates otherwise, the singular form is intended to include the plural form. It should also be understood that the terms "comprising" and "having", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Where no conflict arises, the embodiments of the present invention and the features of those embodiments may be combined with one another.
Embodiment 1
With reference to Fig. 1, existing machine learning algorithms do not examine the correlations and causal relationships behind the factors that influence algae growth. They therefore cannot give quantitative coefficients for how individual factors or combinations of factors (i.e., multivariate cointegration) affect algae growth concentration. Here, a cointegration relationship is a long-run stable relationship among time series that arises from the actual data-generating process; regression analysis and prediction are meaningful only when such a relationship exists among the series. If a regression is run directly on non-stationary time series, the results may show high significance and a good fit even though no such relationship actually exists (a spurious regression). The present invention therefore provides a cointegration prediction method for marine algae growth pollution, comprising the following steps:
S1、获取海洋藻类生长相关数据,并对海洋藻类生长相关数据进行预处理;具体的,构建海洋藻类生长相关数据集,判断数据集中的缺失值是否影响海洋藻类生长污染协整预测;若是,则通过插值法补充缺失值;若否,则删除缺失值;删除与海洋藻类生长污染协整预测不相关的非数值数据。S1. Obtain the data related to the growth of marine algae, and preprocess the data related to the growth of marine algae; specifically, construct a data set related to the growth of marine algae, and determine whether the missing values in the data set affect the cointegration prediction of marine algae growth pollution; if so, then Missing values were supplemented by interpolation; if not, missing values were deleted; non-numeric data irrelevant to cointegration prediction of marine algae growth pollution were deleted.
S2. Input the preprocessed data into a preset cointegration prediction model and obtain the cointegration relationships among the variables that influence marine algae growth, so that marine algae growth pollution can be suppressed according to those relationships. Processing the preprocessed data with the preset cointegration prediction model comprises:
S201. Perform a stability analysis on the preprocessed data to obtain multivariate time-dynamic growth trend curves for the algae. Specifically, from the preprocessed data, analyze the data for which the number of occurrences of marine algae growth pollution, or the polluted area, shows a decreasing trend, and obtain the multivariate time-dynamic growth trend curves.
According to pattern dynamics in nonlinear dynamical systems theory, spatial and temporal changes in the number of marine algae individuals are related to behaviors such as diffusion and migration. The algae population is an open, dissipative system far from thermodynamic equilibrium and exhibits complex behavior under effects such as capture and toxins. A defining feature of a complex system is that, spatially, local rules cannot predict the behavior of the whole and, temporally, short-term rules cannot reliably predict long-term change. More data is therefore not necessarily better; good short-term predictions require real algae growth data from the specific sea area and an analysis of its particular dynamic trends.
Specifically, the preprocessed data is log-transformed and its stability is analyzed; the data for which the number of pollution occurrences or the polluted area shows a decreasing trend is then identified, and the multivariate time-dynamic growth trend curves are obtained.
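A rough sketch of the log transform and trend extraction in S201 is given below. The patent does not specify how the trend curve is computed, so a simple moving average and a least-squares slope are used here purely as stand-ins; all function names are hypothetical.

```python
import math
from typing import List


def log_transform(values: List[float]) -> List[float]:
    """Natural-log transform; every value must be strictly positive."""
    return [math.log(v) for v in values]


def moving_average(values: List[float], window: int) -> List[float]:
    """Smoothed trend curve (assumed stand-in for the patent's trend curve)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def trend_slope(values: List[float]) -> float:
    """Least-squares slope against the time index 0, 1, 2, ...

    A negative slope indicates a decreasing trend.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den
```

One trend curve per variable gives the multivariate picture; a series whose smoothed log values have a negative slope would count as showing a decreasing trend under these assumptions.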
S202. Normalize the preprocessed data and, based on the multivariate time-dynamic growth trend curves, determine the key influencing variables and the secondary influencing variables of algae growth pollution. This comprises the following steps:
S2021. Based on the multivariate time-dynamic growth trend curves, obtain the key influencing variables of algae growth pollution;
S2022. Perform a correlation analysis on the basic marine water quality data within the marine algae growth data to obtain the highly correlated secondary influencing variables;
S2023. Delete secondary influencing variables whose correlation is too low, and analyze the stationarity of the resulting data with the Dickey-Fuller test. The procedure is as follows:
(1) Assume that the time series under test follows the autoregressive model:
$Y_t = \delta Y_{t-1} + \varepsilon_t$,
(2) Test whether $\lambda$ (equal to $\delta - 1$) in the corresponding difference regression model is 0:
$\Delta Y_t = \lambda Y_{t-1} + \varepsilon_t$,
(3) If $\lambda$ is 0, the series under test is a basic unit-root process and is non-stationary. The significance level is determined by Monte Carlo simulation.
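The three steps above can be illustrated with a small, dependency-free sketch. It estimates $\lambda$ by ordinary least squares in the difference regression (no constant, no lagged differences), forms its t-statistic, and approximates the critical value by simulating random walks. The function names, the default 5% level, and the number of replications are assumptions made for illustration.

```python
import math
import random
from typing import List


def df_statistic(series: List[float]) -> float:
    """t-statistic of lambda in  dY_t = lambda * Y_{t-1} + e_t."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series, series[1:])]
    sxx = sum(x * x for x in lagged)
    lam = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - lam * x) ** 2 for x, d in zip(lagged, diffs))
    sigma2 = ssr / (len(diffs) - 1)  # one estimated parameter
    return lam / math.sqrt(sigma2 / sxx)


def simulate_critical_value(n: int, level: float = 0.05,
                            reps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo critical value: the `level` quantile of the statistic
    computed on simulated pure random walks of length n (lambda = 0)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        walk = [0.0]
        for _ in range(n - 1):
            walk.append(walk[-1] + rng.gauss(0.0, 1.0))
        stats.append(df_statistic(walk))
    stats.sort()
    return stats[int(level * reps)]


def looks_stationary(series: List[float], level: float = 0.05) -> bool:
    """Reject the unit-root hypothesis when the statistic falls below
    the simulated critical value."""
    return df_statistic(series) < simulate_critical_value(len(series), level)
```

The statistic does not follow the usual Student t distribution under the unit-root hypothesis, which is why its critical values are obtained by simulation rather than from a t table.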
S203. Compute the correlation coefficient matrix between the key and secondary influencing variables, and use it to select the independent variables that affect the predicted variable. Based on the definition of cointegration, identify the influencing factors that are cointegrated and obtain the coefficients and statistics of the multiple linear regression model: use the Jcitest method to test the cointegration relationship between the key and secondary influencing variables and establish the multiple linear regression model, then use a Q-Q plot to judge whether the residuals of the model follow a normal distribution. A Q-Q (quantile-quantile) plot compares the quantiles of a sample with the quantiles of a known distribution to judge whether the sample follows that distribution.
The multiple linear regression model expresses the cointegration relationship between the key and secondary influencing variables, so that the proportions of the secondary influencing variables can be adjusted according to that relationship to suppress marine algae growth pollution.
The normality test judges whether the residuals of the multiple linear regression model follow a normal distribution, because the model is considered optimal only when its residuals are normally distributed.
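As a hedged illustration of the Q-Q check, the sketch below pairs the sorted residuals with the quantiles of a normal distribution fitted to them and summarizes how close the points lie to a straight line with a correlation coefficient. The plotting positions $(i + 0.5)/n$ and the use of a correlation summary instead of a visual plot are choices made here, not details given in the patent.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def qq_points(residuals: List[float]) -> List[Tuple[float, float]]:
    """Pairs (theoretical normal quantile, sample quantile) for a Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def qq_correlation(residuals: List[float]) -> float:
    """Correlation of the Q-Q points; values near 1 mean the points lie
    close to a straight line, which is consistent with normal residuals."""
    points = qq_points(residuals)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

Plotting the pairs returned by `qq_points` against the line y = x gives the usual visual diagnostic; the correlation is only a compact numeric companion to it.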
Specifically, pairwise correlation coefficients are first computed on the log-transformed raw data to obtain the correlation coefficient matrix, which is displayed as a figure.
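A minimal sketch of this step, assuming Pearson correlation and strictly positive measurements (so that the logarithm is defined); the column names in the usage lines are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_correlation_matrix(
    columns: Dict[str, List[float]]
) -> Tuple[List[str], List[List[float]]]:
    """Pairwise correlations of the log-transformed columns.

    Returns the column order and the square matrix in that order.
    """
    names = list(columns)
    logs = {name: [math.log(v) for v in columns[name]] for name in names}
    matrix = [[pearson(logs[a], logs[b]) for b in names] for a in names]
    return names, matrix


# Usage with made-up numbers (not data from the patent):
names, matrix = log_correlation_matrix({
    "ammonia_nitrogen": [1.0, 2.0, 4.0, 8.0],
    "total_nitrogen": [3.0, 6.0, 12.0, 24.0],
    "salinity": [8.0, 4.0, 2.0, 1.0],
})
```

Pairs with coefficients near 1 or -1 are the candidates for removal that the worked examples later describe, where only one of several highly correlated nutrients is kept.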
The Jcitest method is then used to test the cointegration relationship between the key and secondary influencing variables. Jcitest is the Johansen cointegration test; its purpose is to check whether an apparent relationship among non-stationary series is a spurious regression, and cointegration theory allows the predicted phenomena to be interpreted correctly. It is a multi-equation technique that uses maximum likelihood estimation to test for cointegration among multiple variables. The test procedure is:
(1) Apply unit root tests and determine the lag order of the marine algae growth data;
(2) Construct a vector autoregression (VAR) model, taking k variables and a lag order of 1 as an example:
where $Y_{1,t}, Y_{2,t}$ are the dependent variables, $X_{1,t}, X_{2,t}$ are the independent variables, $\beta_{1,t}, \beta_{2,t}, \pi_{1,t}, \pi_{2,t}$ are the regression coefficients, and the random error terms $u_{1,t}, u_{2,t}$ satisfy $u_{1,t}, u_{2,t} \sim \mathrm{IID}(0, \sigma^2)$ and $\mathrm{Cov}(u_{1,t}, u_{2,t}) = 0$;
(3) Construct a vector error correction (VECM) model, whose residual sequences $e_{1,t}, e_{2,t}$ satisfy:
(4) Analyze the rank of the impact matrix in the VECM model;
(4.1) Perform the trace statistical test on matrix M:
where the characteristic roots (eigenvalues) of matrix M are arranged in descending order;
(4.2) Test the number of non-zero characteristic roots of matrix M to determine the cointegration relationship.
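For orientation, the standard textbook form of the Johansen procedure for a lag-1 VAR is sketched below. It is not a reconstruction of the patent's own equations, which are not reproduced in this text; the symbols $A_1$, $\Pi$, $T$, $r$ and $\hat{\lambda}_i$ are assumptions introduced here, and the correspondence between $\Pi$ and the patent's matrix M is likewise assumed.

```latex
% Lag-1 VAR for the k-vector Y_t and its error-correction (VECM) form
Y_t = A_1 Y_{t-1} + u_t
\quad\Longrightarrow\quad
\Delta Y_t = \Pi\, Y_{t-1} + u_t, \qquad \Pi = A_1 - I_k

% Trace statistic for the null hypothesis rank(\Pi) <= r,
% with T the number of usable observations
\lambda_{\mathrm{trace}}(r) = -T \sum_{i=r+1}^{k} \ln\bigl(1 - \hat{\lambda}_i\bigr),
\qquad \hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \dots \ge \hat{\lambda}_k
```

In this form the number of cointegration relationships equals the rank of $\Pi$, which is estimated by testing $r = 0, 1, \dots$ in turn until the null hypothesis is no longer rejected.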
Further, in some embodiments, the cointegration prediction method disclosed in this embodiment is described in detail with reference to Figures 2 to 7, taking a sea area with a high level of economic development as an example.
A cointegration prediction method for marine algae growth pollution comprises the following steps:
Step 1. Build a data set of algae growth data and check whether the missing values in the data set affect the final statistical results: if they do not, delete them; if they do, fill them in by interpolation or a similar method. For entries that have no specific value but give only an upper or lower bound, use the bound (critical value) itself. Based on existing nonlinear dynamical system analyses of algae growth and on analyses of actual and field algae data, delete irrelevant non-numeric data.
The algae growth data set consists of seawater quality data for the surface layer of DM1 (data source: epd.gov.hk) for the period 1986/2/14 to 2021/12/4, comprising five-day biochemical oxygen demand (mg/L), ammonia nitrogen (mg/L), chlorophyll a (μg/L), dissolved oxygen saturation (%), nitrate nitrogen (mg/L), pH, phaeo-pigments (μg/L), salinity, temperature (°C), inorganic nitrogen (mg/L), total nitrogen (mg/L) and total phosphorus (mg/L).
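Step 1's rule for entries reported only as a bound can be sketched as a small parser. The textual format of such entries (a leading `<`, `>`, `<=` or `>=`) is an assumption about how the raw data might be encoded, and `parse_measurement` is a hypothetical helper name.

```python
def parse_measurement(raw: str) -> float:
    """Convert a raw cell to a number.

    Cells given only as a bound, such as "<0.5" or ">=10", are replaced
    by the bound itself, following the rule described in Step 1.
    """
    text = raw.strip()
    if text.startswith(("<", ">")):
        text = text.lstrip("<>=").strip()
    return float(text)


# Usage with made-up cells (not values from the data set):
cleaned = [parse_measurement(cell) for cell in ["0.82", "<0.5", " 7.2 ", ">=10"]]
```

Substituting the bound is the simplest treatment of censored measurements; it keeps every record at the cost of slightly overstating values below a detection limit.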
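Step 2. Using theories such as nonlinear dynamical systems and complex systems for marine algae growth together with statistical analysis of real data, statistically analyze the algae growth data of an economically developed region where the number of occurrences of marine algae pollution, or the polluted area, shows a decreasing trend, and derive its multivariate long-term time-dynamic trend.
Step 3. To remove the effect of differing units, normalize the data with the min-max method. Based on experimental and empirical analyses of algae growth, chlorophyll a in the sea can be treated as the output variable (core influencing variable) for pollution phenomena such as red tides and green tides. First, using the basic marine water quality data for Deep Bay, analyze the correlations among five-day biochemical oxygen demand (mg/L), ammonia nitrogen (mg/L), total nitrogen (mg/L), total phosphorus (mg/L), chlorophyll a (μg/L), temperature (°C), transparency (m), pH, phaeo-pigments (μg/L), dissolved oxygen (mg/L), dissolved oxygen saturation (%), salinity (psu), nitrate nitrogen (mg/L), nitrite nitrogen (mg/L) and orthophosphate phosphorus (μg/L), that is, the correlations between the core influencing variable and the secondary influencing variables.
As Figure 6 shows, ammonia nitrogen, total nitrogen, total phosphorus and orthophosphate phosphorus are highly correlated with one another, so total nitrogen, total phosphorus and orthophosphate phosphorus are removed and ammonia nitrogen is retained; this keeps highly correlated factors from degrading prediction accuracy and preserves independence among the explanatory variables. Because there are still too many influencing factors, irrelevant factors are removed further, leaving nine factors (enumerated as X1 to X9 in Step 4). The augmented Dickey-Fuller (ADF) test shows that the resulting data set is not stationary; after first-order differencing, the test is repeated and the data are found to be stationary.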
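The two transformations used in Step 3, min-max normalization and first-order differencing, can be sketched as follows. The function names are hypothetical, and mapping a constant series to zeros is a convention chosen here to avoid division by zero.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale a series to [0, 1] so that variables measured in
    different units become comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: no spread to rescale (convention assumed here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def first_difference(values: List[float]) -> List[float]:
    """First-order difference  Y_t - Y_{t-1}; the result is one shorter."""
    return [b - a for a, b in zip(values, values[1:])]


# Usage with made-up numbers:
scaled = min_max_normalize([2.0, 4.0, 6.0])
differenced = first_difference([1.0, 4.0, 9.0])
```

Differencing removes a unit root when the series is integrated of order one, which is why the stationarity test is repeated on the differenced series.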
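Step 4. From the correlation coefficient matrix of the algae growth data, statistical analysis shows that the stationary data no longer exhibit significant linear correlation with one another. Nine factors are retained as the most important core variables: X1 five-day biochemical oxygen demand, X2 ammonia nitrogen, X3 temperature, X4 salinity, X5 phaeo-pigments, X6 silicon, X7 dissolved oxygen, X8 total nitrogen and X9 total phosphorus. Figure 3 also shows that the output variable Y, chlorophyll a, has no individual linear correlation with any of the other factors, so it is necessary to consider whether chlorophyll a is cointegrated with the other variables. Jcitest is applied to test for cointegration among the marine algae growth data, and the result confirms that a cointegration relationship exists. A multiple linear regression model is then established: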
$Y = 1.63X_1 + 4.6516X_2 + 3.6104X_3 - 0.91236X_4 + 1.0207X_5 - 2.563X_6 + 0.52605X_7 + 0.33509X_8 + 0.85703X_9 - 0.023949$ (1)
The estimated coefficients of this multiple linear regression model and the corresponding statistics are shown in Table 1.
Table 1. Statistics of the multiple linear regression
The F-statistic is 59.7 and the p-value is 7.36e-66. Model (1) and Table 1 show that chlorophyll a concentration has a significant cointegration relationship with five-day biochemical oxygen demand, ammonia nitrogen, salinity, phaeo-pigments, silicon and dissolved oxygen: salinity and silicon have significant negative effects on chlorophyll a concentration, while the other four factors have significant positive effects.
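Further, in some embodiments, the cointegration prediction method of this embodiment is described further with reference to Figures 2 to 7, taking a sea area with an average level of economic development as an example. The steps are as follows:
Step 1. Build the algae growth data sets and check whether the missing values in the two data sets affect the final statistical results: if they do not, delete them; if they do, fill them in by interpolation or a similar method. For entries that have no specific value but give only an upper or lower bound, use the bound (critical value) itself. Based on existing nonlinear dynamical system analyses of algae growth and on analyses of actual and field algae data, delete irrelevant non-numeric data. The marine algae growth data comprise long-term data on surface chlorophyll a concentration, bottom chlorophyll a concentration and water-column mean chlorophyll a concentration (mg/L), together with water nutrient data such as the nitrogen-to-phosphorus, silicon-to-phosphorus and silicon-to-nitrogen ratios, covering March 1997 to November 2010.
Step 2. Using theories such as nonlinear dynamical systems and complex systems for marine algae growth together with empirical results, statistically analyze the algae growth data of an economically developed region where the number of occurrences of marine algae pollution, or the polluted area, shows a decreasing trend, and derive its multivariate long-term time-dynamic trend.
Step 3. From the correlation coefficients of the algae growth data (see Figure 6), surface, bottom and water-column mean chlorophyll a concentrations are significantly correlated with one another, and the silicon-to-phosphorus ratio, silicon-to-nitrogen ratio and silicate are significantly linearly correlated. Water-column mean chlorophyll a concentration, silicate, dissolved inorganic nitrogen, the nitrogen-to-phosphorus ratio and the silicon-to-phosphorus ratio are therefore retained as core variables, with the silicon-to-phosphorus ratio as the output variable and the others as the core explanatory variables.
Step 4. Establish a multiple linear regression model of algae growth for the sea area with an average level of economic development. The model is: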
$Y = -0.023616X_1 + 1.4527X_2 - 0.29186X_3 + 0.055603X_4 + 0.13686$ (2)
The corresponding estimated coefficients and statistics are shown in Table 2.
Table 2. Estimated coefficients and corresponding statistics of the multiple linear regression
The F-statistic is 32.6 and the p-value is 6.63e-13. Model (2) and Table 2 show a significant cointegration relationship between the silicon-to-phosphorus ratio and water-column mean chlorophyll a concentration, silicate, dissolved inorganic nitrogen and the nitrogen-to-phosphorus ratio: dissolved inorganic nitrogen has a significant negative effect on the silicon-to-phosphorus ratio, and the nitrogen-to-phosphorus ratio has a significant positive effect.
Models (1) and (2) show that, for the economically developed Deep Bay sea area, chlorophyll a concentration can be significantly reduced by increasing salinity and silicon content, or by reducing ammonia nitrogen, five-day biochemical oxygen demand, phaeo-pigment content and dissolved oxygen. For the less economically developed Jiaozhou Bay sea area, the silicon-to-phosphorus ratio can be significantly reduced by increasing dissolved inorganic nitrogen, or by reducing silicate content and the nitrogen-to-phosphorus ratio, in line with their positive coefficients in model (2).
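How the fitted models translate into regulation advice can be made concrete with a short sketch. The coefficients are copied from models (1) and (2); the variable names, the dictionary layout and both helper functions are hypothetical, and the inputs are assumed to be on the same transformed scale on which the models were fitted. The sign-based reading ignores statistical significance, which the tables of the patent would have to supply.

```python
from typing import Dict

MODEL_1 = {  # output: chlorophyll a (sea area with a high level of development)
    "intercept": -0.023949,
    "coefficients": {
        "bod5": 1.63, "ammonia_nitrogen": 4.6516, "temperature": 3.6104,
        "salinity": -0.91236, "phaeo_pigments": 1.0207, "silicon": -2.563,
        "dissolved_oxygen": 0.52605, "total_nitrogen": 0.33509,
        "total_phosphorus": 0.85703,
    },
}

MODEL_2 = {  # output: silicon-to-phosphorus ratio (average level of development)
    "intercept": 0.13686,
    "coefficients": {
        "chlorophyll_a_mean": -0.023616, "silicate": 1.4527,
        "dissolved_inorganic_nitrogen": -0.29186, "n_to_p_ratio": 0.055603,
    },
}


def predict(model: dict, inputs: Dict[str, float]) -> float:
    """Evaluate the linear model for one set of explanatory values."""
    return model["intercept"] + sum(
        coef * inputs[name] for name, coef in model["coefficients"].items()
    )


def direction_to_lower_output(model: dict) -> Dict[str, str]:
    """Read the adjustment direction off the coefficient signs:
    a negative coefficient means raising the variable lowers the output."""
    return {
        name: "increase" if coef < 0 else "decrease"
        for name, coef in model["coefficients"].items()
    }


baseline = {name: 0.0 for name in MODEL_1["coefficients"]}
saltier = dict(baseline, salinity=1.0)
effect_of_salinity = predict(MODEL_1, saltier) - predict(MODEL_1, baseline)
```

For model (1), a one-unit rise in the salinity input lowers the predicted chlorophyll a value by the magnitude of its coefficient, which is the quantitative counterpart of the advice to increase salinity.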
Embodiment 2
This embodiment discloses a cointegration prediction system for marine algae growth pollution, comprising:
a data processing module, configured to acquire data related to marine algae growth and preprocess it;
a cointegration relationship prediction module, configured to input the preprocessed data into a preset cointegration prediction model and obtain the cointegration relationships among the variables that influence marine algae growth, so that marine algae growth pollution can be suppressed according to those relationships; wherein processing the preprocessed data with the preset cointegration prediction model comprises:
performing a stability analysis on the preprocessed data to obtain the multivariate time-dynamic growth trend curves for the algae;
normalizing the preprocessed data and, based on the multivariate time-dynamic growth trend curves, determining the key and secondary influencing variables of algae growth pollution;
computing the correlation coefficient matrix between the key and secondary influencing variables and establishing the cointegration relationship between them; and using the Jcitest method to test that cointegration relationship and establish a multiple linear regression model.
It should be noted that the data processing module and the cointegration relationship prediction module correspond to the steps of Embodiment 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to what Embodiment 1 discloses. As part of a system, the modules may be executed in a computer system, for example as a set of computer-executable instructions.
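One way to picture the two-module system is as a small pipeline in which the prediction module runs an ordered list of processing steps. This is only a structural sketch: the class names mirror the module names in the text, but the interfaces, the `Table` type and the callables passed in are assumptions, and the statistical steps themselves are left to whatever functions are supplied.

```python
from typing import Callable, Dict, List

Table = Dict[str, list]  # column name -> values (assumed data layout)


class DataProcessingModule:
    """Acquires and preprocesses the algae growth data (step S1)."""

    def __init__(self, preprocess: Callable[[Table], Table]):
        self.preprocess = preprocess

    def run(self, raw: Table) -> Table:
        return self.preprocess(raw)


class CointegrationPredictionModule:
    """Applies the stages of step S2 in order, each stage taking and
    returning a table."""

    def __init__(self, stages: List[Callable[[Table], Table]]):
        self.stages = stages

    def run(self, data: Table) -> Table:
        for stage in self.stages:
            data = stage(data)
        return data


class PredictionSystem:
    """Chains the two modules, mirroring the correspondence with Embodiment 1."""

    def __init__(self, data_module: DataProcessingModule,
                 prediction_module: CointegrationPredictionModule):
        self.data_module = data_module
        self.prediction_module = prediction_module

    def run(self, raw: Table) -> Table:
        return self.prediction_module.run(self.data_module.run(raw))
```

Passing the stages in as callables keeps the sketch independent of any particular statistics library.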
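Embodiment 3
Embodiment 3 of the present invention provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of the above cointegration prediction method for marine algae growth pollution are carried out.
Embodiment 4
Embodiment 4 of the present invention provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, carry out the steps of the above cointegration prediction method for marine algae growth pollution.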
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by that processor produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in that memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce a computer-implemented process; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The descriptions of the above embodiments each have their own emphasis; for parts not detailed in one embodiment, refer to the relevant descriptions of the other embodiments.
The above are only preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310301771.3A CN116362394A (en) | 2023-03-22 | 2023-03-22 | Synergistic prediction method and system for marine algae growth pollution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116362394A true CN116362394A (en) | 2023-06-30 |
Family
ID=86941429
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310301771.3A Pending CN116362394A (en) | 2023-03-22 | 2023-03-22 | Synergistic prediction method and system for marine algae growth pollution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116362394A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||