CN116864026A - Ocean dissolved oxygen concentration reconstruction model construction method based on Argo warm salt profile - Google Patents


Info

Publication number
CN116864026A
Authority
CN
China
Prior art keywords
dissolved oxygen
data
oxygen concentration
training
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310062810.9A
Other languages
Chinese (zh)
Inventor
薛存金
王振国
岳林峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202310062810.9A priority Critical patent/CN116864026A/en
Publication of CN116864026A publication Critical patent/CN116864026A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30Prediction of properties of chemical compounds, compositions or mixtures
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70Machine learning, data mining or chemometrics
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of ocean information, and in particular to a method for constructing an ocean dissolved oxygen concentration reconstruction model based on Argo temperature-salinity profiles. The method comprises: acquiring monthly-scale dissolved oxygen data at any target depth; constructing spatio-temporal classification partitions of ocean dissolved oxygen at different target depths; constructing a regression prediction model for each spatio-temporal classification partition; and reconstructing the dissolved oxygen concentration of each partition at the target depth from monthly-scale temperature-salinity data, thereby constructing the ocean dissolved oxygen concentration reconstruction model. Spatial interpolation overcomes the discrete, sparse distribution of Argo data over longitude-latitude coordinates in the target depth plane. The regression prediction model exploits the coupling between temperature, salinity, and dissolved oxygen concentration at the same position, and the reconstruction model is built from the temperature and salinity at the corresponding positions, which reduces the influence of data sparsity on interpolation accuracy.

Description

Method for constructing an ocean dissolved oxygen concentration reconstruction model based on Argo temperature-salinity profiles
Technical Field
The invention relates to the technical field of ocean information, and in particular to a method for constructing an ocean dissolved oxygen concentration reconstruction model based on Argo temperature-salinity profiles.
Background
Marine dissolved oxygen (DOXY) is oxygen dissolved in seawater. It provides the biochemical environment that marine organisms need to survive and is essential to their life activities. The dissolved oxygen concentration of seawater is an important basis for measuring seawater quality, a main index for evaluating the marine ecological environment, marine science experiments, and resource exploration, and an essential parameter for understanding marine biogeochemical processes, global climate change, and the marine carbon cycle.
Existing global gridded dissolved oxygen concentration data are based mainly on ship surveys, moored buoys, and intelligent underwater detection equipment, and are difficult to update continuously. Shipborne measurement is the predominant mode, for example collecting water samples with a CTD water sampler and then chemically titrating the continuous (discrete) water-column samples. Ship measurement has a low sampling rate and insufficient spatio-temporal resolution. Shipborne observation is also costly in time and money and is limited by extreme weather, so observations are very scarce under severe sea conditions and in polar seas. Observations of ocean dissolved oxygen therefore remain sparse and unevenly distributed in space and time, which greatly limits understanding of the global spatio-temporal distribution of ocean dissolved oxygen and of the physical processes that influence it. A method for processing dissolved oxygen observation and analysis data that covers wider time and space ranges is therefore urgently needed.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a method for constructing an ocean dissolved oxygen concentration reconstruction model based on Argo temperature-salinity profiles, which solves at least one of the existing technical problems.
The aim of the invention is mainly achieved by the following technical scheme:
The invention provides a method for constructing an ocean dissolved oxygen concentration reconstruction model based on Argo temperature-salinity profiles, comprising the following steps:
screening Argo dissolved oxygen profile data of the same month across years to obtain climatological monthly-scale dissolved oxygen data; acquiring monthly-scale dissolved oxygen data at any target depth based on the climatological monthly-scale dissolved oxygen data; the climatological monthly-scale dissolved oxygen data comprise multi-year dissolved oxygen data for different months and different depths;
dividing the plane corresponding to each target depth into a plurality of spatial units by longitude and latitude; performing a first interpolation on the dissolved oxygen concentration of the spatial units at the same target depth, based on the monthly-scale dissolved oxygen data of that depth, to obtain the dissolved oxygen concentration at the center point of each spatial unit, which is taken as the dissolved oxygen concentration of the unit; based on these concentrations, grouping spatial units with similar dissolved oxygen concentrations at the target depth into the same class of region, which completes the spatio-temporal classification partition of ocean dissolved oxygen at that depth; and partitioning each target depth in this way to construct the spatio-temporal classification partitions at different target depths;
screening Argo temperature-salinity profile data of the same month across years to obtain monthly-scale temperature-salinity data at the same target depth; selecting monthly-scale temperature-salinity data and monthly-scale dissolved oxygen concentration data that lie in the same spatio-temporal classification partition, at the same target depth, and at the same longitude-latitude coordinates; and constructing a regression prediction model for the partition with temperature and salinity as independent variables and dissolved oxygen concentration as the dependent variable;
obtaining, from the regression prediction model and the monthly-scale temperature-salinity data at the temperature-salinity sampling points, corrected dissolved oxygen concentrations at the sampling points of different longitude and latitude within the corresponding partition at the target depth, the sampling points being the data sampling points of the Argo system at different depths and longitude-latitude coordinates; and performing a second interpolation based on the corrected dissolved oxygen concentrations at the target depth to obtain the corrected dissolved oxygen concentration at the center point of each spatial unit, which completes the reconstruction for the corresponding partition from the monthly-scale temperature-salinity data at the target depth and yields ocean dissolved oxygen concentration reconstruction models covering different target depths and different spatio-temporal partitions.
Preferably, obtaining the climatological monthly-scale dissolved oxygen data includes:
acquiring the multi-year Argo dissolved oxygen profile data;
screening the multi-year Argo dissolved oxygen profile data with month as the screening parameter to obtain the climatological monthly-scale dissolved oxygen data $Doxy_M(h)$ for each month, where $M$ = 1 to 12 is the month and $h$ is the depth.
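The monthly screening described above can be illustrated with a minimal sketch. The record layout `(year, month, depth, doxy)`, the function name `group_by_month`, and the sample values are assumptions made for illustration; they are not taken from the Argo data format or from the patent.

```python
from collections import defaultdict

def group_by_month(records):
    """Group (year, month, depth, doxy) records by calendar month (1-12)."""
    by_month = defaultdict(list)
    for year, month, depth, doxy in records:
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month: {month}")
        by_month[month].append((year, depth, doxy))
    return dict(by_month)

# Made-up sample records: (year, month, depth in dbar, dissolved oxygen)
records = [
    (2014, 3, 1000.0, 52.1),
    (2015, 3, 999.0, 50.4),
    (2015, 7, 1000.0, 61.0),
]
monthly = group_by_month(records)
march = monthly[3]  # all March observations across years, i.e. Doxy_3(h)
```

Each value of `monthly` collects the observations of one calendar month over all years, which corresponds to one $Doxy_M(h)$ data set.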
Preferably, performing the first interpolation on the dissolved oxygen concentration of different spatial units at the same target depth includes: taking the center point of the spatial unit as the target point; taking data points of the monthly-scale dissolved oxygen data at the same depth that lie near the center point as adjacent sampling points; and weighting the adjacent sampling points according to their distances from the center point to obtain the dissolved oxygen concentration at the center point.
Preferably, the dissolved oxygen concentration $Doxy$ at the target point satisfies the following formula:
where $i$ is the index of an adjacent sampling point, $n$ is the number of adjacent sampling points used for interpolation, $d_i$ is the distance from the $i$-th adjacent sampling point to the target point, and $Doxy_i$ is the dissolved oxygen concentration at adjacent sampling point $i$.
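One common way to weight neighbours by distance is inverse distance weighting (IDW). The sketch below assumes IDW with a power of 2 and planar distances in degrees; the exponent, the distance measure, and the function name `idw` are illustrative assumptions, since the text above does not fix them.

```python
import math

def idw(target, samples, n=4, power=2.0):
    """Inverse-distance-weighted estimate at target = (lon, lat).

    samples: list of ((lon, lat), value). Only the n nearest samples are used.
    The exponent `power` and the planar distance are assumptions.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    dists = []
    for (lon, lat), value in samples:
        d = math.hypot(lon - target[0], lat - target[1])
        if d == 0.0:
            return value  # a sample sits exactly on the target point
        dists.append((d, value))
    dists.sort(key=lambda t: t[0])
    nearest = dists[:n]
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Two made-up samples at equal distance: the estimate is their mean.
estimate = idw((0.0, 0.0), [((1.0, 0.0), 10.0), ((-1.0, 0.0), 20.0)])
```

Closer sampling points receive larger weights, so the estimate at the cell center is dominated by its nearest neighbours.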
Preferably, constructing the regression prediction model with temperature and salinity as independent variables and dissolved oxygen concentration as the dependent variable includes:
constructing a training set and a test set from the monthly-scale dissolved oxygen data of the same spatio-temporal classification partition and target depth and the monthly-scale temperature-salinity data of the same target depth and month, each set containing the independent variables temperature, salinity, longitude, latitude, and year and the dependent variable dissolved oxygen concentration; each data unit of the training set and the test set pairs one set of independent variables with its dependent variable;
dividing the training set into k training subsets by random sampling; splitting each training subset, choosing as the optimal split point the split that minimizes the mean square value of the two resulting subsets, and thereby building k decision trees with different split structures; obtaining the predicted value of each decision tree as the mean of the output values of the data points in the decision subtree reached through the optimal split points; and taking the mean of the k decision tree predictions as the prediction output by the regression prediction model;
feeding the test set into the random forest regression model to obtain predicted dissolved oxygen concentrations, and evaluating the accuracy of the model by comparing the predicted values with the true dissolved oxygen concentrations in the test set; if the accuracy meets the preset requirement, training of the regional ocean dissolved oxygen concentration reconstruction model is complete; otherwise, the maximum depth of the decision trees and the number of decision trees are adjusted until the accuracy meets the preset requirement.
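The pairing of temperature-salinity records with dissolved oxygen records, and the division into training and test sets, might be sketched as follows. The dictionary layout, the key order `(year, month, depth, lon, lat)`, the 20% test fraction, and both function names are hypothetical choices for illustration.

```python
import random

def pair_records(ts_records, doxy_records):
    """Join T/S and dissolved oxygen records that share the same key.

    Each record is a dict with a "key" = (year, month, depth, lon, lat).
    Returns a list of (features, target) with
    features = (temperature, salinity, lon, lat, year).
    """
    doxy_by_key = {r["key"]: r["doxy"] for r in doxy_records}
    paired = []
    for r in ts_records:
        if r["key"] in doxy_by_key:
            year, _month, _depth, lon, lat = r["key"]
            features = (r["temp"], r["sal"], lon, lat, year)
            paired.append((features, doxy_by_key[r["key"]]))
    return paired

def train_test_split(paired, test_fraction=0.2, seed=0):
    """Shuffle the paired records and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = paired[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]
```

Only positions that have both a temperature-salinity observation and a dissolved oxygen observation contribute a data unit, which keeps the independent and dependent variables in one-to-one correspondence.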
Preferably, constructing the k decision trees with different split structures includes:
obtaining the optimal split point corresponding to the minimum of the mean square value of the two split training subsets, where the mean square value RMSE satisfies:
$$RMSE = \min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$
$R_1$ and $R_2$ satisfy:
$$R_1(j,s) = \{x \mid x^{(j)} \le s\}, \quad R_2(j,s) = \{x \mid x^{(j)} > s\};$$
where $x$ is an input feature value and $y$ is an output value; $c_1$ is the mean output of the left subregion; $c_2$ is the mean output of the right subregion; $j$ is the optimal split variable, $s$ is the optimal split point, and $x^{(j)}$ is the value of the $j$-th feature of $x$; $R_1$ is the left subregion after the split and $R_2$ is the right subregion after the split; $y_i$ is the output value of the $i$-th data point; each $\min$ takes the minimum of the expression to its right;
obtaining the predicted value of each decision tree from the mean of the output values of the data points in each decision subtree of that tree, the predicted value $c_k$ of the decision tree satisfying:
$$c_k = \frac{1}{N_k}\sum_{i=1}^{N_k} y_i$$
where $y_i$ is the output value of the $i$-th data point, $N_k$ is the number of data points in the $k$-th training subset, and $c_k$ is the mean of the output values of the decision tree corresponding to the $k$-th training subset;
taking the mean of the k decision tree predictions as the predicted value output by the regression prediction model, the mean $\bar{c}$ of the k decision tree predictions satisfying:
$$\bar{c} = \frac{1}{k}\sum_{l=1}^{k} c_l$$
where $l \le k$ indexes the training subsets and $c_l$ is the mean of the output values of the decision tree corresponding to the $l$-th training subset.
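As a small worked illustration of the two averaging steps above, the following sketch computes each tree prediction $c_k$ as the mean of the output values in its subset and then averages the $k$ tree predictions. The subsets and their values are made up, and the function names are hypothetical.

```python
from statistics import mean

def tree_prediction(subset_outputs):
    """c_k: mean of the output values of the data points for one tree."""
    return mean(subset_outputs)

def forest_prediction(tree_predictions):
    """c-bar: mean of the k individual tree predictions."""
    return mean(tree_predictions)

# Three made-up subsets of dissolved oxygen outputs (k = 3).
subsets = [[200.0, 210.0], [190.0, 210.0, 200.0], [220.0]]
c = [tree_prediction(s) for s in subsets]
c_bar = forest_prediction(c)
```

Note that the forest output averages the tree predictions, not the pooled data points, so each tree contributes equally regardless of how many data points it holds.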
Preferably, obtaining the optimal split point corresponding to the minimum of the mean square value of the two split training subsets includes:
selecting any variable in the training subset, splitting it at multiple candidate points, and solving for the optimal split point $s_1$ of that variable; the mean square error $RMSE_1$ corresponding to the optimal split point satisfies:
$$RMSE_1 = \min_{s}\left[\min_{c_1}\sum_{x_i \in R_1} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2} (y_i - c_2)^2\right]$$
where $c_1$ is the mean output of the left subregion; $c_2$ is the mean output of the right subregion; $R_1$ is the left subregion after the split and $R_2$ is the right subregion after the split; $y_i$ is the output value of the $i$-th data point; each $\min$ takes the minimum of the expression to its right;
splitting the remaining variables in the training subset in turn by the same method, calculating the mean square error, and solving for the optimal split point of each variable; then comparing the mean square errors at the optimal split points of all variables, taking the split point with the smallest mean square error as the optimal split point $s$ of the training subset, and taking the corresponding variable as the optimal split variable $j$.
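The search over variables and split points described above can be sketched as an exhaustive scan. This is a minimal illustration under stated assumptions: candidate split points are the observed feature values, the loss is the summed squared deviation from each side's mean, and the names `split_loss` and `best_split` are hypothetical.

```python
def split_loss(xs, ys, s):
    """Sum of squared deviations from the mean on each side of split s."""
    left = [y for x, y in zip(xs, ys) if x <= s]
    right = [y for x, y in zip(xs, ys) if x > s]
    if not left or not right:
        return None  # not a valid split: one side is empty
    c1 = sum(left) / len(left)    # mean output of the left subregion
    c2 = sum(right) / len(right)  # mean output of the right subregion
    return sum((y - c1) ** 2 for y in left) + sum((y - c2) ** 2 for y in right)

def best_split(rows, ys):
    """Scan every variable j and candidate point s; return (j, s, loss)."""
    best = None
    for j in range(len(rows[0])):
        xs = [row[j] for row in rows]
        # The largest value is skipped so the right side is never empty.
        for s in sorted(set(xs))[:-1]:
            loss = split_loss(xs, ys, s)
            if loss is not None and (best is None or loss < best[2]):
                best = (j, s, loss)
    return best

# Made-up data: variable 0 separates the outputs perfectly at s = 2.0.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
ys = [10.0, 10.0, 20.0, 20.0]
j, s, loss = best_split(rows, ys)
```

The inner minimizations over $c_1$ and $c_2$ need no search, because the mean of each side is the value that minimizes its sum of squared deviations.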
Preferably, the maximum depth of the decision trees and the number of decision trees are adjusted by a cross-validation method, which includes:
dividing the training set into a training part, a validation part, and a test part, where the training part is used for model training, the validation part for tuning parameters, and the test part for measuring model performance; and dividing the training set into k contiguous training subsets of equal size;
using each training subset in turn as the validation part and the remaining k-1 training subsets as the training part, which yields k models;
taking the mean of the root mean square errors of the k models on their validation parts as the performance index for the current decision tree parameters, and recording those parameters;
traversing the k models and repeating the above steps for different decision tree parameters, and selecting as the optimal parameters the decision tree parameters for which the performance index is best.
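The cross-validation procedure above can be sketched in plain Python. The two stand-in models (`mean_fit`, `zero_fit`) replace the random forest purely to keep the example self-contained; in practice each candidate would be a forest with a particular maximum depth and tree count. All names here are hypothetical, and the sketch assumes k does not exceed the number of data points.

```python
import math

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_rmse(xs, ys, k, fit):
    """Mean validation RMSE over k folds; fit(train_x, train_y) -> predictor."""
    rmses = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        sq = [(predict(xs[i]) - ys[i]) ** 2 for i in fold]
        rmses.append(math.sqrt(sum(sq) / len(sq)))
    return sum(rmses) / k

def select_best(xs, ys, k, candidates):
    """candidates maps a parameter label to a fit function; lowest RMSE wins."""
    scores = {name: cross_val_rmse(xs, ys, k, fit) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

def mean_fit(train_x, train_y):
    m = sum(train_y) / len(train_y)
    return lambda x: m  # stand-in model: always predicts the training mean

def zero_fit(train_x, train_y):
    return lambda x: 0.0  # stand-in model: always predicts zero

xs = [0, 1, 2, 3, 4]
ys = [2.0, 2.0, 2.0, 2.0, 2.0]
best, scores = select_best(xs, ys, 2, {"mean": mean_fit, "zero": zero_fit})
```

Every data point serves as validation data exactly once, so the averaged RMSE uses the whole training set without ever scoring a model on data it was fitted to.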
Preferably, the performance index corresponding to the model is root mean square error; the decision tree parameters comprise the maximum depth of the decision tree and the number of the decision tree.
Preferably, the random sampling mode includes Bootstrap sampling.
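Bootstrap sampling draws each training subset from the training set with replacement, so a data unit may appear several times in one subset and not at all in another. A minimal sketch follows; the function name `bootstrap_subsets` and the fixed seed are illustrative assumptions.

```python
import random

def bootstrap_subsets(data, k, seed=0):
    """Draw k bootstrap subsets, each the same size as data, with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(k)]

# Ten made-up data units, three subsets (k = 3).
subsets = bootstrap_subsets(list(range(10)), k=3)
```

Each subset would then train one decision tree, which is what gives the trees their different split structures.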
Compared with the prior art, the invention has at least one of the following beneficial effects:
(1) The method screens Argo profiles to obtain monthly-scale temperature-salinity data at the same target depth and uses those data to reconstruct the dissolved oxygen concentration of the spatial units within the same partition. This addresses the large deviation between interpolated and actual dissolved oxygen concentrations caused by locally very sparse data at some longitude-latitude coordinates in the Argo profile database, and improves the prediction accuracy of the dissolved oxygen concentration.
(2) The method divides the plane at each target depth into spatial units by longitude and latitude and interpolates the dissolved oxygen concentration of the units from the dissolved oxygen data at that depth, yielding the dissolved oxygen concentration of every spatial unit at each target depth. This greatly mitigates the sparsity of the existing Argo profile database across longitude-latitude coordinates.
(3) The method partitions the data points using the dissolved oxygen concentrations from the first interpolation and builds a model predicting dissolved oxygen from temperature and salinity within each region of similar dissolved oxygen concentration. While still using temperature and salinity to correct the interpolated dissolved oxygen, this reduces the number of prediction models to be built, which lowers the workload and improves efficiency.
(4) The invention adopts Bootstrap random sampling, which samples with replacement, to obtain more uniform data sets, so that data units with different variable values are spread more evenly across the training and test sets and the adverse effect of data sets with similar variable values on decision tree prediction accuracy is avoided. Because Bootstrap also allows the sample variance to be computed, and a smaller variance indicates more uniform sampling, the random sampling result can be characterized quantitatively.
(5) The method introduces a distance-dependent spatial weight that accounts for the distance from each sampling point to the spatial unit, which reduces the error in evaluating data correlation and improves the accuracy of the interpolation.
(6) Compared with traditional ship sampling, a self-powered active buoy sampling system such as Argo costs less to operate, samples more flexibly, and suits tasks that require simultaneous large-scale sampling at different locations. Compared with traditional passive buoys, an active buoy can cover a larger area through active motion, samples more efficiently, resists ocean currents better, and can sample a given area efficiently and accurately.
In the invention, the technical schemes can be mutually combined to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the embodiments of the invention particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a flow chart of the method for constructing an ocean dissolved oxygen concentration reconstruction model based on Argo temperature-salinity profiles;
FIG. 2 shows the distribution of Argo ocean dissolved oxygen concentration at 1000 dbar in March 2015 in an embodiment of the invention;
FIG. 3 shows the distribution of Argo ocean temperature and salinity data at 1000 dbar in March 2015 in an embodiment of the invention;
FIG. 4 shows the interpolation of the March climatological ocean dissolved oxygen data in an embodiment of the invention;
FIG. 5 shows the March climatological partition of ocean dissolved oxygen concentration in an embodiment of the invention;
FIG. 6 shows the ocean concentration distribution at 1000 dbar in March 2005 in an embodiment of the invention;
FIG. 7 is a graph showing relative error statistics for each partition model test set in accordance with one embodiment of the present invention;
FIG. 8 is a graph showing the comparison of the true values and the reconstructed values of each partition model test set in an embodiment of the present invention;
FIG. 9 is a graph showing the relative error between the reconstructed value and the measured value of the dissolved oxygen in one embodiment of the present invention.
Detailed Description
The following detailed description of preferred embodiments of the invention is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the invention, are used to explain the principles of the invention and are not intended to limit the scope of the invention.
In order to better explain the technical scheme of the invention, the following technical terms are specifically described:
Absolute error
The absolute error is the absolute value of the difference between the measured value and the true value, i.e., absolute error = |measured value - true value|.
Relative error
The relative error is the ratio of the absolute error to the true value, i.e., relative error = |measured value - true value| / true value.
Mean square error
The mean square error is the expected value of the squared difference between the estimated value and the true value; it is often used to evaluate the variability of data and the accuracy of predictions.
Root mean square error
The square root of the mean square error.
Mean absolute error
The mean absolute error is the average of the absolute differences between each predicted value and the corresponding true value. Because errors of opposite sign cannot cancel each other, it reflects the actual magnitude of the prediction error.
Coefficient of determination
The higher the coefficient of determination, the larger the share of the variation in the dependent variable that the model explains, and the better the regression model fits.
Minimum percent error
Minimum percentage error = |measured value - true value| / true value × 100%.
Maximum depth of decision tree
The maximum depth of the decision tree is the number of successive splits of the original data set needed to reach a decision subtree that meets the accuracy requirement.
Number of decision trees
The number of decision trees refers to the lowest-level decision subtrees in the decision tree that meet the accuracy requirement after splitting.
dbar
The dbar (decibar) is a unit of pressure used to express ocean depth; 1 dbar corresponds to approximately 1 m of water depth.
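The error measures defined in the terms above can be written out as short functions. This is a minimal sketch using standard definitions; the function names are chosen for illustration and the sample values are made up.

```python
import math

def absolute_error(measured, true):
    return abs(measured - true)

def relative_error(measured, true):
    # Returned as a fraction; multiply by 100 for a percentage.
    return abs(measured - true) / abs(true)

def mse(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    return math.sqrt(mse(pred, true))

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def r2(pred, true):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot

pred = [2.0, 4.0]  # made-up predicted values
true = [1.0, 5.0]  # made-up true values
```

For these sample values the squared errors are both 1, so the mean square error, root mean square error, and mean absolute error all equal 1, and the coefficient of determination is 1 - 2/8 = 0.75.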
Based on the technical problems in the prior art, the invention provides an ocean dissolved oxygen concentration reconstruction model based on global marine biogeochemical buoy profile data.
Argo (Array for Real-time Geostrophic Oceanography) is the only real-time observation system that observes the global upper ocean in three dimensions in the field of biogeochemistry. It provides 240,000 dissolved oxygen profiles worldwide, an important data basis for understanding and analyzing the current characteristics and trends of global ocean dissolved oxygen.
Argo uses a self-powered active buoy sampling system that cycles through descending, drifting at a preset depth layer, descending again, measuring while ascending, and surface positioning with data transmission. In one operating cycle an Argo buoy can measure dissolved oxygen at different depths only at a single longitude-latitude location, during the ascent stage. Because of this sampling design, sparse data coverage across longitude-latitude coordinates is difficult to avoid in the Argo system.
Ocean dissolved oxygen comes mainly from the atmosphere and from phytoplankton photosynthesis, and temperature and salinity are the main factors influencing its concentration: in general, the higher the temperature and the higher the salinity, the lower the dissolved oxygen concentration. The invention exploits this spatio-temporal coupling between temperature, salinity, and dissolved oxygen concentration at different depths. Argo buoys record temperature-salinity profiles while observing dissolved oxygen, and the Argo database holds 2.2 million temperature and salinity records, far more than its 220,000 dissolved oxygen records.
The invention establishes an ocean dissolved oxygen interpolation method based on profile spatial features and develops global climatological monthly-scale gridded ocean dissolved oxygen products at standard observation depth layers, providing a data basis for global ocean ecological health assessment and sustainable development management.
Based on the problems found by the inventors during their research, the invention discloses a method for constructing an ocean dissolved oxygen concentration reconstruction model based on Argo temperature-salinity profiles, comprising the following steps:
Step 1: screening Argo dissolved oxygen profile data of the same month across years to obtain climatological monthly-scale dissolved oxygen data; acquiring monthly-scale dissolved oxygen data at any target depth based on the climatological monthly-scale dissolved oxygen data; the climatological monthly-scale dissolved oxygen data comprise multi-year dissolved oxygen data for different months and different depths;
Specifically, the climatological monthly-scale dissolved oxygen data are obtained as follows: access the multi-year Argo dissolved oxygen profile data and, with month as the screening parameter, select from them the multi-year climatological monthly-scale dissolved oxygen data $Doxy_M(h)$ for month $M$, where $M$ = 1 to 12 and $h$ is the depth. For example, the Argo dissolved oxygen profile data provide dissolved oxygen data $Doxy(Y, M, h)$ at different depths from January 1, 2010 to December 31, 2021, where $Y$ is the year; screening with month as the classification parameter gives the multi-year dissolved oxygen data for each month: $Doxy_1(h), \ldots, Doxy_M(h)$.
Specifically, the climatological monthly-scale dissolved oxygen data comprise monthly-scale dissolved oxygen data at multiple depths, which are determined by how the multi-year Argo dissolved oxygen profile data were acquired;
Specifically, acquiring the monthly-scale dissolved oxygen data at any target depth includes: selecting, from the climatological monthly-scale dissolved oxygen data, the monthly-scale dissolved oxygen data whose depth is closest to the target depth and using them as the monthly-scale dissolved oxygen data at the target depth.
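Selecting the observation closest to a target depth reduces to a minimum over absolute depth differences. The sketch below assumes a profile given as `(depth, doxy)` pairs; the function name and the dissolved oxygen values are made up for illustration.

```python
def nearest_depth_value(profile, target_depth):
    """Return the (depth, doxy) pair whose depth is closest to target_depth."""
    if not profile:
        raise ValueError("profile is empty")
    return min(profile, key=lambda point: abs(point[0] - target_depth))

# Made-up profile: (depth, dissolved oxygen concentration)
profile = [(987.0, 48.0), (999.0, 50.0), (1080.0, 55.0)]
closest = nearest_depth_value(profile, 1000.0)
```

For a target depth of 1000, the point at depth 999 is selected because it lies only 1 unit away, whereas the other two lie 13 and 80 units away.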
Illustratively, the target depths selected by the invention are shown in Table 1:
Table 1. Target depth layers for ocean element observations above 2000 m
Standard layer  Depth (m)  Standard layer  Depth (m)  Standard layer  Depth (m)
1               10         10              200        19              1000
2               20         11              250        20              1100
3               30         12              300        21              1200
4               40         13              400        22              1300
5               50         14              500        23              1400
6               75         15              600        24              1500
7               100        16              700        25              1750
8               125        17              800        26              2000
9               150        18              900
Illustratively, the Argo buoy system collects data in a cycle of five stages:
(1) descent;
(2) drifting at the parking layer;
(3) second descent;
(4) measurement during ascent;
(5) surface drifting while communicating with the satellite.
Illustratively, from the Argo dissolved oxygen profile data measured by an Argo buoy at the same longitude and latitude coordinates, the data points closest to the target depths of Table 1 are selected; the depths of these data points are shown in Table 2:
Table 2. Depths of the data points closest to the target depths for marine element observations above 2000 meters
Standard layer   Depth (m)   Standard layer   Depth (m)   Standard layer   Depth (m)
1                13          10               193         19               999
2                22          11               245         20               1080
3                32          12               308         21               1190
4                39          13               386         22               1320
5                52          14               489         23               1430
6                78          15               568         24               1520
7                97          16               634         25               1779
8                120         17               789         26               2005
9                149         18               987
Step 2: dividing a plane corresponding to each target depth into a plurality of space units by using longitude and latitude coordinates, performing first interpolation calculation based on the lunar scale dissolved oxygen data of the target depth to obtain the dissolved oxygen concentration of the central point of the space unit, and taking the dissolved oxygen concentration as the dissolved oxygen concentration of the space unit; based on the dissolved oxygen concentration of each space unit, dividing the space units with the dissolved oxygen concentration close to the target depth into the same type of region, and completing the ocean dissolved oxygen space-time classification partition under the target depth; and carrying out partition treatment on different target depths, and constructing ocean dissolved oxygen space-time classification partitions under different target depths.
Specifically, the dissolved oxygen concentration of the same target depth under a plurality of longitude and latitude coordinates is divided into space units according to the longitude and latitude coordinates, and interpolation calculation is carried out on the dissolved oxygen concentration of the central point of the space unit based on the dissolved oxygen concentration of the data point of the month scale dissolved oxygen data under the same target depth near the central point of the space unit and the central distance between the nearby data point and the space unit.
Specifically, a dissolved oxygen concentration isosurface with fixed intervals is set, the isosurface is divided into a plurality of grade ranges with different dissolved oxygen concentrations according to the fixed intervals, space units are divided into different grades based on the dissolved oxygen concentration of the space units, and the space units with the same grade are divided into the same type of region.
In the implementation, the isosurfaces divide the dissolved oxygen concentration into grades at fixed intervals: 0 to r, r to 2r, …, (v-1)r to vr; wherein r is the isosurface interval of the dissolved oxygen concentration, and v is any natural number greater than or equal to 1.
The areas with close dissolved oxygen concentration have similar environments, and the influences of the temperature and the salt concentration on the dissolved oxygen are close, so that unified researches are carried out on the areas with the same classification, and the same prediction model is constructed.
The space units with close dissolved oxygen concentration refer to space units that have the same isosurface grade and adjacent longitude and latitude coordinates; that is, space units with close dissolved oxygen concentration are adjacent in longitude and latitude and meet the following condition: as an example, the dissolved oxygen concentrations of the space units with close dissolved oxygen concentration are all ∈ [(v-1)r, vr].
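The grading and partitioning rule above (same isosurface grade plus adjacent coordinates) can be sketched as a flood fill over a grid of space units. This is an illustrative sketch under stated assumptions: the grid values are invented, adjacency is taken to mean 4-connected neighbours, and the grade intervals are treated as half-open, [(v-1)r, vr), so that a boundary value belongs to exactly one grade. The function names are hypothetical.

```python
import math


def concentration_level(doxy, r):
    """Return the grade v with (v-1)*r <= doxy < v*r, for v >= 1."""
    if doxy < 0 or r <= 0:
        raise ValueError("doxy must be >= 0 and r must be > 0")
    return math.floor(doxy / r) + 1


def label_regions(grid, r):
    """Group 4-connected cells that share a grade into numbered regions."""
    rows, cols = len(grid), len(grid[0])
    levels = [[concentration_level(v, r) for v in row] for row in grid]
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for i in range(rows):
        for j in range(cols):
            if labels[i][j] is not None:
                continue
            # Flood fill from an unlabeled cell over same-grade neighbours.
            labels[i][j] = next_label
            stack = [(i, j)]
            while stack:
                a, b = stack.pop()
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = a + da, b + db
                    if (0 <= na < rows and 0 <= nb < cols
                            and labels[na][nb] is None
                            and levels[na][nb] == levels[a][b]):
                        labels[na][nb] = next_label
                        stack.append((na, nb))
            next_label += 1
    return levels, labels


# Hypothetical 3 x 3 grid of space-unit concentrations (umol/kg), r = 40
grid = [
    [10.0, 35.0, 90.0],
    [20.0, 85.0, 95.0],
    [130.0, 30.0, 15.0],
]
levels, labels = label_regions(grid, 40.0)
```

In this example the two grade-1 patches (top-left and bottom-right) receive different region labels because they are not adjacent, which mirrors the requirement that a partition be both same-grade and spatially contiguous.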
Step 3: screening and obtaining month scale warm salt data of the same target depth based on Argo warm salt profile data of the same month of the past year; and selecting month scale temperature and salinity data and month scale dissolved oxygen concentration data with longitude and latitude coordinates in the same time-space classification partition, the same target depth and the same longitude and latitude coordinates, and constructing a regression prediction model of the time-space classification partition by taking temperature and salinity as variables and dissolved oxygen concentration as dependent variables.
The above-mentioned space-time classification partition refers to classifying the regions adjacent to each other in terms of longitude and latitude coordinates and having a close concentration of dissolved oxygen into regions of the same type as the space-time classification.
It should be noted that, the regression prediction model is affected by the space-time classification partition, the target depth, and the month, and varies with the space-time classification partition, the target depth, and the month.
Specifically, the month scale warm salt data are obtained from the Argo warm salt profile data by the same method as the month scale dissolved oxygen data.
In order to construct the prediction model, the month-scale warm salt data and the month-scale dissolved oxygen concentration data are selected to be the warm salt concentration data and the dissolved oxygen concentration data which are obtained by an Argo buoy system at the same position and at the same time of the same sampling point; in order to construct the regression prediction model, the data points used in the steps 1-3 originate from Argo sampling points with both warm salt concentration data and dissolved oxygen concentration data.
Meanwhile, as the measurement condition of the dissolved oxygen concentration is more severe, the quantity of the effective dissolved oxygen concentration data in the Argo database is still far smaller than that of the temperature and salt concentration data.
Specifically, based on two kinds of data of sampling points with dissolved oxygen concentration and temperature and salinity data in a certain space-time classification partition, constructing a regression prediction model which takes temperature and salinity data as variables and takes dissolved oxygen concentration as a dependent variable, wherein the regression prediction model corresponds to the space-time classification partition level one by one; substituting the temperature and salinity data of the sampling points with temperature and salinity data in the time-space classification partition into a regression prediction model corresponding to the time-space classification partition, and obtaining a dissolved oxygen concentration data correction value of the sampling points with temperature and salinity data.
Specifically, a regression prediction model is constructed by using a random forest regression method.
In implementation, data points of the same time-space classification partition, the same target depth and the same month are selected to construct a prediction model, and the temperature and the salinity in month-scale warm salt data of the same target depth and the same month are used as variables to construct a regression prediction model by taking the concentration of dissolved oxygen as a dependent variable.
Step 4: acquiring correction values of dissolved oxygen concentration of each sampling point with different longitudes and latitudes in corresponding space-time classification partitions under the target depth by a regression prediction model based on month scale warm salt data of month scale warm salt data sampling points; the sampling points are data sampling points of the Argo system at different depth and longitude and latitude coordinates; and carrying out second interpolation calculation based on the correction value of the dissolved oxygen concentration under the target depth to obtain the corrected dissolved oxygen concentration of the central point of the space unit, completing the reconstruction of corresponding space-time classification partitions and month-scale warm salt data under the target depth, and constructing ocean dissolved oxygen concentration reconstruction models comprising different target depths and different space-time partitions.
Specifically, a basic data unit in month scale temperature and salt data is taken as a sampling point, and the temperature, the salinity, the longitude and latitude coordinates and the year of the month scale temperature and salt data sampling point are substituted into a regression prediction model corresponding to a space-time classification partition and a target depth to obtain a dissolved oxygen concentration correction value of the sampling point.
Specifically, interpolation calculation is performed on the correction value of the dissolved oxygen concentration of the central point of the space unit based on the correction value of the dissolved oxygen concentration of the data point of the month scale dissolved oxygen data at the same target depth near the central point of the space unit and the distance between the nearby data point and the central point of the space unit.
In practice, the correction value of the dissolved oxygen concentration at the sampling point is the same as the first interpolation calculation for obtaining the dissolved oxygen concentration at the center point of the space cell in step 2.
Specifically, performing the second interpolation calculation in the step 4 to obtain the corrected dissolved oxygen concentration of the central point of the space unit includes: taking a space unit central point as a target point, and taking a data point near the same depth month scale dissolved oxygen data space unit central point as an adjacent sampling point; and giving different weights to the distances between the adjacent sampling points and the central point of the space unit, and averaging correction values of the dissolved oxygen concentration of the adjacent sampling points at the same target depth to obtain the corrected dissolved oxygen concentration of the central point of the space unit.
Specifically, the method comprises the following steps: taking a space unit central point as a target point, taking data points near the same depth-month scale dissolved oxygen data space unit central point as adjacent sampling points, giving different weights based on the distances between the adjacent sampling points and the space unit central point, and obtaining the concentration of dissolved oxygen at the space unit central point
It should be noted that, the correction value of the concentration of the dissolved oxygen obtained by the regression prediction model is based on the correction of the temperature salt data related to the dissolved oxygen, and the prediction accuracy is greatly improved compared with the first dissolved oxygen interpolation calculation result.
Specifically, ocean dissolved oxygen concentrations of different depth and different time space areas are obtained based on a regression prediction model, and an ocean dissolved oxygen concentration reconstruction model comprising different target depths and different time space areas is constructed.
Compared with the prior art, the method has the advantages that the plane corresponding to each target depth is divided into different space units by using longitude and latitude coordinates, the dissolved oxygen concentration of the target depth is used for carrying out first interpolation calculation on the dissolved oxygen concentrations of the different space units under the same target depth, the dissolved oxygen concentrations of the different space units under the target depth are obtained, and then the dissolved oxygen concentrations of the different space units under different target depths are obtained, so that the defect of sparse data of the existing Argo profile database on different longitude and latitude coordinates is greatly overcome.
Compared with the prior art, the method has the advantages that the lunar scale warm salt profile data of the same target depth is obtained through screening, the dissolved oxygen concentration of the space unit in the same partition area is reconstructed through the lunar scale warm salt data, the problems of large interpolation calculation and actual deviation of the dissolved oxygen concentration of the space unit caused by excessive sparse data of the Argo profile database on different longitude and latitude coordinates are solved, and the prediction accuracy of the dissolved oxygen concentration is improved.
Compared with the prior art, the method is based on the dissolved oxygen concentration obtained by carrying out first interpolation calculation on the dissolved oxygen concentrations of different space units under the same target depth, the data points are partitioned, and a prediction model of temperature and salinity on the dissolved oxygen is constructed in a region with the dissolved oxygen concentration close to the data points; on the basis of interpolation calculation correction of temperature and salinity on dissolved oxygen, the number of prediction models is limited, the workload is reduced, and the efficiency is improved.
Compared with the prior art, the self-powered active buoy type sampling system represented by the Argo system has lower use cost compared with the traditional ship sampling mode, is more flexible in sampling mode, and can adapt to the task scene of simultaneous and large-scale sampling in different places; compared with the traditional passive buoy sampling, the active buoy can finish sampling a larger area through active motion, has higher sampling efficiency and resistance to a ocean current system, and can finish high-efficiency and accurate sampling of a given area.
Specifically, the performing the first interpolation calculation on the dissolved oxygen concentrations of different space units under the same target depth in the step 2 includes: taking a space unit central point as a target point, taking data points near the space unit central point of the dissolved oxygen data with the same depth and month scale as adjacent sampling points, and giving different weights based on the distances between the adjacent sampling points and the space unit central point to obtain the concentration of the dissolved oxygen at the space unit central point;
Specifically, the target point dissolved oxygen concentration satisfies the following formula:
where i is the index of an adjacent sampling point, n is the number of adjacent sampling points designated for the interpolation, d_i is the distance from the ith adjacent sampling point to the target point, and Doxy_i represents the dissolved oxygen concentration of the adjacent sampling point i used in the interpolation.
Specifically, the method for obtaining the concentration of dissolved oxygen at the adjacent sampling point comprises the following steps:
the data points in the month scale dissolved oxygen data of the same target depth are numbered C_1, C_2, ..., C_x from near to far, taking the center of the kth space unit as the circle center, where x is the data point serial number; at least three sampling points with the smallest x values are selected as adjacent sampling points.
Compared with the prior art, the invention introduces distance-dependent spatial weights so that the distance between each data point and the space unit center is taken into account, which reduces errors in evaluating data correlation and improves the accuracy of the interpolation calculation.
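A distance-weighted interpolation of this kind can be sketched as inverse-distance weighting over the n nearest sampling points. The sketch rests on assumptions that the text does not fix: the weight is taken as 1/d_i^power with `power = 1.0`, distance is plain Euclidean distance in coordinate units, and the function name and sample values are hypothetical.

```python
import math


def idw_interpolate(target, samples, n=3, power=1.0):
    """Inverse-distance-weighted mean of the n samples nearest to target.

    target  -- (x, y) coordinates of the space-unit center
    samples -- list of ((x, y), value) data points at the same target depth
    """
    # Rank data points from near to far and keep the n nearest ones.
    ranked = sorted(samples, key=lambda s: math.dist(target, s[0]))[:n]
    numerator = 0.0
    denominator = 0.0
    for point, value in ranked:
        d = math.dist(target, point)
        if d == 0.0:
            return value  # a data point sits exactly on the center
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical data points: ((longitude, latitude), dissolved oxygen)
samples = [
    ((1.0, 0.0), 100.0),
    ((0.0, 2.0), 200.0),
    ((-4.0, 0.0), 300.0),
    ((10.0, 10.0), 1000.0),
]
center_value = idw_interpolate((0.0, 0.0), samples, n=3)
```

With distances 1, 2 and 4 the weights are 1, 0.5 and 0.25, so the nearest point dominates and the distant fourth point is ignored. On a real longitude-latitude grid a great-circle distance would likely be a better choice than the Euclidean distance used here.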
As an example, as shown in fig. 4, interpolation calculation results of dissolved oxygen concentrations of different space units at the same target depth show that: through interpolation calculation, originally scattered and sparse dissolved oxygen data points become uniformly distributed and continuous data points; the concentration of the dissolved oxygen is intuitively indicated by the color depth in the graph.
Specifically, constructing the space-time classification partition of ocean dissolved oxygen under the target depth in step 2 includes: dividing the dissolved oxygen concentration into grades by isosurfaces at fixed intervals: 0 to r, r to 2r, …, (v-1)r to vr, and dividing the space units into different regions according to their dissolved oxygen concentration grade, wherein r is the isosurface interval of the dissolved oxygen concentration, and v is any natural number greater than or equal to 1.
As an example, as shown in fig. 5, the isosurface interval is taken as 40 μmol/kg, and the dissolved oxygen concentration is divided into the grades 0 to 40 μmol/kg, 40 to 80 μmol/kg, …, and 280 to 320 μmol/kg; the space units are further divided into different regions according to their dissolved oxygen concentration grade, and each region is marked with a different color and given a label; in the figure, the color depth visually represents the dissolved oxygen concentration, and space units with the same color depth belong to the same region.
Specifically, regarding the random forest regression method described in step 3: the random forest regression method is suitable for multiple variable regression analysis, a regression tree analysis method is adopted for a data set containing multiple variables and one dependent variable, the training data set is split into multiple data subsets randomly, the middle of the data subsets is split, the minimum value of the sum of mean square deviations of the two split parts is calculated as an optimal splitting point, and a decision tree model of a decision subtree with different splitting structures is constructed; verifying the constructed decision tree model by using the test set data, and confirming the prediction precision of the decision tree model to the data:
If the precision meets the preset requirement, outputting the decision tree model as a final model;
if the precision does not meet the preset requirement, continuously splitting the two parts after splitting the data subset to form a decision sub-tree, searching an optimal splitting point of the decision sub-tree, and constructing a new decision tree model; and verifying the constructed decision tree model by using the test set data, confirming the prediction precision of the decision tree model to the data until the precision reaches the preset requirement or the maximum depth of the decision tree and the number of the decision trees in the forest reach the preset value, and outputting the decision tree model as a final model.
In step 3, constructing the regression prediction model with temperature and salinity as variables and dissolved oxygen concentration as the dependent variable comprises the following steps:
S301: based on the month scale dissolved oxygen data of the same time-space classification partition and target depth and the month scale warm salt data of the same target depth and the same month, constructing a test set and a training set containing the variables temperature, salinity, longitude, latitude and year and the dependent variable dissolved oxygen concentration; the data units of the test set and the training set comprise dependent variables and variables in one-to-one correspondence;
specifically, because the Argo buoy can simultaneously acquire the measured temperature salt data and the dissolved oxygen concentration at the same space-time position, the monthly scale dissolved oxygen data and the monthly scale temperature salt data have corresponding data points of the same space-time classification partition, the same target depth, the same month and the same longitude and latitude, and the plurality of variables in the monthly scale dissolved oxygen data and the corresponding dissolved oxygen concentration data in the monthly scale temperature salt data are in one-to-one correspondence and are split into a plurality of data units comprising one-to-one correspondence variables and the dissolved oxygen concentration.
S302: dividing the training set into k training subsets by random sampling; splitting each training subset, selecting the split point that minimizes the mean square value of the two resulting subsets as the optimal segmentation point, and constructing k decision trees with different split structures; obtaining the predicted value of each decision tree as the average of the output values of the data points in its decision subtrees; taking the average of the k decision tree predicted values as the prediction output by the regression prediction model;
S303: substituting the test set into the random forest regression model to obtain a predicted value of the dissolved oxygen concentration, and evaluating the accuracy of the model based on the predicted value of the dissolved oxygen concentration and the true value of the dissolved oxygen concentration in the test set; if the accuracy meets the preset requirement, training of the regional ocean dissolved oxygen concentration reconstruction model is completed; if the accuracy does not meet the requirement, the maximum depth of the decision trees and the number of decision trees are adjusted until the model accuracy meets the preset requirement.
It should be noted that, the trained decision tree model has, in addition to the variable types, characteristic parameters of two dimensions of the maximum depth of the decision tree and the number of decision trees, and the two parameters and the variables are used as indexes of the training quality of the decision tree model, so that the prediction accuracy of the decision tree model is determined together.
In the training set training process, variables and dependent variables (output values) of training data are known, and the regression prediction model identifies marked training data and adjusts characteristic parameters in the model during training to adjust the prediction accuracy of the regression prediction model.
The construction of test and training sets containing variable temperature, salinity, longitude, latitude, year, and dependent variable dissolved oxygen concentrations as described in S301, comprising:
S3011: acquiring data points with the same longitude and latitude from the month scale dissolved oxygen data and month scale warm salt data of the same time-space classification partition, the same target depth and the same month; each data point carries the variables temperature, salinity, longitude, latitude and year together with the dissolved oxygen concentration;
S3012: selecting any one of the variables temperature, salinity, longitude, latitude and year, together with the corresponding dissolved oxygen concentration, from the data points to generate a first data unit, and sequentially generating a second data unit, …, and a fifth data unit from the remaining variables and the corresponding dissolved oxygen concentration, wherein the dependent variables and variables in the data units are in one-to-one correspondence.
The random sampling method in S302 includes Bootstrap sampling, specifically includes the following steps:
(1) Drawing a given number of samples from the original sample by resampling, with repeated draws allowed (sampling with replacement);
(2) Calculating the statistic T to be estimated from the drawn samples;
(3) Repeating the above steps N times to obtain N values of the statistic T;
(4) Calculating the sample variance of the N values of T, so as to estimate the variance of the statistic T.
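The four Bootstrap steps can be sketched with the Python standard library as follows. The function name, the choice of the mean as the statistic T, the number of repetitions and the sample values are assumptions made for illustration only.

```python
import random
import statistics


def bootstrap_variance(sample, statistic, n_resamples=200, seed=0):
    """Estimate the variance of a statistic by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = []
    for _ in range(n_resamples):
        # Step (1): draw len(sample) values, repeats allowed.
        resample = [rng.choice(sample) for _ in sample]
        # Step (2): compute the statistic T on the resample.
        estimates.append(statistic(resample))
    # Steps (3)-(4): N values of T, then their sample variance.
    return statistics.variance(estimates), estimates


# Hypothetical dissolved oxygen values (umol/kg) from one partition
doxy = [180.0, 195.0, 210.0, 205.0, 190.0, 200.0]
var_of_mean, estimates = bootstrap_variance(doxy, statistics.mean)
```

Each resample has the same size as the original sample, so some data units appear several times and others not at all; the same draw-with-replacement idea is what produces the k training subsets of a random forest.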
Compared with the prior art, the invention adopts Bootstrap random sampling which can be repeatedly sampled, and can acquire a data set with better uniformity, so that data units with different variables in a training set and a testing set are more uniformly dispersed, and adverse effects on decision tree prediction precision in the similar variable data sets are avoided; meanwhile, as Bootstrap can count the variance of the sample, the smaller the variance is, the better the uniformity of sampling is, and the random sampling result is convenient for quantitative characterization.
S302, constructing a decision tree model of k decision subtrees with different splitting structures, which comprises the following steps:
S3021, obtaining the optimal segmentation point corresponding to the minimum mean square value of the two segmented training subsets, wherein the mean square value RMSE satisfies the formula:
RMSE = \min_{j,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]
R_1 and R_2 satisfy:
R_1(j,s) = \{x \mid x^{(j)} \le s\}, R_2(j,s) = \{x \mid x^{(j)} > s\};
wherein x is the input feature value, and y is the output value; c_1 is the average output value of the left subregion; c_2 is the average output value of the right subregion; j is the optimal segmentation variable, s is the optimal segmentation point, and x^{(j)} denotes the value of the jth feature of x; R_1 is the left subregion after segmentation, and R_2 is the right subregion after segmentation; y_i represents the output value of the ith data point; min denotes taking the minimum value of the expression to its right;
s3022, obtaining predicted values of all decision trees based on the average value of the output values of all data points in all decision subtrees in all decision trees, and the predicted value c of the decision tree k The method meets the following conditions:
wherein y is i Output value representing the ith data point, N k Representing the number of data points in the kth training subset, c k Representing the average of the output values of the corresponding decision tree of the kth training subset.
S3023, taking the average of the k decision tree predicted values as the predicted value output by the regression prediction model; the average \bar{c} of the k decision tree predicted values satisfies:
\bar{c} = \frac{1}{k} \sum_{l=1}^{k} c_l
wherein l is less than or equal to k and denotes the lth training subset, and c_l represents the average of the output values of the decision tree corresponding to the lth training subset.
Specifically, S3021 is to obtain an optimal segmentation point corresponding to a minimum mean square value of the two segmented training subsets, and includes the following steps:
(1) Selecting any variable in the training subset, splitting it multiple times, and finding the optimal segmentation point s_1 of that variable; the mean square error RMSE_1 corresponding to the optimal segmentation point satisfies:
wherein c_1 is the average output value of the left subregion; c_2 is the average output value of the right subregion; R_1 is the left subregion after segmentation, and R_2 is the right subregion after segmentation; y_i represents the output value of the ith data point; min denotes taking the minimum value of the expression to its right;
(2) And sequentially segmenting the rest variables in the training subset according to the same method, calculating the mean square error, solving the optimal segmentation point of each variable, comparing the mean square error corresponding to the optimal segmentation point of each variable, taking the optimal segmentation point corresponding to the minimum mean square error as the optimal segmentation point s of the training subset, and taking the corresponding variable as the optimal segmentation variable j.
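The two-stage search above (best split point per variable, then best variable) can be sketched as follows. The sketch assumes the usual regression-tree criterion, namely the sum of squared deviations of each side from its own mean, and tries every observed value as a candidate split point s with the rule x^(j) ≤ s for the left side. The function names and the toy data are hypothetical.

```python
def squared_error(values):
    """Sum of squared deviations of values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_for_feature(rows, targets, j):
    """Return (s, cost) minimizing left + right squared error for feature j."""
    best_s, best_cost = None, float("inf")
    thresholds = sorted({row[j] for row in rows})
    for s in thresholds[:-1]:  # the largest value would leave the right side empty
        left = [y for row, y in zip(rows, targets) if row[j] <= s]
        right = [y for row, y in zip(rows, targets) if row[j] > s]
        cost = squared_error(left) + squared_error(right)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s, best_cost


def best_split(rows, targets):
    """Compare the per-feature optima and return (j, s, cost) of the best one."""
    best_j, best_s, best_cost = None, None, float("inf")
    for j in range(len(rows[0])):
        s, cost = best_split_for_feature(rows, targets, j)
        if cost < best_cost:
            best_j, best_s, best_cost = j, s, cost
    return best_j, best_s, best_cost


# Hypothetical training subset: (temperature, salinity) -> dissolved oxygen
rows = [(2.0, 34.9), (2.5, 34.6), (3.0, 34.8), (8.0, 34.7), (9.0, 34.5), (10.0, 35.0)]
targets = [250.0, 248.0, 252.0, 180.0, 178.0, 182.0]
j, s, cost = best_split(rows, targets)
```

In this toy subset the cold samples are oxygen-rich and the warm ones oxygen-poor, so the search selects temperature (index 0) as the segmentation variable and 3.0 as the segmentation point.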
The precision evaluation in S303 includes: the regression prediction model is evaluated for accuracy using one or more of a minimum percentage error, a mean square error, or a decision coefficient.
Preferably, the accuracy assessment is made with a minimum percentage error.
Specifically, the minimum percentage error is less than 10%; the maximum depth of the decision tree is 300-500.
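The evaluation indexes named above can be sketched in plain Python. The text does not give a formula for the percentage error, so the sketch assumes a mean absolute percentage error; the decision coefficient is taken to be the coefficient of determination R². Function names and sample values are hypothetical.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    return math.sqrt(mean_squared_error(y_true, y_pred))


def mean_absolute_percentage_error(y_true, y_pred):
    """Percentage error in percent; assumes no measured value is zero."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted dissolved oxygen values (umol/kg)
measured = [200.0, 100.0, 50.0]
predicted = [190.0, 110.0, 50.0]

mape = mean_absolute_percentage_error(measured, predicted)
mse = mean_squared_error(measured, predicted)
r2 = coefficient_of_determination(measured, predicted)
meets_requirement = mape < 10.0  # the 10% threshold stated in the text
```

For these values the percentage error is about 5%, which would satisfy the 10% requirement.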
The adjustment of the maximum depth of decision tree and the number of decision trees described in S303 includes the use of a cross-validation method.
Specifically, the cross-validation method includes the steps of:
S3031: dividing the training set into a training part, a verification part and a test part; wherein the training part is used for model training, the verification part is used for adjusting parameters, and the test part is used for measuring the performance of the model; dividing the training set into k training subsets continuously, and equally dividing the training subsets at the middle positions;
S3032: using each training subset in turn as the verification part and the remaining k-1 training subsets as the training part, obtaining k models;
S3033: taking the average of the root mean square errors of the k models on their verification parts as the performance index, and recording the decision tree parameters corresponding to the performance index;
S3034: traversing the k models, repeating the above steps, and selecting the decision tree parameters corresponding to the best performance index as the optimal parameters.
Preferably, the performance index corresponding to the model is root mean square error; the decision tree parameters include the maximum depth of the decision tree and the number of decision trees.
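The k-fold procedure of S3031 to S3034 can be sketched generically as follows. To keep the example short and dependency-free, a nearest-neighbour averaging regressor stands in for the random forest, and its neighbour count `m` stands in for the decision tree parameters (maximum depth, number of trees); that substitution, the function names and the data are all assumptions made for illustration.

```python
import math


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_rmse(xs, ys, k, fit):
    """Average validation RMSE over k folds; fit(train_x, train_y) -> predict."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        squared = [(predict(xs[i]) - ys[i]) ** 2 for i in fold]
        scores.append(math.sqrt(sum(squared) / len(squared)))
    return sum(scores) / len(scores)


def make_neighbour_fit(m):
    """Stand-in model: predict the mean target of the m nearest training points."""
    def fit(train_x, train_y):
        def predict(x):
            nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:m]
            return sum(y for _, y in nearest) / len(nearest)
        return predict
    return fit


# Hypothetical 1-D data and candidate parameter values
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]
candidates = [1, 2, 6]
scores = {m: cross_val_rmse(xs, ys, 4, make_neighbour_fit(m)) for m in candidates}
best_m = min(scores, key=scores.get)  # parameter with the lowest average RMSE
```

The selection logic is independent of the model: replacing `make_neighbour_fit` with a function that trains a forest for a given maximum depth and tree count would give the parameter search described in the text.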
By way of example, the invention discloses a method for evaluating the precision of a random forest regression model of a certain partition by using a test set, wherein evaluation indexes comprise one or more of minimum percentage error, mean square error, root mean square error, mean absolute error and decision coefficient; the method specifically comprises the following steps:
(1) Dividing the training set and the test set according to the ratio of 4:1, wherein the input features comprise: temperature, salinity, longitude, latitude, and year, dependent variables are dissolved oxygen concentration;
(2) Dividing the sampling method into k training sets randomly by adopting a Bootstrap sampling method, wherein the training sets comprise data units with one-to-one corresponding variable and dependent variable dissolved oxygen concentration; training a decision tree by each training set, wherein the k value is the number of trees in the random forest regression parameters, and dividing each training set in the training process to obtain left and right training subsets; for data units with the same class of variables, calculating the mean square error of the left training subset and the right training subset of the variables, and obtaining a segmentation point corresponding to the time when the mean square error is minimum as an optimal segmentation point of the variables; sequentially obtaining all variable optimal segmentation points, comparing the mean square deviations corresponding to all variable segmentation points, selecting the variable with the minimum mean square deviation as the training set optimal segmentation variable, constructing a decision tree in each training set, and obtaining k decision trees;
(3) The average value of the dissolved oxygen concentration of the data unit in each training set is used as the output value of the decision tree corresponding to the training set; further averaging the output values of the k decision trees to serve as a predicted value output by the random forest regression prediction model;
(4) The model is evaluated with respect to accuracy by the test set data using one or more of a minimum percentage error, a mean square error, a root mean square error, an average absolute error, and a decision coefficient: inputting a random forest regression prediction model into the test set data by utilizing the temperature, the salinity, the longitude, the latitude and the year to obtain a predicted value, and carrying out precision analysis on the predicted value and an actual measured value in the test set; if the minimum percentage error is less than or equal to 10%, training the regional ocean dissolved oxygen concentration reconstruction model is completed, and if the minimum percentage error is more than 10%, automatically adjusting parameters of the maximum depth of the decision tree and the number of the decision tree by adopting a cross verification method until the maximum depth of the decision tree reaches 500; and (5) training a random forest regression prediction model.
As an example, as shown in fig. 7: the month scale dissolved oxygen data for March at a depth of 1000 dbar are divided into 220 partitions, and the random forest regression prediction model trained for each partition is evaluated for accuracy using test set data; the results show that nearly 80% of the models have a minimum percentage prediction error below 5%, and 90% of the models have a minimum percentage prediction error below 10%, indicating high prediction accuracy. As shown in fig. 8: the predicted values of the random forest regression prediction models of all partitions are evenly distributed around the measured values and are approximately normally distributed, indicating that the models have good prediction accuracy.
Specifically, in step 4, the second interpolation calculation is performed to obtain the corrected dissolved oxygen concentration at the central point of the space unit, where the corrected dissolved oxygen concentration Doxy' at the central point of the space unit satisfies:
where i is the index of an adjacent sampling point, n is the number of adjacent sampling points designated for interpolation, $d_i$ is the distance from the i-th adjacent sampling point to the target point, and $Doxy_i'$ represents the corrected dissolved oxygen concentration of the adjacent sampling point i used in the interpolation.
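The formula itself is not reproduced in this text. A common inverse-distance-weighted form that is consistent with the symbols defined above would be the following, where the exponent $p$ is an assumption (the source does not state it; $p = 1$ and $p = 2$ are typical choices):

```latex
Doxy' = \frac{\sum_{i=1}^{n} Doxy_i' \, d_i^{-p}}{\sum_{i=1}^{n} d_i^{-p}}
```

Under this form, nearer sampling points receive larger weights, and the weights sum to one after normalization by the denominator.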
Further, in order to evaluate the accuracy of the regression prediction model of each partition, the invention also discloses an accuracy correction method for the dissolved oxygen concentration data reconstructed from the temperature-salinity profile, which comprises the following steps:
(1) Reconstructing multi-year dissolved oxygen data from multi-year Argo temperature-salinity profile data, the reconstructed data set being denoted $O_{TS}$; obtaining multi-year measured Argo dissolved oxygen data, denoted $O_{Argo}$; mapping the $O_{Argo}$ and $O_{TS}$ data onto a 1° × 1° longitude-latitude grid;
(2) And comparing the two grid values corresponding to the space based on the absolute error and the maximum percentage error.
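The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `grid_mean` and `compare_grids` are hypothetical, values falling in the same 1° × 1° cell are simply averaged (the text does not say how points are mapped to the grid), and the sample points are invented.

```python
import math
from collections import defaultdict

def grid_mean(points, cell=1.0):
    """Average (lon, lat, value) points within each cell-by-cell degree box."""
    sums = defaultdict(lambda: [0.0, 0])
    for lon, lat, value in points:
        key = (math.floor(lon / cell), math.floor(lat / cell))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def compare_grids(reconstructed, measured):
    """Absolute and percentage error for cells present in both grids."""
    result = {}
    for key in reconstructed.keys() & measured.keys():
        abs_err = abs(reconstructed[key] - measured[key])
        pct = abs_err / abs(measured[key]) * 100 if measured[key] != 0 else float("nan")
        result[key] = (abs_err, pct)
    return result

# Hypothetical reconstructed (O_TS) and measured (O_Argo) points: (lon, lat, value).
o_ts = [(150.2, 30.5, 210.0), (150.8, 30.1, 190.0), (151.5, 30.5, 100.0)]
o_argo = [(150.5, 30.5, 160.0), (152.5, 30.5, 90.0)]
errors = compare_grids(grid_mean(o_ts), grid_mean(o_argo))
print(errors)
```

Only the cell shared by both data sets is compared; cells covered by just one data set are skipped.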
Specifically, the multi-year data are the Argo temperature-salinity profile data from 2010 to 2021 and the dissolved oxygen data from 2010 to 2021.
The results, shown in fig. 9, indicate that for the month scale dissolved oxygen data and month scale temperature-salinity profile data for March at a depth of 1000 dbar, the maximum percentage error of 90% of the data points is less than 15%, indicating high prediction accuracy; data points with smaller true dissolved oxygen concentrations have larger maximum percentage errors.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.

Claims (10)

1. The method for constructing the ocean dissolved oxygen concentration reconstruction model based on the Argo warm salt profile is characterized by comprising the following steps of:
screening and obtaining climatic state month scale dissolved oxygen data based on Argo dissolved oxygen profile data of the same month across multiple years; acquiring month scale dissolved oxygen data at any target depth based on the climatic state month scale dissolved oxygen data; the climatic state month scale dissolved oxygen data comprise multi-year dissolved oxygen data for different months and different depths;
dividing the plane corresponding to each target depth into a plurality of space units using longitude and latitude coordinates; performing a first interpolation calculation on the dissolved oxygen concentration of the different space units at the same target depth, based on the month scale dissolved oxygen data of the target depth, to obtain the dissolved oxygen concentration at the central point of each space unit, and taking it as the dissolved oxygen concentration of the space unit; based on the dissolved oxygen concentration of each space unit, grouping space units with similar dissolved oxygen concentrations at the target depth into the same class of region, thereby completing the ocean dissolved oxygen space-time classification partition at the target depth; partitioning the different target depths in this way to construct ocean dissolved oxygen space-time classification partitions at different target depths;
screening and obtaining month scale temperature-salinity data at the same target depth based on Argo temperature-salinity profile data of the same month across multiple years; selecting month scale temperature and salinity data and month scale dissolved oxygen concentration data located in the same space-time classification partition, at the same target depth and at the same longitude and latitude coordinates, and constructing a regression prediction model for the space-time classification partition with temperature and salinity as variables and dissolved oxygen concentration as the dependent variable;
obtaining, through the regression prediction model and based on the month scale temperature-salinity data at the month scale temperature-salinity sampling points, corrected values of dissolved oxygen concentration for each sampling point at different longitudes and latitudes in the corresponding space-time classification partition at the target depth; the sampling points are data sampling points obtained by the Argo system at different depths and longitude-latitude coordinates; performing a second interpolation calculation based on the corrected values of dissolved oxygen concentration at the target depth to obtain the corrected dissolved oxygen concentration at the central point of each space unit, thereby completing the reconstruction for the corresponding space-time classification partition and month scale temperature-salinity data at the target depth, and constructing ocean dissolved oxygen concentration reconstruction models covering different target depths and different space-time partitions.
2. The method of claim 1, wherein the obtaining climatic state month scale dissolved oxygen data comprises:
acquiring multi-year Argo dissolved oxygen profile data;
using the month as a screening parameter, screening the multi-year Argo dissolved oxygen profile data to obtain the climatic state month scale dissolved oxygen data $Doxy_M(h)$ for each month, wherein M is the month (1 to 12) and h is the depth.
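Screening by month amounts to pooling records from all years that share a calendar month. The following Python sketch illustrates this under stated assumptions: the record layout `(year, month, depth, doxy)`, the function names and the sample values are all hypothetical and not taken from the Argo data format.

```python
from collections import defaultdict

def screen_by_month(records):
    """Group (year, month, depth, doxy) records by month, pooling all years."""
    by_month = defaultdict(list)
    for year, month, depth, doxy in records:
        by_month[month].append((year, depth, doxy))
    return by_month

def at_depth(month_records, depth):
    """Select the (year, doxy) pairs of one month at a single target depth."""
    return [(year, doxy) for year, d, doxy in month_records if d == depth]

# Hypothetical profile samples from several years.
records = [
    (2010, 3, 1000, 150.0),
    (2011, 3, 1000, 155.0),
    (2011, 3, 500, 180.0),
    (2011, 4, 1000, 152.0),
]
march = screen_by_month(records)[3]
print(at_depth(march, 1000))  # [(2010, 150.0), (2011, 155.0)]
```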
3. The method according to claim 1, wherein performing the first interpolation calculation on the dissolved oxygen concentrations of different space units at the same target depth comprises: taking the central point of the space unit as the target point, taking data points of the month scale dissolved oxygen data at the same depth that lie near the central point of the space unit as adjacent sampling points, and assigning different weights based on the distances between the adjacent sampling points and the central point of the space unit, to obtain the dissolved oxygen concentration at the central point of the space unit.
4. The construction method according to claim 3, wherein the dissolved oxygen concentration Doxy at the target point satisfies the following formula:
where i is the index of an adjacent sampling point, n is the number of adjacent sampling points designated for interpolation, $d_i$ is the distance from the i-th adjacent sampling point to the target point, and $Doxy_i$ represents the dissolved oxygen concentration of the adjacent sampling point i used in the interpolation.
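Distance-based weighting of the n nearest sampling points can be sketched in Python as below. This is an illustration rather than the claimed formula, which is not reproduced in this text: the function name `idw`, the `power` exponent and the use of planar Euclidean distance are assumptions. Real longitude-latitude data would normally call for a great-circle distance.

```python
import math

def idw(target, samples, n=4, power=2.0):
    """Inverse-distance-weighted estimate at target from (x, y, value) samples."""
    if not samples:
        raise ValueError("at least one sampling point is required")

    def distance(sample):
        return math.hypot(sample[0] - target[0], sample[1] - target[1])

    nearest = sorted(samples, key=distance)[:n]  # the n adjacent sampling points
    numerator = denominator = 0.0
    for sample in nearest:
        d = distance(sample)
        if d == 0.0:
            return sample[2]  # target coincides with a sampling point
        weight = 1.0 / d ** power
        numerator += weight * sample[2]
        denominator += weight
    return numerator / denominator

# Hypothetical sampling points around a cell centre at (0, 0).
samples = [(1.0, 0.0, 100.0), (0.0, 2.0, 200.0), (5.0, 5.0, 900.0)]
print(idw((0.0, 0.0), samples, n=2))  # 120.0
```

With `n=2` the distant third point is ignored, and the nearer of the two remaining points dominates the estimate with weight 1 against 0.25.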
5. The construction method according to claim 1, wherein constructing the regression prediction model with temperature and salinity as variables and dissolved oxygen concentration as the dependent variable comprises:
constructing, based on the month scale dissolved oxygen data of the same space-time classification partition and target depth and the month scale temperature-salinity data of the same target depth and the same month, a test set and a training set containing the variables temperature, salinity, longitude, latitude and year and the dependent variable dissolved oxygen concentration; the data units of the test set and the training set comprise variables and dependent variables in one-to-one correspondence;
dividing the training set into k training subsets by random sampling; segmenting each training subset and selecting, as the optimal segmentation point, the segmentation point that minimizes the mean square value of the two resulting subsets, thereby constructing k decision tree models with different segmentation structures; obtaining the predicted value of each decision tree from the average of the output values of the data points in each decision subtree of that decision tree; taking the average of the k decision tree predicted values as the predicted value output by the regression prediction model;
substituting the test set into the random forest regression model to obtain predicted values of the dissolved oxygen concentration, and evaluating the accuracy of the model based on the predicted values and the true values of the dissolved oxygen concentration in the test set; if the accuracy meets the preset requirement, training of the regional ocean dissolved oxygen concentration reconstruction model is complete; if the accuracy does not meet the requirement, the maximum depth of the decision trees and the number of decision trees are adjusted until the model accuracy meets the preset requirement.
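The overall procedure of this claim (bootstrap resampling, one tree per resample, averaging of tree outputs) can be sketched in pure Python. This is a minimal illustration, not the claimed implementation: it uses standard least-squares regression trees whose leaves predict the mean of their data points, and all function names, the synthetic data and the parameter values are assumptions.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows):
    """Return (j, s) minimizing the summed squared error, or None if no split exists."""
    best = None
    for j in range(len(rows[0][0])):
        for s in sorted({x[j] for x, _ in rows})[:-1]:  # the largest value cannot split
            left = [y for x, y in rows if x[j] <= s]
            right = [y for x, y in rows if x[j] > s]
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, j, s)
    return None if best is None else best[1:]

def build_tree(rows, max_depth):
    """A tree is either a leaf value (float) or a tuple (j, s, left, right)."""
    split = best_split(rows) if max_depth > 0 else None
    if split is None:
        return sum(y for _, y in rows) / len(rows)  # leaf: mean output
    j, s = split
    left = [r for r in rows if r[0][j] <= s]
    right = [r for r in rows if r[0][j] > s]
    return (j, s, build_tree(left, max_depth - 1), build_tree(right, max_depth - 1))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, s, left, right = tree
        tree = left if x[j] <= s else right
    return tree

def fit_forest(rows, k, max_depth, seed=0):
    rng = random.Random(seed)
    # Bootstrap: each tree sees a resample drawn with replacement.
    return [build_tree([rng.choice(rows) for _ in rows], max_depth) for _ in range(k)]

def predict_forest(trees, x):
    return sum(predict_tree(t, x) for t in trees) / len(trees)

# Synthetic rows: ((temperature, salinity), dissolved oxygen), values invented.
rows = [((float(t), 34.0 + 0.1 * t), 300.0 - 8.0 * t) for t in range(2, 22, 2)]
forest = fit_forest(rows, k=25, max_depth=3, seed=0)
estimate = predict_forest(forest, (10.0, 35.0))
print(round(estimate, 1))
```

Because every leaf is a mean of training outputs, the ensemble prediction always lies between the smallest and largest training value. Adjusting `k` and `max_depth` corresponds to the two parameters the claim tunes.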
6. The construction method according to claim 5, wherein constructing the k decision tree models with different segmentation structures comprises:
obtaining the optimal segmentation point corresponding to the minimum mean square value of the two segmented training subsets, wherein the mean square value RMSE satisfies the formula:
$R_1$ and $R_2$ satisfy:
$R_1(j,s)=\{x \mid x^{(j)} \le s\}$, $R_2(j,s)=\{x \mid x^{(j)} > s\}$;
wherein x is an input feature value and y is an output value; $c_1$ is the average output of the left subregion; $c_2$ is the average output of the right subregion; j is the optimal segmentation variable, s is the optimal segmentation point, and $x^{(j)}$ denotes the value of the j-th feature of x; $R_1$ is the left subregion after segmentation and $R_2$ is the right subregion after segmentation; $y_i$ denotes the output value of the i-th data point; min denotes taking the minimum value of the expression to the right of the min operator;
obtaining the predicted value of each decision tree based on the average of the output values of the data points in each decision subtree of that decision tree, wherein the predicted value $c_k$ of the decision tree satisfies:
wherein $y_i$ denotes the output value of the i-th data point, $N_k$ denotes the number of data points in the k-th training subset, and $c_k$ denotes the average of the output values of the decision tree corresponding to the k-th training subset;
taking the average of the k decision tree predicted values as the predicted value output by the regression prediction model, wherein the average $\bar{c}$ of the k decision tree predicted values satisfies:
wherein l ≤ k indexes the training subsets, and $c_l$ denotes the average of the output values of the decision tree corresponding to the l-th training subset.
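The formula images for this claim are not reproduced in this text. The standard least-squares forms used for regression trees and for ensemble averaging, written with the symbols defined in the claim, would be as follows. This is a reconstruction under that assumption; in particular, the claim calls the split criterion "RMSE", whereas the standard criterion shown first is a sum of squared errors. The second and third lines give the per-tree mean and the average over the k trees.

```latex
\min_{j,\,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2
  + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]

c_k = \frac{1}{N_k}\sum_{i=1}^{N_k} y_i

\bar{c} = \frac{1}{k}\sum_{l=1}^{k} c_l
```

For a fixed split, each inner minimum is attained when the constant equals the mean output of its subregion, which is why $c_1$ and $c_2$ are described as subregion averages.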
7. The method of claim 6, wherein the obtaining the optimal segmentation point corresponding to the minimum mean square value of the two segmented training subsets comprises:
selecting any variable in the training subset, segmenting it multiple times, and solving for the optimal segmentation point $s_1$ of that variable, wherein the mean square error $RMSE_1$ corresponding to the optimal segmentation point satisfies:
wherein $c_1$ is the average output of the left subregion; $c_2$ is the average output of the right subregion; $R_1$ is the left subregion after segmentation and $R_2$ is the right subregion after segmentation; $y_i$ denotes the output value of the i-th data point; min denotes taking the minimum value of the expression to the right of the min operator;
and sequentially segmenting the remaining variables in the training subset by the same method, calculating the mean square error and solving for the optimal segmentation point of each variable; comparing the mean square errors corresponding to the optimal segmentation points of the variables, and taking the optimal segmentation point with the minimum mean square error as the optimal segmentation point s of the training subset and the corresponding variable as the optimal segmentation variable j.
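The two-level search of this claim (best point per variable, then best variable overall) can be sketched in Python. The function names are hypothetical, the candidate thresholds are taken to be the observed values of each variable (an assumption; the claim only says the variable is segmented multiple times), and the error measure is the summed squared error of the two sides.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_point_for_variable(xs, ys):
    """Try each observed value of one variable as threshold s; return (s, error)."""
    best = None
    for s in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        error = sse(left) + sse(right)
        if best is None or error < best[1]:
            best = (s, error)
    return best

def best_split(rows, ys):
    """Compare the per-variable optima; return (j, s, error) for the best variable."""
    best = None
    for j in range(len(rows[0])):
        candidate = best_point_for_variable([r[j] for r in rows], ys)
        if candidate is not None and (best is None or candidate[1] < best[2]):
            best = (j, candidate[0], candidate[1])
    return best

# Toy data: the output depends only on the first variable.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
ys = [10.0, 10.0, 20.0, 20.0]
print(best_split(rows, ys))  # (0, 2.0, 0.0)
```

Splitting the first variable at 2.0 separates the outputs perfectly (error 0), while no threshold on the second variable does better than an error of about 66.7, so variable 0 is selected.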
8. The method of claim 5, wherein the adjusting the maximum depth of the decision tree and the number of decision trees comprises a cross-validation method, the cross-validation method comprising:
dividing the training set into a training part, a verification part and a test part, wherein the training part is used for model training, the verification part is used for adjusting parameters, and the test part is used for measuring the performance of the model; dividing the training set into k consecutive training subsets of equal size;
using each training subset in turn as the verification part and the data of the remaining k-1 training subsets as the training part, to obtain k models;
taking the mean of the root mean square errors on the verification parts of the k models as the performance index corresponding to the k models, and recording the decision tree parameters corresponding to that performance index;
traversing the k models, repeating the above steps, and selecting the decision tree parameters corresponding to the best performance index as the optimal parameters.
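The k-fold procedure of this claim can be sketched in Python as below. To keep the example self-contained, a simple binned-mean regressor with one parameter (`n_bins`) stands in for the random forest and its two parameters; that stand-in model, the function names and the synthetic data are assumptions for illustration only. The sketch requires at least k data points.

```python
import math

def contiguous_folds(n, k):
    """Split indices 0..n-1 into k consecutive folds of near-equal size."""
    bounds = [i * n // k for i in range(k + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(k)]

def cv_rmse(xs, ys, k, fit):
    """Mean validation RMSE over k folds; fit(train_x, train_y) returns a predictor."""
    scores = []
    for fold in contiguous_folds(len(xs), k):
        train = [i for i in range(len(xs)) if i not in fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - predict(xs[i])) ** 2 for i in fold]
        scores.append(math.sqrt(sum(squared) / len(squared)))
    return sum(scores) / k

def make_binned_fit(n_bins):
    """Hypothetical stand-in model: mean of y within n_bins equal bins of x in [0, 1)."""
    def fit(train_x, train_y):
        overall = sum(train_y) / len(train_y)
        bins = {}
        for x, y in zip(train_x, train_y):
            bins.setdefault(int(x * n_bins), []).append(y)
        means = {b: sum(v) / len(v) for b, v in bins.items()}
        return lambda x: means.get(int(x * n_bins), overall)  # empty bin: overall mean
    return fit

def select_parameter(xs, ys, k, candidates):
    """Score every candidate parameter by cross-validation and keep the best."""
    scores = {p: cv_rmse(xs, ys, k, make_binned_fit(p)) for p in candidates}
    return min(scores, key=scores.get), scores

# Synthetic step-shaped data on [0, 1).
xs = [i / 20 for i in range(20)]
ys = [1.0 if x < 0.5 else 3.0 for x in xs]
best, scores = select_parameter(xs, ys, k=4, candidates=[1, 2, 4])
print(best)  # 2
```

Two bins reproduce the step exactly in every fold, one bin underfits, and four bins leave an empty bin whenever a whole bin is held out, so the validation error selects `n_bins = 2`.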
9. The method of claim 8, wherein the performance index corresponding to the model is the root mean square error, and the decision tree parameters comprise the maximum depth of the decision trees and the number of decision trees.
10. The method of claim 5, wherein the random sampling pattern comprises Bootstrap sampling.
CN202310062810.9A 2023-01-18 2023-01-18 Ocean dissolved oxygen concentration reconstruction model construction method based on Argo warm salt profile Pending CN116864026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310062810.9A CN116864026A (en) 2023-01-18 2023-01-18 Ocean dissolved oxygen concentration reconstruction model construction method based on Argo warm salt profile

Publications (1)

Publication Number Publication Date
CN116864026A true CN116864026A (en) 2023-10-10

Family

ID=88227382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310062810.9A Pending CN116864026A (en) 2023-01-18 2023-01-18 Ocean dissolved oxygen concentration reconstruction model construction method based on Argo warm salt profile

Country Status (1)

Country Link
CN (1) CN116864026A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421562A (en) * 2023-12-18 2024-01-19 浙江大学 Ocean dissolved oxygen content space-time distribution prediction method, system, medium and equipment
CN117421562B (en) * 2023-12-18 2024-03-15 浙江大学 Ocean dissolved oxygen content space-time distribution prediction method, system, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination