CN115293577B - Machine learning-based high-cold-flow-domain groundwater chemical control factor analysis method - Google Patents

Info

Publication number
CN115293577B
Authority
CN
China
Prior art keywords
groundwater
cold
water
water chemistry
chemistry
Prior art date
Legal status
Active
Application number
CN202210939068.0A
Other languages
Chinese (zh)
Other versions
CN115293577A (en)
Inventor
张海发
王巍
张旭
曾祥云
邵世鹏
王宝强
宋东东
郭舟
Current Assignee
Pearl River Water Resources Commission Technical Consulting Guangzhou Co ltd
Original Assignee
Pearl River Water Resources Commission Technical Consulting Guangzhou Co ltd
Priority date
Filing date
Publication date
Application filed by Pearl River Water Resources Commission Technical Consulting Guangzhou Co ltd
Priority to CN202210939068.0A
Publication of CN115293577A
Application granted
Publication of CN115293577B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00 Water conservation; Efficient water supply; Efficient water use
    • Y02A20/152 Water filtration

Abstract

The invention discloses a machine learning-based method for analyzing the chemical control factors of groundwater in a high-cold river basin. The method applies to groundwater environment management in such basins and comprises three stages. First, a self-organizing map (SOM) neural network is constructed from preprocessed water chemistry element data to obtain a water chemistry visualization topological map. Second, the optimal number of clusters on this map is determined by combining the Davies-Bouldin index with the K-means algorithm; the preprocessed data are clustered, the clustering results are normalized, and radar charts and ion ratio diagrams are plotted. Third, based on the clustering results, positive matrix factorization (PMF) is applied and combined with correlation analysis and the ion ratio method to analyze the groundwater chemical control factors of the basin qualitatively and quantitatively. By combining SOM, PMF, correlation analysis and the ion ratio method, the invention achieves both qualitative and quantitative analysis of groundwater chemical control factors in high-cold river basins.

Description

Machine learning-based high-cold-flow-domain groundwater chemical control factor analysis method
Technical Field
The invention relates to the technical field of groundwater environment management in high-cold river basins, and in particular to a machine learning-based method for analyzing the chemical control factors of groundwater in such basins.
Background
Groundwater is a key component of water resources in northwest China. Most drinking water for residents, as well as water for industry, agricultural irrigation and ecological needs, depends on groundwater. Because precipitation is low and evaporation is high, northwest China is short of water, and groundwater quality has declined or even deteriorated, which makes the water shortage more severe. Research on groundwater chemical control factors in alpine water-deficient regions helps deepen understanding of how regional groundwater quality evolves and provides scientific guidance for groundwater resource management.
In recent years, groundwater chemical control factors have been widely studied in China and abroad. Current research focuses mainly on hydrogeological investigation, qualitative identification of solute sources, and short-term water chemistry characteristics of local areas. However, environmental conditions in high-cold river basins are harsh and groundwater chemistry is complex, so no single method can systematically reveal the controls on groundwater chemistry across different areas and seasons of an entire basin. Large, complex spatiotemporal water chemistry datasets are also difficult to visualize, which makes it harder still to analyze the control factors of groundwater in high-cold river basins qualitatively and quantitatively at the same time.
Those skilled in the art therefore need a way to describe complex groundwater data, to identify the hydrogeochemical processes and solute sources of each cluster, to quantify the control factors across different areas and seasons of the basin, and thus to analyze groundwater chemical control factors of high-cold river basins both qualitatively and quantitatively.
Disclosure of Invention
In view of the above, the invention provides a machine learning-based method for analyzing groundwater chemical control factors in a high-cold river basin. The SOM projects complex high-dimensional data onto a low-dimensional space, from which the chemical components of groundwater and their potential control factors are identified. PMF then provides quantitative source apportionment, yielding the non-negative source contribution of each chemical component and the main factors behind each qualitative class. By combining SOM, PMF, correlation analysis and the ion ratio method, the method can describe complex groundwater data, identify the hydrogeochemical processes and solute sources of each cluster, and quantify the control factors across different areas and seasons of the basin. It thus achieves qualitative and quantitative analysis of groundwater chemical control factors in high-cold river basins. This supports a deeper understanding of how groundwater quality evolves and gives scientific guidance for the prioritized treatment of pollution sources and the management of groundwater resources.
To achieve the above purpose, the invention adopts the following technical solution:
the machine learning-based method for analyzing groundwater chemical control factors in a high-cold river basin comprises the following steps:
Step (1): acquire water chemistry element data for groundwater samples from the high-cold river basin, and preprocess the data.
Step (2): construct an SOM self-organizing neural network from the preprocessed water chemistry element data to obtain a water chemistry visualization topological map.
Step (3): on the water chemistry visualization topological map, determine the optimal number of clusters by combining the Davies-Bouldin index with the K-means algorithm. Cluster the preprocessed water chemistry element data, normalize the clustering results, and plot radar charts and ion ratio diagrams.
Step (4): based on the clustering results, apply positive matrix factorization (PMF). Combine the results with correlation analysis and the ion ratio method to analyze the groundwater chemical control factors of the high-cold river basin qualitatively and quantitatively.
Optionally, in step (1), preprocessing the water chemistry element data includes naming and arranging the sample data according to their seasonal and regional attributes.
Optionally, step (1) further includes: if the preprocessed sample data contain outliers or missing values, removing the outliers and filling the missing values by interpolation.
Optionally, in step (2), the SOM self-organizing neural network is constructed to obtain the water chemistry visualization topological map as follows:
the preprocessed water chemistry element data are input into the SOM self-organizing neural network toolbox in Matlab;
Step A: determine the optimal number of neurons from the number of groundwater samples n.
Step B: initialize the weight space with small random values, and set the map size, the initial winning neuron and the initial learning rate.
Step C: find the best matching unit (BMU), which is the neuron whose weight vector is most similar to the input vector.
Step D: update the weight vectors of the BMU and its neighboring neurons.
Step E: repeat the iterative search until it converges to the optimal self-organizing map.
Training ends when the preset number of iterations or the preset learning rate is reached, with the learning rate tending to 0. The water chemistry visualization topological map is then output; otherwise, return to Step C.
Optionally, in step (2), the water chemistry visualization topological map computed by the SOM self-organizing neural network is output as a unified distance matrix (U-matrix) and component planes.
Optionally, in step (3), the Davies-Bouldin index (DBI) used to determine the optimal number of clusters is:
$$\mathrm{DBI} = \frac{1}{N}\sum_{i=1}^{N}\max_{j \neq i}\left(\frac{\sigma_i + \sigma_j}{d(c_i, c_j)}\right)$$
where N is the number of clusters; $\sigma_i$ and $\sigma_j$ are the average distances from all patterns in clusters i and j to their respective centroids $c_i$ and $c_j$; and $d(c_i, c_j)$ is the distance between $c_i$ and $c_j$.
Optionally, in step (3), the preprocessed water chemistry element data are clustered as follows:
the interval around the minimum of the Davies-Bouldin index curve is enlarged. The number of clusters with the smallest Davies-Bouldin index is taken as the optimal number, because it gives clear cluster boundaries. Finally, the labels of the preprocessed water chemistry element data are projected onto the neurons to determine which cluster each sample belongs to.
Optionally, in step (3), the clustering results are normalized so that a radar chart and ion ratio diagrams of the water chemistry element concentrations in each cluster can be plotted.
Optionally, in step (4), the clustering results are combined with positive matrix factorization (PMF) as follows:
Step a: apply PMF to the preprocessed water chemistry element data. The sample concentration matrix X (elements $x_{ij}$) is decomposed into a factor contribution matrix G (elements $g_{ik}$), a factor profile (distribution) matrix F (elements $f_{kj}$) and a residual matrix E (elements $e_{ij}$):
$$x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij}$$
where $x_{ij}$ is the concentration of the jth water chemistry element in the ith sample; p is the number of factors (pollution sources); $g_{ik}$ is the contribution of the kth factor to the ith sample; and $f_{kj}$ is the concentration of the jth water chemistry element in the profile of the kth factor.
Step b: calculate the uncertainty $u_{ij}$ of the jth water chemistry element in the ith sample from the element concentration, the method detection limit (MDL) and the measurement uncertainty.
If the concentration is greater than the MDL, $u_{ij}$ is calculated as follows:
where error is the error coefficient.
If the concentration is less than or equal to the MDL, $u_{ij}$ is calculated as follows:
Step c: derive the factor contributions and profiles by minimizing the objective function Q:
$$Q = \sum_{i=1}^{n}\sum_{j=1}^{m}\left(\frac{e_{ij}}{u_{ij}}\right)^2$$
where m is the number of water chemistry elements and n is the number of samples.
Optionally, in step (4), the groundwater chemical control factors of the high-cold river basin are analyzed qualitatively and quantitatively in combination with correlation analysis and the ion ratio method, specifically:
PMF yields quantitative information on the contribution of each factor and on how each factor is distributed across the water chemistry elements. The water chemistry contribution of each factor is then interpreted together with the radar charts, the correlation analysis and the ion ratios derived from the SOM classification results. These ratios are TDS vs. Na⁺/(Na⁺ + Ca²⁺), Mg²⁺/Na⁺ vs. Ca²⁺/Na⁺, CAI-I = (Cl⁻ - (Na⁺ + K⁺))/Cl⁻ vs. CAI-II = (Cl⁻ - (Na⁺ + K⁺))/(SO₄²⁻ + HCO₃⁻ + CO₃²⁻ + NO₃⁻), and NO₃⁻/Na⁺ vs. Cl⁻/Na⁺. In this way each factor can be matched to a groundwater chemical control factor, and its contribution rate can be reported.
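A minimal Python sketch of the ion ratios listed above follows. It assumes the ratios are formed from concentrations in meq/L, which is usual for these diagrams. The helper names, the dictionary keys and the mg/L to meq/L conversion via equivalent weights are illustrative choices, not part of the described method. TDS, which is plotted against Na⁺/(Na⁺ + Ca²⁺), stays in mg/L.

```python
from typing import Dict

# Equivalent weights (g per equivalent = molar mass / |charge|), used to convert mg/L to meq/L
EQ_WEIGHT: Dict[str, float] = {
    "Na": 22.990, "K": 39.098, "Ca": 20.039, "Mg": 12.153,
    "Cl": 35.453, "SO4": 48.031, "HCO3": 61.017, "CO3": 30.004, "NO3": 62.004,
}


def to_meq(mg_per_l: Dict[str, float]) -> Dict[str, float]:
    """Convert major-ion concentrations from mg/L to meq/L (all EQ_WEIGHT keys required)."""
    return {ion: mg_per_l[ion] / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}


def ion_ratios(meq: Dict[str, float]) -> Dict[str, float]:
    """Ion ratios used in the diagrams; inputs are assumed to be in meq/L."""
    na, k, ca, mg, cl = meq["Na"], meq["K"], meq["Ca"], meq["Mg"], meq["Cl"]
    anions = meq["SO4"] + meq["HCO3"] + meq["CO3"] + meq["NO3"]
    return {
        "Na/(Na+Ca)": na / (na + ca),         # x-axis of the TDS diagram
        "Mg/Na": mg / na,
        "Ca/Na": ca / na,
        "CAI-I": (cl - (na + k)) / cl,        # chloro-alkaline index I
        "CAI-II": (cl - (na + k)) / anions,   # chloro-alkaline index II
        "NO3/Na": meq["NO3"] / na,
        "Cl/Na": cl / na,
    }
```

Each sample, or each cluster mean, gives one point per diagram. For example, CAI-I is plotted on one axis and CAI-II on the other.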
Compared with the prior art, the invention combines SOM, PMF, correlation analysis and the ion ratio method. SOM visualizes complex high-dimensional groundwater data in a low-dimensional space to identify chemical components and their potential control factors, and PMF quantifies the non-negative source contribution of each component. The method therefore achieves both qualitative and quantitative analysis of groundwater chemical control factors across different areas and seasons of a high-cold river basin. This supports understanding of groundwater quality evolution, the prioritized treatment of pollution sources, and the management of groundwater resources.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic diagram of the water chemistry visualization topological map of the present invention.
FIG. 3 is a schematic diagram of the SOM clustering results according to the present invention.
FIG. 4 is a schematic diagram of the radar chart of the present invention.
FIG. 5 is a schematic diagram of the ion ratio plots of the present invention.
FIG. 6 is a schematic diagram of the contributions of the different factors to the groundwater chemistry element data, determined with the PMF model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the present invention.
Embodiment 1 of the present invention discloses a machine learning-based method for analyzing groundwater chemical control factors in a high-cold river basin, comprising the following steps:
Step (1): acquire water chemistry element data for groundwater samples from the high-cold river basin and preprocess them. Preprocessing includes naming and arranging the sample data by their seasonal and regional attributes, specifically:
each sample is first given a short name that serves as its label. The number denotes the regional location, the English month abbreviation denotes the season, SGW denotes phreatic (shallow) groundwater and DGW denotes confined (deep) groundwater. The data are then arranged with the sample names along the vertical axis (rows) and the water chemistry elements along the horizontal axis (columns). If the preprocessed sample data contain outliers or missing values, the outliers are removed and the missing values are filled by interpolation.
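The labeling and cleaning rules above can be sketched in Python as follows. The text does not state the label separator, the outlier criterion or the interpolation scheme. This sketch therefore assumes a hypothetical `region-month-aquifer` label, the 1.5 × IQR (Tukey fence) outlier rule, and linear interpolation along the sample order within one element column.

```python
import statistics
from typing import List, Optional

Column = List[Optional[float]]  # one water chemistry element across samples; None = missing


def make_label(region: int, month: str, aquifer: str) -> str:
    """Hypothetical label: region number, English month abbreviation, aquifer type."""
    if aquifer not in ("SGW", "DGW"):  # SGW = phreatic groundwater, DGW = confined groundwater
        raise ValueError("aquifer must be 'SGW' or 'DGW'")
    return f"{region}-{month}-{aquifer}"


def remove_outliers(values: Column, k: float = 1.5) -> Column:
    """Blank out values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed Tukey fence rule)."""
    present = [v for v in values if v is not None]
    if len(present) < 4:
        return list(values)  # too few values to estimate quartiles
    q1, _, q3 = statistics.quantiles(present, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v if v is not None and lo <= v <= hi else None for v in values]


def interpolate_missing(values: Column) -> List[float]:
    """Fill None by linear interpolation between the nearest known neighbors;
    leading or trailing gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("column has no valid values")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            t = (i - left) / (right - left)
            out[i] = values[left] + t * (values[right] - values[left])
    return out
```

In this sketch, outlier removal runs before interpolation, so a removed outlier is filled in the same pass as a value that was originally missing.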
Step (2): construct an SOM self-organizing neural network from the preprocessed water chemistry element data to obtain a water chemistry visualization topological map, specifically:
the preprocessed water chemistry element data are input into the SOM self-organizing neural network toolbox in Matlab;
Step A: determine the optimal number of neurons from the number of groundwater samples n.
Step B: initialize the weight space with small random values, and set the map size, the initial winning neuron and the initial learning rate.
Step C: find the best matching unit (BMU), which is the neuron whose weight vector is most similar to the input vector.
Step D: update the weight vectors of the BMU and its neighboring neurons.
Step E: repeat the iterative search until it converges to the optimal self-organizing map.
Training ends when the preset number of iterations or the preset learning rate is reached, with the learning rate tending to 0. The water chemistry visualization topological map is then output; otherwise, return to Step C.
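In the method, Steps A to E are run with the Matlab SOM toolbox. The pure-Python sketch below only illustrates the same training loop, and several of its details are assumptions. The neuron-count formula of Step A is missing from the text, so the commonly cited rule of thumb of about 5√n neurons is used. The grid is rectangular and near-square, the neighborhood is Gaussian, and the learning rate and radius decay linearly. Inputs are z-scored first so that high-concentration ions do not dominate the Euclidean distance.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] -> weight vector


def zscore_columns(data: List[Vector]) -> List[Vector]:
    """Standardize each element (column) to mean 0 and std 1; constant columns use std 1."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in data]


def som_map_size(n_samples: int) -> Tuple[int, int]:
    """Step A (assumed rule of thumb): about 5*sqrt(n) neurons on a near-square grid."""
    m = max(4, round(5 * math.sqrt(n_samples)))
    rows = max(2, round(math.sqrt(m)))
    return rows, max(2, math.ceil(m / rows))


def best_matching_unit(w: Grid, x: Vector) -> Tuple[int, int]:
    """Step C: grid position of the neuron whose weight vector is closest to x."""
    return min(((r, c) for r in range(len(w)) for c in range(len(w[0]))),
               key=lambda rc: math.dist(w[rc[0]][rc[1]], x))


def train_som(data: List[Vector], rows: int, cols: int, iters: int = 2000,
              lr0: float = 0.5, seed: int = 0) -> Grid:
    rng = random.Random(seed)
    dim = len(data[0])
    # Step B: small random initial weights; map size and initial learning rate come from arguments
    w = [[[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(cols)]
         for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly toward 0
        sigma = max(0.5, sigma0 * (1.0 - frac))  # neighborhood radius shrinks
        x = data[rng.randrange(len(data))]
        br, bc = best_matching_unit(w, x)
        # Step D: pull the BMU and its grid neighbors toward x (Gaussian neighborhood)
        for r in range(rows):
            for c in range(cols):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                w[r][c] = [wi + lr * h * (xi - wi) for wi, xi in zip(w[r][c], x)]
    # Step E: training stops once the preset iteration count is reached (lr -> 0)
    return w
```

A typical call is `w = train_som(zscore_columns(samples), *som_map_size(len(samples)))`, where `samples` holds one row of element concentrations per sample.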
As shown in FIG. 2, the water chemistry visualization topological map is obtained by outputting the results of the SOM self-organizing neural network as a unified distance matrix (U-matrix) and component planes.
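The U-matrix and the component planes can be derived from the trained weight grid, as in the simplified sketch below. It assumes a rectangular grid with 4-neighbor adjacency and stores one value per neuron; the `w[row][col]` layout is a hypothetical convention. The Matlab SOM toolbox draws its U-matrix on a finer, interleaved (typically hexagonal) grid, so its plots will differ in detail.

```python
import math
from typing import List

Grid = List[List[List[float]]]  # grid[row][col] -> weight vector


def u_matrix(w: Grid) -> List[List[float]]:
    """Mean distance from each neuron's weight vector to those of its 4-connected neighbors.
    High values mark cluster borders; low values mark cluster interiors."""
    rows, cols = len(w), len(w[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [math.dist(w[r][c], w[r + dr][c + dc])
                     for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u


def component_plane(w: Grid, j: int) -> List[List[float]]:
    """Value of input variable j (one water chemistry element) stored in every neuron."""
    return [[unit[j] for unit in row] for row in w]
```

Each component plane is one map per element. Comparing the planes side by side shows which ions vary together across the map.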
Step (3): on the water chemistry visualization topological map, determine the optimal number of clusters by combining the Davies-Bouldin index with the K-means algorithm. Then cluster the preprocessed water chemistry element data, normalize the clustering results, and plot radar charts and ion ratio diagrams that reflect the spatial and seasonal variation of groundwater chemical concentrations.
The Davies-Bouldin index (DBI) used to determine the optimal number of clusters is:
$$\mathrm{DBI} = \frac{1}{N}\sum_{i=1}^{N}\max_{j \neq i}\left(\frac{\sigma_i + \sigma_j}{d(c_i, c_j)}\right)$$
where N is the number of clusters; $\sigma_i$ and $\sigma_j$ are the average distances from all patterns in clusters i and j to their respective centroids $c_i$ and $c_j$; and $d(c_i, c_j)$ is the distance between $c_i$ and $c_j$.
The preprocessed water chemistry element data are clustered as shown in FIG. 3, specifically:
the interval around the minimum of the Davies-Bouldin index curve is enlarged. The number of clusters with the smallest Davies-Bouldin index is taken as the optimal number, because it gives clear cluster boundaries. Finally, the labels of the preprocessed water chemistry element data are projected onto the neurons to determine which cluster each sample belongs to.
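The selection of the cluster number can be sketched as follows. K-means is run for each candidate k, and the k with the smallest Davies-Bouldin index is kept. In SOM-based workflows the points being clustered are typically the trained neuron weight vectors, and each sample then inherits the cluster of its BMU; this sketch accepts any list of vectors. Deterministic farthest-first seeding replaces random starts so that the result is reproducible. That seeding is sensitive to outliers, and k-means++ with several restarts is a common alternative. All function names are hypothetical, and k must not exceed the number of distinct points.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def farthest_first_init(points: List[Point], k: int) -> List[List[float]]:
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's K-means; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def davies_bouldin(points: List[Point], labels: List[int], centroids: List[List[float]]) -> float:
    """DBI = (1/N) * sum_i max_{j != i} (sigma_i + sigma_j) / d(c_i, c_j)."""
    k = len(centroids)
    if k < 2:
        raise ValueError("DBI needs at least two clusters")
    sigma = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        sigma.append(sum(math.dist(p, centroids[c]) for p in members) / len(members) if members else 0.0)
    worst = [max((sigma[i] + sigma[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def best_k(points: List[Point], k_values: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Return the k with the smallest DBI, plus all scores for plotting the DBI curve."""
    scores = {}
    for k in k_values:
        labels, centroids = kmeans(points, k)
        scores[k] = davies_bouldin(points, labels, centroids)
    return min(scores, key=scores.get), scores
```

The returned `scores` dictionary is the DBI curve. Its minimum is the interval that the text enlarges to confirm the cluster number.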
Finally, the clustering results are normalized to plot a radar chart of the water chemistry element concentrations in each cluster (FIG. 4) and ion ratio diagrams (FIG. 5). These charts are used to analyze the spatial and seasonal variation of water chemistry concentrations in each cluster.
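The text does not specify which normalization is applied. The sketch below assumes min-max scaling of each element's cluster-mean concentration across clusters, so that each cluster's normalized vector gives one polygon on the radar chart. The function names are hypothetical.

```python
from typing import Dict, List


def cluster_means(rows: List[List[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Mean concentration of each element (column) within each cluster."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sorted(sums.items())}


def minmax_normalize(means: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Scale each element to [0, 1] across clusters; a constant element maps to 0."""
    labs = list(means)
    n_el = len(means[labs[0]])
    out = {lab: [0.0] * n_el for lab in labs}
    for j in range(n_el):
        col = [means[lab][j] for lab in labs]
        lo, hi = min(col), max(col)
        for lab in labs:
            out[lab][j] = (means[lab][j] - lo) / (hi - lo) if hi > lo else 0.0
    return out
```

Min-max scaling keeps ions with very different concentration ranges on a common 0 to 1 radial axis. The trade-off is that the chart then shows relative rather than absolute concentrations.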
Step (4): based on the clustering results, apply positive matrix factorization (PMF). Combine the results with correlation analysis and the ion ratio method to analyze the groundwater chemical control factors of the high-cold river basin qualitatively and quantitatively.
The clustering results are combined with PMF as follows:
Step a: apply PMF to the preprocessed water chemistry element data. The sample concentration matrix X (elements $x_{ij}$) is decomposed into a factor contribution matrix G (elements $g_{ik}$), a factor profile (distribution) matrix F (elements $f_{kj}$) and a residual matrix E (elements $e_{ij}$):
$$x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij}$$
where $x_{ij}$ is the concentration of the jth water chemistry element in the ith sample; p is the number of factors (pollution sources); $g_{ik}$ is the contribution of the kth factor to the ith sample; and $f_{kj}$ is the concentration of the jth water chemistry element in the profile of the kth factor.
Step b: calculate the uncertainty $u_{ij}$ of the jth water chemistry element in the ith sample from the element concentration, the method detection limit (MDL) and the measurement uncertainty.
If the concentration is greater than the MDL, $u_{ij}$ is calculated as follows:
where error is the error coefficient.
If the concentration is less than or equal to the MDL, $u_{ij}$ is calculated as follows:
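The two $u_{ij}$ formulas are not reproduced in this text. The sketch below assumes the widely used convention from the US EPA PMF 5.0 user guide. At or below the MDL, u = 5/6 × MDL. Above the MDL, u = √((error × c)² + (0.5 × MDL)²), where error is the error fraction (coefficient) and c is the concentration. The text does not confirm that the method uses exactly these constants, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pmf_uncertainty(conc: float, mdl: float, error_fraction: float) -> float:
    """u_ij for one value under the assumed EPA PMF 5.0 convention."""
    if conc <= mdl:
        # At or below the detection limit: a fixed fraction of the MDL
        return 5.0 / 6.0 * mdl
    # Above the detection limit: proportional error combined with half the MDL
    return math.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)


def uncertainty_matrix(X: List[List[float]], mdl: Sequence[float],
                       error_fraction: Sequence[float]) -> List[List[float]]:
    """Apply pmf_uncertainty to every x_ij; mdl and error_fraction are given per element j."""
    return [[pmf_uncertainty(x, mdl[j], error_fraction[j]) for j, x in enumerate(row)]
            for row in X]
```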
Step c: derive the factor contributions and profiles by minimizing the objective function Q:
$$Q = \sum_{i=1}^{n}\sum_{j=1}^{m}\left(\frac{e_{ij}}{u_{ij}}\right)^2$$
where m is the number of water chemistry elements and n is the number of samples.
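To make Steps a and c concrete, the sketch below evaluates Q and minimizes it under the non-negativity constraints that define PMF ($g_{ik} \ge 0$, $f_{kj} \ge 0$). It substitutes weighted multiplicative updates (Lee-Seung style, with weights $1/u_{ij}^2$) for the ME-2 solver used by EPA PMF. It therefore provides no rotational controls, robust mode or error estimation. Treat it as an illustration of the objective, not as the solver used in the method; all names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain list-of-lists matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Elementwise product."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _mult_update(M: Matrix, num: Matrix, den: Matrix, eps: float) -> Matrix:
    """M <- M * num / den, elementwise; eps guards against division by zero."""
    return [[v * a / (b + eps) for v, a, b in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]


def q_value(X: Matrix, G: Matrix, F: Matrix, U: Matrix) -> float:
    """Q = sum_i sum_j (e_ij / u_ij)^2 with residuals E = X - G F."""
    GF = matmul(G, F)
    return sum(((x - gf) / u) ** 2
               for rx, rgf, ru in zip(X, GF, U)
               for x, gf, u in zip(rx, rgf, ru))


def weighted_nmf(X: Matrix, U: Matrix, p: int, iters: int = 500,
                 seed: int = 0, eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Minimize Q subject to G >= 0 and F >= 0 with weights W = 1/u^2."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[1.0 / (u * u) for u in row] for row in U]
    WX = hadamard(W, X)
    # A positive random start keeps every entry non-negative under multiplicative updates
    G = [[rng.uniform(0.1, 1.0) for _ in range(p)] for _ in range(n)]
    F = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(p)]
    for _ in range(iters):
        Ft = transpose(F)  # G <- G * ((W*X) F^T) / ((W*(G F)) F^T)
        G = _mult_update(G, matmul(WX, Ft), matmul(hadamard(W, matmul(G, F)), Ft), eps)
        Gt = transpose(G)  # F <- F * (G^T (W*X)) / (G^T (W*(G F)))
        F = _mult_update(F, matmul(Gt, WX), matmul(Gt, hadamard(W, matmul(G, F))), eps)
    return G, F
```

Dividing each residual by $u_{ij}$ means that uncertain or below-MDL values pull less on the fit. This is why the uncertainty of Step b enters the objective of Step c.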
As shown in FIG. 6, FIG. 6a is the PMF factor fingerprint (profile) diagram, FIG. 6b is the PMF factor contribution diagram, and FIG. 6c is the diagram of correlations between PMF factor contributions and groundwater chemical components. The groundwater chemical control factors of the high-cold river basin are analyzed qualitatively and quantitatively in combination with correlation analysis and the ion ratio method, specifically:
PMF yields quantitative information on the contribution of each factor and on how each factor is distributed across the water chemistry elements. The water chemistry contribution of each factor is then interpreted together with the radar charts, the correlation analysis and the ion ratios derived from the SOM classification results. These ratios are TDS vs. Na⁺/(Na⁺ + Ca²⁺), Mg²⁺/Na⁺ vs. Ca²⁺/Na⁺, CAI-I = (Cl⁻ - (Na⁺ + K⁺))/Cl⁻ vs. CAI-II = (Cl⁻ - (Na⁺ + K⁺))/(SO₄²⁻ + HCO₃⁻ + CO₃²⁻ + NO₃⁻), and NO₃⁻/Na⁺ vs. Cl⁻/Na⁺. In this way each factor can be matched to a groundwater chemical control factor, and its contribution rate can be reported.
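The correlation step (FIG. 6c) can be illustrated by correlating each factor's contribution series (a column of G) with each element's concentration series (a column of X). The text does not say which correlation coefficient is used. Pearson's r is assumed here, and the function names are hypothetical. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; undefined (ZeroDivisionError) for a constant series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def factor_element_correlations(G: List[List[float]], X: List[List[float]],
                                element_names: Sequence[str]) -> Dict[Tuple[int, str], float]:
    """r between factor k's contributions (column k of G) and element j's concentrations (column j of X)."""
    p = len(G[0])
    return {
        (k, name): pearson([row[k] for row in G], [row[j] for row in X])
        for k in range(p)
        for j, name in enumerate(element_names)
    }
```

A factor whose contributions correlate strongly with, for example, NO₃⁻ and Cl⁻ would then be checked against the NO₃⁻/Na⁺ vs. Cl⁻/Na⁺ diagram before a control factor is assigned to it.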
This embodiment of the invention thus provides a machine learning-based method for analyzing groundwater chemical control factors in a high-cold river basin. By combining SOM, PMF, correlation analysis and the ion ratio method, the method describes complex groundwater data, identifies the hydrogeochemical processes and solute sources of each cluster, and quantifies the control factors across different areas and seasons of the basin. This achieves qualitative and quantitative analysis of the basin's groundwater chemical control factors.
Each embodiment in this specification is described progressively, focusing on its differences from the other embodiments; for identical or similar parts, the embodiments may be cross-referenced. Since the device disclosed in the embodiments corresponds to the disclosed method, it is described only briefly; for related details, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. The method for analyzing the underground water chemical control factors of the high-cold-flow area based on machine learning is characterized by comprising the following steps of:
step (1): acquiring water chemical element data of a groundwater sample in a high-cold-flow area, and preprocessing;
step (2): based on the preprocessed water chemistry element data, constructing an SOM self-organizing neural network to obtain a water chemistry visual topological graph;
step (3): determining an optimal clustering number by combining a Davies-Bouldin index and a K-means algorithm on the water chemistry visualization topological graph, clustering the preprocessed water chemistry element data, normalizing the clustering result, and drawing a radar graph and an ion ratio graph;
step (4): according to the clustering result, the method is fused with an orthogonal matrix factorization (PMF) method, and the chemical control factors of the groundwater in the high-cold-flow area are qualitatively and quantitatively analyzed by combining correlation analysis and an ion ratio method;
in the step (4), according to the clustering result, the clustering result is fused with an orthogonal matrix factorization (PMF), specifically:
step a: applying an orthogonal matrix factorization (PMF) method to the preprocessed water chemistry element data to obtain water chemistry X ij The sample content matrix of the sample is decomposed into a factor contribution matrix g ik Factor distribution matrix f kj And residual matrix e ij The following are provided:
wherein X is ij The sample concentration matrix X is the concentration of the jth water chemical element in the ith sample; p is the number of pollution sources; g ik Is the contribution of the kth water chemistry element to the ith sample; f (f) kj Is the concentration of the jth species in the kth water chemistry element;
step b: calculating the uncertainty $u_{ij}$, i.e., the uncertainty of the $j$th hydrochemical element in the $i$th sample, from the element concentration, the method detection limit (MDL) and the measurement uncertainty:
if the element concentration is greater than the MDL, $u_{ij}$ is calculated as follows:
where error is the error coefficient;
if the element concentration is less than or equal to the MDL, $u_{ij}$ is calculated as follows:
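The two $u_{ij}$ formulas of step b are not reproduced in this text. As an assumption, the sketch below uses the convention documented for the US EPA PMF 5.0 model, which is widely used for this step: $u = \frac{5}{6}\,\mathrm{MDL}$ at or below the detection limit, and $u = \sqrt{(\mathrm{error} \times c)^2 + (0.5 \times \mathrm{MDL})^2}$ above it, where $c$ is the concentration. The patent's own coefficients may differ, and the function names are hypothetical.

```python
import math


def pmf_uncertainty(conc, mdl, error_fraction):
    """Uncertainty of one measurement, assuming the EPA PMF 5.0 convention.

    conc: measured concentration; mdl: method detection limit;
    error_fraction: relative measurement error (the 'error' coefficient)."""
    if conc <= mdl:
        # At or below the detection limit: a fixed fraction of the MDL.
        return 5.0 / 6.0 * mdl
    # Above the detection limit: relative error and half the MDL added in quadrature.
    return math.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)


def uncertainty_matrix(X, mdl, error_fraction):
    """u_ij for every sample i and element j; mdl[j] and error_fraction[j] are per element."""
    return [[pmf_uncertainty(x, mdl[j], error_fraction[j]) for j, x in enumerate(row)]
            for row in X]
```

For example, `uncertainty_matrix([[0.05, 30.0]], [0.06, 0.4], [0.1, 0.1])` gives about 0.05 for the below-MDL value and about 3.0067 for the second value.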
step c: obtaining the factor contributions and factor profiles by minimizing the objective function $Q$:
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{u_{ij}} \right)^2$$
where $m$ is the number of hydrochemical elements and $n$ is the number of samples.
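The claim does not say which solver minimizes $Q$ (the EPA PMF program uses the Multilinear Engine, ME-2). As a stand-in under that caveat, the sketch below minimizes the same weighted sum of squares with weighted non-negative multiplicative updates: with weights $w_{ij} = 1/u_{ij}^2$, $Q = \sum_{i,j} w_{ij} e_{ij}^2$. These updates keep $G$ and $F$ non-negative and, in exact arithmetic, do not increase $Q$ from one iteration to the next, but they are not the EPA algorithm. All names are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def q_value(X, U, G, F):
    """Q = sum_i sum_j (e_ij / u_ij)^2 with e = X - G F."""
    Xh = matmul(G, F)
    return sum(((x - xh) / u) ** 2
               for xr, xhr, ur in zip(X, Xh, U)
               for x, xh, u in zip(xr, xhr, ur))


def fit_factors(X, U, p, iters=300, seed=0, eps=1e-12):
    """Minimise Q with weighted multiplicative updates (a stand-in for a PMF solver)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[1.0 / u ** 2 for u in ur] for ur in U]                    # w_ij = 1 / u_ij^2
    WX = [[w * x for w, x in zip(wr, xr)] for wr, xr in zip(W, X)]
    G = [[rng.random() + 0.1 for _ in range(p)] for _ in range(n)]  # positive start
    F = [[rng.random() + 0.1 for _ in range(m)] for _ in range(p)]
    for _ in range(iters):
        # G <- G * (WX F^T) / ((W * GF) F^T)
        WGF = [[w * v for w, v in zip(wr, vr)] for wr, vr in zip(W, matmul(G, F))]
        Ft = transpose(F)
        num, den = matmul(WX, Ft), matmul(WGF, Ft)
        G = [[g * a / (b + eps) for g, a, b in zip(gr, ar, br)]
             for gr, ar, br in zip(G, num, den)]
        # F <- F * (G^T WX) / (G^T (W * GF))
        WGF = [[w * v for w, v in zip(wr, vr)] for wr, vr in zip(W, matmul(G, F))]
        Gt = transpose(G)
        num, den = matmul(Gt, WX), matmul(Gt, WGF)
        F = [[f * a / (b + eps) for f, a, b in zip(fr, ar, br)]
             for fr, ar, br in zip(F, num, den)]
    return G, F
```

Because samples with large $u_{ij}$ get small weights, uncertain or below-MDL measurements pull less on the fitted factors, which is the point of dividing the residuals by $u_{ij}$ in $Q$.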
2. The machine learning-based method for analyzing groundwater chemistry control factors in a high-cold basin according to claim 1, wherein in step (1), preprocessing the hydrochemical element data comprises: naming and arranging the sample data according to their season and region attributes.
3. The machine learning-based method for analyzing groundwater chemistry control factors in a high-cold basin according to claim 1, wherein step (1) further comprises: when the preprocessed sample data contain outliers or missing values, removing the outliers and filling the missing values by interpolation.
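Claim 3 names neither the outlier criterion nor the interpolation scheme. The sketch below assumes Tukey's fences (values outside 1.5 times the interquartile range are removed) and linear interpolation along a meaningful sample order, such as sampling date at one well; it also assumes removed outliers are treated as gaps and filled. The function name `clean_series` is hypothetical.

```python
import statistics


def clean_series(values, k=1.5):
    """Flag outliers with Tukey fences, then fill gaps by linear interpolation.

    values: floats or None (missing), in a meaningful order such as sampling date.
    Returns a new list; leading/trailing gaps take the nearest valid value."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    # Outliers are removed, i.e. treated the same way as missing values.
    kept = [v if v is not None and lo <= v <= hi else None for v in values]
    known = [i for i, v in enumerate(kept) if v is not None]
    out = list(kept)
    for i, v in enumerate(kept):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None and right is None:
            continue  # nothing to interpolate from
        if left is None:
            out[i] = kept[right]
        elif right is None:
            out[i] = kept[left]
        else:
            t = (i - left) / (right - left)
            out[i] = kept[left] + t * (kept[right] - kept[left])
    return out
```

For example, `clean_series([10, 11, None, 13, 12, 500, 14, 11])` drops 500 as an outlier and returns `[10, 11, 12.0, 13, 12, 13.0, 14, 11]`.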
4. The machine learning-based method for analyzing groundwater chemistry control factors in a high-cold basin according to claim 1, wherein in step (2), the SOM neural network is constructed to obtain the hydrochemical visualization topology map as follows:
inputting the preprocessed hydrochemical element data into the SOM neural network toolbox in Matlab;
step A: determining the optimal number of neurons, where n is the number of groundwater samples;
step B: initializing the weight space with small random values, and setting the map size, the initial winning neuron and the initial learning rate;
step C: finding the best matching unit (BMU), i.e., the neuron whose weight vector is most similar to the input vector;
step D: updating the weight vectors of the BMU and its neighboring neurons;
step E: iterating the search process until it converges to the optimal self-organizing map;
when the preset number of iterations is reached or the learning rate reaches 0, ending the process and outputting the hydrochemical visualization topology map; otherwise, returning to step C.
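The patent relies on the SOM toolbox in Matlab and does not reproduce its neuron-count formula. The plain-Python sketch below illustrates steps A to E under stated assumptions: the map size follows the widely quoted rule of thumb of about $5\sqrt{n}$ units, weights start as small random values, the learning rate decays linearly towards 0, and a Gaussian neighbourhood shrinks over a fixed iteration budget. It is not the toolbox's implementation, and all names are hypothetical.

```python
import math
import random


def map_units(n_samples):
    """Rule-of-thumb number of map units, about 5 * sqrt(n) (assumed heuristic)."""
    return round(5 * math.sqrt(n_samples))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(weights, x):
    """Step C: unit whose weight vector is closest to input x."""
    return min(weights, key=lambda unit: sq_dist(weights[unit], x))


def train_som(data, rows, cols, iters=2000, lr0=0.5, seed=0):
    """Sequential SOM training on a rows x cols rectangular grid.

    Returns {(r, c): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step B: small random initial weights.
    weights = {(r, c): [rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for t in range(iters):                        # Step E: fixed iteration budget
        frac = t / iters
        lr = lr0 * (1.0 - frac)                   # decays linearly towards 0
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
        x = data[rng.randrange(len(data))]
        br, bc = bmu(weights, x)
        # Step D: pull the BMU and its grid neighbours towards x.
        for (r, c), w in weights.items():
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            weights[(r, c)] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return weights
```

With 100 samples the heuristic suggests `map_units(100) == 50` units, which could be laid out as, for instance, a 10 x 5 grid; the grid shape is a further choice the claim leaves open.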
5. The machine learning-based method for analyzing groundwater chemistry control factors in a high-cold basin according to claim 1, wherein in step (2), the hydrochemical visualization topology map obtained from the SOM neural network is output in the form of a unified distance matrix (U-matrix) and component planes.
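A simplified per-unit version of both outputs can be sketched as follows, assuming a rectangular grid stored as `{(row, col): weight_vector}`: each U-matrix cell is the mean Euclidean distance from a unit's weights to its four grid neighbours (high values mark cluster borders), and a component plane is one variable's weight laid out on the grid. This is an illustrative variant with hypothetical names; library U-matrices may use a different neighbourhood or resolution.

```python
import math


def u_matrix(weights, rows, cols):
    """Mean distance from each unit to its 4-connected grid neighbours.

    weights: {(row, col): weight_vector}; assumes rows * cols > 1."""
    um = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            dists = [math.dist(weights[(r, c)], weights[(rr, cc)])
                     for rr, cc in neighbours if 0 <= rr < rows and 0 <= cc < cols]
            um[r][c] = sum(dists) / len(dists)
    return um


def component_plane(weights, rows, cols, j):
    """Weight of variable j at every unit, laid out on the map grid."""
    return [[weights[(r, c)][j] for c in range(cols)] for r in range(rows)]
```

For a 1 x 3 map with weights `[0, 0]`, `[3, 4]`, `[3, 4]`, the U-matrix row is `[5.0, 2.5, 0.0]`: the large value sits on the unit that differs from its neighbour.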
6. The machine learning-based method for analyzing groundwater chemistry control factors in a high-cold basin according to claim 1, wherein in step (3), the Davies-Bouldin index (DBI) used to determine the optimal number of clusters is:
$$\mathrm{DBI} = \frac{1}{N} \sum_{i=1}^{N} \max_{j \neq i} \left( \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \right)$$
where $N$ is the number of clusters; $\sigma_i$ and $\sigma_j$ are the average distances from all patterns in clusters $i$ and $j$ to their respective centroids $c_i$ and $c_j$; and $d(c_i, c_j)$ is the distance between $c_i$ and $c_j$.
7. The machine learning-based method for analyzing groundwater chemistry control factors in a high-cold basin according to claim 1, wherein in step (3), the preprocessed hydrochemical element data are clustered as follows:
the Davies-Bouldin index is examined as the number of clusters varies, and the interval around its lowest point is enlarged; the number of clusters giving the minimum Davies-Bouldin index is taken as optimal, which yields clear cluster boundaries; finally, the labels of the preprocessed hydrochemical samples are projected onto the neurons to determine the cluster to which each sample belongs.
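Claims 6 and 7 can be sketched together in plain Python: K-means is run on the SOM prototype (weight) vectors for each candidate $k$, the Davies-Bouldin index of claim 6 is computed for each result, the $k$ with the lowest index is kept, and each sample inherits the cluster of its best-matching prototype. Deterministic farthest-first initialisation, Euclidean distance and the candidate range 2 to `k_max` are assumptions, as are all names.

```python
import math


def dbi(points, labels):
    """Davies-Bouldin index: mean over clusters of max_j (sigma_i + sigma_j) / d(c_i, c_j)."""
    ks = sorted(set(labels))
    cent, sig = {}, {}
    for k in ks:
        mem = [p for p, lab in zip(points, labels) if lab == k]
        cent[k] = [sum(col) / len(mem) for col in zip(*mem)]
        sig[k] = sum(math.dist(p, cent[k]) for p in mem) / len(mem)
    return sum(max((sig[i] + sig[j]) / math.dist(cent[i], cent[j]) for j in ks if j != i)
               for i in ks) / len(ks)


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    cents = [list(points[0])]
    while len(cents) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in cents))
        cents.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            mem = [p for p, lab in zip(points, labels) if lab == c]
            if mem:
                cents[c] = [sum(col) / len(mem) for col in zip(*mem)]
    return labels


def best_k(prototypes, k_max):
    """Return (k, labels) with the lowest DBI over k = 2..k_max."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = kmeans(prototypes, k)
        if len(set(labels)) > 1:
            scores[k] = (dbi(prototypes, labels), labels)
    k_opt = min(scores, key=lambda kk: scores[kk][0])
    return k_opt, scores[k_opt][1]


def project(samples, prototypes, unit_labels):
    """Each sample takes the cluster label of its best-matching prototype."""
    return [unit_labels[min(range(len(prototypes)),
                            key=lambda u: math.dist(s, prototypes[u]))]
            for s in samples]
```

On nine prototypes forming three tight, well-separated groups, `best_k(prototypes, 5)` returns `k == 3`: splitting a tight group (k = 4 or 5) or merging two groups (k = 2) both raise the index.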
8. The machine learning-based method for analyzing groundwater chemistry control factors in a high-cold basin according to claim 1, wherein in step (3), normalizing the clustering results allows radar charts and ion ratio diagrams of the hydrochemical element concentrations in each cluster to be drawn.
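The claim does not name the normalization. Assuming min-max scaling of each element over all samples, the values for one radar chart per cluster can be sketched as the mean normalized concentration of each element within the cluster; `cluster_profiles` is a hypothetical name.

```python
def cluster_profiles(samples, labels):
    """Min-max normalise each variable over all samples, then average per cluster.

    Returns {cluster: [mean normalised value per variable]}, one radar chart each."""
    cols = list(zip(*samples))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # A constant variable has no spread; map it to 0 rather than divide by zero.
    norm = [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in samples]
    prof = {}
    for k in sorted(set(labels)):
        mem = [r for r, lab in zip(norm, labels) if lab == k]
        prof[k] = [sum(col) / len(mem) for col in zip(*mem)]
    return prof
```

For example, `cluster_profiles([[1, 10], [3, 30], [5, 50]], [0, 0, 1])` returns `{0: [0.25, 0.25], 1: [1.0, 1.0]}`, so every spoke of every radar chart shares the same 0 to 1 scale.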
9. The machine learning-based method for analyzing groundwater chemistry control factors in a high-cold basin according to claim 1, wherein in step (4), the groundwater chemistry control factors of the high-cold basin are qualitatively and quantitatively analyzed in combination with correlation analysis and the ion ratio method, specifically:
PMF yields quantitative information on the contribution of each factor and on how each factor is distributed across the hydrochemical elements; the hydrochemical contribution of each factor is then interpreted jointly with the radar charts, the correlation analysis and the ion ratio diagrams based on the SOM clustering results (TDS versus Na⁺/(Na⁺ + Ca²⁺); Mg²⁺/Na⁺ versus Ca²⁺/Na⁺; CAI-I = (Cl⁻ - (Na⁺ + K⁺))/Cl⁻ versus CAI-II = (Cl⁻ - (Na⁺ + K⁺))/(SO₄²⁻ + HCO₃⁻ + CO₃²⁻ + NO₃⁻); NO₃⁻/Na⁺ versus Cl⁻/Na⁺), so that each factor can be matched to a groundwater chemistry control factor and its contribution rate determined.
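The ratios listed in claim 9 can be computed per sample as sketched below. The claim does not state units; this sketch assumes the ratios are formed in meq/L (a common convention for these diagrams) and includes a mg/L to meq/L conversion using standard molar masses. The dictionary keys and function names are hypothetical.

```python
# Molar mass (g/mol) and absolute charge of each ion, used for mg/L -> meq/L.
IONS = {
    "Na": (22.990, 1), "K": (39.098, 1), "Ca": (40.078, 2), "Mg": (24.305, 2),
    "Cl": (35.45, 1), "SO4": (96.06, 2), "HCO3": (61.02, 1),
    "CO3": (60.01, 2), "NO3": (62.00, 1),
}


def to_meq(mg_per_l):
    """Convert {ion: mg/L} to {ion: meq/L}: meq/L = mg/L * |charge| / molar mass."""
    return {ion: c * IONS[ion][1] / IONS[ion][0] for ion, c in mg_per_l.items()}


def ion_ratios(meq):
    """Ratios listed in claim 9, computed from concentrations in meq/L."""
    na, k, ca, mg = meq["Na"], meq["K"], meq["Ca"], meq["Mg"]
    cl, no3 = meq["Cl"], meq["NO3"]
    anions = meq["SO4"] + meq["HCO3"] + meq.get("CO3", 0.0) + no3
    return {
        "Na/(Na+Ca)": na / (na + ca),        # plotted against TDS
        "Mg/Na": mg / na,                    # plotted against Ca/Na
        "Ca/Na": ca / na,
        "CAI-I": (cl - (na + k)) / cl,       # chloro-alkaline indices
        "CAI-II": (cl - (na + k)) / anions,
        "NO3/Na": no3 / na,                  # plotted against Cl/Na
        "Cl/Na": cl / na,
    }
```

Negative CAI-I and CAI-II values, as in a sample with more Na⁺ + K⁺ than Cl⁻, are conventionally read as a sign of cation exchange releasing Na⁺ into the groundwater.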
CN202210939068.0A 2022-08-05 2022-08-05 Machine learning-based high-cold-flow-domain groundwater chemical control factor analysis method Active CN115293577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210939068.0A CN115293577B (en) 2022-08-05 2022-08-05 Machine learning-based high-cold-flow-domain groundwater chemical control factor analysis method

Publications (2)

Publication Number Publication Date
CN115293577A (en) 2022-11-04
CN115293577B (en) 2023-07-21

Family

ID=83828207

Country Status (1)

Country Link
CN (1) CN115293577B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030900B (en) * 2023-03-24 2023-06-16 Anhui Ruibang Shuke Technology Service Co., Ltd. Method, device, equipment and storage medium for controlling component content of chemical product
CN117524347B (en) * 2023-11-20 2024-04-16 Central South University First-principles prediction method for acid radical anion hydration structure accelerated by machine learning

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103942841A (en) * 2013-08-15 2014-07-23 Institute of Mineral Resources, Chinese Academy of Geological Sciences Mineral resource multivariate information processing method and system based on GIS
CN106355011A (en) * 2016-08-30 2017-01-25 China Nonferrous Metals Resource Geological Survey Geochemical data element sequence structure analysis method and device
CN113706354A (en) * 2021-09-02 2021-11-26 Zhejiang Suosi Technology Co., Ltd. Marine integrated service management system based on big data technology
CN113780465A (en) * 2021-09-27 2021-12-10 China Institute of Water Resources and Hydropower Research Underground water chemistry seasonal change analysis method based on self-organizing neural network
CN113887635A (en) * 2021-10-08 2022-01-04 Hohai University Basin similarity classification method and classification device

Non-Patent Citations (1)

Title
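Hydrochemical characteristics and genesis analysis of groundwater in the Pansan mining area; Zhang Mei; Liu Qimeng; Liu Kaixuan; Coal Mining Technology (煤矿开采), (02); 272-274 *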

Also Published As

Publication number Publication date
CN115293577A (en) 2022-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant