CN118039031A - Method for judging regional ore-forming potential based on machine learning of apatite components
- Publication number: CN118039031A
- Application number: CN202311652244.3A
- Authority: CN (China)
- Legal status: Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/90—Programming languages; Computing architectures; Database systems; Data warehousing
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Abstract
The invention provides a method for discriminating regional ore-forming potential based on machine learning of apatite composition. The method compiles three global datasets of apatite major- and/or trace-element compositions from ore-forming (mineralized) and non-ore-forming (unmineralized) rock samples and trains a series of XGBoost models to determine the ore-forming potential of a deposit. Compared with traditional binary discrimination diagrams, the new classification method is markedly more accurate and efficient at determining whether apatite comes from an ore-rich (fertile) or ore-poor (barren) intrusion. In addition, feature-importance analysis shows that the V/Y and Cl/F ratios and S content are critical to metal enrichment and mineralization.
Description
Technical Field
The invention relates to the technical field of geological investigation and mineral exploration, and in particular to a method for discriminating regional ore-forming potential based on machine learning of apatite composition.
Background
Apatite (Ca5(PO4)3(F,Cl,OH)) is a widespread accessory mineral in most igneous and metamorphic rocks and in clastic sedimentary deposits, and it is highly resistant to weathering. Because its composition is sensitive to the environment in which it crystallizes, apatite is considered an ideal indicator mineral. The trace-element and volatile compositions and the isotopic characteristics of apatite can characterize different crystallization environments, including magmatic systems, low-grade metamorphic systems, and sedimentary environments. The trace-element chemistry of apatite is therefore widely used to infer source-rock lithology, including tracing the provenance of clastic rocks, and to constrain petrogenetic processes, in particular the origin and evolution of magmas. Apatite major- and trace-element chemistry is also used in mineral exploration, through geochemical indices such as Sr/Y, Mn, Eu/Eu*, Th/U, La/Sm and (Ce/Yb)N, and through binary classification schemes such as Sr vs. F (Mn, Y, (La/Yb)N, Eu/Eu*), F/Cl vs. F, and Cl vs. Eu/Eu*, Th/U, La/Sm and (Ce/Yb)N. Among these, binary schemes such as F and Cl vs. Eu/Eu*, V/Y vs. REE+Y, Cl vs. SO3, and 87Sr/86Sr vs. Cl/F are commonly used to assess the ore-forming potential of magmas.
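Several of the indices above (Eu/Eu*, Ce/Ce*, (Ce/Yb)N) are ratios of chondrite-normalized concentrations. The sketch below shows the common geometric-mean definitions. The CI-chondrite values are those commonly tabulated from McDonough and Sun (1995) and are an assumption: the patent does not state which normalizing values it used, and other anomaly definitions (for example arithmetic-mean interpolation) also appear in the literature.

```python
import math

# CI-chondrite concentrations in ppm, as commonly tabulated from
# McDonough and Sun (1995). ASSUMPTION: the patent does not name its
# normalizing values; swap in your preferred reference set.
CHONDRITE_PPM = {
    "La": 0.237, "Ce": 0.613, "Pr": 0.0928, "Sm": 0.148,
    "Eu": 0.0563, "Gd": 0.199, "Yb": 0.161,
}


def normalize(sample_ppm):
    """Return chondrite-normalized values (X_N = X_sample / X_chondrite)."""
    return {el: sample_ppm[el] / CHONDRITE_PPM[el]
            for el in CHONDRITE_PPM if el in sample_ppm}


def eu_anomaly(sample_ppm):
    """Eu/Eu* = Eu_N / sqrt(Sm_N * Gd_N) (geometric-mean interpolation)."""
    n = normalize(sample_ppm)
    return n["Eu"] / math.sqrt(n["Sm"] * n["Gd"])


def ce_anomaly(sample_ppm):
    """Ce/Ce* = Ce_N / sqrt(La_N * Pr_N)."""
    n = normalize(sample_ppm)
    return n["Ce"] / math.sqrt(n["La"] * n["Pr"])


def ce_yb_n(sample_ppm):
    """(Ce/Yb)_N: light-to-heavy REE slope on a chondrite-normalized plot."""
    n = normalize(sample_ppm)
    return n["Ce"] / n["Yb"]


# A flat pattern at 100x chondrite has no anomalies (all ratios about 1);
# halving Eu gives Eu/Eu* of about 0.5, i.e. a negative Eu anomaly.
flat = {el: 100 * v for el, v in CHONDRITE_PPM.items()}
eu_depleted = dict(flat, Eu=flat["Eu"] / 2)
print(round(eu_anomaly(flat), 3), round(eu_anomaly(eu_depleted), 3))
```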
The field of machine learning (ML) uses computer programs to identify patterns in a dataset, which are then applied to make predictions. Machine learning provides a powerful toolkit for decoding latent information in high-dimensional data. In recent years, the application of ML in solid-earth science has attracted considerable interest. ML has been widely used for seismic phase detection and seismic event classification, geophysical data processing and image interpretation, geophysical inversion, and the integration of multi-physics and multidisciplinary information. Given the complexity and diversity of geochemical data, ML-based classification has become a promising alternative to traditional methods, particularly for large-scale geological processes such as predicting global mantle deterioration, revealing slab-derived components in the sources of basalts, estimating water concentrations in mantle pyroxene, determining the formation environments of quartz, and classifying the source rocks of detrital zircon. In mineral deposit exploration, two studies have applied ML to zircon compositional data to characterize magma fertility, with the aim of assessing porphyry copper potential. Tan et al. (2023) applied partial least squares discriminant analysis (PLS-DA) to an apatite trace-element dataset (4,298 analyses) to distinguish apatite from different types of deposits and rocks. Their approach cannot directly distinguish apatite from ore-forming magmas from hydrothermal apatite, but it shows great potential for classifying barren and fertile apatite from granite-related deposits and underscores the role of V, Eu and Sr in the classification.
Here, the present invention compiles three global datasets of apatite major- and/or trace-element chemistry from ore-forming and non-ore-forming rock samples and trains a series of XGBoost models to determine the ore-forming potential of a deposit. Compared with traditional binary diagrams, the new classification method is markedly more accurate and efficient at determining whether apatite comes from an ore-rich (fertile) or ore-poor (barren) intrusion. In addition, feature-importance analysis shows that the V/Y and Cl/F ratios and S content are critical to metal enrichment and mineralization.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method for discriminating regional ore-forming potential based on machine learning of apatite composition.
The invention is realized by the following technical scheme: the method for discriminating regional ore-forming potential based on machine learning of apatite composition comprises the following steps:
S1, database construction: the original dataset used for modeling contains 13382 apatite compositional analyses. Deposit types are classified according to their value, morphology, alteration, ore mineralogy, and relationship to the host rock. Analyses of apatite that formed together with a deposit are labeled "mineralized", and analyses of apatite from unmineralized rocks are labeled "unmineralized". According to these criteria, 9104 and 4278 analyses are labeled "mineralized" and "unmineralized", respectively;
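The labeling in S1 turns the task into binary classification. A minimal sketch, assuming "mineralized" is encoded as the positive class 1 (the patent does not state the encoding), shows the resulting class balance. With roughly 68% positives, a trivial model that always predicts "mineralized" would already reach about 0.68 accuracy on the full raw dataset, which is a useful baseline when reading the accuracy figures reported in this document.

```python
from collections import Counter

# Counts reported in step S1.
N_MINERALIZED = 9104
N_UNMINERALIZED = 4278

# ASSUMPTION: "mineralized" is the positive class (1); the patent does not say.
LABEL_CODES = {"mineralized": 1, "unmineralized": 0}

labels = ["mineralized"] * N_MINERALIZED + ["unmineralized"] * N_UNMINERALIZED
y = [LABEL_CODES[label] for label in labels]

counts = Counter(y)
total = len(y)
positive_share = counts[1] / total
# Accuracy of a constant "always mineralized" predictor (majority-class baseline).
majority_baseline = max(counts.values()) / total

print(total, counts[1], counts[0], round(majority_baseline, 3))
```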
S2, dividing the sub-databases: the original dataset is divided into three subsets: analyses of samples with CaO, P2O5, SO3, Cl and F are selected as the "major" dataset; analyses of samples with trace elements are selected as the "trace" dataset; and analyses containing both major and trace elements form the "major and trace" dataset;
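One way to implement the S2 split is sketched below in plain Python, over records stored as dictionaries with missing values as None. The patent does not define precisely what "containing trace elements" means; here a record qualifies if any element in a hypothetical, illustrative trace-element list is reported. The example values are made up.

```python
MAJOR_KEYS = ("CaO", "P2O5", "SO3", "Cl", "F")
# ASSUMPTION: illustrative trace-element columns; a record counts as "trace"
# if at least one of them is reported.
TRACE_KEYS = ("V", "Mn", "Rb", "Sr", "Y", "Zr", "La", "Ce", "Nd", "Eu", "Yb")


def has_all(record, keys):
    return all(record.get(k) is not None for k in keys)


def has_any(record, keys):
    return any(record.get(k) is not None for k in keys)


def split_subsets(records):
    """Split records into the "major", "trace" and "major and trace" subsets."""
    major = [r for r in records if has_all(r, MAJOR_KEYS)]
    trace = [r for r in records if has_any(r, TRACE_KEYS)]
    both = [r for r in major if has_any(r, TRACE_KEYS)]
    return {"major": major, "trace": trace, "major_and_trace": both}


# Made-up records for illustration only (oxides in wt%, trace elements in ppm).
records = [
    {"CaO": 54.1, "P2O5": 41.2, "SO3": 0.30, "Cl": 0.50, "F": 3.1},
    {"Sr": 420.0, "Y": 310.0, "V": 12.0},
    {"CaO": 53.8, "P2O5": 40.9, "SO3": 0.10, "Cl": 0.20, "F": 2.8, "Sr": 390.0},
]
subsets = split_subsets(records)
print({name: len(rows) for name, rows in subsets.items()})
```

The subsets overlap by construction, which is consistent with the counts given in S31: 5618 + 9979 = 15597 analyses, more than the 13382 in the raw dataset.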
S3, preprocessing the data obtained in S2:
S31, handling missing values: any element (column) with more than 60% missing values is eliminated. After filtering, the "major" dataset contains 5618 analyses and the "trace" dataset contains 9979 analyses;
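The S31 rule can be sketched as a per-column missing-fraction filter. Column names and values below are hypothetical; a column with exactly 60% missing values is kept, because the rule removes only columns with more than 60% missing.

```python
def drop_sparse_columns(records, max_missing=0.6):
    """Drop columns whose fraction of missing values (None or absent) exceeds max_missing.

    Mirrors the S31 rule: a column with more than 60% missing values is removed;
    a column with exactly 60% missing is kept.
    """
    columns = sorted({key for record in records for key in record})
    n = len(records)
    kept = [
        col for col in columns
        if sum(record.get(col) is None for record in records) / n <= max_missing
    ]
    return [{col: record.get(col) for col in kept} for record in records], kept


# Hypothetical columns: A is 60% missing (kept), B is 80% missing (dropped).
records = [
    {"A": 1.0, "B": 1.0, "C": 1.0},
    {"A": None, "C": 2.0},
    {"A": None, "C": 3.0},
    {"A": None, "C": 4.0},
    {"A": 5.0, "C": 5.0},
]
filtered, kept = drop_sparse_columns(records)
print(kept)
```

With pandas, the same rule can be written as `df.loc[:, df.isna().mean() <= 0.6]`.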
S32, calculating geochemical indices, including LREE, HREE, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N and La/Sm, and adding them to the "trace" dataset as new features; the "major and trace" dataset contains 2448 analyses and 43 features;
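The ratio and sum indices in S32 are simple element arithmetic. The sketch below rests on two labeled assumptions: LREE is taken as La to Eu and HREE as Gd to Lu (definitions vary in the literature and the patent does not specify them), and (La/Yb)N uses CI-chondrite values commonly tabulated from McDonough and Sun (1995). The input analysis is made up.

```python
# ASSUMPTION: LREE = La..Eu and HREE = Gd..Lu; the patent does not define the split.
LREE_KEYS = ("La", "Ce", "Pr", "Nd", "Sm", "Eu")
HREE_KEYS = ("Gd", "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu")
# ASSUMPTION: CI-chondrite La and Yb (ppm), as commonly tabulated from McDonough and Sun (1995).
CHONDRITE_PPM = {"La": 0.237, "Yb": 0.161}


def derived_indices(s):
    """Compute ratio and sum indices from one analysis given in ppm."""
    lree = sum(s[k] for k in LREE_KEYS)
    hree = sum(s[k] for k in HREE_KEYS)
    return {
        "LREE": lree,
        "HREE": hree,
        "REE+Y": lree + hree + s["Y"],
        "Sr/Y": s["Sr"] / s["Y"],
        "V/Y": s["V"] / s["Y"],
        "Ce/Nd": s["Ce"] / s["Nd"],
        "La/Sm": s["La"] / s["Sm"],
        "(La/Yb)N": (s["La"] / CHONDRITE_PPM["La"]) / (s["Yb"] / CHONDRITE_PPM["Yb"]),
    }


# Made-up analysis: every REE at 10 ppm, plus Y, Sr and V.
sample = {k: 10.0 for k in LREE_KEYS + HREE_KEYS}
sample.update(Y=50.0, Sr=500.0, V=5.0)
indices = derived_indices(sample)
print({k: round(v, 4) for k, v in indices.items()})
```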
S4, machine learning method: an XGBoost model is adopted. Training is additive: each new tree is fitted to the residual errors of the previous predictions, and the outputs of all trees are summed to obtain the final prediction. Given a dataset $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}$ ($|\mathcal{D}| = n$, $\mathbf{x}_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$) with $n$ examples and $m$ features, a tree ensemble model uses $K$ additive functions, and its prediction is the sum of the $K$ scores:

$$\hat{y}_i = \phi(\mathbf{x}_i) = \sum_{k=1}^{K} f_k(\mathbf{x}_i), \quad f_k \in \mathcal{F}$$

where $\mathcal{F} = \{f(\mathbf{x}) = w_{q(\mathbf{x})}\}$ ($q: \mathbb{R}^m \to \{1, \dots, T\}$, $w \in \mathbb{R}^T$) is the space of regression trees; the function $q$ represents the structure of each tree and maps an example to the corresponding leaf index; $T$ is the number of leaves in the tree; each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$; and $w_i$ is the score on the $i$-th leaf;
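The additive model above can be illustrated with hand-built trees: each tree is a pair (q, w), where q maps a sample to a leaf index and w holds the leaf scores, and the ensemble output is the sum of the selected leaf scores. The two trees, their split features and their thresholds are hypothetical, not fitted values from the patent. Mapping the summed score to a probability with the logistic function is an assumption that matches XGBoost's `binary:logistic` objective, which the patent does not name.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Sample = Dict[str, float]


@dataclass
class RegressionTree:
    """One additive function f(x) = w[q(x)]."""
    q: Callable[[Sample], int]  # tree structure: sample -> leaf index
    w: Sequence[float]          # leaf weights, one per leaf (T = len(w))

    def __call__(self, x: Sample) -> float:
        return self.w[self.q(x)]


def ensemble_score(trees: List[RegressionTree], x: Sample) -> float:
    """y_hat = sum_k f_k(x): the raw (margin) prediction of the ensemble."""
    return sum(f(x) for f in trees)


def to_probability(margin: float) -> float:
    """Logistic link for binary classification (assumed objective)."""
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical two-leaf trees on illustrative features.
tree_1 = RegressionTree(q=lambda x: 0 if x["V/Y"] < 0.5 else 1, w=[-0.4, 0.6])
tree_2 = RegressionTree(q=lambda x: 0 if x["Cl/F"] < 0.2 else 1, w=[0.1, -0.3])

sample = {"V/Y": 0.8, "Cl/F": 0.1}
margin = ensemble_score([tree_1, tree_2], sample)
print(round(margin, 3), round(to_probability(margin), 3))
```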
S5, model hyperparameter tuning: five-fold cross-validation is combined with a grid search strategy. The grid search exhaustively generates candidate parameter combinations from a grid of parameter values and, based on evaluation with a predefined metric, selects and outputs the highest-scoring candidate;
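The combination of grid search and five-fold cross-validation can be sketched without any ML library: every grid point is scored by its mean validation score over the five folds, and the best point is kept. The grid values are hypothetical (the patent reports only the four tuned hyperparameters and the total of 3600 candidates; 10 × 6 × 6 × 10 is one grid of that size), and `toy_fit_and_score` is a stand-in for training an XGBoost model on the training indices and scoring it on the validation indices.

```python
import itertools
import random
from statistics import mean


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def grid_search(param_grid, n_samples, fit_and_score, k=5):
    """Exhaustive search: return the grid point with the best mean CV score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(fit_and_score(params, train, val)
                     for train, val in k_fold_indices(n_samples, k))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# HYPOTHETICAL grid: 10 x 6 x 6 x 10 = 3600 candidate parameter combinations.
PARAM_GRID = {
    "eta": [0.01, 0.02, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5],
    "gamma": [0, 0.1, 0.2, 0.5, 1, 2],
    "max_depth": [3, 4, 5, 6, 8, 10],
    "alpha": [0, 0.001, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10],
}


def toy_fit_and_score(params, train_idx, val_idx):
    """Stand-in scorer; a real one would fit XGBoost on train_idx and score val_idx."""
    return (-abs(params["eta"] - 0.1) - 0.01 * abs(params["max_depth"] - 6)
            - 0.01 * params["gamma"] - 0.001 * params["alpha"])


n_candidates = len(list(itertools.product(*PARAM_GRID.values())))
best, score = grid_search(PARAM_GRID, n_samples=100, fit_and_score=toy_fit_and_score)
print(n_candidates, best)
```

The names eta, gamma, max_depth and alpha are XGBoost's native parameter names; in its scikit-learn wrapper they appear as learning_rate, gamma, max_depth and reg_alpha. Given the class imbalance, stratified folds (for example scikit-learn's StratifiedKFold) are a common refinement, although the patent does not say whether its folds were stratified.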
S6, machine learning classification results: a total of 14 XGBoost models are trained on the three apatite compositional datasets. Five models are trained on the "major and trace" dataset, with 43, 35, 22, 12 and 6 selected features, respectively; two models are trained on the "major" dataset, with all ten major-element features used to train model M-1 and four selected elements used to train model M-2; and seven models are trained on the "trace" dataset, with the number of features set to 33, 28, 21, 14, 7, 3 and 2, respectively. The classification results of the XGBoost models are displayed as confusion matrices;
The relative importance of every feature used in each model is obtained from the XGBoost algorithm to determine which elements in apatite are most strongly correlated with mineralization.
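Relative importance is typically reported by normalizing a model's raw per-feature importance scores so that they sum to one, then sorting them. In the xgboost Python package, raw scores can be obtained from `Booster.get_score(importance_type="gain")`; which importance type the patent used is not stated, and the raw numbers below are hypothetical.

```python
def relative_importance(raw_scores):
    """Normalize raw importance scores to sum to 1 and rank them, highest first."""
    total = sum(raw_scores.values())
    ranked = sorted(raw_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(feature, score / total) for feature, score in ranked]


# Hypothetical raw gain scores for one "trace" model.
raw = {"V": 42.0, "Sr": 18.0, "Y": 15.0, "Eu": 12.0, "Ce": 8.0, "Rb": 5.0}
ranking = relative_importance(raw)
top_features = [feature for feature, _ in ranking[:3]]
print(ranking)
print(top_features)
```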
Preferably, the deposit types in step S1 include porphyry, skarn, epithermal (shallow low-temperature) Au-Ag, orogenic Au, iron oxide-copper-gold (IOCG), iron oxide-apatite (IOA), orogenic Ni-Cu±platinum-group-element (PGE), and carbonatite deposits.
Preferably, in step S31 the features of the "major" dataset include CaO, P2O5, SO3, F, Cl, FeO, MnO, Na2O, SiO2 and Cl/F, and the features of the "trace" dataset include V, Mn, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb and Lu; the features of the "major and trace" dataset include CaO, P2O5, SO3, F, Cl, FeO, Cl/F, SiO2, Na2O, MgO, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Th, U, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N, La/Sm, LREE and HREE.
Preferably, the grid search in step S5 determines the optimal combination of hyperparameters, including eta, gamma, maximum depth and alpha, by generating 3600 candidate models and selecting the best one.
Preferably, in step S6, across all models of the "trace" dataset, V, Sr, Y, Eu, Ce and Rb appear most often among the top ten features in the relative-importance ranking, with V the most important: of the five models in which V is selected, V ranks first in relative importance in four and second in the remaining one. In both models of the "major" dataset, SO3 content has the highest relative importance, although the importances of the individual features are fairly similar. The features that play a key role in the "major and trace" dataset models are similar to those in the "trace" dataset models.
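Statements such as "V appears most often among the top ten features" and "V ranks first in four of five models" come from tallying feature rankings across models. A sketch with hypothetical, shortened rankings, using k = 3 instead of 10 to keep the example small:

```python
from collections import Counter


def top_k_frequency(rankings, k=10):
    """Count how many models place each feature among their top-k features."""
    counter = Counter()
    for ranking in rankings:
        # dict.fromkeys removes duplicates while keeping a deterministic order.
        counter.update(list(dict.fromkeys(ranking[:k])))
    return counter


# Hypothetical feature rankings (most important first) for three models.
rankings = [
    ["V", "Sr", "Y", "Eu", "Rb"],
    ["Sr", "V", "Ce", "Y", "La"],
    ["V", "Eu", "Ce", "Rb", "Nd"],
]
freq = top_k_frequency(rankings, k=3)
rank_of_v = [ranking.index("V") + 1 for ranking in rankings if "V" in ranking]
print(freq.most_common(3), rank_of_v)
```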
Compared with the prior art, the technical solution adopted by the invention has the following beneficial effects. In the present invention, the performance of several conventional apatite fertility indicators was evaluated on the raw dataset (Fig. 4). For example, Xu et al. (2021) proposed three apatite indices that can effectively distinguish fertile from barren porphyries. However, when applied to the dataset of the present study, their best accuracy was only 0.553 (Fig. 4a). More precisely, the classification based on the Cl/F ratio (Fig. 4a) had a true positive rate (TPR) of 0.421 for ore-rich apatite and a true negative rate (TNR) of 0.580 for ore-poor apatite. For the V vs. Y binary diagram (Fig. 4b), the accuracy, TPR and TNR were 0.261, 0.866 and 0.026, respectively, indicating that it can identify ore-rich apatite but not ore-poor apatite. Overall, on the global dataset, conventional discrimination diagrams show low accuracy (0.242 to 0.553), which, when applied to mineral exploration, may lead to erroneous assessments of mineralization potential and unreliable localization of mineralized zones.
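A conventional discrimination diagram effectively applies a fixed cut-off to one index or a pair of indices. The sketch below is an illustration, not the evaluation procedure used in the patent: it finds the single-index cut-off (in either direction) that maximizes accuracy and reports the TPR and TNR at that cut-off. The Cl/F values and labels are made up.

```python
def evaluate_cutoff(values, labels, cut, fertile_if_above=True):
    """Accuracy, TPR and TNR when samples on one side of `cut` are called fertile (1)."""
    preds = [int(v >= cut) if fertile_if_above else int(v <= cut) for v in values]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, labels))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (tp + tn) / len(labels), tp / n_pos, tn / n_neg


def best_cutoff(values, labels):
    """Try every observed value as a cut-off in both directions; keep the most accurate."""
    best = None
    for cut in sorted(set(values)):
        for above in (True, False):
            acc, tpr, tnr = evaluate_cutoff(values, labels, cut, above)
            if best is None or acc > best["accuracy"]:
                best = {"cut": cut, "fertile_if_above": above,
                        "accuracy": acc, "TPR": tpr, "TNR": tnr}
    return best


# Made-up Cl/F ratios with labels (1 = mineralized, 0 = unmineralized).
cl_f = [0.05, 0.10, 0.12, 0.30, 0.35, 0.40, 0.08, 0.50]
label = [0, 0, 0, 1, 1, 1, 1, 0]
print(best_cutoff(cl_f, label))
```

Even the best single cut-off misclassifies samples whose index values overlap between classes, which is the limitation that motivates combining many features in one model.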
As apatite geochemical data accumulate, the limitations of conventional methods are becoming increasingly prominent. One main limitation is that fertility indices calibrated on porphyry deposits in one region cannot be reliably applied to evaluate ore-forming potential in other regions. In addition, traditional methods that rely on only a few indices cannot take into account the ore-forming information carried by the full range of elements, and therefore cannot effectively estimate the potential for metal enrichment.
ML models capable of processing high-dimensional geochemical data are regarded as powerful mineral exploration tools. Compared with traditional two-dimensional element diagrams, the XGBoost models in this study are markedly more accurate and efficient, with accuracies ranging from 0.8507 to 0.9918, indicating a higher success rate in mineral exploration and prospecting. In addition, ML can integrate all apatite trace-element features simultaneously and directly capture the relationship between geochemical data and mineralization. The advantage of this approach is that the results are applicable to any geological environment. As apatite geochemical data from various deposit types accumulate, ML models trained on such datasets may become more sophisticated and accurate.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 shows box plots of major and trace elements and geochemical indices of the global apatite samples, expressed in wt% (a) and ppm (b). Boxes span the interquartile range (IQR), bounded by the upper quartile (75%) and the lower quartile (25%). Whiskers extend to 1.5 times the IQR. The horizontal line within each colored box marks the median (50%). Black squares and circles denote the mean and outliers, respectively;
Fig. 2 shows confusion matrices (left) and feature-importance rankings (right) for four representative XGBoost models. The confusion matrices display the prediction results for each class;
FIG. 3 shows the relationship between feature selection and XGBoost model performance;
Fig. 4 is a scatter plot of elemental ratios of rich ("mineralized") and lean ("unmineralized") apatite in the raw dataset.
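The box-plot elements described for FIG. 1 can be computed as follows. This sketch assumes Tukey's convention (whiskers reach the most extreme data points within 1.5 × IQR of the box, and points beyond are outliers) and the "inclusive" quartile method of Python's statistics module; the patent does not state which quartile method or plotting software was used. The data are made up.

```python
import statistics


def box_plot_stats(data, whisker=1.5):
    """Quartiles, IQR, whisker ends, outliers and mean for one box in a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
        "mean": statistics.fmean(data),
    }


# Made-up concentrations (ppm) with one high value.
values_ppm = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(box_plot_stats(values_ppm))
```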
Detailed Description
To make the above objects, features and advantages of the present application more clearly understood, the application is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
The method for discriminating regional mineral potential based on machine learning of apatite components according to the embodiment of the present invention will be specifically described with reference to fig. 1 to 3.
The invention provides a method for distinguishing regional mineralization potential based on machine learning of apatite components, which specifically comprises the following steps:
S1, database construction: all apatite compositional data used for modeling were collected and compiled from the existing literature and cover 241 sampling locations in 27 countries worldwide. Each location includes multiple samples and analyses. The raw dataset contains 13382 apatite compositional analyses, comprising spot analyses and, for publications that do not report spot analyses, published average values. FIG. 1 shows the elements and geochemical indices contained in the dataset;
Deposit types are classified according to their value, morphology, alteration, ore mineralogy, and relationship to the host rock. The deposits include porphyry, skarn, epithermal (shallow low-temperature) Au-Ag, orogenic Au, iron oxide-copper-gold (IOCG), iron oxide-apatite (IOA), orogenic Ni-Cu±platinum-group-element (PGE), and carbonatite deposits. Analyses of apatite that formed together with a deposit are labeled "mineralized", and analyses of apatite from unmineralized rocks are labeled "unmineralized". According to these criteria, 9104 and 4278 analyses are labeled "mineralized" and "unmineralized", respectively;
S2, dividing the sub-databases: to further distinguish the effects of major and trace elements, the original dataset is divided into three subsets: analyses of samples with CaO, P2O5, SO3, Cl and F are selected as the "major" dataset; analyses of samples with trace elements are selected as the "trace" dataset; and analyses containing both major and trace elements form the "major and trace" dataset;
S3, preprocessing the data obtained in S2:
S31, handling missing values: any element (column) with more than 60% missing values is eliminated. After filtering, the "major" dataset contains 5618 analyses, with features CaO, P2O5, SO3, F, Cl, FeO, MnO, Na2O, SiO2 and Cl/F, and the "trace" dataset contains 9979 analyses, with features V, Mn, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb and Lu;
S32, calculating geochemical indices, which are considered significant for ore formation and magma evolution, and adding them to the "trace" dataset as new features. These indices include LREE, HREE, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N and La/Sm. The "major and trace" dataset contains 2448 analyses and 43 features: CaO, P2O5, SO3, F, Cl, FeO, Cl/F, SiO2, Na2O, MgO, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Th, U, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N, La/Sm, LREE and HREE.
S4, a Machine Learning (ML) method:
XGBoost is an ML system based on gradient tree boosting that can solve real-world-scale problems with minimal resources. It is a distributed gradient boosting library optimized for efficiency and flexibility. Its flexibility shows in its ability to handle sparse data arising from various causes, including missing values and frequent zero entries. In addition, its parallel and distributed computing capabilities speed up learning and thus enable faster model exploration. As a highly scalable end-to-end tree boosting system, it can be extended efficiently to larger datasets with minimal cluster resources. Furthermore, the XGBoost tree structure can identify important features, which improves the interpretability of the results, helps clarify the relationship between apatite composition and ore formation, and supports exploration of the geochemical significance of apatite composition.
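In the XGBoost paper's description, missing values are handled by learning a default direction at each split: the split gain is evaluated with the missing-value samples sent left and sent right, and the better direction is kept. The simplified sketch below evaluates one candidate threshold using the second-order gain formula with logistic-loss gradients. The regularization weight lambda = 1, the shared starting prediction p = 0.5 and the toy data are assumptions, and the real library enumerates candidate splits far more efficiently.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Second-order split gain: 0.5*[GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam)] - gamma."""
    def term(g, h):
        return g * g / (h + lam)
    return 0.5 * (term(g_left, h_left) + term(g_right, h_right)
                  - term(g_left + g_right, h_left + h_right)) - gamma


def default_direction(xs, labels, threshold, p=0.5, lam=1.0):
    """Choose where samples with a missing feature value (None) should go at one split.

    Uses logistic-loss gradients g = p - y and hessians h = p(1 - p) for a
    current prediction p shared by all samples (e.g. the first boosting round).
    """
    grads = [p - y for y in labels]
    hess = [p * (1 - p) for _ in labels]
    gl = hl = gr = hr = gm = hm = 0.0
    for x, g, h in zip(xs, grads, hess):
        if x is None:
            gm, hm = gm + g, hm + h
        elif x < threshold:
            gl, hl = gl + g, hl + h
        else:
            gr, hr = gr + g, hr + h
    gain_if_left = split_gain(gl + gm, hl + hm, gr, hr, lam)
    gain_if_right = split_gain(gl, hl, gr + gm, hr + hm, lam)
    if gain_if_left >= gain_if_right:
        return "left", gain_if_left
    return "right", gain_if_right


# Toy data: the two samples with a missing value behave like the high-value group.
xs = [0.1, 0.2, None, None, 0.8, 0.9]
ys = [0, 0, 1, 1, 1, 1]
print(default_direction(xs, ys, threshold=0.5))
```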
XGBoost is an ML algorithm implemented under the gradient boosting framework. Training is additive: each new tree is fitted to the residual errors of the previous predictions, and the outputs of all trees are summed to obtain the final prediction. Given a dataset $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}$ ($|\mathcal{D}| = n$, $\mathbf{x}_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$) with $n$ examples and $m$ features, a tree ensemble model uses $K$ additive functions, and its prediction is the sum of the $K$ scores:

$$\hat{y}_i = \phi(\mathbf{x}_i) = \sum_{k=1}^{K} f_k(\mathbf{x}_i), \quad f_k \in \mathcal{F}$$

where $\mathcal{F} = \{f(\mathbf{x}) = w_{q(\mathbf{x})}\}$ ($q: \mathbb{R}^m \to \{1, \dots, T\}$, $w \in \mathbb{R}^T$) is the space of regression trees; the function $q$ represents the structure of each tree and maps an example to the corresponding leaf index; $T$ is the number of leaves in the tree; each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$; and $w_i$ is the score on the $i$-th leaf;
S5, model hyperparameter tuning: five-fold cross-validation is combined with a grid search strategy to optimize the XGBoost models. The grid search exhaustively generates candidate parameter combinations from a grid of parameter values and, based on evaluation with a predefined metric, selects and outputs the highest-scoring candidate. The grid search determines the best combination of hyperparameters, including eta, gamma, maximum depth and alpha, by generating 3600 candidate models and selecting the best one.
S6, machine learning classification results: based on the three apatite compositional datasets, a total of 14 XGBoost models were trained with different feature selections. Five models were trained on the "major and trace" dataset, with 43, 35, 22, 12 and 6 features, respectively; two models were trained on the "major" dataset, with all ten major-element features used to train model M-1 and four selected elements used to train model M-2; and, because the "trace" dataset was considered very important for recognizing mineralization, it was used to train seven models, with the number of features set to 33, 28, 21, 14, 7, 3 and 2, respectively. The classification results of the XGBoost models are displayed as confusion matrices; Fig. 2 shows the prediction results of four representative models.
The relative importance of all features used in each model is obtained from the XGBoost algorithm to determine which elements in apatite are highly correlated with mineralization. Among all models of the "trace" dataset, V, Sr, Y, Eu, Ce and Rb occur most often in the top ten features of the relative importance ranking, with V ranked most important. Of the five models in which V was selected, V has the highest relative importance in four and the second highest in the remaining one. Some geochemical indices also rank highly, including Sr/Y, V/Y, Eu*, (La/Yb)N and La/Sm. In the two models of the "major" dataset, the SO3 content has the highest relative importance; however, the proportions of the individual features are fairly consistent. The features that play a key role in the "major and trace" dataset models are similar to those in the "trace" dataset models. In addition, Cl, F and Cl/F are also notable.
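Counting how often each feature lands in a model's top ten can be sketched as below. The model names and importance values are hypothetical; in practice the per-model importances could come from `XGBClassifier.feature_importances_` or `Booster.get_score(importance_type=...)`, and the patent does not say which importance type it uses. The example uses a top-2 cut so the counts are easy to check by hand.

```python
from collections import Counter
from typing import Dict, List


def top_features(importance: Dict[str, float], n: int = 10) -> List[str]:
    """Feature names sorted by descending importance, truncated to the top n."""
    return sorted(importance, key=importance.get, reverse=True)[:n]


def top_n_frequency(models: Dict[str, Dict[str, float]], n: int = 10) -> Counter:
    """Number of models that place each feature in their top-n importance ranking."""
    counts: Counter = Counter()
    for importance in models.values():
        counts.update(top_features(importance, n))
    return counts


# Hypothetical importances for three "trace" models (values invented).
models = {
    "T-a": {"V": 0.30, "Sr": 0.20, "Y": 0.10},
    "T-b": {"V": 0.25, "Eu": 0.22, "Sr": 0.05},
    "T-c": {"Sr": 0.40, "V": 0.35, "Ce": 0.10},
}
counts = top_n_frequency(models, n=2)
v_ranked_first = sum(top_features(m, 1) == ["V"] for m in models.values())
```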
Feature selection: the classification results also indicate a positive correlation between the number of features and model performance. As shown in FIG. 3, the XGBoost models score higher when trained on more elements and geochemical indices. For example, the accuracy and F1 score increase from 0.9146 and 0.8507 for model T-7 (2 features) to 0.9682 and 0.9474 for model T-5 (7 features), and to 0.9939 and 0.9900 for model T-3 (33 features).
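One simple way to build nested feature sets of decreasing size is to truncate an importance ranking, as sketched below. This is an assumption for illustration only: the patent does not describe how the features for each model were chosen, and the importance values are invented. The sizes 7, 3 and 2 mirror models T-5 to T-7.

```python
from typing import Dict, List, Sequence


def nested_feature_sets(importance: Dict[str, float], sizes: Sequence[int]) -> Dict[int, List[str]]:
    """Map each requested size k to the k most important features (smaller sets nest inside larger)."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    if max(sizes) > len(ranked):
        raise ValueError("requested more features than are available")
    return {k: ranked[:k] for k in sizes}


# Hypothetical global importance ranking (values invented).
importance = {"V": 0.20, "Sr": 0.15, "Y": 0.12, "Eu": 0.10, "Ce": 0.08, "Rb": 0.05, "La": 0.02}
sets = nested_feature_sets(importance, sizes=[7, 3, 2])
```

Nesting the sets makes the comparison across models cleaner, since each smaller model sees a strict subset of the larger model's inputs.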
Overall, 10 of the 12 models correctly classified more than 90% of the test-set samples (accuracy greater than 0.9), indicating that the models in this study perform well in distinguishing "mineralized" from "unmineralized" apatite. Of all 14 models, model M-T-1 obtained the highest scores on both the training and test sets: all training samples were correctly classified (accuracy = 1), and more than 99% of the test samples were correctly classified (accuracy = 0.9918). Elemental data obtained in practice may not meet the requirements of model M-T-1; however, model M-T-4 achieves similar performance with only 9 elements (12 features), with accuracy and F1 score of 0.9878 and 0.900, respectively. This suggests that the classification models in this study can be applied in a range of situations. However, when the number of selected features is reduced to 2, the performance of the XGBoost model drops sharply (FIG. 3). Overall, the classification results show that the XGBoost models in this study achieve excellent performance after appropriate feature selection and are applicable to various situations.
In the description of the present specification, the terms "one embodiment," "some embodiments," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (5)
1. The method for distinguishing regional ore potential based on machine learning of apatite components is characterized by comprising the following steps:
S1, database construction: the original dataset used for modeling contains 13382 apatite compositional analyses;
deposit types are classified according to their value, morphology, alteration, ore mineralogy and host-rock association; analyses of apatite formed together with a deposit are labeled "mineralized", and analyses of apatite from unmineralized rock are labeled "unmineralized"; according to these criteria, 9104 and 4278 analyses are labeled "mineralized" and "unmineralized", respectively;
S2, dividing the sub-databases: the original dataset is divided into three subsets: analyses of samples containing CaO, P2O5, SO3, Cl and F form the "major" dataset; analyses of samples containing trace elements form the "trace" dataset; and analyses containing both major and trace elements form the "major and trace" dataset;
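Step S2 can be sketched as a row-wise assignment. The major components are those named in S2 and the trace list is taken from claim 3; two rules are assumptions, since the patent does not spell them out: a "major" analysis must report all five major components, and a "trace" analysis needs at least one trace element. Under these rules the three subsets overlap. The example analyses are invented.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # one apatite analysis; None or absent = not measured

# Major components named in step S2.
MAJOR = ["CaO", "P2O5", "SO3", "Cl", "F"]
# Trace elements listed for the "trace" dataset in claim 3.
TRACE = ["V", "Mn", "Rb", "Sr", "Y", "Zr", "La", "Ce", "Pr", "Nd",
         "Sm", "Eu", "Gd", "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu"]


def measured(row: Row, col: str) -> bool:
    return row.get(col) is not None


def split_subsets(rows: List[Row]) -> Dict[str, List[Row]]:
    """Assign each analysis to the 'major', 'trace' and 'major_and_trace' subsets (they may overlap)."""
    subsets: Dict[str, List[Row]] = {"major": [], "trace": [], "major_and_trace": []}
    for row in rows:
        has_major = all(measured(row, c) for c in MAJOR)  # assumption: all five required
        has_trace = any(measured(row, c) for c in TRACE)  # assumption: any trace element suffices
        if has_major:
            subsets["major"].append(row)
        if has_trace:
            subsets["trace"].append(row)
        if has_major and has_trace:
            subsets["major_and_trace"].append(row)
    return subsets


rows = [
    {"CaO": 54.0, "P2O5": 41.0, "SO3": 0.3, "Cl": 0.5, "F": 3.0},
    {"V": 40.0, "Sr": 500.0},
    {"CaO": 54.0, "P2O5": 41.0, "SO3": 0.2, "Cl": 0.4, "F": 3.1, "Y": 200.0},
    {"CaO": 55.0},
]
subsets = split_subsets(rows)
```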
S3, preprocessing the data collected in the S2:
S31, processing missing values: any element whose column is more than 60% missing is eliminated; after filtering, the "major" dataset includes 5618 analyses and the "trace" dataset includes 9979 analyses;
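The 60% missing-value rule reads as a per-column filter: a column is dropped only if more than 60% of its entries are missing, so a column exactly at 60% survives. A minimal stdlib sketch with invented data follows; with pandas the same rule would be roughly `df.loc[:, df.isna().mean() <= 0.6]`.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # None or absent key = missing value


def kept_columns(rows: List[Row], max_missing: float = 0.6) -> List[str]:
    """Columns whose fraction of missing values does not exceed max_missing."""
    if not rows:
        return []
    columns = sorted({c for r in rows for c in r})
    n = len(rows)
    return [c for c in columns
            if sum(1 for r in rows if r.get(c) is None) / n <= max_missing]


def filter_rows(rows: List[Row], max_missing: float = 0.6) -> List[Row]:
    """Rebuild every row with only the surviving columns (missing entries stay None)."""
    keep = kept_columns(rows, max_missing)
    return [{c: r.get(c) for c in keep} for r in rows]


rows = [
    {"Sr": 500.0, "U": 10.0, "Th": 5.0},
    {"Sr": 450.0, "U": 12.0},
    {"Sr": 610.0},
    {"Sr": None},
    {},
]
# Sr is 40% missing (kept), U 60% (kept), Th 80% (dropped).
cleaned = filter_rows(rows)
```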
S32, calculating geochemical indices, including LREE, HREE, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N and La/Sm, and adding them to the "trace" dataset as new features; the "major and trace" dataset includes 2448 analyses and 43 features;
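Several of these indices can be computed as sketched below. The Eu anomaly uses the common definition Eu*N = sqrt(SmN × GdN), where the subscript N denotes chondrite normalization. The normalizing values in `CHONDRITE` are illustrative placeholders of the order of CI chondrite; the patent does not state which reference values it uses, so substitute your own set. The sample is constructed so the normalized values come out as round numbers.

```python
import math
from typing import Dict

# Placeholder chondrite normalizing values (ppm); replace with the reference set actually used.
CHONDRITE = {"La": 0.237, "Sm": 0.153, "Eu": 0.058, "Gd": 0.2055, "Yb": 0.170}


def norm(sample: Dict[str, float], element: str) -> float:
    """Chondrite-normalized concentration X_N = X_sample / X_chondrite."""
    return sample[element] / CHONDRITE[element]


def geochem_indices(s: Dict[str, float]) -> Dict[str, float]:
    """A subset of the S32 indices computed for one analysis (ppm values)."""
    eu_star_n = math.sqrt(norm(s, "Sm") * norm(s, "Gd"))  # interpolated Eu without an anomaly
    return {
        "Sr/Y": s["Sr"] / s["Y"],
        "La/Sm": s["La"] / s["Sm"],
        "(La/Yb)N": norm(s, "La") / norm(s, "Yb"),
        "Eu*N": eu_star_n,
        "EuN/Eu*N": norm(s, "Eu") / eu_star_n,
    }


# Invented sample: LaN = 100, SmN = GdN = 50, EuN = 25, YbN = 10.
sample = {"La": 0.237 * 100, "Sm": 0.153 * 50, "Gd": 0.2055 * 50,
          "Eu": 0.058 * 25, "Yb": 0.170 * 10, "Sr": 600.0, "Y": 200.0}
idx = geochem_indices(sample)
```

An EuN/Eu*N value below 1, as in this invented sample, indicates a negative Eu anomaly.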
S4, machine learning method: the XGBoost model is adopted, whose training is additive: each new tree is fitted to the residual errors of the previous prediction, and the outputs of all trees are summed to obtain the final prediction; given a dataset $\mathcal{D} = \{(x_i, y_i)\}$ ($|\mathcal{D}| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$) with $n$ examples and $m$ features, the tree ensemble model predicts the output using $K$ additive functions as the sum of $K$ scores:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

where $\mathcal{F} = \{ f(x) = w_{q(x)} \}$ ($q: \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of regression trees, the function $q$ represents the structure of each tree and maps an example to the corresponding leaf index, $T$ is the number of leaves in the tree, each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$, and $w_i$ denotes the score on the $i$-th leaf;
S5, model hyperparameter tuning: five-fold cross-validation is combined with a grid search strategy, in which the grid search exhaustively generates candidate parameter combinations from a grid of parameter values and selects and outputs the candidate with the highest score according to a predefined evaluation metric;
S6, machine learning classification results: a total of 14 XGBoost models are trained on the three apatite component datasets; five models are trained on the "major and trace" dataset with 43, 35, 22, 12 and 6 selected features, respectively; two models are trained on the "major" dataset, with all ten major-element features used to train model M-1 and four selected elements used to train model M-2; seven models are trained on the "trace" dataset with 33, 28, 21, 14, 7, 3 and 2 features, respectively; the classification results of the XGBoost models are displayed as confusion matrices;
the relative importance of all features used in each model was obtained from the XGBoost algorithm to determine the elements in apatite that are highly correlated with the mineralisation.
2. The method for determining regional mineral potential based on machine learning of apatite components according to claim 1, wherein the ore deposits in step S1 include porphyry, skarn, epithermal Au-Ag, orogenic Au, iron oxide-copper-gold (IOCG), Kiruna-type iron oxide-apatite (IOA), orogenic Ni-Cu±platinum-group-element, and carbonate deposits.
3. The method for determining regional mineral potential based on machine learning of apatite components according to claim 1, wherein the features of the "major" dataset in step S31 include CaO, P2O5, SO3, F, Cl, FeO, MnO, Na2O, SiO2 and Cl/F; the features of the "trace" dataset include V, Mn, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb and Lu; and the features of the "major and trace" dataset include CaO, P2O5, SO3, F, Cl, FeO, Cl/F, SiO2, Na2O, MgO, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Th, U, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N, La/Sm, LREE and HREE.
4. The method according to claim 1, wherein the grid search in step S5 determines the optimal combination of hyperparameters, including eta, gamma, maximum depth and alpha, by generating 3600 candidate models and selecting the optimal model.
5. The method according to claim 1, wherein, in step S6, among all models of the "trace" dataset, V, Sr, Y, Eu, Ce and Rb occur most frequently in the top ten features of the relative importance ranking, with V ranked most important; of the five models in which V is selected, V has the highest relative importance in four and the second highest in the remaining one; the SO3 content has the highest relative importance in both models of the "major" dataset, and the proportions of the individual features are fairly consistent; the features that play a key role in the "major and trace" dataset models are similar to those in the "trace" dataset models.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311652244.3A CN118039031B (en) | 2023-12-05 | 2023-12-05 | Method for judging regional ore-forming potential based on machine learning of apatite components |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118039031A true CN118039031A (en) | 2024-05-14 |
CN118039031B CN118039031B (en) | 2024-07-16 |
Family
ID=90984836
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311652244.3A Active CN118039031B (en) | 2023-12-05 | 2023-12-05 | Method for judging regional ore-forming potential based on machine learning of apatite components |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118039031B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021004198A1 (en) * | 2019-07-10 | 2021-01-14 | 江苏金恒信息科技股份有限公司 | Plate performance prediction method and apparatus |
CN115078520A (en) * | 2022-06-13 | 2022-09-20 | 西藏巨龙铜业有限公司 | Mineral geochemistry-based porphyry system mineralization evaluation method |
CN115148299A (en) * | 2022-07-15 | 2022-10-04 | 中国地质大学(北京) | XGboost-based ore deposit type identification method and system |
CN115482196A (en) * | 2022-08-09 | 2022-12-16 | 中南大学 | Sintering mixed material moisture online soft measurement method and system based on multi-source information fusion |
Non-Patent Citations (1)
Title |
---|
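Zhang Zhaolu et al.: "Prediction of deep resources of the Jiaojia gold deposit based on XGBoost", Proceedings of the First National Mineral Exploration Conference, 12 October 2021 (2021-10-12), pages 1 - 2 * |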
Also Published As
Publication number | Publication date |
---|---|
CN118039031B (en) | 2024-07-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||