CN118039031A - Method for judging regional ore-forming potential based on machine learning of apatite components - Google Patents

Info

Publication number
CN118039031A
Authority
CN
China
Prior art keywords
dataset
apatite
models
features
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311652244.3A
Other languages
Chinese (zh)
Other versions
CN118039031B (en)
Inventor
许博
郑育宇
温子豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences Beijing
Original Assignee
China University of Geosciences Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences Beijing
Priority to CN202311652244.3A
Publication of CN118039031A
Application granted
Publication of CN118039031B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Investigating Or Analyzing Non-Biological Materials By The Use Of Chemical Means (AREA)

Abstract

The invention provides a method for judging regional ore-forming potential based on machine learning of apatite components. The method compiles three global datasets of the major-element and/or trace-element chemistry of apatite from ore-forming and non-ore-forming rock samples and trains a series of XGBoost models to determine the ore-forming potential of a deposit. Compared with traditional binary diagrams, the new classification method is considerably more accurate and efficient in distinguishing whether apatite comes from an ore-rich (fertile) or an ore-poor (barren) rock body. In addition, feature importance analysis shows that the V/Y and Cl/F ratios and the S content are critical to metal enrichment and mineralization.

Description

Method for judging regional ore-forming potential based on machine learning of apatite components
Technical Field
The invention relates to the technical field of geological investigation and mineral exploration, in particular to a method for distinguishing regional ore potential based on machine learning of apatite components.
Background
Apatite (Ca5[PO4]3[F,Cl,OH]) is a common accessory mineral in most igneous and metamorphic rocks and in clastic sedimentary deposits, and it is highly resistant to weathering. Because its chemical composition is sensitive to the crystallization environment, apatite is regarded as an ideal indicator mineral. Its trace-element and volatile contents, together with its isotopic signatures, can characterize different crystallization environments, including magmatic systems, low-grade metamorphic systems, and sedimentary environments. Therefore, the trace-element chemistry of apatite is widely used to infer the lithology of source rocks, including tracing the provenance of clastic rocks, and to constrain petrogenetic processes, particularly the origin and evolution of magmas. In addition, the major- and trace-element chemistry of apatite is used in mineral prospecting, through chemical indices such as Sr/Y, Mn, Eu/Eu*, Th/U, La/Sm and (Ce/Yb)N and through binary classification schemes such as Sr vs. F (Mn, Y, (La/Yb)N, Eu/Eu*), F/Cl vs. F, Cl vs. Eu/Eu*, V/Y vs. REE+Y, Cl vs. SO3 and 87Sr/86Sr vs. Cl/F, which are commonly used to assess the ore-forming potential of magmas.
Machine learning (ML) uses computer programs to identify patterns in a dataset and then applies those patterns to make predictions. It provides a powerful toolkit for decoding the information hidden in high-dimensional data. In the past few years, there has been great interest in applying ML to solid-earth science. ML has been widely used for seismic phase detection and earthquake classification, geophysical data processing and image interpretation, geophysical inversion, and the integration of multi-physics and multidisciplinary information. Given the complexity and diversity of geochemical data, ML-based classification has become a promising alternative to traditional methods, particularly for large-scale geological problems such as predicting mantle metasomatism worldwide, revealing the source components of intraplate basalts, identifying primary water concentrations in mantle pyroxene, determining quartz-forming environments, and classifying the source rocks of detrital zircon. In the field of mineral deposit exploration, two studies have applied ML to zircon composition data to characterize the ore-forming potential of magmas, with the aim of assessing the potential for porphyry copper mineralization. Tan et al. (2023) applied partial least squares discriminant analysis (PLS-DA) to an apatite trace-element dataset (4,298 analyses) to distinguish apatite from different types of deposits and rocks. Their results could not directly separate ore-related magmatic apatite from hydrothermal apatite, but they showed great potential for classifying ore-poor and ore-rich apatite from granite-related deposits and underscored the role of V, Eu and Sr in the classification.
Here, the present invention compiles three global datasets of apatite major-element and/or trace-element chemistry from ore-forming and non-ore-forming rock samples and trains a series of XGBoost models to determine the ore-forming potential of a deposit. Compared with traditional binary diagrams, the new classification method is considerably more accurate and efficient in distinguishing whether apatite comes from an ore-rich or an ore-poor rock body. In addition, feature importance analysis shows that the V/Y and Cl/F ratios and the S content are critical to metal enrichment and mineralization.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method for judging regional ore-forming potential based on machine learning of apatite components.
The invention is realized by the following technical scheme: the method for judging regional ore-forming potential based on machine learning of apatite components specifically comprises the following steps:
S1, database construction: the original dataset used for modeling contains 13382 apatite composition analyses; deposit types are classified according to their value, morphology, alteration, ore mineralogy, and host-rock association; analyses of apatite formed together with a deposit are labeled "mineralized", and analyses of apatite from unmineralized rock are labeled "unmineralized"; according to these criteria, 9104 and 4278 analyses are labeled "mineralized" and "unmineralized", respectively;
S2, dividing the sub-databases: the original dataset is divided into three subsets: analyses that contain CaO, P2O5, SO3, Cl and F form the "major" (major-element) dataset, analyses that contain trace elements form the "trace" dataset, and analyses that contain both major and trace elements form the "major and trace" dataset;
S3, preprocessing the data collected in S2:
S31, processing missing values: any element (column) whose proportion of missing values exceeds 60% is removed; after filtering, the "major" dataset includes 5618 analyses and the "trace" dataset includes 9979 analyses;
S32, calculating geochemical indices, including LREE, HREE, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N and La/Sm, and adding them to the "trace" dataset as new features; the "major and trace" dataset includes 2448 analyses and 43 features;
S4, machine learning method: an XGBoost model is adopted; training is additive, with each new tree fitted to the residual of the previous prediction, and the outputs of all trees are summed to obtain the final prediction; given a dataset $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$) with n examples and m features, a tree ensemble model that uses K additive functions predicts the output as the sum of K scores:
$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
where $\mathcal{F} = \{ f(x) = w_{q(x)} \}$ ($q: \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of regression trees; the function $q$ represents the structure of each tree and maps an example to the corresponding leaf index; $T$ is the number of leaves in the tree; each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$; and $w_i$ is the score on the i-th leaf;
S5, model hyperparameter tuning: five-fold cross-validation is combined with a grid search strategy; the grid search exhaustively generates candidate parameter combinations from a grid of parameter values and selects and outputs the highest-scoring combination according to a predefined evaluation metric;
S6, machine learning classification results: a total of 14 XGBoost models were trained on the three apatite composition datasets; five models were trained on the "major and trace" dataset, with 43, 35, 22, 12 and 6 selected features, respectively; two models were trained on the "major" dataset, with all ten major-element features used to train model M-1 and four selected elements used to train model M-2; seven models were trained on the "trace" dataset, with the number of features set to 33, 28, 21, 14, 7, 3 and 2 in order; the classification results of the XGBoost models are displayed as confusion matrices;
The relative importance of all features used in each model is obtained from the XGBoost algorithm to determine the elements in apatite that are highly correlated with mineralization.
Preferably, the deposit types in step S1 include porphyry, skarn, epithermal Au-Ag, orogenic Au, iron oxide copper-gold (IOCG), iron oxide-apatite (IOA), orogenic Ni-Cu±platinum group element, and carbonatite deposits.
Preferably, the features of the "major" dataset in step S31 include CaO, P2O5, SO3, F, Cl, FeO, MnO, Na2O, SiO2 and Cl/F, and the features of the "trace" dataset include V, Mn, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb and Lu; the features of the "major and trace" dataset include CaO, P2O5, SO3, F, Cl, FeO, Cl/F, SiO2, Na2O, MgO, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Th, U, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N, La/Sm, LREE and HREE.
Preferably, the grid search in step S5 determines the optimal combination of hyperparameters, including eta, gamma, maximum depth and alpha; 3600 candidate models are generated, from which the optimal model is selected.
Preferably, in step S6, among all models of the "trace" dataset, V, Sr, Y, Eu, Ce and Rb appear most often among the top ten features of the relative importance ranking, with V ranked most important; among the five models in which V is selected, V has the highest relative importance in four and the second highest in the remaining one; the SO3 content has the highest relative importance in both models of the "major" dataset, although the proportions of the individual features are fairly even; the features that play a key role in the "major and trace" dataset models are similar to those in the "trace" dataset models.
By adopting the above technical scheme, the invention has the following beneficial effects compared with the prior art: the performance of several conventional apatite fertility indicators was evaluated on the original dataset (FIG. 4). For example, Xu et al. (2021) proposed three apatite indices that can effectively distinguish ore-rich from ore-poor porphyries. However, when applied to the dataset of the present study, their best accuracy was only 0.553 (FIG. 4a). More precisely, the classification based on the Cl/F ratio (FIG. 4a) had a true positive rate (TPR) of 0.421 for ore-rich apatite and a true negative rate (TNR) of 0.580 for ore-poor apatite. The accuracy, TPR and TNR of the V vs. Y binary diagram (FIG. 4b) were 0.261, 0.866 and 0.026, respectively, indicating that it could identify ore-rich apatite but not ore-poor apatite. Overall, conventional discrimination diagrams show low accuracy on the global dataset (from 0.242 to 0.553); when applied to mineral exploration, they may lead to erroneous assessments of ore-forming potential and unreliable targeting of mineralized zones.
As the volume of apatite geochemical data grows, the limitations of conventional methods become increasingly prominent. One main limitation is that fertility indices calibrated on porphyries from one region cannot be reliably applied to evaluating ore-forming potential in other regions. In addition, conventional methods that rely on only a few indices cannot take into account the ore-forming information carried by the full range of elements, so the potential for metal enrichment cannot be assessed effectively.
ML models capable of processing high-dimensional geochemical data are considered powerful tools for mineral exploration. Compared with conventional two-element diagrams, the XGBoost models in this study are markedly more accurate and efficient, with accuracy ranging from 0.8507 to 0.9918, which indicates a higher success rate in prospecting and exploration. In addition, ML can integrate all trace-element features of apatite at once and directly capture the relationship between geochemical data and mineralization. An advantage of this approach is that the results are applicable to any geological environment. As the amount of apatite geochemical data from various deposit types increases, ML models trained on such datasets may become more sophisticated and accurate.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a box plot of the major and trace elements and geochemical indices of the global apatite samples, expressed in weight percent (a) and ppm (b). Boxes represent the interquartile range (IQR), bounded by the upper quartile (75%) and the lower quartile (25%). Whiskers extend to 1.5 times the IQR. The horizontal line within each colored box represents the median (50%). Black square and circle symbols represent the mean and outliers, respectively;
FIG. 2 shows the confusion matrices (left) and feature importance rankings (right) of four representative XGBoost models. Each confusion matrix displays the prediction results for each class;
FIG. 3 shows the relationship between feature selection and XGBoost model performance;
FIG. 4 is a set of scatter plots of element ratios for ore-rich ("mineralized") and ore-poor ("unmineralized") apatite in the original dataset.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
The method for judging regional ore-forming potential based on machine learning of apatite components according to the embodiment of the present invention is described in detail below with reference to FIGS. 1 to 3.
The invention provides a method for judging regional ore-forming potential based on machine learning of apatite components, which specifically comprises the following steps:
S1, database construction: all apatite composition data used for modeling were collected and compiled from the existing literature, covering 241 sampling sites in 27 countries worldwide. Each site includes multiple samples and analyses. The original dataset contains 13382 apatite composition analyses, including spot analyses and, for publications that do not report spot analyses, average values. FIG. 1 shows the elements and the structure of the geochemical data contained in the dataset;
Deposit types are classified according to their value, morphology, alteration, ore mineralogy, and host-rock association; the deposit types include porphyry, skarn, epithermal Au-Ag, orogenic Au, iron oxide copper-gold (IOCG), iron oxide-apatite (IOA), orogenic Ni-Cu±platinum group element, and carbonatite deposits. Analyses of apatite formed together with a deposit are labeled "mineralized", and analyses of apatite from unmineralized rock are labeled "unmineralized"; according to these criteria, 9104 and 4278 analyses are labeled "mineralized" and "unmineralized", respectively;
S2, dividing the sub-databases: to further distinguish the effects of the major and trace elements, the original dataset is divided into three subsets: analyses that contain CaO, P2O5, SO3, Cl and F form the "major" (major-element) dataset, analyses that contain trace elements form the "trace" dataset, and analyses that contain both major and trace elements form the "major and trace" dataset;
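The subset division of step S2 can be illustrated with a minimal Python sketch. It is not the implementation used in the invention: the record layout (one dictionary per analysis), the shortened trace-element list, the subset keys, and the rule that an analysis "contains trace elements" when at least one trace element is measured are all assumptions made for illustration. The subsets overlap, which agrees with the reported subset sizes.

```python
# Major elements required for the major-element subset (listed in step S2).
MAJOR_KEYS = ("CaO", "P2O5", "SO3", "Cl", "F")
# Assumption: a shortened, illustrative list of trace elements.
TRACE_KEYS = ("V", "Sr", "Y", "La", "Ce")


def has_all(record, keys):
    """True if every key has a measured (non-missing) value."""
    return all(record.get(key) is not None for key in keys)


def has_any(record, keys):
    """True if at least one key has a measured value."""
    return any(record.get(key) is not None for key in keys)


def split_subsets(records):
    """Assign each analysis to every subset it qualifies for (subsets may overlap)."""
    subsets = {"major": [], "trace": [], "major_and_trace": []}
    for record in records:
        is_major = has_all(record, MAJOR_KEYS)
        is_trace = has_any(record, TRACE_KEYS)  # assumed membership rule
        if is_major:
            subsets["major"].append(record)
        if is_trace:
            subsets["trace"].append(record)
        if is_major and is_trace:
            subsets["major_and_trace"].append(record)
    return subsets


# Toy analyses with made-up values, used only to exercise the function.
records = [
    {"CaO": 54.1, "P2O5": 41.8, "SO3": 0.2, "Cl": 0.5, "F": 2.9},
    {"Sr": 350.0, "Y": 420.0},
    {"CaO": 55.0, "P2O5": 42.0, "SO3": 0.1, "Cl": 0.3, "F": 3.1, "Sr": 300.0},
]
subsets = split_subsets(records)
```

In this toy input the third analysis carries both major and trace elements, so it appears in all three subsets.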
S3, preprocessing the data collected in S2:
S31, processing missing values: any element (column) whose proportion of missing values exceeds 60% is removed; after filtering, the "major" dataset comprises 5618 analyses, and its features comprise CaO, P2O5, SO3, F, Cl, FeO, MnO, Na2O, SiO2 and Cl/F; the "trace" dataset comprises 9979 analyses, and its features comprise V, Mn, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb and Lu;
S32, calculating geochemical indices that are considered significant for ore formation and magma evolution, and adding them to the "trace" dataset as new features. These indices include LREE, HREE, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N and La/Sm; the "major and trace" dataset includes 2448 analyses and 43 features; the features of the "major and trace" dataset include CaO, P2O5, SO3, F, Cl, FeO, Cl/F, SiO2, Na2O, MgO, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Th, U, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N, La/Sm, LREE and HREE.
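The preprocessing of step S3 can be sketched as follows. This is a hedged illustration under stated assumptions, not the invention's code: the row layout (a list of dictionaries with `None` for missing values) and the helper names are hypothetical, and only the simple element ratios are computed. The normalized indices (for example Eu*, Ce*N and (La/Yb)N) are omitted because the normalization values are not given in the text.

```python
def drop_sparse_columns(rows, max_missing=0.60):
    """Keep only columns whose share of missing values does not exceed max_missing."""
    columns = sorted({key for row in rows for key in row})
    kept = []
    for column in columns:
        missing = sum(1 for row in rows if row.get(column) is None)
        if missing / len(rows) <= max_missing:
            kept.append(column)
    return [{column: row.get(column) for column in kept} for row in rows]


def ratio(numerator, denominator):
    """Element ratio, or None when either value is missing or the denominator is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Simple ratio indices named in step S32 (numerator, denominator).
RATIO_INDICES = {
    "Sr/Y": ("Sr", "Y"),
    "V/Y": ("V", "Y"),
    "Ce/Nd": ("Ce", "Nd"),
    "La/Sm": ("La", "Sm"),
}


def add_ratio_indices(row):
    """Return a copy of the row with the ratio indices added as new features."""
    enriched = dict(row)
    for name, (num, den) in RATIO_INDICES.items():
        enriched[name] = ratio(row.get(num), row.get(den))
    return enriched


# Toy rows with made-up values: Rb is missing in 80% of rows, V in exactly 60%.
rows = [
    {"Sr": 400.0, "Y": 200.0, "V": 10.0, "Rb": None},
    {"Sr": 300.0, "Y": 150.0, "V": None, "Rb": None},
    {"Sr": 250.0, "Y": 500.0, "V": None, "Rb": 0.2},
    {"Sr": 600.0, "Y": 300.0, "V": 30.0, "Rb": None},
    {"Sr": 100.0, "Y": 400.0, "V": None, "Rb": None},
]
filtered = drop_sparse_columns(rows)
featured = [add_ratio_indices(row) for row in filtered]
```

Because the rule removes a column only when its missing share is greater than 60%, the V column (exactly 60% missing) is kept while the Rb column (80% missing) is dropped.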
S4, machine learning (ML) method:
XGBoost is an ML system based on gradient tree boosting that can solve real-world-scale problems with minimal resources. It is a distributed gradient boosting library optimized for efficiency and flexibility. Its flexibility shows in its ability to handle data that are sparse for a variety of reasons, including missing values and frequent zero entries. In addition, its parallel and distributed computing capabilities speed up learning and thus enable faster model exploration. As a highly scalable end-to-end tree boosting system, it can be extended efficiently to larger datasets with minimal cluster resources. Furthermore, the tree structure of XGBoost can identify important features, which improves the interpretability of the results, clarifies the relationship between apatite composition and ore formation, and allows the geochemical significance of apatite composition to be explored.
The XGBoost model, an ML algorithm that runs under the gradient boosting framework, is adopted. Training is additive, with each new tree fitted to the residual of the previous prediction, and the outputs of all trees are summed to obtain the final prediction; given a dataset $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$) with n examples and m features, a tree ensemble model that uses K additive functions predicts the output as the sum of K scores:
$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
where $\mathcal{F} = \{ f(x) = w_{q(x)} \}$ ($q: \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of regression trees; the function $q$ represents the structure of each tree and maps an example to the corresponding leaf index; $T$ is the number of leaves in the tree; each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$; and $w_i$ is the score on the i-th leaf;
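The additive prediction described in step S4 can be made concrete with a small Python sketch. It only illustrates how a tree is represented by a structure function q and leaf weights w and how K tree scores are summed; it does not train anything. The two stumps, their split thresholds and their leaf weights are hypothetical, and the final conversion to a probability assumes a logistic objective, which the text does not state.

```python
import math


def make_tree(q, w):
    """Build f(x) = w[q(x)]: q maps an example to a leaf index, w holds the leaf scores."""
    return lambda x: w[q(x)]


def predict_score(trees, x):
    """Sum the scores of all K trees for one example."""
    return sum(tree(x) for tree in trees)


# Two hypothetical one-split trees (stumps) with T = 2 leaves each.
tree_1 = make_tree(lambda x: 0 if x["V/Y"] < 0.1 else 1, [-0.5, 0.75])
tree_2 = make_tree(lambda x: 0 if x["Cl/F"] < 0.2 else 1, [-0.25, 0.25])

example = {"V/Y": 0.25, "Cl/F": 0.05}
score = predict_score([tree_1, tree_2], example)  # 0.75 + (-0.25) = 0.5

# Assumption: with a logistic objective, the raw score maps to a class probability.
probability = 1.0 / (1.0 + math.exp(-score))
```

During training, each added tree would be fitted to the residual left by the trees before it; the sketch shows only the prediction side of that additive scheme.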
S5, model hyperparameter tuning: five-fold cross-validation is combined with a grid search strategy to optimize the XGBoost models. The grid search exhaustively generates candidate parameter combinations from a grid of parameter values and selects and outputs the highest-scoring combination according to a predefined evaluation metric; the grid search determines the best combination of hyperparameters, including eta, gamma, maximum depth and alpha, and generates 3600 candidate models, from which the best model is selected.
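The combination of grid search and five-fold cross-validation in step S5 can be sketched with the Python standard library alone. The grid values below are hypothetical, since the text names the hyperparameters but not their candidate values, and this small grid yields 24 candidates rather than the 3600 reported. The `fit_and_score` callback is also an assumption: in practice it would train an XGBoost model on the training fold and return a validation score, whereas here a dummy scorer stands in for it.

```python
from itertools import product

# Hypothetical grid: the hyperparameter names come from the text, the values do not.
PARAM_GRID = {
    "eta": [0.05, 0.1, 0.3],
    "gamma": [0, 1],
    "max_depth": [3, 6],
    "alpha": [0, 1],
}


def candidates(grid):
    """Yield every combination of parameter values in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_samples, n_folds=5):
    """Yield (train, validation) index lists for contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


def grid_search(grid, n_samples, fit_and_score):
    """Return the candidate with the highest mean cross-validated score."""
    best_params, best_score = None, float("-inf")
    for params in candidates(grid):
        scores = [fit_and_score(params, train, validation)
                  for train, validation in k_fold_indices(n_samples)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def dummy_score(params, train, validation):
    """Stand-in scorer that peaks at eta = 0.1 and max_depth = 6 (no model is trained)."""
    return -abs(params["eta"] - 0.1) - abs(params["max_depth"] - 6)


best_params, best_score = grid_search(PARAM_GRID, n_samples=13, fit_and_score=dummy_score)
```

A library-based route would likely pair scikit-learn's `GridSearchCV` (with `cv=5`) with the `XGBClassifier` estimator from the xgboost package; the text does not say which tooling was used.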
S6, machine learning classification results: a total of 14 XGBoost models were trained on the three apatite composition datasets with different feature selections; five models were trained on the "major and trace" dataset, with 43, 35, 22, 12 and 6 selected features, respectively; two models were trained on the "major" dataset, with all ten major-element features used to train model M-1 and four selected elements used to train model M-2; the "trace" dataset was considered very important for recognizing mineralization and was therefore used to train seven models, with the number of features set to 33, 28, 21, 14, 7, 3 and 2 in order; the classification results of the XGBoost models are displayed as confusion matrices; FIG. 2 shows the prediction results of four representative models.
The relative importance of all features used in each model is obtained from the XGBoost algorithm to determine the elements in apatite that are highly correlated with mineralization; among all models of the "trace" dataset, V, Sr, Y, Eu, Ce and Rb appear most often among the top ten features of the relative importance ranking, with V ranked most important; among the five models in which V is selected, V has the highest relative importance in four and the second highest in the remaining one; some geochemical indices also rank highly, including Sr/Y, V/Y, Eu*, (La/Yb)N and La/Sm. The SO3 content has the highest relative importance in both models of the "major" dataset, although the proportions of the individual features are fairly even; the features that play a key role in the "major and trace" dataset models are similar to those in the "trace" dataset models. In addition, Cl, F and Cl/F are also notable.
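The confusion matrix used to display the classification results, and the scores derived from it (accuracy, TPR, TNR and F1), can be computed as in the following sketch. The labels are toy values invented for illustration, and treating "mineralized" as the positive class is an assumption; the text does not say which class the F1 score refers to.

```python
def confusion_counts(y_true, y_pred, positive="mineralized"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive accuracy, TPR, TNR and F1 from the confusion-matrix counts."""
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if tp + fn else 0.0        # true positive rate (recall)
    tnr = tn / (tn + fp) if tn + fp else 0.0        # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"accuracy": (tp + tn) / total, "TPR": tpr, "TNR": tnr, "F1": f1}


M, U = "mineralized", "unmineralized"
# Toy labels, not results from the datasets described here.
y_true = [M, M, M, U, U, M, U, M]
y_pred = [M, M, U, U, M, M, U, M]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, fp, tn, fn)
```

The same functions apply to a conventional binary diagram once its decision rule has been turned into predicted labels, which allows a like-for-like comparison with a trained model.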
Feature selection: the classification results also indicate a positive correlation between the number of features and model performance. As shown in FIG. 3, the XGBoost models score higher when trained on more elements and geochemical indices. For example, the accuracy and F1 score increase from 0.9146 and 0.8507 for model T-7 (2 features) to 0.9682 and 0.9474 for model T-5 (7 features), and to 0.9939 and 0.9900 for model T-3 (33 features).
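One possible way to build feature sets of decreasing size is to rank the features by importance and keep the top k, as sketched below. This is an assumption made for illustration: the text reports the feature counts of each model but does not state the rule by which the reduced sets were chosen. The importance values are placeholders that only respect the statement that V ranks first; they are not results from the trained models.

```python
def rank_features(importance):
    """Return feature names ordered from most to least important."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]


def nested_feature_sets(importance, sizes):
    """For each requested size k, keep the k most important features."""
    ranked = rank_features(importance)
    return {k: ranked[:k] for k in sizes}


# Placeholder importance values (hypothetical, for illustration only).
placeholder_importance = {
    "V": 0.30, "Sr": 0.20, "Y": 0.15, "Eu": 0.12,
    "Ce": 0.10, "Rb": 0.08, "Mn": 0.05,
}

feature_sets = nested_feature_sets(placeholder_importance, sizes=(7, 3, 2))
```

Each feature set would then be used to retrain and rescore a model, which gives the kind of size-versus-performance comparison summarized in FIG. 3.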
Overall, of the 12 models, 10 correctly classified more than 90% of the samples in the test set (accuracy greater than 0.9), indicating that the models in this study perform well in distinguishing "mineralized" from "unmineralized" apatite. Of all 14 models, model M-T-1 obtained the highest score on both the training set and the test set: all samples in the training set were correctly classified (accuracy = 1), and more than 99% of the samples in the test set were correctly classified (accuracy = 0.9918). In practice, the elemental data available may not meet the requirements of model M-T-1; however, model M-T-4 achieves similar performance with only 9 elements (12 features), with accuracy and F1 score of 0.9878 and 0.900, respectively. This suggests that the classification models in this study can be used in a variety of situations. However, when the number of selected features is reduced to 2, the performance of the XGBoost model drops sharply (FIG. 3). Overall, the XGBoost models in this study achieve excellent performance after appropriate feature selection and are applicable to a variety of situations.
In the description of the present specification, the terms "one embodiment," "some embodiments," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A method for judging regional ore-forming potential based on machine learning of apatite components, characterized by comprising the following steps:
S1, database construction: the original dataset used for modeling contains 13382 apatite composition analyses;
Deposit types are classified according to their value, morphology, alteration, ore mineralogy, and host-rock association; analyses of apatite formed together with a deposit are labeled "mineralized", and analyses of apatite from unmineralized rock are labeled "unmineralized"; according to these criteria, 9104 and 4278 analyses are labeled "mineralized" and "unmineralized", respectively;
S2, dividing the sub-databases: the original dataset is divided into three subsets: analyses that contain CaO, P2O5, SO3, Cl and F form the "major" (major-element) dataset, analyses that contain trace elements form the "trace" dataset, and analyses that contain both major and trace elements form the "major and trace" dataset;
S3, preprocessing the data collected in S2:
S31, processing missing values: any element (column) whose proportion of missing values exceeds 60% is removed; after filtering, the "major" dataset includes 5618 analyses and the "trace" dataset includes 9979 analyses;
S32, calculating geochemical indices, including LREE, HREE, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*N, EuN/Eu*N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)N and La/Sm, and adding them to the "trace" dataset as new features; the "major and trace" dataset includes 2448 analyses and 43 features;
S4, machine learning method: an XGBoost model is adopted; training is additive, with each new tree fitted to the residual of the previous prediction, and the outputs of all trees are summed to obtain the final prediction; given a dataset $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$) with n examples and m features, a tree ensemble model that uses K additive functions predicts the output as the sum of K scores:
$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
where $\mathcal{F} = \{ f(x) = w_{q(x)} \}$ ($q: \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of regression trees; the function $q$ represents the structure of each tree and maps an example to the corresponding leaf index; $T$ is the number of leaves in the tree; each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$; and $w_i$ is the score on the i-th leaf;
S5, model super-parameter adjustment: combining the five-fold cross validation method with a grid search strategy, wherein the grid search strategy thoroughly generates candidate parameters from a parameter value grid, and selects and outputs the candidate parameters with highest scores according to the evaluation result of the predefined index;
S6, machine learning classification results: a total of 14 XGBoost models are trained on the three apatite composition datasets; five models are trained on the "major and trace" dataset, with 43, 35, 22, 12 and 6 selected features, respectively; two models are trained on the "major" dataset, with all ten major-element features used to train model M-1 and four selected elements used to train model M-2; seven models are trained on the "trace" dataset, with 33, 28, 21, 14, 7, 3 and 2 features, respectively; the classification results of the XGBoost models are displayed as confusion matrices;
the relative importance of every feature used in each model is obtained from the XGBoost algorithm in order to determine the elements in apatite that are highly correlated with mineralization.
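The two outputs named in step S6, a confusion matrix and a ranking of features by relative importance, can be sketched as follows. The class labels, feature names and importance values are placeholders invented for the example; in practice the importances would be read from the trained XGBoost model.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, predicted in zip(y_true, y_pred):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def rank_features(importance):
    """Feature names sorted from most to least important."""
    return sorted(importance, key=importance.get, reverse=True)


# Placeholder labels and predictions, not results from the patent.
labels = ["A", "B"]
y_true = ["A", "A", "B", "B", "B"]
y_pred = ["A", "B", "B", "B", "A"]
matrix = confusion_matrix(y_true, y_pred, labels)

# Placeholder importances for three hypothetical features.
ranking = rank_features({"f1": 0.2, "f2": 0.5, "f3": 0.3})
print(matrix, ranking)
```

The diagonal of the matrix counts correct classifications, so per-class accuracy can be read off each row.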
2. The method for judging regional ore-forming potential based on machine learning of apatite components according to claim 1, wherein the ore deposits in step S1 comprise porphyry-type deposits, skarn-type deposits, epithermal Au-Ag deposits, orogenic Au deposits, iron oxide copper-gold (IOCG) deposits, iron oxide-apatite (IOA) deposits, orogenic Ni-Cu±platinum group element deposits and carbonatite deposits.
3. The method for judging regional ore-forming potential based on machine learning of apatite components according to claim 1, wherein the features of the "major" dataset in step S31 include CaO, P2O5, SO3, F, Cl, FeO, MnO, Na2O, SiO2 and Cl/F; the features of the "trace" dataset include V, Mn, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb and Lu; and the features of the "major and trace" dataset include CaO, P2O5, SO3, F, Cl, FeO, Cl/F, SiO2, Na2O, MgO, Rb, Sr, Y, Zr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Th, U, Sr/Y, V/Y, Ce/Nd, Eu*, Ce*_N, Eu_N/Eu*_N, Ce/Ce*, Eu/Eu*/Y, REE+Y, (La/Yb)_N, La/Sm, LREE and HREE.
4. The method according to claim 1, wherein the grid search in step S5 determines the optimal combination of hyperparameters, including eta, gamma, maximum depth and alpha; 3600 candidate models are generated, and the optimal model is selected.
5. The method according to claim 1, wherein, in step S6, across all models of the "trace" dataset, V, Sr, Y, Eu, Ce and Rb occur most frequently among the ten most important features, with V ranking as the most important; in the five models for which V is selected, V has the highest relative importance in four and the second highest in the remaining one; in the two models of the "major" dataset, the SO3 content has the highest relative importance, and the proportions of the individual features are fairly consistent; the features that play a key role in the "major and trace" dataset models are similar to those in the "trace" dataset models.
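The frequency statement in claim 5 (how often a feature appears among the top-ranked features across several models) corresponds to a simple count. The sketch below shows one way to compute it. The feature names and importance values are placeholders rather than results from the patent, and k is set to 2 instead of 10 to keep the example small.

```python
from collections import Counter


def top_k_frequency(models, k=10):
    """Count how many models place each feature among their k most important."""
    counts = Counter()
    for importance in models:
        ranked = sorted(importance, key=importance.get, reverse=True)
        counts.update(ranked[:k])
    return counts


# One dictionary of placeholder importances per hypothetical model.
models = [
    {"a": 0.5, "b": 0.3, "c": 0.2},
    {"a": 0.1, "b": 0.6, "c": 0.3},
    {"a": 0.4, "b": 0.35, "c": 0.25},
]
counts = top_k_frequency(models, k=2)
print(counts.most_common())
```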
CN202311652244.3A 2023-12-05 2023-12-05 Method for judging regional ore-forming potential based on machine learning of apatite components Active CN118039031B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311652244.3A CN118039031B (en) 2023-12-05 2023-12-05 Method for judging regional ore-forming potential based on machine learning of apatite components

Publications (2)

Publication Number Publication Date
CN118039031A true CN118039031A (en) 2024-05-14
CN118039031B CN118039031B (en) 2024-07-16

Family

ID=90984836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311652244.3A Active CN118039031B (en) 2023-12-05 2023-12-05 Method for judging regional ore-forming potential based on machine learning of apatite components

Country Status (1)

Country Link
CN (1) CN118039031B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021004198A1 (en) * 2019-07-10 2021-01-14 江苏金恒信息科技股份有限公司 Plate performance prediction method and apparatus
CN115078520A (en) * 2022-06-13 2022-09-20 西藏巨龙铜业有限公司 Mineral geochemistry-based porphyry system mineralization evaluation method
CN115148299A (en) * 2022-07-15 2022-10-04 中国地质大学(北京) XGboost-based ore deposit type identification method and system
CN115482196A (en) * 2022-08-09 2022-12-16 中南大学 Sintering mixed material moisture online soft measurement method and system based on multi-source information fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
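Zhang Zhaolu et al.: "Predicting the deep resources of the Jiaojia gold deposit based on XGBoost", Proceedings of the First National Mineral Exploration Conference, 12 October 2021 (2021-10-12), pages 1 - 2 *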

Also Published As

Publication number Publication date
CN118039031B (en) 2024-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant