CN118194162B - Method, system, electronic equipment and storage medium for locating mining target area based on multivariate data - Google Patents


Info

Publication number
CN118194162B
Authority
CN
China
Prior art keywords
geochemical
data
variable
variables
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410399063.2A
Other languages
Chinese (zh)
Other versions
CN118194162A (en)
Inventor
刘海明
王立强
李保亮
周敖日格勒
王勇
高腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Mineral Resources of Chinese Academy of Geological Sciences
Original Assignee
Institute of Mineral Resources of Chinese Academy of Geological Sciences
Filing date
Publication date
Application filed by Institute of Mineral Resources of Chinese Academy of Geological Sciences
Priority to CN202410399063.2A
Publication of CN118194162A
Application granted
Publication of CN118194162B


Abstract

The application discloses a method, a system, electronic equipment and a storage medium for locating a prospecting target area based on multivariate data. The method comprises: processing a geological dataset, a geochemical dataset and a geophysical dataset of a target area to obtain lithology variables, fault variables, rock mass variables, geophysical M-means central cluster analysis variables, original geochemical single element variables, geochemical M-means central cluster analysis variables, geochemical linear dimension-reduction projection variables and geochemical nonlinear dimension-reduction analysis variables, and fusing these variables to obtain a multi-element prediction variable layer; taking mineralized point and non-mineralized point variables as the classification layer; inputting the multi-element prediction variable layer into each of at least three different machine learning models, and selecting the best-performing machine learning model as the final mine point classification model based on classification evaluation data of all the machine learning models; and outputting a mineral distribution probability map of the target area based on the final mine point classification model.

Description

Method, system, electronic equipment and storage medium for locating mining target area based on multivariate data
Technical Field
The application relates to the technical field of mineral resource exploration, in particular to a method, a system, electronic equipment and a storage medium for locating a target area for prospecting based on multivariate data.
Background
Mineral resource prediction and evaluation scientifically predicts the possible positions and resource potential of economically valuable minerals on the earth's surface, so as to guide national resource policy formulation and the deployment of mineral exploration. With the rapid rise of information technology and artificial intelligence analysis technology, mineral resource prediction and evaluation has also entered an era of intelligent analysis. Research shows that extracting geological anomaly information with artificial intelligence technology, especially machine learning and deep learning algorithms, has become a key technology for intelligent mineral resource prediction.
According to classical metallogenic theory, ore deposits are geological bodies of economic significance whose formation is often related to ore-forming hydrothermal activity, rock alteration and formation volume, processes that often produce combined anomalies of multiple elements in geochemical anomaly surveys. Conventional geochemical anomaly delineation usually looks for mineralization anomalies by simply superimposing multiple elements, without considering the linear and nonlinear dependence among the element contents of the samples. This can make the delineated element anomaly areas too large or cause element anomalies to be lost, so that the delineated prospecting target area does not coincide with the concealed (blind) ore body.
Delineating a prospecting target area from a single data source, using only geochemical anomaly data or only geophysical anomaly data, involves large uncertainty in practice. In particular, when the original data are not subjected to correlation and cluster analysis during data processing, target areas become difficult to select when arranging drilling works, which directly affects the prospecting potential prediction and resource evaluation of the area.
Disclosure of Invention
The embodiment of the application provides a method, a system, electronic equipment and a storage medium for locating a mining target area based on multivariate data.
The method for locating a prospecting target area based on multivariate data according to the embodiment of the application comprises the following steps: acquiring a geological data set, a geochemical data set and a geophysical data set of a target area; processing the geological data set to obtain lithology variables, fault variables, rock mass variables, and mineralization point and non-mineralization point variables; processing the geochemical data set to obtain an original geochemical single element variable, a geochemical M-means central cluster analysis variable, a geochemical linear dimension-reduction projection variable and a geochemical nonlinear dimension-reduction analysis variable; processing the geophysical data set to obtain a geophysical M-means central cluster analysis variable; fusing the lithology variables, fault variables, rock mass variables, geophysical M-means central cluster analysis variables, original geochemical single element variables, geochemical M-means central cluster analysis variables, geochemical linear dimension-reduction projection variables and geochemical nonlinear dimension-reduction analysis variables to obtain a fused multi-element prediction variable layer; taking the mineralization point and non-mineralization point variables as the classification layer; inputting the multi-element prediction variable layer into each of at least three different machine learning models to obtain the classification probability value of each point output by each machine learning model, wherein the classification label of each machine learning model is the classification layer; selecting the best-performing machine learning model as the final mine point classification model based on classification evaluation data of all the machine learning models; and outputting a mineral distribution probability map of the target area based on the final mine point classification model.
The system for locating a prospecting target area based on multivariate data according to the embodiment of the application comprises an acquisition module, a first processing module, a second processing module, a third processing module, a fusion module, an input module, a selection module and an output module. The acquisition module is used for acquiring a geological dataset, a geochemical dataset and a geophysical dataset of a target area; the first processing module is used for processing the geological data set to obtain lithology variables, fault variables, rock mass variables, and mineralization point and non-mineralization point variables; the second processing module is used for processing the geochemical data set to obtain an original geochemical single element variable, a geochemical M-means central cluster analysis variable, a geochemical linear dimension-reduction projection variable and a geochemical nonlinear dimension-reduction analysis variable; the third processing module is used for processing the geophysical data set to obtain a geophysical M-means central cluster analysis variable; the fusion module is used for fusing the lithology variables, fault variables, rock mass variables, geophysical M-means central cluster analysis variables, original geochemical single element variables, geochemical M-means central cluster analysis variables, geochemical linear dimension-reduction projection variables and geochemical nonlinear dimension-reduction analysis variables to obtain a fused multi-element prediction variable layer, and for taking the mineralization point and non-mineralization point variables as the classification layer; the input module is used for inputting the multi-element prediction variable layer into each of at least three different machine learning models, with the classification layer as the classification label of each machine learning model, to obtain the classification probability value of each point output by each machine learning model; the selection module is used for selecting the best-performing machine learning model as the final mine point classification model based on classification evaluation data of all the machine learning models; and the output module is used for outputting a mineral distribution probability map of the target area based on the final mine point classification model.
The electronic device according to the embodiment of the application comprises: at least one processor and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described method for locating a prospecting target area based on multivariate data.
The application also provides a computer readable storage medium storing computer instructions for causing a computer to perform the above-described method for locating a target area for prospecting based on multivariate data.
According to the method, the system, the electronic equipment and the computer readable storage medium for locating a prospecting target area, a new data structure is formed by exploring the linear and nonlinear correlations in the original data (the geological data set, the geochemical data set and the geophysical data set) and fusing them through cluster analysis. By reasonably highlighting mineralization-related anomalies and further weakening non-mineralization anomalies, the prospecting target area is narrowed and located quickly. For the machine learning prediction model, at least three machine learning models are comprehensively compared and evaluated, and the algorithm model with the highest accuracy, stability and time efficiency is selected for final training. The best prospecting target area location is obtained from the resulting probability map of the working area, which improves the precision and efficiency of subsequent target area selection, can greatly reduce the manpower, material and financial investment in prospecting for deposits of the same type, and is beneficial to improving prospecting efficiency and the level of prospecting technology in ecologically fragile plateau areas.
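The comparison of candidate models described above can be sketched as follows. This is only an illustrative sketch: the model names, the metric values and the ranking rule (accuracy first, then stability, then training time) are assumptions made for the example, since the text does not state how the three criteria are weighted.

```python
from dataclasses import dataclass


@dataclass
class ModelEvaluation:
    name: str             # hypothetical model name
    accuracy: float       # mean cross-validated accuracy, between 0 and 1
    accuracy_std: float   # spread across folds (lower means more stable)
    train_seconds: float  # wall-clock training time


def select_best_model(evaluations):
    """Rank by accuracy first, then stability, then training time (assumed rule)."""
    if len(evaluations) < 3:
        raise ValueError("the method compares at least three models")
    return min(
        evaluations,
        key=lambda e: (-e.accuracy, e.accuracy_std, e.train_seconds),
    )


# Made-up evaluation records, for illustration only.
candidates = [
    ModelEvaluation("random_forest", 0.91, 0.02, 12.0),
    ModelEvaluation("support_vector_machine", 0.88, 0.03, 30.0),
    ModelEvaluation("gradient_boosting", 0.91, 0.04, 20.0),
]
best = select_best_model(candidates)
print(best.name)
```

With these made-up records the two most accurate models tie on accuracy, so the more stable one is chosen.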
Additional aspects and advantages of embodiments of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of a method for locating a target area for prospecting according to an embodiment of the present application;
FIG. 2 is a block diagram of a target location system for mine prospecting in accordance with an embodiment of the present application;
FIG. 3 is a schematic structural view of an electronic device according to an embodiment of the present application;
FIG. 4 is a block diagram of a computer-readable storage medium according to an embodiment of the present application;
FIG. 5 is a graph showing the results of applying an embodiment of the present application to a tungsten mine in Tibet.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present application and are not to be construed as limiting the embodiments of the present application.
Referring to fig. 1, the present application provides a method for locating a target area for prospecting based on multivariate data, and in some embodiments, the method for locating a target area for prospecting includes the following steps 01 to 08.
01: A geological dataset, a geochemical dataset, and a geophysical dataset of a target area are acquired.
The target area may be the study area itself or a part of it in which the ore distribution is already known, for example from manual field investigation. The geological dataset may comprise 1:50000 mineral geological survey data of the target area. These are raw data obtained from a mineral geological survey at a scale of 1:50000 and may be stored and managed in a GIS (Geographic Information System) format. Such data generally form a multi-element database covering geology, geophysics, geochemistry, remote sensing and the like, and describe the geological background, mineral resource distribution, rock and mineral characteristics, and geophysical and geochemical anomalies of the mining area. The geochemical dataset may include 1:50000 stream sediment geochemical survey data, which mainly reflect the content and distribution characteristics of various elements in the stream sediments. The geophysical dataset may include 1:50000 geophysical survey data, specifically inverted data such as magnetic susceptibility, resistivity and density, which reflect the physical properties of the subsurface medium. Because the scale is large, the data obtained have high accuracy and resolution and reflect these physical properties more accurately. The resolution of the geological, geochemical and geophysical datasets may be unified to 50 m. In other embodiments, the datasets are not limited to the 1:50000 scale, and other scales and resolutions may be used.
02: The geological data set is processed to obtain lithology variables, fault variables, rock mass variables, and mineralization point and non-mineralization point variables.
The geological data set can be processed to extract different geological elements, and each geological element can be processed according to its own characteristics to obtain the corresponding variable. For example, the geological data set of the target area can be opened in ArcGIS software, and each geological element can be processed with the corresponding analysis tool to obtain the corresponding variable.
In one embodiment, step 02 includes the following steps 021 through 024.
021: Rock exposures of different lithologies in the geological data set are rasterized with a surface analysis tool to obtain lithology variables.
The vector data of rock exposures in the geological data set usually represent rock areas of different lithologies as polygons (faces). The rock exposures of different lithologies can be rasterized with a surface analysis tool in ArcGIS software to obtain a raster data set, in which each pixel represents an area of the original vector data and the pixel value corresponds to the lithology code or name in the original vector data. The raster data set can be regarded as a layer, and the lithology code or name field is set as a variable to obtain the lithology variable for subsequent analysis and visualization. In addition, a unique symbol or color may be assigned to each lithology code or name to display the lithological differences of the geological bodies more intuitively on the map.
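The polygon-to-raster step can be illustrated without GIS software. The sketch below is not the ArcGIS tool itself: it assumes simple polygons given as vertex lists, hypothetical integer lithology codes, and a cell-centre sampling rule on a 50 m grid.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_lithology(polygons, x_min, y_min, n_cols, n_rows, cell=50.0, nodata=0):
    """polygons: list of (lithology_code, vertices). Row 0 is the row at y_min."""
    grid = []
    for r in range(n_rows):
        cy = y_min + (r + 0.5) * cell  # cell-centre y coordinate
        row = []
        for c in range(n_cols):
            cx = x_min + (c + 0.5) * cell  # cell-centre x coordinate
            code = nodata
            for lith_code, vertices in polygons:
                if point_in_polygon(cx, cy, vertices):
                    code = lith_code
                    break
            row.append(code)
        grid.append(row)
    return grid


# Two adjacent square outcrops with hypothetical lithology codes 1 and 2.
polygons = [
    (1, [(0, 0), (100, 0), (100, 100), (0, 100)]),
    (2, [(100, 0), (200, 0), (200, 100), (100, 100)]),
]
lithology_grid = rasterize_lithology(polygons, 0.0, 0.0, n_cols=4, n_rows=2)
print(lithology_grid)
```

Each pixel value is the lithology code of the polygon containing the cell centre, which mirrors the pixel-to-code correspondence described above.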
022: Multi-level buffer areas are set for fracture zones or faults of different trends in the geological data set with a line analysis tool, the trends of the fracture zones or faults are extracted, the fracture zones or faults are classified by trend, and the classified fracture zones or faults are rasterized to obtain fault variables.
The vector data of fracture zones or faults in the geological dataset are represented as line elements, and each fracture zone or fault generally has a trend by which it can be classified. For example, according to a "trend" attribute field, the fracture zones or faults can be divided into two categories, northeast (NE) and northwest (NW), and a separate layer can be created for each category. For each category, a multi-level buffer can be created with a "buffer" tool by specifying the input line elements, the output buffer layer names and the required buffer distances (e.g., 50 m, 100 m, 150 m, 200 m), so that a separate buffer layer is created for each distance. The multi-level buffers obtained for the northeast and northwest trends are then rasterized to obtain one or more raster data sets representing faults of different trends and their multi-level buffers, and these raster data sets are set as the fault variables. In one embodiment, the pixel values of these raster data sets (i.e., the different buffer levels) may be given unique symbols or colors so that they are more intuitive.
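A minimal sketch of the trend classification and multi-level buffering follows. It assumes faults are straight segments, that trend is given as an azimuth measured clockwise from north, and that a pixel takes the level of the innermost buffer ring containing it; these conventions are illustrative assumptions, not the behaviour of the ArcGIS "buffer" tool.

```python
import math

BUFFER_DISTANCES_M = (50.0, 100.0, 150.0, 200.0)  # example distances from the text


def classify_trend(azimuth_deg):
    """Return 'NE' or 'NW' for a strike azimuth measured clockwise from north."""
    azimuth = azimuth_deg % 180.0  # a strike line has no direction
    return "NE" if azimuth < 90.0 else "NW"


def distance_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffer_level(distance_m, distances=BUFFER_DISTANCES_M):
    """1 for the innermost ring, increasing outward; 0 outside all buffers."""
    for level, limit in enumerate(distances, start=1):
        if distance_m <= limit:
            return level
    return 0


# A hypothetical fault running from (0, 0) to (1000, 1000), i.e. azimuth 45 degrees.
trend = classify_trend(45.0)
distance = distance_to_segment(0.0, 100.0, 0.0, 0.0, 1000.0, 1000.0)
level = buffer_level(distance)
print(trend, round(distance, 1), level)
```

Applying `buffer_level` to every cell centre of a grid, separately for the NE and NW fault sets, would give one raster per trend class.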
023: The rock masses of magmatic intrusions and the strata in the geological data set are rasterized with a surface analysis tool to obtain rock mass variables.
Vector data of magmatic intrusions and strata in a geological dataset are typically represented as polygons (faces), each having an attribute field that records, for example, the rock mass type or the formation name. The intrusive rock masses and strata may be classified; for example, magmatic rocks may be divided by intrusion period or rock type, and strata may be divided into different stratigraphic units or rock assemblages. The classified vector data are converted into raster data, giving one or more raster data sets representing the different rock masses or strata, i.e., the rock mass variables. The pixel values of these raster data sets (i.e., the different rock mass types or stratigraphic units) may also be given unique symbols or colors for intuitive differentiation.
024: Positions with obvious mineralization in surface exposures in the geological data set are marked as mineralization points, other point positions in the study area are randomly selected as non-mineralization points according to a preset proportion, and the mineralization points and non-mineralization points are rasterized with a point analysis tool to obtain mineralization point and non-mineralization point variables.
The geological data set can include field records of the mineral geological survey, from which the mineralization points in the target area can be marked: positions with obvious mineralization in surface exposures are marked as mineralization points to obtain a mineralization point layer. Other point positions are randomly selected as non-mineralization points at a ratio of 1:1, 1:2 or 1:5 to obtain a non-mineralization point layer, and the number of non-mineralization points should be large enough to provide sufficient background data for comparative analysis. The mineralization point layer and the non-mineralization point layer can be merged into a new point layer, which is rasterized with a point analysis tool to obtain a raster data set representing the distribution of mineralization and non-mineralization points. Different symbols or colors can be set for the two kinds of points so that they are easy to distinguish on a map, and the raster data set can be used as the mineralization point and non-mineralization point variables for subsequent spatial analysis and visualization.
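The random selection of background points can be sketched as below. The sketch assumes the ratio is read as mineralization points to non-mineralization points (1:1, 1:2 or 1:5), works on grid cell indices rather than map coordinates, and uses a fixed seed only to make the example repeatable.

```python
import random


def sample_non_mineralized(all_cells, mineralized_cells, ratio=1, seed=0):
    """Pick ratio * len(mineralized_cells) random cells that are not mineralized."""
    mineralized = set(mineralized_cells)
    candidates = [cell for cell in all_cells if cell not in mineralized]
    n_needed = ratio * len(mineralized)
    if n_needed > len(candidates):
        raise ValueError("not enough background cells for the requested ratio")
    rng = random.Random(seed)
    return rng.sample(candidates, n_needed)


def build_label_layer(mineralized_cells, non_mineralized_cells):
    """Merge both point sets into one label layer: 1 = mineralized, 0 = not."""
    labels = {cell: 1 for cell in mineralized_cells}
    labels.update({cell: 0 for cell in non_mineralized_cells})
    return labels


# A hypothetical 10 x 10 grid with three marked mineralization cells.
all_cells = [(row, col) for row in range(10) for col in range(10)]
mineralized_cells = [(1, 2), (5, 5), (7, 3)]
background_cells = sample_non_mineralized(all_cells, mineralized_cells, ratio=2)
label_layer = build_label_layer(mineralized_cells, background_cells)
print(len(background_cells), sum(label_layer.values()))
```

The merged dictionary plays the role of the combined point layer that is later used as the classification layer.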
03: The geochemical data set is processed to obtain an original geochemical single element variable, a geochemical M-means central cluster analysis variable, a geochemical linear dimension-reduction projection variable and a geochemical nonlinear dimension-reduction analysis variable.
The geochemical data set can be processed to extract and obtain an original geochemical single element variable, a geochemical M-means central cluster analysis variable, a geochemical linear dimension-reduction projection variable and a geochemical nonlinear dimension-reduction analysis variable, so that the subsequent visual analysis based on the variables is facilitated.
In some embodiments, step 03 comprises the steps of:
0311: Format unification is performed on the geochemical data set to obtain a first geochemical data set.
The acquired geochemical data set may be subjected to format unification: duplicate values are removed, and the element content units in the deduplicated geochemical data set are converted from weight percent (wt%) to parts per million (ppm).
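As a small illustration of this step, the sketch below removes exact duplicate records and converts selected columns from weight percent to ppm using 1 wt% = 10,000 ppm. The record layout (a list of dictionaries with a `sample_id` key) and the element values are assumptions made for the example.

```python
PPM_PER_WEIGHT_PERCENT = 10_000.0  # 1 wt% = 10,000 ppm


def unify_samples(records, percent_elements):
    """Drop exact duplicate records and convert the listed elements from wt% to ppm."""
    seen = set()
    unified = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if key in seen:
            continue
        seen.add(key)
        converted = dict(record)
        for element in percent_elements:
            if converted.get(element) is not None:
                converted[element] = converted[element] * PPM_PER_WEIGHT_PERCENT
        unified.append(converted)
    return unified


# Made-up records: the second one is an exact duplicate of the first.
records = [
    {"sample_id": "S1", "W": 0.002, "Fe": 3.5},
    {"sample_id": "S1", "W": 0.002, "Fe": 3.5},
    {"sample_id": "S2", "W": 0.0005, "Fe": 4.0},
]
first_dataset = unify_samples(records, percent_elements=["W", "Fe"])
print(first_dataset)
```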
0312: The missing values in the first geochemical data set and their proportion are obtained, and the missing values are estimated for data whose missing-value proportion is lower than a preset proportion to obtain second geochemical data.
The missing-value proportion can be viewed by inspecting the whole data set. For data whose missing-value proportion is below a preset proportion (e.g., 10%, 15% or 20%), the missing values can be estimated and filled by algorithmic interpolation, and the filled data can then be checked. For example, the missing values may be imputed with the impKNNa() function in the robCompositions package of the R language environment, and the imputation quality checked for each element using Q-Q plots to ensure the accuracy of the output data.
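The idea of checking the missing-value proportion and then filling gaps from similar samples can be sketched in plain Python. This is a simplified stand-in and not a port of impKNNa(): it uses an ordinary Euclidean distance over the shared observed elements and the median of the k nearest neighbours, and the toy values are made up.

```python
import math
from statistics import median


def missing_ratio(column):
    """Fraction of entries in a column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def knn_impute(rows, k=3):
    """rows: samples x elements, None = missing. Returns a filled copy."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            neighbours = []
            for other in rows:
                if other is row or other[j] is None:
                    continue
                # Compare only elements observed in both samples.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                neighbours.append((dist, other[j]))
            neighbours.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in neighbours[:k]]
            if nearest:
                filled[i][j] = median(nearest)
    return filled


MAX_MISSING_RATIO = 0.20  # one of the example thresholds from the text

rows = [
    [10.0, 100.0], [11.0, None], [12.0, 120.0],
    [50.0, 500.0], [52.0, 520.0], [13.0, 130.0],
]
ratio = missing_ratio([row[1] for row in rows])
second_dataset = knn_impute(rows, k=2) if ratio < MAX_MISSING_RATIO else rows
print(round(ratio, 3), second_dataset[1])
```

In practice compositional data would call for a log-ratio based distance, which is what the cited R function is designed around.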
0313: The second geochemical data are rasterized with a point analysis tool to obtain the original geochemical single element variables.
In one embodiment, the second geochemical data may be imported into ArcGIS software to visualize each element. If the samples are unbalanced in spatial distribution, for example with dense data points in some areas and sparse data points in others, the accuracy of the interpolation results may be affected. The unbalanced samples may be processed with a spatial interpolation tool in ArcGIS, for example using two common interpolation methods, Kriging and inverse distance weighting (IDW), so that the data are richer and the geochemical data cover the target area as far as possible. The interpolated second geochemical data may then be rasterized with a point analysis tool to obtain raster data in which each pixel contains the content value of the corresponding element, and these content values are set as the original geochemical single element variables.
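Inverse distance weighting, one of the two interpolation methods named above, can be sketched in a few lines. The power of 2, the 50 m cell size and the sample values are assumptions for the example; the ArcGIS tool offers further options (search radius, barriers) that are not modelled here.

```python
import math


def idw_interpolate(x, y, samples, power=2.0):
    """samples: list of (x, y, value). Returns the IDW estimate at (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a sample point
        weight = 1.0 / d ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_grid(samples, x_min, y_min, n_cols, n_rows, cell=50.0):
    """Evaluate the IDW estimate at every cell centre of a regular grid."""
    return [
        [idw_interpolate(x_min + (c + 0.5) * cell, y_min + (r + 0.5) * cell, samples)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Two made-up sample points with element contents in ppm.
samples = [(0.0, 0.0, 10.0), (100.0, 0.0, 30.0)]
midpoint_value = idw_interpolate(50.0, 0.0, samples)
element_grid = idw_grid(samples, 0.0, 0.0, n_cols=2, n_rows=1)
print(midpoint_value, element_grid)
```

Each grid cell then holds an estimated element content, which corresponds to the per-pixel content values described above.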
0314: Each datum in the second geochemical data is converted into log-ratio data to obtain third geochemical data.
The second geochemical data can be log-transformed. Nonlinear relations often exist among geochemical element data, and a logarithmic transformation can convert them into linear relations, which simplifies data analysis. Specifically, the second geochemical data can be converted into log-ratio data with the following log conversion formula to obtain the third geochemical data:
$$\operatorname{clr}(x) = \left[\log_{10}\frac{x_1}{g(x)},\ \log_{10}\frac{x_2}{g(x)},\ \ldots,\ \log_{10}\frac{x_D}{g(x)}\right], \qquad g(x) = \left(\prod_{i=1}^{D} x_i\right)^{1/D}$$
where clr(x) represents the log-transformed values, x_1 to x_D represent the contents of the 1st to D-th elements, g(x) represents the geometric mean of the element contents, and the base of the logarithm is 10. clr(x) contains the original element contents after geometric-mean normalization and logarithmic transformation. The transformation helps stabilize the variance of the data and resolves the closure problem of compositional data, which arises because the data sum to a fixed value, making the data more suitable for further statistical analysis.
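The transform defined by these symbols (each element content divided by the geometric mean g(x), then a base-10 logarithm) can be sketched directly. The function name and the sample values are illustrative; the only assumption beyond the text is that all contents are strictly positive.

```python
import math


def clr(composition):
    """Centred log-ratio transform with base-10 logarithms."""
    if any(part <= 0 for part in composition):
        raise ValueError("all element contents must be strictly positive")
    logs = [math.log10(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # equals log10 of the geometric mean g(x)
    return [value - mean_log for value in logs]


sample_ppm = [1.0, 10.0, 100.0]  # made-up contents of three elements
clr_values = clr(sample_ppm)
print(clr_values)
```

The transformed values of one sample always sum to zero, and multiplying all contents by a constant leaves them unchanged, which is why the transform sidesteps the closure problem.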
0315: Dimension reduction analysis is performed on the third geochemical data with a linear projection dimension reduction function to obtain a double-scale graph and a characteristic root graph.
Dimension reduction analysis is performed on the third geochemical data by using a linear projection dimension reduction function, which is as follows:
$$X = T P^{\mathsf{T}} + E = \sum_{i=1}^{A} t_i\, p_i^{\mathsf{T}} + E$$
where X represents the element content matrix containing K element variables; T is the low-dimensional space matrix [t_1, t_2, t_3, …, t_A] (A < K); A represents the number of dimensions of the projection space; t_i represents a score vector, i.e., the projection values of each column of X in the latent T space; p_i represents a load vector, which describes the correlation between variables and can be obtained by calculating the linear correlation between the element variables and the score vector (the eigenvectors of the covariance matrix of the original data); and E represents the residual matrix.
After dimension reduction analysis with the linear projection dimension reduction function, a double-scale graph (biplot), a characteristic root graph (eigenvalue plot), a variable load graph (obtained from the load vectors p_i) and a sample score graph (obtained from the score vectors t_i) can be drawn. For example, the double-scale graph is obtained by drawing t_i and p_i on one graph and can be used to analyze the correlation of geochemical elements: elements in the same quadrant are positively correlated, elements in diagonally opposite quadrants are negatively correlated, elements at the two ends of a low-dimensional space direction and in diagonally opposite quadrants differ most across the overall data, and elements near the center of the graph tend to differ little among samples. The characteristic root graph is used to determine how representative the key low-dimensional projections are of the data: the first three low-dimensional spaces often represent the largest differences in the data, while the later low-dimensional projection directions often represent mineralization events or mineralization anomalies. Finally, the sample score matrix calculated by the projection dimension-reduction function is exported to obtain how representative each sample is in each low-dimensional space direction after multiple element combinations are integrated. The variable load graph shows the load of each element variable on each principal component, where the load represents the correlation between the element variable and the principal component. The sample score graph shows the score of each sample on each principal component, which helps identify patterns and clusters in the data set.
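The score vectors, load vectors, characteristic roots and residual described here match an ordinary principal component decomposition, which can be sketched with NumPy. This is an illustrative sketch under two assumptions: the data are column-centred before projection, and the decomposition is computed by singular value decomposition. The input is synthetic random data, not survey data.

```python
import numpy as np


def linear_projection(clr_data, n_components):
    """Return scores T, loadings P, eigenvalues, residual E and the centred data."""
    X = np.asarray(clr_data, dtype=float)
    X_centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # score vectors t_i as columns
    P = Vt[:n_components].T                      # load vectors p_i as columns
    eigenvalues = s ** 2 / (X.shape[0] - 1)      # characteristic roots, descending
    E = X_centered - T @ P.T                     # residual matrix
    return T, P, eigenvalues, E, X_centered


rng = np.random.default_rng(0)
synthetic = rng.normal(size=(20, 4))             # 20 samples, K = 4 variables
T, P, eigenvalues, E, X_centered = linear_projection(synthetic, n_components=2)
explained = eigenvalues / eigenvalues.sum()      # values for a characteristic root graph
print(T.shape, P.shape, np.round(explained, 3))
```

Plotting the first two columns of `T` together with the rows of `P` gives a double-scale graph, and plotting `eigenvalues` against component index gives a characteristic root graph.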
0316: The relative contribution value of each single element in the third geochemical data in each low-dimensional space is calculated, a target low-dimensional space is selected by combining the relative contribution values, the double-scale graph and the characteristic root graph, and the score value of each sample in the target low-dimensional space is set as the projection dimension-reduction function conversion value of the sample.
The relative contribution value rc of each single element in each low-dimensional space is output by applying the relative contribution-linear projection dimension reduction function to the third geochemical data, wherein the relative contribution-linear projection dimension reduction function is as follows:
where rc is the relative contribution value of a single element in a given low-dimensional space direction, p_i represents the load vector, and A represents the number of dimensions of the projection space after conversion. The larger the relative contribution value of an element in a low-dimensional space, the more representative that space is of the element. The target low-dimensional space representing the target element is selected by combining the relative contribution values with the element correlations analyzed in the double-scale graph and with the characteristic root graph. Generally, elements in the same quadrant of the double-scale graph are positively correlated and elements in diagonally opposite quadrants are negatively correlated, so the double-scale graph is used to select the element combination that is positively correlated with the target element, while the relative contribution values of that element combination in the low-dimensional space are also taken into account. The score t_i of each sample in the selected target low-dimensional space is then set as the projection dimension-reduction function conversion value of the sample.
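The formula for rc is not reproduced in the text, so the sketch below uses one plausible form as an explicit assumption: the squared loading of an element on one dimension divided by the sum of its squared loadings over all A dimensions. The function names and the loading values are hypothetical.

```python
def relative_contributions(loadings):
    """loadings[k][a]: loading of element k on dimension a.

    Assumed form: rc[k][a] = p[k][a]**2 / sum over a of p[k][a]**2.
    """
    rc = []
    for row in loadings:
        total = sum(p * p for p in row)
        rc.append([p * p / total if total > 0 else 0.0 for p in row])
    return rc


def most_representative_dimension(rc_row):
    """Index of the dimension with the largest relative contribution."""
    return max(range(len(rc_row)), key=lambda a: rc_row[a])


# Made-up loadings of two elements on A = 2 dimensions.
elements = ["W", "Cu"]
loadings = [[0.6, 0.8], [0.9, 0.1]]
rc = relative_contributions(loadings)
target = {name: most_representative_dimension(row) for name, row in zip(elements, rc)}
print(rc, target)
```

Under this assumed form the contributions of one element sum to 1, so they can be compared directly across dimensions when choosing the target space.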
0317: Matching the sample coordinates with corresponding conversion values of the projection dimension reduction function to generate a projection dimension reduction function conversion visual map; and converting the visual map based on the projection dimension-reduction function to obtain the geochemical linear dimension-reduction projection variable.
Whether the samples are spatially unbalanced is judged from the projection dimension-reduction function conversion visual map. If so, data-sparse regions are supplemented by interpolation, using methods such as Kriging or inverse distance weighting, so that the geochemical data cover the target area as far as possible. The data are then rasterized with a point analysis tool, which can be the point analysis tool in ArcGIS software, to obtain the geochemical linear dimension-reduction projection variable.
0318: And carrying out cluster analysis on the third geochemical data to obtain an M-means central cluster grouping result.
The third geochemical data may be subjected to cluster analysis using various cluster analysis functions; for example, an M-means central clustering function may be used, as follows:
First, the number of groups (the cluster number M) is set, and M data points are randomly selected as the initial cluster centers, giving the centroids x_M. The distance between each data point x_im and each centroid x_M is calculated, and the data point is assigned to the class of the nearest centroid. The mean of all data points in each class is then calculated and taken as the new centroid. These two steps are repeated until the overall within-group distribution value W reaches its minimum, which completes the cluster analysis. A suitable number of mean centers (M value) is selected by combining the biplot result of the linear projection dimension-reduction analysis with an analysis of the mineralization patterns of the study area, and the cluster analysis result of each sample is stored and assigned to that sample.
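The iteration described above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and takes W to be the within-cluster sum of squared distances to the centroids; the synthetic data and the function names are hypothetical.

```python
import numpy as np

def nearest(X, centroids):
    """Index of the closest centroid for every row of X."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def m_means(X, M, n_iter=100, seed=0):
    """Cluster X into M groups; return labels, centroids and the within-group value W."""
    rng = np.random.default_rng(seed)
    # Randomly pick M data points as the initial cluster centers.
    centroids = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(X, centroids)                   # assignment step
        new_centroids = np.array([
            X[labels == m].mean(axis=0) if np.any(labels == m) else centroids[m]
            for m in range(M)
        ])                                               # update step
        if np.allclose(new_centroids, centroids):        # centroids stopped moving
            break
        centroids = new_centroids
    labels = nearest(X, centroids)
    W = float(sum(((X[labels == m] - centroids[m]) ** 2).sum() for m in range(M)))
    return labels, centroids, W

# Synthetic data: three groups of 30 samples in a two-variable space.
rng = np.random.default_rng(1)
centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in centers])

labels, centroids, W = m_means(X, M=3)
total_ss = float(((X - X.mean(axis=0)) ** 2).sum())      # spread with a single group
```

The result depends on the random initial centers, so repeated runs with different seeds, keeping the run with the smallest W, are a common precaution.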
0319: Matching the sample coordinates with the M-means central clustering grouping result to form a geochemical data clustering analysis visual map; and obtaining a geochemistry M-means central cluster analysis variable based on the geochemistry data cluster analysis visual map.
The sample coordinates are matched with the M-means central clustering grouping results so that each sample coordinate has a clustering grouping result, and a geochemical data cluster analysis visual map is built from all sample coordinates. This map is used to identify samples with similar geochemical data structures (which form a cluster) and to observe the spatial distribution of the different clusters. If data-sparse regions exist, the cluster values of those regions are estimated from the geochemical data of the existing sample points by interpolation, for example Kriging or inverse distance weighting, so that the geochemical data cover the target area as fully as possible. The interpolated geochemical data are then rasterized with the point analysis tool in ArcGIS software to obtain the geochemical M-means central cluster analysis variable.
0320: And carrying out random nonlinear dimension reduction analysis on the third geochemical data, and calculating a score value of each sample in each nonlinear low-dimensional space direction.
The random nonlinear dimension-reduction analysis may be performed on the third geochemical data using a nonlinear projection dimension-reduction function. Specifically, this may include calculating the similarity in the high-dimensional space and the similarity in the low-dimensional space, and minimizing the relative entropy between the high-dimensional and low-dimensional spaces as the optimization target, so that the high-dimensional similarity is preserved in the low-dimensional space as far as possible. In one specific embodiment, the mathematical calculation includes the following formulas:
A_ij represents the similarity between data points x_i and x_j in the high-dimensional space. The Gaussian kernel function measures similarity through an exponential term, and σ_i is the variance parameter of the Gaussian distribution, which can be set from a fixed standard deviation or determined from the distances between the data points.
B_ij represents the similarity between data points y_i and y_j in the low-dimensional space, which can be measured using the reciprocal of a squared-distance term.
PQ is the relative entropy to be minimized. The low-dimensional embedding Y = {y_1, y_2, y_3, …, y_n} is adjusted by gradient descent so that B_ij is as close as possible to A_ij.
Specifically, for the high-dimensional data matrix X = {x_1, x_2, x_3, …, x_n}, each high-dimensional data point x_i is mapped to a point y_i in the low-dimensional space by the nonlinear projection dimension-reduction function. First, the similarity in the high-dimensional space is calculated with formula (1); then the similarity in the low-dimensional space is calculated with formula (2); finally, formula (3) is used to minimize the relative entropy (PQ divergence) between the high-dimensional and low-dimensional spaces as the optimization target, with gradient descent used to preserve the high-dimensional similarity in the low-dimensional space as far as possible. The grouping of each sample uses the result obtained in step 0318. The dimension of the low-dimensional space of the nonlinear projection dimension-reduction function may be set to 3 (or another value), and the score value of each sample in each nonlinear low-dimensional space direction, as calculated by the function, is finally output.
0321: Matching the sample coordinates with the score values to generate a nonlinear dimension reduction visual map of the geochemical data; and obtaining the geochemistry nonlinear dimension reduction analysis variable based on the geochemistry data nonlinear dimension reduction visualization.
The sample coordinates are matched with the score values in each nonlinear low-dimensional space direction to form the geochemical data nonlinear dimension-reduction visual map in ArcGIS. If the samples are unevenly distributed, data are supplemented in ArcGIS by interpolation, for example Kriging or inverse distance weighting, so that the geochemical data cover the target area as fully as possible. The data are then rasterized with the point analysis tool in ArcGIS to obtain the geochemical nonlinear dimension-reduction analysis variable.
04: And processing the geophysical data set to obtain a geophysical M-means central cluster analysis variable.
The geophysical data set can be subjected to cluster analysis processing to obtain a geophysical M-means central cluster analysis variable. In one embodiment, step 04 includes: performing M-means central cluster analysis on the geophysical data set to obtain a cluster analysis result of each geophysical sampling point; matching the geophysical sample coordinates with clustering grouping results to generate a geophysical data clustering analysis visual map; and supplementing geophysical data based on the geophysical data cluster analysis visual map, and carrying out rasterization processing on the supplemented geophysical data set to obtain a geophysical M-means central cluster analysis variable.
Cluster analysis is performed on the geophysical data set (comprising inverted magnetic susceptibility, resistivity and density data) using the M-means central clustering function with the number of groups (M value) set to 3. The classification result of each sample is stored and assigned to that sample, giving the cluster analysis result of each geophysical sampling point. The geophysical sample coordinates are matched with the M-means central clustering grouping results to form the geophysical data cluster analysis visual map, from which it is judged whether the samples are unevenly distributed. If so, data are supplemented in ArcGIS by Kriging or inverse distance weighting interpolation so that the geophysical data cover the study area as fully as possible, and the data are rasterized with the point analysis tool to obtain the geophysical M-means central cluster analysis variable. The M-means central clustering function is the same as the function in step 0318.
05: Fusing lithologic variables, fault variables, rock variables, geophysical M-means central clustering analysis variables, original geochemical single element variables, geochemical M-means central clustering analysis variables, geochemical linear dimension-reduction projection variables and geochemical nonlinear dimension-reduction analysis variables to obtain fused multi-element prediction variable layers; mineralization point and non-mineralization point variables are used as classification layers.
The lithology variables, fault variables, rock mass variables, geophysical M-means central cluster analysis variables, original geochemical single element variables, geochemical M-means central cluster analysis variables, geochemical linear dimension-reduction projection variables and geochemical nonlinear dimension-reduction analysis variables are all used as prediction variables and are all raster data; the mineralization point and non-mineralization point variables are used as classification variables and are also raster data. The prediction variables (such as lithology variables, fault variables and rock mass variables) can be overlaid and fused at a uniform resolution (for example, 50 m × 50 m) to obtain the multi-element prediction variable layer. For example, the prediction variables can be overlaid and fused at a resolution of 50 m using the layer stacking tool in the remote sensing image processing software ENVI.
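As an illustration of the fusion step outside ENVI, the rasters can be stacked into a single array once they share the same grid. This sketch assumes every layer has already been resampled to the same 50 m grid, and it assumes a coding of the classification layer in which 1 is a mineralization point, 0 a non-mineralization point and -1 an unlabelled cell. The layer names and values are hypothetical.

```python
import numpy as np

def stack_layers(layers):
    """Stack equally shaped rasters into one cube of shape (rows, cols, n_layers)."""
    names = list(layers)
    shapes = {layers[name].shape for name in names}
    if len(shapes) != 1:
        raise ValueError("all layers must share the same grid before fusion")
    cube = np.stack([layers[name] for name in names], axis=-1)
    return names, cube

def labelled_samples(cube, class_layer):
    """Return one feature row per labelled cell, with the matching class label."""
    X = cube.reshape(-1, cube.shape[-1])
    y = class_layer.reshape(-1)
    keep = y >= 0                          # -1 marks unlabelled cells (assumed coding)
    return X[keep], y[keep]

rows, cols = 4, 5
rng = np.random.default_rng(0)
layers = {
    "lithology": rng.integers(0, 3, size=(rows, cols)).astype(float),
    "fault": rng.random((rows, cols)),
    "geochem_cluster": rng.integers(0, 3, size=(rows, cols)).astype(float),
}
class_layer = np.full((rows, cols), -1)
class_layer[0, 0] = 1                      # mineralization points
class_layer[1, 2] = 1
class_layer[2, 1] = 0                      # non-mineralization points
class_layer[3, 4] = 0

names, cube = stack_layers(layers)
X, y = labelled_samples(cube, class_layer)
```

The cube plays the role of the multi-element prediction variable layer, and the rows of X are what a classifier would be trained on.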
06: And respectively inputting the multi-element prediction variable layers into at least three different machine learning models to obtain classification probability values of each point output by each machine learning model, wherein the classification labels of each machine learning model are classification layers.
At least three different machine learning models are built in advance, a multi-element prediction variable layer can be input into each machine learning model, classification labels of each machine learning model are classification layers, and the machine learning model can output classification probability values of each point according to each prediction variable in the multi-element prediction variable layer and the classification layers. The present application is described by taking three machine learning models as an example, and the present application is not limited to three, but may be four, five, six or more, and is not particularly limited herein. The three machine learning models are a Random Forest model (Random Forest), a support vector machine model (Support Vector Machine), and a multi-layer perceptron model (Multilayer Perceptron, MLP), respectively. When training a random forest model, a support vector machine model and a multi-layer perceptron model, a multi-element predicted variable layer and mineralized point and unmineralized point variable layers of a known mine point area in a target area can be obtained based on the steps 01 to 05 for training.
For the random forest model, during training the multi-element prediction variable layer can be input into the random forest model, with the mineralization point and non-mineralization point variable layer (namely the classification layer) used as the classification label, and the classification probability value of each point is output. The coordinates of each point are matched with the classification probability value to form a mineral classification probability map based on the random forest classification model, in which high-value areas are high-probability mineralization points and low-value areas are low-probability mineralization points or non-mineralization points. A sample learning curve is then drawn to confirm the number of trees and the node selection number, and a Gini index plot is output in which the contribution of each variable is ranked from high to low.
The sample learning curve is drawn mainly from the ratio of the number of correctly predicted deposit points to the number of all predicted points, and different model parameters change how quickly the curve reaches the best prediction accuracy. The sample learning curve is therefore used to evaluate the performance of the model at different training set sizes, which helps to show whether the model suffers from high variance (overfitting) or high bias (underfitting); a balance can then be reached by adjusting the model parameters (including the number of trees and the node selection number) to obtain higher accuracy and better prediction performance. In the random forest model, the number of trees is an important hyperparameter; when the learning curve is drawn, the optimal number of trees can be determined by observing where the error rate of the model stabilizes as the number of trees increases. During the construction of each decision tree of the random forest, the node selection number determines the size of the feature subset randomly selected at each node; by drawing the learning curve, the change in the error rate of the model for different node selection numbers can be observed, and the optimal node selection number can be determined. The Gini index plot can be used to extract the variables that contribute most to deposit classification, and thus the prediction variables that are most important to deposit prediction.
Drawing the sample learning curve and outputting the Gini index plot help in understanding the performance of the model and the role of each feature in it, which guides the optimization of the model and improves the prediction accuracy, so that the optimized random forest prediction model is finally obtained.
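A possible way to carry out this tuning with scikit-learn is sketched below. It is an illustration only: the out-of-bag error is used here as a stand-in for the error rate read from the learning curve, the node selection number is assumed to correspond to the `max_features` parameter, and the impurity-based `feature_importances_` are assumed to play the role of the Gini index ranking. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 400 labelled cells, 6 prediction variable layers.
rng = np.random.default_rng(0)
n_cells, n_layers = 400, 6
X = rng.standard_normal((n_cells, n_layers))
# Only the first two layers carry signal in this toy example.
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(n_cells) > 0).astype(int)

# Error rate for each combination of tree count and features tried per node.
oob_error = {}
for n_trees in (50, 100, 200):
    for max_features in (1, 2, 3):
        rf = RandomForestClassifier(
            n_estimators=n_trees,
            max_features=max_features,
            oob_score=True,
            random_state=0,
        ).fit(X, y)
        oob_error[(n_trees, max_features)] = 1.0 - rf.oob_score_

best_trees, best_max_features = min(oob_error, key=oob_error.get)
final_rf = RandomForestClassifier(
    n_estimators=best_trees, max_features=best_max_features, random_state=0
).fit(X, y)

probability = final_rf.predict_proba(X)[:, 1]     # probability of the mineralized class
importance = final_rf.feature_importances_        # impurity-based importance per layer
ranking = np.argsort(importance)[::-1]            # layers ordered from high to low
```

Plotting `oob_error` against the number of trees gives the kind of curve the text describes, flattening once additional trees no longer reduce the error.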
For the support vector machine model, the multi-element prediction variable layer can be input into the support vector machine model, with the mineralization point and non-mineralization point variable layer used as the classification label, and the classification probability value of each point is output. The coordinates of each point are matched with the classification probability value, and a mineral classification probability map based on the support vector machine model is formed from the matching result, in which high-value areas are high-probability mineralization points and low-value areas are low-probability mineralization points or non-mineralization points. When the support vector machine model is trained, a sample learning curve can be drawn to confirm the choice of optimal hyperparameters, and the final support vector machine model is obtained after iterative training. The sample learning curve of the support vector machine also depends on the parameter settings of the model, such as the penalty coefficient C and the kernel function parameters; the best parameters can be obtained through a grid search with 10-fold cross-validation, after which the learning efficiency of the model is checked with the sample learning curve.
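The parameter search can be sketched with scikit-learn as follows. This is an illustration under assumptions: an RBF kernel, a small grid of C and gamma values, standardized inputs and AUC as the selection score are all choices made for the example, not values given in the application. The data are synthetic.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 400 labelled cells, 6 prediction variable layers.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(400) > 0).astype(int)

# Scaling and the classifier are tuned together so each fold is scaled on its own data.
pipeline = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", probability=True, random_state=0),
)
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],            # penalty coefficient
    "svc__gamma": ["scale", 0.01, 0.1],    # kernel function parameter
}
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=folds, scoring="roc_auc")
search.fit(X, y)

best_params = search.best_params_
probability = search.predict_proba(X)[:, 1]   # probability of the mineralized class
```

After the search, the refitted best estimator is what would be applied to every cell of the multi-element prediction variable layer to build the probability map.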
For the multi-layer perceptron model, the multi-element prediction variable layer is input into the multi-layer perceptron model, with the mineralization point and non-mineralization point variable layer used as the classification label, and the classification probability value of each point is output. The coordinates of each point are matched with the classification probability value to form a mineral classification probability map based on the multi-layer perceptron model, in which high-value areas are high-probability mineralization points and low-value areas are low-probability mineralization points or non-mineralization points. The MLP algorithm can be used to solve the classification problem. It comprises an input layer, at least one hidden layer and an output layer, and the MLP neural network consists of distributed neurons and weighted connections: each neuron contains a nonlinear activation function, and the connections between neurons carry the associated weights. The activation function generates a nonlinear decision boundary from a nonlinear combination of the weighted inputs, which increases the nonlinearity and expressive power of the neural network model. The activation functions may include the tanh function, the sigmoid function and the ReLU function. The MLP is trained by finding the connection weights that minimize the deviation of the actual output from the desired output (that is, the loss function). The weights are calibrated through an iterative training process, and a back-propagation algorithm based on gradient descent makes this iterative training more efficient. During training, the back-propagation algorithm searches for the combination of weights that gives the smallest squared error, and thereby outputs the best result of the trained network.
Training the MLP involves selecting the structure (the number of hidden layers and the number of nodes per layer), the weight optimization solver, the maximum number of iterations, the regularization parameters and so on, in order to achieve the best result and prevent overfitting. The importance of the input variables can be ranked with tools such as SHapley Additive exPlanations (SHAP) to find the most important prediction variable layers.
The activation functions are calculated as follows:
the tanh function: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
the sigmoid function: $\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$
the ReLU function: $\mathrm{ReLU}(x) = \max(x, 0)$
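The three activation functions, written in their standard textbook forms, and their place in a forward pass can be sketched as follows. The single hidden layer, the layer sizes and the random weights are hypothetical choices for illustration; an actual model would learn the weights by back-propagation.

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x)."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def sigmoid(x):
    """Logistic function: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified linear unit: max(x, 0)."""
    return np.maximum(x, 0.0)

def mlp_forward(x, W1, b1, W2, b2, activation=relu):
    """One hidden layer followed by a sigmoid output giving a probability per sample."""
    hidden = activation(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 6))                  # 5 cells, 6 prediction variables
W1, b1 = rng.standard_normal((6, 4)), np.zeros(4)
W2, b2 = rng.standard_normal((4, 1)), np.zeros(1)
probability = mlp_forward(x, W1, b1, W2, b2, activation=tanh).ravel()
```

The written-out tanh overflows for very large inputs, so library versions such as `np.tanh` are preferable in real use.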
After training to obtain a final random forest model, a support vector machine model and a multi-layer perceptron model, the multi-element prediction variable layer can be respectively input into the random forest model, the support vector machine model and the multi-layer perceptron model, and output classification evaluation data of each model can be obtained.
07: And selecting the machine learning model with the best performance as a final mine point classification model based on the classification evaluation data of all the machine learning models.
The classification probability values of all points output by each machine learning model form classification evaluation data of the machine learning models, and all the machine learning models can be compared based on the classification evaluation data of all the machine learning models so as to judge which machine learning model predicts a target area more accurately, and then the machine learning model with the best performance can be selected as a final mine point classification model of the target area. It can be understood that the final mine point classification models of different target areas are different, and the geological differences of different target areas are larger, so that different machine learning models have different prediction precision for different geology, and the final mine point classification model optimally matched with the target area is selected by comparing the output data of a plurality of machine learning models, so that the accuracy of prospecting is improved, and the prospecting efficiency is improved.
In some embodiments, step 07 includes the following steps 071 to 073.
071: Respectively drawing corresponding ROC curves according to classification evaluation data of each machine learning model;
072: based on ROC curves corresponding to the machine learning models, respectively calculating corresponding AUC values;
073: and comparing the AUC values corresponding to the machine learning models, and selecting the machine learning model with the largest AUC value as a final mine classification model.
The ROC curve (Receiver Operating Characteristic curve) is a tool for evaluating the performance of a classification model, and the AUC (Area Under the Curve) is the area under the ROC curve, which quantifies the performance of the model over the entire range of classification thresholds. By comparing the ROC curves and AUC values of different models, their classification efficiency and prediction accuracy can be evaluated and compared. For each model, the true labels and the predicted probabilities on the test set are used to calculate the ROC curve and the AUC value. The closer the AUC value is to 1, the better the classification performance of the model, so the model with the highest AUC value can be selected as the final mine point classification model. For example, if the AUC value of the multi-layer perceptron model is the highest, the multi-layer perceptron model is selected as the final mine point classification model.
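Steps 071 to 073 can be sketched as follows. The labels and the three sets of predicted probabilities are made-up numbers chosen for illustration, not results from the application, and the function names are hypothetical.

```python
import numpy as np

def roc_points(y_true, score):
    """False and true positive rates at every distinct score threshold."""
    y_true = np.asarray(y_true)
    score = np.asarray(score, dtype=float)
    order = np.argsort(-score, kind="stable")      # highest score first
    y_sorted, s_sorted = y_true[order], score[order]
    # Keep one point per distinct score so tied scores share a threshold.
    last = np.r_[np.where(np.diff(s_sorted) != 0)[0], len(s_sorted) - 1]
    tp = np.cumsum(y_sorted == 1)[last]
    fp = np.cumsum(y_sorted == 0)[last]
    tpr = np.r_[0.0, tp / tp[-1]]
    fpr = np.r_[0.0, fp / fp[-1]]
    return fpr, tpr

def auc_value(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# 1 = mineralization point, 0 = non-mineralization point (test set labels).
y_test = [1, 1, 0, 1, 0, 0]
predicted = {
    "random_forest": [0.9, 0.8, 0.7, 0.6, 0.3, 0.1],
    "support_vector_machine": [0.9, 0.4, 0.7, 0.6, 0.3, 0.1],
    "multilayer_perceptron": [0.9, 0.8, 0.2, 0.7, 0.3, 0.1],
}

auc = {name: auc_value(*roc_points(y_test, p)) for name, p in predicted.items()}
final_model = max(auc, key=auc.get)                # model with the largest AUC
```

In this toy case the multi-layer perceptron separates the two classes perfectly and is selected, which mirrors the example given in the text.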
08: And outputting a mineral distribution probability map of the target area based on the final mine point classification model.
The multi-element prediction variable layer of the target area can be input into the final mine point classification model to obtain the mineral distribution probability map of the target area, and areas with potential mineral resources can be identified from this map, which provides guidance for subsequent exploration and development work. The target area can be a part of the study area in which the ore distribution is known. The machine learning models can be trained on this part, and the best-performing model selected; the final mine point classification model can then output the mineral distribution probability map of the study area that contains the target area, giving the probability map of the whole study area. This narrows the range of mineralization anomalies, further reduces the target area in which verification drilling is laid out, and improves prospecting efficiency.
In some embodiments, in step 01, if the remote sensing dataset of the target area is obtained, the remote sensing dataset may be obtained by obtaining satellite hyperspectral data of the target area, and the resolution of the remote sensing dataset is the same as the geological dataset, the geochemical dataset and the geophysical survey dataset.
The method for locating the target area for prospecting further comprises: processing the remote sensing data set to obtain the altered mineral distribution variable. ENVI can be used to open the remote sensing data set and import the distribution maps of the different minerals into ArcGIS software. Each typical mineral distribution map is rasterized with the polygon analysis tool in ArcGIS software to obtain the altered mineral distribution variable.
Step 05 further comprises: taking the altered mineral distribution variable as a prediction variable, and fusing the lithology variable, the fault variable, the rock mass variable, the original geochemical single element variable, the geochemical linear dimension-reduction projection variable, the geochemical M-means central cluster analysis variable, the geochemical nonlinear dimension-reduction analysis variable and the altered mineral distribution variable to obtain the multi-element prediction variable layer. This can improve the accuracy of the mineral distribution probability map output by the subsequent model.
It should be noted that the foregoing embodiments are described using ArcGIS software only as an example. The present application is not limited to ArcGIS software; other similar software in the art, such as QGIS or MapGIS, may be used, and the software is not listed exhaustively here.
The method for locating the mining target area disclosed in the application explores the linear and nonlinear correlations in the original geological data (the geological data set, geochemical data set and geophysical data set), fuses them through cluster analysis into a new data structure, reasonably highlights mineralization-related anomalies and further weakens anomalies unrelated to mineralization, thereby narrowing the mining target area and locating it quickly. In selecting the machine learning prediction model, at least three machine learning models are comprehensively compared and evaluated, and the algorithm with the highest accuracy, stability and time efficiency is selected for final training. The best mining target area location is obtained from the probability map of the working area, which improves the precision and efficiency of subsequent target area work, can greatly reduce the investment of manpower, materials and funds in prospecting for deposits of the same type, and helps to improve prospecting efficiency and the level of prospecting technology in ecologically fragile plateau areas.
Further, in the early stage of data processing, several cluster analysis and dimension-reduction analysis methods are applied to explore the internal structure of the data and to highlight reasonable combinations of mineralization anomalies. A machine learning algorithm is used to extract the characteristics of mineralization points and non-mineralization points, and several prediction models are established. A favorable target area for mineral exploration is delineated quickly by outputting the probability map of the working area, reliable mineralization information is extracted with the interpreter of a conventional machine learning model, and the final mining target area is located with the algorithm that has the highest evaluation score.
As shown in fig. 2, the application further provides a mining target area positioning system based on multi-element data, which comprises an acquisition module 201, a first processing module 202, a second processing module 203, a third processing module 204, a fusion module 205, an input module 206, a selection module 207 and an output module 208, wherein the acquisition module 201 is used for acquiring a geological data set, a geochemical data set and a geophysical data set of a target area; a first processing module 202, configured to process the geological data set to obtain lithology variables, fault variables, rock variables, mineralization points and non-mineralization point variables; the second processing module 203 is configured to process the geochemical dataset to obtain an original geochemical single element variable, a geochemical M-means central cluster analysis variable, a geochemical linear dimension-reduction projection variable, and a geochemical nonlinear dimension-reduction analysis variable; a third processing module 204, configured to process the geophysical dataset to obtain a geophysical M-means central cluster analysis variable; the fusion module 205 is configured to fuse a lithologic variable, a fault variable, a rock variable, a geophysical M-means central cluster analysis variable, an original geochemical single element variable, a geochemical M-means central cluster analysis variable, a geochemical linear dimension-reduction projection variable, and a geochemical nonlinear dimension-reduction analysis variable to obtain a fused multi-element prediction variable layer; taking mineralized point and non-mineralized point variables as classification layers; the input module 206 is configured to input the multiple prediction variable layers into at least three different machine learning models respectively, and obtain a classification probability value of each point output by each machine learning model, where a classification label of each machine learning model is a classification layer; a selection module 207, configured to select, as a final mine classification model, a machine learning model with the best performance based on classification evaluation data of all machine learning models; and the output module 208 is used for outputting a mineral distribution probability map of the target area based on the final mine point classification model.
The specific content of each module can refer to the related content of the mine finding target area positioning method based on the multivariate data, and the description is omitted herein.
As shown in fig. 3, the present application further provides an electronic device, including: at least one processor 301; and a memory 302 communicatively coupled to the at least one processor 301; the memory 302 stores instructions executable by the at least one processor 301 to enable the at least one processor 301 to perform the method for locating a mining target area based on multivariate data according to any of the above embodiments.
Memory 302 may comprise high-speed RAM memory or may also include non-volatile memory (non-volatile memory), such as at least one disk memory. Further, the electronic device may also comprise a communication interface 303, the communication interface 303 being for communication between the memory 302 and the processor 301.
If the memory 302, the processor 301, and the communication interface 303 are implemented independently, the communication interface 303, the memory 302, and the processor 301 may be connected to each other and communicate with each other through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 3, but this does not mean that there is only one bus or one type of bus.
In a specific implementation, if the memory 302, the processor 301, and the communication interface 303 are integrated on a chip, the memory 302, the processor 301, and the communication interface 303 may communicate with each other through internal interfaces.
Processor 301 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
As shown in fig. 4, the present application further provides a computer readable storage medium storing computer instructions 401, where the computer instructions 401 are configured to cause a computer 402 to execute the method for locating a mining target area based on multivariate data according to any one of the embodiments described above.
Application example 1
In a specific implementation of the application, the method for locating a mining target area based on multivariate data was applied to quickly locate the mining target area at a large tungsten deposit in Tibet. This quartz-vein-type wolframite deposit is located in the western section of the Bangong Lake-Nujiang metallogenic belt in Tibet, at an average altitude of 4800-5200 m, in a cold, high-altitude region. The strata exposed in the mining area are simple, consisting mainly of ophiolitic mélange and Quaternary deposits. The intrusive rocks are mainly biotite monzogranite porphyry and granite. The quartz-vein-type tungsten deposit has a rich variety of vein types, and the crosscutting relations between different veins are clear. Based on existing research results and field geological investigation, geological data (1:50000 mineral geological survey data, in an ArcGIS-readable format) and geochemical data (1:50000 stream sediment geochemical data) of the mining area were obtained. Lithology variables, fault variables, rock mass variables, mineralization point and non-mineralization point variables, original geochemical single element variables, geochemical M-means central cluster analysis variables, geochemical linear dimension-reduction projection variables and geochemical nonlinear dimension-reduction analysis variables were fused, for a total of 25 layer variables. The performance of three machine learning models was compared, and a random forest model was finally selected as the prediction model (fig. 5).
According to the field records of the regional geological survey, observation points where obvious wolframite or wolframite-bearing quartz veins occur were defined as mineralization points, giving 138 mineralization point variables in total. Random points were taken as non-mineralization point variables at ratios of 1:1 and 1:5, binary classification was performed on the spatial data, and the random forest finally achieved an accuracy of 98%. Comparing the target area prediction models at the two ratios, the mineralization anomaly target area (red frame) in fig. 5B is clearly smaller than that (yellow frame) in fig. 5A. It can be seen that fusing the multivariate data variables through the random forest model can quickly narrow the mining target area, highlight mineralization anomalies and assist mining target area prediction.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided that they do not contradict one another.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (8)

1. A method for locating a prospecting target area based on multivariate data, characterized by comprising the following steps:
acquiring a geological data set, a geochemical data set and a geophysical data set of a target area;
processing the geological data set to obtain lithology variables, fault variables, rock mass variables, and mineralization point and non-mineralization point variables;
processing the geochemical data set to obtain an original geochemical single element variable, a geochemical M-means central cluster analysis variable, a geochemical linear dimension-reduction projection variable and a geochemical nonlinear dimension-reduction analysis variable;
processing the geophysical data set to obtain a geophysical M-means central cluster analysis variable;
fusing the lithology variables, the fault variables, the rock mass variables, the geophysical M-means central cluster analysis variable, the original geochemical single element variable, the geochemical M-means central cluster analysis variable, the geochemical linear dimension-reduction projection variable and the geochemical nonlinear dimension-reduction analysis variable to obtain a fused multi-element prediction variable layer; taking the mineralization point and non-mineralization point variables as a classification layer;
inputting the multi-element prediction variable layer into each of at least three different machine learning models to obtain a classification probability value for each point output by each machine learning model, wherein the classification label of each machine learning model is the classification layer;
selecting the machine learning model with the best performance as a final mine point classification model based on classification evaluation data of all the machine learning models;
outputting a mineral distribution probability map of the target area based on the final mine point classification model;
wherein processing the geochemical data set to obtain the original geochemical single element variable, the geochemical M-means central cluster analysis variable, the geochemical linear dimension-reduction projection variable and the geochemical nonlinear dimension-reduction analysis variable comprises:
performing format unification on the geochemical data set to obtain a first geochemical data set;
identifying missing values in the first geochemical data set, obtaining the missing value proportion, and calculating the missing values for data whose missing value proportion is lower than a preset proportion to obtain second geochemical data;
rasterizing the second geochemical data by using a point analysis tool to obtain the original geochemical single element variable;
converting each datum in the second geochemical data into log-ratio data to obtain third geochemical data;
performing dimension-reduction analysis on the third geochemical data by using a linear projection dimension-reduction function to obtain a biplot and an eigenvalue plot;
calculating the relative contribution value of each single element in the third geochemical data in each low-dimensional space, selecting a target low-dimensional space by combining the relative contribution values, the biplot and the eigenvalue plot, and setting the score value of a sample in the target low-dimensional space as the projection dimension-reduction function conversion value of the sample;
matching the sample coordinates with the corresponding projection dimension-reduction function conversion values to generate a projection dimension-reduction function conversion visual map; obtaining the geochemical linear dimension-reduction projection variable based on the projection dimension-reduction function conversion visual map;
performing cluster analysis on the third geochemical data to obtain an M-means central cluster grouping result;
matching the sample coordinates with the M-means central cluster grouping result to form a geochemical data cluster analysis visual map; obtaining the geochemical M-means central cluster analysis variable based on the geochemical data cluster analysis visual map;
performing random nonlinear dimension-reduction analysis on the third geochemical data, and calculating a score value of each sample in each nonlinear low-dimensional space direction;
matching the sample coordinates with the score values to generate a geochemical data nonlinear dimension-reduction visual map; obtaining the geochemical nonlinear dimension-reduction analysis variable based on the geochemical data nonlinear dimension-reduction visual map;
wherein processing the geophysical data set to obtain the geophysical M-means central cluster analysis variable comprises:
performing M-means central cluster analysis on the geophysical data set to obtain a cluster grouping result for each geophysical sampling point;
matching the geophysical sample coordinates with the cluster grouping result to generate a geophysical data cluster analysis visual map;
supplementing the geophysical data based on the geophysical data cluster analysis visual map, and rasterizing the supplemented geophysical data set to obtain the geophysical M-means central cluster analysis variable.
2. The method for locating a prospecting target area according to claim 1, wherein processing the geological data set to obtain lithology variables, fault variables, rock mass variables, and mineralization point and non-mineralization point variables comprises:
rasterizing the outcrops of rocks of different lithologies in the geological data set by using a surface analysis tool to obtain the lithology variables;
setting multi-level buffer zones for fracture zones or faults with different strikes in the geological data set by using a line analysis tool, extracting the strikes of the fracture zones or faults, classifying the fracture zones or faults according to their strikes, and rasterizing the classified fracture zones or faults to obtain the fault variables;
rasterizing the magmatic intrusions and strata in the geological data set by using a surface analysis tool to obtain the rock mass variables;
marking locations with obvious surface mineralization in the geological data set as mineralization points, randomly selecting other locations as non-mineralization points according to a preset proportion, and rasterizing the mineralization points and the non-mineralization points by using a point analysis tool to obtain the mineralization point and non-mineralization point variables.
3. The method for locating a prospecting target area according to claim 1, wherein performing cluster analysis on the third geochemical data to obtain an M-means central cluster grouping result comprises: performing cluster analysis on the third geochemical data by using an M-means central cluster function, wherein the M-means central cluster function is as follows:
where M is the number of groups, X_M is the centroid, and X_im is a data point;
and wherein performing random nonlinear dimension-reduction analysis on the third geochemical data and calculating the score value of each sample in each nonlinear low-dimensional space direction comprises: performing random nonlinear dimension-reduction analysis on the third geochemical data by using a nonlinear projection dimension-reduction function, wherein the calculation formulas of the nonlinear projection dimension-reduction function comprise:
A_ij represents the similarity between data points x_i and x_j in the high-dimensional space, and σ_i is the variance of the Gaussian distribution;
B_ij represents the similarity between data points y_i and y_j in the low-dimensional space;
PQ is the minimized relative entropy.
4. The method of claim 1, wherein the at least three different machine learning models include a random forest model, a support vector machine model, and a multi-layer perceptron model;
training the random forest model comprises: drawing a sample learning curve, and confirming the number of trees and the node selection number based on the sample learning curve; outputting a Gini index chart, and confirming the contribution value of each prediction variable in the multi-element prediction variable layer based on the Gini index chart, so as to adjust the type and/or weight of the prediction variables in the multi-element prediction variable layer;
training the support vector machine model comprises: drawing a sample learning curve, and confirming the number of samples introduced based on the sample learning curve; outputting regression coefficients to identify the primary prediction variables;
the multi-layer perceptron model comprises an input layer, one or more hidden layers and an output layer, the hidden layers comprising a plurality of neurons interconnected by weighted connections, each neuron comprising a nonlinear activation function, the nonlinear activation function comprising at least one of a tanh function, a sigmoid function and a ReLU function; and training the multi-layer perceptron model comprises: training the multi-layer perceptron model with a gradient-descent-based back-propagation algorithm.
5. The method of claim 1, wherein selecting the machine learning model with the best performance as a final mine point classification model based on classification evaluation data of all the machine learning models comprises:
drawing a corresponding ROC curve for each machine learning model according to its classification evaluation data;
calculating a corresponding AUC value from the ROC curve of each machine learning model;
comparing the AUC values of the machine learning models, and selecting the machine learning model with the largest AUC value as the final mine point classification model.
6. A prospecting target area positioning system based on multivariate data, characterized by comprising:
an acquisition module, configured to acquire a geological data set, a geochemical data set and a geophysical data set of a target area;
a first processing module, configured to process the geological data set to obtain lithology variables, fault variables, rock mass variables, and mineralization point and non-mineralization point variables;
a second processing module, configured to process the geochemical data set to obtain an original geochemical single element variable, a geochemical M-means central cluster analysis variable, a geochemical linear dimension-reduction projection variable and a geochemical nonlinear dimension-reduction analysis variable;
a third processing module, configured to process the geophysical data set to obtain a geophysical M-means central cluster analysis variable;
a fusion module, configured to fuse the lithology variables, the fault variables, the rock mass variables, the geophysical M-means central cluster analysis variable, the original geochemical single element variable, the geochemical M-means central cluster analysis variable, the geochemical linear dimension-reduction projection variable and the geochemical nonlinear dimension-reduction analysis variable to obtain a fused multi-element prediction variable layer, and to take the mineralization point and non-mineralization point variables as a classification layer;
an input module, configured to input the multi-element prediction variable layer into each of at least three different machine learning models to obtain a classification probability value for each point output by each machine learning model, wherein the classification label of each machine learning model is the classification layer;
a selection module, configured to select the machine learning model with the best performance as a final mine point classification model based on classification evaluation data of all the machine learning models;
an output module, configured to output a mineral distribution probability map of the target area based on the final mine point classification model;
wherein the second processing module is specifically configured to perform the following:
performing format unification on the geochemical data set to obtain a first geochemical data set;
identifying missing values in the first geochemical data set, obtaining the missing value proportion, and calculating the missing values for data whose missing value proportion is lower than a preset proportion to obtain second geochemical data;
rasterizing the second geochemical data by using a point analysis tool to obtain the original geochemical single element variable;
converting each datum in the second geochemical data into log-ratio data to obtain third geochemical data;
performing dimension-reduction analysis on the third geochemical data by using a linear projection dimension-reduction function to obtain a biplot and an eigenvalue plot;
calculating the relative contribution value of each single element in the third geochemical data in each low-dimensional space, selecting a target low-dimensional space by combining the relative contribution values, the biplot and the eigenvalue plot, and setting the score value of a sample in the target low-dimensional space as the projection dimension-reduction function conversion value of the sample;
matching the sample coordinates with the corresponding projection dimension-reduction function conversion values to generate a projection dimension-reduction function conversion visual map; obtaining the geochemical linear dimension-reduction projection variable based on the projection dimension-reduction function conversion visual map;
performing cluster analysis on the third geochemical data to obtain an M-means central cluster grouping result;
matching the sample coordinates with the M-means central cluster grouping result to form a geochemical data cluster analysis visual map; obtaining the geochemical M-means central cluster analysis variable based on the geochemical data cluster analysis visual map;
performing random nonlinear dimension-reduction analysis on the third geochemical data, and calculating a score value of each sample in each nonlinear low-dimensional space direction;
matching the sample coordinates with the score values to generate a geochemical data nonlinear dimension-reduction visual map; obtaining the geochemical nonlinear dimension-reduction analysis variable based on the geochemical data nonlinear dimension-reduction visual map;
wherein the third processing module is specifically configured to perform the following:
performing M-means central cluster analysis on the geophysical data set to obtain a cluster grouping result for each geophysical sampling point;
matching the geophysical sample coordinates with the cluster grouping result to generate a geophysical data cluster analysis visual map;
supplementing the geophysical data based on the geophysical data cluster analysis visual map, and rasterizing the supplemented geophysical data set to obtain the geophysical M-means central cluster analysis variable.
7. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for locating a prospecting target area based on multivariate data according to any one of claims 1 to 5.
8. A computer readable storage medium storing computer instructions for causing a computer to perform the method for locating a prospecting target area based on multivariate data according to any one of claims 1 to 5.
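The geochemical processing steps recited in claim 1 can be illustrated with a minimal sketch on synthetic data. The claims do not name specific algorithms, so the choices here are assumptions: the log-ratio conversion is taken to be a centred log-ratio transform, the linear projection dimension-reduction function is taken to be principal component analysis, the M-means central clustering is taken to be k-means, and the random nonlinear dimension reduction is taken to be t-SNE, all through scikit-learn. Sample counts, element counts, coordinates and parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def clr(concentrations):
    """Centred log-ratio transform of strictly positive compositions."""
    logs = np.log(concentrations)
    return logs - logs.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_samples, n_elements = 120, 8  # hypothetical survey size

# Stand-in for the "second geochemical data" (no missing values left).
raw = rng.lognormal(mean=1.0, sigma=0.5, size=(n_samples, n_elements))
coords = rng.uniform(0, 10_000, size=(n_samples, 2))  # hypothetical x, y

third = clr(raw)  # stand-in for the "third geochemical data"

# Linear projection: sample scores, eigenvalues and element contributions.
pca = PCA(n_components=3).fit(third)
pc_scores = pca.transform(third)
eigenvalues = pca.explained_variance_  # values an eigenvalue plot would show
loadings = pca.components_
contrib = loadings**2 / (loadings**2).sum(axis=1, keepdims=True)

# Clustering: one group label per sample.
groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(third)

# Nonlinear embedding: one score per sample in each low-dimensional direction.
embedding = TSNE(
    n_components=2, perplexity=20, init="pca", random_state=0
).fit_transform(third)

# Match coordinates with the derived values; each derived column could then
# be rasterized into its own layer variable.
table = np.column_stack([coords, pc_scores[:, 0], groups, embedding])
print(table.shape)
```

Each row of `contrib` sums to one, so it can be read as the share of each element in the corresponding component, which is one plausible reading of the relative contribution value used to choose the target low-dimensional space.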
CN202410399063.2A 2024-04-03 Method, system, electronic equipment and storage medium for locating mining target area based on multivariate data Active CN118194162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410399063.2A CN118194162B (en) 2024-04-03 Method, system, electronic equipment and storage medium for locating mining target area based on multivariate data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410399063.2A CN118194162B (en) 2024-04-03 Method, system, electronic equipment and storage medium for locating mining target area based on multivariate data

Publications (2)

Publication Number Publication Date
CN118194162A (en) 2024-06-14
CN118194162B (en) 2024-08-27

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050252A (en) * 2014-06-12 2014-09-17 核工业北京地质研究院 Hyperspectral remote sensing alteration information extracting method
CN114997501A (en) * 2022-06-08 2022-09-02 河海大学 Deep learning mineral resource classification prediction method and system based on sample unbalance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant