CN115331752B - Method capable of adaptively predicting quartz forming environment - Google Patents


Info

Publication number
CN115331752B
Authority
CN
China
Prior art keywords
data
quartz
algorithm
dimension reduction
classification model
Legal status
Active
Application number
CN202210867109.XA
Other languages
Chinese (zh)
Other versions
CN115331752A (en)
Inventor
牛云云
Current Assignee
China University of Geosciences Beijing
Original Assignee
China University of Geosciences Beijing
Application filed by China University of Geosciences Beijing
Priority to CN202210867109.XA
Publication of CN115331752A
Application granted
Publication of CN115331752B
Legal status: Active


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/80 Data visualisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a method capable of adaptively predicting a quartz forming environment. The method comprises the following steps. First, the obtained original data are preprocessed, and the preprocessed data are augmented with the SMOTE algorithm to obtain target data. Second, features of the target data are selected, and a classification model of the quartz forming environment is constructed with an SVM algorithm based on the selected features. Third, the parameters of the classification model are optimized with a PSO algorithm to obtain an optimized quartz model. Finally, the dimensionality of the target data is reduced with a dimension reduction algorithm, a two-dimensional decision boundary is plotted from the reduced data with an SVM algorithm, and a visualization tool for the quartz classifier is constructed. The invention improves prediction performance and accuracy and can be widely applied in the technical field of machine learning.

Description

Method capable of adaptively predicting quartz forming environment
Technical Field
The invention relates to the technical field of machine learning, in particular to a method capable of adaptively predicting a quartz forming environment.
Background
Quartz is a mineral resource that is widely distributed in the Earth's crust and has considerable economic value. It has important applications in the glass, semiconductor and chemical industries, among others. The forming environment of quartz determines its properties, so identifying that environment is of great significance. The forming environment can be distinguished by studying the concentrations of the elements contained in quartz. Previous studies usually discriminate quartz forming environments by plotting binary or ternary diagrams, but such diagrams often fail to classify quartz accurately.
Specifically, quartz forming environments are mainly distinguished by plotting the trace-element contents of different types of quartz on binary or ternary diagrams. However, this conventional method relies heavily on personal experience and requires complicated diagram design. It also fails when the differences in trace-element content between quartz types are not pronounced.
Recently, machine learning has been applied to classify quartz forming environments, with good results. However, that work tunes the prediction model's parameters by grid search with cross-validation. This is time-consuming because it requires an exhaustive search of the hyperparameter space. It also requires choosing the interval of feasible solutions and a suitable sampling step size, which depends heavily on personal experience, and the search may become trapped in a local optimum. Moreover, the dataset used in the experiments of that work is somewhat imbalanced, which may lead to poor prediction performance, especially for classes with fewer samples. That work also performs feature selection by testing on datasets that retain different subsets of sample features, and this becomes too cumbersome when there are many features. In addition, existing quartz visualization methods do not separate different quartz samples effectively.
Disclosure of Invention
In view of this, embodiments of the invention provide a method for adaptively predicting the quartz forming environment with good prediction performance and high accuracy.
An aspect of an embodiment of the present invention provides a method capable of adaptively predicting a quartz formation environment, including:
preprocessing the obtained original data, and augmenting the preprocessed data with a SMOTE algorithm to obtain target data;
selecting features of the target data, and constructing a classification model of the quartz forming environment with an SVM algorithm based on the selected features;
optimizing the parameters of the classification model with a PSO algorithm to obtain an optimized quartz model;
reducing the dimensionality of the target data with a dimension reduction algorithm, plotting a two-dimensional decision boundary from the reduced data with an SVM algorithm, and constructing a visualization tool for the quartz classifier;
wherein the visualization tool is used to display the quartz forming environment predicted for the data to be predicted.
Optionally, preprocessing the obtained original data and augmenting the preprocessed data with the SMOTE algorithm to obtain the target data includes:
standardizing the original data by subtracting the mean of each data attribute and dividing by its standard deviation, to obtain a standardization result;
transforming the original data so that they follow an approximately Gaussian distribution;
augmenting the standardized and transformed target data with the SMOTE algorithm, and then dividing them into a training set and a test set.
Optionally, selecting features of the target data includes:
calculating the weight of each sample feature in the classification process with a random forest algorithm, specifically:
calculating the out-of-bag error of each decision tree in the random forest to obtain a first error, where the out-of-bag data are the data not used to train that decision tree;
randomly changing the values of the sample feature and calculating the out-of-bag error again to obtain a second error;
calculating the weights of the different sample features from the first error and the second error;
and determining the correlations between different sample features with the Pearson correlation coefficient, and finally selecting the target features.
Optionally, constructing the classification model of the quartz forming environment with the SVM algorithm based on the selected features includes:
constructing a model training dataset from the selected features, and determining the feature data and sample labels in the model training dataset;
classifying the sample data in the model training dataset with a support vector machine, which searches for a separating hyperplane, to obtain the classification model of the quartz forming environment.
Optionally, optimizing the parameters of the classification model with the PSO algorithm to obtain the optimized quartz model includes:
initializing a parameter population and selecting parameters;
training a model with the current parameters to obtain the K-fold cross-validation score, on a validation set, of the quartz classification model based on the current parameters;
using the K-fold cross-validation score as the fitness value, and updating the population optimum and the individual optima;
updating the iteration count and checking whether the termination condition is met; if it is not, selecting new parameters and repeating the above steps until iteration yields the final optimized classification model.
Optionally, reducing the dimensionality of the target data with the dimension reduction algorithm, plotting the two-dimensional decision boundary from the reduced data with the SVM algorithm, and constructing the visualization tool of the quartz classifier includes:
compressing the high-dimensional feature data into two-dimensional data with the dimension reduction algorithm;
training an SVM-based quartz classification model with the compressed data;
selecting equidistant coordinate points on the two-dimensional plane, treating them as new data, and classifying them with the trained classification model;
and projecting all data onto the two-dimensional plane and drawing the decision boundary diagram with the SVM.
Another aspect of the embodiments of the invention provides an electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method described above.
Another aspect of the embodiments of the invention provides a computer-readable storage medium storing a program that is executed by a processor to implement the method described above.
Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the foregoing method.
According to embodiments of the invention, the obtained original data are preprocessed and augmented with the SMOTE algorithm to obtain target data. Features of the target data are then selected, and a classification model of the quartz forming environment is constructed with an SVM algorithm. The parameters of the classification model are optimized with a PSO algorithm to obtain an optimized quartz model. Finally, the dimensionality of the target data is reduced, a two-dimensional decision boundary is drawn from the reduced data with an SVM algorithm, and a visualization tool for the quartz classifier is constructed. The invention thereby improves prediction performance and accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of the overall steps provided by an embodiment of the present invention;
FIG. 2 is a flowchart of PSO optimization of model parameters provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
To address the problems in the prior art, embodiments of the invention provide the method for adaptively predicting a quartz forming environment, the electronic device, the computer-readable storage medium and the computer program product set out in the Disclosure of Invention above.
The following describes the specific implementation of the present invention in detail with reference to the drawings of the specification:
Unbalanced distribution of quartz samples can degrade the model's prediction performance, particularly for classes with fewer samples. To prevent this, the invention augments the quartz dataset with the SMOTE algorithm. To simplify the cumbersome feature selection process, the invention calculates feature weights with a random forest and combines them with the Pearson correlation coefficients between sample features. Conventional parameter tuning can leave the classification model trapped in a local optimum, so the invention tunes the quartz classifier's parameters adaptively with PSO. To improve the visualization of the quartz classification model and show the classification of samples more intuitively, the invention applies a dimension reduction algorithm and draws the decision boundary with an SVM.
First, the selected dataset is preprocessed. The data are standardized and Gaussian-transformed, and the SMOTE algorithm is used for data augmentation to alleviate class imbalance.
Feature selection is then performed by computing feature weights with a random forest and combining them with the Pearson correlation coefficients between sample features. An SVM algorithm is selected to construct the classification model of the quartz forming environment.
Support vector machines (SVMs) are a mature supervised learning technique widely used for classification problems in machine learning. The SVM algorithm finds the optimal hyperplane in the solution space that correctly separates the samples, that is, the hyperplane that maximizes the margin between positive and negative samples.
PSO is then selected to optimize the parameters of the quartz classification model.
PSO is easy to implement and has few parameters of its own to tune. Optimizing the model parameters with PSO therefore helps prevent the classification model from becoming trapped in a local optimum. The parameter optimization process proceeds as follows:
1. Initialize a parameter population and select initial parameters.
2. Train a model with the current parameters to obtain the K-fold cross-validation score of the quartz classification model on the validation set.
3. Use this score as the fitness value to update the population optimum and the individual optima.
4. Update the iteration count and check the termination condition. If it is not met, select new parameters and repeat the process.
After several iterations, the final optimized quartz model is obtained.
Finally, the data are compressed from high dimensions to two dimensions with a dimension reduction algorithm. A two-dimensional decision boundary is then drawn from the compressed data with an SVM algorithm, providing a visualization tool for the quartz classifier.
FIG. 1 is a complete flow chart of steps provided by an embodiment of the present invention.
The following describes the implementation of the present invention in detail with reference to fig. 1:
1. Data preprocessing
The quartz dataset is divided into a training set and a test set in an 8:2 ratio. The training set is used to train the classifier, and the test set is used to verify its performance on new data. To better solve the quartz classification problem, the data are standardized: for each attribute, the mean is subtracted and the result is divided by the standard deviation. Standardization puts the different feature variables on the same scale. This removes the effect of differing magnitudes between features, speeds up convergence and improves classification accuracy. In addition, the data are transformed towards a Gaussian distribution to reduce the effect of skew in the data distribution, so that values originally concentrated in dense intervals become as dispersed as possible. The distribution of the historically collected samples shows that the quartz dataset is somewhat imbalanced, which may degrade the final classification. The SMOTE algorithm is therefore used to alleviate the imbalance.
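The standardization and oversampling described above can be illustrated with a minimal plain-Python sketch. It assumes that the dataset is a list of numeric rows. `standardize` and `smote` are hypothetical helper names. The Gaussian transform is omitted because its exact form is not specified; a Box-Cox or Yeo-Johnson power transform would be one common choice. The SMOTE routine follows the algorithm's interpolation idea rather than any particular library.

```python
import random
import statistics


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # population standard deviation; constant columns are left unscaled to avoid division by zero
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of `base` by squared Euclidean distance
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

In practice, a library implementation such as imbalanced-learn's `SMOTE` class (via its `fit_resample` method) would normally be used. Applying it to the training portion only, after the 8:2 split, keeps synthetic points out of the test set.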
2. Feature selection
Each sample in the selected quartz dataset has five features: Al, Ti, Li, Ge and Sr. The weight of each feature in the classification process can be calculated with a random forest algorithm. First, the error err1 of each decision tree in the random forest is calculated on its out-of-bag data, that is, the data not used to train that tree. The values of sample feature X are then randomly changed, and the out-of-bag error err2 is calculated again. If the forest contains N trees, the weight of feature X is calculated with equation (1). The weights of all five features are greater than 0.15, so none of the features is irrelevant.
$$W = \frac{\sum (err2 - err1)}{N} \qquad (1)$$
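The out-of-bag procedure behind equation (1) can be sketched in plain Python. To keep the sketch short and dependency-free, each "tree" is a one-split decision stump trained on a bootstrap sample. This is an assumed simplification of a full random forest. `fit_stump` and `permutation_weights` are hypothetical names, and the random change of feature X is implemented as a shuffle of that feature's out-of-bag values.

```python
import random


def majority(labels):
    return max(set(labels), key=labels.count)


def fit_stump(X, y):
    """Fit a one-split decision stump (simplified stand-in for a decision tree)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = majority(left)  # never empty: t is one of the observed values
            r_lab = majority(right) if right else l_lab
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    return best[1:]


def error_rate(stump, X, y):
    f, t, l_lab, r_lab = stump
    wrong = sum((l_lab if row[f] <= t else r_lab) != lab for row, lab in zip(X, y))
    return wrong / len(y)


def permutation_weights(X, y, n_trees=20, seed=0):
    """Equation (1): sum over trees of (err2 - err1), divided by the number of trees N."""
    rng = random.Random(seed)
    n, n_feat = len(X), len(X[0])
    totals = [0.0] * n_feat
    n_used = 0
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]   # out-of-bag data
        if not oob:
            continue  # no out-of-bag data for this tree; skip it
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot])
        X_oob, y_oob = [list(X[i]) for i in oob], [y[i] for i in oob]
        err1 = error_rate(stump, X_oob, y_oob)
        for f in range(n_feat):
            column = [row[f] for row in X_oob]
            rng.shuffle(column)                           # randomly change feature f
            X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X_oob, column)]
            err2 = error_rate(stump, X_perm, y_oob)
            totals[f] += err2 - err1
        n_used += 1
    return [w / n_used for w in totals] if n_used else totals
```

A feature that the trees never split on receives a weight of exactly zero, because shuffling it cannot change any prediction. scikit-learn's `sklearn.inspection.permutation_importance` offers a related measure, computed on a supplied dataset rather than on per-tree out-of-bag samples.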
The correlation between different sample features is obtained by calculating the Pearson correlation coefficient between each pair of features. The Pearson correlation coefficient is a statistic that measures the degree of linear correlation between two variables. Its value lies in [-1, 1]:
- the closer it is to -1, the stronger the negative linear relationship between the two variables;
- the closer to 1, the stronger the positive linear relationship;
- the closer to 0, the weaker the linear correlation.

Equation (2) gives the Pearson correlation coefficient. In it, $\mathrm{Cov}(X, Y)$ is the covariance of the variables X and Y, $\sigma_X$ is the standard deviation of X, and $\sigma_Y$ is the standard deviation of Y. The correlation coefficients among the five features are low, and none exceeds 0.6. This shows that the quartz sample features have no significant linear correlation, so all five features are retained.
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \qquad (2)$$
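Equation (2) and the 0.6 screening threshold can be checked with a few lines of Python. The function names are hypothetical. Comparing |ρ|, rather than ρ, against 0.6 is an assumption about how the threshold is applied.

```python
import math


def pearson(x, y):
    """rho(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y), as in equation (2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def highly_correlated_pairs(columns, names, threshold=0.6):
    """Return feature pairs whose |rho| exceeds the threshold (assumed screening rule)."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r = pearson(columns[i], columns[j])
            if abs(r) > threshold:
                pairs.append((names[i], names[j], r))
    return pairs
```

An empty result for the five element columns would correspond to the finding above that all features can be retained.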
3. SVM quartz classifier based on particle swarm optimization
A support vector machine with an RBF kernel is selected for quartz classification. The training samples cannot always be separated by a hyperplane in the original sample space. The SVM therefore uses a kernel function to map the data samples from the feature space into a kernel space, where it can find the optimal hyperplane and the samples become linearly separable.
Given a training dataset, let $x$ denote the feature data and $y$ the sample label. As discussed above, the support vector machine classifies samples with a hyperplane, so the decision function can be written as follows:
$$y = \operatorname{sign}\left[W^{T}\phi(x) + b\right] \qquad (3)$$
In equation (3), the nonlinear function $\phi(x)$ maps the low-dimensional sample features into the high-dimensional kernel space. $W$ is the coefficient vector and $b$ is the bias term.
$H(W, \zeta)$ is the optimization objective for the hyperplane to be found, and the coefficient vector $W$ and the slack (relaxation) variables $\zeta_i$ are obtained from equation (4). The slack variable $\zeta_i$ allows some samples to be misclassified by the hyperplane, because it permits the $i$-th data point to deviate from the margin. However, $\zeta_i$ cannot be allowed to grow without bound, so the goal is to keep the slack variables as small as possible. The penalty parameter $C$ in equation (4) controls the soft margin: the larger $C$, the heavier the penalty on misclassification. To solve for the unknowns, equation (4) is converted into the corresponding Lagrangian function, equation (5).
In equation (5), $\lambda_i$ denotes the Lagrange multipliers. Taking the partial derivatives of equation (5) with respect to $W$, $b$ and $\zeta_i$ and setting them to zero yields equation (6).
Substituting equation (6) back into equation (5) yields equation (7), from which the unknown parameters can be solved.
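For the RBF kernel chosen in this embodiment, the trained classifier can be written in a standard kernel form. The block below assumes that the stationarity condition in equation (6) with respect to $W$ is $W = \sum_i \lambda_i y_i \phi(x_i)$, as in the usual soft-margin SVM; this is an assumption, since equations (4) to (7) are not given here. $n$ is the number of training samples, and $\gamma$ is the kernel function parameter that PSO tunes below.

```latex
W = \sum_{i=1}^{n} \lambda_i y_i \phi(x_i), \qquad
K(x_i, x) = \phi(x_i)^{T}\phi(x) = \exp\left(-\gamma \lVert x_i - x \rVert^{2}\right)

y = \operatorname{sign}\left[W^{T}\phi(x) + b\right]
  = \operatorname{sign}\left[\sum_{i=1}^{n} \lambda_i y_i K(x_i, x) + b\right]
```

Only samples with $\lambda_i \neq 0$ (the support vectors) contribute to the sum. $\gamma$ controls how quickly each support vector's influence decays with distance, which is why it is tuned together with $C$.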
The particle swarm algorithm is a swarm intelligence optimization algorithm inspired by the foraging behaviour of bird flocks, in which particles cooperate to find the global optimum. It searches for the optimal solution through competition and cooperation among individuals in the population. Each candidate solution is regarded as a particle of negligible mass and volume that flies through the feasible space at a certain velocity. Every particle has a fitness value, computed by a fitness function constructed for the specific problem, and particles update their states according to this fitness. The particle swarm algorithm iterates with a velocity update, equation (8), and a position update, equation (9).
Equation (8) is the velocity update equation. It calculates the velocity of the current particle, and the particle's position is then updated according to equation (9). The symbols in these equations are defined as follows:
- $x_i^t$: the position of the $i$-th particle in the N-dimensional search space at the $t$-th iteration;
- $p_i$: the best position found so far by the $i$-th particle;
- $g^t$: the best position found by the population during the first $t$ iterations;
- $v_i^t$: the velocity vector of the $i$-th particle at the $t$-th iteration, whose values lie in $[-V_{max}, V_{max}]$;
- $W_t$: the inertia weight factor at the $t$-th iteration, with values in $[0, 1]$;
- $C_1$ and $C_2$: non-negative acceleration constants.

The inertia weight keeps the particles moving so that they can search new regions. $C_1$ controls the individual awareness of the current particle, and $C_2$ controls its population awareness. When $C_1$ is too large, the particle population can become trapped in a local optimum. When $C_2$ is too large, the model converges slowly.
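Equations (8) and (9) are not reproduced in this text. The sketch below assumes they take the standard inertia-weight PSO form:

v ← W_t·v + C1·r1·(p_i − x) + C2·r2·(g − x)
x ← x + v

Here r1 and r2 are uniform random numbers in [0, 1). The linearly decreasing inertia schedule, the V_max setting and the function name `pso_maximize` are also assumptions.

```python
import random


def pso_maximize(fitness, bounds, n_particles=20, n_iter=50,
                 w_start=0.9, w_end=0.4, c1=2.0, c2=2.0, seed=0):
    """Maximise `fitness` over a box using inertia-weight particle swarm optimisation.

    bounds: one (low, high) pair per dimension. Returns (best_position, best_fitness).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]  # assumed: V_max = 20% of each range
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[rng.uniform(-vm, vm) for vm in v_max] for _ in range(n_particles)]
    p_best = [p[:] for p in pos]                 # individual optimal positions p_i
    p_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: p_fit[i])
    g_best, g_fit = p_best[g][:], p_fit[g]       # population optimal position g

    for t in range(n_iter):
        # assumed schedule: inertia weight W_t falls linearly from w_start to w_end
        w_t = w_start - (w_start - w_end) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w_t * vel[i][d]
                     + c1 * r1 * (p_best[i][d] - pos[i][d])   # individual awareness
                     + c2 * r2 * (g_best[d] - pos[i][d]))     # population awareness
                vel[i][d] = max(-v_max[d], min(v_max[d], v))  # keep within [-V_max, V_max]
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # position update, clipped to the box
            f = fitness(pos[i])
            if f > p_fit[i]:
                p_best[i], p_fit[i] = pos[i][:], f
                if f > g_fit:
                    g_best, g_fit = pos[i][:], f
    return g_best, g_fit
```

For example, maximizing `-((x - 1) ** 2 + (y + 2) ** 2)` over the box [-5, 5] × [-5, 5] should return a position close to (1, -2).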
Tuning the quartz classifier with the particle swarm algorithm means optimizing the penalty parameter C and the kernel function parameter gamma, as shown in FIG. 2. During optimization, a parameter population is first initialized and a parameter pair (C, gamma) is selected; a model is trained with the current parameters, and the K-fold cross-validation score of the quartz classification model on the validation set is obtained. This K-fold cross-validation score is used as the fitness function value (fitness) to update the population optimum and the individual optima. The iteration count is then updated and checked against the termination condition; if the condition is not met, new parameters are selected and the process is repeated. After multiple iterations, the final optimized quartz model is obtained.
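The tuning loop of FIG. 2 can be sketched with scikit-learn, using the mean K-fold cross-validation accuracy of an RBF-kernel `SVC` as the fitness. This is an illustrative sketch under assumptions: the search bounds on $\log_{10} C$ and $\log_{10}\gamma$, the population size, the iteration count, and the Iris data set standing in for quartz data are all hypothetical, as are the helper names `cv_fitness` and `pso_tune_svm`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cv_fitness(log_params, X, y, k=5):
    """Fitness = mean K-fold cross-validation accuracy of an RBF SVM."""
    C, gamma = 10.0 ** log_params
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=cv).mean()


def pso_tune_svm(X, y, n_particles=8, n_iter=10, seed=0):
    """Search (log10 C, log10 gamma) with global-best PSO, maximizing CV accuracy."""
    rng = np.random.default_rng(seed)
    # Assumed bounds: log10(C) in [-2, 3], log10(gamma) in [-4, 1]
    lower, upper = np.array([-2.0, -4.0]), np.array([3.0, 1.0])
    v_max = 0.2 * (upper - lower)

    # Initialize the parameter population and evaluate it
    x = rng.uniform(lower, upper, size=(n_particles, 2))
    v = np.zeros_like(x)
    p_best = x.copy()
    p_fit = np.array([cv_fitness(xi, X, y) for xi in x])
    g_best = p_best[p_fit.argmax()].copy()

    for t in range(n_iter):                      # termination: fixed iteration count
        w_t = 0.9 - 0.5 * t / max(n_iter - 1, 1)
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w_t * v + 1.5 * r1 * (p_best - x) + 1.5 * r2 * (g_best - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lower, upper)
        # The K-fold CV score is the fitness; update individual and population optima
        fit = np.array([cv_fitness(xi, X, y) for xi in x])
        better = fit > p_fit
        p_best[better], p_fit[better] = x[better], fit[better]
        g_best = p_best[p_fit.argmax()].copy()

    C, gamma = 10.0 ** g_best
    return C, gamma, p_fit.max()
```

With `X, y = load_iris(return_X_y=True)`, calling `pso_tune_svm(X, y)` returns the best (C, gamma) pair found and its cross-validation accuracy. Searching in $\log_{10}$ space is a design choice: C and gamma span several orders of magnitude, so steps in log space explore them more evenly. Placing the scaler inside the pipeline means each fold is standardized using only its own training portion.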
Observing how the population's average objective function value and its optimal objective function value change over the iterations shows that the average value improves continuously and eventually approaches the optimal value. The objective function is defined as the average classification accuracy after K-fold cross-validation on the validation set, and the optimum is reached at C = 701 and gamma = 0.927.
4. Drawing a two-dimensional decision boundary using a dimension reduction algorithm
The two-dimensional decision boundary of the quartz classifier is drawn with the SVM on data reduced by a dimension reduction algorithm. This visualizes the quartz classifier and allows the decision boundaries obtained with different dimension reduction algorithms to be compared. To draw the decision plane, the high-dimensional feature data are first compressed into two-dimensional data by a dimension reduction algorithm, and an SVM-based quartz classification model is trained on the compressed data. Coordinate points are then selected at equal intervals on the two-dimensional plane and treated as new data to be classified by the trained model. Finally, all the data are projected onto the two-dimensional plane.
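These steps can be sketched with scikit-learn as below. PCA is assumed as the dimension reduction algorithm (the text does not fix one), the wine data set stands in for quartz data, and `decision_grid` is a hypothetical helper name; the sketch returns the grid predictions rather than plotting them.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def decision_grid(X, y, n_points=200, C=1.0, gamma="scale", pad=0.5):
    """Return 2-D data plus a classified grid for drawing an SVM decision boundary."""
    # 1) Compress the high-dimensional features to 2-D (PCA assumed here)
    X2 = make_pipeline(StandardScaler(), PCA(n_components=2)).fit_transform(X)
    # 2) Train an SVM classifier on the compressed data
    clf = SVC(C=C, gamma=gamma).fit(X2, y)
    # 3) Select equidistant coordinate points covering the 2-D plane
    xs = np.linspace(X2[:, 0].min() - pad, X2[:, 0].max() + pad, n_points)
    ys = np.linspace(X2[:, 1].min() - pad, X2[:, 1].max() + pad, n_points)
    xx, yy = np.meshgrid(xs, ys)
    # 4) Classify every grid point with the trained model
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return X2, xx, yy, zz
```

With matplotlib, `plt.contourf(xx, yy, zz, alpha=0.3)` shades the decision regions and `plt.scatter(X2[:, 0], X2[:, 1], c=y)` overlays the projected samples. Swapping `PCA` for another reducer that outputs two components allows the boundaries from different dimension reduction algorithms to be compared.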
In summary, the invention has the following characteristics:
(1) Compared with grid search, parameter tuning with the particle swarm algorithm reduces the number of parameter settings that must be searched, avoids the poor model performance caused by insufficient prior knowledge, and improves classification accuracy.
(2) The data set is augmented with the Smote algorithm, and feature weights calculated with a random forest are combined with the Pearson correlation coefficients between sample features for feature selection, further improving the recognition accuracy of the quartz classification model.
(3) A dimension reduction algorithm compresses the high-dimensional feature data into two-dimensional data, and a two-dimensional (2D) decision boundary is drawn from the compressed data with the SVM algorithm, giving a better visualization of the classification effect.
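The Smote augmentation in item (2) can be sketched in NumPy: each synthetic sample is interpolated between a minority-class sample and one of its k nearest minority-class neighbours. This is a minimal sketch, not the patent's implementation; `smote` and `balance_with_smote` are hypothetical names, k = 5 is the conventional default, and every class is assumed to have at least two samples. The `imbalanced-learn` package provides a maintained `SMOTE` class for production use.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)                   # assumes at least 2 minority samples
    # Pairwise squared Euclidean distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]   # k nearest minority neighbours

    base = rng.integers(0, len(X_min), size=n_new)            # random seed samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]    # one random neighbour each
    gap = rng.random((n_new, 1))                               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    X_parts, y_parts = [X], [y]
    for cls, cnt in zip(classes, counts):
        deficit = counts.max() - cnt
        if deficit > 0:
            X_parts.append(smote(X[y == cls], deficit, k=k, seed=seed))
            y_parts.append(np.full(deficit, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Because every synthetic point lies on a segment between two real minority samples, the augmented class stays inside the region spanned by the original minority data.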
Compared with the prior art, augmenting the data set with Smote effectively alleviates the data imbalance problem and further improves the recognition performance of the quartz classifier. Calculating feature weights with a random forest and combining them with the Pearson correlation coefficients between sample features for feature selection preserves the important features. Tuning parameters with the particle swarm algorithm effectively improves the classification performance and final accuracy of the quartz classifier; compared with grid search, it reduces the number of parameter settings to be searched, avoids the poor model performance caused by insufficient prior knowledge, and achieves adaptive parameter tuning, improving the recognition performance of the quartz forming environment classification model. Previously, different kinds of quartz samples could not be distinguished intuitively with binary or ternary diagrams, and drawing an SVM decision boundary from a few selected sample features could not adequately show the classification performance of the SVM classifier. In this scheme, the dimension reduction algorithm compresses the high-dimensional feature data into two-dimensional data, and a two-dimensional (2D) decision boundary is drawn from the compressed data with the SVM algorithm, so the classification effect can be visualized well.
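The combination of random-forest weights and Pearson correlation can be sketched as a greedy filter: rank features by weight, then keep a feature only if its absolute Pearson correlation with every already-kept feature stays below a threshold. Assumptions: scikit-learn's impurity-based `feature_importances_` serves as the weight here (a substitution for the out-of-bag method described in claim 4), the 0.9 threshold is illustrative, the breast-cancer data set stands in for quartz data, and `select_features` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier


def select_features(X, y, corr_threshold=0.9, seed=0):
    """Greedy selection: high random-forest weight, low mutual Pearson correlation."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    weights = rf.feature_importances_            # impurity-based feature weights
    corr = np.abs(np.corrcoef(X, rowvar=False))  # |Pearson r| between feature columns
    kept = []
    for j in np.argsort(weights)[::-1]:          # most important feature first
        # Keep j only if it is not strongly correlated with any kept feature
        if all(corr[j, i] < corr_threshold for i in kept):
            kept.append(int(j))
    return kept, weights
```

Visiting features in descending weight order means that, within each group of highly correlated features, the one the forest relies on most is the one retained.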
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive (U-disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope of the present invention as defined in the appended claims.

Claims (9)

1. A method for adaptively predicting a quartz-forming environment, comprising:
performing data preprocessing on the obtained original data, and enhancing the preprocessed original data through a Smote algorithm to obtain target data;
performing feature selection on the target data, and constructing a classification model of the quartz forming environment through an SVM algorithm according to the selected features;
performing parameter optimization on the classification model through a PSO algorithm to obtain an optimized quartz model;
performing dimension reduction processing on the target data through a dimension reduction algorithm, drawing a two-dimensional decision boundary from the obtained dimension-reduced data with an SVM algorithm, and constructing a visualization tool for the quartz classifier;
the visualization tool is used for displaying quartz forming environment information obtained after the data to be predicted are predicted;
the step of performing dimension reduction processing on the target data through a dimension reduction algorithm, and drawing a two-dimensional decision boundary according to the obtained dimension reduction data and an SVM algorithm comprises the following steps:
compressing the high-dimensional characteristic data into two-dimensional data through a dimension reduction algorithm;
training a quartz classification model based on the SVM by using the compressed data;
selecting equidistant coordinate points on the two-dimensional plane, using the selected coordinate points as new data, and completing classification with the trained classification model;
projecting all data onto the two-dimensional plane, and drawing a decision boundary diagram using the SVM.
2. The method for adaptively predicting a quartz forming environment according to claim 1, wherein the performing data preprocessing on the obtained raw data, and enhancing the preprocessed raw data by a Smote algorithm to obtain target data comprises:
standardizing the original data by subtracting, for each data attribute, the data mean and dividing by the data standard deviation, to obtain a standardization result;
carrying out data transformation processing on the original data to obtain Gaussian distribution information of the data;
the target data after the standardization processing and the data transformation processing are subjected to data enhancement by a Smote algorithm and then are divided into a training set and a testing set.
3. A method of adaptively predicting a quartz-forming environment as set forth in claim 1, wherein said feature selection of said target data comprises:
calculating the weight of each sample feature in the classification process by using a random forest algorithm;
determining, according to the weights of the sample features, the correlations among different sample features through the Pearson correlation coefficient formula, and finally selecting the target features.
4. A method of adaptively predicting a quartz-forming environment as in claim 3, wherein said using a random forest algorithm to calculate the weight of each sample feature in the classification process comprises:
calculating the error of the out-of-bag data of each decision tree in the random forest to obtain a first error; wherein the out-of-bag data is used to characterize the remaining data not used to train the decision tree;
randomly changing the value of the sample characteristic, and calculating the error of the data outside the bag again to obtain a second error;
and calculating weights of different sample characteristics according to the first error and the second error.
5. A method for adaptively predicting a quartz forming environment as set forth in claim 1, wherein said constructing a classification model of the quartz forming environment by an SVM algorithm based on the selected features comprises:
constructing a model training data set according to the selected characteristics, and determining characteristic data and sample labels in the model training data set;
classifying, by the support vector machine, the sample data in the model training data set by finding a hyperplane, thereby obtaining the classification model of the quartz forming environment.
6. The method of claim 1, wherein the parameter optimization of the classification model by a PSO algorithm results in an optimized quartz model, comprising:
initializing parameter populations and selecting parameters;
training a model according to the current parameters to obtain K-fold cross validation scores of the quartz classification model based on the current parameters on a validation set;
using the K-fold cross-validation score as the fitness function value, and updating the population optimal value and the individual optimal values;
updating the iteration count and judging whether the termination condition is met; if it is not met, returning to the step of initializing the parameter population and selecting parameters, until the final optimized classification model is obtained through iteration.
7. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program implements the method of any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that the storage medium stores a program that is executed by a processor to implement the method of any one of claims 1 to 6.
9. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1 to 6.
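The out-of-bag weighting of claim 4 can be sketched by building the forest by hand, so that each tree's bootstrap sample, and therefore its out-of-bag rows, is known. The weight of a feature is the average increase from the first error to the second error across trees. This is an illustrative sketch: the tree count, `max_features="sqrt"`, and the Iris stand-in data are assumptions, and `oob_permutation_weights` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def oob_permutation_weights(X, y, n_trees=100, seed=0):
    """Average increase in out-of-bag error when each feature is randomly permuted."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    increase = np.zeros(d)
    used = 0
    for b in range(n_trees):
        boot = rng.integers(0, n, size=n)              # bootstrap sample for this tree
        oob = np.setdiff1d(np.arange(n), boot)         # rows not used to train this tree
        if oob.size == 0:
            continue
        # Random feature subsets at each split make the ensemble a random forest
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=b)
        tree.fit(X[boot], y[boot])
        X_oob, y_oob = X[oob], y[oob]
        err1 = np.mean(tree.predict(X_oob) != y_oob)       # first error
        for j in range(d):
            X_perm = X_oob.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # randomly change feature j
            err2 = np.mean(tree.predict(X_perm) != y_oob)  # second error
            increase[j] += err2 - err1
        used += 1
    return increase / max(used, 1)
```

A feature the trees rely on produces a large jump in out-of-bag error when permuted, so its weight is high; an irrelevant feature yields a weight near zero.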

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210867109.XA CN115331752B (en) 2022-07-22 2022-07-22 Method capable of adaptively predicting quartz forming environment

Publications (2)

Publication Number Publication Date
CN115331752A CN115331752A (en) 2022-11-11
CN115331752B CN115331752B (en) 2024-03-05

Family

ID=83920062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210867109.XA Active CN115331752B (en) 2022-07-22 2022-07-22 Method capable of adaptively predicting quartz forming environment

Country Status (1)

Country Link
CN (1) CN115331752B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115859106A (en) * 2022-12-05 2023-03-28 China University of Geosciences (Beijing) Mineral exploration method and device based on semi-supervised learning and storage medium
CN116151107B (en) * 2023-02-02 2023-09-05 China University of Geosciences (Beijing) Method, system and electronic equipment for identifying ore potential of magma type nickel cobalt

Citations (8)

Publication number Priority date Publication date Assignee Title
CN110189799A (en) * 2019-05-20 2019-08-30 Xi'an Jiaotong University Metagenomic feature selection method based on variable importance scoring and the Neyman-Pearson test
CN111783840A (en) * 2020-06-09 2020-10-16 Suning Financial Technology (Nanjing) Co., Ltd. Visualization method and device for random forest model and storage medium
CN111914478A (en) * 2020-07-02 2020-11-10 China University of Geosciences (Wuhan) Comprehensive geological drilling well logging lithology identification method
CN113256409A (en) * 2021-07-12 2021-08-13 Guangzhou Smartbi Software Co., Ltd. Bank retail customer attrition prediction method based on machine learning
CN113344075A (en) * 2021-06-02 2021-09-03 Hunan Huda Jinke Technology Development Co., Ltd. High-dimensional unbalanced data classification method based on feature learning and ensemble learning
CN113935375A (en) * 2021-10-13 2022-01-14 Harbin University of Science and Technology High-speed electric spindle fault identification method based on UMAP dimension reduction algorithm
CN114341872A (en) * 2019-08-29 2022-04-12 Koninklijke Philips N.V. Facilitating interpretability of classification models
CN114492198A (en) * 2022-02-15 2022-05-13 Chongqing University Cutting force prediction method based on improved PSO algorithm assisted SVM algorithm

Non-Patent Citations (1)

Title
"Machine Learning Prediction of Quartz Forming-Environments", Yu Wang et al., JGR Solid Earth, pp. 1-11 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant