CN113076700A

CN113076700A - SVM-LDA rock burst machine learning prediction model method based on data analysis principle

Info

Publication number: CN113076700A
Application number: CN202110458500.XA
Authority: CN
Inventors: 李克钢; 李明亮; 秦庆词; 娄颖豪; 徐港; 岳睿; 刘博�; 李博文
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2021-07-06

Abstract

The invention discloses an SVM-LDA rock burst machine learning prediction model method based on a data analysis principle, and relates to the technical field of geotechnical engineering and underground excavation engineering. The method comprises: collecting multiple groups of rock burst case engineering data at home and abroad; calculating the correlation coefficients between the rock burst prediction indexes; applying extreme value processing and then standardization to the original rock burst case engineering data; introducing the t-distributed stochastic neighbor embedding (T-SNE) method to reduce the data dimensionality for visualization; and determining a rock burst prediction training set and a rock burst intensity grade prediction set by a random cross validation method, and establishing 6 rock burst intensity grade prediction models. The invention evaluates the prediction effect of the models on each rock burst grade and is not limited to a single model: it finds the model that predicts best for one or several rock burst grades and combines these models to predict the rock burst grade. The method is of great significance for research on rock burst intensity grade prediction in mines, tunnels, hydropower stations and the like.

Description

SVM-LDA rock burst machine learning prediction model method based on data analysis principle

Technical Field

The invention relates to a SVM-LDA rock burst machine learning prediction model method based on a data analysis principle, and belongs to the technical field of deep geotechnical engineering and underground excavation engineering.

Background

As underground works such as mines, tunnels and hydropower stations extend progressively deeper, rock burst, a highly harmful engineering geological disaster, is becoming increasingly frequent. Rock burst typically manifests as the violent ejection and spalling of brittle rock fragments and the rib spalling of surrounding rock. It is random, sudden, uncertain and highly dangerous, and research on rock burst disasters has become one of the important scientific problems urgently awaiting solution in rock mechanics in China. How to predict the occurrence of rock burst accurately is therefore a continuing effort of many scholars.

Rock burst prediction is a core link between the study of the rock burst mechanism and rock burst prevention and control; a reasonable, effective and accurate prediction method allows rock bursts to be controlled and avoided effectively. Scholars and experts at home and abroad roughly divide rock burst intensity grade prediction methods into single-factor methods and multi-factor comprehensive methods. Single-factor methods predict the rock burst intensity grade with an established rock burst criterion, such as the Russenes criterion, Turchaninov criterion, Erlangshan criterion, pottery-earth criterion, Hoek criterion and N-Jelum criterion. Because many factors induce rock burst and the rock burst mechanism is complex, the accuracy and confidence of single-factor discrimination are clearly low, so many scholars have turned to multi-factor prediction methods, which predict the rock burst intensity grade with mathematical methods and intelligent algorithms. These methods are widely applied and have improved the accuracy of rock burst intensity grade prediction, but problems such as low accuracy and poor universality remain.

Disclosure of Invention

Aiming at the defects of the existing rock burst intensity grade prediction method, the invention provides a SVM-LDA rock burst machine learning prediction model method based on a data analysis principle, and aims to predict the rock burst intensity grade classification condition in geotechnical engineering and underground excavation engineering more accurately and with better universality.

The technical scheme adopted by the invention is as follows: a SVM-LDA rock burst machine learning prediction model method based on a data analysis principle comprises the following steps:

step one: constructing a rock burst prediction sample library;

step two: analyzing rock burst case engineering data;

step three: determining the grading condition of the rockburst intensity grade;

step four: optimizing model parameters;

step five: pre-processing rock burst prediction sample data;

step six: establishing a rock burst intensity grade prediction model;

step seven: modeling analysis.

Specifically, in step one, the rock burst prediction sample library is constructed by collecting relevant domestic and foreign rock burst prediction literature and selecting a plurality of groups of mutually independent domestic and foreign rock burst case engineering data, with the rock mass stress coefficient σθ/σc, the rock brittleness coefficient σc/σt and the elastic deformation energy coefficient Wet selected as the rock burst prediction indexes.

Specifically, the method for analyzing the rock burst case engineering data in step two is as follows: each group of rock burst case engineering data from step one comprises 4 variables, namely the rock mass stress coefficient, the rock brittleness coefficient, the elastic deformation energy coefficient and the actual rock burst grade; the 3 features (stress coefficient, rock brittleness coefficient and elastic deformation energy coefficient) of each group of data collected in step one are reduced to two dimensions with the T-SNE method, and it is observed whether samples of different actual categories have obvious boundaries.

Specifically, in step three, the rock burst intensity is divided into 4 grades, so the rock burst intensity prediction result falls into 4 classes in total: no rock burst (I), slight rock burst (II), medium rock burst (III) and strong rock burst (IV).

Specifically, step four is as follows: a KNN model, an NB model, a DT model, an RF model, an LDA model and an SVM model are constructed with machine learning algorithm packages in Python, and the parameters of the 6 rock burst prediction models are each optimized by grid search.

Specifically, the preprocessing of the rock burst prediction sample data in step five is as follows: extreme value processing is first applied to the original rock burst case engineering data, and standardization is then applied to eliminate the influence of dimension.

Specifically, the process of establishing the rock burst intensity level prediction model in the sixth step is as follows:

(1) adopting a five-fold cross-validation method, taking 80% of the rock burst case engineering data samples preprocessed in step five as the training set and the remaining 20% as the test set;

(2) using the 6 sets of optimized rock burst prediction model parameters with the machine learning algorithm packages in Python to obtain the prediction accuracy of each of the 6 rock burst prediction models for each grade, and thereby judging the prediction accuracy of the 6 models for rock burst intensity grades I to IV;

the above two steps are performed simultaneously.

Specifically, the modeling analysis in step seven is as follows: based on the per-grade prediction accuracy of the 6 rock burst prediction models obtained in step six, the bias of the prediction results of the 6 machine learning algorithms relative to the real results is analyzed, namely whether the prediction is biased toward danger or toward safety.

The invention has the beneficial effects that:

1. the invention provides a SVM-LDA rock burst machine learning prediction model method based on a data analysis principle, which selects a plurality of groups of typical rock burst case engineering data, and establishes 6 rock burst intensity grade prediction models based on 6 machine learning algorithms and a random cross validation method. The selected machine learning algorithm has the advantages of simple logic, easy realization, strong model generalization capability, high training speed, suitability for small samples and the like, and the method for optimizing the model parameters by adopting the grid search mode has certain universality. Determining that no strong correlation exists between variables by means of a correlation coefficient principle, and simultaneously carrying out extreme value processing on original rock burst case engineering data and then carrying out standardized processing to eliminate the influence of dimension;

2. the invention discusses the prediction effect of the model based on each rock burst grade, is not limited to a certain model, but finds the model with better prediction effect on a certain or certain rock burst grades, combines the models, predicts the rock burst grades and provides better guiding significance for the rock burst prediction problem of geotechnical engineering;

3. the invention adopts a T-SNE dimension reduction method to perform dimension reduction visualization processing on the rockburst case engineering data selected by the invention, judges whether each rockburst intensity grade sample has a clustering effect, and provides a theoretical basis for a reader to research such a topic by adopting a machine learning algorithm in the future.

Drawings

FIG. 1 is a diagram showing a stress coefficient distribution of a rock mass;

FIG. 2 is a graph of a rock brittleness coefficient distribution;

FIG. 3 is a graph of the elastic deformation energy coefficient distribution;

FIG. 4 is a diagram of an actual grade profile of a rock burst;

FIG. 5 is a three-dimensional distribution diagram of actual levels of a rock burst;

FIG. 6 is a sample distribution diagram after dimensionality reduction;

FIG. 7 is a flow chart of a machine learning algorithm model;

FIG. 8 is a graph of LDA and SVM cross-validation accuracy.

Detailed Description

For better clarity of the computing principles, the operation processes and the method advantages of the embodiments of the present invention, the following detailed descriptions of the technical solutions of the present invention are provided with reference to the accompanying drawings.

Example 1: as shown in FIGS. 1 to 8, an SVM-LDA rock burst machine learning prediction model method based on a data analysis principle includes the following steps:

Step one: constructing a rock burst prediction sample library.

Rock burst prediction literature was collected (Zhou J, Li X B, Shi X Z. Long-term prediction model of rockburst in underground openings using heuristic algorithms and support vector machines [J]. Safety Science, 2012, 50(4): 629; Dong L J, Li X B, Peng K. Prediction of rockburst classification using Random Forest [J]. Transactions of Nonferrous Metals Society of China, 2013, 23(2): 472; a method for predicting rock burst degree based on PCA-N [J], 2019; and a further article, 2016, 26(7): 1995-), and mutually independent rock burst case engineering data from home and abroad were selected from it to construct the rock burst prediction sample library, as shown in Table 1.

TABLE 1 rock burst case engineering data at home and abroad

Step two: analyzing the rock burst case engineering data.

(1) Normalization and correlation coefficients

In a multi-index evaluation system the evaluation indexes differ in nature, dimension and order of magnitude. If the original index values are analyzed directly when the indexes differ greatly in level, indexes with larger values dominate the evaluation and the effect of indexes with smaller values is weakened. To ensure reliable results, the raw index data must therefore be normalized. The invention uses a typical normalization method that maps the data uniformly onto the interval [0, 1], as given by formula (1):

$$x_i^{std} = \frac{x_i - \min(x)}{\max(x) - \min(x)} \qquad (1)$$

In the formula: $x_i^{std}$ is the normalized value, $\min(x)$ is the minimum value of the variable, $\max(x)$ is the maximum value of the variable, and $x_i$ is the actual variable value.
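The min-max normalization of formula (1) can be sketched in Python as follows. This is a minimal illustration rather than the code used by the invention; the function name `min_max_normalize` is an assumption, and the input values are the Wet summary statistics of Table 2, used only as sample numbers.

```python
def min_max_normalize(values):
    """Map a list of numbers onto [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula is undefined, so return zeros.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Illustrative inputs: minimum, quartiles and maximum of Wet from Table 2.
wet = [0.90, 3.00, 4.30, 5.76, 10.90]
print([round(v, 2) for v in min_max_normalize(wet)])
```

The smallest input always maps to 0 and the largest to 1, so the ordering of the samples is preserved.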

The correlation coefficient was first proposed by Karl Pearson; it is a statistical index that reflects how closely variables are related, and it is given by formula (2). The correlation coefficient R(X, Y) lies in the range [-1, 1]. When R(X, Y) = 0, X and Y are said to be uncorrelated; when |R(X, Y)| = 1, X and Y are completely correlated and have a linear functional relationship; when |R(X, Y)| < 1, a change in X accounts for only part of the change in Y, and the larger |R(X, Y)| is, the larger that part. |R(X, Y)| > 0.8 is called high correlation, |R(X, Y)| < 0.3 is called low correlation, and anything in between is medium correlation.

$$R(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} \qquad (2)$$

In the formula: Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, Var[Y] is the variance of Y, and R(X, Y) is the correlation coefficient of X and Y.
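A minimal Python sketch of the correlation coefficient and of the high / medium / low thresholds stated above is given below. The function names `pearson_r` and `correlation_strength` are assumptions introduced for illustration; the three values labelled at the end are the off-diagonal entries of Table 3.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient R(X, Y) = Cov(X, Y) / sqrt(Var[X] * Var[Y])."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


def correlation_strength(r):
    """Label |R| with the thresholds from the text: > 0.8 high, < 0.3 low."""
    if abs(r) > 0.8:
        return "high"
    if abs(r) < 0.3:
        return "low"
    return "medium"


# Off-diagonal entries of the correlation matrix in Table 3.
for pair, r in [("stress / brittleness", -0.06),
                ("stress / Wet", 0.03),
                ("brittleness / Wet", -0.14)]:
    print(pair, correlation_strength(r))
```

All three pairs fall below the 0.3 threshold, which matches the conclusion that the independent variables are not strongly correlated.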

(2) T-SNE principle

T-SNE (t-distributed stochastic neighbor embedding) was originally proposed by Laurens van der Maaten and Geoffrey Hinton. It is a commonly used nonlinear dimensionality reduction algorithm and is generally applied in the dimensionality reduction stage of manifold learning.

The rock burst case engineering data collected for the method comprise 145 groups in total, and the samples are mutually independent. There are 4 variables: the rock mass stress coefficient (σθ/σc), the rock brittleness coefficient (σc/σt), the elastic deformation energy coefficient (Wet) and the actual rock burst grade (I-IV). The first three are the independent variables of the model and the actual grade is the dependent variable. To give the rock burst prediction model higher accuracy, the rock burst case engineering data (Table 1) are preprocessed based on formula (1) and formula (2); the basic description of the sample is shown in Table 2.

TABLE 2 basic information of variables

Index	σθ/σc	σc/σt	Wet
Sample size	145	145	145
Mean value	0.46	22.36	4.49
Standard deviation	0.21	12.68	2.09
Minimum value	0.05	4.48	0.90
25% quantile	0.35	13.98	3.00
50% quantile	0.45	20.40	4.30
75% quantile	0.60	28.43	5.76
Maximum value	1.41	80.00	10.90

As can be seen from Table 2, the rock mass stress coefficient has a maximum of 1.41, a 25% quantile of 0.35 (median 0.45) and a minimum of 0.05, showing an obvious right-skewed distribution; the rock brittleness coefficient has a maximum of 80.00, a median of 20.40 and a minimum of 4.48, also showing an obvious right-skewed distribution; the elastic deformation energy coefficient has a maximum of 10.90, a median of 4.30 and a minimum of 0.90, and is also somewhat right-skewed. The data in the table are limited and the sample distribution is hard to see directly, so distribution curves of the three variables are drawn to present the distribution characteristics visually, as shown in FIGS. 1 to 3.

Rock burst prediction results are visually expressed by rock burst grades, and in the existing rock burst prediction evaluation system, the rock burst intensity grades are generally divided into 4 grades such as no rock burst (I), slight rock burst (II), medium rock burst (III) and strong rock burst (IV). In the 145 cases collected this time, the actual grade distribution of the rock burst is shown in fig. 4.

As can be seen from FIG. 4, grade III has the most samples, 57, and grade I has the fewest, 27, so the samples are somewhat imbalanced. However, the ratio of the largest class size to the smallest is only slightly larger than 2, so the imbalance problem is not severe. If the independent variables of a model are strongly correlated, the prediction accuracy and stability of the model are affected, so it is important to analyze the correlation between the independent variables.

Using formula (2), the correlation matrix of the rock mass stress coefficient σθ/σc, the rock brittleness coefficient σc/σt and the elastic deformation energy coefficient (Wet) is obtained, as shown in Table 3. As can be seen from Table 3, the correlation between the three variables is very small (the largest absolute value is 0.14), so it can be considered that there is no strong correlation between the variables.

TABLE 3 independent variable correlation matrix

Index	σθ/σc	σc/σt	Wet
σθ/σc	1.00	-0.06	0.03
σc/σt	-0.06	1.00	-0.14
Wet	0.03	-0.14	1.00

Since there are only 3 independent variables, i.e. the modeled samples have only three features, the relationship between the independent variables and the dependent variable is first shown in three-dimensional space, see FIG. 5. In three-dimensional space, however, the differences between categories cannot be seen intuitively. The features are therefore reduced to two dimensions so that the sample distribution of each category is shown more clearly.

FIG. 6 shows the relationship between each category and the two features obtained after T-SNE dimensionality reduction (VAR1 and VAR2 are the two variables after dimensionality reduction). The reduced data show fairly obvious boundaries between samples of different actual categories. As can be seen from FIG. 6, most of the square points are gathered together and are clearly separated from the other points, so it can be anticipated that a machine learning algorithm will be able to find these boundaries and distinguish the different rock burst intensity grades. The invention therefore adopts machine learning methods to model and predict the actual category.

Step three: determining the grading of the rock burst intensity.

The rock burst intensity is divided into 4 grades, so the rock burst intensity prediction result falls into 4 classes, in order: no rock burst (I), slight rock burst (II), medium rock burst (III) and strong rock burst (IV).

Step four: optimizing the model parameters.

The invention determines the model parameters by grid search. Grid search is a common parameter determination method whose basic idea is to traverse all candidate values of the parameters in a valid space, form every combination of the values of the different parameters, and finally select the one or more parameter combinations with which the model performs best. Its advantage is that the optimal combination within the grid can be found reliably. Its disadvantages are also evident. First, grid search is a brute-force traversal, so model training takes a long time when the parameter value space is large. Second, the range of each parameter must be set empirically, so the parameters found are not necessarily globally optimal and may be only locally optimal. Note: the invention evaluates the prediction effect of the models on each rock burst grade and is not limited to a single model; it finds the model that predicts best for one or several rock burst grades and combines these models to predict the rock burst grade. The method of optimizing the model parameters by grid search therefore has a certain universality.
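The traversal described above can be sketched in a few lines of Python. This is an illustrative sketch only: `grid_search` and `toy_score` are hypothetical names, the parameter names `C` and `gamma` and their candidate values are assumed, and `toy_score` merely stands in for "train the model with these parameters and return its cross-validated accuracy".

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for cross-validated accuracy; peaks at C=10, gamma=0.1.
    return 1.0 - abs(params["C"] - 10) / 100 - abs(params["gamma"] - 0.1)


grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
best, score = grid_search(grid, toy_score)
print(best, round(score, 3))
```

The number of evaluations is the product of the candidate list lengths (here 4 × 3 = 12), which is why training time grows quickly as the parameter space grows.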

Step five: preprocessing the rock burst case engineering data.

The data preprocessing stage comprises extreme value processing and data standardization. According to the univariate analysis, the rock mass stress coefficient and the rock brittleness coefficient both show a strongly right-skewed distribution, i.e. the sample contains a small number of extreme values. Rock mass stress coefficient values larger than 0.8 are therefore uniformly set to 0.8, and rock brittleness coefficient values larger than 45 are uniformly set to 45; the data after extreme value processing are then standardized.

The data processed according to formula (1) have a maximum value of 1 and a minimum value of 0, and the shape of the sample distribution is unchanged by the normalization. Part of the normalized samples are shown in Table 4.
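The two preprocessing operations can be chained as in the following Python sketch. The caps 0.8 and 45 come from the text; the function name `cap_and_scale` is an assumption, and the inputs are the summary statistics of Table 2 rather than the full 145-sample data set.

```python
def cap_and_scale(values, upper):
    """Cap values at `upper`, then min-max scale the capped values to [0, 1]."""
    capped = [min(x, upper) for x in values]
    lo, hi = min(capped), max(capped)
    if hi == lo:
        # Degenerate case: every capped value is identical.
        return [0.0 for _ in capped]
    return [(x - lo) / (hi - lo) for x in capped]


STRESS_CAP = 0.8        # cap for the rock mass stress coefficient
BRITTLENESS_CAP = 45    # cap for the rock brittleness coefficient

# Illustrative inputs: minimum, quartiles and maximum from Table 2.
stress = [0.05, 0.35, 0.45, 0.60, 1.41]
brittleness = [4.48, 13.98, 20.40, 28.43, 80.00]

print([round(v, 2) for v in cap_and_scale(stress, STRESS_CAP)])
print([round(v, 2) for v in cap_and_scale(brittleness, BRITTLENESS_CAP)])
```

Capping before scaling means that every value at or above the cap maps to 1, so a few extreme samples no longer compress the remaining samples into a narrow band near 0.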

TABLE 4 normalized samples

Serial number	σθ/σc	σc/σt	Wet	Grade
1	1.00	0.60	0.23	Ⅲ
2	1.00	0.60	0.23	Ⅳ
3	0.92	0.60	0.23	Ⅲ
4	1.00	0.36	0.22	Ⅳ
5	0.67	0.25	0.55	Ⅲ
┇	┇	┇	┇	┇
141	0.44	0.32	0.81	Ⅲ
142	0.12	0.24	0.04	Ⅰ
143	0.47	0.25	0.62	Ⅲ
144	0.00	0.22	0.60	Ⅰ
145	0.23	0.22	0.82	Ⅲ

Step six: establishing the rock burst intensity grade prediction models.

To solve the rock burst intensity grade prediction problem, existing machine learning algorithms are programmed in Python. The algorithms used are K Nearest Neighbor (KNN), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA), and a rock burst intensity grade prediction model is established with each of the 6 algorithms. The flow chart of building the machine learning models for rock burst intensity grade prediction is shown in FIG. 7. To improve the prediction stability and fitting capability of the models, the data are usually cross validated. The method of n-fold cross validation is as follows:

Step 1: dividing the data evenly into n parts; Step 2: taking n-1 of the n parts in turn as the training set and the remaining part as the prediction set, and establishing a model for prediction; Step 3: averaging the n prediction results to obtain the final effect of the model. Usually, to avoid the contingency of sample partitioning, the n-fold cross validation is repeated m times. The inventors tried cross validation several times with the aim of reaching the highest model accuracy and, considering that 145 rock burst engineering cases were selected, found five-fold cross validation to be suitable.
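The partitioning step of n-fold cross validation can be sketched as follows for the 145 samples and n = 5. This is an assumed, library-free illustration; the function name `k_fold_indices` and the `seed` parameter are not from the source, and model training on each split is left out.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k (train, test) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Cut the shuffled indices into k folds of (almost) equal size.
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size

    # Each fold serves once as the test set; the other k-1 folds form the training set.
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


for train, test in k_fold_indices(145, 5):
    print(len(train), len(test))
```

With 145 samples and five folds, every round uses 116 training samples and 29 test samples, and each sample appears in exactly one test set.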

Step seven: modeling analysis.

The rock burst prediction models are trained and used for prediction with 6 statistical machine learning methods: nearest neighbor (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF) and Linear Discriminant Analysis (LDA). When the rock burst intensity grade prediction models are established, 20% of the samples are used as the test set and 80% as the training set. After the division, the training set has 116 samples and the test set has 29 samples. To make the rock burst prediction models universal and to eliminate the contingency of sample division, each model is selected from the models trained with five-fold cross validation. The predicted grade is compared with the real grade and the prediction results are described in detail. The prediction accuracy of each model for each grade is shown in Table 5, and the number of correct predictions for each grade and the overall model accuracy are shown in Table 6. Finally, based on the results of the five-fold cross validation, the bias of the prediction results of each algorithm relative to the real results (biased toward danger or biased toward safety) is studied in detail. From the 6 rock burst machine learning prediction models, the model with the highest accuracy and strongest applicability for rock burst intensity grade I and the model with the highest accuracy and strongest applicability for rock burst intensity grades II to IV are then identified.

TABLE 5 model prediction accuracy for each grade

Actual grade	Actual quantity	KNN	SVM	NB	DT	RF	LDA
Ⅰ	5	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%
Ⅱ	8	75.00%	87.50%	62.50%	75.00%	75.00%	100.00%
Ⅲ	11	81.82%	100.00%	90.91%	72.73%	100.00%	90.91%
Ⅳ	5	100.00%	80.00%	80.00%	80.00%	60.00%	100.00%

As can be seen from Table 5, all 6 models predict accurately when the actual rock burst intensity grade is I. For actual grade II, linear discriminant analysis (LDA) has the highest prediction accuracy, reaching 100%, followed by the Support Vector Machine (SVM) at 87.5%. For actual grade III, the support vector machine and the random forest have the highest prediction accuracy, reaching 100%, followed by linear discriminant analysis and naive Bayes at 90.91%. For actual grade IV, nearest neighbor and linear discriminant analysis have the highest prediction accuracy, reaching 100%, followed by the support vector machine, naive Bayes and decision tree models at 80%.

TABLE 6 model prediction for each level and model accuracy

Actual grade	Actual quantity	KNN	SVM	NB	DT	RF	LDA
Ⅰ	5	5	5	5	5	5	5
Ⅱ	8	6	7	5	6	6	8
Ⅲ	11	9	11	10	8	11	10
Ⅳ	5	5	4	4	4	3	5
Total	29	25	27	24	23	25	28
Accuracy	--	86.21%	93.10%	82.76%	79.31%	86.21%	96.55%

Table 5 analyzes the prediction accuracy of each model for a single grade; on that basis, Table 6 gives the overall prediction accuracy of each model. As Table 6 shows, linear discriminant analysis has the highest prediction accuracy, which is consistent with the inventors' conjecture that the samples are approximately linearly separable. It is followed by the SVM, which predicts very well on linearly separable small samples.
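The per-grade accuracies of Table 5 and the overall accuracies of Table 6 follow directly from the counts of correct predictions, as the Python sketch below shows. The counts are copied from Table 6; the dictionary layout and the function names `per_grade_accuracy` and `overall_accuracy` are assumptions made for illustration.

```python
# Number of test samples per actual grade (Table 6, "Actual quantity").
actual_counts = {"I": 5, "II": 8, "III": 11, "IV": 5}

# Correct predictions per model and actual grade (Table 6).
correct = {
    "KNN": {"I": 5, "II": 6, "III": 9, "IV": 5},
    "SVM": {"I": 5, "II": 7, "III": 11, "IV": 4},
    "NB": {"I": 5, "II": 5, "III": 10, "IV": 4},
    "DT": {"I": 5, "II": 6, "III": 8, "IV": 4},
    "RF": {"I": 5, "II": 6, "III": 11, "IV": 3},
    "LDA": {"I": 5, "II": 8, "III": 10, "IV": 5},
}


def per_grade_accuracy(model):
    """Fraction of correctly predicted samples for each actual grade."""
    return {g: correct[model][g] / actual_counts[g] for g in actual_counts}


def overall_accuracy(model):
    """Fraction of correctly predicted samples over the whole test set."""
    return sum(correct[model].values()) / sum(actual_counts.values())


for model in correct:
    grades = {g: f"{a:.2%}" for g, a in per_grade_accuracy(model).items()}
    print(model, grades, f"overall {overall_accuracy(model):.2%}")
```

Ranking the models by `overall_accuracy` puts LDA first (28 of 29 correct) and SVM second (27 of 29 correct).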

The invention adopts random five-fold cross validation; each round's prediction set has 29 samples, so the five rounds yield 145 predicted grades in total. The predicted grades are compared with the real grades in Table 7, and the prediction results are described in detail below.

(1) For the Decision Tree (DT) algorithm, among real samples of rock burst grade I, 85.19% are predicted as grade I and 14.81% as grade II. The decision tree does not predict real grade I as III or IV, so the error is basically within a controllable, acceptable range. Among real samples of grade II, 56% are predicted as grade II, 36% as grade III and 8% as grade I, so the prediction error of the decision tree is biased toward the higher grade. Among real samples of grade III, 70% are predicted as grade III, 18.57% as grade IV and 11.43% below grade III. Among real samples of grade IV, 56.52% are predicted as grade IV and 39.13% as grade III. Overall, the decision tree is biased toward predicting danger, its overall error is large, and its prediction effect is poor, especially for samples of real grades II to IV.

(2) For the nearest neighbor (KNN) algorithm, among real samples of rock burst grade I, as many as 96.3% are predicted as grade I, far higher than for the decision tree. Among real samples of grade II, 60% are predicted as grade II, higher than for the decision tree, and 36% are predicted as grade III, equal to the decision tree. Among real samples of grade III, 75.71% are predicted as grade III, higher than for the decision tree, 15.71% are predicted as grade IV, and no sample is predicted as grade I. Among real samples of grade IV, 78.26% are predicted as grade IV, 21.74% as grade III, and no sample is predicted as grade I or II. In general, KNN predicts better than the decision tree, but for samples of rock burst grades II to IV the prediction accuracy of the KNN algorithm is still low.

(3) For Linear Discriminant Analysis (LDA), among real samples of rock burst grade I, 92.59% are predicted as grade I, which is close to KNN. Among real samples of grade II, 84% are predicted as grade II, far higher than for the previous two algorithms. Among real samples of grade III, 77.14% are predicted as grade III and nearly 15% as grade IV. Among real samples of grade IV, 86.69% are predicted as grade IV, far higher than for the previous two algorithms. Overall, LDA performs well on every grade, while every algorithm shows some limitation on samples of real grade III.

(4) For the Naive Bayes (NB) algorithm, among real samples of rock burst grade I, 92.59% are predicted as grade I, but the remaining 7.41% are predicted as grade III, which is a bias toward predicting higher risk. For real samples of grades II to IV, the prediction effect is not as good as that of the LDA algorithm.

(5) For the Random Forest (RF) algorithm, the prediction accuracy for real samples of rock burst grade I is high, but the prediction effect for real samples of grades II to IV is not as good as that of the LDA algorithm.

(6) For the Support Vector Machine (SVM) algorithm, 100% of the real samples of rock burst grade I are predicted as grade I, i.e. the algorithm has a high recognition rate for samples of grade I, but the prediction effect for real samples of grades II to IV is not as good as that of the LDA algorithm.

TABLE 7 accuracy of machine learning algorithms on prediction of various rockburst intensity levels

FIG. 8 shows the prediction accuracy of Linear Discriminant Analysis (LDA) and the Support Vector Machine (SVM) in the five-fold cross validation. As shown in FIG. 8, for rock burst grade I the prediction accuracy of the SVM reaches 100% in every round, so the SVM identifies samples of grade I well, while its prediction accuracy for the other rock burst grades fluctuates strongly. The prediction accuracy and stability of LDA for rock burst grades II to IV are superior to those of the SVM. On this basis, the prediction results of the two algorithms can be combined when the final prediction is determined: when the SVM predicts grade I, the SVM result is adopted; otherwise, the LDA result is adopted.
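The combination rule stated above reduces to a one-line decision, sketched here in Python. The function name `combine_svm_lda` and the example predictions are hypothetical; in practice the two input lists would be the grades output by the trained SVM and LDA models for the same samples.

```python
def combine_svm_lda(svm_pred, lda_pred):
    """Adopt the SVM grade when it is I, otherwise adopt the LDA grade."""
    return svm_pred if svm_pred == "I" else lda_pred


# Hypothetical predictions of the two models for four samples.
svm_preds = ["I", "II", "III", "IV"]
lda_preds = ["II", "II", "IV", "IV"]

combined = [combine_svm_lda(s, l) for s, l in zip(svm_preds, lda_preds)]
print(combined)
```

For the first sample the SVM grade I is kept even though LDA disagrees; for the third sample the LDA grade IV replaces the SVM grade III.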

In practical rock burst engineering, both the prediction accuracy of the model and the risk caused by prediction errors must be considered. For example, if the actual rock burst grade is IV and the predicted grade is III, the consequences can be serious. Taking the overall accuracy into account as well, the Support Vector Machine (SVM) and the linear discriminant model (LDA) are selected for a more detailed comparison; their prediction results are shown in Table 8.

TABLE 8 support vector machine and Linear discriminant model prediction results

As can be seen from Table 8, the SVM accurately predicts the samples with actual rock burst grades I and III. For actual grade II, 7 samples are predicted correctly and 1 is predicted as I; for actual grade IV, 4 samples are predicted correctly and 1 is predicted as III. Both errors underestimate the rock burst grade, which can seriously affect actual engineering construction. The LDA model accurately predicts the samples with grades I, II and IV; for actual grade III, 10 samples are predicted correctly and 1 is predicted as IV, i.e. the model overestimates the rock burst grade. In actual engineering construction, overestimating the rock burst intensity grade may waste some resources, but safety is guaranteed.
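
The distinction between underestimating and overestimating the grade can be made explicit by comparing the ordinal positions of the real and predicted grades. The following Python sketch is illustrative: the helper `error_direction` is a hypothetical name, and the example inputs contain only the misclassified pairs described above (SVM: II predicted as I, IV predicted as III; LDA: III predicted as IV).

```python
GRADE_ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4}


def error_direction(y_true, y_pred):
    """Count correct, underestimated (predicted grade below the real one)
    and overestimated (predicted grade above the real one) samples."""
    counts = {"correct": 0, "underestimated": 0, "overestimated": 0}
    for real, pred in zip(y_true, y_pred):
        diff = GRADE_ORDER[pred] - GRADE_ORDER[real]
        if diff == 0:
            counts["correct"] += 1
        elif diff < 0:
            counts["underestimated"] += 1  # biased toward risk
        else:
            counts["overestimated"] += 1   # biased toward safety
    return counts


# Only the misclassified pairs described in the text
svm_errors = error_direction(["II", "IV"], ["I", "III"])
lda_errors = error_direction(["III"], ["IV"])
```

Here `svm_errors` records two underestimates and `lda_errors` records one overestimate, which is the asymmetry the comparison relies on.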

In summary, if the model accuracy is required to be higher than 90%, both the SVM and the LDA meet the requirement. However, both misclassifications of the SVM underestimate the rock burst grade, which may affect production safety.

Combining the above research, it is not difficult to see that, in terms of prediction accuracy and stability for rock burst grades II-IV, the linear discriminant model (LDA) is more accurate and more stable. Because the actual grades of the engineering rock burst cases selected for the engineering application are II to III, the invention adopts the linear discriminant model (LDA) to predict the rock burst intensity grade for the Jinping II Hydropower Station, the Jiangbian Hydropower Station and the Cangling Tunnel. The LDA algorithm is programmed in Python, and the code is implemented based on the KMeans algorithm package in Python 3.7. The model is applied to these three domestic projects with rock burst tendency, giving 9 predicted grades in total, and the results show that the predictions for the three projects completely agree with the actual grades.
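
As a rough illustration of how an LDA grade classifier might be fitted in Python, the sketch below uses scikit-learn's `LinearDiscriminantAnalysis` preceded by standardization. The choice of scikit-learn is an assumption, not something the text confirms, and the three-feature data are purely synthetic stand-ins for the three prediction indexes, not engineering cases.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic cluster centres for grades 1-4 (I-IV); columns stand in for
# the three prediction indexes. These numbers are made up for the demo.
centers = np.array([
    [0.2, 30.0, 2.0],
    [0.4, 20.0, 4.0],
    [0.6, 12.0, 6.0],
    [0.9, 6.0, 9.0],
])
noise_scale = np.array([0.02, 0.5, 0.2])

# 20 synthetic samples per grade
X = np.vstack([c + rng.normal(0.0, noise_scale, size=(20, 3)) for c in centers])
y = np.repeat([1, 2, 3, 4], 20)

# Standardize the features, then fit the linear discriminant model
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
model.fit(X, y)

# Predict the grade of one hypothetical new case
new_case = np.array([[0.58, 12.5, 6.1]])
predicted = model.predict(new_case)
train_accuracy = model.score(X, y)
```

Because the synthetic clusters are well separated, this only demonstrates the workflow (standardize, fit, predict) and says nothing about accuracy on real rock burst data.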

TABLE 9 rockburst intensity level prediction results of three projects

The rock burst intensity grade prediction method of the invention has good accuracy and generality, and can provide useful guidance for rock burst intensity grade prediction problems.

While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims

1. An SVM-LDA rock burst machine learning prediction model method based on a data analysis principle, characterized in that the method comprises the following steps:

step one: constructing a rock burst prediction sample library;

step two: analyzing rock burst case engineering data;

step three: determining the grading condition of the rockburst intensity grade;

step four: optimizing model parameters;

step five: pre-processing rock burst prediction sample data;

step six: establishing a rock burst intensity grade prediction model;

step seven: modeling analysis.

2. The SVM-LDA rock burst machine learning prediction model method based on the data analysis principle as claimed in claim 1, wherein: in the first step, relevant domestic and foreign literature on rock burst prediction is collected, and a rock burst prediction sample library is constructed by selecting multiple groups of independent domestic and foreign rock burst case engineering data, with the rock stress coefficient σθ/σc, the rock brittleness coefficient σc/σt and the elastic deformation energy coefficient Wet selected as the rock burst prediction indexes.

3. The SVM-LDA rock burst machine learning prediction model method based on the data analysis principle as claimed in claim 2, wherein: the method for analyzing the rock burst case engineering data in the second step is as follows: each group of rock burst case engineering data in the first step comprises 4 variables; meanwhile, the 3 features of each group of rock burst case engineering data collected in the first step, namely the stress coefficient, the rock brittleness coefficient and the elastic deformation energy coefficient, are reduced to two dimensions based on the T-SNE method, and it is observed whether samples of different actual categories have obvious boundaries.

4. The SVM-LDA rock burst machine learning prediction model method based on the data analysis principle as claimed in claim 3, wherein: in the third step, the rock burst intensity is divided into 4 grades, i.e. there are 4 classes of rock burst intensity prediction result, which are, in order, no rock burst (I), slight rock burst (II), medium rock burst (III) and strong rock burst (IV).

5. The SVM-LDA rock burst machine learning prediction model method based on the data analysis principle as claimed in claim 4, wherein: the fourth step specifically comprises: constructing a KNN model, an NB model, a DT model, an RF model, an LDA model and an SVM model using a machine learning algorithm package and Python tools, and optimizing the parameters of each of the 6 rock burst prediction models by grid search.

6. The SVM-LDA rock burst machine learning prediction model method based on the data analysis principle as claimed in claim 5, wherein: the preprocessing of the rock burst prediction sample data in the fifth step is as follows: extreme value processing is first performed on the original rock burst case engineering data, followed by standardization, to eliminate the influence of dimension (units).

7. The SVM-LDA rock burst machine learning prediction model method based on the data analysis principle as claimed in claim 6, wherein: the process of establishing the rock burst intensity grade prediction model in the sixth step is as follows:

the above two steps are performed simultaneously.

8. The SVM-LDA rock burst machine learning prediction model method based on the data analysis principle as claimed in claim 7, wherein: the modeling analysis in the seventh step is as follows: based on the sixth step, the prediction accuracy of each of the 6 rock burst prediction models for each grade is obtained, and the bias of the prediction results of the 6 machine learning algorithms relative to the real results is analyzed, namely whether the predictions are biased toward risk or biased toward safety.