CN108280289B - Rock burst danger level prediction method based on local weighted C4.5 algorithm - Google Patents
Classifications
- G — PHYSICS
- G06 — COMPUTING; CALCULATING OR COUNTING
- G06F — ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00 — Computer-aided design [CAD]
- G06F30/20 — Design optimisation, verification or simulation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a rock burst danger level prediction method based on a locally weighted C4.5 algorithm, and relates to the technical field of rock burst prediction. The method first discretizes the continuous attribute data in the sample data with the MDLP method, then selects a training set by a local weighting method and calculates the sample weights. The information gain rate of each attribute is computed from these weights, and the attribute with the maximum gain rate is selected as the splitting attribute of the root node and the other branch nodes of a C4.5 decision tree. Finally, pessimistic pruning is applied to the established tree with sample weights replacing sample counts, so as to predict the rock burst danger level of the region to be predicted. The method overcomes the bias of the ID3 algorithm, whose information-gain criterion favours attributes with many values, avoids overfitting, and achieves high prediction accuracy.
Description
Technical Field
The invention relates to the technical field of rock burst prediction, in particular to a rock burst danger level prediction method based on a local weighted C4.5 algorithm.
Background
Rock burst is a dynamic phenomenon of sudden, sharp, and violent failure caused by the release of deformation energy stored in the coal and rock mass around mine roadways and stopes. It is one of the major disasters threatening the production safety of coal mines, and almost every mining country in the world is affected by it to some degree. In recent years, developed countries have successively closed rock-burst-prone mines owing to energy-structure adjustment and safety considerations, leaving China as the principal country both suffering from and controlling rock burst.
Prediction and evaluation of rock burst, built on research into its occurrence mechanism, are key steps in its prevention and control. However, the mechanism of rock burst is not yet fully understood, and research on the mechanism of deep rock burst in particular is still in its infancy, which increases the difficulty of prediction. At present, rock burst is predicted mainly by rock-mechanics methods and geophysical methods: the former include the drilling-cuttings method and mining-induced stress detection, while the latter include ground sound monitoring, microseismic monitoring, and electromagnetic radiation monitoring. In addition, with the development of artificial intelligence, intelligent algorithms such as neural networks, Bayesian discriminant analysis, and support vector machines have been applied to rock burst prediction. These methods have produced substantial research results on predicting the rock burst danger level, but they also have drawbacks: neural networks generally require a large number of samples, whereas rock burst data sets are small; Bayesian methods require strong independence among attributes, which real rock burst data rarely satisfy; and these methods do not consider the overfitting of models.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a rock burst danger grade prediction method based on a local weighted C4.5 algorithm, which realizes the prediction of the rock burst danger grade of coal and rock mass around mine roadways and stopes.
The rock burst danger level prediction method based on the local weighting C4.5 algorithm comprises the following steps:
step 1, collecting rock burst data of known categories as sample data; let the collected sample data set be T, the category set of the samples be C, k' the total number of sample categories, and N the number of samples;
step 2, discretizing continuous attribute data in the known-category sample data by the Minimum Description Length Principle (MDLP) method, specifically comprising the following steps:
step 2.1: sorting a group of continuous attribute values to be discretized, together with their corresponding categories, in ascending order of the attribute values;
step 2.2: selecting continuous attribute values as demarcation points according to differences between the categories of adjacent sorted attribute values, forming a demarcation point set; if the same attribute value corresponds to different categories, selecting the attribute value whose category is smallest as a demarcation point;
Step 2.3: calculating the information gain of all the demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, judging whether the demarcation point meets the minimum description criterion, and if so, keeping the demarcation point; otherwise, removing the demarcation point;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
let a_min be the demarcation point with the minimum information gain, which divides the class set C into two subsets C_1 and C_2; whether a_min meets the minimum description criterion is judged by the following formula:
Gain(a_min) > log2(N-1)/N + {log2(3^k' - 2) - [k'H(C) - k'_1 H(C_1) - k'_2 H(C_2)]}/N
where k'_1 and k'_2 are the numbers of categories contained in the subsets C_1 and C_2, respectively;
step 2.4: judging whether other demarcation points exist in the two interval sequences into which the demarcation point of step 2.3 divides the original data set; if so, regrouping the demarcation points within each interval sequence into a corresponding demarcation point set and returning to step 2.3, so as to keep judging, according to the number of samples in each interval sequence and its corresponding class set, whether the demarcation points of that sequence are kept; otherwise, executing step 2.5;
step 2.5: dividing the continuous attribute data into interval sequences according to the finally selected demarcation point set; if no demarcation point ultimately meets the minimum description criterion, all the continuous data of the attribute are placed in a single interval sequence; otherwise, the demarcation points divide the continuous data into different interval sequences, yielding the discretization result of the continuous attribute data;
step 2.6: judging whether all the continuous attributes in the sample data set have been discretized; if so, executing step 3; otherwise, repeating steps 2.1-2.5 until all the continuous attributes of the sample data set are discretized;
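Steps 2.1-2.6 can be sketched in Python as follows. This is an illustrative implementation of Fayyad-Irani MDLP discretization, not the patent's verbatim procedure; in particular, standard MDLP selects the candidate demarcation point that minimizes the class-conditional entropy (equivalently, maximizes the information gain), and that convention is followed here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdlp_cut_points(values, labels):
    """Recursively find MDLP-accepted cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels))           # step 2.1: sort by attribute value
    vals = [v for v, _ in pairs]
    labs = [c for _, c in pairs]

    def recurse(lo, hi, cuts):
        n = hi - lo
        if n < 2:
            return
        h = entropy(labs[lo:hi])
        best = None                               # (conditional entropy, split index)
        for i in range(lo + 1, hi):               # step 2.2: candidate boundaries
            if vals[i] == vals[i - 1]:
                continue
            cond = ((i - lo) * entropy(labs[lo:i])
                    + (hi - i) * entropy(labs[i:hi])) / n
            if best is None or cond < best[0]:
                best = (cond, i)
        if best is None:
            return
        cond, i = best
        gain = h - cond                           # Gain(a) = H(C) - H(C|a)
        k = len(set(labs[lo:hi]))
        k1, k2 = len(set(labs[lo:i])), len(set(labs[i:hi]))
        delta = math.log2(3 ** k - 2) - (k * h
                - k1 * entropy(labs[lo:i]) - k2 * entropy(labs[i:hi]))
        # step 2.3: Fayyad-Irani minimum-description-length acceptance test
        if gain > math.log2(n - 1) / n + delta / n:
            cuts.append((vals[i - 1] + vals[i]) / 2)
            recurse(lo, i, cuts)                  # step 2.4: recurse into both sides
            recurse(i, hi, cuts)

    cuts = []
    recurse(0, len(vals), cuts)
    return sorted(cuts)
```

For a well-separated attribute such as values [1, 2, 3, 4, 10, 11, 12, 13] with classes [0, 0, 0, 0, 1, 1, 1, 1], a single cut at 7.0 is accepted; if no cut passes the test, the attribute collapses into one interval sequence, as in step 2.5.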
step 3, collecting rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in the step 2, and determining an interval sequence where the continuous attribute data in the rock burst attribute data of the area to be predicted are located according to a comparison result, so that the continuous attribute data in the rock burst attribute data of the area to be predicted are discretized;
step 4, using the K-nearest-neighbor (KNN) algorithm to find the K samples closest to the sample to be predicted in the discretized data set generated in step 2, forming the training set of the C4.5 decision tree from these K samples, and calculating the weights of the samples in the training set;
The weights of the samples in the training set are calculated according to the following formula:
where ω_i is the weight of the ith sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, k; d_i is the distance from the sample to be predicted to the ith sample x_i, calculated from the attribute data of the samples by a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set;
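The training-set selection and weighting of step 4 can be sketched as follows. Since the patent does not reproduce its weight formula in the text, the linear kernel w_i = 1 - d_i/d_max and the Euclidean distance are assumptions of this sketch, not the patent's verbatim choices.

```python
import math

def knn_training_set(x_query, samples, labels, k):
    """Pick the k nearest samples and attach local weights (step 4).

    Assumes Euclidean distance and the linear kernel w_i = 1 - d_i / d_max,
    so nearer samples get larger weights (the farthest neighbour gets 0).
    Returns a list of (sample, label, weight) triples.
    """
    dists = [math.dist(x_query, s) for s in samples]
    order = sorted(range(len(samples)), key=lambda i: dists[i])[:k]
    d_max = max(dists[i] for i in order) or 1.0   # guard against all-zero distances
    return [(samples[i], labels[i], 1.0 - dists[i] / d_max) for i in order]
```

With a tricube or Gaussian kernel the farthest neighbour would keep a small positive weight; the linear kernel is merely the simplest choice consistent with the text's d_i and d_max.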
step 5: calculating the information gain rate of all attributes in the training set from the weights of the training sample data and, during the generation of the root node and each branch node of the C4.5 decision tree, selecting the attribute with the maximum information gain rate in each iteration as the splitting attribute of that node;
the specific method for calculating the information gain rate of the attributes in the training set comprises the following steps:
let V be an attribute in the training set and v_j its attribute values, j = 1, 2, …, m, where m is the number of mutually distinct values that attribute V takes among the sample data in the training set; let the class set corresponding to the sample data in the training set be C' = {c_1, c_2, …, c_n}, where c_i' is the i'th category, i' = 1, 2, …, n, and n is the total number of categories corresponding to the sample data in the training set; the information gain rate of an attribute in the training set is then calculated as follows:
calculating the class information entropy of the sample data in the training set, as shown in the following formula:

I(C') = -Σ_{i'=1}^{n} p(c_i') log2 p(c_i')

where ω_{c_i'} is the weight sum of the training samples of class c_i', ω_{C'} is the weight sum of the samples of all classes in the training set, and p(c_i') = ω_{c_i'}/ω_{C'} is the ratio of the weight sum of the class-c_i' samples to the weight sum ω_{C'} of the samples of all classes;
calculating the class conditional entropy of the sample data in the training set, as shown in the following formula:

I(C'|V) = -Σ_{j=1}^{m} p(v_j) Σ_{i'=1}^{n} p(c_i'|v_j) log2 p(c_i'|v_j)

where ω_{v_j} is the weight sum of the samples whose attribute V takes the value v_j; ω_V is the weight sum of all samples under attribute V; ω_{c_i',v_j} is the weight sum of the class-c_i' samples whose attribute value is v_j; p(v_j) = ω_{v_j}/ω_V is the ratio of the weight sum of the samples taking the value v_j to the weight sum of all samples; and p(c_i'|v_j) = ω_{c_i',v_j}/ω_{v_j} is the ratio of the weight sum of the class-c_i' samples taking the value v_j to the weight sum of all samples taking the value v_j;
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:

I(V) = -Σ_{j=1}^{m} p(v_j) log2 p(v_j)
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C', V)/I(V);
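Step 5's weight-based gain-rate computation can be sketched as follows, with every sample count in the standard C4.5 formulas replaced by a weight sum. The row layout (attribute dictionaries paired with class labels) is an illustrative choice.

```python
import math
from collections import defaultdict

def weighted_entropy(weight_by_key):
    """Entropy of a weight distribution: -sum p log2 p with p = w / total."""
    total = sum(weight_by_key.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weight_by_key.values() if w > 0)

def weighted_gain_rate(rows, weights, attr):
    """Information gain rate of one attribute, using weights instead of counts.

    rows:    list of (attribute_dict, class_label) training samples
    weights: per-sample weights from the local weighting step
    """
    cls_w = defaultdict(float)                      # class -> weight sum
    val_w = defaultdict(float)                      # attribute value -> weight sum
    cls_by_val = defaultdict(lambda: defaultdict(float))
    for (attrs, c), w in zip(rows, weights):
        v = attrs[attr]
        cls_w[c] += w
        val_w[v] += w
        cls_by_val[v][c] += w
    total = sum(weights)

    i_c = weighted_entropy(cls_w)                   # I(C')
    i_cv = sum((val_w[v] / total) * weighted_entropy(cls_by_val[v])
               for v in val_w)                      # I(C'|V)
    i_v = weighted_entropy(val_w)                   # split information I(V)
    gain = i_c - i_cv                               # I(C', V)
    return gain / i_v if i_v > 0 else 0.0           # gain_ratio(V)
```

In tree construction this function would be evaluated for every remaining attribute at a node, and the attribute with the largest returned value chosen as that node's splitting attribute.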
step 6: establishing the decision tree according to the splitting attributes, and pruning it by the pessimistic pruning method, in which the error rates of branch nodes and their corresponding leaf nodes are calculated with sample weights instead of sample counts; finally, predicting the potential rock burst danger level of the region to be predicted with the generated decision tree.
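The pruning test of step 6 can be sketched as follows. The patent does not spell out its pessimistic-error statistic, so this sketch assumes Quinlan's classical pessimistic-pruning rule (replace a subtree by a leaf when the leaf's continuity-corrected error is within one standard error of the subtree's), with all sample counts replaced by weight sums as the step prescribes.

```python
import math

def prune_to_leaf(subtree_err_w, n_leaves, leaf_err_w, total_w):
    """Weighted pessimistic-pruning decision for one branch node (step 6).

    subtree_err_w: weight sum of samples misclassified by the subtree's leaves
    n_leaves:      number of leaves in the subtree
    leaf_err_w:    weight sum misclassified if the node were collapsed to a leaf
    total_w:       total sample weight reaching the node
    Returns True when the subtree should be replaced by a leaf.
    """
    e_subtree = subtree_err_w + 0.5 * n_leaves     # continuity correction per leaf
    e_leaf = leaf_err_w + 0.5                      # single-leaf correction
    std_err = math.sqrt(e_subtree * (total_w - e_subtree) / total_w)
    return e_leaf <= e_subtree + std_err
```

For instance, a 3-leaf subtree that misclassifies weight 1.0 out of 20.0 is pruned when collapsing it to a leaf misclassifies weight 2.0, but kept when the collapsed leaf would misclassify weight 5.0.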
According to the technical scheme, the invention has the following beneficial effects. In the rock burst danger level prediction method based on the locally weighted C4.5 algorithm, discretizing the continuous attribute data with the MDLP method handles continuous attributes well; the local weighting method selects the training set according to the distances from the discretized samples to the sample to be predicted and assigns different weights to the training samples; the C4.5 algorithm computes the information gain rate from the sample weights to select each node's splitting attribute, overcoming the bias of the ID3 algorithm, whose information-gain criterion favours attributes with many values; and performing pessimistic pruning with sample weights in place of sample counts avoids overfitting and improves the accuracy of the prediction model.
Drawings
Fig. 1 is a flowchart of a rock burst risk level prediction method based on a local weighted C4.5 algorithm according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In this embodiment, an inkstone coal mine in a certain area is taken as an example, and the rock burst risk level of the inkstone coal mine is predicted by using the rock burst risk level prediction method based on the local weighted C4.5 algorithm.
The rock burst danger level prediction method based on the local weighting C4.5 algorithm, as shown in FIG. 1, comprises the following steps:
step 1, collecting rock burst data of known categories as sample data; let the collected sample data set be T, the category set of the samples be C, k' the total number of sample categories, and N the number of samples.
Since many factors influence rock burst, this embodiment selects 10 factors as the attributes of the sample data to predict the rock burst danger level of the coal mine: coal thickness (V_1), inclination angle (V_2), buried depth (V_3), structural condition (V_4), inclination angle variation (V_5), coal thickness variation (V_6), gas concentration (V_7), roof management (V_8), pressure relief (V_9), and coal sound (V_10). Among them, the structural condition (V_4), inclination angle variation (V_5), coal thickness variation (V_6), roof management (V_8), pressure relief (V_9), and coal sound (V_10) are state parameters, assigned values as shown in Table 1:

TABLE 1 State parameter assignment
The rock burst danger level is divided into four categories according to impact intensity: category 1, micro impact; category 2, weak impact; category 3, medium impact; and category 4, strong impact.
Table 2 shows the rock burst data collected as sample data in this example.
Table 2 rock burst data as sample data
Step 2, discretizing continuous attribute data in the known-category sample data by the Minimum Description Length Principle (MDLP) method, specifically comprising the following steps:
step 2.1: sorting a group of continuous attribute values to be discretized, together with their corresponding categories, in ascending order of the attribute values;
step 2.2: selecting continuous attribute values as demarcation points according to differences between the categories of adjacent sorted attribute values, forming a demarcation point set; if the same attribute value corresponds to different categories, selecting the attribute value whose category is smallest as a demarcation point;
step 2.3: calculating the information gain of all the demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, judging whether the demarcation point meets the minimum description criterion, and if so, keeping the demarcation point; otherwise, removing the demarcation point;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
Let a_min be the demarcation point with the minimum information gain, which divides the class set C into two subsets C_1 and C_2; whether a_min meets the minimum description criterion is judged by the following formula:
Gain(a_min) > log2(N-1)/N + {log2(3^k' - 2) - [k'H(C) - k'_1 H(C_1) - k'_2 H(C_2)]}/N
where k'_1 and k'_2 are the numbers of categories contained in the subsets C_1 and C_2, respectively;
step 2.4: judging whether other demarcation points exist in the two interval sequences into which the demarcation point of step 2.3 divides the original data set; if so, regrouping the demarcation points within each interval sequence into a corresponding demarcation point set and returning to step 2.3, so as to keep judging, according to the number of samples in each interval sequence and its corresponding class set, whether the demarcation points of that sequence are kept; otherwise, executing step 2.5;
step 2.5: dividing the continuous attribute data into interval sequences according to the finally selected demarcation point set; if no demarcation point ultimately meets the minimum description criterion, all the continuous data of the attribute are placed in a single interval sequence; otherwise, the demarcation points divide the continuous data into different interval sequences, yielding the discretization result of the continuous attribute data;
step 2.6: and (3) judging whether the continuous attributes in the sample data set are all discretized, if so, executing the step (3), otherwise, repeating the steps 2.1-2.5, and discretizing all the continuous attributes of the sample data set.
In this embodiment, for the continuous attributes V_1, V_3, and V_7, the information gain of the demarcation points in their demarcation point sets does not meet the minimum description criterion, so by the MDLP discretization principle the corresponding continuous data are discretized into a single interval sequence, output as 1 in this embodiment. The final demarcation point of the continuous attribute V_2 is the attribute value 45, so values of 45 or more are assigned to one interval sequence, output as 2, and values below 45 to another, output as 1. The discretized sample data used as the training set are shown in Table 3.
TABLE 3 discretized sample data
Step 3: collecting rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in step 2, and determining the interval sequence in which each continuous attribute value of the area to be predicted lies according to the comparison result, so that the continuous attribute data of the area to be predicted are discretized.
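Step 3's interval lookup for new data can be sketched as follows, assuming the demarcation points kept by MDLP in step 2 are stored per attribute and that a value equal to a demarcation point belongs to the upper interval, matching the V_2 example (values of 45 or more output 2).

```python
import bisect

def discretize_value(value, cut_points):
    """Map a continuous attribute value to its 1-based interval-sequence index.

    cut_points is the sorted list of demarcation points MDLP kept for the
    attribute; an empty list means the whole attribute is one interval
    (output 1). A value equal to a cut point goes to the upper interval.
    """
    return bisect.bisect_right(cut_points, value) + 1
```

For attribute V_2 with the single demarcation point 45, a value of 45 or more maps to interval 2 and a smaller value to interval 1; attributes such as V_1, V_3, and V_7, which kept no demarcation points, always map to interval 1.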
In this embodiment, to verify the effectiveness of the method of the invention, the attribute data in Table 4 are used as the collected rock burst attribute data of the area to be predicted, and the category data in Table 4 are used for comparison with the prediction results. For the continuous attributes in the 10 groups of data, comparison with the corresponding attribute data of the 25 groups in Table 2 yields the discretization results shown in Table 5.
TABLE 4 data to be predicted
Serial number | V1/m | V2/(°) | V3/m | V4 | V5 | V6 | V7/(m3·min-1) | V8 | V9 | V10 | Categories |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
1 | 1.5 | 35 | 530 | 0 | 0 | 0 | 0.56 | 3 | 3 | 0 | 1 |
2 | 1.6 | 62 | 307 | 3 | 2 | 2 | 1 | 0 | 0 | 2 | 4 |
3 | 1.9 | 59 | 542 | 1 | 2 | 3 | 0.25 | 0 | 0 | 1 | 3 |
4 | 1.3 | 44 | 570 | 0 | 0 | 0 | 0.66 | 3 | 3 | 0 | 1 |
5 | 2.2 | 54 | 290 | 3 | 2 | 2 | 1 | 0 | 0 | 2 | 4 |
6 | 3 | 34 | 475 | 2 | 2 | 1 | 0.42 | 0 | 0 | 2 | 3 |
7 | 3.2 | 42 | 574 | 3 | 0 | 0 | 0.29 | 0 | 0 | 2 | 3 |
8 | 1.8 | 62 | 283 | 3 | 2 | 3 | 1 | 0 | 0 | 2 | 4 |
9 | 1.3 | 44 | 656 | 2 | 1 | 3 | 0.24 | 1 | 1 | 2 | 3 |
10 | 1.2 | 40 | 553 | 2 | 2 | 2 | 0.49 | 1 | 2 | 2 | 3 |
TABLE 5 discretized data to be predicted
Step 4, using the K-nearest-neighbor (KNN) algorithm to find the K samples closest to the sample to be predicted in the discretized data set generated in step 2, forming the training set of the C4.5 decision tree from these K samples, and calculating the weights of the samples in the training set;
the weights of the samples in the training set are calculated according to the following formula:
where ω_i is the weight of the ith sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, k; d_i is the distance from the sample to be predicted to the ith sample x_i, calculated from the attribute data of the samples by a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set.
Step 5: calculating the information gain rate of all attributes in the training set from the weights of the training sample data and, during the generation of the root node and each branch node of the C4.5 decision tree, selecting the attribute with the maximum information gain rate in each iteration as the splitting attribute of that node;
the specific method for calculating the information gain rate of the attributes in the training set comprises the following steps:
let V be an attribute in the training set, V jJ is 1, 2, … and m, m is the number of attribute values of attribute V of sample data in training set which do not overlap each other, and the class set corresponding to sample data in training set is C' ═ { C ═ C1、c2、…、cnIn which c isi′For the ith 'category, i' is 1, 2, …, and n is the total number of categories corresponding to the sample data in the training set, and the specific method for calculating the information gain rate of the attributes in the training set is as follows:
calculating the class information entropy of the sample data in the training set, as shown in the following formula:

I(C') = -Σ_{i'=1}^{n} p(c_i') log2 p(c_i')

where ω_{c_i'} is the weight sum of the training samples of class c_i', ω_{C'} is the weight sum of the samples of all classes in the training set, and p(c_i') = ω_{c_i'}/ω_{C'} is the ratio of the weight sum of the class-c_i' samples to the weight sum ω_{C'} of the samples of all classes;
calculating the class conditional entropy of the sample data in the training set, as shown in the following formula:

I(C'|V) = -Σ_{j=1}^{m} p(v_j) Σ_{i'=1}^{n} p(c_i'|v_j) log2 p(c_i'|v_j)

where ω_{v_j} is the weight sum of the samples whose attribute V takes the value v_j; ω_V is the weight sum of all samples under attribute V; ω_{c_i',v_j} is the weight sum of the class-c_i' samples whose attribute value is v_j; p(v_j) = ω_{v_j}/ω_V is the ratio of the weight sum of the samples taking the value v_j to the weight sum of all samples; and p(c_i'|v_j) = ω_{c_i',v_j}/ω_{v_j} is the ratio of the weight sum of the class-c_i' samples taking the value v_j to the weight sum of all samples taking the value v_j;
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:

I(V) = -Σ_{j=1}^{m} p(v_j) log2 p(v_j)
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C', V)/I(V);
step 6: establishing a decision tree according to the splitting attribute, pruning the decision tree by adopting a pessimistic pruning method, and calculating the error rates of branch nodes and corresponding leaf nodes by using sample weights instead of the number of samples in the pruning process; and finally, predicting the potential rock burst danger level of the region to be predicted by the generated decision tree.
In this embodiment, to verify the prediction performance of the decision tree model built from the discretized sample data, the model is first verified by ten-fold cross validation. Because the training set is small, all training samples are selected as neighbour samples in the cross validation. In addition, the significance level in the pruning of the C4.5 decision tree is set to the commonly used 25%, and sample distances in the weighted learning are measured with the Euclidean distance function. The cross-validation accuracy of the model built from the discretized training sample set is 88%, versus 84% for the model built from the original data in Table 2, indicating that the discretized sample data yield a better prediction model.
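The ten-fold cross validation used above can be sketched with a simple fold generator; the even split below is an illustrative stand-in for whatever partitioning the authors used, since sample order, stratification, and shuffling are unspecified in the text.

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    The first n % k folds get one extra sample, so every sample is held out
    exactly once; no shuffling is applied here.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx = list(range(n))
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

With the 25 training samples of Table 2 this yields ten folds of two or three held-out samples each.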
The locally weighted C4.5 algorithm is then used to predict the rock burst danger level of the discretized data to be predicted from Table 4. In this embodiment, prediction models are also established from the data in Table 2 with the NaiveBayes method, the original C4.5 method, and the random forest method to predict the rock burst danger levels in Table 4; the comparison with the prediction results of the method of the invention is shown in Table 6:
TABLE 6 comparison of predicted results of rock burst hazard ratings
Algorithm | Accuracy rate |
--- | --- |
NaiveBayes | 70% |
C4.5 decision Tree | 80% |
Random forest | 80% |
The method of the invention | 100% |
The table shows that the method can accurately predict the rock burst hazard level of the area to be predicted, and the prediction result is superior to that of the NaiveBayes method, the original C4.5 method and the random forest method.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.
Claims (3)
1. A rock burst danger level prediction method based on a local weighting C4.5 algorithm is characterized by comprising the following steps: the method comprises the following steps:
step 1, collecting rock burst data of known categories as sample data; let the collected sample data set be T, the category set of the samples be C, k' the total number of sample categories, and N the number of samples;
step 2, discretizing continuous attribute data in the known-category sample data by the Minimum Description Length Principle (MDLP) method;
step 3, collecting rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in the step 2, and determining an interval sequence where the continuous attribute data in the rock burst attribute data of the area to be predicted are located according to a comparison result, so that the continuous attribute data in the rock burst attribute data of the area to be predicted are discretized;
step 4, using the K-nearest-neighbor (KNN) algorithm to find the K samples closest to the sample to be predicted in the discretized data set generated in step 2, forming the training set of the C4.5 decision tree from these K samples, and calculating the weights of the samples in the training set;
step 5: calculating the information gain rate of all attributes in the training set from the weights of the training sample data and, during the generation of the root node and each branch node of the C4.5 decision tree, selecting the attribute with the maximum information gain rate in each iteration as the splitting attribute of that node;
Step 6: establishing a decision tree according to the splitting attribute, pruning the decision tree by adopting a pessimistic pruning method, and calculating the error rates of branch nodes and corresponding leaf nodes by using sample weights instead of the number of samples in the pruning process; finally, predicting the potential rock burst danger level of the region to be predicted by the generated decision tree;
the specific method for discretization in step 2 is as follows:
step 2.1: sorting a group of continuous attribute values to be discretized, together with their corresponding classes, in ascending order of the attribute values;
step 2.2: selecting continuous attribute values as demarcation points, according to the differences between the classes corresponding to the sorted attribute values, to form a demarcation point set; if the same attribute value corresponds to different classes, selecting the attribute value corresponding to the smallest class as the demarcation point;
step 2.3: calculating the information gain of all demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, and judging whether it meets the minimum description criterion; if so, keeping the demarcation point, otherwise removing it;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
Let a_min be the demarcation point with the minimal information gain, which divides the class set C into two subsets C1 and C2; the formula for judging whether a_min meets the minimum description criterion is:
Gain(a_min) > log2(N−1)/N + {log2(3^k′ − 2) − [k′H(C) − k′1H(C1) − k′2H(C2)]}/N
wherein k′1 and k′2 are the numbers of classes contained in subsets C1 and C2, respectively;
step 2.4: judging whether other demarcation points exist within the two interval sequences into which the demarcation point of step 2.3 divides the original data set; if so, recombining the demarcation points within each interval sequence into a corresponding demarcation point set, returning to step 2.3, and continuing to judge, according to the number of samples in each interval sequence and its corresponding class set, whether each interval sequence keeps its corresponding demarcation points; otherwise, executing step 2.5;
step 2.5: dividing the continuous attribute data into interval sequences according to the finally selected demarcation point set; if no demarcation point finally conforms to the minimum description criterion, all continuous attribute data of the attribute are placed in a single interval sequence; otherwise, the demarcation points divide the continuous attribute data into different interval sequences, yielding the discretization result of the continuous attribute data;
step 2.6: judging whether all continuous attributes in the sample data set have been discretized; if so, executing step 3; otherwise, repeating steps 2.1–2.5 until all continuous attributes of the sample data set are discretized.
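Steps 2.1–2.5 follow the Fayyad–Irani MDLP scheme, which can be sketched as below. Two hedges: the standard formulation keeps the candidate cut that minimizes the post-split conditional entropy, i.e. maximizes Gain(a), and that is what this sketch does; and candidate cuts are taken at attribute values where the class changes, per step 2.2. This is an illustrative reading of the claim, not its authoritative implementation.

```python
import math

def entropy(labels):
    # Class information entropy H(C) of a list of class labels.
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum(m / n * math.log2(m / n) for m in counts.values())

def mdl_accepts(pairs, cut):
    # Step 2.3's minimum-description-length test (Fayyad-Irani form):
    # Gain(a) > log2(N-1)/N + [log2(3^k - 2) - (k*H(C) - k1*H(C1) - k2*H(C2))]/N
    labels = [c for _, c in pairs]
    left = [c for v, c in pairs if v <= cut]
    right = [c for v, c in pairs if v > cut]
    N = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    H, H1, H2 = entropy(labels), entropy(left), entropy(right)
    gain = H - (len(left) / N * H1 + len(right) / N * H2)
    delta = math.log2(3 ** k - 2) - (k * H - k1 * H1 - k2 * H2)
    return gain > (math.log2(N - 1) + delta) / N

def mdlp_cuts(values, labels):
    # Steps 2.1-2.5: sort, take attribute values where the class changes as
    # candidate cuts, keep the best cut if it passes the MDL test, recurse
    # into the two interval sequences it creates (step 2.4).
    pairs = sorted(zip(values, labels))
    vs = [v for v, _ in pairs]
    cs = [c for _, c in pairs]
    cands = {vs[i] for i in range(len(vs) - 1) if cs[i] != cs[i + 1]}
    if not cands:
        return []
    N, H = len(cs), entropy(cs)
    def gain(cut):
        left = [c for v, c in pairs if v <= cut]
        right = [c for v, c in pairs if v > cut]
        return H - (len(left) / N * entropy(left) + len(right) / N * entropy(right))
    best = max(cands, key=gain)
    if not mdl_accepts(pairs, best):
        return []
    lo = [(v, c) for v, c in pairs if v <= best]
    hi = [(v, c) for v, c in pairs if v > best]
    cuts = mdlp_cuts([v for v, _ in lo], [c for _, c in lo]) + [best]
    if hi:
        cuts += mdlp_cuts([v for v, _ in hi], [c for _, c in hi])
    return cuts
```

On cleanly separated classes the recursion returns one cut per class boundary; on noisy, uninformative data the MDL test rejects every cut and the attribute stays in a single interval sequence (step 2.5).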
2. The locally weighted C4.5 algorithm-based rock burst hazard level prediction method according to claim 1, wherein: the specific method for weighting the samples in the training set in step 4 comprises the following steps:
the weights of the samples in the training set are calculated according to the following formula:
wherein ω_i is the weight of the i-th sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, K; d_i is the distance from the sample to be predicted to the i-th sample x_i, calculated from the attribute data of the samples according to a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set.
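The neighbour search and weighting of claim 2 can be sketched as follows, with one loud assumption: the weight formula itself is elided in the source (only ω_i, d_i and d_max are described), so the sketch substitutes the simple form ω_i = 1 − d_i/d_max; Euclidean distance on the attribute data is likewise an assumption.

```python
import math

def neighbour_weights(query, training, k=5):
    # K nearest neighbours of the query and their weights (claim 2).
    def dist(a, b):
        # Assumed distance formula: Euclidean distance on the attribute data.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ranked = sorted(training, key=lambda s: dist(query, s))[:k]
    d = [dist(query, s) for s in ranked]
    d_max = max(d)
    if d_max == 0:
        return ranked, [1.0] * len(ranked)
    # Assumed weight form: omega_i = 1 - d_i / d_max, so nearer neighbours
    # contribute more and the farthest selected neighbour gets weight 0.
    return ranked, [1 - di / d_max for di in d]
```

Any weight that decreases in d_i and depends only on d_i and d_max fits the claim's description equally well; the linear form is chosen here purely for illustration.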
3. The locally weighted C4.5 algorithm-based rock burst hazard level prediction method according to claim 2, wherein the specific method for calculating the information gain rate of the attributes in the training set in step 5 is as follows:
let V be an attribute in the training set with attribute values v_j, j = 1, 2, …, m, where m is the number of mutually non-overlapping attribute values of V among the sample data in the training set; let C′ = {c_1, c_2, …, c_n} be the class set corresponding to the sample data in the training set, where c_i′ is the i′-th class, i′ = 1, 2, …, n, and n is the total number of classes corresponding to the sample data in the training set; then:
calculating the class information entropy of the sample data in the training set, as shown in the following formula:
I(C′) = −∑ p(c_i′) log2 p(c_i′), summed over i′ = 1, 2, …, n
wherein ω_{c_i′} is the sum of the weights of the training set samples of class c_i′, ω_{C′} is the sum of the weights of the samples of all classes in the training set, and p(c_i′) = ω_{c_i′}/ω_{C′} is the ratio of the class-c_i′ weight sum to the total weight sum;
calculating the class conditional entropy of the sample data in the training set, as shown in the following formula:
I(C′|V) = ∑ p(v_j) [ −∑ p(c_i′|v_j) log2 p(c_i′|v_j) ], with the outer sum over j = 1, 2, …, m and the inner sum over i′ = 1, 2, …, n
wherein ω_{v_j} is the sum of the weights of the samples whose attribute value is v_j, ω_V is the sum of the weights of all samples under attribute V, ω_{v_j,c_i′} is the sum of the weights of the samples whose attribute value is v_j and whose class is c_i′, p(v_j) = ω_{v_j}/ω_V is the ratio of the weight sum of the samples taking value v_j to the weight sum of all samples, and p(c_i′|v_j) = ω_{v_j,c_i′}/ω_{v_j};
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:
I(V) = −∑ p(v_j) log2 p(v_j), summed over j = 1, 2, …, m
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C′,V)/I(V).
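The weighted entropy, conditional entropy, split information and gain ratio of claim 3 can be sketched directly, with sample weights replacing raw counts throughout. This is a sketch under the claim's definitions, not a production C4.5 implementation.

```python
import math
from collections import defaultdict

def weighted_gain_ratio(attr_vals, labels, weights):
    # C4.5 gain ratio with sample weights in place of raw counts (claim 3).
    W = sum(weights)
    def wentropy(dist):
        # Entropy of a {key: weight} distribution.
        tot = sum(dist.values())
        return -sum(w / tot * math.log2(w / tot) for w in dist.values() if w > 0)
    # I(C'): class entropy with p(c_i') = omega_{c_i'} / omega_{C'}.
    by_class = defaultdict(float)
    for c, w in zip(labels, weights):
        by_class[c] += w
    i_c = wentropy(by_class)
    # I(C'|V): conditional entropy; I(V): split information over values v_j.
    by_val = defaultdict(float)
    by_val_class = defaultdict(lambda: defaultdict(float))
    for v, c, w in zip(attr_vals, labels, weights):
        by_val[v] += w
        by_val_class[v][c] += w
    i_cv = sum(by_val[v] / W * wentropy(by_val_class[v]) for v in by_val)
    i_v = wentropy(by_val)
    # gain_ratio(V) = I(C',V) / I(V), with I(C',V) = I(C') - I(C'|V).
    return (i_c - i_cv) / i_v if i_v > 0 else 0.0
```

An attribute that perfectly separates the (weighted) classes scores 1, and one carrying no class information scores 0, so step 5's "maximum information gain rate" rule picks the most discriminative split at each node.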
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810058598.8A CN108280289B (en) | 2018-01-22 | 2018-01-22 | Rock burst danger level prediction method based on local weighted C4.5 algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108280289A CN108280289A (en) | 2018-07-13 |
CN108280289B true CN108280289B (en) | 2021-10-08 |
Family
ID=62804465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810058598.8A Active CN108280289B (en) | 2018-01-22 | 2018-01-22 | Rock burst danger level prediction method based on local weighted C4.5 algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280289B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175194B (en) * | 2019-04-19 | 2021-02-02 | 中国矿业大学 | Coal mine roadway surrounding rock deformation and fracture identification method based on association rule mining |
CN111764963B (en) * | 2020-07-06 | 2021-04-02 | 中国矿业大学(北京) | Rock burst prediction method based on fast-RCNN |
CN113901939B (en) * | 2021-10-21 | 2022-07-01 | 黑龙江科技大学 | Rock burst danger level prediction method based on fuzzy correction, storage medium and equipment |
CN114780443A (en) * | 2022-06-23 | 2022-07-22 | 国网数字科技控股有限公司 | Micro-service application automatic test method and device, electronic equipment and storage medium |
CN117557087B (en) * | 2023-09-01 | 2024-09-06 | 广州市河涌监测中心 | Drainage unit risk prediction model training method and system based on water affair data |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2287122T3 (en) * | 2000-04-24 | 2007-12-16 | Qualcomm Incorporated | Method and apparatus for predictively quantizing voiced speech.
NZ596478A (en) * | 2009-06-30 | 2014-04-30 | Dow Agrosciences Llc | Application of machine learning methods for mining association rules in plant and animal data sets containing molecular genetic markers, followed by classification or prediction utilizing features created from these association rules |
WO2013075104A1 (en) * | 2011-11-18 | 2013-05-23 | Rutgers, The State University Of New Jersey | Method and apparatus for detecting granular slip |
US20160358099A1 (en) * | 2015-06-04 | 2016-12-08 | The Boeing Company | Advanced analytical infrastructure for machine learning |
CN105373606A (en) * | 2015-11-11 | 2016-03-02 | 重庆邮电大学 | Unbalanced data sampling method in improved C4.5 decision tree algorithm |
CN106096748A (en) * | 2016-04-28 | 2016-11-09 | 武汉宝钢华中贸易有限公司 | Entrucking forecast model in man-hour based on cluster analysis and decision Tree algorithms |
CN107145998A (en) * | 2017-03-31 | 2017-09-08 | 中国农业大学 | A kind of soil calculation of pressure method and system based on Dyna CLUE models |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108280289B (en) | Rock burst danger level prediction method based on local weighted C4.5 algorithm | |
CN110363344B (en) | Probability integral parameter prediction method for optimizing BP neural network based on MIV-GP algorithm | |
CN107357966B (en) | Evaluation method for stability prediction of surrounding rock of stoping roadway | |
CN107194524B (en) | RBF neural network-based coal and gas outburst prediction method | |
CN110674841B (en) | Logging curve identification method based on clustering algorithm | |
CN107122860B (en) | Rock burst danger level prediction method based on grid search and extreme learning machine | |
CN112232522B (en) | Intelligent recommendation and dynamic optimization method for deep roadway support scheme | |
CN103617147A (en) | Method for identifying mine water-inrush source | |
CN109934398A (en) | A kind of drill bursting construction tunnel gas danger classes prediction technique and device | |
CN112529341A (en) | Drilling well leakage probability prediction method based on naive Bayesian algorithm | |
CN115130375A (en) | Rock burst intensity prediction method | |
CN115017791A (en) | Tunnel surrounding rock grade identification method and device | |
CN108268460A (en) | A kind of method for automatically selecting optimal models based on big data | |
CN114723095A (en) | Missing well logging curve prediction method and device | |
CN110633504A (en) | Prediction method for coal bed gas permeability | |
CN115438823A (en) | Borehole wall instability mechanism analysis and prediction method and system | |
CN110348510B (en) | Data preprocessing method based on staged characteristics of deepwater oil and gas drilling process | |
CN115980826A (en) | Rock burst intensity prediction method based on weighted meta-heuristic combined model | |
CN117035197A (en) | Intelligent lost circulation prediction method with minimized cost | |
CN116822971B (en) | Well wall risk level prediction method | |
CN110568495A (en) | Rayleigh wave multi-mode dispersion curve inversion method based on generalized objective function | |
CN113946790A (en) | Method, system, equipment and terminal for predicting height of water flowing fractured zone | |
CN117473305A (en) | Method and system for predicting reservoir parameters enhanced by neighbor information | |
CN116933920A (en) | Prediction and early warning method and system for underground mine debris flow | |
CN111667192A (en) | Safety production risk assessment method based on NLP big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||