CN108280289B - Rock burst danger level prediction method based on local weighted C4.5 algorithm - Google Patents
Classifications
- G — PHYSICS
- G06 — COMPUTING; CALCULATING OR COUNTING
- G06F — ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00 — Computer-aided design [CAD]
- G06F30/20 — Design optimisation, verification or simulation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a rock burst danger level prediction method based on a locally weighted C4.5 algorithm, and relates to the technical field of rock burst prediction. The method first discretizes the continuous attribute data in the sample data with the MDLP method, then selects a training set by a local weighting method and calculates the sample weights. The information gain rate of each attribute is computed from these weights, and the attribute with the maximum gain rate is selected as the splitting attribute of the root node and the other branch nodes of a C4.5 decision tree. Finally, pessimistic pruning is applied to the established tree with sample weights replacing sample counts, so as to predict the rock burst danger level of the region to be predicted. The method overcomes the bias of the ID3 algorithm, whose information-gain criterion favours attributes with many values, avoids overfitting, and achieves high prediction accuracy.
Description
Technical Field
The invention relates to the technical field of rock burst prediction, in particular to a rock burst danger level prediction method based on a local weighted C4.5 algorithm.
Background
Rock burst is a dynamic phenomenon of sudden, sharp, and violent failure caused by the release of deformation energy stored in the coal and rock mass around mine roadways and stopes. It is one of the major disasters threatening the production safety of coal mines, and almost every mining country in the world is affected by it to some degree. In recent years, developed countries have successively closed rock-burst-prone mines owing to energy-structure adjustment and safety considerations, leaving China as the principal country both suffering from and controlling rock burst.
Prediction and evaluation of rock burst, built on research into its occurrence mechanism, are key steps in its prevention and control. However, the mechanism of rock burst is not yet fully understood, and research on the mechanism of deep rock burst in particular is still in its infancy, which increases the difficulty of prediction. At present, rock burst is predicted mainly by rock-mechanics methods and geophysical methods: the former include the drilling-cuttings method and mining-induced stress detection, while the latter include ground sound monitoring, microseismic monitoring, and electromagnetic radiation monitoring. In addition, with the development of artificial intelligence, intelligent algorithms such as neural networks, Bayesian discriminant analysis, and support vector machines have been applied to rock burst prediction. These methods have produced substantial research results on predicting the rock burst danger level, but they also have drawbacks: neural networks generally require a large number of samples, whereas rock burst data sets are small; Bayesian methods require strong independence among attributes, which real rock burst data rarely satisfy; and these methods do not consider the overfitting of models.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a rock burst danger grade prediction method based on a local weighted C4.5 algorithm, which realizes the prediction of the rock burst danger grade of coal and rock mass around mine roadways and stopes.
The rock burst danger level prediction method based on the local weighting C4.5 algorithm comprises the following steps:
step 1, collecting rock burst data of known categories as sample data; let the collected sample data set be T, the category set of the samples be C, k' the total number of sample categories, and N the number of samples;
step 2, discretizing continuous attribute data in the known-category sample data by the Minimum Description Length Principle (MDLP) method, specifically comprising the following steps:
step 2.1: sorting a group of continuous attribute values to be discretized, together with their corresponding categories, in ascending order of the attribute values;
step 2.2: selecting continuous attribute values as demarcation points according to differences between the categories of adjacent sorted attribute values, forming a demarcation point set; if the same attribute value corresponds to different categories, selecting the attribute value whose category is smallest as a demarcation point;
Step 2.3: calculating the information gain of all the demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, judging whether the demarcation point meets the minimum description criterion, and if so, keeping the demarcation point; otherwise, removing the demarcation point;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
let a_min be the demarcation point with the minimum information gain, which divides the class set C into two subsets C_1 and C_2; whether a_min meets the minimum description criterion is judged by the following formula:
Gain(a_min) > log2(N-1)/N + {log2(3^k' - 2) - [k'H(C) - k'_1 H(C_1) - k'_2 H(C_2)]}/N
where k'_1 and k'_2 are the numbers of categories contained in the subsets C_1 and C_2, respectively;
step 2.4: judging whether other demarcation points exist in the two interval sequences into which the demarcation point of step 2.3 divides the original data set; if so, regrouping the demarcation points within each interval sequence into a corresponding demarcation point set and returning to step 2.3, so as to keep judging, according to the number of samples in each interval sequence and its corresponding class set, whether the demarcation points of that sequence are kept; otherwise, executing step 2.5;
step 2.5: dividing the continuous attribute data into interval sequences according to the finally selected demarcation point set; if no demarcation point ultimately meets the minimum description criterion, all the continuous data of the attribute are placed in a single interval sequence; otherwise, the demarcation points divide the continuous data into different interval sequences, yielding the discretization result of the continuous attribute data;
step 2.6: judging whether all the continuous attributes in the sample data set have been discretized; if so, executing step 3; otherwise, repeating steps 2.1-2.5 until all the continuous attributes of the sample data set are discretized;
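Steps 2.1-2.6 can be sketched in Python as follows. This is an illustrative implementation of Fayyad-Irani MDLP discretization, not the patent's verbatim procedure; in particular, standard MDLP selects the candidate demarcation point that minimizes the class-conditional entropy (equivalently, maximizes the information gain), and that convention is followed here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdlp_cut_points(values, labels):
    """Recursively find MDLP-accepted cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels))           # step 2.1: sort by attribute value
    vals = [v for v, _ in pairs]
    labs = [c for _, c in pairs]

    def recurse(lo, hi, cuts):
        n = hi - lo
        if n < 2:
            return
        h = entropy(labs[lo:hi])
        best = None                               # (conditional entropy, split index)
        for i in range(lo + 1, hi):               # step 2.2: candidate boundaries
            if vals[i] == vals[i - 1]:
                continue
            cond = ((i - lo) * entropy(labs[lo:i])
                    + (hi - i) * entropy(labs[i:hi])) / n
            if best is None or cond < best[0]:
                best = (cond, i)
        if best is None:
            return
        cond, i = best
        gain = h - cond                           # Gain(a) = H(C) - H(C|a)
        k = len(set(labs[lo:hi]))
        k1, k2 = len(set(labs[lo:i])), len(set(labs[i:hi]))
        delta = math.log2(3 ** k - 2) - (k * h
                - k1 * entropy(labs[lo:i]) - k2 * entropy(labs[i:hi]))
        # step 2.3: Fayyad-Irani minimum-description-length acceptance test
        if gain > math.log2(n - 1) / n + delta / n:
            cuts.append((vals[i - 1] + vals[i]) / 2)
            recurse(lo, i, cuts)                  # step 2.4: recurse into both sides
            recurse(i, hi, cuts)

    cuts = []
    recurse(0, len(vals), cuts)
    return sorted(cuts)
```

For a well-separated attribute such as values [1, 2, 3, 4, 10, 11, 12, 13] with classes [0, 0, 0, 0, 1, 1, 1, 1], a single cut at 7.0 is accepted; if no cut passes the test, the attribute collapses into one interval sequence, as in step 2.5.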
step 3, collecting rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in the step 2, and determining an interval sequence where the continuous attribute data in the rock burst attribute data of the area to be predicted are located according to a comparison result, so that the continuous attribute data in the rock burst attribute data of the area to be predicted are discretized;
step 4, using the K-nearest-neighbor (KNN) algorithm to find the K samples closest to the sample to be predicted in the discretized data set generated in step 2, forming the training set of the C4.5 decision tree from these K samples, and calculating the weights of the samples in the training set;
The weights of the samples in the training set are calculated according to the following formula:
where ω_i is the weight of the ith sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, k; d_i is the distance from the sample to be predicted to the ith sample x_i, calculated from the attribute data of the samples by a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set;
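The training-set selection and weighting of step 4 can be sketched as follows. Since the patent does not reproduce its weight formula in the text, the linear kernel w_i = 1 - d_i/d_max and the Euclidean distance are assumptions of this sketch, not the patent's verbatim choices.

```python
import math

def knn_training_set(x_query, samples, labels, k):
    """Pick the k nearest samples and attach local weights (step 4).

    Assumes Euclidean distance and the linear kernel w_i = 1 - d_i / d_max,
    so nearer samples get larger weights (the farthest neighbour gets 0).
    Returns a list of (sample, label, weight) triples.
    """
    dists = [math.dist(x_query, s) for s in samples]
    order = sorted(range(len(samples)), key=lambda i: dists[i])[:k]
    d_max = max(dists[i] for i in order) or 1.0   # guard against all-zero distances
    return [(samples[i], labels[i], 1.0 - dists[i] / d_max) for i in order]
```

With a tricube or Gaussian kernel the farthest neighbour would keep a small positive weight; the linear kernel is merely the simplest choice consistent with the text's d_i and d_max.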
step 5: calculating the information gain rate of all attributes in the training set from the weights of the training sample data and, during the generation of the root node and each branch node of the C4.5 decision tree, selecting the attribute with the maximum information gain rate in each iteration as the splitting attribute of that node;
the specific method for calculating the information gain rate of the attributes in the training set comprises the following steps:
let V be an attribute in the training set and v_j its attribute values, j = 1, 2, …, m, where m is the number of mutually distinct values that attribute V takes among the sample data in the training set; let the class set corresponding to the sample data in the training set be C' = {c_1, c_2, …, c_n}, where c_i' is the i'th category, i' = 1, 2, …, n, and n is the total number of categories corresponding to the sample data in the training set; the information gain rate of an attribute in the training set is then calculated as follows:
calculating the class information entropy of the sample data in the training set, as shown in the following formula:

I(C') = -Σ_{i'=1}^{n} p(c_i') log2 p(c_i')

where ω_{c_i'} is the weight sum of the training samples of class c_i', ω_{C'} is the weight sum of the samples of all classes in the training set, and p(c_i') = ω_{c_i'}/ω_{C'} is the ratio of the weight sum of the class-c_i' samples to the weight sum ω_{C'} of the samples of all classes;
calculating the class conditional entropy of the sample data in the training set, as shown in the following formula:

I(C'|V) = -Σ_{j=1}^{m} p(v_j) Σ_{i'=1}^{n} p(c_i'|v_j) log2 p(c_i'|v_j)

where ω_{v_j} is the weight sum of the samples whose attribute V takes the value v_j; ω_V is the weight sum of all samples under attribute V; ω_{c_i',v_j} is the weight sum of the class-c_i' samples whose attribute value is v_j; p(v_j) = ω_{v_j}/ω_V is the ratio of the weight sum of the samples taking the value v_j to the weight sum of all samples; and p(c_i'|v_j) = ω_{c_i',v_j}/ω_{v_j} is the ratio of the weight sum of the class-c_i' samples taking the value v_j to the weight sum of all samples taking the value v_j;
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:

I(V) = -Σ_{j=1}^{m} p(v_j) log2 p(v_j)
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C', V)/I(V);
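Step 5's weight-based gain-rate computation can be sketched as follows, with every sample count in the standard C4.5 formulas replaced by a weight sum. The row layout (attribute dictionaries paired with class labels) is an illustrative choice.

```python
import math
from collections import defaultdict

def weighted_entropy(weight_by_key):
    """Entropy of a weight distribution: -sum p log2 p with p = w / total."""
    total = sum(weight_by_key.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weight_by_key.values() if w > 0)

def weighted_gain_rate(rows, weights, attr):
    """Information gain rate of one attribute, using weights instead of counts.

    rows:    list of (attribute_dict, class_label) training samples
    weights: per-sample weights from the local weighting step
    """
    cls_w = defaultdict(float)                      # class -> weight sum
    val_w = defaultdict(float)                      # attribute value -> weight sum
    cls_by_val = defaultdict(lambda: defaultdict(float))
    for (attrs, c), w in zip(rows, weights):
        v = attrs[attr]
        cls_w[c] += w
        val_w[v] += w
        cls_by_val[v][c] += w
    total = sum(weights)

    i_c = weighted_entropy(cls_w)                   # I(C')
    i_cv = sum((val_w[v] / total) * weighted_entropy(cls_by_val[v])
               for v in val_w)                      # I(C'|V)
    i_v = weighted_entropy(val_w)                   # split information I(V)
    gain = i_c - i_cv                               # I(C', V)
    return gain / i_v if i_v > 0 else 0.0           # gain_ratio(V)
```

In tree construction this function would be evaluated for every remaining attribute at a node, and the attribute with the largest returned value chosen as that node's splitting attribute.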
step 6: establishing the decision tree according to the splitting attributes, and pruning it by the pessimistic pruning method, in which the error rates of branch nodes and their corresponding leaf nodes are calculated with sample weights instead of sample counts; finally, predicting the potential rock burst danger level of the region to be predicted with the generated decision tree.
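The pruning test of step 6 can be sketched as follows. The patent does not spell out its pessimistic-error statistic, so this sketch assumes Quinlan's classical pessimistic-pruning rule (replace a subtree by a leaf when the leaf's continuity-corrected error is within one standard error of the subtree's), with all sample counts replaced by weight sums as the step prescribes.

```python
import math

def prune_to_leaf(subtree_err_w, n_leaves, leaf_err_w, total_w):
    """Weighted pessimistic-pruning decision for one branch node (step 6).

    subtree_err_w: weight sum of samples misclassified by the subtree's leaves
    n_leaves:      number of leaves in the subtree
    leaf_err_w:    weight sum misclassified if the node were collapsed to a leaf
    total_w:       total sample weight reaching the node
    Returns True when the subtree should be replaced by a leaf.
    """
    e_subtree = subtree_err_w + 0.5 * n_leaves     # continuity correction per leaf
    e_leaf = leaf_err_w + 0.5                      # single-leaf correction
    std_err = math.sqrt(e_subtree * (total_w - e_subtree) / total_w)
    return e_leaf <= e_subtree + std_err
```

For instance, a 3-leaf subtree that misclassifies weight 1.0 out of 20.0 is pruned when collapsing it to a leaf misclassifies weight 2.0, but kept when the collapsed leaf would misclassify weight 5.0.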
According to the technical scheme, the invention has the following beneficial effects. In the rock burst danger level prediction method based on the locally weighted C4.5 algorithm, discretizing the continuous attribute data with the MDLP method handles continuous attributes well; the local weighting method selects the training set according to the distances from the discretized samples to the sample to be predicted and assigns different weights to the training samples; the C4.5 algorithm computes the information gain rate from the sample weights to select each node's splitting attribute, overcoming the bias of the ID3 algorithm, whose information-gain criterion favours attributes with many values; and performing pessimistic pruning with sample weights in place of sample counts avoids overfitting and improves the accuracy of the prediction model.
Drawings
Fig. 1 is a flowchart of a rock burst risk level prediction method based on a local weighted C4.5 algorithm according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In this embodiment, an inkstone coal mine in a certain area is taken as an example, and the rock burst risk level of the inkstone coal mine is predicted by using the rock burst risk level prediction method based on the local weighted C4.5 algorithm.
The rock burst danger level prediction method based on the local weighting C4.5 algorithm, as shown in FIG. 1, comprises the following steps:
step 1, collecting rock burst data of known categories as sample data; let the collected sample data set be T, the category set of the samples be C, k' the total number of sample categories, and N the number of samples.
Since many factors influence rock burst, this embodiment selects 10 factors as the attributes of the sample data to predict the rock burst danger level of the coal mine: coal thickness (V_1), inclination angle (V_2), buried depth (V_3), structural condition (V_4), inclination angle variation (V_5), coal thickness variation (V_6), gas concentration (V_7), roof management (V_8), pressure relief (V_9), and coal sound (V_10). Among them, the structural condition (V_4), inclination angle variation (V_5), coal thickness variation (V_6), roof management (V_8), pressure relief (V_9), and coal sound (V_10) are state parameters, assigned values as shown in Table 1:

TABLE 1 State parameter assignment
The rock burst danger level is divided into four categories according to impact intensity: category 1, micro impact; category 2, weak impact; category 3, medium impact; and category 4, strong impact.
Table 2 shows the rock burst data collected as sample data in this example.
Table 2 rock burst data as sample data
Step 2, discretizing continuous attribute data in the known-category sample data by the Minimum Description Length Principle (MDLP) method, specifically comprising the following steps:
step 2.1: sorting a group of continuous attribute values to be discretized, together with their corresponding categories, in ascending order of the attribute values;
step 2.2: selecting continuous attribute values as demarcation points according to differences between the categories of adjacent sorted attribute values, forming a demarcation point set; if the same attribute value corresponds to different categories, selecting the attribute value whose category is smallest as a demarcation point;
step 2.3: calculating the information gain of all the demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, judging whether the demarcation point meets the minimum description criterion, and if so, keeping the demarcation point; otherwise, removing the demarcation point;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
Let a_min be the demarcation point with the minimum information gain, which divides the class set C into two subsets C_1 and C_2; whether a_min meets the minimum description criterion is judged by the following formula:
Gain(a_min) > log2(N-1)/N + {log2(3^k' - 2) - [k'H(C) - k'_1 H(C_1) - k'_2 H(C_2)]}/N
where k'_1 and k'_2 are the numbers of categories contained in the subsets C_1 and C_2, respectively;
step 2.4: judging whether other demarcation points exist in the two interval sequences into which the demarcation point of step 2.3 divides the original data set; if so, regrouping the demarcation points within each interval sequence into a corresponding demarcation point set and returning to step 2.3, so as to keep judging, according to the number of samples in each interval sequence and its corresponding class set, whether the demarcation points of that sequence are kept; otherwise, executing step 2.5;
step 2.5: dividing the continuous attribute data into interval sequences according to the finally selected demarcation point set; if no demarcation point ultimately meets the minimum description criterion, all the continuous data of the attribute are placed in a single interval sequence; otherwise, the demarcation points divide the continuous data into different interval sequences, yielding the discretization result of the continuous attribute data;
step 2.6: and (3) judging whether the continuous attributes in the sample data set are all discretized, if so, executing the step (3), otherwise, repeating the steps 2.1-2.5, and discretizing all the continuous attributes of the sample data set.
In this embodiment, for the continuous attributes V_1, V_3, and V_7, the information gain of the demarcation points in their demarcation point sets does not meet the minimum description criterion, so by the MDLP discretization principle the corresponding continuous data are discretized into a single interval sequence, output as 1 in this embodiment. The final demarcation point of the continuous attribute V_2 is the attribute value 45, so values of 45 or more are assigned to one interval sequence, output as 2, and values below 45 to another, output as 1. The discretized sample data used as the training set are shown in Table 3.
TABLE 3 discretized sample data
Step 3: collecting rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in step 2, and determining the interval sequence in which each continuous attribute value of the area to be predicted lies according to the comparison result, so that the continuous attribute data of the area to be predicted are discretized.
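Step 3's interval lookup for new data can be sketched as follows, assuming the demarcation points kept by MDLP in step 2 are stored per attribute and that a value equal to a demarcation point belongs to the upper interval, matching the V_2 example (values of 45 or more output 2).

```python
import bisect

def discretize_value(value, cut_points):
    """Map a continuous attribute value to its 1-based interval-sequence index.

    cut_points is the sorted list of demarcation points MDLP kept for the
    attribute; an empty list means the whole attribute is one interval
    (output 1). A value equal to a cut point goes to the upper interval.
    """
    return bisect.bisect_right(cut_points, value) + 1
```

For attribute V_2 with the single demarcation point 45, a value of 45 or more maps to interval 2 and a smaller value to interval 1; attributes such as V_1, V_3, and V_7, which kept no demarcation points, always map to interval 1.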
In this embodiment, to verify the effectiveness of the method of the invention, the attribute data in Table 4 are used as the collected rock burst attribute data of the area to be predicted, and the category data in Table 4 are used for comparison with the prediction results. For the continuous attributes in the 10 groups of data, comparison with the corresponding attribute data of the 25 groups in Table 2 yields the discretization results shown in Table 5.
TABLE 4 data to be predicted
Serial number | V1/m | V2/(°) | V3/m | V4 | V5 | V6 | V7/(m3·min-1) | V8 | V9 | V10 | Categories |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
1 | 1.5 | 35 | 530 | 0 | 0 | 0 | 0.56 | 3 | 3 | 0 | 1 |
2 | 1.6 | 62 | 307 | 3 | 2 | 2 | 1 | 0 | 0 | 2 | 4 |
3 | 1.9 | 59 | 542 | 1 | 2 | 3 | 0.25 | 0 | 0 | 1 | 3 |
4 | 1.3 | 44 | 570 | 0 | 0 | 0 | 0.66 | 3 | 3 | 0 | 1 |
5 | 2.2 | 54 | 290 | 3 | 2 | 2 | 1 | 0 | 0 | 2 | 4 |
6 | 3 | 34 | 475 | 2 | 2 | 1 | 0.42 | 0 | 0 | 2 | 3 |
7 | 3.2 | 42 | 574 | 3 | 0 | 0 | 0.29 | 0 | 0 | 2 | 3 |
8 | 1.8 | 62 | 283 | 3 | 2 | 3 | 1 | 0 | 0 | 2 | 4 |
9 | 1.3 | 44 | 656 | 2 | 1 | 3 | 0.24 | 1 | 1 | 2 | 3 |
10 | 1.2 | 40 | 553 | 2 | 2 | 2 | 0.49 | 1 | 2 | 2 | 3 |
TABLE 5 discretized data to be predicted
Step 4, using the K-nearest-neighbor (KNN) algorithm to find the K samples closest to the sample to be predicted in the discretized data set generated in step 2, forming the training set of the C4.5 decision tree from these K samples, and calculating the weights of the samples in the training set;
the weights of the samples in the training set are calculated according to the following formula:
where ω_i is the weight of the ith sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, k; d_i is the distance from the sample to be predicted to the ith sample x_i, calculated from the attribute data of the samples by a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set.
Step 5: calculating the information gain rate of all attributes in the training set from the weights of the training sample data and, during the generation of the root node and each branch node of the C4.5 decision tree, selecting the attribute with the maximum information gain rate in each iteration as the splitting attribute of that node;
the specific method for calculating the information gain rate of the attributes in the training set comprises the following steps:
let V be an attribute in the training set, V jJ is 1, 2, … and m, m is the number of attribute values of attribute V of sample data in training set which do not overlap each other, and the class set corresponding to sample data in training set is C' ═ { C ═ C1、c2、…、cnIn which c isi′For the ith 'category, i' is 1, 2, …, and n is the total number of categories corresponding to the sample data in the training set, and the specific method for calculating the information gain rate of the attributes in the training set is as follows:
calculating the class information entropy of the sample data in the training set, as shown in the following formula:

I(C') = -Σ_{i'=1}^{n} p(c_i') log2 p(c_i')

where ω_{c_i'} is the weight sum of the training samples of class c_i', ω_{C'} is the weight sum of the samples of all classes in the training set, and p(c_i') = ω_{c_i'}/ω_{C'} is the ratio of the weight sum of the class-c_i' samples to the weight sum ω_{C'} of the samples of all classes;
calculating the class conditional entropy of the sample data in the training set, as shown in the following formula:

I(C'|V) = -Σ_{j=1}^{m} p(v_j) Σ_{i'=1}^{n} p(c_i'|v_j) log2 p(c_i'|v_j)

where ω_{v_j} is the weight sum of the samples whose attribute V takes the value v_j; ω_V is the weight sum of all samples under attribute V; ω_{c_i',v_j} is the weight sum of the class-c_i' samples whose attribute value is v_j; p(v_j) = ω_{v_j}/ω_V is the ratio of the weight sum of the samples taking the value v_j to the weight sum of all samples; and p(c_i'|v_j) = ω_{c_i',v_j}/ω_{v_j} is the ratio of the weight sum of the class-c_i' samples taking the value v_j to the weight sum of all samples taking the value v_j;
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:

I(V) = -Σ_{j=1}^{m} p(v_j) log2 p(v_j)
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C', V)/I(V);
step 6: establishing a decision tree according to the splitting attribute, pruning the decision tree by adopting a pessimistic pruning method, and calculating the error rates of branch nodes and corresponding leaf nodes by using sample weights instead of the number of samples in the pruning process; and finally, predicting the potential rock burst danger level of the region to be predicted by the generated decision tree.
In this embodiment, to verify the prediction performance of the decision tree model built from the discretized sample data, the model is first verified by ten-fold cross validation. Because the training set is small, all training samples are selected as neighbour samples in the cross validation. In addition, the significance level in the pruning of the C4.5 decision tree is set to the commonly used 25%, and sample distances in the weighted learning are measured with the Euclidean distance function. The cross-validation accuracy of the model built from the discretized training sample set is 88%, versus 84% for the model built from the original data in Table 2, indicating that the discretized sample data yield a better prediction model.
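The ten-fold cross validation used above can be sketched with a simple fold generator; the even split below is an illustrative stand-in for whatever partitioning the authors used, since sample order, stratification, and shuffling are unspecified in the text.

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    The first n % k folds get one extra sample, so every sample is held out
    exactly once; no shuffling is applied here.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx = list(range(n))
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

With the 25 training samples of Table 2 this yields ten folds of two or three held-out samples each.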
The locally weighted C4.5 algorithm is then used to predict the rock burst danger level of the discretized data to be predicted from Table 4. In this embodiment, prediction models are also established from the data in Table 2 with the NaiveBayes method, the original C4.5 method, and the random forest method to predict the rock burst danger levels in Table 4; the comparison with the prediction results of the method of the invention is shown in Table 6:
TABLE 6 comparison of predicted results of rock burst hazard ratings
Algorithm | Accuracy rate |
--- | --- |
NaiveBayes | 70% |
C4.5 decision Tree | 80% |
Random forest | 80% |
The method of the invention | 100% |
The table shows that the method can accurately predict the rock burst hazard level of the area to be predicted, and the prediction result is superior to that of the NaiveBayes method, the original C4.5 method and the random forest method.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.
Claims (3)
1. A rock burst danger level prediction method based on a local weighting C4.5 algorithm is characterized by comprising the following steps: the method comprises the following steps:
step 1, collecting rock burst data of known categories as sample data; let the collected sample data set be T, the category set of the samples be C, k' the total number of sample categories, and N the number of samples;
step 2, discretizing continuous attribute data in the known-category sample data by the Minimum Description Length Principle (MDLP) method;
step 3, collecting rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in the step 2, and determining an interval sequence where the continuous attribute data in the rock burst attribute data of the area to be predicted are located according to a comparison result, so that the continuous attribute data in the rock burst attribute data of the area to be predicted are discretized;
step 4, using the K-nearest-neighbor (KNN) algorithm to find the K samples closest to the sample to be predicted in the discretized data set generated in step 2, forming the training set of the C4.5 decision tree from these K samples, and calculating the weights of the samples in the training set;
step 5: calculating the information gain rate of all attributes in the training set from the weights of the training sample data and, during the generation of the root node and each branch node of the C4.5 decision tree, selecting the attribute with the maximum information gain rate in each iteration as the splitting attribute of that node;
Step 6: establishing a decision tree according to the splitting attribute, pruning the decision tree by adopting a pessimistic pruning method, and calculating the error rates of branch nodes and corresponding leaf nodes by using sample weights instead of the number of samples in the pruning process; finally, predicting the potential rock burst danger level of the region to be predicted by the generated decision tree;
the specific method for discretization in step 2 is as follows:
step 2.1: sorting a group of continuous attribute values to be discretized, together with their corresponding classes, in ascending order of the attribute values;
step 2.2: selecting continuous attribute values as demarcation points, according to the differences between the classes corresponding to the sorted attribute values, to form a demarcation point set; if the same attribute value corresponds to different classes, selecting the attribute value corresponding to the smallest class as the demarcation point;
step 2.3: calculating the information gain of all demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, and judging whether it meets the minimum description criterion; if so, keeping the demarcation point, otherwise removing it;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
Let a_min be the demarcation point with the minimal information gain, which divides the class set C into two subsets C1 and C2; the formula for judging whether a_min meets the minimum description criterion is:
Gain(a_min) > log2(N−1)/N + {log2(3^k′ − 2) − [k′H(C) − k′1H(C1) − k′2H(C2)]}/N
wherein k′1 and k′2 are the numbers of classes contained in subsets C1 and C2, respectively;
step 2.4: judging whether other demarcation points exist within the two interval sequences into which the demarcation point of step 2.3 divides the original data set; if so, recombining the demarcation points within each interval sequence into a corresponding demarcation point set, returning to step 2.3, and continuing to judge, according to the number of samples in each interval sequence and its corresponding class set, whether each interval sequence keeps its corresponding demarcation points; otherwise, executing step 2.5;
step 2.5: dividing the continuous attribute data into interval sequences according to the finally selected demarcation point set; if no demarcation point finally conforms to the minimum description criterion, all continuous attribute data of the attribute are placed in a single interval sequence; otherwise, the demarcation points divide the continuous attribute data into different interval sequences, yielding the discretization result of the continuous attribute data;
step 2.6: judging whether all continuous attributes in the sample data set have been discretized; if so, executing step 3; otherwise, repeating steps 2.1–2.5 until all continuous attributes of the sample data set are discretized.
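Steps 2.1–2.5 follow the Fayyad–Irani MDLP scheme, which can be sketched as below. Two hedges: the standard formulation keeps the candidate cut that minimizes the post-split conditional entropy, i.e. maximizes Gain(a), and that is what this sketch does; and candidate cuts are taken at attribute values where the class changes, per step 2.2. This is an illustrative reading of the claim, not its authoritative implementation.

```python
import math

def entropy(labels):
    # Class information entropy H(C) of a list of class labels.
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum(m / n * math.log2(m / n) for m in counts.values())

def mdl_accepts(pairs, cut):
    # Step 2.3's minimum-description-length test (Fayyad-Irani form):
    # Gain(a) > log2(N-1)/N + [log2(3^k - 2) - (k*H(C) - k1*H(C1) - k2*H(C2))]/N
    labels = [c for _, c in pairs]
    left = [c for v, c in pairs if v <= cut]
    right = [c for v, c in pairs if v > cut]
    N = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    H, H1, H2 = entropy(labels), entropy(left), entropy(right)
    gain = H - (len(left) / N * H1 + len(right) / N * H2)
    delta = math.log2(3 ** k - 2) - (k * H - k1 * H1 - k2 * H2)
    return gain > (math.log2(N - 1) + delta) / N

def mdlp_cuts(values, labels):
    # Steps 2.1-2.5: sort, take attribute values where the class changes as
    # candidate cuts, keep the best cut if it passes the MDL test, recurse
    # into the two interval sequences it creates (step 2.4).
    pairs = sorted(zip(values, labels))
    vs = [v for v, _ in pairs]
    cs = [c for _, c in pairs]
    cands = {vs[i] for i in range(len(vs) - 1) if cs[i] != cs[i + 1]}
    if not cands:
        return []
    N, H = len(cs), entropy(cs)
    def gain(cut):
        left = [c for v, c in pairs if v <= cut]
        right = [c for v, c in pairs if v > cut]
        return H - (len(left) / N * entropy(left) + len(right) / N * entropy(right))
    best = max(cands, key=gain)
    if not mdl_accepts(pairs, best):
        return []
    lo = [(v, c) for v, c in pairs if v <= best]
    hi = [(v, c) for v, c in pairs if v > best]
    cuts = mdlp_cuts([v for v, _ in lo], [c for _, c in lo]) + [best]
    if hi:
        cuts += mdlp_cuts([v for v, _ in hi], [c for _, c in hi])
    return cuts
```

On cleanly separated classes the recursion returns one cut per class boundary; on noisy, uninformative data the MDL test rejects every cut and the attribute stays in a single interval sequence (step 2.5).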
2. The locally weighted C4.5 algorithm-based rock burst hazard level prediction method according to claim 1, wherein: the specific method for weighting the samples in the training set in step 4 comprises the following steps:
the weights of the samples in the training set are calculated according to the following formula:
wherein ω_i is the weight of the i-th sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, K; d_i is the distance from the sample to be predicted to the i-th sample x_i, calculated from the attribute data of the samples according to a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set.
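The neighbour search and weighting of claim 2 can be sketched as follows, with one loud assumption: the weight formula itself is elided in the source (only ω_i, d_i and d_max are described), so the sketch substitutes the simple form ω_i = 1 − d_i/d_max; Euclidean distance on the attribute data is likewise an assumption.

```python
import math

def neighbour_weights(query, training, k=5):
    # K nearest neighbours of the query and their weights (claim 2).
    def dist(a, b):
        # Assumed distance formula: Euclidean distance on the attribute data.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ranked = sorted(training, key=lambda s: dist(query, s))[:k]
    d = [dist(query, s) for s in ranked]
    d_max = max(d)
    if d_max == 0:
        return ranked, [1.0] * len(ranked)
    # Assumed weight form: omega_i = 1 - d_i / d_max, so nearer neighbours
    # contribute more and the farthest selected neighbour gets weight 0.
    return ranked, [1 - di / d_max for di in d]
```

Any weight that decreases in d_i and depends only on d_i and d_max fits the claim's description equally well; the linear form is chosen here purely for illustration.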
3. The locally weighted C4.5 algorithm-based rock burst hazard level prediction method according to claim 2, wherein the specific method for calculating the information gain rate of the attributes in the training set in step 5 is as follows:
let V be an attribute in the training set with attribute values v_j, j = 1, 2, …, m, where m is the number of mutually non-overlapping attribute values of V among the sample data in the training set; let C′ = {c_1, c_2, …, c_n} be the class set corresponding to the sample data in the training set, where c_i′ is the i′-th class, i′ = 1, 2, …, n, and n is the total number of classes corresponding to the sample data in the training set; then:
calculating the class information entropy of the sample data in the training set, as shown in the following formula:
I(C′) = −∑ p(c_i′) log2 p(c_i′), summed over i′ = 1, 2, …, n
wherein ω_{c_i′} is the sum of the weights of the training set samples of class c_i′, ω_{C′} is the sum of the weights of the samples of all classes in the training set, and p(c_i′) = ω_{c_i′}/ω_{C′} is the ratio of the class-c_i′ weight sum to the total weight sum;
calculating the class conditional entropy of the sample data in the training set, as shown in the following formula:
I(C′|V) = ∑ p(v_j) [ −∑ p(c_i′|v_j) log2 p(c_i′|v_j) ], with the outer sum over j = 1, 2, …, m and the inner sum over i′ = 1, 2, …, n
wherein ω_{v_j} is the sum of the weights of the samples whose attribute value is v_j, ω_V is the sum of the weights of all samples under attribute V, ω_{v_j,c_i′} is the sum of the weights of the samples whose attribute value is v_j and whose class is c_i′, p(v_j) = ω_{v_j}/ω_V is the ratio of the weight sum of the samples taking value v_j to the weight sum of all samples, and p(c_i′|v_j) = ω_{v_j,c_i′}/ω_{v_j};
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:
I(V) = −∑ p(v_j) log2 p(v_j), summed over j = 1, 2, …, m
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C′,V)/I(V).
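The weighted entropy, conditional entropy, split information and gain ratio of claim 3 can be sketched directly, with sample weights replacing raw counts throughout. This is a sketch under the claim's definitions, not a production C4.5 implementation.

```python
import math
from collections import defaultdict

def weighted_gain_ratio(attr_vals, labels, weights):
    # C4.5 gain ratio with sample weights in place of raw counts (claim 3).
    W = sum(weights)
    def wentropy(dist):
        # Entropy of a {key: weight} distribution.
        tot = sum(dist.values())
        return -sum(w / tot * math.log2(w / tot) for w in dist.values() if w > 0)
    # I(C'): class entropy with p(c_i') = omega_{c_i'} / omega_{C'}.
    by_class = defaultdict(float)
    for c, w in zip(labels, weights):
        by_class[c] += w
    i_c = wentropy(by_class)
    # I(C'|V): conditional entropy; I(V): split information over values v_j.
    by_val = defaultdict(float)
    by_val_class = defaultdict(lambda: defaultdict(float))
    for v, c, w in zip(attr_vals, labels, weights):
        by_val[v] += w
        by_val_class[v][c] += w
    i_cv = sum(by_val[v] / W * wentropy(by_val_class[v]) for v in by_val)
    i_v = wentropy(by_val)
    # gain_ratio(V) = I(C',V) / I(V), with I(C',V) = I(C') - I(C'|V).
    return (i_c - i_cv) / i_v if i_v > 0 else 0.0
```

An attribute that perfectly separates the (weighted) classes scores 1, and one carrying no class information scores 0, so step 5's "maximum information gain rate" rule picks the most discriminative split at each node.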
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810058598.8A CN108280289B (en) | 2018-01-22 | 2018-01-22 | Rock burst danger level prediction method based on local weighted C4.5 algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108280289A CN108280289A (en) | 2018-07-13 |
CN108280289B true CN108280289B (en) | 2021-10-08 |
Family
ID=62804465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810058598.8A Active CN108280289B (en) | 2018-01-22 | 2018-01-22 | Rock burst danger level prediction method based on local weighted C4.5 algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280289B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175194B (en) * | 2019-04-19 | 2021-02-02 | 中国矿业大学 | Coal mine roadway surrounding rock deformation and fracture identification method based on association rule mining |
CN111764963B (en) * | 2020-07-06 | 2021-04-02 | 中国矿业大学(北京) | Rock burst prediction method based on fast-RCNN |
CN113901939B (en) * | 2021-10-21 | 2022-07-01 | 黑龙江科技大学 | Rock burst danger level prediction method based on fuzzy correction, storage medium and equipment |
CN114780443A (en) * | 2022-06-23 | 2022-07-22 | 国网数字科技控股有限公司 | Micro-service application automatic test method and device, electronic equipment and storage medium |
CN117557087B (en) * | 2023-09-01 | 2024-09-06 | 广州市河涌监测中心 | Drainage unit risk prediction model training method and system based on water affair data |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2287122T3 (en) * | 2000-04-24 | 2007-12-16 | Qualcomm Incorporated | Method and apparatus for predictively quantizing voiced speech.
NZ596478A (en) * | 2009-06-30 | 2014-04-30 | Dow Agrosciences Llc | Application of machine learning methods for mining association rules in plant and animal data sets containing molecular genetic markers, followed by classification or prediction utilizing features created from these association rules |
WO2013075104A1 (en) * | 2011-11-18 | 2013-05-23 | Rutgers, The State University Of New Jersey | Method and apparatus for detecting granular slip |
US20160358099A1 (en) * | 2015-06-04 | 2016-12-08 | The Boeing Company | Advanced analytical infrastructure for machine learning |
CN105373606A (en) * | 2015-11-11 | 2016-03-02 | 重庆邮电大学 | Unbalanced data sampling method in improved C4.5 decision tree algorithm |
CN106096748A (en) * | 2016-04-28 | 2016-11-09 | 武汉宝钢华中贸易有限公司 | Entrucking forecast model in man-hour based on cluster analysis and decision Tree algorithms |
CN107145998A (en) * | 2017-03-31 | 2017-09-08 | 中国农业大学 | A kind of soil calculation of pressure method and system based on Dyna CLUE models |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108280289B (en) | Rock burst danger level prediction method based on local weighted C4.5 algorithm | |
CN110363344B (en) | Probability integral parameter prediction method for optimizing BP neural network based on MIV-GP algorithm | |
CN107357966B (en) | Evaluation method for stability prediction of surrounding rock of stoping roadway | |
CN107194524B (en) | RBF neural network-based coal and gas outburst prediction method | |
CN110674841B (en) | Logging curve identification method based on clustering algorithm | |
CN107122860B (en) | Rock burst danger level prediction method based on grid search and extreme learning machine | |
CN112232522B (en) | Intelligent recommendation and dynamic optimization method for deep roadway support scheme | |
CN103617147A (en) | Method for identifying mine water-inrush source | |
CN109934398A (en) | A kind of drill bursting construction tunnel gas danger classes prediction technique and device | |
CN112529341A (en) | Drilling well leakage probability prediction method based on naive Bayesian algorithm | |
CN115130375A (en) | Rock burst intensity prediction method | |
CN115017791A (en) | Tunnel surrounding rock grade identification method and device | |
CN108268460A (en) | A kind of method for automatically selecting optimal models based on big data | |
CN114723095A (en) | Missing well logging curve prediction method and device | |
CN110633504A (en) | Prediction method for coal bed gas permeability | |
CN115438823A (en) | Borehole wall instability mechanism analysis and prediction method and system | |
CN110348510B (en) | Data preprocessing method based on staged characteristics of deepwater oil and gas drilling process | |
CN115980826A (en) | Rock burst intensity prediction method based on weighted meta-heuristic combined model | |
CN117035197A (en) | Intelligent lost circulation prediction method with minimized cost | |
CN116822971B (en) | Well wall risk level prediction method | |
CN110568495A (en) | Rayleigh wave multi-mode dispersion curve inversion method based on generalized objective function | |
CN113946790A (en) | Method, system, equipment and terminal for predicting height of water flowing fractured zone | |
CN117473305A (en) | Method and system for predicting reservoir parameters enhanced by neighbor information | |
CN116933920A (en) | Prediction and early warning method and system for underground mine debris flow | |
CN111667192A (en) | Safety production risk assessment method based on NLP big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||