CN108280289B - Rock burst danger level prediction method based on local weighted C4.5 algorithm - Google Patents

Rock burst danger level prediction method based on local weighted C4.5 algorithm

Info

Publication number: CN108280289B
Application number: CN201810058598.8A
Authority: CN (China)
Prior art keywords: attribute, sample, training set, data, samples
Other versions: CN108280289A (Chinese)
Inventors: 王彦彬, 彭连会, 何满辉
Assignee (original and current): Liaoning Technical University
Application filed by Liaoning Technical University
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation


Abstract

The invention provides a rock burst danger level prediction method based on a local weighted C4.5 algorithm, and relates to the technical field of rock burst prediction. The method first discretizes the continuous attribute data in the sample data with the MDLP method, then selects a training set and computes sample weights with a local weighting method. The sample weights are used to calculate the information gain rate of each attribute, and the attribute with the largest information gain rate is selected as the splitting attribute of the root node and the other branch nodes of a C4.5 decision tree. Finally, the established decision tree is pessimistically pruned with the sample weights in place of the numbers of samples, and the resulting tree predicts the rock burst danger level of the region to be predicted. The method overcomes the defect of the ID3 algorithm, which is biased toward attributes with many values when splitting attributes are selected by information gain, avoids the overfitting problem, and yields a model with high prediction accuracy.

Description

Rock burst danger level prediction method based on local weighted C4.5 algorithm
Technical Field
The invention relates to the technical field of rock burst prediction, in particular to a rock burst danger level prediction method based on a local weighted C4.5 algorithm.
Background
Rock burst is a dynamic phenomenon of sudden, sharp and violent damage caused by the release of deformation energy stored in the coal and rock mass around mine roadways and stopes. It is one of the major disasters affecting safe coal mine production, and almost every mining country in the world is threatened by it to some degree. In recent years, developed countries have successively closed rock-burst-prone mines for energy-structure and safety reasons, so China has become the main country affected by rock burst and the main country engaged in its prevention and control.
On the basis of research into its occurrence mechanism, prediction and evaluation of rock burst are the key steps in its prevention and control. The mechanism of rock burst, however, is not yet fully understood, and research into the mechanism of deep rock burst in particular is still at an early stage, which increases the difficulty of rock burst prediction. At present, rock burst is mainly predicted with rock mechanics methods and geophysical methods: the rock mechanics methods include the drilling cuttings method and mining-induced stress detection, while the geophysical methods include ground sound monitoring, microseismic monitoring and electromagnetic radiation monitoring. In addition, with the development of artificial intelligence, methods that predict rock burst with intelligent algorithms have appeared, such as neural networks, Bayesian discriminant analysis and support vector machines. These methods have produced many research results in rock burst danger level prediction, but they still have problems: a neural network generally needs a large number of samples, whereas the number of samples available for rock burst prediction is small; the Bayesian method requires strong independence among the data, which real rock burst sampling data can hardly satisfy; and these methods do not consider the overfitting problem of the model.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a rock burst danger level prediction method based on a local weighted C4.5 algorithm, which predicts the rock burst danger level of the coal and rock mass around mine roadways and stopes.
The rock burst danger level prediction method based on the local weighted C4.5 algorithm comprises the following steps:
step 1, collecting rock burst data of known category as sample data; the collected sample data set is denoted T, the category set of the samples is C, k' is the total number of sample categories, and N is the number of samples;
step 2, discretizing the continuous attribute data in the known-category sample data with the minimum description length principle (MDLP) method, which specifically comprises the following steps:
step 2.1: sequencing a group of continuous attribute values to be discretized and corresponding categories thereof according to the sequence of the continuous attribute values from small to large;
step 2.2: selecting continuous attribute values as boundary points according to the difference of categories corresponding to the sorted continuous attribute values to form a boundary point set; if the attribute values corresponding to different categories are the same, selecting the attribute value corresponding to the smallest category as a demarcation point;
Step 2.3: calculating the information gain of all the demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, judging whether the demarcation point meets the minimum description criterion, and if so, keeping the demarcation point; otherwise, removing the demarcation point;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
let a_min be the demarcation point with the minimal information gain, which divides the class set C into two subsets C_1 and C_2; whether a_min meets the minimum description criterion is judged by the following formula:
Gain(a_min) > log2(N-1)/N + log2(3^k' - 2) - [k'·H(C) - k'_1·H(C_1) - k'_2·H(C_2)]
wherein k'_1 and k'_2 are the numbers of categories contained in the subsets C_1 and C_2, respectively;
step 2.4: judging whether the demarcation point in the step 2.3 has other demarcation points in the two interval sequences divided by the original data set, if so, recombining the demarcation points in each interval sequence into a corresponding demarcation point set and returning to the step 2.3, continuously judging whether each interval sequence keeps the corresponding demarcation points according to the number of samples in the interval sequences and the corresponding class set, and otherwise, executing the step 2.5;
Step 2.5: according to the finally selected demarcation point set, interval sequence division is carried out on the continuous attribute data, if no demarcation point finally conforms to the minimum description criterion, all the continuous attribute data in the attribute are divided into an interval sequence, otherwise, the demarcation point divides the continuous attribute data into different interval sequences, and the discretization result of the continuous attribute data is obtained;
step 2.6: judging whether the continuous attributes in the sample data set are all discretized, if so, executing the step 3, otherwise, repeating the steps 2.1-2.5, and discretizing all the continuous attributes of the sample data set;
step 3, collecting rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in the step 2, and determining an interval sequence where the continuous attribute data in the rock burst attribute data of the area to be predicted are located according to a comparison result, so that the continuous attribute data in the rock burst attribute data of the area to be predicted are discretized;
step 4, searching K samples adjacent to the sample to be predicted from the discretization data set generated in the step 2 by adopting a K neighbor algorithm, forming a training set of a C4.5 decision tree by the K samples, and calculating the weight of the samples in the training set;
The weights of the samples in the training set are calculated according to the following formula:
ω_i = f(d_i, d_max)   (the exact weight formula is given only as an image in the original publication)
wherein ω_i is the weight of the ith sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, k; d_i is the distance from the sample to be predicted to the ith sample data x_i, calculated from the attribute data of the samples with a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set;
step 5: calculating the information gain rate of all attributes in the training set according to the weights of the sample data in the training set, and, during the generation of the root node and the other branch nodes of the C4.5 decision tree, selecting in each iteration the attribute with the maximum information gain rate as the splitting attribute of that node;
the specific method for calculating the information gain rate of the attributes in the training set comprises the following steps:
let V be an attribute in the training set and v_j its jth attribute value, j = 1, 2, …, m, where m is the number of mutually distinct attribute values of attribute V among the sample data in the training set; the category set corresponding to the sample data in the training set is C′ = {c_1, c_2, …, c_n}, where c_i′ is the i′th category, i′ = 1, 2, …, n, and n is the total number of categories corresponding to the sample data in the training set;
calculating the class information entropy of the sample data in the training set, as shown in the following formula:
I(C′) = -Σ_{i′=1}^{n} p(c_i′)·log2 p(c_i′)
wherein ω_{c_i′} is the sum of the weights of the training-set samples of class c_i′, ω_{C′} is the sum of the weights of the samples of all classes in the training set, and p(c_i′) is the ratio of the weight sum ω_{c_i′} of the class-c_i′ samples in the training set to the weight sum ω_{C′} of the samples of all classes;
calculating the class conditional entropy of the sample data in the training set, as shown in the following formula:
I(C′|V) = -Σ_{j=1}^{m} p(v_j)·Σ_{i′=1}^{n} p(c_i′|v_j)·log2 p(c_i′|v_j)
wherein ω_{v_j} is the sum of the weights of the samples whose attribute value is v_j, ω_V is the sum of the weights of all samples for attribute V, ω_{v_j,c_i′} is the sum of the weights of the samples whose attribute value is v_j and whose class is c_i′, p(v_j) is the ratio of the weight sum of the samples with attribute value v_j in the training set to the weight sum of all samples, and p(c_i′|v_j) is the ratio of the weight sum of the class-c_i′ samples among the samples with attribute value v_j to the weight sum of all samples whose attribute value is v_j;
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:
I(V) = -Σ_{j=1}^{m} p(v_j)·log2 p(v_j)
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C′,V)/I(V);
step 6: establishing a decision tree according to the splitting attribute, pruning the decision tree by adopting a pessimistic pruning method, and calculating the error rates of branch nodes and corresponding leaf nodes by using sample weights instead of the number of samples in the pruning process; and finally, predicting the potential rock burst danger level of the region to be predicted by the generated decision tree.
According to the technical scheme, the invention has the following beneficial effects: in the rock burst danger level prediction method based on the local weighted C4.5 algorithm, discretizing the continuous attribute data with the minimum description length (MDLP) method handles the continuous attribute data in the sample data well; the local weighting method selects the training set according to the distance from each discretized sample to the sample to be predicted and gives the training samples different weights; the C4.5 algorithm uses the sample weights to calculate the information gain rate for selecting node splitting attributes, which overcomes the bias of the ID3 algorithm toward attributes with many values when splitting attributes are selected by information gain; and pessimistic pruning with sample weights in place of sample counts avoids the overfitting problem and improves the accuracy of the prediction model.
Drawings
Fig. 1 is a flowchart of a rock burst risk level prediction method based on a local weighted C4.5 algorithm according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In this embodiment, an inkstone coal mine in a certain area is taken as an example, and its rock burst danger level is predicted with the rock burst danger level prediction method based on the local weighted C4.5 algorithm.
The rock burst danger level prediction method based on the local weighted C4.5 algorithm, as shown in FIG. 1, comprises the following steps:
step 1, collecting rock burst data of known category as sample data; the collected sample data set is denoted T, the category set of the samples is C, k' is the total number of sample categories, and N is the number of samples.
Since many factors influence rock burst, this embodiment selects 10 factors as the attributes of the sample data for predicting the rock burst danger level of the coal mine: coal thickness (V1), inclination angle (V2), buried depth (V3), structural condition (V4), change of inclination angle (V5), coal thickness variation (V6), gas concentration (V7), roof (top plate) management (V8), pressure relief (V9), and coal sound (V10). Among them, structural condition (V4), change of inclination angle (V5), coal thickness variation (V6), roof management (V8), pressure relief (V9), and coal sound (V10) are state parameters, whose assigned values are shown in Table 1:
TABLE 1 State parameter assignment
(The content of this table is provided as an image in the original publication and is not reproduced here.)
The rock burst danger level is divided into four categories according to the intensity of the rock burst: category 1 is micro impact, category 2 is weak impact, category 3 is medium impact, and category 4 is strong impact.
Table 2 shows the rock burst data collected as sample data in this example.
Table 2 rock burst data as sample data
(The content of this table is provided as an image in the original publication and is not reproduced here.)
Step 2, discretizing the continuous attribute data in the known-category sample data with the minimum description length principle (MDLP) method, which specifically comprises the following steps:
step 2.1: sequencing a group of continuous attribute values to be discretized and corresponding categories thereof according to the sequence of the continuous attribute values from small to large;
step 2.2: selecting continuous attribute values as boundary points according to the difference of categories corresponding to the sorted continuous attribute values to form a boundary point set; if the attribute values corresponding to different categories are the same, selecting the attribute value corresponding to the smallest category as a demarcation point;
step 2.3: calculating the information gain of all the demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, judging whether the demarcation point meets the minimum description criterion, and if so, keeping the demarcation point; otherwise, removing the demarcation point;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
Let a_min be the demarcation point with the minimal information gain, which divides the class set C into two subsets C_1 and C_2; whether a_min meets the minimum description criterion is judged by the following formula:
Gain(a_min) > log2(N-1)/N + log2(3^k' - 2) - [k'·H(C) - k'_1·H(C_1) - k'_2·H(C_2)]
wherein k'_1 and k'_2 are the numbers of categories contained in the subsets C_1 and C_2, respectively;
step 2.4: judging whether the demarcation point in the step 2.3 has other demarcation points in the two interval sequences divided by the original data set, if so, recombining the demarcation points in each interval sequence into a corresponding demarcation point set and returning to the step 2.3, continuously judging whether each interval sequence keeps the corresponding demarcation points according to the number of samples in the interval sequences and the corresponding class set, and otherwise, executing the step 2.5;
step 2.5: according to the finally selected demarcation point set, interval sequence division is carried out on the continuous attribute data, if no demarcation point finally conforms to the minimum description criterion, all the continuous attribute data in the attribute are divided into an interval sequence, otherwise, the demarcation point divides the continuous attribute data into different interval sequences, and the discretization result of the continuous attribute data is obtained;
step 2.6: and (3) judging whether the continuous attributes in the sample data set are all discretized, if so, executing the step (3), otherwise, repeating the steps 2.1-2.5, and discretizing all the continuous attributes of the sample data set.
In this embodiment, for the continuous attributes V1, V3 and V7, the information gain of the demarcation points in the demarcation point set does not meet the minimum description criterion, so according to the MDLP discretization principle the corresponding continuous attribute data are discretized into a single interval sequence, whose output in this embodiment is 1. The final demarcation point of the continuous attribute V2 is the attribute value 45, so continuous attribute values of 45 or more are classified into one interval sequence with output 2, and continuous attribute values below 45 are classified into another interval sequence with output 1. The discretized sample data used as the training set in this embodiment are shown in Table 3.
TABLE 3 discretized sample data
(The content of this table is provided as an image in the original publication and is not reproduced here.)
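By way of illustration of steps 2.1 to 2.6 above, the following Python sketch shows one possible implementation of the MDLP discretization, together with the interval lookup used later in step 3. It follows the common Fayyad-Irani formulation, in which the candidate cut with the largest information gain is tested against the MDL acceptance criterion; all function and variable names are illustrative assumptions rather than the patent's exact procedure.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdlp_cut_points(values, labels):
    """Steps 2.1-2.4: recursively find MDLP-accepted cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels))                  # step 2.1: sort by attribute value
    vals = [v for v, _ in pairs]
    labs = [c for _, c in pairs]
    n = len(labs)
    best_gain, best_i = -1.0, None
    for i in range(1, n):                                # candidate demarcation points between
        if vals[i] == vals[i - 1]:                       # distinct values (a superset of the
            continue                                     # class-boundary points of step 2.2)
        left, right = labs[:i], labs[i:]
        gain = entropy(labs) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_i = gain, i
    if best_i is None:
        return []
    left, right = labs[:best_i], labs[best_i:]
    k, k1, k2 = len(set(labs)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * entropy(labs) - k1 * entropy(left) - k2 * entropy(right))
    if best_gain <= (math.log2(n - 1) + delta) / n:      # step 2.3: MDL acceptance test
        return []
    cut = (vals[best_i - 1] + vals[best_i]) / 2.0
    # step 2.4: recurse into the two interval sequences on either side of the accepted cut
    return (mdlp_cut_points(vals[:best_i], labs[:best_i]) + [cut]
            + mdlp_cut_points(vals[best_i:], labs[best_i:]))

def to_interval(value, cut_points):
    """Steps 2.5 and 3: map a continuous value to its interval sequence number (1, 2, ...)."""
    return 1 + sum(value >= c for c in sorted(cut_points))
```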
And 3, acquiring rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in the step 2, and determining an interval sequence where the continuous attribute data in the rock burst attribute data of the area to be predicted are located according to a comparison result, so that the continuous attribute data in the rock burst attribute data of the area to be predicted are discretized.
In this embodiment, to verify the effectiveness of the method of the invention, the attribute data in Table 4 are used as the collected rock burst attribute data of the areas to be predicted, and the category data in Table 4 are used for comparison with the prediction results. For the continuous attribute data in these 10 groups of data, the discretization results are obtained by comparison with the corresponding attribute data of the 25 groups of data in Table 2, and are shown in Table 5.
TABLE 4 data to be predicted
Serial number V1/m V2/(°) V3/m V4 V5 V6 V7/(m3·min-1) V8 V9 V10 Categories
1 1.5 35 530 0 0 0 0.56 3 3 0 1
2 1.6 62 307 3 2 2 1 0 0 2 4
3 1.9 59 542 1 2 3 0.25 0 0 1 3
4 1.3 44 570 0 0 0 0.66 3 3 0 1
5 2.2 54 290 3 2 2 1 0 0 2 4
6 3 34 475 2 2 1 0.42 0 0 2 3
7 3.2 42 574 3 0 0 0.29 0 0 2 3
8 1.8 62 283 3 2 3 1 0 0 2 4
9 1.3 44 656 2 1 3 0.24 1 1 2 3
10 1.2 40 553 2 2 2 0.49 1 2 2 3
TABLE 5 discretized data to be predicted
(The content of this table is provided as an image in the original publication and is not reproduced here.)
Step 4, searching K samples adjacent to the sample to be predicted from the discretization data set generated in the step 2 by adopting a K neighbor algorithm, forming a training set of a C4.5 decision tree by the K samples, and calculating the weight of the samples in the training set;
the weights of the samples in the training set are calculated according to the following formula:
ω_i = f(d_i, d_max)   (the exact weight formula is given only as an image in the original publication)
wherein ω_i is the weight of the ith sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, k; d_i is the distance from the sample to be predicted to the ith sample data x_i, calculated from the attribute data of the samples with a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set.
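A minimal Python sketch of step 4 is given below. It selects the k nearest discretized samples with the Euclidean distance used in this embodiment and weights each neighbour by a decreasing function of its distance; because the patent's exact weight formula is only reproduced as an image, the linear form 1 - d_i/d_max used here is an assumption, and any function that gives closer samples larger weights could be substituted. All names are illustrative.

```python
import math

def euclidean(x, y):
    """Distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def local_training_set(query, samples, labels, k):
    """Step 4: pick the k samples nearest to the sample to be predicted and weight them.

    Returns a list of (attribute_vector, class_label, weight) triples. The weight
    1 - d_i/d_max is an assumed stand-in for the image-only formula of the patent.
    """
    ranked = sorted(((euclidean(query, x), x, c) for x, c in zip(samples, labels)),
                    key=lambda t: t[0])[:k]
    d_max = max(d for d, _, _ in ranked)
    return [(x, c, 1.0 if d_max == 0 else 1.0 - d / d_max) for d, x, c in ranked]
```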
Step 5: calculating the information gain rate of all attributes in the training set according to the weights of the sample data in the training set, and, during the generation of the root node and the other branch nodes of the C4.5 decision tree, selecting in each iteration the attribute with the maximum information gain rate as the splitting attribute of that node;
the specific method for calculating the information gain rate of the attributes in the training set comprises the following steps:
let V be an attribute in the training set, V jJ is 1, 2, … and m, m is the number of attribute values of attribute V of sample data in training set which do not overlap each other, and the class set corresponding to sample data in training set is C' ═ { C ═ C1、c2、…、cnIn which c isi′For the ith 'category, i' is 1, 2, …, and n is the total number of categories corresponding to the sample data in the training set, and the specific method for calculating the information gain rate of the attributes in the training set is as follows:
calculating the class information entropy of the sample data in the training set, as shown in the following formula:
Figure BDA0001554600150000101
wherein,
Figure BDA0001554600150000102
for training set sample class ci′Of samples of (a) and (b), ωC′Weight sum of samples for all classes in training set, p (c)i′) Class c in training seti′Of samples and
Figure BDA0001554600150000103
weight sum ω with samples of all classesC′The ratio of (A) to (B);
calculating the class condition entropy of the sample data in the training set, as shown in the following formula:
Figure BDA0001554600150000104
wherein,
Figure BDA0001554600150000105
taking a value of v for an attributejOf samples of (a) and (b), ωVIs the sum of the weights of all samples in attribute V,
Figure BDA0001554600150000106
representing an attribute value of vjIn the sample of (A) is ofi′Sum of sample weights for classes, p (v)j) Taking the value of v for the attribute in the training setjThe ratio of the sum of weights of the samples to the sum of weights of all samples, p (c)i′|vj) Taking a value of v for an attribute jClass c in the samplei′The sum of the weights of the samples and all the attribute values are vjThe ratio of the weighted sums of the samples of (a);
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:
I(V) = -Σ_{j=1}^{m} p(v_j)·log2 p(v_j)
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C′,V)/I(V);
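The calculation above replaces every sample count of the ordinary C4.5 information gain rate by a sum of sample weights. The following Python sketch, with illustrative names, shows this weighted computation for a single attribute; at each node the attribute with the largest returned value would be chosen as the splitting attribute.

```python
import math
from collections import defaultdict

def weighted_entropy(weight_by_key):
    """Entropy computed from weight sums instead of sample counts."""
    total = sum(weight_by_key.values())
    return -sum((w / total) * math.log2(w / total) for w in weight_by_key.values() if w > 0)

def gain_ratio(rows, attr_index):
    """rows: (attribute_vector, class_label, weight) triples from the local training set."""
    by_class = defaultdict(float)                              # omega_{c_i'}
    by_value = defaultdict(float)                              # omega_{v_j}
    by_value_class = defaultdict(lambda: defaultdict(float))   # omega_{v_j, c_i'}
    for x, c, w in rows:
        v = x[attr_index]
        by_class[c] += w
        by_value[v] += w
        by_value_class[v][c] += w
    total = sum(by_class.values())                             # omega_{C'}
    i_c = weighted_entropy(by_class)                           # I(C')
    i_c_given_v = sum((by_value[v] / total) * weighted_entropy(by_value_class[v])
                      for v in by_value)                       # I(C'|V)
    i_v = weighted_entropy(by_value)                           # split information I(V)
    return (i_c - i_c_given_v) / i_v if i_v > 0 else 0.0       # gain_ratio(V)
```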
step 6: establishing a decision tree according to the splitting attribute, pruning the decision tree by adopting a pessimistic pruning method, and calculating the error rates of branch nodes and corresponding leaf nodes by using sample weights instead of the number of samples in the pruning process; and finally, predicting the potential rock burst danger level of the region to be predicted by the generated decision tree.
In this embodiment, in order to verify the prediction performance of the decision tree model established from the discretized sample data, the model is first verified by ten-fold cross validation. Because the amount of sample data in the training set is small, all sample data in the training set are selected as neighboring samples in the cross validation; in addition, the significance level in the pruning process of the C4.5 decision tree is set to the commonly used 25%, and the sample distances in the weighted learning are determined with the Euclidean distance function. The cross-validation accuracy of the model established with the discretized training sample set is 88%, while that of the model established with the original data in Table 2 is 84%, which indicates that the discretized sample data yield a better prediction model.
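The pessimistic pruning of step 6 replaces sample counts by weight sums when estimating node error rates. A minimal Python sketch of one common form of the pruning test is given below; the 0.5-per-leaf continuity correction and the one-standard-error comparison are the usual Quinlan-style defaults and are assumptions here (the 25% significance level mentioned above belongs to the confidence-based variant found in standard C4.5 implementations, which could be substituted). All names are illustrative.

```python
import math

def should_prune(leaf_error_weight, subtree_leaf_error_weights, total_weight):
    """Decide whether a branch node should be collapsed into a single leaf (step 6).

    leaf_error_weight: misclassified weight if the node predicted its majority class;
    subtree_leaf_error_weights: misclassified weight at each leaf of the current subtree;
    total_weight: sum of the weights of all samples reaching the node.
    """
    n_leaves = len(subtree_leaf_error_weights)
    subtree_error = sum(subtree_leaf_error_weights) + 0.5 * n_leaves  # corrected subtree error
    leaf_error = leaf_error_weight + 0.5                              # corrected error as a leaf
    # standard error of the corrected subtree error, with weight sums in place of counts
    se = math.sqrt(max(0.0, subtree_error * (total_weight - subtree_error) / total_weight))
    return leaf_error <= subtree_error + se                           # prune if the leaf is no worse
```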
The rock burst danger levels of the discretized areas to be predicted in Table 4 are then predicted with the local weighted C4.5 algorithm. In this embodiment, the NaiveBayes method, the original C4.5 method and the random forest method are also used to build prediction models from the data in Table 2 and to predict the rock burst danger levels in Table 4; the comparison with the prediction results of the method of the invention is shown in Table 6:
TABLE 6 comparison of predicted results of rock burst hazard ratings
Algorithm Accuracy rate
NaiveBayes 70%
C4.5 decision Tree 80%
Random forest 80%
The method of the invention 100%
The table shows that the method can accurately predict the rock burst hazard level of the area to be predicted, and the prediction result is superior to that of the NaiveBayes method, the original C4.5 method and the random forest method.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (3)

1. A rock burst danger level prediction method based on a local weighted C4.5 algorithm, characterized by comprising the following steps:
step 1, collecting rock burst data of known category as sample data; the collected sample data set is denoted T, the category set of the samples is C, k' is the total number of sample categories, and N is the number of samples;
step 2, discretizing the continuous attribute data in the known-category sample data with the minimum description length principle (MDLP) method;
step 3, collecting rock burst attribute data of the area to be predicted, comparing the continuous attribute data with the corresponding attribute data in the step 2, and determining an interval sequence where the continuous attribute data in the rock burst attribute data of the area to be predicted are located according to a comparison result, so that the continuous attribute data in the rock burst attribute data of the area to be predicted are discretized;
step 4, searching K samples adjacent to the sample to be predicted from the discretization data set generated in the step 2 by adopting a K neighbor algorithm, forming a training set of a C4.5 decision tree by the K samples, and calculating the weight of the samples in the training set;
step 5: calculating the information gain rate of all attributes in the training set according to the weights of the sample data in the training set, and, during the generation of the root node and the other branch nodes of the C4.5 decision tree, selecting in each iteration the attribute with the maximum information gain rate as the splitting attribute of that node;
Step 6: establishing a decision tree according to the splitting attribute, pruning the decision tree by adopting a pessimistic pruning method, and calculating the error rates of branch nodes and corresponding leaf nodes by using sample weights instead of the number of samples in the pruning process; finally, predicting the potential rock burst danger level of the region to be predicted by the generated decision tree;
the specific method for discretization in step 2 is as follows:
step 2.1: sequencing a group of continuous attribute values to be discretized and corresponding categories thereof according to the sequence of the continuous attribute values from small to large;
step 2.2: selecting continuous attribute values as boundary points according to the difference of categories corresponding to the sorted continuous attribute values to form a boundary point set; if the attribute values corresponding to different categories are the same, selecting the attribute value corresponding to the smallest category as a demarcation point;
step 2.3: calculating the information gain of all the demarcation points in the demarcation point set, selecting the demarcation point with the minimum information gain, judging whether the demarcation point meets the minimum description criterion, and if so, keeping the demarcation point; otherwise, removing the demarcation point;
the calculation formula of the information gain of the demarcation point is as follows:
Gain(a)=H(C)-H(C|a)
wherein a is a demarcation point in the demarcation point set, H (C) is the category information entropy, and H (C | a) is the information entropy obtained by dividing the category set C into two subsets by the demarcation point a;
Let a_min be the demarcation point with the minimal information gain, which divides the class set C into two subsets C_1 and C_2; whether a_min meets the minimum description criterion is judged by the following formula:
Gain(a_min) > log2(N-1)/N + log2(3^k' - 2) - [k'·H(C) - k'_1·H(C_1) - k'_2·H(C_2)]
wherein k'_1 and k'_2 are the numbers of categories contained in the subsets C_1 and C_2, respectively;
step 2.4: judging whether the demarcation point in the step 2.3 has other demarcation points in the two interval sequences divided by the original data set, if so, recombining the demarcation points in each interval sequence into a corresponding demarcation point set and returning to the step 2.3, continuously judging whether each interval sequence keeps the corresponding demarcation points according to the number of samples in the interval sequences and the corresponding class set, and otherwise, executing the step 2.5;
step 2.5: according to the finally selected demarcation point set, interval sequence division is carried out on the continuous attribute data, if no demarcation point finally conforms to the minimum description criterion, all the continuous attribute data in the attribute are divided into an interval sequence, otherwise, the demarcation point divides the continuous attribute data into different interval sequences, and the discretization result of the continuous attribute data is obtained;
step 2.6: and (3) judging whether the continuous attributes in the sample data set are all discretized, if so, executing the step (3), otherwise, repeating the steps 2.1-2.5, and discretizing all the continuous attributes of the sample data set.
2. The locally weighted C4.5 algorithm-based rock burst hazard level prediction method according to claim 1, wherein: the specific method for weighting the samples in the training set in step 4 comprises the following steps:
the weights of the samples in the training set are calculated according to the following formula:
ω_i = f(d_i, d_max)   (the exact weight formula is given only as an image in the original publication)
wherein ω_i is the weight of the ith sample adjacent to the sample to be predicted in the training set, i = 1, 2, …, k; d_i is the distance from the sample to be predicted to the ith sample data x_i, calculated from the attribute data of the samples with a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set.
3. The locally weighted C4.5 algorithm-based rock burst hazard level prediction method according to claim 2, wherein the specific method for calculating the information gain rate of the attributes in the training set in step 5 is as follows:
Let V be an attribute in the training set and v_j its jth attribute value, j = 1, 2, …, m, where m is the number of mutually distinct attribute values of attribute V among the sample data in the training set; the category set corresponding to the sample data in the training set is C′ = {c_1, c_2, …, c_n}, where c_i′ is the i′th category, i′ = 1, 2, …, n, and n is the total number of categories corresponding to the sample data in the training set;
calculating the class information entropy of the sample data in the training set, as shown in the following formula:
I(C′) = -Σ_{i′=1}^{n} p(c_i′)·log2 p(c_i′)
wherein ω_{c_i′} is the sum of the weights of the training-set samples of class c_i′, ω_{C′} is the sum of the weights of the samples of all classes in the training set, and p(c_i′) is the ratio of the weight sum ω_{c_i′} of the class-c_i′ samples in the training set to the weight sum ω_{C′} of the samples of all classes;
calculating the class conditional entropy of the sample data in the training set, as shown in the following formula:
I(C′|V) = -Σ_{j=1}^{m} p(v_j)·Σ_{i′=1}^{n} p(c_i′|v_j)·log2 p(c_i′|v_j)
wherein ω_{v_j} is the sum of the weights of the samples whose attribute value is v_j, ω_V is the sum of the weights of all samples for attribute V, ω_{v_j,c_i′} is the sum of the weights of the samples whose attribute value is v_j and whose class is c_i′, p(v_j) is the ratio of the weight sum of the samples with attribute value v_j in the training set to the weight sum of all samples, and p(c_i′|v_j) is the ratio of the weight sum of the class-c_i′ samples among the samples with attribute value v_j to the weight sum of all samples whose attribute value is v_j;
calculating the information gain of the attribute V of the sample data in the training set, as shown in the following formula:
I(C′,V)=I(C′)-I(C′|V)
calculating the information entropy of the attribute V of the sample data in the training set, as shown in the following formula:
I(V) = -Σ_{j=1}^{m} p(v_j)·log2 p(v_j)
calculating the information gain rate of the attribute V of the sample data in the training set, as shown in the following formula:
gain_ratio(V) = I(C′,V)/I(V).
CN201810058598.8A 2018-01-22 2018-01-22 Rock burst danger level prediction method based on local weighted C4.5 algorithm Active CN108280289B (en)

Priority Applications (1)

Application: CN201810058598.8A (granted as CN108280289B); Priority date: 2018-01-22; Filing date: 2018-01-22; Title: Rock burst danger level prediction method based on local weighted C4.5 algorithm

Publications (2)

Publication Number Publication Date
CN108280289A CN108280289A (en) 2018-07-13
CN108280289B true CN108280289B (en) 2021-10-08

Family

ID=62804465


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175194B (en) * 2019-04-19 2021-02-02 中国矿业大学 Coal mine roadway surrounding rock deformation and fracture identification method based on association rule mining
CN111764963B (en) * 2020-07-06 2021-04-02 中国矿业大学(北京) Rock burst prediction method based on fast-RCNN
CN113901939B (en) * 2021-10-21 2022-07-01 黑龙江科技大学 Rock burst danger level prediction method based on fuzzy correction, storage medium and equipment
CN114780443A (en) * 2022-06-23 2022-07-22 国网数字科技控股有限公司 Micro-service application automatic test method and device, electronic equipment and storage medium
CN117557087B (en) * 2023-09-01 2024-09-06 广州市河涌监测中心 Drainage unit risk prediction model training method and system based on water affair data

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2287122T3 (en) * 2000-04-24 2007-12-16 Qualcomm Incorporated PROCEDURE AND APPARATUS FOR QUANTIFY PREDICTIVELY SPEAKS SOUND.
NZ596478A (en) * 2009-06-30 2014-04-30 Dow Agrosciences Llc Application of machine learning methods for mining association rules in plant and animal data sets containing molecular genetic markers, followed by classification or prediction utilizing features created from these association rules
WO2013075104A1 (en) * 2011-11-18 2013-05-23 Rutgers, The State University Of New Jersey Method and apparatus for detecting granular slip
US20160358099A1 (en) * 2015-06-04 2016-12-08 The Boeing Company Advanced analytical infrastructure for machine learning
CN105373606A (en) * 2015-11-11 2016-03-02 重庆邮电大学 Unbalanced data sampling method in improved C4.5 decision tree algorithm
CN106096748A (en) * 2016-04-28 2016-11-09 武汉宝钢华中贸易有限公司 Entrucking forecast model in man-hour based on cluster analysis and decision Tree algorithms
CN107145998A (en) * 2017-03-31 2017-09-08 中国农业大学 A kind of soil calculation of pressure method and system based on Dyna CLUE models



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant