Disclosure of Invention
In order to improve the accuracy of coal and gas outburst accident prediction, the application provides a coal and gas outburst prediction device.
The application provides a coal and gas outburst prediction device, which adopts the following technical solution:
a coal and gas outburst prediction device, comprising:
a data acquisition module for acquiring sample data;
a data processing module for standardizing the sample data to obtain standard sample data;
a model building module for building an SVM-RF model based on the standard sample data;
and a result prediction module for predicting based on the SVM-RF model to obtain a prediction result.
By adopting the above technical solution, the models are trained on multiple groups of standardized sample data, and the SVM model is combined with the RF model to obtain the SVM-RF model. Predicting with the two models in combination makes it more likely that the shortcomings of each individual model are avoided, so the SVM-RF model improves the accuracy of coal and gas outburst accident prediction.
Optionally, the model building module includes:
the data dimension reduction sub-module is used for carrying out dimension reduction operation on the standard sample data;
the data dividing sub-module is used for dividing the standard sample data after dimension reduction to obtain a training data set and a verification data set;
a first training sub-module for training the SVM model based on the training data set to obtain Model_SVM;
a second training sub-module for training the RF model based on the training data set to obtain Model_RF;
a model determination sub-module for determining the SVM-RF model based on the verification data set, Model_SVM and Model_RF.
By adopting the above technical solution, a dimension reduction operation is performed on the standard sample data before model training, and model training and model combination are both carried out on the dimension-reduced standard sample data, which improves the accuracy of the model. Because model training and model combination use different standard sample data (the training data set and the verification data set, respectively), the possibility that errors in part of the standard sample data lower the model accuracy is reduced.
Optionally, the standard sample data includes at least one standard feature data and output data, and the data dimension reduction submodule is specifically configured to:
substituting the salient feature data set corresponding to each standard feature data and the salient category data set corresponding to the output data into a formula to solve for ρ, the formula comprising:
[formula not reproduced in the source text],
wherein ρ characterizes the correlation between the standard feature data and the output data, x_i is the i-th data in the salient feature data set, y_i is the i-th data in the salient category data set, n is the number of samples in the salient feature data set, and i is an integer from 1 to n.
Optionally, the data dimension reduction sub-module is further specifically configured to:
deleting the standard feature data whose ρ is less than a preset value, thereby completing the dimension reduction operation, wherein the preset value is the minimum acceptable value of the correlation, and the dimension reduction operation deletes from the standard sample data the standard feature data whose correlation with the output data is lower than the preset value.
By adopting the above technical solution, the standard feature data having a low correlation with the output data, that is, with the data to be predicted, are deleted from the standard sample data, so that the accuracy of model training is improved.
Optionally, the verification data set includes a true value, and the model determination submodule is specifically configured to:
verifying Model_SVM based on the verification data set to obtain a first predicted value;
verifying Model_RF based on the verification data set to obtain a second predicted value;
determining a first average absolute error based on the first predicted value and the true value, the first average absolute error being the average absolute error between the first predicted value and the true value;
determining a second average absolute error based on the second predicted value and the true value, the second average absolute error being the average absolute error between the second predicted value and the true value;
determining the SVM-RF model based on the first average absolute error and the second average absolute error.
By adopting the above technical solution, Model_SVM and Model_RF are each verified with the verification data set to obtain a first predicted value and a second predicted value, the average absolute error of each model is calculated from its predicted values, and the SVM-RF model is determined according to the first average absolute error and the second average absolute error. In other words, the models are combined according to the actual performance of the two models, so that the accuracy of the SVM-RF model is higher.
Optionally, the model determination submodule is further specifically configured to:
if the first average absolute error and the second average absolute error do not meet a first condition, determining the weight values by the following formula:
[formula not reproduced in the source text],
wherein the formula gives the weight value corresponding to Model_RF and the weight value corresponding to Model_SVM in terms of the first average absolute error and the second average absolute error; the first condition is a relation between the first average absolute error, the second average absolute error and λ [relation not reproduced in the source text], λ being a preset judgment threshold;
determining the SVM-RF model based on the weight value corresponding to Model_SVM and the weight value corresponding to Model_RF.
By adopting the above technical solution, when the difference between the average absolute errors of the two models is small, the weight value corresponding to each model is calculated from the first average absolute error and the second average absolute error, and the models are combined according to the corresponding weight values to obtain the SVM-RF model, so that the accuracy of the SVM-RF model is higher.
Optionally, the model determination submodule is further specifically configured to:
if the first average absolute error and the second average absolute error meet a first condition, determining a magnitude relation between the first average absolute error and the second average absolute error, wherein the first condition is a relation between the first average absolute error, the second average absolute error and λ [relation not reproduced in the source text], λ being the preset judgment threshold;
determining the weight value of Model_SVM and the weight value of Model_RF based on the magnitude relation;
determining the SVM-RF model based on the weight value of Model_SVM and the weight value of Model_RF.
By adopting the above technical solution, when the difference between the average absolute errors of the two models is large, the corresponding weight values are determined according to the magnitude relation between the first average absolute error and the second average absolute error, so that the accuracy of the SVM-RF model is higher.
Optionally, the model determination submodule is further specifically configured to:
if the first average absolute error is greater than the second average absolute error, the weight value corresponding to Model_SVM is 0 and the weight value corresponding to Model_RF is 1;
if the first average absolute error is less than the second average absolute error, the weight value corresponding to Model_SVM is 1 and the weight value corresponding to Model_RF is 0.
By adopting the above technical solution, when the difference between the average absolute errors of the two models is large, prediction is performed using only the model with the smaller average absolute error, that is, the SVM-RF model consists of a single model, so that the accuracy of the SVM-RF model is higher.
Optionally, the coal and gas outburst prediction device further includes a missing value filling module, and the missing value filling module is specifically configured to:
determining a group of standard feature data and output data before the missing value;
determining a group of standard feature data and output data after the missing value;
calculating the missing value based on the following formula:
[formula not reproduced in the source text],
wherein x_k is the standard feature data before the missing value, y_k is the output data before the missing value, x_{k+1} is the standard feature data after the missing value, y_{k+1} is the output data after the missing value, x is the missing value, and H_3(x) is the output data corresponding to the missing value;
and filling the missing value into the sample data to obtain complete sample data.
By adopting the above technical solution, the missing value is calculated from the standard feature data before and after it, that is, from standard feature data obtained in an environment similar to that of the missing value, so that the filled value, and hence the sample data, is more accurate.
Optionally, the sample data includes at least one kind of feature data, and the data processing module is specifically configured to:
standardize the feature data based on the following formula to obtain the standard feature data, the formula comprising:
$$Z = \frac{x - \mu}{\sigma},$$
wherein μ is the mean of each kind of feature data, σ is the standard deviation of each kind of feature data, x is the feature data, and Z is the standard feature data.
By adopting the above technical solution, sample data of different orders of magnitude and dimensions are standardized so that the standard sample data share the same order of magnitude and dimension, and the model trained on the standard sample data is more accurate.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a block diagram of a coal and gas outburst prediction device according to an embodiment of the present application.
As shown in fig. 1, a coal and gas outburst prediction apparatus 100 includes (blocks 101-104):
the data acquisition module 101 is configured to acquire sample data.
In this embodiment, the sample data are coal and gas outburst sample data from different mining areas, and the sample data include at least one kind of feature data and output data. The feature data are index data related to coal and gas outburst, for example: gas content, gas pressure, initial velocity of gas emission, firmness coefficient, burial depth, gas concentration change rate, coal seam thickness change rate, distance from a geological structure zone, and thickness change rate of coal. The output data indicate whether a coal and gas outburst occurred and are represented by 1 or 2, where 1 indicates that a coal and gas outburst accident occurred and 2 indicates that no coal and gas outburst accident occurred. Multiple groups of sample data are obtained from different mining areas so that the sample data are more reliable. The number of groups of sample data is not limited to 100; it may also be, for example, 200.
After the sample data are acquired, they are sorted by the mining area to which they correspond, so that adjacent sample data come from the same mining area; the sample data of different mining areas are ordered according to acquisition time.
As an optional implementation of this embodiment, the coal and gas outburst prediction device further includes a missing value filling module, which is specifically configured to: determine a group of standard feature data and output data before the missing value; determine a group of standard feature data and output data after the missing value; and calculate the missing value based on the following formula:
[formula not reproduced in the source text],
wherein x_k is the standard feature data before the missing value, y_k is the output data before the missing value, x_{k+1} is the standard feature data after the missing value, y_{k+1} is the output data after the missing value, x is the missing value, and H_3(x) is the output data corresponding to the missing value; and fill the missing value into the sample data to obtain complete sample data.
In the process of obtaining sample data, part of the sample data may be missing for various reasons, and the missing values need to be filled. Because the sample data are sorted by mining area, adjacent sample data come from similar environments, so a missing value can be calculated from the two adjacent groups of sample data using the above formula and filled into the corresponding sample data. A model trained on the completed sample data can therefore predict more accurately.
The data processing module 102 is configured to perform normalization processing on the sample data to obtain standard sample data.
As an alternative implementation of the present embodiment, the data processing module 102 is specifically configured to: standardize the feature data based on the following formula to obtain standard feature data, the formula comprising:
$$Z = \frac{x - \mu}{\sigma},$$
wherein μ is the mean of each kind of feature data, σ is the standard deviation of each kind of feature data, x is the feature data, and Z is the standard feature data.
Because of the difference of the orders of magnitude and the dimensions among various characteristic data in the sample data, the orders of magnitude and the dimensions of the sample data need to be unified in order to better establish a prediction model and predict the coal and gas outburst better.
Based on the above formula, each feature data is normalized to obtain standard feature data, so as to obtain standard sample data, wherein, the method for calculating standard deviation and calculating mean is a method known to those skilled in the art, and will not be described herein.
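As a non-limiting illustration of the standardization described above, the following minimal Python sketch computes Z = (x - μ)/σ for one kind of feature data. The function name `standardize`, the use of the population standard deviation, the handling of a zero standard deviation, and the sample values are all assumptions made for illustration and are not specified by the present application.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return the z-scores (x - mu) / sigma of one kind of feature data."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros (assumption).
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical gas pressure readings, for illustration only.
gas_pressure = [0.5, 1.0, 1.5, 2.0]
z = standardize(gas_pressure)
```

After this step every kind of feature data has zero mean and unit standard deviation, so features measured in different units become comparable.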
The model building module 103 is configured to build an SVM-RF model based on the standard sample data.
The standard sample data comprises at least one standard characteristic data and output data, and the SVM-RF model is a model obtained by combining the SVM model and the RF model.
Fig. 2 is a block diagram of the model creation module 103 provided in the embodiment of the present application.
As an alternative implementation of the present embodiment, as shown in fig. 2, the model building module 103 includes (sub-modules 1031 to 1035):
the data dimension reduction submodule 1031 is used for performing dimension reduction operation on the standard sample data.
The dimension reduction operation is to delete standard characteristic data with low correlation with output data in the standard sample data.
As an alternative implementation of this embodiment, the data dimension reduction submodule 1031 is specifically configured to: substitute the salient feature data set corresponding to each standard feature data and the salient category data set corresponding to the output data into a formula to solve for ρ, the formula comprising:
[formula not reproduced in the source text],
wherein ρ characterizes the correlation between the standard feature data and the output data, x_i is the i-th data in the salient feature data set, y_i is the i-th data in the salient category data set, n is the number of samples in the salient feature data set, and i is an integer from 1 to n.
In this embodiment, the correlation between each standard feature data and the output data is calculated according to the above formula, and the correlation level between the standard feature data and the output data is determined according to the magnitude of ρ. The correspondence between the magnitude of ρ and the correlation level is shown in Table 1.
TABLE 1 Correspondence between correlation magnitude and correlation level
Value range | Correlation level
ρ > 0.9 | High
0.6 < ρ < 0.9 | Moderate
0.5 < ρ < 0.6 | General
ρ < 0.5 | Low
As an alternative implementation of this embodiment, the data dimension reduction submodule 1031 is further specifically configured to: delete the standard feature data whose ρ is less than a preset value, thereby completing the dimension reduction operation, wherein the preset value is the minimum acceptable value of the correlation, and the dimension reduction operation deletes from the standard sample data the standard feature data whose correlation with the output data is lower than the preset value.
In this embodiment, the preset value may be the ρ value bounding the low correlation level, that is, 0.5; to further improve the accuracy of the model, the preset value may also be 0.6.
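The dimension reduction operation can be sketched as follows. This is only an illustrative sketch: it assumes that ρ is computed as the Pearson correlation coefficient, which is one common choice consistent with the symbols x_i, y_i and n, and the formula of the present application may differ. Comparing the absolute value of ρ with the preset value is a further assumption, made so that strongly negative correlations are not discarded. The names `pearson`, `reduce_features` and the sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no measurable correlation
    return cov / sqrt(var_x * var_y)


def reduce_features(features, output, preset=0.5):
    """Keep only the features whose |rho| reaches the preset value."""
    return {
        name: column
        for name, column in features.items()
        if abs(pearson(column, output)) >= preset  # abs() is an assumption
    }


# Hypothetical standard feature data; output 1 = outburst, 2 = no outburst.
features = {
    "gas_pressure": [0.2, 0.4, 0.9, 1.1, 1.3, 1.6],
    "noise": [1.0, 0.9, 1.1, 1.0, 0.9, 1.1],
}
output = [2, 2, 2, 1, 1, 1]
kept = reduce_features(features, output, preset=0.5)
```

With these hypothetical values the `noise` column is essentially uncorrelated with the output and is deleted, while `gas_pressure` is retained.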
The data dividing submodule 1032 is configured to divide the standard sample data after the dimension reduction to obtain a training data set and a verification data set.
The standard sample data are divided into a training data set and a verification data set, and a test data set is additionally divided out. The models are trained on the training data set, the accuracy of the models is verified and the models are combined using the verification data set, and the combined model is tested on the test data set. To improve the accuracy of the model, the training data set takes the largest proportion of the sample data; for example, the training data set, the verification data set and the test data set may account for 80%, 10% and 10% of the samples, respectively.
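A minimal sketch of the 80%/10%/10% division is given below. Shuffling the samples before dividing them, the fixed random seed, and the function name `split_samples` are assumptions for illustration; the present application does not state how the groups are assigned to each data set.

```python
import random


def split_samples(samples, train=0.8, val=0.1, seed=0):
    """Divide samples into training, verification and test data sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # the remainder is the test set
    return train_set, val_set, test_set


# 100 hypothetical groups of sample data, identified by index.
samples = list(range(100))
train_set, val_set, test_set = split_samples(samples)
```

For 100 groups of sample data this yields 80 training groups, 10 verification groups and 10 test groups, with no group appearing in more than one data set.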
A first training submodule 1033, configured to train the SVM model based on the training data set to obtain Model_SVM.
The training data set is input into the SVM model for training to obtain the trained SVM model, Model_SVM.
A second training submodule 1034, configured to train the RF model based on the training data set to obtain Model_RF.
The training data set is input into the RF model for training to obtain the trained RF model, Model_RF.
A model determination submodule 1035, configured to determine the SVM-RF model based on the verification data set, Model_SVM and Model_RF.
As an alternative implementation of the present embodiment, the verification data set includes a true value, and the model determination submodule 1035 is specifically configured to: verify Model_SVM based on the verification data set to obtain a first predicted value; verify Model_RF based on the verification data set to obtain a second predicted value; determine a first average absolute error based on the first predicted value and the true value, the first average absolute error being the average absolute error between the first predicted value and the true value; determine a second average absolute error based on the second predicted value and the true value, the second average absolute error being the average absolute error between the second predicted value and the true value; and determine the SVM-RF model based on the first average absolute error and the second average absolute error.
In this embodiment, the true value is the output data in the verification data set. The standard feature data in the verification data set are input to Model_SVM, and the output value obtained is the first predicted value; the standard feature data in the verification data set are input to Model_RF, and the output value obtained is the second predicted value. Because the verification data set includes multiple groups of sample data, multiple first predicted values and multiple second predicted values are obtained. The difference between each first predicted value and the corresponding true value is taken and its absolute value computed, giving multiple first absolute errors, which are averaged to obtain the first average absolute error. Likewise, the difference between each second predicted value and the corresponding true value is taken and its absolute value computed, giving multiple second absolute errors, which are averaged to obtain the second average absolute error. The way in which the SVM model and the RF model are combined is then determined according to the first average absolute error and the second average absolute error, yielding the SVM-RF model.
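The calculation of the two average absolute errors described above can be sketched in Python as follows. The function name `mean_absolute_error` and the predicted and true values are hypothetical and serve only to illustrate the arithmetic.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all verification samples."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, actual)) / len(actual)


# Hypothetical verification data: 1 = outburst, 2 = no outburst.
true_values = [1, 2, 2, 1, 2]
svm_predictions = [1, 2, 1, 1, 2]  # first predicted values (one error)
rf_predictions = [1, 2, 2, 2, 1]   # second predicted values (two errors)

mae_svm = mean_absolute_error(svm_predictions, true_values)  # 1/5
mae_rf = mean_absolute_error(rf_predictions, true_values)    # 2/5
```

In this hypothetical case the first average absolute error (0.2) is smaller than the second (0.4), so Model_SVM performed better on the verification data set.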
As an alternative implementation of the present embodiment, the model determination submodule 1035 is further specifically configured to: if the first average absolute error and the second average absolute error do not meet the first condition, determine the weight values by the following formula:
[formula not reproduced in the source text],
wherein the formula gives the weight value corresponding to Model_RF and the weight value corresponding to Model_SVM in terms of the first average absolute error and the second average absolute error; the first condition is a relation between the first average absolute error, the second average absolute error and λ [relation not reproduced in the source text], λ being a preset judgment threshold; and determine the SVM-RF model based on the weight value corresponding to Model_SVM and the weight value corresponding to Model_RF.
In this embodiment, λ is a judgment threshold preset by a worker and is not specifically limited. If the first average absolute error and the second average absolute error do not meet the first condition, that is, the two errors differ only slightly, the weight value corresponding to each model is calculated from the first average absolute error and the second average absolute error, and the two models are combined according to the corresponding weight values. The combination consists of combining the output results of the two models by calculation to obtain the prediction result of the SVM-RF model, which is given by [formula not reproduced in the source text] in terms of the output result of Model_SVM, the output result of Model_RF and the corresponding weight values.
The output result of Model_SVM and the output result of Model_RF are each 1 or 2. If the weight values are decimals, the prediction result of the combined SVM-RF model may be a decimal; in that case the differences between the prediction result and 1 and between the prediction result and 2 are calculated. If the difference between the prediction result and 1 is smaller than a first difference, the prediction result is 1; if the difference between the prediction result and 2 is smaller than the first difference, the prediction result is 2; if the differences between the prediction result and both 1 and 2 are greater than or equal to the first difference, the weight value of each model is re-determined according to the magnitude relation between the first average absolute error and the second average absolute error, and the prediction result of the SVM-RF model is calculated from the new weight values and the above formula. The first difference is less than or equal to 0.3; for example, it may be 0.3 or 0.2.
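The handling of a decimal prediction result can be sketched as follows. The sketch assumes that the combined prediction result is the weighted sum of the two output results, which is consistent with the description but is not confirmed by a formula in the text. The function name `predict_svm_rf`, its parameters and the choice of Model_RF when the two errors are equal are hypothetical.

```python
def predict_svm_rf(y_svm, y_rf, w_svm, w_rf, mae_svm, mae_rf,
                   first_difference=0.3):
    """Combine two class outputs (1 or 2) into one SVM-RF prediction."""
    # Assumed combination: weighted sum of the two output results.
    combined = w_svm * y_svm + w_rf * y_rf
    for label in (1, 2):
        if abs(combined - label) < first_difference:
            return label
    # Neither class is close enough: re-determine the weights as 0 and 1,
    # i.e. use only the model with the smaller average absolute error.
    # Choosing Model_RF on a tie is an assumption.
    return y_svm if mae_svm < mae_rf else y_rf


# Hypothetical case: the models disagree and the weights are 0.8 and 0.2.
example = predict_svm_rf(y_svm=1, y_rf=2, w_svm=0.8, w_rf=0.2,
                         mae_svm=0.2, mae_rf=0.3)
```

In the hypothetical case the weighted sum is 1.2, which lies within the first difference of 1, so the prediction result is 1. With weights 0.6 and 0.4 the weighted sum would be 1.4, which is close to neither class, and the output of the model with the smaller error would be used instead.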
As an alternative implementation of the present embodiment, the model determination submodule 1035 is further specifically configured to: if the first average absolute error and the second average absolute error meet the first condition, determine the magnitude relation between the first average absolute error and the second average absolute error, wherein the first condition is a relation between the first average absolute error, the second average absolute error and λ [relation not reproduced in the source text], λ being the preset judgment threshold; determine the weight value of Model_SVM and the weight value of Model_RF based on the magnitude relation; and determine the SVM-RF model based on the weight value of Model_SVM and the weight value of Model_RF.
In this embodiment, if the first average absolute error and the second average absolute error meet the first condition, that is, the two errors differ greatly, the first average absolute error and the second average absolute error are compared to obtain the magnitude relation between them. The magnitude relation is either that the first average absolute error is greater than the second average absolute error or that the first average absolute error is less than the second average absolute error. The weight value of Model_SVM and the weight value of Model_RF are determined according to the magnitude relation, and the two models are combined according to the corresponding weight values by combining their output results by calculation, so as to obtain the prediction result of the SVM-RF model. The prediction result of the SVM-RF model is given by [formula not reproduced in the source text] in terms of the output result of Model_SVM, the output result of Model_RF and the corresponding weight values.
As an alternative implementation of the present embodiment, the model determination submodule 1035 is further specifically configured to: if the first average absolute error is greater than the second average absolute error, set the weight value corresponding to Model_SVM to 0 and the weight value corresponding to Model_RF to 1; if the first average absolute error is less than the second average absolute error, set the weight value corresponding to Model_SVM to 1 and the weight value corresponding to Model_RF to 0.
In this embodiment, when the weight values of the two models are determined according to the magnitude relation between the first average absolute error and the second average absolute error, if the first average absolute error is greater than the second average absolute error, the weight value corresponding to Model_SVM is 0 and the weight value corresponding to Model_RF is 1; if the first average absolute error is less than the second average absolute error, the weight value corresponding to Model_SVM is 1 and the weight value corresponding to Model_RF is 0.
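The two ways of determining the weight values can be sketched together as follows. Two points are assumptions rather than statements of the present application: the first condition is taken to be that the absolute difference of the two errors exceeds λ, which matches the description that meeting the condition means the errors differ greatly; and, when the condition is not met, each weight is taken to be inversely related to the model's own error, normalized so that the weights sum to 1. The function name `determine_weights` is hypothetical.

```python
def determine_weights(mae_svm, mae_rf, threshold):
    """Return (weight of Model_SVM, weight of Model_RF)."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    # Assumed form of the first condition: the errors differ by more than λ.
    if abs(mae_svm - mae_rf) > threshold:
        # Errors differ greatly: use only the model with the smaller error.
        if mae_svm > mae_rf:
            return 0.0, 1.0
        return 1.0, 0.0
    # Errors are close: assumed inverse-error weighting, summing to 1.
    total = mae_svm + mae_rf
    if total == 0:
        return 0.5, 0.5  # both models are error-free on the verification set
    return mae_rf / total, mae_svm / total


hard_rf = determine_weights(0.4, 0.1, threshold=0.2)   # RF only
hard_svm = determine_weights(0.1, 0.4, threshold=0.2)  # SVM only
soft = determine_weights(0.2, 0.3, threshold=0.2)      # both models
```

Under these assumptions, errors of 0.2 and 0.3 with λ = 0.2 give weights of about 0.6 for Model_SVM and 0.4 for Model_RF, so the more accurate model contributes more to the combined prediction.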
The result prediction module 104 is configured to predict based on the SVM-RF model to obtain a predicted result.
After the weight values of Model_SVM and Model_RF are obtained, prediction is performed with Model_SVM and Model_RF respectively to obtain the prediction result of Model_SVM and the prediction result of Model_RF, and the prediction result of the SVM-RF model is then calculated according to the combination formula [formula not reproduced in the source text].
In one example, a module in any of the above apparatuses may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
For another example, when a module in an apparatus is implemented in the form of a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke a program. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The foregoing description covers only the preferred embodiments of the present application and explains the technical principles employed. Persons skilled in the art will appreciate that the scope of the application is not limited to the specific combinations of the features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the application, for example, embodiments formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) this application.