CN115310352A

CN115310352A - Mine water inrush source identification method based on PCA-CSSA-RF model

Info

Publication number: CN115310352A
Application number: CN202210894193.4A
Authority: CN
Inventors: Huang Min (黄敏); Wang Yanbin (王彦彬); Mao An (毛岸)
Original assignee: Liaoning Technical University
Current assignee: Liaoning Technical University
Priority date: 2022-07-27
Filing date: 2022-07-27
Publication date: 2022-11-08

Abstract

The invention discloses a method for identifying a mine water inrush source. Based on the differences in the hydrochemical composition of the mine's aquifers, Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, HCO₃⁻ and total hardness are used as discrimination indexes so that the water inrush source can be identified quickly and accurately. The data are reduced in dimension by principal component analysis (PCA). The two random forest (RF) parameters, tree depth (dp) and number of trees (es), are optimized by a chaotic sparrow search algorithm (CSSA), and a mine water inrush source identification model based on PCA-CSSA-RF is established. PCA dimension reduction lowers the redundancy in the original data, and optimizing the random forest with the chaotic sparrow search algorithm improves global search and prediction capability. Together, these improve the efficiency and accuracy of water inrush source identification.

Description

Mine water inrush source identification method based on PCA-CSSA-RF model

Technical Field

The invention belongs to the technical field of mine water inrush source identification, and particularly relates to a mine water inrush source identification method based on a PCA-CSSA-RF model.

Background

Mine water inrush is one of the most common disasters in coal mines. When a water inrush occurs, quickly and accurately determining its cause and identifying its water source is essential for reducing casualties and economic losses and for preventing and controlling such disasters. At present, water inrush sources are mainly identified by hydrochemical methods. These methods reflect the essential characteristics of groundwater, identify water sources accurately, quickly and economically, and are widely used in mine water inrush source identification. Current methods for predicting mine water inrush sources mainly include pair analysis, machine learning, extreme learning machines, neural networks, Bayesian methods, Fisher discriminant analysis, logistic regression, random forests and support vector machines, and this research has produced rich results. However, correlation and nonlinearity often exist among the hydrochemical data of different aquifers, which makes water inrush source identification more difficult. The identification efficiency of some algorithms is also strongly affected by parameter selection. There is therefore room for further improvement in raw data processing and in the efficiency and accuracy of the discrimination algorithms.

Principal Component Analysis (PCA) is a widely used multivariate statistical method. It extracts the key information in the original data and represents the original variables with a few new variables, which reduces the dimension of the data and eliminates the correlation among indexes. The Sparrow Search Algorithm (SSA) is a swarm intelligence optimization algorithm based on the foraging and anti-predation behavior of sparrows, and it offers strong optimization capability and high solving efficiency. Like other swarm intelligence algorithms, however, SSA tends to become trapped in local extrema and converges slowly in later iterations. A Logistic chaotic mapping is therefore adopted to improve the quality of its initial solutions and, with it, the algorithm's global search capability. Breiman proposed the Random Forest (RF) algorithm in 2001. It is essentially a classifier made up of multiple mutually independent classification and regression trees. During training, the model samples the training set with replacement, drawing roughly 2/3 of the training samples each time to build a different classification and regression tree. When a new object is classified on the basis of certain attributes, each tree in the forest gives its own classification, the trees vote, and the class with the most votes is output. Tree depth (dp) and number of trees (es) are regarded as the most important RF parameters, so they are optimized by a Chaotic Sparrow Search Algorithm (CSSA) to improve the prediction capability of the random forest. The result is a PCA-CSSA-RF model for identifying mine water inrush sources.
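The two random forest mechanisms named above, sampling with replacement and majority voting, can be illustrated with the short sketch below. The helper names `bootstrap_indices` and `majority_vote` are hypothetical and are not taken from the patent. The "roughly 2/3" figure matches the expected fraction of distinct samples in a bootstrap sample, 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632 as n grows.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n training-sample indices with replacement (one bootstrap sample per tree)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
n = 33  # training-set size used in the embodiment
trials = 2000
unique_share = sum(len(set(bootstrap_indices(n, rng))) for _ in range(trials)) / (trials * n)
print(unique_share)                                   # close to 1 - (1 - 1/33)**33, about 0.64
print(majority_vote(["I", "II", "I", "III", "I"]))    # "I" wins with 3 of 5 votes
```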

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a method for identifying a mine water inrush source based on a PCA-CSSA-RF model.

A mine water inrush source identification method based on a PCA-CSSA-RF model comprises the following steps:

Step 1: Collect data on the water sources of the Xinzhuangzi coal mine and select Z groups of water inrush sample data. Let X = [X₁, X₂, …, X_p] denote the mine water inrush source discrimination indexes, where X_i (i = 1, 2, …, p) is the i-th index of sample X and p is the number of discrimination indexes;

The discrimination indexes mainly comprise the concentrations of the commonly used major groundwater ions Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻ and HCO₃⁻, together with total hardness (all in mg/L). The water inrush source types comprise a coal-measure sandstone aquifer, a Carboniferous Taiyuan Formation limestone karst aquifer and an Ordovician limestone karst aquifer;

Step 2: Standardize the selected Z groups of water inrush source sample data to obtain the standardized sample data Z′;

Step 3: Perform correlation analysis on the discrimination indexes X = [X₁, X₂, …, X_p] to obtain the Pearson correlation coefficient matrix among the indexes;

Step 4: Process the standardized sample data Z′ by principal component analysis. Extract the eigenvalues of the sample data to obtain the dimension-reduced principal component data, and use the principal components with a high cumulative variance contribution rate to identify the water inrush source. Of the Z groups of standardized samples, Z₁ groups are used as the training set and the remaining Z₂ groups as test samples. The PCA-CSSA-RF mine water inrush source identification model is then built with the extracted principal components as input variables and the water inrush source type as the output target;

Step 4.1: Calculate the mean vector X̄ and the covariance matrix S of the standardized mine water inrush source sample data;

Step 4.2: Calculate the eigenvalues r_i of the covariance matrix S and the corresponding unit orthogonal eigenvectors e_i;

Step 4.3: Sort the eigenvectors by their eigenvalues in descending order to obtain the eigenvector matrix A = [e₁, e₂, …, e_p];

Step 4.4: Obtain the principal component matrix of the discrimination index data from the eigenvector matrix A as Y = [Y₁, Y₂, …, Y_p] = AᵀX, where Y_i is the i-th principal component;

Step 4.5: Calculate the cumulative variance contribution rate G(m) of the principal components corresponding to the eigenvalues, and select the first m principal components (m < p) whose cumulative variance contribution rate exceeds the threshold;

Step 4.6: The cumulative variance contribution rate G(m) is calculated as:

$$G(m)=\frac{\sum_{i=1}^{m} r_i}{\sum_{k=1}^{p} r_k}$$

where m ∈ {1, 2, …, p} is the number of selected principal components and k = 1, 2, …, p;
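Steps 4.1 to 4.6 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' Matlab/SPSS implementation. The eigen-decomposition uses cyclic Jacobi rotations, which is our choice for a dependency-free example because the patent does not name a routine. Following step 4.4, each centred sample x is projected as Y = Aᵀx with the eigenvectors as the columns of A, and the `>` test mirrors "exceeds the threshold" in step 4.5.

```python
import math


def _matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigen(sym, tol=1e-20, max_sweeps=50):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) with the unit eigenvectors stored as columns of V."""
    n = len(sym)
    a = [list(map(float, row)) for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes a[p][q]: tan(2*phi) = 2*a_pq / (a_qq - a_pp)
                phi = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(phi), math.sin(phi)
                rot = [[float(r == k) for k in range(n)] for r in range(n)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = _matmul(rot_t, _matmul(a, rot))   # A <- J^T A J
                v = _matmul(v, rot)                   # accumulate eigenvectors
    return [a[i][i] for i in range(n)], v


def pca_select(data, threshold=0.95):
    """Steps 4.1-4.6 on standardised samples (rows). Returns (m, eigenvalues in
    descending order, cumulative contribution rates G, scores of the first m PCs)."""
    n, p = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(p)]           # 4.1 mean vector
    cen = [[row[k] - mean[k] for k in range(p)] for row in data]
    cov = [[sum(r[i] * r[k] for r in cen) / (n - 1) for k in range(p)]    # 4.1 covariance S
           for i in range(p)]
    vals, vecs = jacobi_eigen(cov)                                        # 4.2
    order = sorted(range(p), key=lambda k: vals[k], reverse=True)          # 4.3 descending
    vals = [vals[k] for k in order]
    a = [[vecs[i][k] for k in order] for i in range(p)]                    # columns e_1..e_p
    total = sum(vals)
    g = [sum(vals[:m]) / total for m in range(1, p + 1)]                    # 4.6 G(m)
    m = next((i + 1 for i, gm in enumerate(g) if gm > threshold), p)       # 4.5 choose m
    # 4.4: Y = A^T x for each centred sample, keeping the first m components
    scores = [[sum(a[i][k] * r[i] for i in range(p)) for k in range(m)] for r in cen]
    return m, vals, g, scores
```

For the embodiment, `data` would be the 45 × 7 matrix of z-scored indexes and `threshold` would be 0.95. For the 7 × 7 covariance matrix, the cubic cost of the explicit rotation products is negligible.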

and 5: optimizing two parameters of the tree depth and the tree number of a random forest algorithm by adopting a chaotic sparrow search algorithm, taking a mean square error as a fitness function of the chaotic sparrow search algorithm, and determining the values of the two parameters of the tree depth and the tree number in the random forest algorithm according to a global optimal solution (a minimum fitness function value) which is finally obtained by continuously updating the chaotic sparrow search algorithm in an iteration process to obtain a mine water inrush source identification model;

Step 5.1: Take the tree depth and number of trees of the random forest as the problem to be optimized by the sparrow search algorithm. Set the problem dimension D, the sparrow population size N, the maximum number of iterations itermax, the number of discoverers pNum, the number of scouting (early-warning) sparrows sNum, the early-warning value R₂, and the Logistic mapping parameters θ and E_t;

Step 5.2: Initialize the sparrow population, which encodes the random forest's tree depth and number of trees, with the Logistic chaotic map;

Step 5.3: The Logistic chaotic map is expressed as:

$$E_{t+1}=\theta E_t\left(1-E_t\right)$$

where θ ∈ (0, 4] is the Logistic mapping parameter and E_t ∈ [0, 1] is the value of the Logistic map at the t-th iteration;
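The sketch below shows how the chaotic initialization of step 5.2 might look using the map above. It assumes each chaotic value E ∈ [0, 1] is mapped linearly onto the search range [lb, ub] of each dimension. This is a common choice, but the patent does not state the mapping. The name `logistic_init` is hypothetical. Because tree depth and number of trees must be integers, a caller would also have to round the positions before building a forest, and the patent does not say how this is done.

```python
def logistic_init(n_pop, dim, lb, ub, theta=4.0, e0=0.7):
    """Initialise n_pop positions in `dim` dimensions from one Logistic chaotic
    sequence E_{t+1} = theta * E_t * (1 - E_t), mapping each E in [0, 1]
    linearly onto [lb[d], ub[d]] (assumed mapping)."""
    e = e0
    population = []
    for _ in range(n_pop):
        position = []
        for d in range(dim):
            e = theta * e * (1.0 - e)                 # next chaotic value
            position.append(lb[d] + e * (ub[d] - lb[d]))
        population.append(position)
    return population


# Embodiment settings: N = 30, D = 2 (dp, es), theta = 4, E0 = 0.7, range [0, 50]
pop = logistic_init(30, 2, lb=[0.0, 0.0], ub=[50.0, 50.0])
print(pop[0])   # approximately [42.0, 26.88], since E1 = 0.84 and E2 = 0.5376
```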

Step 5.6: Calculate the fitness value of each sparrow, using the mean square error as the fitness function of the chaotic sparrow search algorithm;

Step 5.7: Update the early-warning value and update the positions of the discoverers accordingly. Discoverers are the sparrows with better fitness in the population and guide its search direction. Their number is denoted pNum, and they account for about 10% to 20% of the population. Their positions are updated as follows:

where t = 1, 2, …, itermax is the iteration number; η ∈ (0, 1] and Q are random numbers; L is a 1 × D matrix of ones; R₂ ∈ [0, 1] is the early-warning value; and ST ∈ [0.5, 1] is the safety value. When R₂ < ST, the discoverers perform a wide-range search; otherwise, all sparrows quickly move to a safe area to forage;
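The discoverer formula itself did not survive extraction. Assuming the patent uses the standard SSA producer update of Xue and Shen (2020), with η here in place of that paper's α and X_i^t the 1 × D position row vector of the i-th sparrow (our notation), the update would read:

```latex
X_i^{t+1}=\begin{cases}
X_i^{t}\cdot\exp\!\left(\dfrac{-i}{\eta\cdot iter_{max}}\right), & R_2<ST\\[6pt]
X_i^{t}+Q\cdot L, & R_2\ge ST
\end{cases}
```

Because L is a row of ones, the second case shifts every dimension by the same normally distributed random step Q.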

Step 5.8: Update the positions of the joiners according to the following formula:

In the formula, the two position terms denote the optimal and worst positions of the population in the d-th dimension, respectively. A is a 1 × D matrix whose elements are randomly assigned 1 or -1, and A⁺ = Aᵀ(AAᵀ)⁻¹. When i > N/2, the i-th joiner has obtained no food and must fly elsewhere to forage; when i ≤ N/2, the joiner forages randomly near the current optimal position;
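The joiner formula is also missing from the extracted text. Assuming the standard SSA scrounger update of Xue and Shen (2020), with X_P the best position found by the discoverers and X_worst the current worst position (our labels for the two position terms), it would read:

```latex
X_i^{t+1}=\begin{cases}
Q\cdot\exp\!\left(\dfrac{X_{worst}^{t}-X_i^{t}}{i^{2}}\right), & i>N/2\\[6pt]
X_P^{t+1}+\left|X_i^{t}-X_P^{t+1}\right|A^{+}L, & i\le N/2
\end{cases}
```

Since every element of A is ±1, AAᵀ = D, so A⁺ = Aᵀ/D. The product |X_i - X_P| A⁺ is therefore a scalar, and multiplying by L spreads it across all D dimensions.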

Step 5.9: Update the positions of the early-warning sparrows. Their number is denoted sNum and accounts for 10% to 20% of the population. Their positions are updated as follows:

In the formula, β is a random number following the standard normal distribution; K ∈ [-1, 1] is a random number; f_i is the fitness value of the i-th sparrow; f_g and f_w are the current global best and worst fitness values, respectively; and ε is a small constant that keeps the denominator from being zero. When f_i ≠ f_g, the sparrow is at the edge of the population and vulnerable to predators. When f_i = f_g, a sparrow in the middle of the population has become aware of the danger and must move closer to other sparrows to avoid being attacked;
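As with the two previous updates, the scout formula is missing. Under the same assumption (the standard SSA of Xue and Shen, 2020), with X_best the current global best position and X_worst the current worst position (our labels), it would read:

```latex
X_i^{t+1}=\begin{cases}
X_{best}^{t}+\beta\left|X_i^{t}-X_{best}^{t}\right|, & f_i\neq f_g\\[6pt]
X_i^{t}+K\left(\dfrac{\left|X_i^{t}-X_{worst}^{t}\right|}{(f_i-f_w)+\varepsilon}\right), & f_i=f_g
\end{cases}
```

For a minimization problem, f_g is the smallest fitness value, so the condition f_i ≠ f_g used in the text is equivalent to the f_i > f_g of the original SSA paper.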

Step 5.10: Update the fitness values and the current best sparrow position;

Step 5.11: If the iteration number t exceeds itermax, output the optimized parameters, namely the tree depth and number of trees of the random forest; otherwise, set t = t + 1 and continue the iteration;
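Steps 5.2 to 5.11 can be combined into one loop, as in the minimal stdlib sketch below. It assumes the standard SSA update rules of Xue and Shen (2020), Logistic-map initialization, clipping to the search bounds, and reading the embodiment's value 0.8 as the safety threshold ST (with R₂ drawn at random each iteration). All of these are assumptions rather than details confirmed by the patent. The function name `cssa_minimise` is hypothetical. In the patent's setting, `fitness` would build a random forest from the rounded (dp, es) and return its mean square error. The toy sphere function stands in for that objective here.

```python
import math
import random


def cssa_minimise(fitness, lb, ub, n_pop=30, iter_max=100, p_ratio=0.2,
                  s_ratio=0.1, st=0.8, theta=4.0, e0=0.7, seed=0):
    """Chaotic sparrow search sketch (minimisation). Returns
    (best_position, best_fitness, history of best fitness per iteration)."""
    rng = random.Random(seed)
    dim = len(lb)

    def clip(x):
        return [min(max(v, lb[d]), ub[d]) for d, v in enumerate(x)]

    # Step 5.2: Logistic chaotic initialisation E_{t+1} = theta * E_t * (1 - E_t)
    e, pop = e0, []
    for _ in range(n_pop):
        row = []
        for d in range(dim):
            e = theta * e * (1.0 - e)
            row.append(lb[d] + e * (ub[d] - lb[d]))
        pop.append(row)
    fit = [fitness(x) for x in pop]
    k = min(range(n_pop), key=fit.__getitem__)
    g_best, f_best = pop[k][:], fit[k]
    history = [f_best]
    p_num = max(1, int(p_ratio * n_pop))
    s_num = max(1, int(s_ratio * n_pop))

    def update_best():
        nonlocal g_best, f_best
        i = min(range(n_pop), key=fit.__getitem__)
        if fit[i] < f_best:
            g_best, f_best = pop[i][:], fit[i]

    for _ in range(iter_max):
        # Rank sparrows so that index 0 is the fittest (rank i = index + 1)
        order = sorted(range(n_pop), key=fit.__getitem__)
        pop, fit = [pop[j] for j in order], [fit[j] for j in order]
        x_worst, f_worst = pop[-1][:], fit[-1]
        r2 = rng.random()                                # early-warning value R2

        for i in range(p_num):                           # step 5.7: discoverers
            if r2 < st:
                eta = 1.0 - rng.random()                 # eta in (0, 1]
                pop[i] = [v * math.exp(-(i + 1) / (eta * iter_max)) for v in pop[i]]
            else:
                q = rng.gauss(0.0, 1.0)                  # Q * L: same Q in every dimension
                pop[i] = [v + q for v in pop[i]]
            pop[i] = clip(pop[i])
            fit[i] = fitness(pop[i])
        x_p = pop[min(range(p_num), key=fit.__getitem__)][:]

        for i in range(p_num, n_pop):                    # step 5.8: joiners
            if i + 1 > n_pop / 2:                        # hungry joiner flies elsewhere
                q = rng.gauss(0.0, 1.0)
                pop[i] = [q * math.exp((x_worst[d] - v) / (i + 1) ** 2)
                          for d, v in enumerate(pop[i])]
            else:                                        # forage near X_P
                a = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
                # |X_i - X_P| A+ L with A+ = A^T (A A^T)^-1 = A^T / dim
                s = sum(abs(v - x_p[d]) * a[d] for d, v in enumerate(pop[i])) / dim
                pop[i] = [x_p[d] + s for d in range(dim)]
            pop[i] = clip(pop[i])
            fit[i] = fitness(pop[i])
        update_best()

        for i in rng.sample(range(n_pop), s_num):        # step 5.9: early-warning sparrows
            if fit[i] > f_best:                          # f_i != f_g: at the edge
                beta = rng.gauss(0.0, 1.0)
                pop[i] = [g_best[d] + beta * abs(v - g_best[d]) for d, v in enumerate(pop[i])]
            else:                                        # f_i == f_g: move away from worst
                kk = rng.uniform(-1.0, 1.0)
                pop[i] = [v + kk * abs(v - x_worst[d]) / ((fit[i] - f_worst) + 1e-50)
                          for d, v in enumerate(pop[i])]
            pop[i] = clip(pop[i])
            fit[i] = fitness(pop[i])
        update_best()                                    # step 5.10
        history.append(f_best)
    return g_best, f_best, history


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)  # toy objective standing in for the RF MSE
    best_x, best_f, hist = cssa_minimise(sphere, [-10.0, -10.0], [10.0, 10.0], p_ratio=0.3)
    print(best_x, best_f)
```

Tracking the global best separately from the population keeps the recorded best fitness non-increasing, which is the shape expected of the fitness iteration curves in FIG. 4.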

Step 6: Build the random forest with the optimized parameters to obtain the final mine water inrush source prediction model. The PCA-reduced principal component data of the training set are used as the input of the random forest model, and the water inrush source types of the training samples as the output. The model's prediction performance is verified by comparing its predictions with the actual sample data in terms of prediction accuracy, mean square error and mean absolute percentage error.

Beneficial effects of the invention:

The invention provides a mine water inrush source identification method based on a PCA-CSSA-RF model. It addresses two problems: correlation and nonlinearity among the hydrochemical data of different aquifers make water inrush sources harder to identify, and the identification efficiency of some algorithms is strongly affected by parameter selection.

Drawings

FIG. 1 is a flow chart of the mine water inrush source identification method based on a PCA-CSSA-RF model according to an embodiment of the present invention;

FIG. 2 is the correlation coefficient matrix of the discrimination indexes in an embodiment of the present invention;

FIG. 3 is a graph of the cumulative variance contribution of each index in accordance with an embodiment of the present invention;

FIG. 4 is a graph of an algorithm fitness iteration curve in an embodiment of the present invention;

FIG. 5 is a diagram illustrating predicted results of various models according to an embodiment of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings.

A method for identifying a mine water inrush source based on a PCA-CSSA-RF model is shown in figure 1 and comprises the following steps:

Step 1: Collect data on the water sources of the Xinzhuangzi coal mine and select Z groups of water inrush sample data. Let X = [X₁, X₂, …, X_p] denote the mine water inrush source discrimination indexes, where X_i (i = 1, 2, …, p) is the i-th index and p is the number of discrimination indexes.

The discrimination indexes mainly comprise the concentrations of the commonly used major groundwater ions Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻ and HCO₃⁻, together with total hardness (all in mg/L). The water inrush source types comprise a coal-measure sandstone aquifer, a Carboniferous Taiyuan Formation limestone karst aquifer and an Ordovician limestone karst aquifer.

In this embodiment, the Xinzhuangzi coal mine in Huainan is taken as the study case. According to the differences in hydrochemical characteristics among the mine's aquifers, its water inrush sources are divided into three types: a coal-measure sandstone aquifer I (coal-measure water for short), a Carboniferous Taiyuan Formation limestone karst aquifer II (Taiyuan limestone water for short) and an Ordovician limestone karst aquifer III (Ordovician limestone water for short). The concentrations of the frequently used major groundwater ions Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻ and HCO₃⁻, together with total hardness (all in mg/L), are selected as the discrimination indexes of the identification model. Forty-five groups of water inrush data measured in the Xinzhuangzi mine are compiled as research samples, with coal-measure water I, Taiyuan limestone water II and Ordovician limestone water III as the 3 prediction categories.

Table 1 shows the original sample data of the water inrush sources

Step 2: Standardize the selected Z groups of water inrush source sample data to obtain the standardized mine water inrush source sample data Z′.

In this embodiment, the 45 groups of water inrush source samples in Table 1 are standardized with the zscore function of Matlab R2016b. This simplifies the calculation and reduces the influence of differing magnitudes and units on the discrimination performance of the subsequent model.
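For readers without Matlab, the column-wise z-score standardization can be sketched with the Python standard library. The sketch subtracts each column's mean and divides by its sample standard deviation (n - 1 denominator), which matches the default behaviour of Matlab's zscore. The function name `zscore_columns` is hypothetical. A constant column would have zero standard deviation and raise ZeroDivisionError.

```python
import statistics


def zscore_columns(rows):
    """Standardise each column to zero mean and unit sample standard deviation
    (n - 1 denominator, like Matlab's zscore default)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]   # sample standard deviation
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


print(zscore_columns([[1, 10], [2, 20], [3, 30]]))   # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```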

Step 3: Perform correlation analysis on the discrimination indexes X = [X₁, X₂, …, X_p] to obtain the Pearson correlation coefficient matrix among the indexes.

In this embodiment, after the raw data have been standardized (made dimensionless), correlation analysis of the discrimination indexes is performed in SPSS 25.0. The resulting Pearson correlation coefficient matrix among the indexes is shown in FIG. 2.
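The Pearson matrix of step 3 can be reproduced without SPSS using the stdlib sketch below (`pearson_matrix` is a hypothetical name). The Pearson coefficient is invariant to shifting and positively rescaling each variable, so z-scored data and raw data give the same matrix.

```python
import math


def pearson_matrix(rows):
    """Pearson correlation coefficient matrix between the columns of `rows`."""
    cols = list(zip(*rows))
    p = len(cols)

    def r(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    return [[r(cols[i], cols[j]) for j in range(p)] for i in range(p)]


# Column 1 is exactly twice column 0 (r = 1); column 2 gives r = -0.5 with column 0
print(pearson_matrix([[1, 2, 3], [2, 4, 1], [3, 6, 2]])[0])
```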

The figure shows clear correlations among the indexes. In particular, the correlation coefficient between Ca²⁺ and SO₄²⁻ is 0.913, that between …²⁺ and total hardness is 0.949, that between Na⁺+K⁺ and HCO₃⁻ is 0.865, and that between SO₄²⁻ and total hardness is 0.833. These values indicate overlapping information among the sample indexes, so using all 7 indexes directly to identify the water inrush source would inevitably reduce identification accuracy.

Step 4: Process the standardized water inrush source sample data Z′ by principal component analysis. Extract the eigenvalues of the sample data to obtain the dimension-reduced principal component data, and use the principal components with a high cumulative variance contribution rate to identify the water inrush source. Of the standardized samples, Z₁ groups are used as the training set and the remaining Z₂ groups as test samples. The PCA-CSSA-RF mine water inrush source identification model is built with the extracted principal components as input variables and the water inrush source type as the output target.

Step 4.1: Calculate the mean vector X̄ and the covariance matrix S of the standardized mine water inrush source sample data;

Step 4.2: Calculate the eigenvalues r_i of the covariance matrix S and the corresponding unit orthogonal eigenvectors e_i;

Step 4.4: Obtain the principal component matrix of the discrimination index data from the eigenvector matrix A as Y = [Y₁, Y₂, …, Y_p] = AᵀX, where Y_i is the i-th principal component.

Step 4.5: Calculate the cumulative variance contribution rate G(m) of the principal components corresponding to the eigenvalues, and select the first m principal components (m < p) whose cumulative variance contribution rate exceeds the threshold.

In this embodiment, the threshold of the cumulative variance contribution rate is 95%.

Extracting the eigenvalues of the sample data by principal component analysis shows that each of the first four principal components has a variance contribution rate above 10%: 0.445502041, 0.245515105, 0.168117093 and 0.111299057, for a cumulative variance contribution rate of 97.04%. The last three principal components contribute much less, 0.020073461, 0.005162479 and 0.004330762, for a cumulative contribution of only 2.96%. The first four principal components are therefore selected to identify the water inrush source. Their coefficients are listed in Table 2, where Y₁, Y₂, Y₃ and Y₄ denote the principal components and X₁, X₂, …, X₇ denote the discrimination indexes Ca²⁺, Mg²⁺, Na⁺+K⁺, HCO₃⁻, SO₄²⁻ and Cl⁻, respectively.
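As a check of the selection rule in step 4.5, the cumulative contribution rates can be recomputed from the four largest reported variance contribution rates (rounded to six decimals here). G(3) still falls short of the 95% threshold while G(4) exceeds it, so m = 4:

```latex
\begin{aligned}
G(3) &= 0.445502 + 0.245515 + 0.168117 = 0.859134 < 0.95\\
G(4) &= G(3) + 0.111299 = 0.970433 > 0.95 \;\Rightarrow\; m = 4
\end{aligned}
```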

Table 2 shows the coefficients of the respective principal components

Mathematical models as shown in formulas (8) to (11) can be constructed based on the principal component coefficients in table 2.

Table 3 lists part of the sample data after dimension reduction, and FIG. 3 shows the cumulative variance contribution rate of each index.

Table 3 shows sample data (partial data) after the PCA dimensionality reduction processing.

And 5: the method comprises the steps of optimizing two parameters of tree depth and tree number of a random forest algorithm by adopting a chaotic sparrow search algorithm, taking a mean square error as a fitness function of the chaotic sparrow search algorithm, and determining values of the two parameters of the tree depth and the tree number in the random forest algorithm according to a global optimal solution (a minimum fitness function value) which is finally obtained by continuously updating the chaotic sparrow search algorithm in an iteration process to obtain a mine water inrush source identification model.

Step 5.1: Take the tree depth and number of trees of the random forest as the problem to be optimized by the sparrow search algorithm. Set the problem dimension D, the sparrow population size N, the maximum number of iterations itermax, the number of discoverers pNum, the number of scouting (early-warning) sparrows sNum, the early-warning value R₂, and the Logistic mapping parameters θ and E_t.

In this embodiment, 33 groups of the water inrush source sample data in Table 3 are used as the training set and the remaining 12 groups as test samples. The 33 training groups comprise 12 groups of coal-measure water, 12 groups of Taiyuan limestone water and 9 groups of Ordovician limestone water. The PCA-CSSA-RF mine water inrush source identification model is built with the 4 principal components extracted by PCA as input variables and the water inrush source type as the output target.

The initial SSA parameters are set as follows: sparrow population size N = 30; maximum number of iterations itermax = 100; proportion of discoverers pNum in the population, 30%; proportion of scouting early-warning sparrows sNum, 10%; solution dimension D = 2; early-warning value R₂ = 0.8; the mean square error (MSE) is used as the fitness function; and the Logistic mapping parameters are θ = 4 and E₀ = 0.7. The RF model parameters are set as follows: each node in the RF selects 2 features to form the split feature set, dp and es both range over [0, 50], and the number of binary trees is set to 200.

Step 5.2: Initialize the sparrow population, which encodes the random forest's tree depth and number of trees, with the Logistic chaotic map;

The Logistic chaotic map is expressed as:

$$E_{t+1}=\theta E_t\left(1-E_t\right)$$

where θ ∈ (0, 4] is the Logistic mapping parameter and E_t ∈ [0, 1] is the value of the Logistic map at the t-th iteration;

Step 5.7: Update the early-warning value and update the positions of the discoverers accordingly. Discoverers are the sparrows with better fitness in the population and guide its search direction. Their number is denoted pNum, and they account for about 10% to 20% of the population. Their positions are updated as follows:

where t = 1, 2, …, itermax is the iteration number; η ∈ (0, 1] and Q are random numbers; L is a 1 × D matrix of ones; R₂ ∈ [0, 1] is the early-warning value; and ST ∈ [0.5, 1] is the safety value. When R₂ < ST, the discoverers perform a wide-range search; otherwise, all sparrows quickly move to a safe area to forage;

In the joiner position update formula, the two position terms denote the optimal and worst positions of the population in the d-th dimension, respectively. A is a 1 × D matrix whose elements are randomly assigned 1 or -1, and A⁺ = Aᵀ(AAᵀ)⁻¹. When i > N/2, the i-th joiner has obtained no food and must fly elsewhere to forage; when i ≤ N/2, the joiner forages randomly near the current optimal position;

In the early-warning sparrow position update formula, β is a random number following the standard normal distribution; K ∈ [-1, 1] is a random number; f_i is the fitness value of the i-th sparrow; f_g and f_w are the current global best and worst fitness values, respectively; and ε is a small constant that keeps the denominator from being zero. When f_i ≠ f_g, the sparrow is at the edge of the population and vulnerable to predators. When f_i = f_g, a sparrow in the middle of the population has become aware of the danger and must move closer to other sparrows to avoid being attacked;

Step 5.11: If the iteration number t exceeds itermax, output the optimized parameters, namely the tree depth and number of trees of the random forest. Otherwise, set t = t + 1 and continue the iteration;

Step 6: Build the random forest with the optimized parameters to obtain the final mine water inrush source prediction model. The PCA-reduced principal component data of the training set are used as the input of the random forest model, and the water inrush source types of the training samples as the output. The model's prediction performance is verified by comparing its predictions with the actual sample data in terms of prediction accuracy, mean square error and mean absolute percentage error.

In this embodiment, the PCA-CSSA-RF water inrush source identification model is implemented in Matlab R2016b. After training on the 33 groups of data, the model identifies the 12 test groups. The results are shown in Table 4, and the correct identification rate is 100%.

Table 4 shows CSSA-RF model identification results

To further verify the accuracy and reliability of the proposed PCA-CSSA-RF mine water inrush source identification model, PCA-SSA-RF, CSSA-RF and RF models are built on the same sample set, and their discrimination results are compared.

All models are implemented in Matlab. The running time of the PCA-CSSA-RF model is 0.9702 s, while those of the PCA-SSA-RF, CSSA-RF and RF models are 1.4273 s, 1.2844 s and 39.27 s, respectively. The PCA-CSSA-RF model therefore identifies water inrush sources faster than the other 3 models.

FIG. 4 shows the fitness iteration curves of the models. For the same number of iterations, all 3 models gradually approach the minimum mean square error (minimum fitness value). The fitness curve of the PCA-CSSA-RF model escapes local optima several times and levels off at about the 20th iteration. It reaches the global minimum mean square error earlier than the PCA-SSA-RF model, which shows that the improved SSA lets the model reach its optimum faster. Of the three models, PCA-CSSA-RF attains the smallest mean square error and hence the highest discrimination accuracy.

The test set data are classified by the models built on the training set, and FIG. 5 shows the results. The predictions of the proposed PCA-CSSA-RF model agree with the actual situation, whereas the misjudgment rates of CSSA-RF, PCA-SSA-RF and RF are 1/12, 2/12 and 4/12, respectively. The proposed PCA-CSSA-RF model therefore has the lowest misjudgment rate and the traditional RF model the highest.

To compare the models' identification results more objectively, prediction accuracy, mean square error (MSE) and mean absolute percentage error (MAPE) are used as comparison criteria. The results of the 4 models are listed in Table 5.
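The three comparison criteria can be computed as in the sketch below. MSE and MAPE are only defined for class labels once the classes are coded as numbers. The patent does not give its encoding, so the sketch assumes the hypothetical codes I = 1, II = 2 and III = 3 (non-zero, so MAPE is defined). The function name `identification_metrics` is also hypothetical.

```python
def identification_metrics(y_true, y_pred):
    """Prediction accuracy, mean square error and mean absolute percentage error (%)
    for numerically coded water-source classes (assumed coding: I=1, II=2, III=3)."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mape = 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n
    return accuracy, mse, mape


# One of four samples of class III is misjudged as class II
print(identification_metrics([1, 2, 3, 3], [1, 2, 3, 2]))   # (0.75, 0.25, 8.33...)
```

Under this coding, a misjudgment between classes I and III costs more MSE than one between neighbouring classes, which is worth keeping in mind when reading MSE values for a classification task.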

Table 5 shows comparison of recognition results of the respective models

As Table 5 shows, the prediction accuracy of the PCA-CSSA-RF model is 100%, and its mean square error and mean absolute percentage error are markedly lower than those of the other models. This further demonstrates the model's higher identification performance and its effectiveness for identifying mine water inrush sources.

In conclusion, comparing the identification results and indexes of the PCA-CSSA-RF model with those of the CSSA-RF and RF models shows that models using PCA dimension reduction outperform those without it in prediction accuracy, mean square error and mean absolute percentage error. Dimension reduction therefore effectively removes interference from redundant information and improves the classification efficiency of the model. The prediction accuracy and other indexes of the CSSA-RF model are second only to those of the PCA-CSSA-RF model and better than those of the other models. This indirectly shows that CSSA has good optimization capability: it reduces premature convergence to local optima and improves the global optimization performance of RF.

Claims

1. A method for identifying a mine water inrush source based on a PCA-CSSA-RF model is characterized by comprising the following steps:

Step 1: Collect data on the water sources of the Xinzhuangzi coal mine and select Z groups of water inrush sample data. Let X = [X₁, X₂, …, X_p] denote the mine water inrush source discrimination indexes, where X_i (i = 1, 2, …, p) is the i-th index of sample X and p is the number of discrimination indexes;

Step 3: perform correlation analysis on the mine water inrush source discrimination indexes $X = [X_1, X_2, \ldots, X_p]$ to obtain the Pearson correlation coefficient matrix of the indexes.
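A Pearson correlation matrix over the discrimination indexes can be computed as follows, with each water sample stored as one row of index values (for example the seven indexes listed in claim 2). This is an illustrative sketch; the three-column data in the usage block are made-up numbers, not measurements from the source.

```python
import math
from collections.abc import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(samples: Sequence[Sequence[float]]) -> list[list[float]]:
    """p x p Pearson matrix; each row of `samples` is one water sample, each column one index."""
    columns = list(zip(*samples))  # one tuple per discrimination index
    p = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    # Made-up values for three indexes in mg/L, not data from the source.
    water = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 6.0, 2.0]]
    for row in correlation_matrix(water):
        print([round(r, 3) for r in row])
```

Strongly correlated index pairs in this matrix (|r| close to 1) are what motivates the PCA step that follows; a constant index column would make the denominator zero and should be removed first.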

Step 4: apply principal component analysis to the Z groups of standardized water inrush sample data $Z'$, extract the eigenvalues of the sample data to obtain the dimension-reduced principal component data, and use the principal components with a high cumulative variance contribution rate to identify the water inrush source. From the Z groups of standardized sample data, $Z_1$ groups are selected as the training set and the remaining $Z_2$ groups are used as test samples. A PCA-CSSA-RF mine water inrush source identification model is then constructed, with the principal components extracted by principal component analysis as input variables and the water inrush source type as the output target.

Step 5.1: take the tree depth and the number of trees of the random forest algorithm as the variables to be optimized by the sparrow search algorithm; set the dimension of the optimization problem D, the sparrow population size N, the maximum number of iterations itermax, the number of discoverers pNum, the number of scouting (early-warning) sparrows sNum, the warning value $R_2$, and the Logistic map parameters θ and $E_t$.

Step 5.3: calculate the fitness values, using the mean square error as the fitness function of the chaotic sparrow search algorithm;

Step 5.4: update the warning value, and update the positions of the discoverers according to it;

Step 5.5: update the positions of the joiners;

Step 5.6: update the positions of the scouting (early-warning) sparrows;

Step 5.7: update the fitness values and the current optimal sparrow position;

Step 5.8: if the iteration count t is greater than itermax, output the optimized parameters, namely the tree depth and the number of trees of the random forest algorithm; otherwise, set t = t + 1 and continue iterating;

Step 5.9: construct the random forest model with the optimized tree depth and number of trees obtained in step 5.8. Use the principal component data of the training set, reduced by principal component analysis, as the input of the random forest model and the corresponding water inrush source types of the training samples as the output, and verify the predictive performance of the model by comparing its predictions with the actual source types in terms of prediction accuracy, mean square error and mean absolute percentage error.
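Step 5.9 needs integer values for the tree depth and the number of trees, whereas sparrow positions are continuous. The text does not say how the conversion is done; one simple option, assumed here, is to round each coordinate and clip it to its search range. The function name and the bounds (depth 1 to 20, 10 to 500 trees) are hypothetical.

```python
def decode_position(position, lower=(1, 10), upper=(20, 500)):
    """Turn a continuous sparrow position [dp, es] into integer RF parameters.

    The default bounds are hypothetical placeholders, not values from the source.
    Returns (tree_depth, tree_count), each rounded and clipped to its bounds.
    """
    depth = min(max(round(position[0]), lower[0]), upper[0])
    trees = min(max(round(position[1]), lower[1]), upper[1])
    return depth, trees


if __name__ == "__main__":
    print(decode_position([5.6, 123.2]))  # -> (6, 123)
```

If scikit-learn were used, the pair could be passed as `RandomForestClassifier(max_depth=depth, n_estimators=trees)`; the platform actually used by the source (claim 3 mentions MATLAB R2016b) may differ.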

2. The method of claim 1, wherein the mine water inrush source discrimination indexes mainly comprise the concentrations of the commonly used major groundwater ions Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻ and HCO₃⁻ together with total hardness (all in mg/L), and the water inrush source types comprise a coal-measure sandstone aquifer, a Carboniferous Taiyuan Formation limestone karst aquifer and an Ordovician limestone karst aquifer.

3. The method of claim 1, wherein the mine water inrush sample data Z are standardized using the zscore function of the MATLAB R2016b software platform.
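MATLAB's `zscore` by default centers each column on its mean and divides by the sample standard deviation (normalized by N − 1). A minimal standard-library Python equivalent, useful for reproducing the standardization outside MATLAB, might look like this; the guard for constant columns is a choice made in this sketch.

```python
import statistics


def zscore(samples):
    """Column-wise z-scores: (x - mean) / sample standard deviation (N - 1 divisor)."""
    columns = list(zip(*samples))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    # Guard against constant columns so they map to 0 instead of dividing by zero.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in samples]


if __name__ == "__main__":
    print(zscore([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))  # -> [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Using `statistics.pstdev` instead would give the population (N divisor) version, which does not match MATLAB's default.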

4. The PCA-CSSA-RF model-based mine water inrush source identification method of claim 1, wherein step 4 comprises the following steps:

Step 4.4: obtain from the eigenvector matrix A the principal component matrix of the discrimination index data, $Y = [Y_1, Y_2, \ldots, Y_p] = AX$, where $Y_i$ is the i-th principal component.

Step 4.5: calculate the cumulative variance contribution rate G(m) of the principal components corresponding to the eigenvalues, and select the first m principal components (m < p) whose cumulative variance contribution rate exceeds the preset threshold.
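Steps 4.4 and 4.5 can be sketched in plain Python. Since steps 4.1 to 4.3 are not reproduced here, the sketch assumes they amount to forming the correlation matrix of the standardized indexes and computing its eigenpairs. The eigen-decomposition uses the classical Jacobi rotation method so that no third-party library is needed (in practice `numpy.linalg.eigh` does the same job). The rows of the returned matrix play the role of A in Y = AX, and the 0.85 threshold is a hypothetical value.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix, by Jacobi rotations."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- P^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def select_components(eigvals, eigvecs, threshold=0.85):
    """Sort eigenpairs by eigenvalue and keep the first m whose G(m) exceeds threshold."""
    order = sorted(range(len(eigvals)), key=lambda k: eigvals[k], reverse=True)
    lam = [eigvals[k] for k in order]
    total = sum(lam)
    cumulative = 0.0
    for m, value in enumerate(lam, start=1):
        cumulative += value
        if cumulative / total > threshold:
            break
    # Row i of A is the i-th principal direction, so Y = A x for each sample x.
    a_matrix = [[eigvecs[j][order[i]] for j in range(len(eigvals))] for i in range(m)]
    return m, cumulative / total, a_matrix


def project(a_matrix, x):
    """Principal component scores Y = A x for one standardized sample x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a_matrix]


if __name__ == "__main__":
    vals, vecs = jacobi_eigen([[1.0, 0.8], [0.8, 1.0]])
    m, g, a_matrix = select_components(vals, vecs)
    print(m, round(g, 3), project(a_matrix, [1.0, -0.5]))
```

The signs of the eigenvectors are arbitrary, so individual principal component scores may flip sign between implementations without affecting the classifier.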

5. The method for identifying a mine water inrush source based on the PCA-CSSA-RF model according to claim 1, wherein the mathematical expression of the Logistic chaotic map is as follows:

$$E_{t+1} = \theta E_t (1 - E_t)$$

where $\theta \in (0, 4]$ is the Logistic map parameter and $E_t \in [0, 1]$ is the value of the Logistic map at the t-th iteration.
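The claims do not reproduce step 5.2, so how the Logistic sequence enters the search is not stated here. A common use in chaotic sparrow search variants is to seed the initial population with chaotic values instead of uniform random numbers; the sketch below assumes that use, θ = 4 (the fully chaotic regime) and a hypothetical seed $E_0 = 0.37$, which avoids starting values such as 0, 0.25, 0.5, 0.75 and 1 that collapse onto fixed points of the map. The function names and bounds are illustrative.

```python
def logistic_sequence(e0: float, theta: float, length: int) -> list[float]:
    """Iterate E_{t+1} = theta * E_t * (1 - E_t) starting from e0."""
    values = []
    e = e0
    for _ in range(length):
        e = theta * e * (1.0 - e)
        values.append(e)
    return values


def chaotic_population(n: int, lower: list[float], upper: list[float],
                       e0: float = 0.37, theta: float = 4.0) -> list[list[float]]:
    """Map a logistic sequence onto the box [lower, upper] to seed n sparrows."""
    d = len(lower)
    seq = logistic_sequence(e0, theta, n * d)
    return [[lower[j] + seq[i * d + j] * (upper[j] - lower[j]) for j in range(d)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical search box: tree depth 1..20, number of trees 10..500.
    for sparrow in chaotic_population(3, [1.0, 10.0], [20.0, 500.0]):
        print(sparrow)
```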

6. The method for identifying a mine water inrush source based on the PCA-CSSA-RF model as claimed in claim 1, wherein the discoverers are the individuals with better fitness in the sparrow population and guide the search direction of the population; the number of discoverers is denoted pNum and accounts for about 10% to 20% of the population, and their position update is described as follows:

where t = 1, 2, …, itermax is the iteration number; η and Q are random numbers, with η ∈ (0, 1]; L denotes a 1 × D matrix whose elements are all 1; $R_2 \in [0, 1]$ is the warning value and $ST \in [0.5, 1]$ is the safety threshold. When $R_2 < ST$, the discoverers perform a wide-ranging search; otherwise, all sparrows quickly move to a safe area to forage.
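The discoverer formula itself did not survive extraction. The symbols defined above (η, Q, L, $R_2$, ST) match the producer rule of the original sparrow search algorithm, so the sketch below assumes that rule: with sparrows sorted best-first and i their 1-based rank, $X_{i,d}^{t+1} = X_{i,d}^{t}\exp\big(-i/(\eta \cdot itermax)\big)$ when $R_2 < ST$, and $X_{i,d}^{t} + Q \cdot L$ otherwise. Drawing Q from a standard normal distribution is also an assumption carried over from the original SSA; the function name and list representation are illustrative.

```python
import math
import random


def update_producers(positions, p_num, itermax, r2, st, rng):
    """Update, in place, the p_num best sparrows; positions must be sorted best-first.

    Assumes the standard SSA producer rule with 1-based rank i.
    """
    for i in range(1, p_num + 1):
        row = positions[i - 1]
        if r2 < st:
            # Safe: wide search in which every coordinate shrinks by exp(-i / (eta * itermax)).
            eta = 1.0 - rng.random()  # eta drawn from (0, 1]
            factor = math.exp(-i / (eta * itermax))
            positions[i - 1] = [x * factor for x in row]
        else:
            # Alarm raised: Q * L moves every dimension by the same normal step Q.
            q = rng.gauss(0.0, 1.0)
            positions[i - 1] = [x + q for x in row]


if __name__ == "__main__":
    rng = random.Random(0)
    swarm = [[2.0, 40.0], [5.0, 120.0], [8.0, 300.0]]  # hypothetical (dp, es) positions
    update_producers(swarm, p_num=1, itermax=100, r2=0.3, st=0.8, rng=rng)
    print(swarm)
```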

7. The PCA-CSSA-RF model-based mine water inrush source identification method of claim 1, wherein the position update formula of the joiners is as follows:

where the two position terms denote, respectively, the optimal and the worst positions of the population in the d-th dimension; A is a 1 × D matrix whose elements are each randomly assigned 1 or −1, and $A^{+} = A^{T}(AA^{T})^{-1}$. When i > N/2, the i-th joiner has not obtained food and must fly elsewhere to forage; when i ≤ N/2, the joiner forages randomly near the current optimal position.
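The joiner formula is likewise missing. Assuming the standard SSA joiner rule, $X_{i,d}^{t+1} = Q\exp\big((X_{worst,d}^{t} - X_{i,d}^{t})/i^{2}\big)$ for $i > N/2$ and $X_{P}^{t+1} + |X_{i}^{t} - X_{P}^{t+1}|\,A^{+}L$ otherwise, where $X_P$ is the best discoverer position and $X_{worst}$ the global worst position, the update can be sketched as follows. Because A is a 1 × D row of ±1 entries, $AA^{T} = D$ and $A^{+} = A^{T}/D$, so the product $|X_i - X_P|\,A^{+}L$ adds one and the same scalar step to every dimension. Names are illustrative.

```python
import math
import random


def update_joiners(positions, best_producer, worst, p_num, rng):
    """Update, in place, the sparrows ranked p_num+1..N; positions must be sorted best-first.

    best_producer is X_P (best discoverer position), worst is the global worst position.
    Assumes the standard SSA joiner rule with 1-based rank i.
    """
    n = len(positions)
    d = len(best_producer)
    for i in range(p_num + 1, n + 1):
        row = positions[i - 1]
        if i > n / 2:
            # Hungry joiner: flies elsewhere, scaled by a normal random number Q.
            q = rng.gauss(0.0, 1.0)
            positions[i - 1] = [q * math.exp((w - x) / i ** 2) for w, x in zip(worst, row)]
        else:
            # A is a random 1 x D row of +-1, so A A^T = D and A+ = A^T / D.
            a = [rng.choice((-1.0, 1.0)) for _ in range(d)]
            step = sum(abs(x - b) * aj for x, b, aj in zip(row, best_producer, a)) / d
            # Multiplying by L (a row of ones) adds the same step to every dimension.
            positions[i - 1] = [b + step for b in best_producer]
```

In a full search loop, the updated positions would normally be clipped back into the parameter bounds, since the exponential term can produce large values.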

8. The method for identifying a mine water inrush source based on the PCA-CSSA-RF model as claimed in claim 1, wherein the number of scouting (early-warning) sparrows is denoted sNum and accounts for about 10% to 20% of the population, and their position update formula is as follows:

where β is a random number following the standard normal distribution, K ∈ [−1, 1] is a random number, $f_i$ is the fitness value of the i-th sparrow, $f_g$ and $f_w$ are the current global best and worst fitness values, respectively, and ε is a very small constant that prevents the denominator from being zero. When $f_i \neq f_g$, the sparrow is at the edge of the population and vulnerable to attack; when $f_i = f_g$, the sparrow in the middle of the population is aware of the danger and must move closer to the other sparrows to avoid being caught by predators.
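The scout formula is also missing, although all of its symbols are defined above. Assuming the standard SSA scout rule, a sparrow with $f_i \neq f_g$ jumps to $X_{best} + \beta|X_i - X_{best}|$, while a sparrow with $f_i = f_g$ moves to $X_i + K\,|X_i - X_{worst}|/((f_i - f_w) + \varepsilon)$. The sketch treats fitness as a quantity to minimize (the mean square error of step 5.3); the value ε = 1e-50 and the function signature are assumptions.

```python
import random


def update_scouts(positions, fitness, scout_idx, best, worst, f_g, f_w, rng, eps=1e-50):
    """Update, in place, the scouting sparrows whose indices are in scout_idx.

    positions: list of position vectors; fitness: their fitness values (lower is better);
    best/worst: global best and worst positions; f_g/f_w: their fitness values.
    Assumes the standard SSA scout rule; eps is an assumed small constant.
    """
    for i in scout_idx:
        row, f_i = positions[i], fitness[i]
        if f_i != f_g:
            # Sparrow at the edge of the group: jump to a point around the global best.
            beta = rng.gauss(0.0, 1.0)
            positions[i] = [b + beta * abs(x - b) for x, b in zip(row, best)]
        else:
            # Sparrow already at the best fitness: random step scaled by its distance to the worst.
            k = rng.uniform(-1.0, 1.0)
            positions[i] = [x + k * abs(x - w) / ((f_i - f_w) + eps) for x, w in zip(row, worst)]


if __name__ == "__main__":
    rng = random.Random(42)
    swarm = [[1.0, 1.0], [3.0, 5.0]]
    update_scouts(swarm, [0.1, 0.5], [1], [1.0, 1.0], [3.0, 5.0], 0.1, 0.5, rng)
    print(swarm)
```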

9. The method for identifying a mine water inrush source based on the PCA-CSSA-RF model of claim 1, wherein the cumulative variance contribution rate G(m) of the corresponding principal components is calculated as follows:

where m ∈ {1, 2, …, p} is the number of selected principal components and k = 1, 2, …, p.
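The formula for G(m) did not survive extraction. Assuming it is the usual PCA definition, with the eigenvalues sorted in descending order $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ (which fits the summation index k = 1, 2, …, p defined above), it would read:

$$G(m) = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k}, \qquad m \in \{1, 2, \ldots, p\}$$

For instance, with hypothetical eigenvalues λ = (3.0, 0.8, 0.2), G(1) = 3.0/4.0 = 0.75 and G(2) = 3.8/4.0 = 0.95.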