CN113094247B

CN113094247B - Real-time prediction method for running state of coal mining machine based on Storm

Info

Publication number: CN113094247B
Application number: CN202110438420.8A
Authority: CN
Inventors: 黄玉鑫; 闫振国; 范京道; 刘睿卿; 王延平
Original assignee: Xian University of Science and Technology
Current assignee: Xian University of Science and Technology
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2023-08-15
Anticipated expiration: 2041-04-22
Also published as: CN113094247A

Abstract

The application relates to the technical field of coal mine safety and discloses a Storm-based method for real-time prediction of the running state of a coal mining machine. Based on Storm, the data predicted by the GRU prediction model are compared with the threshold of each data item to give early warning of the data state. After the GRU is trained on the training set, its performance on the test set is evaluated in terms of RMSE, MAE and R², which shows that the GRU prediction model is suitable for predicting coal mining machine state data. The application goes beyond traditional methods of monitoring the running state of a coal mining machine, integrates prediction and early warning of the real-time running state, and has great practical value for the parallel processing of massive coal mine data.

Description

Real-time prediction method for running state of coal mining machine based on Storm

Technical Field

The application relates to the technical field of coal mine safety, in particular to a real-time prediction method for the running state of a coal mining machine based on Storm.

Background

The coal mining machine is one of the "three machines" of fully mechanized coal mining, and its working environment is complex. With the development of intelligent coal mines, the coal mining machine is fitted with many sensors sampled at high frequency, and the data collected each day grows on the PB scale. Collecting and analyzing the running state data in real time, and predicting how the data will change at the next moment and whether the state of the machine is abnormal, makes effective use of the data and helps to ensure the safety of the machine and of personnel. Real-time monitoring and prediction of coal mining machine state data are therefore of great significance to the intelligent operation of the coal mining machine.

Previous research on the running state of coal mining machines has mostly applied simple mathematical statistics to the monitoring data, and little work has addressed real-time prediction of the running state. This leaves the following shortcomings: (1) In practice, traditional preprocessing methods, which must judge whether abnormal data are useful data from which equipment state information can be extracted or useless data that can be cleaned, cannot adapt to the dynamic characteristics of time series data. A preprocessing and cleaning model adapted to time series characteristics is needed that can identify useless abnormal values of the coal mining machine and repair them dynamically. (2) When algorithms such as ARIMA, SVR and hidden Markov models are used to predict time series data, the accuracy can meet requirements, but they cannot cope with big data and are difficult to apply in practice. Deep learning methods perform well on big data but are rarely applied to coal mine data, which are complex and changeable and contain a large amount of unreal data, so reasonable preprocessing is needed before training and prediction with a deep learning model. (3) The data volume of a coal mining machine is huge, and intelligent coal mining requires parallel, real-time processing of the data from every sensor. Existing coal mine data research is based on offline processing, and the lag in data processing causes the value of the data to drop sharply.

Storm is an open-source distributed real-time computing framework that can easily process unbounded data streams. It is widely used in power grids and big data applications but rarely in the real-time processing of coal mine data. In the processing of coal mine equipment data, a MapReduce-based big data cleaning model for the running state of fully mechanized mining equipment has been established to address the large data volume, noise and missing values of such data, but Hadoop's MapReduce does not meet real-time requirements. Cao Xiangang et al. built a Storm-based real-time data cleaning platform to address noise points and missing values in coal mining machine running state data, but prediction and early warning of the running state data were not realized through Storm.

Disclosure of Invention

Aiming at the defects existing in the prior art, the application aims to provide a real-time prediction method for the running state of a coal mining machine based on Storm.

In order to achieve the above purpose, the application adopts the following technical scheme:

a real-time prediction method for the running state of a coal mining machine based on Storm comprises the following steps:

step one, designing a data storage structure

The data storage structure is designed on the basis of the Hadoop Database (HBase) distributed storage database and comprises an index table and a data table;

the index table comprises the monitoring point location (Location), the GRU prediction model (Prediction Model) and the upper and lower early warning threshold bounds (Threshold Upper and Lower Bound);

the data table simulates a time series by row growth, each column representing the sensor time series data collected at one monitoring point; in actual production, the real-time operation data of each monitoring point are stored into the Hadoop Database at fixed time intervals and, as allocated by the master control node Nimbus, are transmitted from the Hadoop Database into a Spout for processing;
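The two tables described in step one can be pictured with a minimal in-memory sketch. This is not the HBase schema of the application, which the text does not give; the class names `IndexEntry` and `DataTable` and all field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IndexEntry:
    """One row of the index table (field names are illustrative assumptions)."""
    location: str           # monitoring point position
    prediction_model: str   # identifier of the trained GRU model for this point
    threshold_lower: float  # early warning lower bound
    threshold_upper: float  # early warning upper bound


@dataclass
class DataTable:
    """Time series stored by row growth: one row per sampling instant,
    one column per monitoring point."""
    columns: List[str]
    rows: List[Tuple[float, List[float]]] = field(default_factory=list)

    def append(self, timestamp: float, values: List[float]) -> None:
        """Add one sampling instant; the table grows by one row."""
        if len(values) != len(self.columns):
            raise ValueError("one value per monitoring point is required")
        self.rows.append((timestamp, list(values)))

    def column(self, name: str) -> List[float]:
        """Return the time series of a single monitoring point."""
        idx = self.columns.index(name)
        return [values[idx] for _, values in self.rows]
```

Reading one column of such a table yields the series that a Spout would consume for the corresponding monitoring point.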

step two, spout design

The stream data collected in production are simulated through the Hadoop Database distributed storage database and transmitted to different message-queue Spouts; each Spout extracts the information of its corresponding monitoring point from the data stream by calling the nextTuple() method, packages the information and transmits it to a Bolt. The specific steps are as follows:

21) Reading n data items from the data stream of the corresponding monitoring point;

22) Judging whether the length of the sequence formed by the data has reached N (the historical sample capacity used for prediction); if so, executing step 24), otherwise executing step 23);

23) Reading new data from the data stream, appending it to the end of the sequence, and jumping back to step 22);

24) Packaging the sequence as a tuple and sending it to the preprocessing Bolt;
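Steps 21) to 24) amount to buffering a stream until N values are available and then emitting them as one tuple. The following plain-Python sketch illustrates that loop outside Storm. The function name `spout_tuples` is hypothetical, and the behaviour after step 24) (sliding forward by one value and keeping the latest N) is an assumption, since the text does not say what happens after a tuple is sent.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

_END = object()  # sentinel marking the end of the simulated stream


def spout_tuples(stream: Iterable[float], point: str, n: int,
                 capacity: int) -> Iterator[Tuple[str, List[float]]]:
    """Yield (monitoring point, sequence) tuples holding `capacity` (N) values."""
    source = iter(stream)
    sequence: Deque[float] = deque(maxlen=capacity)

    # Step 21: read the first n values from the stream.
    for _ in range(n):
        value = next(source, _END)
        if value is _END:
            return
        sequence.append(value)

    while True:
        # Step 22: has the sequence reached length N?
        if len(sequence) == capacity:
            # Step 24: package the sequence as a tuple and emit it.
            yield (point, list(sequence))
        # Step 23: read a new value and append it to the end of the sequence.
        # Assumption: once full, the oldest value is dropped (deque maxlen).
        value = next(source, _END)
        if value is _END:
            return
        sequence.append(value)
```

With a stream of six values, n = 2 and N = 4, the generator emits three overlapping sequences, one for each new value that arrives after the buffer first fills.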

step three, bolt design

A plurality of Bolts call the execute() method to respectively implement preprocessing, prediction and early warning of the data. The specific steps are as follows:

31) A Python package of the preprocessing algorithm is embedded in the execute() method of the preprocessing Bolt; when the preprocessing Bolt, by parsing the tuples transmitted from the Spout, has received the N data items of the corresponding monitoring point, it automatically calls the Python package to preprocess the original data and transmits the preprocessed data to the GRU prediction Bolt in the form of a tuple;

32) A Python package of the trained GRU model is embedded in the execute() method of the GRU prediction Bolt; when the prediction Bolt receives the data transmitted by the preprocessing Bolt, it automatically calls the trained GRU model, predicts the data at the next moment and sends the predicted value to the early warning Bolt, then waits for the actual data at that moment and sends the actual data to the early warning Bolt;

33) The early warning Bolt implements early warning of the coal mining machine state data using a time sliding window model:

given a time $t$ and a span $d$, the data stream arriving in the period $[t-d, t]$ is a basic time window, denoted $W$, and the $j$-th basic time window is denoted $W_j$. A sequence of successive basic time windows forms a time sliding window $W_S$; $W_{S_i} = W_{i-n+j}, W_{i-n+j+1}, \ldots, W_i$ is the time sliding window after the $i$-th basic window arrives, where $n$ is the number of basic windows contained in one time sliding window. Each basic window judges the early warning level of its data in parallel; only the tuples resulting from the threshold judgment of each basic window are cached, so the tuples of the original window need not be cached one by one. The early warning result is stored in the database of the storage Bolt, and an early warning is issued when the predicted value of a sensor exceeds the threshold.
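The caching idea in step 33), keeping one judged result per basic window instead of every raw tuple, can be sketched with a bounded queue. The class `SlidingWindow` and its method names are hypothetical, and reducing each basic window to a single "threshold exceeded" flag is an assumption about what the cached tuple contains.

```python
from collections import deque
from typing import Deque, List


class SlidingWindow:
    """Holds the threshold judgments of the latest n basic windows."""

    def __init__(self, n_basic: int, lower: float, upper: float) -> None:
        self.lower = lower
        self.upper = upper
        # Only one judged result per basic window is cached; the oldest
        # result is discarded automatically when the window slides.
        self.results: Deque[bool] = deque(maxlen=n_basic)

    def push_basic_window(self, predicted_values: List[float]) -> bool:
        """Judge one basic window and cache only the outcome."""
        exceeded = any(v < self.lower or v > self.upper
                       for v in predicted_values)
        self.results.append(exceeded)
        return exceeded

    def warning_count(self) -> int:
        """Number of basic windows in the sliding window that raised a warning."""
        return sum(self.results)
```

Because the queue length is fixed at n, memory use is independent of how many raw values arrive within each basic window.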

Further, the method includes GRU model training. After the original data are preprocessed for missing values, abnormal values and noise, the result df is used as input data. The input data are divided into a training set and a test set in a 7:3 ratio. The hidden layer is trained on the training set, the hyperparameters of the model are adjusted using the Adam optimization function and the MSE loss function, and the model is optimized with minimum loss as the criterion; the epoch with the best overall performance on the training set is found, and the model is saved to retain the best result at that epoch. Once training has found GRU parameters that fit the characteristics of the data, the test sets simulate real-time data streams on the Hadoop Database; under the control of the master control node Nimbus, the various test sets are input in parallel into the Workers, and prediction and early warning of the data are completed in the respective topologies.
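The 7:3 division and the preparation of one-step-ahead training samples can be illustrated as below. The text does not say whether the split is chronological or how samples are cut from the series, so both the chronological split and the helper names `split_7_3` and `make_samples` are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def split_7_3(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a series 7:3 in time order (earlier 70% for training)."""
    cut = len(series) * 7 // 10  # integer arithmetic avoids float rounding
    return list(series[:cut]), list(series[cut:])


def make_samples(series: Sequence[float],
                 time_step: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `time_step` values with the value that follows it."""
    return [(list(series[i:i + time_step]), series[i + time_step])
            for i in range(len(series) - time_step)]
```

For a series of 1000 points this gives 700 training points and 300 test points, which matches the 300 test points per data type mentioned for FIG. 5.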

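Further, after the GRU is trained on the training set, its performance on the test set is evaluated in terms of RMSE, MAE and R², calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the true value, $\bar{y}$ is the mean of the true values and $n$ is the number of samples.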
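The three evaluation indexes can be computed directly from the predicted and true values. The sketch below uses the standard definitions of RMSE, MAE and R² and only the Python standard library; the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Goodness of fit: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For true values 1, 2, 3 and predictions 1, 2, 4, the residual sum of squares is 1 and the total sum of squares is 2, so R² is 0.5.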

In step one, after the data in the Hadoop Database are allocated by the master control node Nimbus, the Supervisor node receives the tasks assigned by Nimbus and manages and starts all Workers, and the data are transmitted into the Spout for processing.

In a preferred embodiment of the application, the coal mining machine running state data include the cutting part motor current, cutting part motor temperature, traction part motor current, traction part motor rotating speed, pump working pressure, pump working rotating speed, cooling water pressure and frequency converter current of the coal mining machine.

Further, in step 33), the sliding distance of the sliding window model is 1 s and the sliding window size is 1 min, so the sliding window is divided into 60 basic time windows; the attribute set represents the set of coal mining machine state data; each basic window judges the early warning level of its data in parallel; only the 60 tuples resulting from the threshold judgment of the basic windows are cached, so the tuples of the original window need not be cached one by one; and the early warning result is stored in the database of the storage Bolt.

In a preferred embodiment of the present application, the threshold is set based on past experience and error requirements.

The prediction framework of the method works as follows: the running state data of the coal mining machine are generated continuously at fixed time intervals, and Storm provides a platform for online real-time processing of the state monitoring data. The stream data collected in production are simulated through the Hadoop Database distributed storage database and transmitted to different message-queue Spouts; each Spout transmits the data to the corresponding Bolt as a tuple stream, and preprocessing, prediction, error calculation and storage of the data are implemented by a plurality of Bolts, so that real-time prediction of the running state data of the coal mining machine is finally achieved.

The application goes beyond traditional methods of monitoring the running state of a coal mining machine and integrates prediction and early warning of the real-time running state. For the large volume of long-duration time series data, a GRU (Gated Recurrent Unit) deep learning model is used to predict the time series data of the coal mining machine. After training has ensured the stability of the prediction model, the predicted value of each sensor is monitored in real time, and an early warning is issued when the predicted value of a sensor exceeds the threshold. Tests of early warning accuracy and processing efficiency show that the accuracy of state prediction for every type of data exceeds 85% and that the whole early warning process takes only about 10 s, far less than the 1 min interval between measuring point data. The Storm distributed real-time processing framework therefore has great practical value for the parallel processing of massive coal mine data.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

FIG. 1 is a basic structure of a Storm data stream model;

FIG. 2 is a Storm framework;

FIG. 3 is a schematic diagram of a sliding window;

FIG. 4 is a graph comparing the predicted and actual states of the data in Example 2;

FIG. 5 is a graph comparing the actual values and predicted results for 300 points in each test set of Example 2.

Detailed Description

The present application will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present application.

Example 1

As shown in FIG. 1, Storm is an open-source distributed real-time computing framework that can easily handle unbounded data streams. The core components of Storm are the master node Nimbus and the slave nodes Supervisor. The Nimbus node is mainly responsible for resource allocation and task scheduling; the Supervisor node receives the tasks assigned by Nimbus and manages and starts all Workers. One Supervisor corresponds to 4 Workers, one Worker corresponds to one Topology, and a Topology consists of Stream, Spout and Bolt. A Stream is a data stream; a Spout acts as a collector and connects to a data source; a Bolt is a business logic node that subscribes to a plurality of Spouts and implements operations such as business processing and connection operations.

FIG. 2 shows the Storm framework of the present application. The prediction framework of the method works as follows: the running state data of the coal mining machine are generated continuously at fixed time intervals, and Storm provides a platform for online real-time processing of the state monitoring data. The stream data collected in production are simulated through the Hadoop Database distributed storage database and transmitted to different message-queue Spouts; each Spout transmits the data to the corresponding Bolt as a tuple stream, and preprocessing, prediction, error calculation and storage of the data are implemented by a plurality of Bolts, so that real-time prediction of the running state data of the coal mining machine is finally achieved.
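The flow of a tuple from a Spout through the chain of Bolts can be mimicked in a single process to make the data path concrete. The sketch below is not Storm code and does not use the Storm API. Every function name is hypothetical, the preprocessing shown (filling a missing value with the previous one) is a stand-in for the unspecified preprocessing algorithm, and the predictor simply repeats the last value as a placeholder for the trained GRU.

```python
from typing import Dict, List, Optional, Tuple


def preprocess_bolt(values: List[Optional[float]]) -> List[float]:
    """Stand-in preprocessing: fill a missing value (None) with the previous one."""
    cleaned: List[float] = []
    for value in values:
        if value is None:
            if not cleaned:
                continue  # nothing to fill from yet, so skip the gap
            value = cleaned[-1]
        cleaned.append(value)
    return cleaned


def prediction_bolt(history: List[float]) -> float:
    """Placeholder predictor that repeats the last value; NOT a GRU."""
    return history[-1]


def warning_bolt(predicted: float, lower: float, upper: float) -> bool:
    """Return True when the predicted value leaves the threshold band."""
    return predicted < lower or predicted > upper


def storage_bolt(store: List[Dict], record: Dict) -> None:
    """Persist the early warning result (here: an in-memory list)."""
    store.append(record)


def run_topology(spout_tuple: Tuple[str, List[Optional[float]]],
                 lower: float, upper: float, store: List[Dict]) -> Dict:
    """Pass one Spout tuple through preprocessing, prediction, warning and storage."""
    point, raw = spout_tuple
    cleaned = preprocess_bolt(raw)
    predicted = prediction_bolt(cleaned)
    record = {"point": point, "predicted": predicted,
              "warning": warning_bolt(predicted, lower, upper)}
    storage_bolt(store, record)
    return record
```

In a real topology each of these functions would live in the execute() method of its own Bolt, and Storm would run many such chains in parallel across Workers.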

step one, designing a data storage structure

The data storage structure is designed based on a Hadoop Database distributed storage Database and comprises an index table and a plurality of data tables;

the index table comprises the monitoring point location (Location), the GRU prediction model (Prediction Model) and the upper and lower early warning threshold bounds (Threshold Upper and Lower Bound); the index table is shown in Table 1:

Table 1 Index table

The data table simulates a time series by row growth, each column representing the sensor time series data collected at one monitoring point; in actual production, the real-time operation data of each monitoring point are stored into the Hadoop Database at fixed time intervals and, as allocated by the master control node Nimbus, are transmitted from the Hadoop Database into a Spout for processing; the data table is shown in Table 2:

Table 2 Data table

Step two, spout design

21) Reading n data items from the data stream of the corresponding monitoring point;

step three, bolt design

given a time $t$ and a span $d$, the data stream arriving in the period $[t-d, t]$ is a basic time window, denoted $W$, and the $j$-th basic time window is denoted $W_j$. A sequence of successive basic time windows forms a time sliding window $W_S$; $W_{S_i} = W_{i-n+j}, W_{i-n+j+1}, \ldots, W_i$ is the time sliding window after the $i$-th basic window arrives, where $n$ is the number of basic windows contained in one time sliding window. The time sliding window is shown in FIG. 3.

The sliding distance of the sliding window model is 1 s and the sliding window size is 1 min, so the sliding window is divided into 60 basic time windows. The attribute set represents the set of coal mining machine state data. Each basic window judges the early warning level of its data in parallel; only the 60 tuples resulting from the threshold judgment of the basic windows are cached, so the tuples of the original window need not be cached one by one, and the early warning results are stored in the database of the storage Bolt.

Example 2

In this embodiment, three identically configured PCs are used to build a Storm distributed cluster environment, with one virtual machine installed on each. All three virtual machines run CentOS 6.8. One serves as the Master and hosts the Nimbus node; the other two host Supervisor nodes. After Nimbus receives a task for the Storm cluster, it allocates resources to the Supervisors through ZooKeeper. The master node Nimbus has a dual-core single processor, 4 GB of memory and a 40 GB hard disk; each slave node has a single-core single processor, 2 GB of memory and a 20 GB hard disk.

Taking the MG400930-WD electric traction coal mining machine of a fully mechanized mining face of a certain mine as an example, 1000 monitoring data records of the cutting part motor current, cutting part motor temperature, traction part motor current, traction part motor rotating speed, pump working pressure, pump working rotating speed, cooling water pressure and frequency converter current of the coal mining machine are taken as experimental data.

First, a GRU model is trained. After the original data are preprocessed for missing values, abnormal values and noise, the result df is used as input data. The input data are divided into a training set and a test set in a 7:3 ratio. The hidden layer is trained on the training set, the hyperparameters of the model are adjusted using the Adam optimization function and the MSE loss function, and the model is optimized with minimum loss as the criterion; the epoch with the best overall performance on the training set is found, and the model is saved to retain the best result at that epoch.

Once training has found GRU parameters that fit the characteristics of the data, the test sets simulate real-time data streams on the Hadoop Database, with the basic time window set to 1 min; under the control of the master control node Nimbus, the data of the eight test sets are input in parallel into 8 Workers, and prediction and early warning of the data are completed in the respective topologies.
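For reference, the hidden-layer update that such training adjusts can be written out. The application does not give the GRU equations, so the following is one common formulation of the gated recurrent unit together with the MSE loss; the weight symbols $W$, $U$ and $b$ are notation introduced here, and some libraries swap the roles of $z_t$ and $1 - z_t$ in the last update.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)} \\
\mathcal{L}_{\mathrm{MSE}} &= \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 && \text{(training loss)}
\end{aligned}
```

Here $x_t$ is the input at time step $t$, $h_t$ the hidden state, $\sigma$ the logistic sigmoid and $\odot$ the element-wise product; Adam updates $W$, $U$ and $b$ to reduce the loss.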

Prediction results of the prediction model:

The eight types of monitoring data (cutting part motor current, cutting part motor temperature, traction part motor current, traction part motor rotating speed, pump working pressure, pump working rotating speed, cooling water pressure and frequency converter current) are denoted 1 to 8 respectively. The number of training iterations (epoch), learning rate, number of neurons (hidden size), weight decay, time step, number of samples per training batch N, number of hidden layers (layer) and goodness of fit R² are denoted C₁ to C₈ respectively. The hyperparameter optimization results of the GRU model on the experimental training set are shown in Table 3.

Table 3 GRU hyperparameter optimization results

The trained GRU is imported into the Bolt and the test sets simulate real-time data streams; a comparison of the true values and predicted results for 300 points of each test set is shown in FIG. 5.

The present embodiment uses root mean square error (RMSE), mean absolute error (MAE) and goodness of fit (R²) as evaluation indexes. The calculation formulas are respectively as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the true value, $\bar{y}$ is the mean of the true values and $n$ is the number of samples. The test set results are shown in Table 4 below:

Table 4 Comparison of evaluation indexes

R² represents the goodness of fit; the closer it is to 1, the better. MAE and RMSE represent the mean absolute error and the root mean square error; the closer they are to 0, the better. As the data in Table 4 show, the goodness of fit R² between predicted and actual values exceeds 90%, and MAE and RMSE are small relative to the magnitude of the experimental data, so the model is suitable for predicting the running state data of the coal mining machine.

Accuracy of early warning:

In this example, thresholds were set for each type of experimental data according to past experience and error requirements; the threshold settings are shown in Table 5 below.

Table 5 data threshold settings

According to the threshold settings, the state of each type of data falls into three classes: normal, attention and failure. When the coal mining machine is running and the data have not reached the attention value, the state is normal operation, working performance is stable and no measures are needed; when the data reach the attention value but not the fault threshold, the state is attention monitoring; when the data reach the fault threshold, the state is failure awaiting repair. Once the prediction Bolt has obtained the predicted data, and before the actual data arrive, the data state is predicted and the corresponding early warning is issued. The comparison of the predicted state and the actual state of each type of data in the test set is shown in FIG. 4, where normal, attention and failure are denoted 1, 2 and 3 respectively; the prediction accuracy of each type of data is shown in Table 6:

Table 6 Early warning accuracy

The state early warning accuracy of all data other than the cooling water pressure exceeds 95%, which meets practical requirements. The cooling water pressure values are small and the interval between the attention value and the fault threshold is narrow, so the prediction error is larger; nevertheless, the accuracy still exceeds 85%, which has a certain practical value.
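The three-state classification and the comparison of predicted against actual states can be sketched as follows. The text does not define how the accuracy is computed, so taking it as the share of points whose predicted state equals the actual state is an assumption, as are the function names and the use of upper-side thresholds only.

```python
from typing import Sequence

NORMAL, ATTENTION, FAILURE = 1, 2, 3  # state codes used in FIG. 4


def classify(value: float, attention: float, fault: float) -> int:
    """Map a value to a state; assumes attention <= fault, both upper-side."""
    if value >= fault:
        return FAILURE
    if value >= attention:
        return ATTENTION
    return NORMAL


def state_accuracy(predicted: Sequence[float], actual: Sequence[float],
                   attention: float, fault: float) -> float:
    """Share of points whose predicted state matches the actual state."""
    matches = sum(
        classify(p, attention, fault) == classify(a, attention, fault)
        for p, a in zip(predicted, actual)
    )
    return matches / len(actual)
```

A narrow gap between the attention value and the fault threshold makes mismatches more likely for the same prediction error, which is consistent with the lower accuracy reported for the cooling water pressure.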

Processing efficiency based on Storm platform:

In this embodiment, data obtained from the sensors are simulated and transmitted into the Hadoop Database, from which Storm reads the data; the master control node Nimbus monitors and distributes tasks through ZooKeeper, and the specific processing logic is submitted to the Workers through the Supervisor nodes. The embodiment separately measures the time required for the database to transfer the stream data into the Spout, the time required for the Spout to process the data and distribute them to the Bolts, and the total time required by the preprocessing Bolt, prediction Bolt, early warning Bolt and storage Bolt. The processing time of each part is shown in Table 7 below:

Table 7 Processing times

The database, the Spout and each Bolt in the Workers all process data very quickly; the whole early warning process takes only about 10 s, which satisfies the 1 min interval between measuring point data.

The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the application.

Claims

1. A real-time prediction method for the running state of a coal mining machine based on Storm is characterized by comprising the following steps:

step one, designing a data storage structure

the index table comprises monitoring point positions, GRU prediction models and early warning threshold upper and lower bounds;

step two, spout design

21) Reading n data items from the data stream of the corresponding monitoring point;

22) Judging whether the length of the sequence formed by the data has reached N; if so, executing step 24), otherwise executing step 23);

step three, bolt design

2. The Storm-based coal mining machine operation state real-time prediction method according to claim 1, further comprising GRU model training, wherein after the original data are preprocessed for missing values, abnormal values and noise, the result df is used as input data; the input data are divided into a training set and a test set in a 7:3 ratio; the hidden layer is trained on the training set, the hyperparameters of the model are adjusted using the Adam optimization function and the MSE loss function, and the model is optimized with minimum loss as the criterion; the epoch with the best overall performance on the training set is found, and the model is saved to retain the best result at that epoch; once training has found GRU parameters that fit the characteristics of the data, the test sets simulate real-time data streams on the Hadoop Database, the various test sets are input in parallel into the Workers under the control of the master control node Nimbus, and prediction and early warning of the data are completed in the respective topologies.

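3. The Storm-based coal mining machine operation state real-time prediction method according to claim 1, wherein after the GRU is trained on the training set, its performance on the test set is evaluated in terms of RMSE, MAE and R², calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the true value, $\bar{y}$ is the mean of the true values and $n$ is the number of samples.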

4. The Storm-based real-time prediction method for the running state of a coal mining machine according to claim 1, wherein the real-time operation data of each monitoring point are allocated from the Hadoop Database by the master control node Nimbus, and the Supervisor node is responsible for receiving the tasks assigned by Nimbus, managing and starting all Workers, and transmitting the data into the Spout for processing.

5. The Storm-based coal mining machine operation state real-time prediction method according to claim 1, wherein the coal mining machine running state data comprise the cutting part motor current, cutting part motor temperature, traction part motor current, traction part motor rotating speed, pump working pressure, pump working rotating speed, cooling water pressure and frequency converter current of the coal mining machine.

6. The Storm-based real-time prediction method for the running state of a coal mining machine according to claim 1, wherein in step 33), the sliding distance of the sliding window model is 1 s, the sliding window is divided into 60 basic time windows, the attribute set represents the set of coal mining machine state data, each basic window judges the early warning level of its data in parallel, only the 60 tuples resulting from the threshold judgment of the basic windows are cached, so that the tuples of the original window need not be cached one by one, and the early warning result is stored in the database of the storage Bolt.