Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a gas accident prediction method and a gas accident prediction device.
In order to achieve the above object, the present invention adopts the following technical solutions.
In a first aspect, the present invention provides a gas accident prediction method, including:
dividing accident prediction areas on the basis of actual geographic areas;
summarizing the accident types for each accident prediction area, analyzing the factors that cause each type of accident, and obtaining additional factors by expanding single factors and combining different factors;
for each accident prediction area, taking the accident category quantized value and the number of accident occurrences as output variables of a prediction model, and selecting the several factors having the largest correlation coefficients with the occurrence numbers of the various accidents as input variables of the prediction model;
for each accident prediction area, collecting the historical data required by the prediction model and constructing a training data set; and determining a model structure and training the model parameters with the training data set to obtain an accident prediction model.
Further, the accident categories include gas explosion, fire, leakage, and personnel poisoning.
Still further, the input variables of the prediction model include a maximum temperature, a minimum temperature, an average temperature, a wind force quantized value, a season quantized value, a holiday quantized value, the sum of the season quantized value and a weather quantized value, and the sum of the average temperature and the wind force quantized value.
Further, the input variables also include the accident category quantized value and the number of accident occurrences in the previous prediction period.
Further, the prediction model is a recurrent neural network (RNN) model.
In a second aspect, the present invention provides a gas accident prediction apparatus, including:
an area division module, configured to divide accident prediction areas on the basis of actual geographic areas;
a factor analysis module, configured to summarize the accident types for each accident prediction area, analyze the factors that cause each type of accident, and obtain additional factors by expanding single factors and combining different factors;
an input and output determination module, configured to, for each accident prediction area, take the accident category quantized value and the number of accident occurrences as output variables of a prediction model, and select the several factors having the largest correlation coefficients with the occurrence numbers of the various accidents as input variables of the prediction model;
a model training module, configured to collect, for each accident prediction area, the historical data required by the prediction model and construct a training data set; and to determine a model structure and train the model parameters with the training data set to obtain an accident prediction model.
Further, the accident categories include gas explosion, fire, leakage, and personnel poisoning.
Still further, the input variables of the prediction model include a maximum temperature, a minimum temperature, an average temperature, a wind force quantized value, a season quantized value, a holiday quantized value, the sum of the season quantized value and a weather quantized value, and the sum of the average temperature and the wind force quantized value.
Further, the input variables also include the accident category quantized value and the number of accident occurrences in the previous prediction period.
Further, the prediction model is a recurrent neural network (RNN) model.
Compared with the prior art, the invention has the following beneficial effects.
According to the invention, the accident prediction areas are divided on the basis of actual geographic areas and a separate prediction model is constructed for each divided area, which improves the prediction accuracy of the models; selecting the factors having the largest correlation coefficients with the number of accident occurrences as input variables further improves the prediction accuracy.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the described embodiments are merely exemplary and do not limit the full scope of the invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a gas accident prediction method according to an embodiment of the present invention, including the following steps:
Step 101: dividing accident prediction areas on the basis of actual geographic areas;
Step 102: for each accident prediction area, summarizing the accident types, analyzing the factors that cause each type of accident, and obtaining additional factors by expanding single factors and combining different factors;
Step 103: for each accident prediction area, taking the accident category quantized value and the number of accident occurrences as output variables of a prediction model, and selecting the several factors having the largest correlation coefficients with the occurrence numbers of the various accidents as input variables of the prediction model;
Step 104: for each accident prediction area, collecting the historical data required by the prediction model and constructing a training data set; and determining a model structure and training the model parameters with the training data set to obtain an accident prediction model.
In this embodiment, step 101 divides the accident prediction areas. Different areas differ markedly in the distribution of gas user types and in their gas facilities, so the types, numbers, and patterns of gas accidents also differ between areas. If a single prediction model were built for all areas without dividing them, the prediction accuracy would be difficult to guarantee. Therefore, in this embodiment the areas are divided before modeling, and a separate prediction model is then established for each area: the model structure of each area is selected independently, its input and output variables are determined independently, and each model is trained on the historical data of its own area, yielding a prediction model that can accurately predict accidents in that area. The area division is based on actual geographic areas and implements grid-based management, as follows. A gas company has several gas service centers. First, the management situation of the locality served by each service center is analyzed, including its service range and the administrative districts, streets, and residential communities it covers, so that the distribution of streets, communities, buildings, and users is clearly established. Then, each service center divides its districts into unit grids according to principles such as geographic location and the distribution of user meters, and labels the resulting grids by user type, geographic location, responsible personnel, and so on.
In this way, the gas service centers achieve a preliminary grid-based division of service areas at the provincial and municipal level, and then subdivide the provinces and cities further through grid management.
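As a rough illustration of the grid division in step 101, the sketch below assigns users to square grid cells by latitude and longitude and groups user IDs per cell, so that each cell's data can later feed its own model. It is a minimal sketch under stated assumptions: the fixed cell size `CELL_SIZE_DEG`, the record fields `user_id`, `lat`, `lon`, and the helper names are hypothetical; the embodiment itself divides grids along service-center, street, and community boundaries and by meter distribution, not on a fixed lattice.

```python
import math
from collections import defaultdict

# Hypothetical cell size in degrees; real grids follow service-center,
# street, and community boundaries rather than a fixed square lattice.
CELL_SIZE_DEG = 0.01


def grid_cell(lat: float, lon: float, cell_size: float = CELL_SIZE_DEG) -> tuple:
    """Map a coordinate to the integer (row, col) index of a square grid cell."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def partition_users(users: list, cell_size: float = CELL_SIZE_DEG) -> dict:
    """Group user IDs by grid cell so that each cell can get its own model."""
    cells = defaultdict(list)
    for user in users:
        cells[grid_cell(user["lat"], user["lon"], cell_size)].append(user["user_id"])
    return dict(cells)


# Hypothetical sample users for illustration only.
users = [
    {"user_id": "u1", "lat": 36.655, "lon": 117.005},
    {"user_id": "u2", "lat": 36.657, "lon": 117.008},
    {"user_id": "u3", "lat": 36.672, "lon": 117.005},
]
regions = partition_users(users)
```

Here `u1` and `u2` fall into the same cell and `u3` into a neighbouring one; in a real deployment the cell key would more likely be an administrative grid code than a computed index.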
In this embodiment, step 102 analyzes the accident types in each prediction area and the factors that cause them, in preparation for determining the input and output variables of the model. The accident types and their causal factors can be obtained by reasoned analysis of the detailed records of historical safety accidents in each prediction area, gas maintenance work orders, and similar sources. Although each area has its own characteristics, the accident types generally include gas explosion, fire, and personnel poisoning. Many factors directly cause or are associated with accidents, such as temperature, wind force, and holidays. Expanding a single factor yields several factors that can serve as model input variables; for example, temperature can be expanded into maximum temperature, minimum temperature, and average temperature. Different factors can also be combined; for example, the sum of temperature and wind force can serve as one factor. In practice, many of these extended factors have been found to be more suitable as model input variables than the original factors.
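The expansion and combination of factors described in step 102 can be sketched as two small helpers: one expands a day's raw temperature readings into maximum, minimum, and average temperature, and one generates combined candidate factors as pairwise sums. This is a minimal sketch; the function names and factor keys (`t_max`, `wind`, and so on) are hypothetical, and generating every pairwise sum is only one assumed way to produce candidates, which are then screened in step 103.

```python
from itertools import combinations
from statistics import mean


def expand_temperature(readings: list) -> dict:
    """Expand one raw factor (a day's temperature readings) into three derived factors."""
    return {"t_max": max(readings), "t_min": min(readings), "t_avg": mean(readings)}


def combine_pairwise_sums(factors: dict) -> dict:
    """Create combined candidate factors as the sum of every pair of quantized factors."""
    combined = {}
    for (name_a, val_a), (name_b, val_b) in combinations(sorted(factors.items()), 2):
        combined[f"{name_a}+{name_b}"] = val_a + val_b
    return combined


# Hypothetical readings for one day and a hypothetical quantized wind value.
day_factors = expand_temperature([2.0, 6.0, 4.0])
candidates = combine_pairwise_sums({"t_avg": day_factors["t_avg"], "wind": 3.0})
```

Because pairwise combination grows quadratically with the number of factors, the correlation screening that follows is what keeps the final input set small.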
In this embodiment, step 103 determines the input and output variables of the model. The output variables are typically the accident category and the number of accident occurrences. Because the accident category (of which there are many) is not a numeric variable, it must be quantized before it can serve as a model output, for example 11 for fire and 12 for gas explosion. Because there are a great many candidate factors, they must be screened strictly to decide which can serve as input variables. In this embodiment, the correlation coefficient between each factor and the occurrence number of each accident type is calculated first; the factors are then sorted in descending order of correlation coefficient, and the top-ranked factors are selected as input variables. The correlation coefficient reflects how strongly each factor is related to the accidents, and a prediction model whose accuracy meets the requirements can be obtained only by selecting the factors most strongly correlated with the accidents as input variables. For two variables $X=(x_1,x_2,\ldots,x_n)$ and $Y=(y_1,y_2,\ldots,y_n)$, the correlation coefficient $r(X,Y)$ is calculated as follows:
$$r(X,Y)=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$, respectively.
The magnitude of the correlation coefficient reflects the degree of correlation between a factor and the accidents: the larger its absolute value, the more strongly the factor is correlated with the accidents. A negative correlation coefficient means that an increase (or decrease) in the quantized value of the factor tends to be accompanied by a decrease (or increase) in the number of accident occurrences; this relationship is called negative correlation. The absolute value of the correlation coefficient relates to the degree of influence as follows: 0.8-1.0 indicates extremely strong correlation, 0.6-0.8 strong correlation, 0.4-0.6 moderate correlation, 0.2-0.4 weak correlation, and 0.0-0.2 extremely weak or no correlation. In this embodiment, the several factors most strongly correlated with the accidents are selected as input variables, which improves the prediction accuracy of the prediction model.
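The screening in step 103 can be sketched as follows, assuming the correlation coefficient is the Pearson coefficient. The sketch computes r for each candidate factor against an accident-count series, labels its strength using the bands listed above, and keeps the top k factors. Two choices here are assumptions: factors are ranked by the absolute value of r (so strongly negative correlations also qualify), and each band's lower bound is treated as inclusive. The function names are hypothetical.

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def strength(r: float) -> str:
    """Map |r| to the correlation bands given in the description."""
    a = abs(r)
    if a >= 0.8:
        return "extremely strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "extremely weak or none"


def select_factors(factors: dict, counts: list, k: int) -> list:
    """Return the k (name, r) pairs with the largest |r| against the accident counts."""
    scored = [(name, pearson_r(values, counts)) for name, values in factors.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]


# Hypothetical per-period accident counts and candidate factor series.
counts = [1, 2, 3, 4, 5]
factors = {"a": [2, 4, 6, 8, 10], "b": [5, 4, 3, 2, 1], "c": [1, 1, 2, 1, 1]}
top = select_factors(factors, counts, k=2)
```

In this toy data, factor `b` is perfectly negatively correlated with the counts and is kept alongside `a`, while `c` (r close to 0) is dropped.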
In this embodiment, step 104 collects historical data and trains the model to obtain the accident prediction model. First, the types of historical data required are determined from the input and output variables of the model; the corresponding data are then acquired from the gas business system for each divided prediction area. Gas data are not single fields but strongly correlated structured records: one gas record or one gas user record contains a large amount of structured information, such as the user ID, the user's daily gas consumption, the user's meter type (e.g., card meter), the number of payments in the previous year, the types of gas appliances, and the number of maintenance visits. Because the acquired data contain some safety-related information together with a large amount of information unrelated to safety, and may also contain missing or abnormal values, the data must first be cleaned and organized, that is, preprocessed by filling missing values, repairing abnormal values, and so on. A training data set is then constructed from the historical data, and the model is trained on it to determine the model parameters, yielding the required accident prediction model. In practice, a portion of the samples can also be held out from the training data set as a test set to check whether the trained model meets the accuracy requirement.
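The preprocessing and hold-out steps mentioned for step 104 can be sketched with three small helpers. The specific strategies are assumptions, not the embodiment's stated method: missing values are filled with the column median, abnormal values are repaired by clipping into a plausible range, and the test set is taken from the most recent samples so that the time order of the series is preserved. All names are hypothetical.

```python
from statistics import median


def fill_missing(column: list) -> list:
    """Replace None entries with the median of the observed values (an assumed strategy)."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in column]


def clip_outliers(column: list, low: float, high: float) -> list:
    """Repair abnormal values by clipping them into a plausible [low, high] range."""
    return [min(max(v, low), high) for v in column]


def chronological_split(samples: list, test_fraction: float = 0.2) -> tuple:
    """Hold out the most recent samples as a test set, keeping time order intact."""
    n_test = round(len(samples) * test_fraction)
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]


# Hypothetical daily average temperatures with a gap and two sensor glitches.
temps = clip_outliers(fill_missing([-50.0, None, 10.0, 99.0]), low=-30.0, high=45.0)
train, test = chronological_split(list(range(10)))
```

A chronological split is chosen over a random one because the later RNN model consumes ordered sequences; a random split would let future periods leak into training.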
As an alternative embodiment, the accident categories include gas explosion, fire, leakage, and personnel poisoning.
This embodiment lists several common accident categories. Among them, gas explosion and fire are the two forms most harmful to people's lives and property; leakage and personnel poisoning, although their losses and hazards are smaller than those of the first two, are the two most frequent forms. These four forms are therefore the focus of accident prediction. Each accident category is coded and quantized before being used as an output variable of the model. It should be noted that this embodiment gives only a few relatively common accident categories; in fact, there are many specific accident forms, and different accident categories can be selected as model output variables for different prediction areas according to the specific conditions of each area.
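The coding and quantization of accident categories can be sketched as a lookup table plus a per-period count, which together give the two output variables. Only codes 11 (fire) and 12 (gas explosion) come from the description; 13 (leakage) and 14 (personnel poisoning) are hypothetical placeholders, as are the function names.

```python
from collections import Counter

# Codes 11 (fire) and 12 (gas explosion) follow the example in step 103;
# 13 (leakage) and 14 (personnel poisoning) are hypothetical placeholders.
ACCIDENT_CODES = {"fire": 11, "gas_explosion": 12, "leakage": 13, "personnel_poisoning": 14}
CODE_TO_CATEGORY = {code: name for name, code in ACCIDENT_CODES.items()}


def encode(category: str) -> int:
    """Return the quantized value for an accident category."""
    if category not in ACCIDENT_CODES:
        raise ValueError(f"unknown accident category: {category}")
    return ACCIDENT_CODES[category]


def decode(code: int) -> str:
    """Map a quantized value back to its accident category name."""
    return CODE_TO_CATEGORY[code]


def counts_by_code(events: list) -> dict:
    """Count the accidents of one prediction period, keyed by category code."""
    return dict(Counter(encode(e) for e in events))


period_counts = counts_by_code(["fire", "leakage", "fire"])
```

One design caveat: if a model treats these codes as ordinary numbers, it implicitly assumes an ordering (fire < gas explosion < leakage) that has no physical meaning; one-hot encoding of the category is a common alternative when that matters.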
As an alternative embodiment, the input variables of the prediction model include a maximum temperature, a minimum temperature, an average temperature, a wind force quantized value, a season quantized value, a holiday quantized value, the sum of the season quantized value and a weather quantized value, and the sum of the average temperature and the wind force quantized value.
This embodiment gives the input variables of a specific prediction model. They include single factors, such as the wind force quantized value, the season quantized value, and the holiday quantized value; factors expanded from a single factor, such as the maximum, minimum, and average temperature; and extended factors combined from several factors, such as the sum of the season and weather quantized values and the sum of the average temperature and the wind force quantized value. Holidays may seem unrelated to gas accidents, but people's living and working patterns change considerably during holidays; for example, gas loads and accident numbers during and after the Spring Festival, National Day, and similar holidays differ markedly from those on working days. This embodiment therefore also uses the quantized holiday value as a model input variable. All factors used as input variables are those found, after their correlation coefficients with the number of accident occurrences were calculated, to be most strongly correlated with it. It should be noted that this embodiment merely gives the input variables of one specific prediction model, as a preferred embodiment for reference by those skilled in the art, and does not negate or exclude other possible embodiments.
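The eight input variables of this embodiment can be assembled into one feature vector per prediction period, as sketched below. The quantization scales are assumptions for illustration only: seasons are coded 1 to 4 (spring to winter) by calendar month, a holiday is coded 1 and a working day 0, and the wind force and weather quantized values are taken as already-quantized inputs. The names `SEASON_CODE`, `season_value`, `holiday_value`, and `build_inputs` are hypothetical.

```python
from datetime import date

# Hypothetical season quantization: spring=1, summer=2, autumn=3, winter=4.
SEASON_CODE = {3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3, 12: 4, 1: 4, 2: 4}


def season_value(day: date) -> int:
    """Quantize the season of a date by calendar month."""
    return SEASON_CODE[day.month]


def holiday_value(day: date, holidays: set) -> int:
    """Quantize the holiday flag: 1 on a listed holiday, otherwise 0."""
    return 1 if day in holidays else 0


def build_inputs(day: date, t_max: float, t_min: float, t_avg: float,
                 wind_value: float, weather_value: float, holidays: set) -> list:
    """Assemble the eight input variables listed in this embodiment, in order."""
    season = season_value(day)
    return [
        t_max,
        t_min,
        t_avg,
        wind_value,
        season,
        holiday_value(day, holidays),
        season + weather_value,  # combined factor: season + weather
        t_avg + wind_value,      # combined factor: average temperature + wind force
    ]


# Hypothetical winter holiday with illustrative readings.
x = build_inputs(date(2024, 2, 10), 5.0, -3.0, 1.0, 3, 2, {date(2024, 2, 10)})
```

A real holiday calendar would come from an official schedule rather than a hand-written set, and the scales would be chosen so that the combined sums remain meaningful after the correlation screening.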
As an alternative embodiment, the input variables further include the accident category quantized value and the number of accident occurrences in the previous prediction periods.
This embodiment is an improvement on the previous one. Extensive research and repeated experiments have shown that the accident types and numbers of occurrences in different prediction periods, particularly adjacent ones, are strongly correlated. The reasoning is similar to the way a major earthquake is often followed by many aftershocks, or the way a single thunderclap on a stormy day is rarely followed by silence. Therefore, on the basis of the input variables listed in the previous embodiment, this embodiment adds the accident category quantized value and the number of accident occurrences in the previous prediction periods as input variables, which can further improve the prediction accuracy of the model.
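Adding the previous periods' accident category and count as inputs amounts to appending lagged values to each period's feature row, as in the sketch below. It assumes, as a simplification, that each prediction period is summarized by a single category code (for example, its most frequent accident type) and one accident count; the function name and the `lags` parameter are hypothetical.

```python
def add_lagged_features(inputs: list, categories: list, counts: list, lags: int = 1) -> tuple:
    """Append the previous `lags` periods' category code and accident count to each row.

    Returns (rows, targets). The first `lags` periods are dropped because they
    lack a complete history; targets[i] is the (category, count) of rows[i]'s period.
    """
    if not (len(inputs) == len(categories) == len(counts)):
        raise ValueError("inputs, categories and counts must have the same length")
    rows, targets = [], []
    for t in range(lags, len(inputs)):
        row = list(inputs[t])
        for k in range(1, lags + 1):
            row.extend([categories[t - k], counts[t - k]])
        rows.append(row)
        targets.append((categories[t], counts[t]))
    return rows, targets


# Hypothetical three periods, each with one weather feature.
rows, targets = add_lagged_features([[1.0], [2.0], [3.0]], [11, 12, 13], [0, 2, 1])
```

With `lags=1`, the second period's row becomes `[2.0, 11, 0]`: its own feature followed by the first period's category code and count.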
As an alternative embodiment, the prediction model is a recurrent neural network (RNN) model.
This embodiment gives a specific prediction model structure, namely a recurrent neural network (RNN). Unlike a conventional feed-forward neural network, in which neurons are connected only to their inputs and outputs, the hidden units of an RNN are also connected back to themselves, forming a loop. This structure reveals the essential property of the RNN: the network state at the previous time step acts on the network state at the next time step, so the RNN has a memory function. This memory function is loosely inspired by the memory mechanism of the human brain and makes the RNN well suited to processing time series; RNNs are accordingly widely used in AI fields such as natural language processing (NLP) and speech and image processing. The main reason for selecting an RNN in this embodiment is that the input variables include the accident categories and numbers of accident occurrences in the previous prediction periods, which form a time series and require the model to have memory. Like an ordinary neural network, an RNN consists of an input layer, hidden layers, and an output layer. There may be several hidden layers: too few may leave the model unable to fit the data adequately, while too many considerably increase training time and memory overhead, so 2 hidden layers are generally selected. The RNN is mature prior art, and its structural principles are not described in further detail herein.
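To make the recurrence concrete, the sketch below implements only the forward pass of a plain (Elman-style) RNN with two stacked hidden layers, using h_t = tanh(W_x x_t + W_h h_{t-1} + b) in each layer and a linear output read from the last hidden state. It is a minimal dependency-free sketch, not the embodiment's implementation: weights are random and untrained, and the sizes (10 inputs, i.e. the 8 weather and calendar variables plus a lagged category code and count; 8 hidden units; 2 outputs for category and count) are assumptions. Training by backpropagation through time would normally be done with a framework, for example PyTorch's `torch.nn.RNN` with `num_layers=2`.

```python
import math
import random


def init_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> list:
    """Small uniform random weights; real training would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: list, v: list) -> list:
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ElmanLayer:
    """One recurrent layer: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.w_x = init_matrix(n_hidden, n_in, rng)
        self.w_h = init_matrix(n_hidden, n_hidden, rng)
        self.b = [0.0] * n_hidden
        self.n_hidden = n_hidden

    def run(self, sequence: list) -> list:
        h = [0.0] * self.n_hidden  # initial state before the first period
        states = []
        for x in sequence:
            pre = [a + c + d for a, c, d in zip(matvec(self.w_x, x), matvec(self.w_h, h), self.b)]
            h = [math.tanh(p) for p in pre]  # new state depends on the previous state
            states.append(h)
        return states


class TwoLayerRNN:
    """Two stacked recurrent layers plus a linear output on the final state."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.layer1 = ElmanLayer(n_in, n_hidden, rng)
        self.layer2 = ElmanLayer(n_hidden, n_hidden, rng)
        self.w_out = init_matrix(n_out, n_hidden, rng)

    def predict(self, sequence: list) -> list:
        states = self.layer2.run(self.layer1.run(sequence))
        return matvec(self.w_out, states[-1])


model = TwoLayerRNN(n_in=10, n_hidden=8, n_out=2, seed=0)
seq = [[0.1] * 10 for _ in range(5)]  # five hypothetical periods of 10 features
out = model.predict(seq)
```

Changing only the first period's input changes the final prediction, which is the "memory" property described above: information from earlier periods is carried forward through the hidden state.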
Fig. 2 is a schematic diagram of the composition of a gas accident prediction device according to an embodiment of the present invention. The device includes:
an area division module 11, configured to divide accident prediction areas on the basis of actual geographic areas;
a factor analysis module 12, configured to summarize the accident types for each accident prediction area, analyze the factors that cause each type of accident, and obtain additional factors by expanding single factors and combining different factors;
an input and output determination module 13, configured to, for each accident prediction area, take the accident category quantized value and the number of accident occurrences as output variables of the prediction model, and select the several factors having the largest correlation coefficients with the occurrence numbers of the various accidents as input variables of the prediction model;
a model training module 14, configured to collect, for each accident prediction area, the historical data required by the model and construct a training data set; and to determine a model structure and train the model parameters with the training data set to obtain an accident prediction model.
The device of this embodiment can be used to implement the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effects are similar and are not repeated here. The same applies to the following embodiments, which are not described further.
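The way modules 11 to 14 cooperate can be sketched as a small container that runs the four stages once per accident prediction area and keeps one trained model per area. This is an architectural sketch only: the class name, its field names, and the callable signatures are hypothetical, and each field stands in for whatever concrete implementation a module actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GasAccidentPredictionDevice:
    """Hypothetical wiring of modules 11-14; each field is one module's behaviour."""

    divide_areas: Callable[[list], dict]          # module 11: records -> {area: records}
    analyze_factors: Callable[[list], dict]       # module 12: records -> candidate factors
    determine_io: Callable[[dict], tuple]         # module 13: factors -> (inputs, outputs)
    train_model: Callable[[list, tuple], object]  # module 14: records, io spec -> model

    def build(self, raw_records: list) -> dict:
        """Train one model per accident prediction area."""
        models = {}
        for area, records in self.divide_areas(raw_records).items():
            factors = self.analyze_factors(records)
            io_spec = self.determine_io(factors)
            models[area] = self.train_model(records, io_spec)
        return models


# Stub modules for illustration only.
device = GasAccidentPredictionDevice(
    divide_areas=lambda recs: {"A": recs[:1], "B": recs[1:]},
    analyze_factors=lambda recs: {"n": len(recs)},
    determine_io=lambda factors: (["n"], ["count"]),
    train_model=lambda recs, io: ("model", len(recs), io),
)
models = device.build([1, 2, 3])
```

Keeping each module as a separate callable mirrors the per-area independence described for the method: any area can use a different model structure or input set without changing the others.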
As an alternative embodiment, the accident categories include gas explosion, fire, leakage, and personnel poisoning.
As an alternative embodiment, the input variables of the prediction model include a maximum temperature, a minimum temperature, an average temperature, a wind force quantized value, a season quantized value, a holiday quantized value, the sum of the season quantized value and a weather quantized value, and the sum of the average temperature and the wind force quantized value.
As an alternative embodiment, the input variables further include the accident category quantized value and the number of accident occurrences in the previous prediction periods.
As an alternative embodiment, the prediction model is a recurrent neural network (RNN) model.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.