CN111260249A

CN111260249A - Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model

Info

Publication number: CN111260249A
Application number: CN202010091057.2A
Authority: CN
Inventors: 李石君; 赵远; 杨济海; 李学礼; 龚红霞; 余伟; 余放; 李宇轩
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2020-02-13
Filing date: 2020-02-13
Publication date: 2020-06-09
Anticipated expiration: 2040-02-13
Also published as: CN111260249B

Abstract

The invention discloses a power communication service reliability assessment and prediction method based on a hybrid LSTM and random forest model. It belongs to the research field of time-series analysis and classification/regression, and relates to technical fields such as LSTM and random forests. Working mainly from communication network service records and service alarm records, the invention constructs a hybrid LSTM and random forest classification model. It trains the model with the Adam optimization method and uses the trained model to perform classification tasks. The advantages of the invention are as follows. The model learns automatically from the historical alarm records of the past twelve months and evaluates and predicts service reliability for the next month. This improves risk early warning for low-reliability services, so losses can be prevented and stopped in time.

Description

Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model

Technical Field

The invention relates to the technical field of time-series analysis and classification/regression. In particular, it relates to a method and device for evaluating and predicting the reliability of power communication services based on a hybrid LSTM and random forest model.

Background

Reliability of electric power communication services: the power communication network is a dedicated subsystem of the power system. The communication services it carries are mainly related to power production and operation. Examples include relay protection, safety and stability, dispatching automation, dispatching telephone, administrative telephone, data communication, and communication environment monitoring services. These services place special requirements on the reliability of their channel routes, and a reliable channel route directly affects the safe and stable operation of the power system. Reliability assessment and prediction for power communication services is therefore essential.

Electric power communication management system: the dedicated power communication network is an important support for the smart grid. It is managed by the communication management system "SG-TMS". SG-TMS follows a "two-level deployment" model (headquarters and provincial companies) and a "four-level application" model (headquarters, branches, provincial companies, and city and county companies). Standardized project construction and vigorous promotion of practical use have deeply integrated SG-TMS into the daily work of tens of thousands of power communication professionals. Construction, operation, and management data for tens of thousands of devices have been collected comprehensively over several years. This accumulated mass of power communication data, together with numerous external-system data and public data, forms the basis for big data analysis.

Communication network service records: the smart grid communication management system stores a large amount of information on service records, service operating conditions, and the channels used by services. This includes standard structured data, such as service activation time, operating time, and service type, as well as a large amount of semi-structured data. The service type reflects the application field of a service. However, service allocation in the power network changes over time, especially for standby service channels. Each service is carried by one or more channels, and each channel interconnects multiple sites. In daily production management, the historical alarm records of the sites are logged, including the alarm time, alarm type, and alarm recovery time.

In the course of implementing the present invention, the inventors found that prior-art methods have at least the following technical problems:

Existing reliability evaluation techniques mostly analyze and evaluate the reliability of a service in its current state based on fault information or service importance. For service reliability in a future period, they can only give a general trend analysis, for example that a service which has been running longer has lower reliability.

That is, the prior art has the technical problem that its prediction results are not accurate enough.

Disclosure of Invention

In view of the above, the present invention provides a method and device for evaluating and predicting the reliability of power communication services based on a hybrid LSTM and random forest model. It solves, or at least partially solves, the prior-art problem that prediction results are not accurate enough.

To solve the above technical problem, a first aspect of the present invention provides a method for evaluating and predicting the reliability of power communication services based on a hybrid LSTM and random forest model, including:

S1: acquire the historical alarm record information of the power communication service and preprocess it;

S2: construct an LSTM network model and train it with the preprocessed historical alarm record information;

S3: use the trained LSTM network model to make predictions on the power communication service data to be predicted, obtaining a time-series prediction result;

S4: acquire the basic information of the power communication service, splice the time-series prediction result with this basic information, and normalize the spliced data;

S5: input the normalized data into a random forest model for training to obtain a trained random forest model;

S6: use the trained random forest model to make predictions on the power communication service data to be predicted, obtaining a reliability evaluation result. This result comprises the predicted alarm quantity of the power communication service and its occurrence probability.

In one embodiment, the preprocessing of the historical alarm record information in S1 includes data division, time-series processing, and standardization of the historical alarm record information.
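The sketch below is a rough illustration of the data division and standardization steps. It assumes that each service's alarm history has already been reduced to one monthly alarm-count series, and that "data division" means a chronological train/validation split. The function name `split_and_standardize` and the default 80/20 ratio are hypothetical choices, not taken from the patent.

```python
import numpy as np


def split_and_standardize(series, train_ratio=0.8):
    """Chronologically split a 1-D monthly alarm-count series, then z-score it.

    Mean and standard deviation come from the training part only, so the
    later (validation) months do not leak into training.
    """
    series = np.asarray(series, dtype=float)
    n_train = int(len(series) * train_ratio)  # earlier months -> training
    train, valid = series[:n_train], series[n_train:]
    mean, std = train.mean(), train.std()
    if std == 0:  # constant series: avoid division by zero
        std = 1.0
    return (train - mean) / std, (valid - mean) / std, (mean, std)
```

The returned `(mean, std)` pair would be reused to standardize new data and to undo the scaling on the LSTM output.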

In one embodiment, the preprocessed historical alarm record information is time-series data that includes feature attributes, and S2 specifically includes:

S2.1: construct an input layer and an output layer. The number of input-layer nodes equals the number of feature attributes of the time-series data to be input. The output layer has a single node, which outputs the time-series prediction result;

S2.2: construct a hidden layer, which is a single-layer recurrent neural network built from LSTM cells;

S2.3: take the preprocessed historical alarm record information as training data, define a loss function, and train the LSTM network model with a gradient-based optimization algorithm to obtain the trained LSTM network model.

In one embodiment, S2.3 specifically includes:

S2.3.1: compute the output of each LSTM cell by forward propagation;

S2.3.2: compute the error term of each LSTM cell backwards, propagating it in two directions, along time and across network layers, and compute the gradient of each weight from the corresponding error term;

S2.3.3: update the weights with the gradient-based optimization algorithm, using the mean absolute error as the error measure. The loss function during training is:

$$L = \frac{1}{m}\sum_{i=1}^{m}\left|h(x_i) - y_i\right|$$

where m is the length of the training data, h(x_i) is the value returned by the network model, and y_i is the true value of the sample. The optimization objective is to minimize this loss function. Given a network initialization seed, a learning rate η, and a number of training steps (steps), the Adam optimization algorithm updates the network weights repeatedly, finally yielding the trained LSTM network model.
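The loss and update rule of S2.3.3 can be sketched in NumPy as below. The sketch implements three pieces:

- the mean absolute error L = (1/m) Σ |h(x_i) − y_i|;
- its subgradient with respect to the predictions;
- the standard Adam update with bias correction.

The class and function names and the default hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8) are assumptions, and `lr` plays the role of the learning rate η. The backpropagation of S2.3.2 is left out; in practice a deep-learning framework would compute those weight gradients.

```python
import numpy as np


def mae_loss(pred, target):
    """Mean absolute error L = (1/m) * sum |h(x_i) - y_i|."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.mean(np.abs(pred - target)))


def mae_grad(pred, target):
    """Subgradient of the MAE with respect to each prediction h(x_i)."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return np.sign(pred - target) / pred.size


class Adam:
    """Adam optimizer updating a list of NumPy arrays in place."""

    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params, self.lr = params, lr
        self.b1, self.b2, self.eps = beta1, beta2, eps
        self.m = [np.zeros_like(p) for p in params]  # first-moment estimates
        self.v = [np.zeros_like(p) for p in params]  # second-moment estimates
        self.t = 0

    def step(self, grads):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m[:] = self.b1 * m + (1 - self.b1) * g
            v[:] = self.b2 * v + (1 - self.b2) * g * g
            m_hat = m / (1 - self.b1 ** self.t)  # bias correction
            v_hat = v / (1 - self.b2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # in-place update
```

A training loop would repeat the following for `steps` iterations: compute predictions, call `mae_grad`, backpropagate to each weight, and pass the weight gradients to `Adam.step`.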

In one embodiment, the basic information of the power communication service includes the service type, service bandwidth, and interface type. In S4, splicing the time-series prediction result with this basic information and then preprocessing the spliced data includes:

S4.1: splice the service time-series data obtained by time-series prediction with basic service information, such as service type, service bandwidth, and interface type, to form the feature attributes of the samples;

S4.2: normalize the samples.
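S4.1 and S4.2 can be illustrated with a short NumPy sketch. The LSTM's predicted value(s) for each service become dynamic columns, the already-numeric static attributes are appended, and every column is min-max scaled to [0, 1]. The patent does not say which normalization is used, so min-max scaling and the function name `splice_and_normalize` are assumptions.

```python
import numpy as np


def splice_and_normalize(ts_pred, static_feats):
    """Concatenate LSTM predictions (dynamic features) with static service
    features, then min-max scale every column to [0, 1]."""
    ts_pred = np.asarray(ts_pred, dtype=float)
    if ts_pred.ndim == 1:
        ts_pred = ts_pred[:, None]  # one predicted value per service
    X = np.hstack([ts_pred, np.asarray(static_feats, dtype=float)])
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    # Constant columns would divide by zero; give them span 1 so they map to 0.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span
```

In practice the column minima and maxima would be taken from the training set and reused for the services to be predicted.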

In one embodiment, S5 specifically includes:

S5.1: generate sub-datasets by randomly drawing samples with replacement;

S5.2: train each sub-decision tree independently on its own generated sub-dataset. During training, the optimal splitting feature is selected from the feature information, specifically by the Gini index (GINI value), which is calculated as:

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{T} p_i^{2}$$

where T is the number of sample classes contained in the sample set D, and p_i is the proportion of samples of class i in the total. Gini(D) is inversely related to the purity of the sample set D;

S5.3: verify and analyze the model with the out-of-bag error rate, finally obtaining the trained random forest model.
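The sampling in S5.1 can be sketched as follows, together with the bookkeeping that S5.3 needs. Each tree gets a bootstrap sample of the same size as the data, drawn with replacement. Every sample a tree never saw is marked as out of bag for that tree. The function name and seed handling are illustrative assumptions.

```python
import numpy as np


def bootstrap_subsets(n_samples, n_trees, seed=0):
    """Draw one bootstrap sample (with replacement) per tree.

    Returns (indices, oob_mask): indices[k] are the rows used to train
    tree k, and oob_mask[k, i] is True when sample i was never drawn for
    tree k, i.e. it is out of bag for that tree.
    """
    rng = np.random.default_rng(seed)
    indices = rng.integers(0, n_samples, size=(n_trees, n_samples))
    oob_mask = np.ones((n_trees, n_samples), dtype=bool)
    for k in range(n_trees):
        oob_mask[k, indices[k]] = False  # drawn at least once -> in bag
    return indices, oob_mask
```

On average about (1 − 1/n)^n ≈ 36.8% of the samples are out of bag for any one tree. This is what allows an out-of-bag error estimate without a separate validation set.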

In one embodiment, the method further comprises:

Performing a weighted summation of the predicted alarm quantity vector A of the power communication service and the occurrence probability vector P of the input data, to obtain the expected alarm quantity s = P·A^T;

Normalizing the expected value to obtain the final reliability score, using the following formula:

$$\mathrm{score} = 1 - \frac{s - \min(A)}{\max(A) - \min(A)}$$

where score is the final reliability score, min(A) is the minimum value of the alarm quantity vector, and max(A) is the maximum value of the alarm quantity vector.
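The scoring step can be sketched in plain Python. It first computes the expected alarm count s = P·A^T from the classifier's candidate alarm counts A and their probabilities P. It then min-max normalizes and inverts s. The exact formula is an assumption here: score = 1 − (s − min(A)) / (max(A) − min(A)). This form matches the description's later statement that a score close to 1 means a more reliable service. The handling of a single-class A is likewise a hypothetical convention.

```python
def reliability_score(alarm_counts, probs):
    """Expected alarm count s = P . A^T, min-max normalised and inverted.

    alarm_counts: candidate alarm-count classes A (e.g. the classifier's classes)
    probs:        predicted probability P of each class, in the same order
    """
    if len(alarm_counts) != len(probs):
        raise ValueError("A and P must have the same length")
    s = sum(p * a for p, a in zip(probs, alarm_counts))  # s = P . A^T
    a_min, a_max = min(alarm_counts), max(alarm_counts)
    if a_max == a_min:
        return 1.0  # hypothetical convention: only one class, nothing to rank
    return 1.0 - (s - a_min) / (a_max - a_min)
```

For example, with A = [0, 2, 4] and P = [0.5, 0.5, 0.0], the expected count is s = 1 and the score is 0.75.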

Based on the same inventive concept, the second aspect of the present invention provides an apparatus for estimating and predicting reliability of power communication service based on LSTM and random forest mixed model, comprising:

the data preprocessing module is used for acquiring historical alarm record information of the power communication service and preprocessing the historical alarm record information;

the LSTM network model training module is used for constructing an LSTM network model and training the LSTM network model by utilizing the preprocessed historical alarm record information;

the time sequence prediction module is used for predicting the power communication service data to be predicted by utilizing the trained LSTM network model to obtain a time sequence prediction result;

the data splicing module is used for acquiring basic information of the power communication service, splicing a predicted time sequence prediction result with the basic information of the power communication service, and then carrying out normalization processing on the spliced data;

the random forest model training module is used for inputting the data after the normalization processing into a random forest model for training to obtain a trained random forest model;

and the reliability evaluation module is used for predicting the power communication service data to be predicted by using the trained random forest model to obtain a reliability evaluation result.

Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed, performs the method of the first aspect.

Based on the same inventive concept, a fourth aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.

One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:

The invention provides a power communication service reliability assessment and prediction method based on a hybrid LSTM and random forest model. First, the trained LSTM network model makes predictions on the power communication service data to be predicted, producing a time-series prediction result. This result is then spliced with the basic information of the service, so that the dynamic time-series features and the static features are input together into a random forest model for training. The trained random forest model then makes predictions on the service data to obtain the predicted class and its probability. The prior art can only yield a general trend analysis, such as "the longer a service has been running, the lower its reliability". Compared with this, the method improves prediction accuracy, strengthens risk early warning for low-reliability services, and allows losses to be prevented and stopped in time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic flow chart of an implementation of a reliability assessment and prediction method for power communication services based on a mixed model of LSTM and random forest in an embodiment;

FIG. 2 is a line graph of the results of the model validation set;

FIG. 3 is a block diagram of a power communication service reliability assessment and prediction device based on a mixed model of LSTM and random forest in the embodiment of the present invention;

FIG. 4 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention;

FIG. 5 is a block diagram of a computer device in an embodiment of the present invention.

Detailed Description

The invention aims to provide a power communication service reliability assessment and prediction method based on an LSTM and random forest mixed model, which is used for predicting the alarm number and the occurrence probability of a power communication service at a certain time in the future, so that the prediction accuracy is improved.

The general inventive concept of the present invention is as follows:

The invention provides a power communication service reliability assessment and prediction method based on a hybrid LSTM and random forest model. It belongs to the research field of time-series analysis and classification/regression, and relates to technical fields such as LSTM and random forests. Working mainly from communication network service records and service alarm records, it constructs a hybrid LSTM and random forest classification model. It trains the model with the Adam optimization method and uses the trained model to perform classification tasks. The advantages of the invention are as follows. The model learns automatically from the historical alarm records of the past twelve months and evaluates and predicts service reliability for the next month. This improves risk early warning for low-reliability services, so losses can be prevented and stopped in time.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one

The embodiment provides a method for evaluating and predicting reliability of power communication service based on an LSTM and random forest mixed model, which comprises the following steps:

S1: acquire the historical alarm record information of the power communication service and preprocess it.

Specifically, historical alarm record information is typical time-series data, which makes it convenient to train an LSTM model to predict the number of alarm records over time. The historical alarm record information can be obtained from a database. Preprocessing may include data partitioning, normalization, and similar steps.

S2: construct an LSTM network model and train it with the preprocessed historical alarm record information.

Specifically, the LSTM (long short-term memory) network is an improvement on recurrent neural networks (RNNs). It avoids problems of conventional RNNs such as vanishing gradients and insufficient long-term memory, and performs well in time-series data analysis. As a result, the recurrent network can make effective use of long-range temporal information.

S3: use the trained LSTM network model to make predictions on the power communication service data to be predicted, obtaining a time-series prediction result.

S4: acquire the basic information of the power communication service, splice the time-series prediction result with this basic information, and then normalize the spliced data.

Specifically, this step uses the time-series prediction result of the LSTM network model as a dynamic feature and the basic information of the power communication service as static features. Together they form the input for the subsequent training of the random forest model.

S5: input the normalized data into a random forest model for training to obtain a trained random forest model.

Specifically, the training process may draw samples randomly with replacement to generate the sub-datasets.

Specifically, the LSTM model's prediction is made from the time series, so its result captures the dynamic characteristics of the data over time. The prediction result and the other static features of the service are then used together as input features to train the random forest model. The random forest finally outputs the alarm quantity of the power service (the class) and the probability vector that the input data belongs to each class (the occurrence probability).

FIG. 1 is a schematic diagram of the implementation flow of the power communication service reliability assessment and prediction method based on a hybrid LSTM and random forest model in an embodiment. In the figure, the original fault time series is the historical alarm record information obtained from the database. The other service attributes are basic service information such as service type, service bandwidth, and interface type.

Specifically, the site historical alarm record information in the database is first read with Python. Records of alarms that occurred in the past twelve months are extracted, and their alarm times and alarm counts are converted into time-series form (i.e., time-series processing). Next, the channel information and the start and end sites of each service are read from the database. The routing path of the service is determined by combining the SDH timeslot cross-connect table, the topology base class table, and the equipment table. Finally, the historical alarm counts of the service are aggregated into time-series form.
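The counting part of this time-series processing can be sketched as below. The sketch rests on the following assumptions:

- the routing path of a service has already been resolved from the SDH and topology tables into a set of site IDs;
- an alarm at any site on that path counts toward the service;
- alarm records have the layout `(site_id, alarm_time)`.

`monthly_alarm_counts` and its parameters are hypothetical.

```python
from collections import Counter
from datetime import datetime


def monthly_alarm_counts(alarms, service_sites, end, months=12):
    """Count a service's alarms per calendar month over the last `months` months.

    alarms:        iterable of (site_id, alarm_time) tuples, alarm_time a datetime
    service_sites: set of site ids on the service's routing path
    end:           (year, month) of the last month to include
    Returns a list of `months` counts, oldest month first.
    """
    year, month = end
    keys = []  # (year, month) keys, built newest first
    for _ in range(months):
        keys.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    keys.reverse()  # oldest first
    counts = Counter(
        (t.year, t.month) for site, t in alarms if site in service_sites
    )
    return [counts.get(k, 0) for k in keys]
```

Months without alarms appear as zeros, so every service yields a series of the same length.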

Specifically, in S2.1, the time-series data to be input is converted into the following vector form:

$$D_i = (x_1, x_2, \ldots, x_M)^{T}, \quad i \in \{1, 2, 3, \ldots, N\}$$

where M is the number of features of the data, D_i is the data of the i-th record, and N is the number of input records, i.e., the number of training samples.
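One possible way to produce the N samples D_i is a sliding window over a service's monthly series, sketched below. It assumes that the M features are simply the M most recent monthly counts and that the target is the following month's count. The patent does not spell out the feature construction, so this interpretation and the name `make_windows` are assumptions.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D series into N samples D_i = (x_1, ..., x_M)^T with M = window,
    each paired with the next value of the series as its regression target."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window  # number of samples N
    if n <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:]
    return X, y
```

For an LSTM that consumes one value per time step, `X` would be reshaped to `(N, M, 1)` before training.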

In S2.2, a single-layer recurrent neural network is built from LSTM cells, with tanh as the activation function. The LSTM cell has a relatively complex gate structure made up of an input gate, a forget gate, a cell-state update, and an output gate. The input x_t at time t and the output h_{t-1} at time t-1 are concatenated and fed into the cell for computation. The forward propagation at each time step is computed as follows:

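$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t * \tanh(C_t)$$

where σ is the sigmoid function and * denotes element-wise multiplication.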
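The forward pass above can be sketched in NumPy for a single LSTM layer followed by one linear output node (S2.1 and S2.2). The parameter names mirror W_f, W_i, W_C, W_o and their biases. The initialization scale and the helper names `init_params` and `predict` are illustrative assumptions, and training (S2.3) is not included.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(xs, params, hidden):
    """Run a single-layer LSTM over xs (shape T x n_in) and return the last h_t.

    params holds W_f, W_i, W_C, W_o of shape (hidden, hidden + n_in)
    and b_f, b_i, b_C, b_o of shape (hidden,).
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:
        z = np.concatenate([h, x])                            # [h_{t-1}, x_t]
        f = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
        i = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
        c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])  # candidate state
        c = f * c + i * c_tilde                               # C_t
        o = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
        h = o * np.tanh(c)                                    # h_t
    return h


def init_params(n_in, hidden, seed=0):
    """Small random weights and zero biases for the four gate blocks."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("f", "i", "C", "o"):
        p["W_" + g] = rng.normal(0.0, 0.1, size=(hidden, hidden + n_in))
        p["b_" + g] = np.zeros(hidden)
    return p


def predict(xs, params, w_out, b_out):
    """Single linear output node on top of the last hidden state."""
    return float(w_out @ lstm_forward(xs, params, len(w_out)) + b_out)
```

Only the final hidden state feeds the output node, matching an output layer with one node.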

In S2.3, network training mainly targets the weights of the hidden layer.

In one embodiment, S2.3 specifically includes:

S2.3.1: compute the output of each LSTM cell by forward propagation;

S4.2: normalize the samples.

Specifically, character-type (non-numeric) feature attributes, such as the service bandwidth, are converted by numerical mapping and one-hot encoding.
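The sketch below shows numerical mapping followed by one-hot encoding for a character-type attribute such as service bandwidth. Sorting the categories is just one assumed way to make the mapping deterministic, and the function name is hypothetical.

```python
def one_hot_encode(values):
    """Map each distinct category to an integer id, then expand to one-hot rows."""
    categories = sorted(set(values))                      # deterministic order
    index = {cat: k for k, cat in enumerate(categories)}  # numerical mapping
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        vectors.append(row)
    return categories, vectors
```

For example, with hypothetical bandwidth labels, `one_hot_encode(["2M", "64K", "2M"])` maps "2M" to [1, 0] and "64K" to [0, 1].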

In one embodiment, S5 specifically includes:

Specifically, the smaller Gini(D) is, the higher the purity of the sample set, i.e., the fewer classes D contains. At each split, the feature that most increases sample purity is selected, which allows the decision tree model to be built quickly and reasonably. For example, if the sample set D is divided by feature A into T subsets {D_1, D_2, ..., D_T}, then

$$\mathrm{Gini}(D, A) = \sum_{i=1}^{T} \frac{|D_i|}{|D|}\,\mathrm{Gini}(D_i)$$

where |D| is the total number of samples, |D_i| is the number of samples in the generated subset D_i, and Gini(D_i) is the Gini value of the subset D_i. Optimal feature selection chooses the feature A that minimizes Gini(D, A) for splitting.

In addition, let M denote the feature dimension of each sample. When training a sub-decision tree, a constant m ≪ M is specified. Each time a feature is selected, a subset of m features is randomly drawn from the M features, and the feature with the largest information gain among these m is chosen. Following the bagging idea of ensemble learning, each sub-decision tree is trained to fully represent its optimal feature subset and to fit the feature information completely. Therefore, no pruning is applied when training the sub-decision trees. The description states that this improves the noise resistance of the random forest model and reduces the likelihood of overfitting.
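The split criterion and the random feature subset can be sketched in plain Python:

- `gini` computes Gini(D) = 1 − Σ p_k²;
- `gini_split` computes Gini(D, A) = Σ |D_i|/|D| · Gini(D_i), with one subset per distinct value of A;
- `best_feature` samples m of the M features and keeps the one with the smallest Gini(D, A).

The selection measure here is the Gini criterion from S5.2. Treating each feature as categorical (a multiway split into T subsets) follows the division described above. A CART-style implementation would instead search binary thresholds on numeric features. All names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2 over the classes present in D."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(feature_values, labels):
    """Gini(D, A) = sum_i |D_i|/|D| * Gini(D_i), one subset per value of A."""
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_feature(rows, labels, m, rng):
    """Randomly pick m of the M features and return the index of the one
    with the smallest Gini(D, A)."""
    M = len(rows[0])
    candidates = rng.sample(range(M), m)
    return min(candidates,
               key=lambda j: gini_split([r[j] for r in rows], labels))
```

For example, a feature that separates two classes perfectly gets Gini(D, A) = 0 and is preferred over any feature whose subsets stay mixed.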

S5.3: instead of cross-validation, the random forest model can use the out-of-bag error rate. This gives an unbiased estimate of the random forest's generalization error, with a result similar to k-fold cross-validation, which requires a large amount of computation.
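The out-of-bag error rate can be sketched as a majority vote in which each sample is judged only by trees that did not train on it. The inputs are each tree's predictions and an out-of-bag mask from the bootstrap step. The function name and array layout are assumptions. Ties are resolved toward the smallest class label, because `np.bincount(...).argmax()` returns the first maximum.

```python
import numpy as np


def oob_error(tree_preds, oob_mask, y):
    """Out-of-bag error rate of a forest.

    tree_preds: (n_trees, n_samples) integer class predicted by each tree
    oob_mask:   (n_trees, n_samples) True where the sample was out of bag
    y:          (n_samples,) true integer class labels
    """
    tree_preds, oob_mask, y = map(np.asarray, (tree_preds, oob_mask, y))
    wrong, counted = 0, 0
    for i in range(y.size):
        votes = tree_preds[oob_mask[:, i], i]  # only trees that never saw sample i
        if votes.size == 0:  # sample was in every bootstrap set: skip it
            continue
        majority = np.bincount(votes).argmax()
        counted += 1
        wrong += int(majority != y[i])
    return wrong / counted if counted else float("nan")
```

Samples that landed in every bootstrap set are skipped, since no tree can give them an unbiased vote.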

In one embodiment, the method further comprises:

Specifically, interpreting the prediction result means calculating the service reliability score. The alarm quantity vector A and the occurrence probability vector P obtained by model prediction are weighted and summed to obtain the expected alarm quantity s = P·A^T. This value is then substituted into the score formula and normalized to obtain the final reliability score.

To analyze the practical effect of the model, the past twelve months of historical alarm data were used to predict and evaluate the April 2018 reliability scores of fifty services in the validation set. Line 1 shows the reliability scores calculated from the actual alarm counts, and line 2 shows the reliability scores estimated from the model predictions. The final result is shown in FIG. 2. The predicted and evaluated scores clearly follow the trend of the actual results, and they reflect the real reliability of the services, i.e., the risk that alarms will occur.

For a service to be analyzed, the random forest classification model yields the possible alarm counts A at a certain future time and their occurrence probabilities P (i.e., the predicted probability of each class). The alarm counts and occurrence probabilities are weighted and summed, and the score formula then gives a reliability score in [0, 1]. The closer the score is to 1, the more reliable the service will be at that future time and the lower the probability of an alarm. Otherwise, the service performance is not reliable enough and needs close attention.

In general, the advantages of the method are as follows. The model learns automatically from the historical alarm records of the past twelve months and evaluates and predicts service reliability for the next month. This improves risk early warning for low-reliability services, so losses can be prevented and stopped in time.

Example two

Based on the same inventive concept, this embodiment provides a power communication service reliability assessment and prediction device based on a hybrid LSTM and random forest model. Referring to FIG. 3, the device includes:

the data preprocessing module 201 is configured to acquire historical alarm record information of the power communication service, and preprocess the historical alarm record information;

the LSTM network model training module 202 is used for constructing an LSTM network model and training the LSTM network model by utilizing the preprocessed historical alarm record information;

the time sequence prediction module 203 is used for predicting the power communication service data to be predicted by using the trained LSTM network model to obtain a time sequence prediction result;

the data splicing module 204 is configured to acquire basic information of the power communication service, splice a predicted time sequence prediction result with the basic information of the power communication service, and then perform normalization processing on the spliced data;

the random forest model training module 205 is configured to input the normalized data into a random forest model for training, so as to obtain a trained random forest model;

and the reliability evaluation module 206 is configured to predict the power communication service data to be predicted by using the trained random forest model to obtain a reliability evaluation result.

The device introduced in the second embodiment of the present invention is used to implement the power communication service reliability assessment and prediction method based on a hybrid LSTM and random forest model of the first embodiment. Therefore, a person skilled in the art can understand its specific structure and variations from the method described in the first embodiment, and details are not repeated here. All devices used in the method of the first embodiment of the present invention fall within the protection scope of the present invention.

EXAMPLE III

Referring to FIG. 4, based on the same inventive concept, the present application further provides a computer-readable storage medium 300 on which a computer program 311 is stored. When executed, the program implements the method of the first embodiment.

The computer-readable storage medium introduced in the third embodiment of the present invention is used to implement the power communication service reliability assessment and prediction method based on a hybrid LSTM and random forest model of the first embodiment. Therefore, a person skilled in the art can understand its specific structure and variations from the method described in the first embodiment, and details are not repeated here. Any computer-readable storage medium used in the method of the first embodiment of the present invention falls within the scope of the present invention.

Example four

Based on the same inventive concept, the present application further provides a computer device. Referring to FIG. 5, it includes a memory 401, a processor 402, and a computer program 403 stored in the memory and executable on the processor. When the processor 402 executes the program, the method of the first embodiment is implemented.

The computer device introduced in the fourth embodiment of the present invention is used to implement the power communication service reliability assessment and prediction method based on a hybrid LSTM and random forest model of the first embodiment. Therefore, a person skilled in the art can understand its specific structure and variations from the method described in the first embodiment, and details are not repeated here. All computer devices used in the method of the first embodiment of the present invention fall within the scope of the present invention.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims

1. A method for evaluating and predicting reliability of power communication service based on LSTM and random forest mixed model is characterized by comprising the following steps:

2. The method of claim 1, wherein the preprocessing of the historical alarm record information in S1 comprises: performing data division, time-series processing, and standardization on the historical alarm record information.

3. The method according to claim 1, wherein the pre-processed historical alarm record information is time series data, the time series data includes characteristic attributes, and the S2 specifically includes:

4. The method according to claim 3, wherein S2.3 specifically comprises:

S2.3.1: calculating the LSTM cell output by forward propagation;

5. The method of claim 1, wherein the basic information of the power communication service includes a service type, a service bandwidth, and an interface type, and the splicing of the predicted timing prediction result and the basic information of the power communication service in S4 and the preprocessing of the spliced data include:

S4.2: normalizing the samples.

6. The method of claim 1, wherein S5 specifically comprises:

7. The method of claim 1, wherein the method further comprises:

8. A power communication service reliability assessment and prediction device based on a hybrid LSTM and random forest model, characterized by comprising:

9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed, implements the method of any one of claims 1 to 7.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the program.