CN111260249A - Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model - Google Patents

Info

Publication number
CN111260249A
Authority
CN
China
Prior art keywords
data
power communication
communication service
lstm
random forest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010091057.2A
Other languages
Chinese (zh)
Other versions
CN111260249B (en
Inventor
李石君
赵远
杨济海
李学礼
龚红霞
余伟
余放
李宇轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010091057.2A priority Critical patent/CN111260249B/en
Publication of CN111260249A publication Critical patent/CN111260249A/en
Application granted granted Critical
Publication of CN111260249B publication Critical patent/CN111260249B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method for assessing and predicting the reliability of power communication services based on a hybrid LSTM and random forest model. It belongs to the field of time-series analysis and classification/regression and relates to LSTM, random forest, and related techniques. Working mainly from communication network service records and service alarm records, the method constructs a hybrid LSTM and random forest classification model, trains the model with the Adam optimization method, and uses the trained model for classification tasks. The advantages of the invention are that the model learns automatically from the historical alarm records of the past twelve months, service reliability in the following month is assessed and predicted, risk early warning for low-reliability services is improved, and losses can be prevented or stopped in time.

Description

Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model
Technical Field
The invention relates to the technical field of time sequence analysis and classification regression, in particular to a method and a device for evaluating and predicting reliability of power communication service based on an LSTM and random forest mixed model.
Background
Reliability of power communication services: the power communication network is a dedicated network of the power system. The communication services it carries are mainly those related to power production and operation, such as relay protection services, safety and stability services, dispatching automation services, dispatching telephone services, administrative telephone services, data communication services, and communication environment monitoring services. These services place special requirements on the reliability of channel routing, and the reliability of channel routing directly affects the safe and stable operation of the power system. Reliability assessment and prediction for power communication services is therefore essential.
Power communication management system: the dedicated power communication network is an important support of the smart grid. The communication management system SG-TMS follows "two-level deployment" (headquarters and provincial companies) and "four-level application" (headquarters, branches, provincial companies, and city and county companies). Through standardized project construction and vigorous promotion of practical use, SG-TMS has been deeply integrated into the daily work of tens of thousands of power communication professionals and has comprehensively collected several years of construction, operation, and management data for tens of thousands of devices. The accumulated mass of power communication data, together with data from numerous external systems and public data, forms the basis for big data analysis.
Communication network service records: the management system for smart grid communication stores a large amount of service record information, service operating status information, information on the channels used by services, and so on. This includes both standard structured data, such as service opening time, operating time, and service type, and a large amount of semi-structured data. The service type reflects the application field of the service, but service allocation in the power network is variable, especially for spare service channels. A service always consists of one or more channels, and each channel connects a number of sites. In daily production management, historical alarm records of the sites are kept, including information such as alarm time, alarm type, and alarm recovery time.
The inventor of the present application finds that the method of the prior art has at least the following technical problems in the process of implementing the present invention:
Existing reliability evaluation techniques mostly analyze and evaluate the reliability of a service in its current state from fault information or service importance. For service reliability over a future period, they can give only a general trend analysis, such as the observation that the longer a service has been in operation, the lower its reliability.
That is to say, the prior art has the technical problem that the prediction result is not accurate enough.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for estimating and predicting reliability of power communication service based on LSTM and a random forest mixed model, so as to solve or at least partially solve the technical problem in the prior art that a prediction result is not accurate enough.
In order to solve the above technical problem, a first aspect of the present invention provides a method for estimating and predicting reliability of power communication service based on LSTM and a random forest mixed model, including:
S1: acquiring historical alarm record information of the power communication service, and preprocessing the historical alarm record information;
S2: constructing an LSTM network model, and training the LSTM network model with the preprocessed historical alarm record information;
S3: predicting the power communication service data to be predicted with the trained LSTM network model to obtain a time-series prediction result;
S4: acquiring basic information of the power communication service, splicing the time-series prediction result with the basic information of the power communication service, and then normalizing the spliced data;
S5: inputting the normalized data into a random forest model for training to obtain a trained random forest model;
S6: predicting the power communication service data to be predicted with the trained random forest model to obtain a reliability evaluation result, wherein the reliability evaluation result comprises the predicted alarm quantity of the power communication service and its occurrence probability.
In one embodiment, the preprocessing of the historical alarm record information in S1 includes: performing data division, time-sequence processing, and standardization on the historical alarm record information.
In an embodiment, the preprocessed historical alarm record information is time series data, where the time series data includes a characteristic attribute, and S2 specifically includes:
S2.1: constructing an input layer and an output layer, wherein the number of input-layer nodes equals the number of characteristic attributes of the time-series data to be input, and the output layer has 1 node, which outputs the time-series prediction result;
S2.2: constructing a hidden layer, wherein the hidden layer is a single-layer recurrent neural network built from LSTM cells;
S2.3: taking the preprocessed historical alarm record information as training data, defining a loss function, and training the LSTM network model with a gradient-based optimization algorithm to obtain the trained LSTM network model.
In one embodiment, S2.3 specifically includes:
S2.3.1: calculating the LSTM cell output by forward propagation;
S2.3.2: calculating the error term of each LSTM cell backward, propagating in two directions, through time and through the network layers, and calculating the gradient of each weight from the corresponding error term;
S2.3.3: updating the weights with the gradient-based optimization algorithm, wherein the mean absolute error is selected as the error measure, and the loss function in the training process is:
$$\mathrm{loss} = \frac{1}{m}\sum_{i=1}^{m}\left|h(x_i) - y_i\right|$$
where $m$ is the length of the training data, $h(x_i)$ is the value returned by the network model, and $y_i$ is the true value of the sample. Minimizing the loss function is set as the optimization target; given a network initialization seed, a learning rate $\eta$, and a number of training steps (steps), the network weights are updated continuously with the Adam optimization algorithm until the trained LSTM network model is obtained.
In one embodiment, the basic information of the power communication service includes a service type, a service bandwidth, and an interface type, and the S4 splices the predicted timing prediction result with the basic information of the power communication service, and then preprocesses the spliced data, including:
S4.1: splicing the service time-series data obtained by time-series prediction with service basic information, such as service type, service bandwidth, and interface type, to form the characteristic attributes of the samples;
S4.2: normalizing the samples.
In one embodiment, S5 specifically includes:
S5.1: generating sub-datasets by drawing samples at random with replacement;
S5.2: training each sub-decision tree independently on its generated sub-dataset, wherein, when a sub-decision tree is trained, the optimal split feature is selected from the feature information by its Gini value (GINI), calculated as:
$$\mathrm{GINI}(D) = 1 - \sum_{i=1}^{T} p_i^{2}$$
where $T$ is the number of sample classes contained in the sample set $D$ and $p_i$ is the proportion of samples of class $i$ in the total sample; GINI($D$) is inversely related to the purity of the sample set $D$;
S5.3: verifying and analyzing with the out-of-bag error rate to finally obtain the trained random forest model.
In one embodiment, the method further comprises:
computing the weighted sum of the predicted alarm quantity vector A of the power communication service and the probability vector P of the input data belonging to each category, to obtain the expected alarm quantity s = P · A^T;
normalizing the expected value to obtain the final reliability score, wherein the normalization formula is:
[Formula image not reproduced: score as a function of s, min(A), and max(A)]
where score is the final reliability score, min(A) is the minimum value of the alarm quantity vector, and max(A) is the maximum value of the alarm quantity vector.
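The weighted summation s = P · A^T can be illustrated with a short sketch. The function name and the example vectors are hypothetical and only show the arithmetic; the subsequent normalization into a score is not shown, because its formula is not reproduced in this text.

```python
import numpy as np

def expected_alarm_count(alarm_counts, probabilities):
    """Return s = P . A^T, the probability-weighted alarm quantity."""
    a = np.asarray(alarm_counts, dtype=float)
    p = np.asarray(probabilities, dtype=float)
    if a.shape != p.shape:
        raise ValueError("A and P must have the same length")
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("P must sum to 1")
    return float(p @ a)

# Hypothetical example: the categories are alarm quantities 0..3.
A = [0, 1, 2, 3]
P = [0.5, 0.25, 0.125, 0.125]
s = expected_alarm_count(A, P)  # 0*0.5 + 1*0.25 + 2*0.125 + 3*0.125
```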
Based on the same inventive concept, the second aspect of the present invention provides an apparatus for estimating and predicting reliability of power communication service based on LSTM and random forest mixed model, comprising:
the data preprocessing module is used for acquiring historical alarm record information of the power communication service and preprocessing the historical alarm record information;
the LSTM network model training module is used for constructing an LSTM network model and training the LSTM network model by utilizing the preprocessed historical alarm record information;
the time sequence prediction module is used for predicting the power communication service data to be predicted by utilizing the trained LSTM network model to obtain a time sequence prediction result;
the data splicing module is used for acquiring basic information of the power communication service, splicing a predicted time sequence prediction result with the basic information of the power communication service, and then carrying out normalization processing on the spliced data;
the random forest model training module is used for inputting the data after the normalization processing into a random forest model for training to obtain a trained random forest model;
and the reliability evaluation module is used for predicting the power communication service data to be predicted by using the trained random forest model to obtain a reliability evaluation result.
Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed, performs the method of the first aspect.
Based on the same inventive concept, a fourth aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
The invention provides a method for assessing and predicting the reliability of power communication services based on a hybrid LSTM and random forest model. First, the trained LSTM network model predicts the power communication service data to be predicted, giving a time-series prediction result. This result is then spliced with the basic information of the service, so that the dynamic time-series features and the static features are input together into a random forest model for training. The trained random forest model then predicts the power communication service data to be predicted, giving the predicted category and probability. Compared with the prior art, which yields only a general trend analysis such as "the longer a service has been in operation, the lower its reliability", prediction accuracy is improved, so that risk early warning for low-reliability services is improved and losses can be prevented or stopped in time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of an implementation of a reliability assessment and prediction method for power communication services based on a mixed model of LSTM and random forest in an embodiment;
FIG. 2 is a line graph of the results of the model validation set;
FIG. 3 is a block diagram of a power communication service reliability assessment and prediction device based on a mixed model of LSTM and random forest in the embodiment of the present invention;
FIG. 4 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention;
FIG. 5 is a block diagram of a computer device in an embodiment of the present invention.
Detailed Description
The invention aims to provide a power communication service reliability assessment and prediction method based on an LSTM and random forest mixed model, which is used for predicting the alarm number and the occurrence probability of a power communication service at a certain time in the future, so that the prediction accuracy is improved.
The general inventive concept of the present invention is as follows:
A method for assessing and predicting the reliability of power communication services based on a hybrid LSTM and random forest model is provided. It belongs to the field of time-series analysis and classification/regression and relates to LSTM, random forest, and related techniques. Working mainly from communication network service records and service alarm records, the method constructs a hybrid LSTM and random forest classification model, trains the model with the Adam optimization method, and uses the trained model for classification tasks. The advantages of the invention are that the model learns automatically from the historical alarm records of the past twelve months, service reliability in the following month is assessed and predicted, risk early warning for low-reliability services is improved, and losses can be prevented or stopped in time.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment provides a method for evaluating and predicting reliability of power communication service based on an LSTM and random forest mixed model, which comprises the following steps:
s1: and acquiring historical alarm record information of the power communication service, and preprocessing the historical alarm record information.
Specifically, historical alarm record information is typical time-series data, so an LSTM model can be trained on it to predict the number of alarm records over time. The historical alarm record information can be obtained from a database. The preprocessing may include data division, standardization, and so on.
S2: and constructing an LSTM network model, and training the LSTM network model by utilizing the preprocessed historical alarm record information.
Specifically, the LSTM network model, also called the long short-term memory recurrent neural network, is an improvement on recurrent neural networks (RNNs). It mitigates problems of conventional RNNs such as vanishing gradients and insufficient long-term memory, and performs well on time-series data analysis, so that the recurrent network can make effective use of long-range temporal information.
S3: and predicting the power communication service data to be predicted by using the trained LSTM network model to obtain a time sequence prediction result.
S4: acquiring basic information of the power communication service, splicing the predicted time sequence prediction result with the basic information of the power communication service, and then performing normalization processing on the spliced data.
Specifically, this step uses the time-series prediction result obtained from the LSTM network model as a dynamic feature and the basic information of the power communication service as static features, in preparation for the subsequent training of the random forest model.
S5: and inputting the data after the normalization processing into a random forest model for training to obtain a trained random forest model.
In particular, the training process may generate the sub-datasets by drawing samples at random with replacement.
S6: and predicting the power communication service data to be predicted by using the trained random forest model to obtain a reliability evaluation result, wherein the reliability evaluation result comprises the predicted alarm quantity and occurrence probability of the power communication service.
Specifically, the LSTM prediction exploits temporal order, so its output captures the dynamic characteristics of the data over time. This prediction is then used, together with the other static features of the service, as an input feature for training the random forest model, which finally yields the alarm quantity of the power service (the category) and the probability vector of the input data belonging to each category (the occurrence probability).
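The text does not state how the random forest derives the probability vector. One simple possibility, shown here purely as an assumption, is to take the fraction of sub-decision trees voting for each category; the function name and the example votes are hypothetical.

```python
import numpy as np

def vote_probabilities(tree_predictions, classes):
    """Fraction of trees voting for each class, in the order given by `classes`."""
    preds = np.asarray(tree_predictions)
    return np.array([(preds == c).mean() for c in classes])

classes = [0, 1, 2]                          # hypothetical alarm-quantity categories
tree_predictions = [1, 1, 0, 1, 2, 1, 0, 1]  # one vote per sub-decision tree
probs = vote_probabilities(tree_predictions, classes)
predicted = classes[int(np.argmax(probs))]   # category with the most votes
```

Other implementations average the per-tree class probabilities instead of counting hard votes; either way the output is one category plus a probability vector over all categories.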
Fig. 1 is a schematic diagram of an implementation flow of a power communication service reliability assessment and prediction method based on an LSTM and random forest mixed model in an embodiment, where an original fault time sequence is historical alarm record information obtained from a database, and other service attributes are service basic information such as a service type, a service bandwidth, and an interface type.
In one embodiment, the preprocessing of the historical alarm record information in S1 includes: performing data division, time-sequence processing, and standardization on the historical alarm record information.
Specifically, the site historical alarm records in the database are first read with Python, records of alarms that occurred in the past twelve months are extracted, and the alarm times and alarm counts are converted into time-series form (i.e., time-sequence processing). The channel information and the start and end sites of each service are then read from the database, and the routing path of the service is found by combining the SDH time slot cross table, the topology base class table, and the equipment table. Finally, the historical alarm counts of the service are aggregated into time-series form.
In an embodiment, the preprocessed historical alarm record information is time series data, where the time series data includes a characteristic attribute, and S2 specifically includes:
S2.1: constructing an input layer and an output layer, wherein the number of input-layer nodes equals the number of characteristic attributes of the time-series data to be input, and the output layer has 1 node, which outputs the time-series prediction result;
S2.2: constructing a hidden layer, wherein the hidden layer is a single-layer recurrent neural network built from LSTM cells;
S2.3: taking the preprocessed historical alarm record information as training data, defining a loss function, and training the LSTM network model with a gradient-based optimization algorithm to obtain the trained LSTM network model.
Specifically, in S2.1, the time-series data to be input is converted into a vector form as follows:
$$D_i = (x_1, x_2, \ldots, x_M)^{T}, \quad i \in \{1, 2, 3, \ldots, N\}$$
where $M$ is the number of features of the data, $D_i$ is the data of the $i$-th record, and $N$ is the number of input records, i.e., the number of training samples.
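One way to arrange a series into N sample vectors of M features is a sliding window. The windowing scheme, the window length of 12 (chosen to match the twelve-month history mentioned earlier), and the function names are assumptions, not the patent's stated procedure.

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance scaling; a constant series is only centered."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / (std if std > 0 else 1.0)

def make_windows(series, window=12):
    """Build (inputs, targets): each input row holds `window` values, the target is the next value."""
    x = np.asarray(series, dtype=float)
    if len(x) <= window:
        raise ValueError("series must be longer than the window")
    inputs = np.stack([x[i:i + window] for i in range(len(x) - window)])
    targets = x[window:]
    return inputs, targets

series = np.arange(15)                       # stand-in for a monthly alarm-count series
X, y = make_windows(standardize(series), window=12)
```

Here each row of `X` plays the role of one vector D_i with M = 12 and N = 3.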
In S2.2, a single-layer recurrent neural network is built from LSTM cells, with tanh as the activation function. The LSTM cell contains gate structures such as an input gate, a forget gate, a cell state update, and an output gate. The input $x_t$ at time $t$ and the output $h_{t-1}$ at time $t-1$ are concatenated and fed into the cell for calculation. The forward-propagation formulas, in order, are as follows:
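$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$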
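A forward step of one LSTM cell can be sketched in NumPy as below. The candidate state and cell-state update use the standard LSTM form; the parameter names, layer sizes, and random initialization are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM cell."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                    # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                              # cell output
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, features = 4, 3                                   # hypothetical sizes
params = {}
for gate in "fiCo":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden + features))
    params[f"b_{gate}"] = np.zeros(hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(12, features)):               # a 12-step input sequence
    h, c = lstm_step(x_t, h, c, params)
```

In a full model, the final `h` would feed the single output node that produces the time-series prediction.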
In S2.3, network training mainly updates the weights of the hidden layer.
In one embodiment, S2.3 specifically includes:
S2.3.1: calculating the LSTM cell output by forward propagation;
S2.3.2: calculating the error term of each LSTM cell backward, propagating in two directions, through time and through the network layers, and calculating the gradient of each weight from the corresponding error term;
S2.3.3: updating the weights with the gradient-based optimization algorithm, wherein the mean absolute error is selected as the error measure, and the loss function in the training process is:
$$\mathrm{loss} = \frac{1}{m}\sum_{i=1}^{m}\left|h(x_i) - y_i\right|$$
where $m$ is the length of the training data, $h(x_i)$ is the value returned by the network model, and $y_i$ is the true value of the sample. Minimizing the loss function is set as the optimization target; given a network initialization seed, a learning rate $\eta$, and a number of training steps (steps), the network weights are updated continuously with the Adam optimization algorithm until the trained LSTM network model is obtained.
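The interplay of the mean-absolute-error loss and the Adam update can be sketched on a toy problem. To keep the example short, a linear model stands in for the LSTM, and the Adam hyperparameters are the commonly used defaults rather than values from the patent; the seed, learning rate, and step count are likewise illustrative.

```python
import numpy as np

def mae_loss(pred, target):
    """Mean absolute error over the training data."""
    return float(np.mean(np.abs(pred - target)))

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

rng = np.random.default_rng(42)              # fixed initialization seed
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5])           # synthetic targets

w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
initial_loss = mae_loss(X @ w, y)
for t in range(1, 501):                      # training steps
    residual = X @ w - y
    grad = X.T @ np.sign(residual) / len(y)  # subgradient of the MAE loss
    w, m, v = adam_step(w, grad, m, v, t)
final_loss = mae_loss(X @ w, y)
```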
In one embodiment, the basic information of the power communication service includes a service type, a service bandwidth, and an interface type, and the S4 splices the predicted timing prediction result with the basic information of the power communication service, and then preprocesses the spliced data, including:
S4.1: splicing the service time-series data obtained by time-series prediction with service basic information, such as service type, service bandwidth, and interface type, to form the characteristic attributes of the samples;
S4.2: normalizing the samples.
Specifically, character-type attributes among the characteristic attributes, such as service bandwidth, are converted by numerical mapping and one-hot encoding.
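The splicing and normalization in S4 can be sketched as follows. Min-max scaling is an assumed choice of normalization, and the feature values, category names, and function names are hypothetical.

```python
import numpy as np

def one_hot(values):
    """One-hot encode a list of categorical values; returns the matrix and the category order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded, categories

def min_max_normalize(x):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

predicted_alarms = np.array([[0.2], [3.1], [1.4]])   # hypothetical LSTM outputs (dynamic feature)
service_types = ["relay protection", "dispatching telephone", "relay protection"]
interface_types = ["E1", "Ethernet", "E1"]           # hypothetical interface names

type_oh, type_names = one_hot(service_types)
iface_oh, iface_names = one_hot(interface_types)
samples = min_max_normalize(np.hstack([predicted_alarms, type_oh, iface_oh]))
```

Each row of `samples` is one spliced feature vector: the dynamic prediction followed by the encoded static attributes.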
In one embodiment, S5 specifically includes:
S5.1: generating sub-datasets by drawing samples at random with replacement;
S5.2: training each sub-decision tree independently on its generated sub-dataset, wherein, when a sub-decision tree is trained, the optimal split feature is selected from the feature information by its Gini value (GINI), calculated as:
$$\mathrm{GINI}(D) = 1 - \sum_{i=1}^{T} p_i^{2}$$
where $T$ is the number of sample classes contained in the sample set $D$ and $p_i$ is the proportion of samples of class $i$ in the total sample; GINI($D$) is inversely related to the purity of the sample set $D$;
S5.3: verifying and analyzing with the out-of-bag error rate to finally obtain the trained random forest model.
Specifically, the smaller GINI(D) is, the higher the purity of the sample set, that is, the fewer categories the sample set D contains. Splitting on the feature that most improves sample purity allows the decision tree model to be built quickly and reasonably. For example, dividing the sample set D by feature A yields T sub-sample sets $\{D_1, D_2, \ldots, D_T\}$, and then
$GINI(D, A) = \sum_{i=1}^{T} \frac{|D_i|}{|D|}\, GINI_i$
where |D| is the total number of samples, $|D_i|$ is the number of samples in the generated subset $D_i$, and $GINI_i$ is $GINI(D_i)$ of that subset. Optimal feature selection chooses the feature A that minimizes GINI(D, A) for the split.
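The Gini-based feature selection described above can be sketched as follows, assuming categorical feature values so that each distinct value of a feature forms one subset. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """GINI(D) = 1 - sum(p_i^2) over the classes present in D."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(feature_values, labels):
    """GINI(D, A): size-weighted Gini of the subsets produced by feature A."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_feature(rows, labels, feature_indices):
    """Pick the candidate feature whose split minimizes GINI(D, A)."""
    return min(
        feature_indices,
        key=lambda j: gini_split([r[j] for r in rows], labels),
    )
```

For a two-class set split evenly, `gini` returns 0.5; a feature that separates the classes perfectly gives a split value of 0 and is therefore chosen.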
In addition, if M denotes the feature dimension of each sample, then when training a sub-decision tree a constant m ≪ M is specified, a subset of m features is randomly selected from the M features, and each time a split feature is chosen, the feature with the largest information gain among these m features is selected. Following the bagging idea of ensemble learning, the training goal of each sub-decision tree is to represent its feature subset as well as possible and to fit the feature information fully. Pruning is therefore not used when training the sub-decision trees, which improves the noise resistance of the random forest model and reduces the possibility of overfitting.
S5.3: the random forest model can use the out-of-bag error rate in place of cross-validation to estimate the error. This gives an unbiased estimate of the generalization error of the random forest, and the result is similar to that of k-fold cross-validation, which requires a large amount of computation.
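Sampling with replacement (S5.1) and out-of-bag validation (S5.3) can be illustrated with the sketch below. A simple lookup learner on one feature stands in for a full sub-decision tree; that learner, the function names, and the default number of trees are assumptions made only to keep the example short.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indices from range(n) with replacement (one sub-dataset)."""
    return [rng.randrange(n) for _ in range(n)]


def fit_lookup(rows, labels):
    """Stand-in learner: majority label for each value of feature 0."""
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[0], Counter())[y] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return lambda row: table.get(row[0], default)


def oob_error(rows, labels, fit, n_trees=25, seed=0):
    """Out-of-bag error: each sample is scored only by learners
    whose bootstrap sample did not contain it."""
    rng = random.Random(seed)
    n = len(rows)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = bootstrap_indices(n, rng)
        in_bag = set(idx)
        predict = fit([rows[i] for i in idx], [labels[i] for i in idx])
        for i in range(n):
            if i not in in_bag:
                votes[i][predict(rows[i])] += 1
    scored = [(v.most_common(1)[0][0], labels[i])
              for i, v in enumerate(votes) if v]
    if not scored:
        return None  # no sample was ever out of bag
    return sum(p != y for p, y in scored) / len(scored)
```

Because every learner leaves out roughly a third of the samples, the out-of-bag votes give a validation estimate without setting aside a separate fold.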
In one embodiment, the method further comprises:
carrying out a weighted summation of the predicted alarm quantity vector A of the power communication service and the occurrence probability vector P to obtain the expected value of the alarm quantity, $s = P \cdot A^T$;
carrying out normalization processing on the expected value to obtain a final reliability score, wherein the normalization formula is:
$score = \frac{\max(A) - s}{\max(A) - \min(A)}$
where score represents the final reliability score, min(A) represents the minimum value of the alarm quantity vector, and max(A) represents the maximum value of the alarm quantity vector.
Specifically, to interpret the prediction result, that is, to calculate the service reliability score, the alarm quantity vector A and the occurrence probability vector P obtained from model prediction are weighted and summed to obtain the expected value of the alarm quantity, $s = P \cdot A^T$, which is then substituted into the normalization formula to obtain the final reliability score.
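The score calculation can be sketched as below. The normalization formula appears in the original only as an image, so this example assumes inverted min-max scaling, (max(A) − s) / (max(A) − min(A)), which is consistent with a score in [0,1] where values near 1 mean fewer expected alarms. The function name is hypothetical.

```python
def reliability_score(alarm_counts, probabilities):
    """Expected alarm count s = P . A^T, mapped to a [0, 1] reliability score.

    Assumes inverted min-max normalization over the alarm quantity vector A.
    """
    s = sum(p * a for p, a in zip(probabilities, alarm_counts))
    lo, hi = min(alarm_counts), max(alarm_counts)
    if hi == lo:
        raise ValueError("alarm quantity vector must contain distinct values")
    return (hi - s) / (hi - lo)
```

For example, with candidate alarm quantities A = [0, 1, 2, 3] and probabilities P = [0.7, 0.2, 0.1, 0.0], the expected alarm quantity is 0.4 and the score is 2.6 / 3, roughly 0.87.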
To analyze the utility of the model, the historical alarm data of the past twelve months were used to predict and evaluate the service reliability scores for April 2018 of fifty services in the validation set. The reliability scores calculated from the actual alarm quantities are shown as line 1, and the reliability scores estimated from the model prediction are shown as line 2; the final result is shown in Fig. 2 of the accompanying drawings. The predicted and evaluated result clearly reflects the trend of the actual result, and the reliability score obtained by prediction reflects the real reliability of the service, that is, the risk that an alarm will occur.
Through the random forest classification model, the possible alarm quantity A of the service under analysis at a given future time and its occurrence probability P (that is, the predicted probability of the assigned class) can be obtained. The alarm quantities and occurrence probabilities are weighted and summed, and the score calculation formula yields a reliability score in the range [0,1]. The closer the score is to 1, the more reliable the service is at that future time and the lower the probability that an alarm occurs. Conversely, a lower score indicates that the service is not reliable enough and needs close attention.
In general, the advantage of the method is that the model can be trained automatically from the historical alarm records of the past twelve months to evaluate and predict the service reliability of the next month, which improves risk early warning for low-reliability services so that losses can be prevented or stopped in time.
Example two
Based on the same inventive concept, this embodiment provides an apparatus for evaluating and predicting the reliability of power communication service based on an LSTM and random forest mixed model. Referring to Fig. 3, the apparatus includes:
the data preprocessing module 201 is configured to acquire historical alarm record information of the power communication service, and preprocess the historical alarm record information;
the LSTM network model training module 202 is used for constructing an LSTM network model and training the LSTM network model by utilizing the preprocessed historical alarm record information;
the time sequence prediction module 203 is used for predicting the power communication service data to be predicted by using the trained LSTM network model to obtain a time sequence prediction result;
the data splicing module 204 is configured to acquire basic information of the power communication service, splice the time-series prediction result with the basic information of the power communication service, and then perform normalization processing on the spliced data;
the random forest model training module 205 is configured to input the normalized data into a random forest model for training, so as to obtain a trained random forest model;
and the reliability evaluation module 206 is configured to predict the power communication service data to be predicted by using the trained random forest model to obtain a reliability evaluation result.
Since the apparatus described in the second embodiment of the present invention is the apparatus used to implement the power communication service reliability assessment and prediction method based on the LSTM and random forest mixed model of the first embodiment, a person skilled in the art can, based on the method described in the first embodiment, understand the specific structure and variations of the apparatus, and details are therefore not repeated here. All apparatuses used to implement the method of the first embodiment of the present invention fall within the protection scope of the present invention.
Example three
Referring to Fig. 4, based on the same inventive concept, the present application further provides a computer-readable storage medium 300 on which a computer program 311 is stored; when executed, the program implements the method of the first embodiment.
Since the computer-readable storage medium described in the third embodiment of the present invention is the computer-readable storage medium used to implement the power communication service reliability assessment and prediction method based on the LSTM and random forest mixed model of the first embodiment, a person skilled in the art can, based on the method described in the first embodiment, understand the specific structure and variations of the computer-readable storage medium, and details are therefore not repeated here. Any computer-readable storage medium used to implement the method of the first embodiment of the present invention falls within the protection scope of the present invention.
Example four
Based on the same inventive concept, the present application further provides a computer device. Referring to Fig. 5, the device includes a memory 401, a processor 402, and a computer program 403 stored in the memory and executable on the processor; when the processor 402 executes the program, the method of the first embodiment is implemented.
Since the computer device described in the fourth embodiment of the present invention is the computer device used to implement the power communication service reliability assessment and prediction method based on the LSTM and random forest mixed model of the first embodiment, a person skilled in the art can, based on the method described in the first embodiment, understand the specific structure and variations of the computer device, and details are therefore not repeated here. All computer devices used to implement the method of the first embodiment of the present invention fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (10)

1. A method for evaluating and predicting reliability of power communication service based on LSTM and random forest mixed model is characterized by comprising the following steps:
S1: acquiring historical alarm record information of the power communication service, and preprocessing the historical alarm record information;
S2: constructing an LSTM network model, and training the LSTM network model by utilizing the preprocessed historical alarm record information;
S3: predicting the power communication service data to be predicted by using the trained LSTM network model to obtain a time-series prediction result;
S4: acquiring basic information of the power communication service, splicing the time-series prediction result with the basic information of the power communication service, and then performing normalization processing on the spliced data;
S5: inputting the normalized data into a random forest model for training to obtain a trained random forest model;
S6: predicting the power communication service data to be predicted by using the trained random forest model to obtain a reliability evaluation result, wherein the reliability evaluation result comprises the predicted alarm quantity and occurrence probability of the power communication service.
2. The method of claim 1, wherein the preprocessing of the historical alarm record information in S1 comprises: carrying out data division, time-series processing and standardization processing on the historical alarm record information.
3. The method according to claim 1, wherein the pre-processed historical alarm record information is time series data, the time series data includes characteristic attributes, and the S2 specifically includes:
S2.1: constructing an input layer and an output layer, wherein the number of nodes of the input layer is the same as the number of feature attributes of the time-series data to be input, and the number of nodes of the output layer is 1, the output node being used to output the time-series prediction result;
S2.2: constructing a hidden layer, wherein the hidden layer is a single-layer recurrent neural network constructed from LSTM cells;
S2.3: taking the preprocessed historical alarm record information as training data, defining a loss function, and training the LSTM network model with a gradient-based optimization algorithm to obtain the trained LSTM network model.
4. The method according to claim 3, wherein S2.3 specifically comprises:
S2.3.1: calculating the output of each LSTM cell by forward propagation;
S2.3.2: calculating the error term of each LSTM cell by backpropagation, which proceeds in two directions, through time and through the network layers, and calculating the gradient of each weight from the corresponding error term;
S2.3.3: updating the weights with a gradient-based optimization algorithm, wherein the mean absolute error is selected as the error measure, and the loss function used in training is:
$\mathrm{loss} = \frac{1}{m}\sum_{i=1}^{m}\left|h(x_i) - y_i\right|$
where m is the training data length, $h(x_i)$ is the value returned by the network model, and $y_i$ is the true value of the sample; minimizing the loss function is set as the optimization target, a network initialization seed, a learning rate η and a number of training steps (steps) are given, and the network weights are continuously updated with the Adam optimization algorithm, finally yielding the trained LSTM network model.
5. The method of claim 1, wherein the basic information of the power communication service includes a service type, a service bandwidth, and an interface type, and in S4, splicing the time-series prediction result with the basic information of the power communication service and then preprocessing the spliced data includes:
S4.1: splicing the service time-series data obtained by time-series prediction with basic service information such as the service type, service bandwidth, and interface type, to form the feature attributes of each sample;
S4.2: normalizing the samples.
6. The method of claim 1, wherein S5 specifically comprises:
S5.1: generating sub-datasets by random sampling with replacement;
S5.2: training each sub-decision tree independently on its generated sub-dataset, wherein, when a sub-decision tree is trained, the optimal split feature is selected using the feature information, specifically by means of the Gini index (GINI value), calculated as:
$GINI(D) = 1 - \sum_{i=1}^{T} p_i^2$
where T is the number of sample classes contained in the sample set D, and $p_i$ is the proportion of samples of class i among all samples; GINI(D) is inversely related to the purity of the sample set D;
S5.3: performing validation and analysis using the out-of-bag error rate, finally obtaining the trained random forest model.
7. The method of claim 1, wherein the method further comprises:
carrying out a weighted summation of the predicted alarm quantity vector A of the power communication service and the occurrence probability vector P to obtain the expected value of the alarm quantity, $s = P \cdot A^T$;
carrying out normalization processing on the expected value to obtain a final reliability score, wherein the normalization formula is:
$score = \frac{\max(A) - s}{\max(A) - \min(A)}$
where score represents the final reliability score, min(A) represents the minimum value of the alarm quantity vector, and max(A) represents the maximum value of the alarm quantity vector.
8. A power communication service reliability assessment and prediction device based on an LSTM and random forest mixed model, characterized by comprising:
the data preprocessing module is used for acquiring historical alarm record information of the power communication service and preprocessing the historical alarm record information;
the LSTM network model training module is used for constructing an LSTM network model and training the LSTM network model by utilizing the preprocessed historical alarm record information;
the time sequence prediction module is used for predicting the power communication service data to be predicted by utilizing the trained LSTM network model to obtain a time sequence prediction result;
the data splicing module is used for acquiring basic information of the power communication service, splicing the time-series prediction result with the basic information of the power communication service, and then carrying out normalization processing on the spliced data;
the random forest model training module is used for inputting the data after the normalization processing into a random forest model for training to obtain a trained random forest model;
and the reliability evaluation module is used for predicting the power communication service data to be predicted by using the trained random forest model to obtain a reliability evaluation result.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed, implements the method of any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the program.
CN202010091057.2A 2020-02-13 2020-02-13 Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model Active CN111260249B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010091057.2A CN111260249B (en) 2020-02-13 2020-02-13 Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010091057.2A CN111260249B (en) 2020-02-13 2020-02-13 Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model

Publications (2)

Publication Number Publication Date
CN111260249A true CN111260249A (en) 2020-06-09
CN111260249B CN111260249B (en) 2022-08-05

Family

ID=70945636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010091057.2A Active CN111260249B (en) 2020-02-13 2020-02-13 Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model

Country Status (1)

Country Link
CN (1) CN111260249B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107769972A (en) * 2017-10-25 2018-03-06 武汉大学 A kind of power telecom network equipment fault Forecasting Methodology based on improved LSTM
US20180137412A1 (en) * 2016-11-16 2018-05-17 Cisco Technology, Inc. Network traffic prediction using long short term memory neural networks
CN110750641A (en) * 2019-09-24 2020-02-04 武汉大学 Classification error correction method based on sequence connection model and binary tree model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
方勇 等: ""基于LSTM与随机森林混合构架的钓鱼网站识别研究"", 《工程科学与技术》 *
李旭阳 等: ""LSTM与随机森林购买行为预测模型研究"", 《青岛大学学报(工程技术版)》 *
杨济海 等: ""基于并行的F-LSTM模型及其在电力通信设备故障预测中的应用"", 《武汉大学学报(理学版)》 *

Also Published As

Publication number Publication date
CN111260249B (en) 2022-08-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant