CN115438190A

CN115438190A - Power distribution network fault decision-making assisting knowledge extraction method and system

Info

Publication number: CN115438190A
Application number: CN202211086406.7A
Authority: CN
Inventors: 李智; 刘正祎; 李默涵; 张瑶瑶; 张海; 倪玉露; 刘鑫蕊; 裴玉杰; 金银龙; 王野; 袁明阳; 路学文; 贾俊海; 吴厚毅
Original assignee: Fushun Power Supply Co Of State Grid Liaoning Electric Power Supply Co ltd; State Grid Corp of China SGCC
Current assignee: Fushun Power Supply Co Of State Grid Liaoning Electric Power Supply Co ltd; State Grid Corp of China SGCC
Priority date: 2022-09-06
Filing date: 2022-09-06
Publication date: 2022-12-06
Anticipated expiration: 2042-09-06
Also published as: CN115438190B

Abstract

The invention discloses a power distribution network fault assistant decision knowledge extraction method and system. The original text data are collated and aggregated and then vectorized to form a word vector set; entity extraction is performed on the word vector set and the extracted entities are labeled; using a multi-classification principle, each labeled attribute relation of the same entity is assigned to a relation class, completing entity relation extraction; the relations between adjacent labeled entities are trained, labeling errors between entities are repaired, and the structured association relations between the repaired entities are output; and the knowledge extraction result is evaluated according to the error range between the original text data and the structured association relations of the repaired entities. The system comprises a data preprocessing module, a Bi-LSTM module, a weight association model, an error correction module and a model evaluation module. The method and system extract knowledge from power distribution network fault-handling texts, stay closer to the original semantics, and help optimize the knowledge extraction process.

Description

Power distribution network fault decision-making assisting knowledge extraction method and system

Technical Field

The invention relates to the technical field of power transmission and distribution, in particular to a power distribution network fault auxiliary decision knowledge extraction method and system.

Background

As national energy technology reform accelerates, power systems and users place ever higher demands on electric energy transmission and distribution, which raises the requirements and challenges for power grid dispatching operation. In today's power market, end users are geographically dispersed, renewable energy is connected to the grid at large scale, and power supply reliability must be guaranteed. In emergencies such as sudden faults, however, real-time fault data are analyzed and calculated by the dispatcher alone, who screens and decides on the basis of textual material and empirical knowledge. This wastes substantial resources and greatly prolongs the fault time. Moreover, because big data in the power Internet of Things is diverse and voluminous, misjudgment and misoperation occur easily across different task scenarios, which affects the operating stability and reliability of the power system.

Disclosure of Invention

In order to improve the stability and reliability of the operation of the power system, make accurate judgment and operation under different task scenes and avoid prolonging the fault time, the invention provides a power distribution network fault auxiliary decision knowledge extraction method and system.

The adopted technical scheme is as follows:

on one hand, the invention provides a power distribution network fault assistant decision-making knowledge extraction method, which comprises the steps of vectorizing obtained original text data to form a word vector set retaining original semantics;

performing entity extraction on the word vector set, and labeling the obtained entities;

distributing the labeled attribute relation of the same entity to a certain relation class by adopting a multi-classification principle to finish labeling of various attribute relations of the entity;

training the relationship between adjacent labeled entities, repairing the labeling error between the entities, and outputting the structural association relationship between the repaired entities;

and evaluating the knowledge extraction result according to the error range of the structured incidence relation between the original text data and each repaired entity.

Further, vectorizing the obtained original text data to form a word vector set retaining original semantics comprises: collating and summarizing the non-text data in the input original text data into text data, either manually or with character conversion software; segmenting the text data with Python code, using punctuation marks as delimiters, to form segmented text data; and vectorizing the segmented text data with a word vector training tool, the vectorized data being collected into a data set after multiple cycles to form a word vector set that retains the original semantics.

Preferably, the non-text data includes one or more of operation procedure, treatment plan, scheduling procedure, table of fault information and scheduling instruction, picture and voice.

Further, after the non-text data in the input original text data have been collated and summarized into text data, either manually or with character conversion software, missing value processing, abnormal value processing, repeated value processing and noise filtering are performed on the text data.

Further, performing entity extraction on the word vector set and labeling the obtained entities comprises: learning the input with a Bi-LSTM combined model, extracting the entities through an LSTM network, and labeling the entities extracted from the text data with the BIEOS entity labeling method, specifically as follows:

the LSTM network receives the word vector set as input and performs learning; it comprises a receiving gate $i_t$, a discarding gate $f_t$, a result gate $o_t$ and a data recording gate $C_t$;

the discarding gate processes the text data to be discarded, and the discarded content is determined by the following formula:

$$f_t = \sigma\left(W_{xf}\,x_t + W_{hf}\,h_{t-1} + b_f\right)$$

wherein: $x_t$ characterizes the receiving variable at time $t$; $h_{t-1}$ characterizes the deep layer result of the previous period $t-1$; $W_{xf}$ is the weight of $x_t$; $W_{hf}$ is the weight of $h_{t-1}$; $b_f$ is an offset;

the receiving gate $i_t$ calculates the information to be stored when the LSTM network performs a cell update, using the following formulas:

$$i_t = \sigma\left(W_{xi}\,x_t + W_{hi}\,h_{t-1} + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_{xc}\,x_t + W_{hc}\,h_{t-1} + b_c\right)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

wherein: $W_{xi}$ is the weight of the receiving variable $x_t$; $W_{hi}$ is the weight of the previous deep layer result $h_{t-1}$; $\tilde{C}_t$ is the hypothetical cell state; $W_{xc}$ is the weight of $x_t$ for $\tilde{C}_t$; $W_{hc}$ is the weight of $h_{t-1}$ for $\tilde{C}_t$; $b_i$ and $b_c$ characterize the error amounts of $i_t$ and $\tilde{C}_t$ respectively; $C_t$ represents the current cell state; $C_{t-1}$ characterizes the cell state of the previous period; $\sigma(\cdot)$ in $f_t$ and in $i_t$ characterizes CAVs whose parameters are transformed by the Sigmoid function; $\tanh(\cdot)$ in $\tilde{C}_t$ characterizes CAVs whose parameters are transformed by the tanh function; CAVs are activation vectors;

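the result gate $o_t$ outputs the entities extracted by the Bi-LSTM combined model:

$$o_t = \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + b_o\right)$$

$$h_t = o_t \odot \tanh\left(C_t\right)$$

in the formula: $W_{xo}$ is the weight of the receiving variable $x_t$; $W_{ho}$ is the weight of the previous deep layer result $h_{t-1}$; $b_o$ characterizes an error amount; $h_t$ characterizes the output result of the LSTM network; $\sigma(\cdot)$ in $o_t$ characterizes CAVs whose parameters are transformed by the Sigmoid function.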

Further, a weight association mechanism is introduced to improve the weight parameters; the text data are trained according to different weight parameters, and key contents are screened for entity relation extraction, specifically as follows:

the input text data are learned and trained with the following formulas, and are selectively processed in parallel:

$$h_t = \overrightarrow{h_t} \oplus \overleftarrow{h_t}$$

$$u_t = \tanh\left(W_w\,h_t + b_w\right)$$

$$\alpha_t = \frac{\exp\left(u_t\right)}{\sum_{k=1}^{n}\exp\left(u_k\right)}$$

in the formula: $u_t$ characterizes the relative degree of importance of the non-apparent state $h_t$; $b_w$ characterizes the error amount of a given vector; $W_w$ is the weight automatically assigned by the weight association model; $n$ characterizes the number of independent parameters in the Bi-LSTM network; $\tanh(\cdot)$ in $u_t$ characterizes CAVs whose parameters are transformed by the tanh function; $\overrightarrow{h_t}$ characterizes the look-ahead relationship of each text datum; $\overleftarrow{h_t}$ characterizes the subsequent association of each text datum; $\oplus$ is an operator;

and the obtained data are substituted into the following formula to determine the structural relationship among the different extracted entities:

$$s = \sum_{t=1}^{n} \alpha_t\,h_t$$

in the formula: $s$ is the final output of the weight association model; $\alpha_t$ is the weight that the weight association model assigns at time $t$ to the non-apparent state $h_t$.

Further, the interdependence between adjacent labeled entities is invoked for decoupling analysis, and the global optimal sequence solution of the output data and the knowledge extraction from the power distribution network fault-handling text are completed in turn; the probability of a correct output is calculated according to the following formula:

$$P(y \mid x) = \frac{\exp\left(\sum_{i=1}^{n}\left(E_{x_i,\,y_i} + T_{y_{i-1},\,y_i}\right)\right)}{\sum_{y'}\exp\left(\sum_{i=1}^{n}\left(E_{x_i,\,y'_i} + T_{y'_{i-1},\,y'_i}\right)\right)}$$

wherein: $y$ represents the result value of the Linear-Chain model in the error correction module; $x$ characterizes the received value of the Linear-Chain model in the error correction module; $E_{x_i,\,y_i}$ characterizes the emission probability; $T_{y_{i-1},\,y_i}$ characterizes the transition probability; $n$ is the number of elements in the vectors $x$ and $y$.

Further, the quality of the assistant decision knowledge extraction is evaluated with a reward and punishment mechanism, according to the error range between the obtained original text data and the structured association relations of the repaired entities. The evaluation function is as follows:

(formula not reproduced in the source text)

wherein F is the reward and punishment result, and the remaining symbols of the formula denote, in order: the error threshold range; the entity sequence error value; the relation sequence error value; the entity weight bias coefficient; and the relation weight bias coefficient;

the evaluation result of the decision reference value is then obtained by combining the following formula:

(formula not reproduced in the source text)

wherein the symbols of the formula denote, in order: the number of parameters of a quantity whose symbol is not reproduced; the total number of F; and the system error rate, expressed as a percentage.

On the other hand, the invention also provides a power distribution network fault assistant decision knowledge extraction system, which comprises the following modules:

the data preprocessing module is used for vectorizing the original text data after it has been collated and aggregated, to form a word vector set that retains the original semantics;

the Bi-LSTM module is used for performing entity extraction and multiple attribute relation labeling on the word vector set output by the data preprocessing module;

the error correction module is used for training the relationship between adjacent labeled entities in the Bi-LSTM module, repairing labeled errors existing in the Bi-LSTM module and outputting the structural association relationship between the repaired entities;

and the model evaluation module is used for evaluating the accuracy of the model according to the error ranges of the structured incidence relations between the sorted and aggregated original text data and the repaired entities.

Furthermore, the system is also provided with a weight association model which is used for screening the weight of each entity extracted from the input text data, identifying and judging the relation among the entities and extracting the relation; and repairing the structural association relationship among the entities through the error correction module.

The technical scheme of the invention has the following advantages:

A. The invention vectorizes the collated original text data and performs entity extraction and relation extraction on it. The extraction method adds a step that repairs labeling errors between entities: by training the relations between adjacent labeled entities, a global optimal sequence solution can be obtained for the output data of the weight association model. This finally realizes knowledge extraction from power distribution network fault-handling texts, improves system accuracy, stays closer to the original semantics, and helps optimize the knowledge extraction process.

B. The invention provides a power distribution network fault assistant decision knowledge extraction method and system based on weight association and error correction. It deeply learns the entities in power system text data together with their logic, organizational structure, operation and constraint information, and analyzes fault states and fault data parameters in real time. Combined with natural language processing technology, it converts unstructured content such as operation rules, treatment plans, scheduling rules and fault information into a structured knowledge network representation that supports reasoning. Relying on the data preprocessing module, the Bi-LSTM module, the weight association model, the error correction module and the model evaluation module, it processes unstructured and semi-structured data so that the text data are vectorized while their original semantics are retained, and it outputs assistant decisions within a relatively short time. This is the greatest advantage of applying knowledge graphs in the power field: it helps power system dispatchers make decisions in the power, dispatching, and operation and inspection fields and complete the series of emergency response plans for power distribution network fault handling, thereby advancing intelligent control and dispatching of the power system.

C. The system provided by the invention lays a foundation for subsequently constructing a power system fault decision knowledge graph. The topology of the power grid (for example, the positions of power equipment, components and other related information form a natural topological graph) is mapped onto the knowledge graph, which completes the preliminary work and ensures the completeness of the knowledge graph decision items. Compared with the traditional manual scheduling decision process, the system benefits from cloud computing and AI technology and can save substantial manpower, financial and material resources. Combined with review by professional dispatchers, it shortens fault-handling dispatch decision time while improving the professionalism and accuracy of decisions, achieves low-cost, fast-response deployment and conversion, and maximizes the benefit to the power grid company.

Drawings

In order to more clearly illustrate the embodiments of the present invention, the drawings which are needed to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained from the drawings without inventive labor to those skilled in the art.

FIG. 1 is a flow chart of a power distribution network fault assistant decision knowledge extraction method provided by the invention;

FIG. 2 is a flow chart of a Bi-LSTM module provided by the present invention;

FIG. 3 is a flowchart of the weight relevance model provided by the present invention;

FIG. 4 is a flow chart of the linear chain model provided by the present invention;

FIG. 5 is a comparison of the effect with and without the error correction module provided by the present invention;

FIG. 6 is a flow chart of accident history knowledge extraction provided by the present invention;

FIG. 7 is a structural diagram of a power distribution network fault assistant decision knowledge extraction system provided by the invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, the invention provides a power distribution network fault decision-making assisting knowledge extraction method, which comprises the following steps:

S01, vectorizing the collated and aggregated original text data to form a word vector set that retains the original semantics.

The data in a power system contain information such as entities, logic, organizational structure, operations and constraints, as well as unstructured content such as operation rules, treatment plans, scheduling rules and fault information. Natural language processing technology can convert such unstructured and semi-structured data into a structured knowledge network representation that supports reasoning. The text data are thus vectorized while their original semantics are retained, and assistant decisions are output within a relatively short time, which helps power system dispatchers handle power distribution network faults and advances intelligent control and dispatching of the power system.

The non-text data in the original text data are vectorized to form a word vector set that retains the original semantics as follows: the input non-text data are collated and summarized into text data, either manually or with character conversion software; the text data are segmented with Python code, using punctuation marks as delimiters; and the segmented text data are vectorized with a word vector training tool, the results being collected into a data set for storage after multiple cycles to form a word vector set that retains the original semantics. The word vector training tool may be Word2Vec, fastText, etc.
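By way of illustration only, the punctuation-based segmentation step may be sketched in Python as follows. The delimiter set and the function name `segment_text` are assumptions introduced for this sketch; the original text does not specify them.

```python
import re

# Assumed delimiter set: Chinese and Western sentence punctuation.
# A period only counts as a delimiter when followed by whitespace or the
# end of the text, so decimal values such as "10.5" stay intact.
_DELIMITERS = r"[。！？；!?;]|\.(?=\s|$)"


def segment_text(text: str) -> list[str]:
    """Split text into segments at punctuation marks, dropping empty pieces."""
    pieces = re.split(_DELIMITERS, text)
    return [piece.strip() for piece in pieces if piece.strip()]


sample = "Feeder 12 tripped at 10.5 kV; reclosing failed. Dispatch the patrol crew!"
segments = segment_text(sample)
for segment in segments:
    print(segment)
```

Each resulting segment would then be passed to the word vector training tool.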

After the non-text data in the original text data have been converted and collated into text data, the method further processes detail problems in the text data. This mainly comprises four processes: missing value processing, abnormal value processing, repeated value processing and noise filtering.

For text data, when a piece of data has a missing value, different operations may be performed according to the number of attributes of that piece of data. When the number of attributes is small (for example, only one associated attribute or none), the deletion method is selected, that is, the piece of data is deleted directly. This way of handling missing data suits cases where the number of samples is large and the missing data account for a small percentage of the sample total, and it is simple, convenient and easy to operate. When the data have a large number of attributes, the missing values may be filled by interpolation. Firstly, the maximum likelihood estimation method is adopted; secondly, the mean interpolation method; and finally, the regression interpolation method.

The maximum likelihood estimation interpolation method calls the sample information in the existing database and, reasoning in reverse, calculates the parameter that makes the observed result most probable. The method has a wide application range, covers most data conditions, and yields interpolation values with good effectiveness and consistency. Its maximum likelihood function is expressed as:

$$L(\theta) = \prod_{i=1}^{n} p\left(x_i;\,\theta\right)$$

When the parameter takes the value $\hat{\theta}$ and the function $L(\theta)$ reaches its maximum, the parameter $\hat{\theta}$ at this time is the estimated value sought.

The mean interpolation method collects the samples in the database into a data set and either computes their weighted average or sorts and screens the data set for the most frequently repeated value; the value obtained replaces the missing data as the interpolation value. The calculation can be done using the following simplified equation:

$$\bar{y} = \frac{\sum_{i=1}^{n} r_i\,y_i}{\sum_{i=1}^{n} r_i}$$

wherein the parameter $r_i$ characterizes the response condition, $r_i = 1$ being the default "yes" state and $r_i = 0$ the default "no" state; $y_i$ is the $i$-th sample value; and $n$ characterizes the quantity.

The regression interpolation method analyzes the variables that need to be filled in together with the variables that are completely present in the existing data set, and estimates the missing variable by constructing a regression equation that contains both. Let $x_i$ be the original independent variables of the system, $y$ the variable to be solved for, $\beta_0$ a constant parameter, and $\beta_i$ the weight coefficient of the $i$-th independent variable, with the random factor $\varepsilon$ taken into account. A data set is constructed and substituted into the following formula to obtain each missing interpolation value:

$$y = \beta_0 + \sum_{i=1}^{n} \beta_i\,x_i + \varepsilon$$
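As a minimal sketch of the two simpler interpolation methods described above, the following Python code fills missing values (represented here as `None`) by mean interpolation and by a least-squares regression on a single independent variable. The function names, the `None` convention and the restriction to one independent variable are assumptions made for brevity; the original text allows several independent variables.

```python
from statistics import mean


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def regression_impute(xs, ys):
    """Fit y = b0 + b1 * x on the complete pairs, then fill the missing y."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    b1 = sxy / sxx           # slope (weight coefficient)
    b0 = y_bar - b1 * x_bar  # intercept (constant parameter)
    return [b0 + b1 * x if y is None else y for x, y in zip(xs, ys)]


loads = [2.0, None, 4.0, 6.0]
print(mean_impute(loads))

hours = [1.0, 2.0, 3.0, 4.0]
temps = [10.0, 20.0, None, 40.0]
print(regression_impute(hours, temps))
```

Mean interpolation ignores any relationship between variables, whereas the regression variant uses the complete variable to predict the missing one.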

[ S012 ] Abnormal value processing. Irrelevant material in the text data is treated as abnormal data. Directly discarding abnormal data is not applicable here; data whose content is ambiguously defined is handed to a specialist for secondary adjustment, so that the content is made professional, polished and refined until it meets the usage requirements.

[ S013 ] Repeated value processing. For repeated text content in the data, the range of repeated content to be queried is defined first, and the repeated content is grouped using Group By. Having is then used to query the data whose count is greater than 1, and these data are stored separately in a temporary file list. "Select Distinct" (select with de-duplication) and "From" (query expression) are used in a loop statement to delete all repeated records stored in the temporary file list, yielding a temporary file list without repeated records. This list is merged with the data whose repetition count is not greater than 1 to obtain the final result set, which reduces unnecessary resource waste during system data processing.
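The grouping and de-duplication queries named above can be illustrated with Python's built-in `sqlite3` module. This is only a sketch: the table name `fault_text`, its single column and the sample rows are hypothetical, and the original text does not state which database is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fault_text (content TEXT)")  # hypothetical table
rows = [
    ("breaker tripped",),
    ("line overloaded",),
    ("breaker tripped",),
    ("fuse blown",),
    ("breaker tripped",),
]
conn.executemany("INSERT INTO fault_text VALUES (?)", rows)

# Records that occur more than once (GROUP BY + HAVING).
duplicated = conn.execute(
    "SELECT content, COUNT(*) FROM fault_text "
    "GROUP BY content HAVING COUNT(*) > 1"
).fetchall()

# De-duplicated result set (SELECT DISTINCT).
deduplicated = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT content FROM fault_text ORDER BY content"
    )
]
conn.close()

print(duplicated)
print(deduplicated)
```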

[ S014 ] Noise filtering. The raw data often contain random errors, which are referred to as noise. Noise data greatly affect the fidelity of the data set. Considering the diversity of forms of the original data, the system performs noise filtering using an outlier analysis method, a mean smoothing method and a regression method.

The outlier analysis method obtains groups of data through clustering; each group is called a cluster, and the data within a cluster are highly similar. Outliers, which fall outside the clusters because they differ greatly from the values inside them, are deleted to achieve noise reduction.

The mean smoothing method is commonly used to remove noise data from images: each pixel gray value with sequence characteristics is replaced by the mean of the data in its neighborhood, which denoises the data effectively.

The regression method first makes a preliminary judgment of the trend of the original data by visualization. Its central idea is to find a regression function that fits the original data, smooth the data with that function, and replace the original data set with the newly obtained one.
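As one hedged illustration of the noise filtering methods above, mean smoothing on a one-dimensional sequence may be sketched as follows. The function name `mean_smooth`, the `radius` parameter and the clipping of the window at the sequence ends are assumptions of this sketch.

```python
def mean_smooth(values, radius=1):
    """Replace each value by the mean of its neighborhood.

    The neighborhood spans `radius` items on each side and is clipped
    at the ends of the sequence.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


readings = [1.0, 1.0, 10.0, 1.0, 1.0]  # one noisy spike in the middle
print(mean_smooth(readings))
```

The spike is spread over its neighbors and reduced in height; a larger `radius` smooths more strongly at the cost of blurring genuine changes.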

[ S015 ] Based on the size of the data set, the Continuous Bag-of-Words (CBOW) algorithm is selected to generate word vectors. The text content is trained into word vector format with the word vector training software Word2Vec, which converts textual knowledge into a computer-recognizable form that participates in subsequent calculation and analysis, and realizes the feature and text representation for power distribution network fault assistant decision-making.
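CBOW learns a word vector by predicting a target word from the words around it. The sketch below only builds the (context, target) training pairs that such a model consumes; it does not train vectors and is not the Word2Vec implementation itself. The function name `cbow_pairs` and the `window` parameter are illustrative assumptions.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs as consumed by CBOW."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


tokens = ["feeder", "breaker", "tripped", "after", "overload"]
for context, target in cbow_pairs(tokens, window=1):
    print(context, "->", target)
```

In practice a library such as Word2Vec performs both the pair generation and the vector training internally.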

S02, performing entity extraction on the converted word vector set, outputting the overall features of the hidden layer, and labeling the obtained entities.

A Bi-LSTM combined model obtains and learns the established word vector set information; the LSTM network then performs the entity extraction of the knowledge extraction system; the Bi-LSTM network obtains two different features of each sentence, namely forward information and backward information, and outputs the overall features of the hidden layer through a certain algorithm; and the BIEOS method is selected to label the entities extracted from the text data.

The invention introduces a "gate" concept combined with a storage mechanism: after receiving information, the input gate calculates the information that the LSTM network needs to store according to the cell state and the current weight settings, and updates it on a rolling basis in real time. The specific steps are as follows:

[ S021 ] The LSTM network is an improvement on the RNN structure model; it receives the output of the preprocessed word vector set as input and learns from it. It consists of four parts: a receiving gate $i_t$, a discarding gate $f_t$, a result gate $o_t$ and a data recording gate $C_t$. The discarding gate processes the data to be discarded, and the discarded content is determined by the following formula:

$$f_t = \sigma\left(W_{xf}\,x_t + W_{hf}\,h_{t-1} + b_f\right)$$

wherein: $x_t$ characterizes the receiving variable at time $t$; $h_{t-1}$ characterizes the deep layer result of the previous time period $t-1$; $W_{xf}$ is the weight of $x_t$; $W_{hf}$ is the weight of $h_{t-1}$; $b_f$ is an offset.

The receiving gate calculates the information to be stored when the cell is updated in the LSTM network, and the information is determined by the following formulas:

$$i_t = \sigma\left(W_{xi}\,x_t + W_{hi}\,h_{t-1} + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_{xc}\,x_t + W_{hc}\,h_{t-1} + b_c\right)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

in the formula: $W_{xi}$ is the weight of the receiving variable $x_t$; $W_{hi}$ is the weight of the previous deep layer result $h_{t-1}$; $\tilde{C}_t$ is the hypothetical cell state; $W_{xc}$ is the weight of $x_t$ for $\tilde{C}_t$; $W_{hc}$ is the weight of $h_{t-1}$ for $\tilde{C}_t$; $b_i$ and $b_c$ characterize the error amounts of $i_t$ and $\tilde{C}_t$ respectively; $C_t$ represents the current cell state; $C_{t-1}$ characterizes the cell state of the previous time period; $\sigma(\cdot)$ in $f_t$ and in $i_t$ characterizes CAVs whose parameters are transformed by the Sigmoid function; $\tanh(\cdot)$ in $\tilde{C}_t$ characterizes CAVs whose parameters are transformed by the tanh function. (CAVs are activation vectors.)

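The result gate outputs the result of the LSTM model:

$$o_t = \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + b_o\right)$$

$$h_t = o_t \odot \tanh\left(C_t\right)$$

in the above formula: $W_{xo}$ is the weight of the receiving variable $x_t$; $W_{ho}$ is the weight of the previous deep layer result $h_{t-1}$; $b_o$ characterizes an error amount; $h_t$ characterizes the output result of the LSTM; $\sigma(\cdot)$ in $o_t$ characterizes CAVs whose parameters are transformed by the Sigmoid function.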
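The gate computations described above follow the standard LSTM update. A minimal sketch of one time step, using scalars instead of vectors to keep it short, is given below. The parameter names (`W_xf`, `b_f`, and so on) and the scalar simplification are assumptions of this sketch rather than notation confirmed by the original text.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative parameter names; each maps to one scalar weight or offset.
PARAM_NAMES = ("W_xf", "W_hf", "b_f", "W_xi", "W_hi", "b_i",
               "W_xc", "W_hc", "b_c", "W_xo", "W_ho", "b_o")


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; returns the new (h_t, c_t)."""
    f_t = sigmoid(p["W_xf"] * x_t + p["W_hf"] * h_prev + p["b_f"])      # discarding gate
    i_t = sigmoid(p["W_xi"] * x_t + p["W_hi"] * h_prev + p["b_i"])      # receiving gate
    c_hat = math.tanh(p["W_xc"] * x_t + p["W_hc"] * h_prev + p["b_c"])  # hypothetical cell state
    c_t = f_t * c_prev + i_t * c_hat                                    # updated cell state
    o_t = sigmoid(p["W_xo"] * x_t + p["W_ho"] * h_prev + p["b_o"])      # result gate
    h_t = o_t * math.tanh(c_t)                                          # output of the step
    return h_t, c_t


params = {name: 0.1 for name in PARAM_NAMES}
h, c = 0.0, 0.0
for x in (1.0, -0.5, 0.25):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

A Bi-LSTM runs one such recurrence from front to back and a second one from back to front, then combines the two outputs at each position.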

Because state in the LSTM network is transmitted in one direction only, from front to back, the LSTM network can capture only the preceding context of the text information and not the following context. The invention therefore adopts a Bi-LSTM network architecture that fuses, for each datum, two different relations, the preceding relation $\overrightarrow{h_t}$ and the subsequent relation $\overleftarrow{h_t}$, as shown in fig. 2. The final deep layer representation $h_t$ is then calculated, yielding a vectorized result set of the scheduling data of the knowledge extraction system that retains the original semantics.

The BIEOS coding scheme is a commonly used entity labeling method, where B (begin) marks the beginning of an entity; I (inside) marks data located inside the entity; O (outside) marks data located outside any entity; E (end) marks the end of the entity; and S (single) marks data that forms an entity by itself. It is not described in further detail herein.
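For illustration, the BIEOS scheme can be applied to a token sequence as sketched below. The function name `bieos_tags`, the span format `(start, end, label)` with an exclusive end, and the labels `EQUIP` and `TIME` are assumptions of this sketch.

```python
def bieos_tags(tokens, entities):
    """Tag tokens with BIEOS labels.

    entities: non-overlapping (start, end, label) spans, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        if end - start == 1:
            tags[start] = f"S-{label}"        # single-token entity
        else:
            tags[start] = f"B-{label}"        # beginning of the entity
            for k in range(start + 1, end - 1):
                tags[k] = f"I-{label}"        # inside the entity
            tags[end - 1] = f"E-{label}"      # end of the entity
    return tags


tokens = ["No.", "2", "main", "transformer", "tripped", "at", "noon"]
entities = [(0, 4, "EQUIP"), (6, 7, "TIME")]
for token, tag in zip(tokens, bieos_tags(tokens, entities)):
    print(token, tag)
```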

S03, focusing on the key knowledge in the converted text, and assigning each labeled attribute relation of the same entity to a relation class by adopting a multi-classification principle, thereby completing entity relation extraction.

The multi-classification principle here means that the same entity may have multiple attribute relations, and these attribute relations belong to different relation classes, while elements in the same relation class may have multiple attributes. The invention therefore classifies the attribute relations according to the multi-classification principle. The attribute relations may be assigned to different relation classes by content, such as ID, character, task, time, location, quantity, retrieval and status; they may also be assigned by characteristics, such as simple and composite attributes; single-value, multi-value and null-value attributes; derived attributes; and so on.

A weight association mechanism is introduced to improve the weight parameters. The knowledge extraction process is carried out selectively and can be computed in parallel; the text data are trained in a targeted manner according to different weight parameters and key contents are screened, which improves running speed and efficiency. The specific steps are as follows:

[ S031 ] The input data are learned and trained by applying the following formulas:

$$u_t = \tanh\left(W_w\,h_t + b_w\right)$$

$$\alpha_t = \frac{\exp\left(u_t\right)}{\sum_{k=1}^{n}\exp\left(u_k\right)}$$

in the formula: $u_t$ characterizes the relative degree of importance of the non-apparent state $h_t$; $b_w$ characterizes the error amount of a given vector; $W_w$ is the weight automatically assigned by the weight association model; $n$ characterizes the number of independent parameters in the Bi-LSTM network; $\tanh(\cdot)$ in $u_t$ characterizes CAVs whose parameters are transformed by the tanh function.

Substituting into the following equation gives the final result, as shown in fig. 3:

$$s = \sum_{t=1}^{n} \alpha_t\,h_t$$

in the formula: $s$ is the final output of the weight association model; $\alpha_t$ is the weight that the weight association model assigns at time $t$ to the non-apparent state $h_t$.
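The weight association step resembles a common attention pooling: each hidden state receives a tanh-based importance score, the scores are normalized into weights, and the output is the weighted sum of the hidden states. The sketch below assumes that reading; the function name `attention_pool`, the softmax normalization and the use of a single weight vector `w` are assumptions, not details confirmed by the original text.

```python
import math


def attention_pool(hidden_states, w, b):
    """Return (weights, pooled vector) for a list of hidden-state vectors."""
    # Importance score of each hidden state.
    scores = [
        math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b)
        for h in hidden_states
    ]
    # Normalize the scores so that the weights sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the hidden states.
    dim = len(hidden_states[0])
    pooled = [
        sum(a * h[d] for a, h in zip(alphas, hidden_states))
        for d in range(dim)
    ]
    return alphas, pooled


hidden = [[1.0, 0.0], [0.0, 1.0]]
alphas, pooled = attention_pool(hidden, w=[1.0, 0.0], b=0.0)
print(alphas, pooled)
```

With `w = [1.0, 0.0]` the first hidden state scores higher and therefore contributes more to the pooled vector.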

S04, training the relationship between adjacent labeled entities, repairing the labeled errors among the entities, and outputting the repaired structural association relationship among the entities.

And calling the interdependence relation between adjacent entity labels and carrying out decoupling analysis, thereby completing global optimal sequence solution on the output data of the weight association model and finally realizing the knowledge extraction of the fault processing text of the power distribution network.

As shown in FIG. 4, the probability of a correct output is estimated according to the following equation:

where the parameter […] characterizes the result value of the linear-chain (Linear-Chain) model in the error correction module;

[…] represents the received value of the Linear-Chain model in the error correction module;

[…] characterizes the emission probability;

[…] represents the transition probability;

[…] denotes the number of elements in the parameter vectors […] and […].

Taking the logarithm of both sides of the equation gives the simplified form:

The result sequence with the maximum overall probability in the prediction stage can then be obtained:

in the formula:

[…] characterizes a function of the specified predicted input value […].
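For a linear-chain model with emission and transition scores, the maximum-probability result sequence is commonly found with Viterbi decoding in the log domain. The sketch below assumes that approach; the function name `viterbi`, the tag set and all score values are hypothetical and chosen only to show how transition scores can repair a labeling error that per-position scoring would make.

```python
def viterbi(emissions, transitions):
    """Return (best tag path, its total log score).

    emissions[t][k]: log score of tag k at position t
    transitions[j][k]: log score of moving from tag j to tag k
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])
    back = []  # back[t-1][k] = best previous tag for tag k at position t
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_tags):
            best_j = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            pointers.append(best_j)
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[t][k])
        score = new_score
        back.append(pointers)
    # follow the back-pointers from the best final tag
    last = max(range(n_tags), key=lambda k: score[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

TAGS = ["B", "I", "E"]  # hypothetical subset of the BIEOS tags
# Made-up scores: taken alone, position 1 slightly prefers "B".
emissions = [[2.0, 0.0, 0.0], [1.0, 0.8, 0.0], [0.0, 0.0, 2.0]]
# Made-up transitions that penalize invalid moves such as B -> B.
transitions = [[-5.0, 0.0, 0.0], [-5.0, 0.0, 0.0], [0.0, -5.0, -5.0]]
path, total = viterbi(emissions, transitions)
decoded = [TAGS[k] for k in path]  # B, I, E
```

Choosing the best tag at each position independently would give B, B, E here; the transition penalty on B to B makes the globally best sequence B, I, E.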

S05, evaluate the knowledge extraction result according to the error range between the original text data summarized in S01 and the structured association relationship among the repaired entities.

The degree of deviation of the result sequence from the original sequence is compared, and the accuracy of the preceding work is judged from two aspects: first, the entity sequence deviation and the relation sequence deviation are each computed by comparison; second, the overall change polarity of the result sequence (that is, whether the data has increased or decreased relative to the original state) is obtained by comparison.

For the first aspect, a deviation in the entities greatly affects the accuracy of the scheduling decision in actual work, whereas a deviation in the relation data has a relatively small effect, so bias coefficients are used to represent the different degrees of influence of the entities and the relation data.

For the second aspect, the form of the respective reward and punishment parameters is determined. If the deviation is within the allowable error range, the decision is considered effective and relatively accurate, and the reward and punishment parameter takes the form of a reward factor; if the deviation is outside the allowable error range, the decision is considered unreliable, and the parameter takes the form of a penalty factor. If the sequence as a whole changes in the positive direction (that is, the content of the result sequence increases), the earlier decision contained omissions, which in serious cases can cause information loss, so the reward and punishment parameter enters the calculation as a quadratic function; if the sequence as a whole changes in the negative direction (that is, the content of the result sequence decreases), the decision has no obvious omissions but the screening is imprecise and the data are redundant, so the parameter enters the calculation as a linear function.

The following function is used for the final reward and punishment training, and its result serves as the criterion for judging the quality of the model:

wherein F is the reward and punishment result;

[…] is the error threshold range;

[…] is the entity sequence error value;

[…] is the relation sequence error value;

[…] is the entity weight bias coefficient;

[…] is the relation weight bias coefficient.

The system quality evaluation result can then be obtained:

wherein […] characterizes the number of parameters of […];

[…] characterizes the total number of F;

[…] characterizes the system error rate, expressed as a percentage.

The smaller the value, the higher the system accuracy and the higher the reference value of the decision;

the larger the value, the higher the system error rate and the lower the reference value of the decision, in which case the dispatcher needs to check the result carefully and judge and revise the system decision manually.
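The evaluation logic of S05 can be illustrated with a small sketch. Because the evaluation function itself is not reproduced in this text, everything below is an assumption made for illustration: the constant reward of 1.0, the default bias coefficients, applying the quadratic or linear form only to the amount by which the threshold is exceeded, and counting a negative F as an error when computing the error rate. The names `reward_punish` and `error_rate` are hypothetical.

```python
def reward_punish(entity_err, relation_err, threshold,
                  entity_bias=0.7, relation_bias=0.3, grew=True):
    """Return an illustrative reward and punishment result F.

    entity_bias > relation_bias reflects that entity deviations matter more.
    grew=True means the result sequence has more content than the original.
    All default values are hypothetical.
    """
    deviation = entity_bias * abs(entity_err) + relation_bias * abs(relation_err)
    if deviation <= threshold:
        return 1.0  # within tolerance: reward factor (assumed constant)
    excess = deviation - threshold
    # outside tolerance: quadratic penalty for growth, linear for shrinkage
    return -(excess ** 2) if grew else -excess

def error_rate(results):
    """Percentage of results that were penalized (assumed: F < 0)."""
    penalized = sum(1 for f in results if f < 0)
    return 100.0 * penalized / len(results)

# Made-up deviations for four evaluated samples.
results = [
    reward_punish(0.01, 0.02, threshold=0.05),
    reward_punish(0.5, 0.5, threshold=0.1, grew=True),
    reward_punish(0.02, 0.01, threshold=0.05),
    reward_punish(0.5, 0.5, threshold=0.1, grew=False),
]
rate = error_rate(results)
```

With these made-up inputs two of the four samples fall outside the threshold, so the illustrative error rate is 50 percent.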

In addition, the invention provides a power distribution network fault assistant decision knowledge extraction system comprising a data preprocessing module, a Bi-LSTM module, a weight association model, an error correction module and a model evaluation module, as shown in FIG. 7. The data preprocessing module performs a vectorization operation on the processed and aggregated text data to form a word vector set that retains the original semantics. It further comprises a missing value processing module, an abnormal value processing module, a repeated value processing module and a noise filtering processing module. The missing value processing module directly deletes text data that has few data attributes and fills in, by interpolation, text data that has many data attributes; the abnormal value processing module discards irrelevant data in the text data; the repeated value processing module deletes repeated text content in the text data; and the noise filtering processing module filters out random errors contained in the text data.

The Bi-LSTM module extracts and labels entities from the word vector set output by the data preprocessing module; the weight association model screens the weights of the entities extracted from the input text data, identifies and judges the relations among the entities, and extracts those relations;

and the model evaluation module evaluates the accuracy of the model according to the error range between the original text data and the structured association relationship among the repaired entities.

Examples

To verify the application value of the knowledge extraction system designed herein, fault reports and historical scheduling decision text data from a certain area are used as samples for experimental verification. The fault handled is a power outage caused by a switch trip, and the scheduling decision text analyzed concerns the power restoration operation after the fault.

First, the input non-text data are sorted and summarized into text data using character conversion software. The text data are segmented with Python code using punctuation marks as delimiters, the segmented text data are then vectorized with a word vector training tool, and after multiple cycles a word vector set that retains the original semantics is formed.
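A minimal sketch of the punctuation-based segmentation step follows. The delimiter set and the function name `segment` are assumptions, and the sample sentence adapts the example phrases quoted later in this description; the vectorization step is omitted because the word vector training tool is not named.

```python
import re

# Assumed delimiter set: common Chinese and English clause punctuation.
# The full stop "." is left out so that decimal numbers are not split.
PUNCTUATION = re.compile(r"[，,。；;！!？?]+")

def segment(text):
    """Split text at punctuation marks and drop empty segments."""
    return [part.strip() for part in PUNCTUATION.split(text) if part.strip()]

sample = ("accepting client application, examining, "
          "meeting new installation requirements; performing site investigation")
segments = segment(sample)
```

Each resulting segment would then be passed to the word vector training tool.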

Second, a Bi-LSTM combined model acquires and learns the information output by the previous module; an LSTM network performs entity extraction for the knowledge extraction system; and the BIEOS method is selected to label the entities extracted from the text data.
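In the BIEOS scheme each token is tagged as the Beginning, Inside or End of a multi-token entity, as a Single-token entity, or as Outside any entity. The sketch below converts entity spans into BIEOS tags; the function name `bieos_tags`, the sample tokens and the labels `DEV` and `LINE` are hypothetical, since the tagging rules of Table 1 are not reproduced in this text.

```python
def bieos_tags(tokens, spans):
    """Convert entity spans into BIEOS tags.

    spans: list of (start, end_exclusive, label) with non-overlapping ranges.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"      # single-token entity
        else:
            tags[start] = f"B-{label}"      # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"      # interior tokens
            tags[end - 1] = f"E-{label}"    # last token
    return tags

# Made-up example: one single-token entity and one three-token entity.
tokens = ["switch", "tripped", "on", "feeder", "line", "three"]
spans = [(0, 1, "DEV"), (3, 6, "LINE")]
tags = bieos_tags(tokens, spans)
```

The explicit E and S tags give a downstream sequence model clear entity boundaries, which is what allows transition constraints between adjacent labels to be learned.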

Next, the weight association model adjusts the weights, selectively processes the data input to the module in parallel, identifies and judges the relations that may exist among different entities, and completes relation extraction. The error correction module then repairs the small number of labeling errors from the Bi-LSTM module, establishes the association relationships among the output labels and outputs the final result.

Finally, the accuracy of the designed model is evaluated from the result of reward and punishment training, according to the error range between the original sequence at the receiving end of the previous module and the result sequence at the final output end.

The specific steps in the power grid model are as follows:

Setting the experiment parameters: the BIEOS labeling scheme is used, with the entity tagging rules shown in Table 1.

Table 1 entity tagging rules

The values of the parameters required for the experiment are shown in table 2:

TABLE 2 System parameter settings

To show that the combination of the Bi-LSTM module, the weight association model and the error correction module proposed by the invention performs entity extraction and relation extraction better than traditional methods, different test combinations are set up for comparison and the final result of the system is evaluated. The system accuracy under the different experimental conditions is shown in Table 3.

TABLE 3 comparison of accuracy rates for different models

A comparison experiment is set up to verify that the error correction module completes the globally optimal sequence solution on the output data of the weight association model, finally achieves knowledge extraction from the power distribution network fault handling text, and improves system accuracy. The first scheme has no error correction step, the second scheme uses the error correction module of the system of the invention, and the two schemes are otherwise identical. The results are shown in FIG. 5.

FIG. 6 visually shows the result of applying the knowledge extraction method of the system of the invention, taking the course of an accident and its fault handling in a certain area as the raw data. The power distribution network fault assistant decision knowledge extraction system comprises the following modules:

the data preprocessing module, which performs a vectorization operation on the summarized original text data to form a word vector set that retains the original semantics;

the weight association model, which screens the weights of the entities extracted from the input text data, identifies and judges the relations among the entities, and extracts those relations, the structured association relationship among the entities being repaired by the error correction module;

the error correction module, which trains the relationship between adjacent labeled entities in the Bi-LSTM module, repairs labeling errors in the Bi-LSTM module, and outputs the structured association relationship among the repaired entities;

First, the data preprocessing module segments the input text data into a plurality of segmented text data. The segmented text data are then vectorized with a word vector training tool to form a word vector set that retains the original semantics, for example 'accepting client application, examining, meeting new installation requirements, performing site investigation and determining a scheme, examining and approving a scheme ……' in the text data. The Bi-LSTM module extracts entities from the word vector set, outputs the overall features of the hidden layer and labels the obtained entities. The weight association model focuses on the key knowledge in the text data and discards unnecessary knowledge, and the attribute relations of the same entity are assigned to a relation class by a multi-classification principle, which completes the labeling of the various attribute relations. The error correction module trains the relationship between adjacent entity labels to obtain the global optimum of the text labels, and a flow diagram of the structured association relationship is finally formed, as shown in the right diagram of FIG. 6. Finally, the model evaluation module evaluates the work of the system from the output result, judging whether it agrees with the meaning expressed by the original text data.

The system of the invention performs text preprocessing and entity and entity-relation extraction on the left diagram of FIG. 6 and automatically generates a diagram of the structured association relationship between adjacent entities. Combined with review by professional dispatchers, the system can shorten the decision time of fault handling dispatch while improving the professionalism and accuracy of the decision, can be deployed and converted at low cost with quick response, and maximizes the benefit to power grid companies.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the invention can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the invention.

Matters not addressed in the present invention are as in the prior art.

It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom remain within the scope of the present invention.

Claims

1. A power distribution network fault auxiliary decision knowledge extraction method is characterized in that,

vectorizing the obtained original text data to form a word vector set retaining original semantics;

training the relationship between adjacent labeling entities, repairing the labeling errors among the entities, and outputting the structural association relationship among the repaired entities;

2. The power distribution network fault assistant decision knowledge extraction method according to claim 1, wherein performing the vectorization operation on the obtained original text data to form a word vector set retaining the original semantics comprises: sorting and summarizing the non-text data in the input original text data into text data, either manually or by using character conversion software; segmenting the text data with Python code using punctuation marks as delimiters to form segmented text data; and vectorizing the segmented text data with a word vector training tool and, after multiple cycles, collecting the results into a data set to form a word vector set that retains the original semantics.

3. The power distribution network fault assistant decision knowledge extraction method according to claim 2, wherein the non-text data comprise one or more of tables, pictures and voice recordings of operating procedures, handling plans, scheduling procedures, fault information and scheduling instructions.

4. The power distribution network fault assistant decision knowledge extraction method according to claim 2, wherein after the non-text data in the input original text data are sorted and summarized into text data, either manually or by using character conversion software, missing value processing, abnormal value processing, repeated value processing and noise filtering processing are further performed on the text data.

5. The power distribution network fault assistant decision knowledge extraction method of claim 4, wherein performing entity extraction on the word vector set and labeling the obtained entities comprises:

learning the input entities with a Bi-LSTM combined model, extracting the entities through an LSTM network, and labeling the entities extracted from the text data with the BIEOS entity labeling method, wherein the LSTM network specifically comprises:

a discarding gate […], a result gate […] and a receiving gate […];

wherein:

[…] characterizes the received variable at time t;

[…] characterizes the deep-layer result of the previous period […];

[…] is the weight of […];

[…] is the weight of […];

[…] is an offset;

the receiving gate […]: when the LSTM network performs a cell update, the information to be stored is calculated by the following formula:

wherein:

[…] is the weight of […];

[…] is the weight of […];

[…] is the hypothetical cell state;

[…] is the weight of […] for […];

[…] is the weight of […] for […];

[…] and […] characterize the error amounts of […] and […], respectively;

[…] represents the state of the current grid;

[…] characterizes the grid state of the previous period;

[…] characterizes the CAVs obtained by transforming the parameters in […] with the Sigmoid function;

[…] characterizes the CAVs obtained by transforming the parameters in […] with the Sigmoid function;

[…] characterizes the CAVs obtained by transforming the parameters in […] with the tanh function, the CAVs being activation vectors;

the result gate […] outputs the entities extracted by the Bi-LSTM combined model:

in the formula:

[…] is the weight of […];

[…] is the weight of […];

[…] characterizes an error amount;

[…] characterizes the output result of the LSTM network;

[…] characterizes the CAVs obtained by transforming the parameters in […] with the Sigmoid function.

6. The power distribution network fault assistant decision-making knowledge extraction method according to claim 5, wherein a weight association mechanism is introduced to improve weight parameters, text data are trained according to different weight parameters, and key contents are screened to extract entity relationships, and the method specifically comprises the following steps:

the input word vector set data are learned and trained with the following formula and are selectively processed in parallel:

in the formula:

[…] characterizes the relative importance of the hidden state […];

[…] characterizes the error amount of a given vector […];

[…] is the weight automatically assigned by the weight association model;

[…] characterizes the number of independent parameters in the Bi-LSTM network;

[…] characterizes the CAVs obtained by transforming the parameters in […] with the tanh function;

[…] characterizes the relation of each text datum to its preceding context;

[…] characterizes the relation of each text datum to its following context;

[…] is an operator;

in the formula:

[…] is the final output of the weight association model;

[…] is the weight that the weight association model assigns at time […] to the hidden state […].

7. The power distribution network fault assistant decision knowledge extraction method according to claim 5, wherein the interrelation between adjacent labeled entities is invoked and subjected to decoupling analysis, the globally optimal sequence solution for the output data and the knowledge extraction from the power distribution network fault handling text are completed in turn, and the probability of a correct output is calculated according to the following formula:

wherein:

[…] characterizes the emission probability;

[…] represents the transition probability;

[…] denotes the number of elements in the parameter vector […].

8. The power distribution network fault assistant decision knowledge extraction method according to claim 7, wherein the quality of the assistant decision knowledge extraction is evaluated with a reward and punishment mechanism according to the error range between the obtained original text data and the structured association relationship among the repaired entities, the evaluation function being as follows:

wherein F is the reward and punishment result;

[…] is the error threshold range;

[…] is the entity sequence error value;

[…] is the relation sequence error value;

[…] is the entity weight bias coefficient;

[…] is the relation weight bias coefficient;

[…] characterizes the number of parameters of […];

[…] represents the total number of F;

[…] characterizes the system error rate, expressed as a percentage.

9. The system for extracting the power distribution network fault auxiliary decision knowledge is characterized by comprising the following modules:

and the model evaluation module is used for evaluating the accuracy of the model according to the error range of the structured incidence relation between the summarized original text data and the repaired entities.

10. The system for extracting power distribution network fault assistant decision knowledge according to claim 9, wherein a weight association model is further provided in the system, and is used for performing weight screening on each entity extracted from the input text data, identifying and determining the relation among the entities, and performing relation extraction; and repairing the structural association relationship among the entities through the error correction module.