CN115617946A

CN115617946A - Power supply operation and maintenance full-factor data fusion method based on knowledge graph

Info

Publication number: CN115617946A
Application number: CN202211258052.XA
Authority: CN
Inventors: 邢春阳; 李国玉; 魏荣耀; 张岩; 安佰秀
Original assignee: Qingdao Urban Rail Transit Technology Co ltd; Qingdao Metro Group Co ltd
Current assignee: Qingdao Urban Rail Transit Technology Co ltd; Qingdao Metro Group Co ltd
Priority date: 2022-10-13
Filing date: 2022-10-13
Publication date: 2023-01-17

Abstract

A power supply operation and maintenance full-factor data fusion method based on a knowledge graph comprises the following steps: obtaining historical fault documents; extracting fault element information; sorting by the location of each fault and by fault duration; performing fault modeling to obtain a fault model; vectorizing the fault information with TF-IDF encoding to obtain a vector representation of the fault description; performing comparison verification; performing similarity matching; identifying the fault type; and pushing different information sets for different types of nodes.

Description

Power supply operation and maintenance full-factor data fusion method based on knowledge graph

Technical Field

The invention relates to the field of information processing, in particular to rail transit, and more particularly to a power supply operation and maintenance full-factor data fusion method based on a knowledge graph.

Background

Safe operation of rail transit depends on a safe, standardized and reliable power supply. The power supply system is the lifeblood of rail transit and one of its core systems: a failure or interruption can paralyze urban rail transit, endanger passengers' lives, and place heavy pressure on surface public transportation, with adverse effects on social stability and the city's image.

A knowledge graph is a modern theory that achieves multidisciplinary fusion by combining the theories and methods of mathematics, graphics, information visualization, information science and other disciplines with methods such as scientometric citation analysis and co-occurrence analysis, and by using visual graphs to vividly display a discipline's core structure, development history, frontier areas and overall knowledge framework.

In the prior art, the invention patent with publication number CN112307218A discloses a method for constructing a knowledge-graph-based fault diagnosis knowledge base for typical equipment of an intelligent power plant. Oriented directly to fault diagnosis of such equipment, it combines multi-modal fault diagnosis data from the plant and the Internet with expert knowledge to construct a fault diagnosis knowledge graph that is stored in the knowledge base, effectively raising the automation level of fault diagnosis. It redesigns a tower-shaped knowledge graph framework in a 'double-layer, three-element' form that is highly expressive while remaining easy to retrieve and apply. It uses a bidirectional GRU model to build, without supervision, description vectors for the text in the knowledge graph; these vectors carry the semantic information of the text, can be used to optimize the fault diagnosis knowledge graph, improve reasoning and computational efficiency, and are significant for the practical deployment of fault diagnosis knowledge graphs.

The invention patent with publication number CN114491037A discloses a knowledge-graph-based fault diagnosis method, device, equipment and medium. The method comprises: when equipment fails, determining a feature vector of the current faulty equipment based on an equipment fault knowledge graph constructed from the fault diagnosis data of historical faulty equipment; determining the similarity between this feature vector and the feature vector of each historical faulty equipment in the knowledge graph; and taking the diagnosis result of the historical faulty equipment with the highest similarity as the diagnosis result of the current faulty equipment, and pushing the corresponding solution to the current faulty equipment. This improves the accuracy and effectiveness of equipment fault diagnosis. Prior-art implementations rely on modeling, TF-IDF encoding, matching and the like, but none of them is specifically designed around the capacity and speed of data storage or the continuity of fault occurrence.

Subway power supply operation and maintenance covers a series of businesses, including ledger maintenance, operation monitoring, maintenance work, preventive testing, inspection, safety and emergency response. These businesses keep equipment running safely and stably and, at the same time, leave behind a large volume of historical data.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a power supply operation and maintenance full-factor data fusion method based on a knowledge graph.

The invention provides a power supply operation and maintenance full-factor data fusion method based on a knowledge graph, which comprises the following steps:

(1) Acquiring historical fault documents as source data and sorting the source data by fault location, then, within each location, sorting the source data by fault duration; extracting the historical fault documents with fault knowledge, in this classified and sorted order, to obtain the fault element information; and estimating the number of character strings corresponding to the fault element information;

(2) Organizing the fault element information according to the fault model schema and performing fault modeling to obtain a fault model;

(3) Concatenating all elements of the fault model into one complete character string and vectorizing the fault information with TF-IDF encoding to obtain a vector representation of the fault description; calculating, from this vector representation, the number of characters contained in the character string; randomly selecting one character of the string together with its position in the string; and converting that position into the corresponding position within the element information;

(4) Comparing the number of character strings estimated in step (1) with the calculated number of characters contained in the character string; if they are the same, looking up, at the position converted in step (3), the character in the corresponding element information from step (1) and comparing it with the selected character; if these characters are also the same, proceeding to the next step;

(5) Matching the vectorized fault description vector against the vectors in the fault knowledge base, calculating the matching degree between faults, and returning the fault record with the highest similarity in the knowledge base;

(6) Identifying the fault type and pushing different information sets for different types of nodes.

Wherein, the step (1) further comprises:

(1.1) dividing the historical fault documents layer by layer in a tree structure, wherein the information of each event is represented by a directed graph, and the event at each leaf node stores detailed information about the fault, covering the complete process of the fault event from occurrence to resolution;

(1.2) extracting entity and relation fault knowledge from the processed historical fault documents.

Wherein, in step (1.1), each fault event is split into three parts, namely event information, fault information and handling information, wherein:

event information stores the devices appearing in the fault event and the original text of the fault event, the original text including at least one of a summary, the impact and the cause analysis of the fault event;

fault information includes the fault occurrence characteristics and the fault name;

handling information identifies key actions and entities from the solutions given in the historical fault documents, taking each action as a relation and its object in the documents as the entity the relation points to.

In step (1), the fault element information is obtained from the historical fault documents by passing them through the TPLinker model of the fault knowledge extraction part.

The TPLinker model of the fault knowledge extraction part obtains the fault element information as follows:

given two token positions p1 and p2 and a specific relation type r, the model answers the following three questions:

a) whether p1 and p2 are the start and end positions of the same entity;

b) whether p1 and p2 are the start positions of two entities involved in relation r;

c) whether p1 and p2 are the end positions of two entities involved in relation r;

the TPLinker model answers these three questions with a handshaking tagging scheme, labeling three matrices for each relation to represent the different tagging results.

The fault elements comprise at least one of the substation, the faulty equipment, the fault description and the fault time.

In step (3), the fault information is vectorized with TF-IDF encoding to obtain the vector representation of the fault description, specifically:

the TF-IDF value is the product of TF and IDF, calculated as follows.

TF is calculated as:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$

where n_{i,j} is the number of occurrences of word i in document j.

With smoothing, IDF is calculated as:

$$idf = \log\frac{N}{N(x) + 1}$$

and the TF-IDF value is:

$$\mathrm{sim}_{\text{TF-IDF}} = tf_{i,j} \times idf$$

where N is the total number of documents, x represents an arbitrary word, N(x) is the number of documents in which the word x appears, and the +1 in the denominator N(x) + 1 is a smoothing operation that prevents the denominator from being zero.
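The TF-IDF encoding described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes each document is already a list of tokens (the text does not name a tokenizer), uses the standard TF definition, and applies the smoothed IDF log(N / (N(x) + 1)). The function names `tf`, `idf` and `tfidf_vectors` are hypothetical.

```python
import math
from collections import Counter


def tf(doc):
    """Term frequency: count of each word divided by the document length."""
    counts = Counter(doc)
    total = len(doc)
    return {word: count / total for word, count in counts.items()}


def idf(corpus):
    """Smoothed inverse document frequency: log(N / (N(x) + 1))."""
    n_docs = len(corpus)
    # N(x): number of documents containing word x (each document counted once).
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / (df + 1)) for word, df in doc_freq.items()}


def tfidf_vectors(corpus):
    """Return the sorted vocabulary and one dense TF-IDF vector per document."""
    idf_table = idf(corpus)
    vocab = sorted(idf_table)
    vectors = []
    for doc in corpus:
        tf_table = tf(doc)
        # TF-IDF value = tf * idf; words absent from the document get 0.
        vectors.append([tf_table.get(word, 0.0) * idf_table[word] for word in vocab])
    return vocab, vectors
```

For example, with the corpus `[["a", "b"], ["a", "c"], ["a", "d"], ["e"]]`, the word "a" gets an IDF of log(4 / 4) = 0 and contributes nothing to any vector. With this smoothing, a word that occurs in every document receives a small negative IDF.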

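Wherein, in step (5), the matching degree between faults is calculated with the cosine similarity:

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{k} A_k B_k}{\sqrt{\sum_{k} A_k^2}\,\sqrt{\sum_{k} B_k^2}}$$

where A and B are the TF-IDF vectors of the two fault descriptions.

In step (6), the fault information is pushed using the SpringBoot and MybatisPlus frameworks.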

The power supply operation and maintenance full-factor data fusion method based on a knowledge graph achieves the following:

the knowledge graph acts as the brain of power knowledge and holds a large amount of power data and fault knowledge; when a fault occurs, it quickly locates and analyzes the fault and gathers decision-supporting information from this body of knowledge, reducing dependence on expert experience and on real-time data.

Compared with the prior art, the source data are preprocessed by classifying and sorting them by fault location first and fault duration second, which improves the efficiency of all subsequent processing in the system; the data can also subsequently be transmitted in parallel and stored in a targeted manner according to their size, which greatly improves efficiency.

Compared with the prior art, a verification step with a purpose-designed verification scheme is added to the TF-IDF stage, so that the vectorized data are verified and their validity is ensured at a small computational cost.

Drawings

FIG. 1 is a general architecture diagram of a power supply operation and maintenance full-factor data fusion method based on a knowledge graph;

FIG. 2 is a tree structure diagram of a fault event;

FIG. 3 is a diagram of a fault event architecture;

FIG. 4 is a diagram of the handshaking tagging method;

FIG. 5 is a compressed representation of a matrix;

FIG. 6 is a model structure diagram;

FIG. 7 is a diagram of the fault matching structure.

Detailed Description

Reference will now be made in detail to the embodiments of the present invention, the following examples of which are intended to be illustrative only and are not to be construed as limiting the scope of the invention.

The invention provides a power supply operation and maintenance full-factor data fusion method based on a knowledge graph, a specific implementation of which is shown in FIGS. 1-7, wherein FIG. 1 is a general architecture diagram of the method, FIG. 2 is a tree structure diagram of a fault event, FIG. 3 is a fault event structure diagram, FIG. 4 is a diagram of the handshaking tagging method, FIG. 5 is a compressed representation of a matrix, FIG. 6 is a model structure diagram, and FIG. 7 is a fault matching structure diagram. The method is described in detail below.

To address the above problems, the present application provides a power supply operation and maintenance full-factor data fusion method based on a knowledge graph, which combines the knowledge graph with various kinds of business data for analysis and refinement and provides fault analysis and fault matching functions for various devices and nodes.

First, fault modeling is performed: fault knowledge is analyzed using the idea of event extraction and divided layer by layer, in a tree structure, by substation, equipment and number. For example, Licun Station contains equipment such as transformers and circuit breakers, which can be divided by number into circuit breaker 322, circuit breaker 201 and so on, each with a corresponding document description. The advantage is that, when a fault occurs, the approximate scope of the relevant fault knowledge can be located quickly through the tree structure. Detailed information about the fault, such as "fault name", "normal state", "fault signature" and "fault handling", is stored in each leaf event, and together these events describe the complete process of a fault from occurrence to resolution.

The method is described in detail below with reference to the drawings. As shown in FIG. 1, the data of the method come from historical fault documents. Fault element information is obtained by passing them through the TPLinker model of the fault knowledge extraction part and is organized according to the fault model to achieve fault modeling. On this basis, faults are vectorized through TF-IDF encoding and matched through cosine similarity; during matching, existing fault information is read from the fault knowledge base and matched against the newly obtained fault.

To avoid the shortcomings of the prior art, the capacity and speed of data storage and the continuity of fault occurrence are addressed in a targeted manner. In this step, the source data are classified and sorted with fault location taking priority over fault duration. That is, while the historical fault documents are taken as source data, they are first classified by fault location, which gives an initial grouping of faults across multiple pieces of equipment and of the fault types of the same equipment. Then, within each location, the source data are sorted by fault duration from longest to shortest. Preprocessing the source data in this way improves the efficiency of all subsequent processing in the system and, depending on data size, greatly improves the efficiency of subsequent parallel transmission and targeted storage.
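The location-first, duration-second ordering described above can be illustrated with a short sketch. The `FaultRecord` fields and the minute unit are assumptions made for illustration; the text only specifies grouping by fault location and ordering each group from the longest to the shortest fault duration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class FaultRecord:
    location: str        # where the fault occurred (hypothetical field name)
    duration_min: float  # fault duration in minutes (assumed unit)
    text: str            # original document text


def classify_and_sort(records):
    """Group records by fault location, then order each group by duration, longest first."""
    # Sorting by (location, -duration) puts equal locations together, longest first.
    ordered = sorted(records, key=lambda r: (r.location, -r.duration_min))
    # groupby requires input sorted by the grouping key, which it now is.
    return {loc: list(group) for loc, group in groupby(ordered, key=lambda r: r.location)}
```

Each location group can then be transmitted or stored independently, which is where the text places the gain in parallel transmission and targeted storage.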

In the fault modeling process, fault knowledge is analyzed using the idea of event extraction, and the fault documents are divided as shown in FIG. 2: substations, equipment and their numbers, and faults and their numbers are divided layer by layer in a tree structure. For example, Licun Station contains equipment such as transformers and circuit breakers, which can be divided by number into the 1# transformer, the 201 circuit breaker and so on; each piece of equipment has a corresponding document description, and its faults are named accordingly, such as a 1# transformer fault or a 201 circuit breaker fault. When a fault occurs, the approximate scope of the relevant fault knowledge can thus be located quickly through the tree structure.

Referring to FIG. 3, a fault can be treated as an event, and faults can be arranged in a tree according to their distribution, called an event tree; the information of an event can therefore be represented by a directed graph as shown in FIG. 3. The event at each leaf node stores detailed information about the fault, such as "fault name", "normal state", "fault feature" and "fault handling", and together these events describe the complete process of a fault from occurrence to resolution. A fault event can be divided into three parts, event information, fault information and handling information, connected through the relations 'occurrence' and 'compliance'. Based on the information obtained from the document, the event information stores a large amount of original text, such as the summary, impact and cause analysis of the event, together with the devices involved in the event. The fault information includes the fault occurrence characteristics and the fault name. The handling information identifies the key actions and entities in the solution given in the document: for "notify the electrician shift" and "restart the 304 integrated protection", the actions "notify" and "restart" are identified and taken as relations, and the objects "electrician shift" and "304 integrated protection" are taken as the entities the relations point to.
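The three-part fault event and the substation-equipment tree can be modeled with plain data classes. This is a hypothetical sketch: the class, field and function names are not from the text, and using the fault name as the head of each handling triple is an assumption about where each action relation starts.

```python
from dataclasses import dataclass, field


@dataclass
class FaultEvent:
    """A leaf event of the fault tree, split into three parts."""
    # Event information: devices involved plus original text (summary, impact, cause).
    devices: list
    original_text: str = ""
    # Fault information: fault name and occurrence characteristics.
    fault_name: str = ""
    features: list = field(default_factory=list)
    # Handling information: (action, object) pairs; the action acts as the relation.
    handling: list = field(default_factory=list)

    def handling_triples(self):
        """Return (fault name, action, object) triples; the fault-name head is an assumption."""
        return [(self.fault_name, action, obj) for action, obj in self.handling]


def add_to_tree(tree, substation, equipment, event):
    """Insert an event under substation -> equipment in a nested-dict event tree."""
    tree.setdefault(substation, {}).setdefault(equipment, []).append(event)
    return tree
```

Looking up `tree[substation][equipment]` then narrows a new fault to the relevant branch before any matching is done.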

Next, fault knowledge extraction is performed, mainly on fault record documents. Because pipeline models are prone to error propagation, and nested entities and overlapping relations occur frequently in this project, the TPLinker model is used to extract knowledge, i.e. entities and relations. TPLinker is a joint extraction method that can handle overlapping relations. Given two token positions p1 and p2 and a specific relation type r, the model answers the following three questions:

a) whether p1 and p2 are the start and end positions of the same entity;

b) whether p1 and p2 are the start positions of two entities involved in relation r;

c) whether p1 and p2 are the end positions of two entities involved in relation r.

The TPLinker model answers these three questions with a handshaking tagging scheme (Handshaking Tagging). For each relation, three token link matrices are labeled to represent the different tagging results, and all entities and overlapping relations can be extracted from these matrices. Because the model has no interdependent extraction steps, it does not depend on ground-truth inputs during training, which keeps training and testing consistent.

Data labeling uses several tag types, and the results are obtained by decoding the tags. Referring to FIG. 4, for a given sentence, a matrix is designed to represent the relations between all entities. The tags mainly need to express the following links:

a) Entity head to entity tail (EH-to-ET). The 5th tag and the last tag in row 5 of the figure indicate that the two positions are the head and tail of an entity; for example, for the entity "Power Supply and Electromechanical Department", the position ("Supply", "Department") is marked with a red-pink tag.

b) Subject head to object head (SH-to-OH). The coordinates of the 6th tag in row 1 of the figure represent the respective start positions of the two entities involved in a relation. For example, for the triple <"Wei Bi", "works in", "Power Supply and Electromechanical Department">, the two entities start with "Wei" and "Supply", so the position ("Wei", "Supply") is tagged as the 6th tag in row 1.

c) Subject tail to object tail (ST-to-OT). The last tag in row 5 of the figure, similar to the red tag, represents the respective end positions of the two entities involved in a relation. For example, for the triple <"Wei Bi", "works in", "Power Supply and Electromechanical Department">, the two entities end with "Bi" and "Department", so the position ("Bi", "Department") is tagged as the last tag in row 5.

Because EH-to-ET tags never fall in the lower triangle (an entity's head cannot come after its tail), the lower triangle is sparse. It cannot simply be discarded, however, because for some relations the object appears before the subject, and the SH-to-OH and ST-to-OT tags then fall in the lower triangle. To avoid wasting memory on a sparse matrix, each 1 in the lower triangle is moved to the corresponding transposed position in the upper triangle and tagged 2 instead. The lower triangle can then be discarded, and all information is expressed by the upper triangle alone.

The same tagging is performed for each relation. However, since EH-to-ET only marks entity positions, it can be shared by all relations and needs to be tagged only once. As shown in FIG. 4, with N relations, relation extraction is decomposed into 2N + 1 sequence labeling subtasks; for a sentence of length n, each subtask takes a sequence of length (n² + n)/2 as input.
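The upper-triangle flattening and the tag value 2 for reversed pairs can be sketched as follows. The helper names are hypothetical, positions are 0-based token indices, and the flattened order is assumed to be row-major; the sketch only builds the 2N + 1 tag sequences and omits the model that predicts them.

```python
def upper_triangle_index(n):
    """Map each cell (i, j) with j >= i to its index in the row-major flattened sequence."""
    cells = ((i, j) for i in range(n) for j in range(i, n))
    return {cell: k for k, cell in enumerate(cells)}


def tag_sequence(n, pairs):
    """Build one flattened tag sequence of length (n^2 + n) / 2.

    A pair (p1, p2) with p1 <= p2 is tagged 1 at (p1, p2); a pair in the lower
    triangle (p1 > p2) is moved to the transposed cell (p2, p1) and tagged 2.
    """
    index = upper_triangle_index(n)
    seq = [0] * (n * (n + 1) // 2)
    for p1, p2 in pairs:
        if p1 <= p2:
            seq[index[(p1, p2)]] = 1
        else:
            seq[index[(p2, p1)]] = 2
    return seq


def build_tags(n, entities, triples, relations):
    """Return the 2N + 1 sequences: shared EH-to-ET, plus SH-to-OH and ST-to-OT per relation.

    entities: (head, tail) spans; triples: ((s_head, s_tail), relation, (o_head, o_tail)).
    """
    eh_to_et = tag_sequence(n, entities)
    sh_to_oh = {r: tag_sequence(n, [(s[0], o[0]) for s, rel, o in triples if rel == r])
                for r in relations}
    st_to_ot = {r: tag_sequence(n, [(s[1], o[1]) for s, rel, o in triples if rel == r])
                for r in relations}
    return eh_to_et, sh_to_oh, st_to_ot
```

When the object span precedes the subject span, as with an object at tokens 0-2 and a subject at tokens 5-6, both relation tags land in the upper triangle with value 2.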

EH-to-ET tags ("Supply", "Department") and ("Wei", "Bi"), which mark the start and end positions of the two entities "Power Supply and Electromechanical Department" and "Wei Bi" respectively. Entity tags always lie in the upper triangle, but in relation tagging the subject may come after the object; to compress the matrix without losing data, such tags are changed to 2 and must be read in reverse order. For the relation "works in", ("Supply", "Wei") is tagged SH-to-OH with value 2, indicating a reversed pair, so the subject of the relation starts with "Wei" and the object starts with "Supply". ("Department", "Bi") is tagged ST-to-OT with value 2, also reversed, so the subject ends with "Bi" and the object ends with "Department". From this information the complete triple <"Wei Bi", "works in", "Power Supply and Electromechanical Department"> can be deduced.

In summary, tag decoding works as follows for each relation. All entities are found from EH-to-ET and stored in a dictionary D that maps each entity's start position to the entity. Next, ST-to-OT is decoded, and a set E is created to store the (subject tail, object tail) position pairs. Then SH-to-OH is decoded to obtain the start positions of the subject and object, and candidate entities are looked up in dictionary D by these two start positions. Finally, the candidates are traversed; if their tail positions form a pair in set E, the triple is saved as a new relation in set T.
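The decoding procedure summarized above (dictionary D, set E, set T) might look like this when operating on flattened upper-triangle tag sequences in row-major order. This is an illustrative sketch under that layout assumption, not the reference TPLinker code; `decode` is a hypothetical name.

```python
def decode(n, eh_to_et, sh_to_oh, st_to_ot, relations):
    """Recover (subject span, relation, object span) triples from flattened tags."""
    cells = [(i, j) for i in range(n) for j in range(i, n)]  # row-major upper triangle

    # Dictionary D: entity start position -> list of (head, tail) entity spans.
    entities_by_head = {}
    for k, tag in enumerate(eh_to_et):
        if tag == 1:
            head, tail = cells[k]
            entities_by_head.setdefault(head, []).append((head, tail))

    triples = set()  # set T
    for rel in relations:
        # Set E: (subject tail, object tail) pairs; tag 2 means the cell is reversed.
        tail_pairs = set()
        for k, tag in enumerate(st_to_ot[rel]):
            i, j = cells[k]
            if tag == 1:
                tail_pairs.add((i, j))
            elif tag == 2:
                tail_pairs.add((j, i))
        # Decode SH-to-OH, then look up candidate entities in D by start position.
        for k, tag in enumerate(sh_to_oh[rel]):
            if tag == 0:
                continue
            i, j = cells[k]
            subj_head, obj_head = (i, j) if tag == 1 else (j, i)
            for subj in entities_by_head.get(subj_head, []):
                for obj in entities_by_head.get(obj_head, []):
                    # Keep the pair only if its tail positions were found in set E.
                    if (subj[1], obj[1]) in tail_pairs:
                        triples.add((subj, rel, obj))
    return triples
```

Because every candidate pair must pass the tail check against set E, an overlapping entity that shares a start position with the subject is not mistakenly linked.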

As shown in FIG. 6, the model uses BERT as the encoder. For a sentence of length n, [w_1, ..., w_n], encoding yields a vector representation h_i for each word. The vectors of any two words are concatenated and passed through a fully connected layer to obtain the vector representation of the token pair:

$$h_{i,j} = \tanh\left(W_h \cdot [h_i; h_j] + b_h\right), \quad j \ge i \qquad (1)$$

where W_h is a parameter matrix and b_h is a bias vector, both learned during training. The Handshaking Kernel takes the last-layer output of the encoder as input and computes all h_{i,j} according to formula (1). For a sentence of length n, n(n + 1)/2 vectors h_{i,j} are obtained, each representing one pair of tokens.
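Formula (1) can be checked with a toy, dependency-free sketch of the handshaking kernel. The BERT encoder is not reproduced: `h` stands in for its last-layer outputs, and the small `W` and `b` values used with it are arbitrary illustrations of W_h and b_h.

```python
import math


def handshaking_kernel(h, W, b):
    """Compute h_ij = tanh(W [h_i; h_j] + b) for every token pair with j >= i.

    h: n token vectors (lists) of size d, standing in for encoder outputs.
    W: weight matrix with d_out rows of length 2d; b: bias of length d_out.
    Returns a dict keyed by (i, j) holding n(n + 1)/2 vectors.
    """
    n = len(h)
    pairs = {}
    for i in range(n):
        for j in range(i, n):
            concat = h[i] + h[j]  # list concatenation gives [h_i; h_j]
            pairs[(i, j)] = [
                math.tanh(sum(w * x for w, x in zip(row, concat)) + bias)
                for row, bias in zip(W, b)
            ]
    return pairs
```

For four tokens the kernel returns 4 x 5 / 2 = 10 pair vectors, and no entry exists for a cell with j < i.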

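The loss function averages the errors of all the entity and relation taggers, as shown in equation (2):

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j \ge i}^{N}\sum_{* \in \{E, H, T\}} \log P\left(y_{i,j}^{*} = \hat{y}_{i,j}^{*}\right) \qquad (2)$$

where N is the input sentence length, $\hat{y}$ are the true tags, and E, H and T denote the tags of EH-to-ET, SH-to-OH and ST-to-OT respectively.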
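A loss of the form in equation (2) can be sketched as below: the negative log-probability of each gold tag, summed over all upper-triangle cells and taggers and divided by the sentence length N. The dictionary layout of `probs` and `gold` and the tagger names are hypothetical choices for illustration.

```python
import math


def tplinker_loss(probs, gold, sentence_len):
    """Negative log-likelihood of the gold tags, summed over cells and taggers, divided by N.

    probs: tagger name (e.g. "E", "H:works in") -> per-cell probability lists.
    gold:  tagger name -> per-cell gold tag ids, aligned with probs.
    sentence_len: N, the input sentence length.
    """
    total = 0.0
    for name, cell_probs in probs.items():
        for p, y in zip(cell_probs, gold[name]):
            total -= math.log(p[y])  # -log P(y_ij = gold tag)
    return total / sentence_len
```

A cell predicted with probability 1.0 for its gold tag adds nothing to the loss, since log 1 = 0.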

Fault matching is then carried out. Traditionally, fault analysis and diagnosis are done mostly by hand, which puts heavy pressure on maintenance personnel and demands a great deal of expert experience and real-time data; this is both inefficient and error-prone. The method therefore finds the most similar historical fault from the current fault description. Knowledge is extracted with the model introduced in the previous step: the TPLinker model finds the fault elements in the fault description, mainly the substation, the faulty equipment, the fault description and the fault time. After the model predicts the relevant elements, all elements are concatenated into one complete character string, and TF-IDF encoding is used to vectorize the fault information. For example, for "closing failure of catenary feeder isolating switch 2111 at Qingdao North Station", the model extracts the key elements "Qingdao North Station", "catenary feeder isolating switch 2111" and "closing failure" and represents them as the string "Qingdao North Station; catenary feeder isolating switch 2111; closing failure". This string is encoded with TF-IDF to obtain the vector representation of the fault description, and cosine similarity is used to calculate the matching degree between faults.
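Concatenating the extracted elements into the string that is then TF-IDF encoded can be as simple as the sketch below. The dictionary keys, the fixed element order and the name `fault_string` are assumptions; the text only shows the elements joined with semicolons.

```python
ELEMENT_ORDER = ("substation", "equipment", "description", "time")  # assumed order


def fault_string(elements, sep="; "):
    """Join the extracted fault elements into one string, skipping missing ones."""
    return sep.join(elements[key] for key in ELEMENT_ORDER if elements.get(key))
```

Keeping a fixed order means two descriptions of the same fault produce the same string even if the model returns the elements in a different order.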

The IDF is calculated by formula (4):

$$idf = \log\frac{N}{N(x) + 1} \qquad (4)$$

where N is the total number of documents, x represents any word, N(x) is the number of documents in which the word x appears, and the +1 in the denominator is a smoothing operation that prevents the denominator from being zero. The TF-IDF value is the product of TF and IDF, as shown in equation (5).

$$\mathrm{sim}_{\text{TF-IDF}} = tf_{i,j} \times idf \qquad (5)$$

As shown in FIG. 7, the fault knowledge in the knowledge base has already been converted into vectors by TF-IDF and stored, so the Fault Retrieval module mainly matches the fault vector obtained in the previous step against the vectors in the knowledge base and returns the fault record with the highest similarity. Vector similarity is again computed with cosine similarity, as shown in formula (6):

$$\cos(A, B) = \frac{\sum_{k} A_k B_k}{\sqrt{\sum_{k} A_k^2}\,\sqrt{\sum_{k} B_k^2}} \qquad (6)$$

where A and B are the two TF-IDF vectors being compared.
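Fault retrieval by cosine similarity over stored TF-IDF vectors reduces to an argmax. In this sketch the knowledge base is assumed to be an in-memory list of (record, vector) pairs, and `cosine` and `retrieve` are hypothetical names; a real system would read the vectors from the fault knowledge base.

```python
import math


def cosine(a, b):
    """cos(A, B) = sum(A_k * B_k) / (|A| |B|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, knowledge_base):
    """Return (record, score) for the stored fault vector most similar to the query.

    knowledge_base: list of (record, vector) pairs already encoded with TF-IDF.
    """
    scored = ((record, cosine(query_vec, vec)) for record, vec in knowledge_base)
    return max(scored, key=lambda pair: pair[1])
```

Guarding against a zero norm matters here: a fault string whose words all have zero IDF produces an all-zero vector, which would otherwise divide by zero.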

From the vector representation of the fault description, the number of characters in the character string is calculated; one character of the string is selected at random together with its position, and that position is converted into the corresponding position within the element information. The estimated number of character strings is then compared with the calculated number of characters; if they are the same, the character at the converted position in the corresponding fault element information is looked up and compared, and if it also matches, the method proceeds to the subsequent steps. This added verification step performs a reverse check, converting the data during verification into a form that can be compared directly, so that the vectorized data are verified and their validity is ensured. Not every piece of fault information needs to be verified: targeted verification is sufficient to achieve the purpose of data verification, which effectively reduces the computational burden.
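One way to realize this spot check is sketched below. The text does not say how the estimate from step (1) is formed, so `estimate_length` assumes it is the total element length plus the separators; `locate` converts a position in the joined string back to an element index and offset, and `verify` compares the lengths and then one character, chosen at random unless a position is given. All three names are hypothetical.

```python
import random


def estimate_length(elements, sep="; "):
    """Assumed estimate: total element length plus one separator between elements."""
    return sum(len(e) for e in elements) + len(sep) * max(len(elements) - 1, 0)


def locate(elements, pos, sep="; "):
    """Map a position in the joined string to (element index, offset), or None for a separator."""
    start = 0
    for idx, text in enumerate(elements):
        if pos < start + len(text):
            return (idx, pos - start) if pos >= start else None
        start += len(text) + len(sep)
    return None


def verify(elements, joined, estimated_len, pos=None, sep="; "):
    """Check the length first, then compare one character against its source element."""
    if len(joined) != estimated_len:
        return False
    if not joined:
        return True
    if pos is None:
        pos = random.randrange(len(joined))  # randomly selected character position
    loc = locate(elements, pos, sep)
    if loc is None:
        return joined[pos] in sep  # the position falls on a separator
    idx, offset = loc
    return elements[idx][offset] == joined[pos]
```

The check costs one length comparison and one character lookup per record, which matches the text's point that targeted verification keeps the computation small; a corrupted string of the right length is only caught if the sampled position hits the changed character.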

After the fault type is identified, the related information can be pushed. Different information sets are pushed for different types of nodes; for example, a transformer and a circuit breaker need different information. An information set lists the information the current equipment may require and is a summary of all information types. When pushing, the current fault information is matched against the information types in the set, and the information is ranked from high to low similarity so that the more important information is displayed to the user first. For example, the following table shows some of the information required by the user.

The fault information contains a large number of fault elements extracted from historical fault documents, and for some outdoor equipment, meteorological factors also appear repeatedly in the fault documents. Therefore, to match information types, the fault information can be encoded directly and the encoded fault information used for the similarity calculation. The invention encodes the fault information into TF-IDF vectors and applies the same encoding to the name of each information type in each information set. The similarity between the fault vector and each information type vector can then be calculated, and information types with higher similarity receive higher priority. Cosine similarity is used as the similarity measure.
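Ranking an equipment's information types by similarity to the current fault can be sketched with sparse TF-IDF vectors stored as `{word: weight}` dictionaries. The function names, vectors and information-type names are illustrative assumptions; how each information set is populated is not specified in the text.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {word: weight} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_info_types(fault_vec, info_type_vecs):
    """Return information-type names ordered by similarity to the fault, highest first."""
    scores = {name: cosine(fault_vec, vec) for name, vec in info_type_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Sparse dictionaries suit this step because an information-type name contains only a few words, so most vocabulary entries would be zero in a dense vector.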

The fault information is pushed using the Spring Boot and MyBatis-Plus frameworks. MyBatis-Plus handles all interaction with the relational databases, including multiple data sources and a real-time database, while Spring Boot implements the business logic. Model management is implemented in Python; for ease of interaction, Django exposes the model as a service that exchanges data with Spring Boot over HTTP. All requests are proxied through Nginx.
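The JSON-over-HTTP contract between the Python model service and Spring Boot can be sketched as below. The text uses Django; to keep the example dependency-free, this sketch swaps in Python's standard `http.server` instead. The `/rank` path, the payload field names, the port, and the word-overlap scorer are all hypothetical placeholders, not the applicant's interface.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def rank_by_overlap(fault, info_types):
    """Placeholder scorer: order info-type names by words shared with the fault text."""
    fault_words = set(fault.lower().split())
    return sorted(info_types,
                  key=lambda name: len(fault_words & set(name.lower().split())),
                  reverse=True)


def handle_rank_request(body: bytes) -> bytes:
    """JSON in, JSON out: {"fault": str, "info_types": [str]} -> {"ranked": [str]}."""
    payload = json.loads(body)
    ranked = rank_by_overlap(payload["fault"], payload["info_types"])
    return json.dumps({"ranked": ranked}, ensure_ascii=False).encode("utf-8")


class ModelHandler(BaseHTTPRequestHandler):
    """Minimal POST handler that a Spring Boot HTTP client could call."""

    def do_POST(self):
        if self.path != "/rank":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            out = handle_rank_request(self.rfile.read(length))
        except (ValueError, KeyError, TypeError, AttributeError):
            self.send_error(400, "malformed request")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)


def serve(host="127.0.0.1", port=8000):
    """Start the service; Nginx would proxy the model route to this address (assumed)."""
    HTTPServer((host, port), ModelHandler).serve_forever()
```

Keeping the request handling in a plain function, `handle_rank_request`, separates the data contract from the web framework. A Django view could call the same function unchanged.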

The knowledge-graph-based power supply operation and maintenance full-factor data fusion method can be executed on a computer device. The computer device may include one or more processors, such as one or more central processing units (CPUs), each of which may implement one or more hardware threads. The computer device may also include any memory for storing any kind of information, such as code, settings, and data. For example, and without limitation, the memory may include any one or a combination of the following: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, and so on. More generally, any memory may use any technology to store information, may provide volatile or non-volatile retention of information, and may be a fixed or removable component of the computer device. In one case, when the processor executes the relevant instructions stored in any memory or combination of memories, the computer device can perform any of the operations associated with those instructions. The computer device also includes one or more drive mechanisms for interacting with any memory, such as a hard disk drive mechanism, an optical disk drive mechanism, and so forth.

The computer device may also include an input/output (I/O) module for receiving various inputs (via input devices) and providing various outputs (via output devices). One particular output mechanism may include a presentation device and an associated graphical user interface (GUI). In other embodiments, the I/O module, input devices, and output devices may be omitted, for example when the computer device acts only as a node in a network. The computer device may also include one or more network interfaces for exchanging data with other devices via one or more communication links. One or more communication buses couple the above-described components together.

The communication link may be implemented in any manner, e.g., over a local area network, a wide area network (e.g., the Internet), a point-to-point connection, etc., or any combination thereof. The communication links may comprise any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.

Although exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, substitutions and the like can be made in form and detail without departing from the scope and spirit of the invention as disclosed in the accompanying claims, all of which are intended to fall within the scope of the claims, and that various steps in the various sections and methods of the claimed product can be combined together in any combination. Therefore, the description of the embodiments disclosed in the present invention is not intended to limit the scope of the present invention, but to describe the present invention. Accordingly, the scope of the present invention is not limited by the above embodiments, but is defined by the claims or their equivalents.

Claims

1. A power supply operation and maintenance full-factor data fusion method based on a knowledge graph is characterized by comprising the following steps:

(1) Acquiring historical fault documents as source data; sorting the source data by fault position and, within each position, by fault duration; extracting the historical fault documents in this sorted order using fault knowledge to obtain the fault element information; and estimating the number of character strings corresponding to the fault element information;
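The two-level ordering in step (1) amounts to sorting on the composite key (position, duration). The Python sketch below illustrates this under stated assumptions: the `FaultDoc` fields are hypothetical, positions are compared as plain strings, and durations are sorted in ascending order, since the claim does not state a direction.

```python
from dataclasses import dataclass
from datetime import timedelta
from itertools import groupby


@dataclass
class FaultDoc:
    position: str        # fault location, e.g. a substation name (assumed field)
    duration: timedelta  # how long the fault lasted (assumed field)
    text: str            # raw document text


def order_source_data(docs):
    """Sort by fault position, then by duration within the same position."""
    return sorted(docs, key=lambda d: (d.position, d.duration))


def batches_by_position(docs):
    """Yield (position, docs) groups in the sorted order used for extraction."""
    for position, group in groupby(order_source_data(docs), key=lambda d: d.position):
        yield position, list(group)
```

Because `sorted` compares tuples element by element, a single sort produces both levels of ordering. The grouping step is only needed if extraction runs per position.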

2. The method of claim 1, wherein: the step (1) is specifically as follows:

(1.1) dividing the historical fault documents layer by layer into a tree structure, wherein the information of each event is represented by a directed graph, and each leaf node stores an event containing the fault-related details, covering the complete process of the fault event from occurrence to resolution;

3. The method of claim 2, wherein: in the step (1.1), each fault event is divided into three parts, namely event information, fault information and handling information, wherein:

event information, which stores the equipment involved in the fault event and the original text of the fault event, the original text including at least one of a summary, an impact, and a cause analysis of the fault event;

fault information, including the fault occurrence characteristics and the fault name;
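The document tree of claims 2 and 3 can be sketched with Python dataclasses, as below. This is a minimal model, not the applicant's schema. The claim names "handling information" but does not list its fields, so `handling` is an untyped placeholder. The relation names returned by `edges()` for the per-event directed graph are also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventInfo:
    equipment: list[str]                  # equipment appearing in the fault event
    summary: Optional[str] = None         # original text: summary
    impact: Optional[str] = None          # original text: impact
    cause_analysis: Optional[str] = None  # original text: cause analysis


@dataclass
class FaultInfo:
    name: str
    characteristics: list[str] = field(default_factory=list)


@dataclass
class FaultEvent:
    event_id: str
    event: EventInfo
    fault: FaultInfo
    # Handling information: fields not specified in the claim, kept as a placeholder.
    handling: dict = field(default_factory=dict)

    def edges(self):
        """Directed-graph view as (source, relation, target); relation names hypothetical."""
        out = [(self.event_id, "involves", eq) for eq in self.event.equipment]
        out.append((self.event_id, "has_fault", self.fault.name))
        out += [(self.fault.name, "has_characteristic", c) for c in self.fault.characteristics]
        return out


@dataclass
class DocNode:
    """Node of the document tree; only leaf nodes carry a fault event."""
    name: str
    children: list["DocNode"] = field(default_factory=list)
    event: Optional[FaultEvent] = None

    def leaf_events(self):
        if not self.children:
            return [self.event] if self.event is not None else []
        events = []
        for child in self.children:
            events.extend(child.leaf_events())
        return events
```

A depth-first walk over `leaf_events()` yields every stored fault event, and the triples from `edges()` are the form in which events could be loaded into a graph store.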

4. The method of any one of claims 1-3, wherein in step (1), extracting the historical fault documents using fault knowledge to obtain the fault element information specifically comprises obtaining the fault element information from the TPLinker model of the fault knowledge extraction part.

5. The method of claim 4, wherein the fault element information is obtained by the TPLinker model of the fault knowledge extraction part as follows, the model determining for each token pair (p1, p2):

a) whether p1 and p2 are the start and end positions of the same entity;

b) whether p1 and p2 are the start positions of two entities involved in the relation r;

c) whether p1 and p2 are the end positions of two entities involved in the relation r;

the TPLinker model uses a handshaking tagging scheme to answer these three questions and labels three matrices for each relation to represent the different tagging results.
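The handshaking tagging and its decoding can be sketched in Python as follows. This is a simplified illustration of the tagging scheme only. The encoder and the classifiers that predict the tags are omitted. The sketch uses full n×n matrices instead of the flattened upper triangle used in the published TPLinker, and it does not encode reversed head or tail order. In the published TPLinker, the entity head-to-tail matrix (question a) is shared across relations, with two matrices per relation for questions b and c; the sketch follows that layout. Function names are hypothetical.

```python
def handshaking_tags(n_tokens, entities, triples, relations):
    """Build simplified TPLinker-style tag matrices.

    entities: (start, end) token spans, end inclusive
    triples: ((subj_start, subj_end), relation, (obj_start, obj_end))
    """
    def zeros():
        return [[0] * n_tokens for _ in range(n_tokens)]

    eh_et = zeros()                           # a) entity head -> entity tail
    for start, end in entities:
        eh_et[start][end] = 1
    sh_oh = {r: zeros() for r in relations}   # b) subject head -> object head, per relation
    st_ot = {r: zeros() for r in relations}   # c) subject tail -> object tail, per relation
    for (s_start, s_end), rel, (o_start, o_end) in triples:
        sh_oh[rel][s_start][o_start] = 1
        st_ot[rel][s_end][o_end] = 1
    return eh_et, sh_oh, st_ot


def decode(eh_et, sh_oh, st_ot):
    """Recover triples: pair entities whose heads link (b) and whose tails link (c)."""
    spans_by_head = {}
    for start, row in enumerate(eh_et):
        for end, tag in enumerate(row):
            if tag:
                spans_by_head.setdefault(start, []).append((start, end))
    triples = []
    for rel, matrix in sh_oh.items():
        for s_head, row in enumerate(matrix):
            for o_head, tag in enumerate(row):
                if not tag:
                    continue
                for subj in spans_by_head.get(s_head, []):
                    for obj in spans_by_head.get(o_head, []):
                        if st_ot[rel][subj[1]][obj[1]]:
                            triples.append((subj, rel, obj))
    return triples
```

For the tokens "Transformer T1 located in Substation A", with entity spans (0, 1) and (4, 5) and a hypothetical relation `located_in`, encoding and then decoding returns the original triple. This illustrates how the three answers jointly identify an entity pair.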

6. The method of claim 5, wherein the fault elements comprise at least one of a substation, faulty equipment, a fault description, and a fault time.

7. The method of claim 1 or 6, wherein: in the step (3), the TF-IDF coding is used to implement vectorization of the fault information, and a vector representation of the fault description is obtained, specifically:

the calculation process of TF is as follows:

the IDF is calculated as:

the smoothing operation process comprises the following steps:

$\mathrm{sim}_{\text{TF-IDF}} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$;
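The TF, IDF and smoothing formulas referred to in this claim do not appear in the text. For reference, the standard textbook forms are given below. They are an assumption about what was intended, not necessarily the exact variants the applicant used. Here $n_{i,j}$ is the number of occurrences of term $t_i$ in document $d_j$, and $\lvert D \rvert$ is the number of documents. The first smoothing form adds one to the document frequency. The second is the default used by scikit-learn.

```latex
\begin{aligned}
\mathrm{tf}_{i,j} &= \frac{n_{i,j}}{\sum_k n_{k,j}} \\
\mathrm{idf}_i &= \log \frac{\lvert D \rvert}{\lvert \{ j : t_i \in d_j \} \rvert} \\
\mathrm{idf}_i^{\text{smooth}} &= \log \frac{\lvert D \rvert}{1 + \lvert \{ j : t_i \in d_j \} \rvert}
\quad\text{or}\quad
\log \frac{1 + \lvert D \rvert}{1 + \lvert \{ j : t_i \in d_j \} \rvert} + 1 \\
\mathrm{sim}_{\text{TF-IDF}} &= \mathrm{tf}_{i,j} \times \mathrm{idf}_i
\end{aligned}
```

Smoothing keeps the IDF denominator from being zero for a term that does not occur in any document of the corpus.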

8. The method of claim 7, wherein in step (5), the fault information is pushed using the Spring Boot and MyBatis-Plus frameworks.