CN113157860B - Electric power equipment maintenance knowledge graph construction method based on small-scale data - Google Patents

Publication number
CN113157860B
Authority
CN
China
Prior art keywords
electric power
knowledge
node
encyclopedia
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202110370413.9A
Other languages
Chinese (zh)
Other versions
CN113157860A (en
Inventor
严莉
张志勇
马超
李素建
刘荫
黄振
郭爽爽
张悦
汤琳琳
杨华飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
NARI Group Corp
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Original Assignee
Peking University
NARI Group Corp
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, NARI Group Corp, Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd filed Critical Peking University
Priority to CN202110370413.9A priority Critical patent/CN113157860B/en
Publication of CN113157860A publication Critical patent/CN113157860A/en
Application granted granted Critical
Publication of CN113157860B publication Critical patent/CN113157860B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a method for constructing a power equipment maintenance knowledge graph based on small-scale data, comprising the following steps: crawling a knowledge encyclopedia website to construct an electric power basic corpus data set, which comprises at least an overhaul manual and the encyclopedia knowledge crawled according to basic words in the overhaul manual; counting word frequencies over the electric power basic corpus data set and a general dictionary to construct an electric power domain topic dictionary; generating an electric power semantic word-vector conversion model from the electric power basic corpus data set, and using the model to calculate the semantic similarity between basic words in the overhaul manual and the encyclopedia knowledge; and determining whether the semantic similarity exceeds a threshold and, if so, establishing a semantic association between the overhaul manual and the encyclopedia knowledge and constructing the knowledge graph. The method supports scenarios such as fault cause query, treatment measure retrieval, and reference to related knowledge.

Description

Electric power equipment maintenance knowledge graph construction method based on small-scale data
Technical Field
The invention belongs to the technical field of electric power and knowledge graphs. It relates to a method for constructing a power equipment maintenance knowledge graph, in particular when the power equipment documents and maintenance schemes provide insufficient information, and specifically to the automatic construction of an electric power topic dictionary, entity coreference resolution, and rule-based relationship construction.
Background
With the rapid development of China's economy, the demands on power supply keep rising, and condition-based maintenance of power equipment is crucial to a stable power supply. In the conventional overhaul process, workers follow a prescribed procedure and an overhaul manual, performing maintenance at fixed intervals or after the equipment has failed. This mode covers a wide scope of work, and workers can easily make mistakes or miss some locations during the overhaul, which leads to poor maintenance results and leaves safety hazards in the operation of the power supply system. Recent developments in knowledge graph technology in the computer field offer a solution to these difficulties and an effective technical aid.
A knowledge graph is essentially a semantic network: a graph-based data structure composed of nodes and edges, where each node represents an entity and each edge represents a relationship between entities. Entities are things in the real world, such as power grid units, organization staff, machine rooms, power equipment, and other materials; relationships express a connection between different entities, such as fault cause or treatment measure. A knowledge graph provides a more effective way to express, organize, manage, and use massive, heterogeneous, and dynamic big data, which benefits applications such as intelligent search and deep question answering. In intelligent search, for example, after the user enters a query the knowledge graph does not merely match keywords: it first performs semantic understanding, normalizes the description of the query terms, then looks up the corresponding knowledge points in the knowledge base and returns a complete body of knowledge. Apple's intelligent voice assistant Siri, which provides users with answers, introductions, and similar services, is likewise a result of introducing knowledge graphs.
The invention introduces knowledge graph technology into power equipment maintenance. It aims to provide a method for constructing a basic word segmentation vocabulary and a knowledge graph oriented to power equipment maintenance by using external information such as a knowledge encyclopedia when the overhaul manual and basic data are insufficient. On this basis, question retrieval and question-answering services are realized, so that maintainers can conveniently retrieve knowledge related to the faulty equipment and find the fault cause and the solution in time. As the power overhaul knowledge graph is expanded and deepened in the future, its strong correlation analysis capability can be used to uncover potential equipment faults in advance, so that maintainers can eliminate them early and reduce losses. In practice, the invention faces and mainly solves the following four difficulties:
A. Lack of an electric power topic dictionary: as the demand for semantic understanding of electric power texts grows, a complete topic dictionary for the power industry becomes more important, yet no complete topic dictionary has been established in the electric power field so far.
B. Small basic data scale and simple data content: although the power industry has accumulated a large amount of text data such as scientific papers, project reports, and power regulations, the content of a power overhaul operation manual is relatively simple, and the overhaul rules are basically compiled by business experts. Judging from the Southern Power Grid standard equipment defect knowledge base and the 2020 power equipment maintenance regulations, an overhaul manual, being limited in length, mainly lists components, fault phenomena, causes, and solutions, and lacks detailed explanation.
C. Graph knowledge description system: defining the knowledge description system, that is, determining the entities and relationships of the graph, is a fundamental task. Power overhaul involves many kinds of information such as equipment, faults, causes, numerical values, and sequences, and this complex knowledge must be expressed accurately and effectively.
D. Maintenance steps and order: a fault of one piece of power equipment usually corresponds to several causes and maintenance measures, and both troubleshooting the causes and carrying out the measures usually involve an order. How to add such sequence features to the knowledge graph, so that it can provide accurate and concise retrieval and question-answering services to users, is an important problem.
Disclosure of Invention
The invention provides a method for constructing a power equipment maintenance knowledge graph when the data scale is small.
The technical scheme provided by the invention is as follows: a method for constructing a power equipment overhaul knowledge graph based on small-scale data, comprising the following steps:
A. Crawling an electric power basic corpus document data set and constructing an electric power domain topic dictionary:
A1. Crawling knowledge encyclopedia websites according to the basic vocabulary and industry terms in the overhaul manual to construct the document data set;
It is well known that large amounts of data are a prerequisite for the effectiveness of machine learning, especially deep learning models. With this in mind, the basic corpus data set of industry documents comprises the following three parts:
(1) The equipment category names and component names in the overhaul manual are collated to generate an overhaul equipment vocabulary T_Handbook;
(2) All entries of the electric power industry term lists found in other electric power vocabularies and in the knowledge encyclopedia are added;
(3) Machine learning modeling competitions are now numerous, and each competition provides some real data sets in advance. The data sets of machine learning competitions in the power industry, together with the encyclopedia knowledge pages of the entity names generated by competition teams, can therefore be used to further expand the basic corpus, finally yielding a basic corpus data set C for the power industry.
A2. Counting the word frequency of fixed-length phrases in the document corpus using a bag-of-words method;
After the basic corpus data set of the power industry is prepared, the invention uses a bag-of-words method to count word frequencies.
(1) Filter the corpus with a stop-word dictionary, replacing stop words with spaces, thereby preliminarily segmenting the corpus;
(2) Further split the corpus at punctuation marks such as the full stop;
(3) Set the maximum word length L of the topic dictionary, take the phrases of length 2, 3, 4, …, L as candidate words, and count how often each candidate word occurs in the corpus. For example, for a sentence meaning "the power supply line and the negative power supply line are connected in series as a pair", the counts (given here as glosses of the character n-grams) are: "power supply" 2, "power supply line and" 1, "power supply line and negative" 1, …, the whole sentence 1, "source line" 2, "source line and" 1, "source line and negative" 1, …, "in series" 1;
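The counting in A2 can be sketched as follows. This is only an illustration under stated assumptions: the function name `count_candidates`, the punctuation set `PUNCT`, and the toy input string are hypothetical and do not come from the patent; candidates are taken as character n-grams of length 2 to L inside each segment.

```python
import re
from collections import Counter

# Punctuation used to split segments; this particular set is an assumption.
PUNCT = r"[\s。，、；：！？,.;:!?]+"

def count_candidates(text, stop_words, max_len):
    """Count every character n-gram of length 2..max_len within each segment."""
    # Step (1): replace stop words with spaces (longest first to avoid partial hits).
    for word in sorted(stop_words, key=len, reverse=True):
        text = text.replace(word, " ")
    # Step (2): split further at whitespace and punctuation.
    segments = [s for s in re.split(PUNCT, text) if s]
    # Step (3): slide windows of length 2..max_len over each segment.
    counts = Counter()
    for seg in segments:
        for n in range(2, max_len + 1):
            for i in range(len(seg) - n + 1):
                counts[seg[i:i + n]] += 1
    return counts

# Toy input: two segments, "abcab" and "cab".
counts = count_candidates("abcab。cab", stop_words={"x"}, max_len=3)
print(counts["ab"], counts["cab"])
```

Because every window is counted, the candidate set grows quickly with L, which is why the later filtering against a dictionary matters.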
A3. Tagging the part of speech of the words in the electric power topic dictionary using a general basic dictionary.
For the sake of dictionary accuracy and processing performance, the invention uses a general dictionary D (with words, word frequencies, and parts of speech) containing 3 million words as the initial dictionary.
(1) Add the overhaul equipment vocabulary T_Handbook compiled in A1(1) to the general dictionary D;
(2) Filter the candidate words from A2(3) with D: all candidate words that appear in D are selected as words of the electric power topic dictionary. Their word frequency is the frequency counted on the electric power corpus data set, and their part of speech is the one given in the basic dictionary D;
(3) Note that some of the words of the overhaul equipment vocabulary T_Handbook added to D in (1) did not originally exist in D. For such a word, take the last word V obtained after segmenting it, and assign the part of speech that is most frequent among all words in D that end with V. If V is not contained in D, the part of speech is tagged manually.
(4) Through the above steps, the electric power domain topic dictionary D_power is finally established, with three items per entry: word, word frequency, and part of speech.
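The tail-word rule in A3(3) amounts to a majority vote. The sketch below assumes the tail word has already been obtained by segmentation and that the general dictionary is available as a plain mapping from word to part of speech; the function name `vote_pos_by_tail` and the toy entries are hypothetical.

```python
from collections import Counter

def vote_pos_by_tail(tail, general_dict):
    """Return the most common part of speech among dictionary words ending with `tail`.

    Returns None when no dictionary word ends with `tail`, which corresponds
    to the case the text leaves for manual tagging.
    """
    votes = Counter(pos for word, pos in general_dict.items() if word.endswith(tail))
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Toy dictionary (word -> part of speech); not real data.
general_dict = {"oil pump": "n", "water pump": "n", "pump": "v", "relay": "n"}
print(vote_pos_by_tail("pump", general_dict))
```

Each dictionary word casts one vote here; weighting votes by word frequency would be a different design and is not what the text describes.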
Word2vec is a word-vector tool open-sourced by Google in 2013. It converts words into vectors, so that similarity in the vector space can represent the semantic similarity of text. A word2vec model is used to calculate the semantic similarity between the overhaul equipment category and component name word set T_Handbook and the crawled encyclopedia power entries T_Encyclopedia, as follows:
B1. Segment the electric power basic corpus C with the electric power domain segmentation dictionary D_power obtained in A3, yielding the segmented corpus C_seg;
B2. Construct the electric power word2vec model from C_seg, in one of two ways:
(1) When the basic corpus is large enough, the electric power domain word2vec model M_power can be built entirely from the C_seg corpus;
(2) When the basic corpus is small, a fine-tuning approach can be used: starting from a general word2vec model, incremental training on the C_seg corpus yields the electric power domain word2vec model M_power;
B3. Use the electric power domain word2vec model M_power to calculate the semantic similarity R between T_Handbook and T_Encyclopedia, and set a threshold to obtain the encyclopedia knowledge related to the equipment in the overhaul manual.
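Step B3 can be illustrated with a small sketch. It assumes that R is the cosine similarity of two term vectors, which is the usual measure for word2vec embeddings but is not stated in the text. The vectors, the threshold value, and the names `cosine` and `related_entries` are hypothetical; in practice the vectors would come from M_power.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def related_entries(manual_vecs, entry_vecs, threshold):
    """Pair each manual term with every encyclopedia entry whose similarity exceeds the threshold."""
    links = []
    for term, term_vec in manual_vecs.items():
        for entry, entry_vec in entry_vecs.items():
            r = cosine(term_vec, entry_vec)
            if r > threshold:
                links.append((term, entry, round(r, 4)))
    return links

# Toy two-dimensional vectors; real embeddings have far more dimensions.
manual_vecs = {"bushing": [1.0, 0.0]}
entry_vecs = {"insulating bushing": [1.0, 0.1], "relay": [0.0, 1.0]}
links = related_entries(manual_vecs, entry_vecs, threshold=0.9)
print(links)
```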
C. Constructing an equipment-maintenance-oriented electric power knowledge graph from the overhaul manual and the electric power encyclopedia entries:
C1. Following the basic concepts of a graph database, the knowledge description system is defined to comprise entities (Entity) and labels (Label), with attributes (Property) added to graph entities and relationships respectively:
(1) Label: a label is a node type used to group nodes; nodes with the same label belong to the same group. A node may have zero, one, or multiple labels. Querying nodes by label, that is by group, greatly narrows the query range and improves query performance.
(2) Entity: entities comprise nodes (Node) and relationships (Relationship). A node has labels and attributes; a relationship is directed, links two nodes, and has attributes and a relationship type.
(3) Attribute: attributes comprise node attributes and relationship attributes and are used to extend entity information. Node attributes describe additional information about a node entity, such as the node identifier, node name, parent node name, and node type. Unlike some existing knowledge graphs, the invention emphasizes entity attributes and places much of the information in them, which makes the structure of the power overhaul knowledge graph clearer and eases retrieval and application.
C2. In the overhaul scenario, many identical names, such as "flange" or "oil leakage", actually denote different equipment or fault conditions, whose fault causes and treatment measures differ. Whether such same-name nodes are kept fully separate as distinct nodes or are distinguished through related nodes directly affects the complexity and intuitiveness of the knowledge graph. The invention uses both strategies as needed, as described in the Detailed Description.
C3. Relationship attributes describe additional information about a relationship entity, such as the relationship identifier and relationship name. Note that some knowledge graphs define "fault severity" and the like as nodes. A node should have the characteristics of an entity, whereas "fault severity" is a feature that describes a fault and has no entity characteristics, so it is more appropriately defined as a relationship attribute. The relationship attributes specifically include:
(1) Fault severity: an attribute of the relationship between an equipment node and a fault node;
(2) Fault cause priority: an attribute of the relationship between a fault node and a cause node, so that maintainers can check the possible causes of a fault in priority order;
(3) Troubleshooting measure priority: an attribute of the relationship between a cause node and a maintenance node, so that maintainers can repair a fault whose cause has been determined in priority order;
C4. Introducing electric power encyclopedia knowledge according to the output of step B to expand the electric power knowledge graph:
(1) Establish electric power encyclopedia entry nodes whose node attributes are the attribute information in the encyclopedia entries, such as Chinese name, English name, function, composition, classification, and principle.
(2) Establish relationships between the equipment names of the overhaul manual and the electric power encyclopedia entries according to the semantic similarity R calculated in B3.
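The description system in C1 to C3 maps naturally onto a property graph. The following in-memory sketch is only meant to make the roles of labels, nodes, directed relationships, and attributes concrete; the class names, the relationship type `HAS_FAULT`, and the property keys are hypothetical and are not the schema of the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    labels: set                                  # zero, one, or several labels
    props: dict = field(default_factory=dict)    # node attributes

@dataclass
class Relationship:
    rel_type: str
    start: str                                   # source node id (relationships are directed)
    end: str                                     # target node id
    props: dict = field(default_factory=dict)    # relationship attributes

class Graph:
    def __init__(self):
        self.nodes = {}
        self.rels = []

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = Node(node_id, set(labels), props)

    def relate(self, rel_type, start, end, **props):
        self.rels.append(Relationship(rel_type, start, end, props))

    def by_label(self, label):
        """Label-based lookup narrows a query to one group of nodes."""
        return [n for n in self.nodes.values() if label in n.labels]

g = Graph()
g.add_node("p1", ["Parts"], name="bushing")
g.add_node("f1", ["Fault"], name="oil leakage")
# Severity is stored on the relationship, not as a node of its own.
g.relate("HAS_FAULT", "p1", "f1", severity="serious")
print([n.props["name"] for n in g.by_label("Fault")])
```

Keeping severity on the edge means the same fault node can carry a different severity for each component that links to it.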
The invention has the following beneficial effects:
The invention provides a method for constructing a power equipment overhaul knowledge graph when basic data are insufficient and no labeled data are available, covering basic data set construction, domain dictionary construction, design of the domain knowledge description system, domain knowledge graph construction, and fusion of external knowledge.
a) Structured data on power equipment fault maintenance are obtained by parsing equipment fault maintenance documents, and knowledge-encyclopedia content on the power domain and open documents from power competitions are introduced to build the domain basic corpus;
b) Based on the domain basic corpus and a general dictionary, the domain dictionary is constructed using deep learning technology, giving better word segmentation;
c) In the domain knowledge description system, the conventional entity-attribute-value triple representation is replaced by attributes attached to entities and relationships, which handles complex knowledge such as numerical values and sequences and represents knowledge more clearly and conveniently;
d) External knowledge such as encyclopedia content is introduced, the knowledge association of the domain graph is completed through techniques such as seed entity recognition and entity linking, and relevant power overhaul scenarios are verified.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention;
FIGS. 2(a) to 2(c) are sample diagrams of a generation process of a topic dictionary in the power domain in the embodiment of the present invention;
FIGS. 3(a) to 3(c) are sample diagrams of corpus word segmentation and the construction of the vocabulary semantic similarity model in an embodiment of the present invention;
FIGS. 4(a) to 4(e) are sample diagrams of the entities, relationships, and attributes of the power overhaul knowledge graph in an embodiment of the invention;
FIG. 5 is an example diagram of a power overhaul knowledge graph query in an embodiment of the invention.
Detailed Description
The invention is further described below by way of embodiments with reference to the accompanying drawings, without limiting its scope in any way.
The invention provides a technique for constructing a knowledge graph oriented to power equipment maintenance by using external information such as a knowledge encyclopedia when the overhaul manual and basic corpus data are insufficient. The specific tasks are: constructing an electric power domain topic dictionary; calculating vocabulary semantic similarity with a machine learning model that is either built from scratch or fine-tuned; adding attribute features and sequence features to the knowledge graph; and finally realizing retrieval and question-answering services based on the knowledge graph, so that maintainers can conveniently retrieve knowledge related to the faulty equipment and find the fault cause and the solution in time.
Referring to FIG. 1, the method of the present invention is embodied as follows:
A. Crawling an electric power basic corpus document data set and constructing an electric power domain topic dictionary:
A1. Crawling knowledge encyclopedia websites according to basic vocabulary and industry terms to construct the basic corpus document data set;
(1) Referring to FIG. 2(a), the equipment categories and component names in the overhaul manual are merged and deduplicated, giving 998 overhaul words in total;
(2) The electric power vocabulary from the AIIA Cup State Grid electric power professional-domain word mining competition is added; after deduplication, an electric power basic vocabulary T of 13147 words is obtained;
(3) The Baidu Baike entry of each word in T is crawled; 3856 words have encyclopedia entries, and the knowledge page documents of those entries are downloaded;
(4) The basic corpus of the AIIA Cup State Grid electric power professional-domain word mining competition is added, comprising tens of thousands of electric power scientific papers, project reports, power regulations, power operation manuals, and the like.
In summary, the preparation of the electric power basic corpus C is complete: it comprises 13856 documents, 1.07 million paragraphs, and 7000 total words.
A2. Counting the frequency of fixed-length phrases in the basic corpus documents using a bag-of-words method;
(1) Filter the corpus with a stop-word dictionary, and split the electric power basic corpus C into sentences at stop words and punctuation marks such as commas;
(2) Set the maximum word length L to 12, and extract from the corpus the candidate phrase set S of lengths 2, 3, 4, …, 12;
(3) Select a general dictionary (with words, word frequencies, and parts of speech) containing 3.3 million words as the basic dictionary D_general, and add the words of the electric power basic vocabulary T to it; the resulting vocabulary of about 3.34 million words serves as the candidate word set D'_general.
A3. Tagging the part of speech of the words in the electric power topic dictionary using the candidate word set D'_general.
(1) Filter the candidate phrase set S with D'_general, selecting the 363254 words of the electric power domain topic dictionary D_power;
(2) The tagged word frequency is the number of occurrences of the word in the electric power basic corpus C, see FIG. 2(b);
(3) Tag the part of speech of the topic dictionary words:
a. 357680 of the words already exist in the basic dictionary D_general and take their part of speech from D_general;
b. The 5574 words not present in D_general are segmented using Jieba with the D_power dictionary, in order to find the tail word v of each word; let V denote the set of these tail words;
c. For each tail word v in V: for 5433 of the words, v appears as the tail of words in D_general. The parts of speech of the D_general words ending with v are counted, and by a voting mechanism the part of speech is set to the one with the highest count, see FIG. 2(c);
d. For the remaining 141 words, the tail word does not appear as the tail of any word in D_general, and the part of speech is set manually.
(4) Through the above steps, the electric power domain topic dictionary D_power is finally established, with three items per entry: word, word frequency, and part of speech.
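Since each dictionary entry carries a word, a frequency, and a part of speech, and the next step segments with Jieba, one plausible way to materialize the dictionary is Jieba's user-dictionary text format, in which each line holds a word, an optional frequency, and an optional part-of-speech tag separated by spaces. The sketch below shows the filtering of A3(1) and that serialization on toy data; the function names and sample words are hypothetical, and whether the authors used this exact file format is an assumption.

```python
def build_topic_dict(candidate_counts, general_pos):
    """Keep candidates found in the general dictionary.

    Frequency comes from the corpus counts, part of speech from the dictionary.
    """
    return {
        word: (freq, general_pos[word])
        for word, freq in candidate_counts.items()
        if word in general_pos
    }

def to_userdict_lines(topic_dict):
    """Serialize entries as 'word frequency pos' lines, sorted by word."""
    return [f"{word} {freq} {pos}" for word, (freq, pos) in sorted(topic_dict.items())]

# Toy data: corpus n-gram counts and a tiny general dictionary (word -> part of speech).
candidate_counts = {"电源线": 2, "源线": 2, "变压器": 5}
general_pos = {"电源线": "n", "变压器": "n"}

topic_dict = build_topic_dict(candidate_counts, general_pos)
lines = to_userdict_lines(topic_dict)
print("\n".join(lines))
```

The fragment "源线" is dropped because it is not a dictionary word, which is how the filter removes most of the spurious n-grams produced by the sliding window.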
B. Constructing the electric power word2vec model, and calculating the semantic similarity between the overhaul equipment word set T_Handbook and the encyclopedia power entries T_Encyclopedia.
B1. Segment the electric power basic corpus C using Jieba with the electric power domain topic dictionary D_power, obtaining the segmented corpus C_seg; for the results see FIG. 3(a);
B2. Since the electric power basic corpus C is sufficiently large, the electric power domain word2vec model M_power is built entirely from the segmented corpus C_seg;
B3. Use M_power to calculate the semantic similarity R between the encyclopedia entries and, respectively, the equipment category names and equipment component names in T_Handbook, see FIG. 3(b);
B4. Data exploration shows that the semantic similarity R of similar words is about 0.87, so this value is set as the word similarity threshold. With it, about 1/3 (321) of the overhaul manual words obtain semantically similar encyclopedia entries, see FIG. 3(c). Encyclopedia knowledge association and retrieval services are then established for these words in the knowledge graph.
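The effect of the threshold in B4 can be explored with a sketch like the one below, which keeps the best-scoring entry per manual word and reports the share of words that obtain a match. The similarity table is toy data, the function name `best_matches` is hypothetical, and keeping only the single best entry per word is an assumption rather than something the text specifies.

```python
def best_matches(similarity, threshold):
    """Keep, for each manual word, its top-scoring entry if the score exceeds the threshold.

    `similarity` maps manual word -> {encyclopedia entry: R}.
    """
    matches = {}
    for word, scores in similarity.items():
        if not scores:
            continue
        entry, r = max(scores.items(), key=lambda item: item[1])
        if r > threshold:
            matches[word] = (entry, r)
    return matches

# Toy similarity table; real values would come from the word2vec model.
similarity = {
    "bushing": {"insulating bushing": 0.91, "relay": 0.40},
    "flange": {"pipe flange": 0.80},
    "support": {},
}
matches = best_matches(similarity, threshold=0.87)
coverage = len(matches) / len(similarity)
print(matches, round(coverage, 2))
```

Raising the threshold trades coverage for precision, so a value such as 0.87 is best chosen by inspecting matched pairs on real data, as the text describes.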
C. The system uses a technology stack of Vue (front end) + Apache (web server) + Flask (API) + Python (back end), and the encyclopedia crawler is developed on the Scrapy framework. A program is first developed to extract the data items in the overhaul manual, as shown in the following table:
[Table image: Figure BDA0003009055640000081, data items extracted from the overhaul manual]
C1. Create the node labels, entities (node entities and relationship entities), and attributes (node and relationship attributes) in the Neo4j database:
(1) Labels: six classes of node labels are defined: Baidu Baike entry nodes (Baidu BaikeIs), equipment category nodes (Categories), equipment component nodes (Parts), fault nodes (Fault), fault cause nodes (Reason), and fault repair nodes (Repair).
(2) Entities: there are 22333 node entities, of which 3433 are encyclopedia entry nodes, 1061 equipment category nodes, 2129 equipment component nodes, 790 fault nodes, 5037 fault cause nodes, and 9833 fault repair nodes. There are 24052 relationship entities, of which 3188 are "contains" relationships, 3122 "fault" relationships, 5037 "fault cause" relationships, 9881 "fault repair" relationships, and 2834 "encyclopedia" relationships.
(3) Attributes: to make the structure of the power overhaul knowledge graph clearer and to ease retrieval and application, it is essential to distinguish clearly between entities and attributes. Node attributes and relationship attributes are therefore introduced into the graph, as detailed below.
C2. The overhaul manual contains a large number of entities with the same name, such as equipment names, fault names, causes, and solutions. Although these entities share a name, their meanings differ. For example, several different devices have a component called a flange, and many devices can suffer an oil leakage fault, yet the causes and maintenance methods for oil leakage may differ greatly from one device to another. Balancing the complexity and the readability of the knowledge graph, the invention adopts two different node definition modes:
(1) Product name, component name, reason description, maintenance measure, and Baidu encyclopedia entry nodes are unique: entities with the same name but different meanings are defined as different nodes. Referring to fig. 4(a), the equipment components include two "support" nodes, one belonging to the cable trench product category and the other to the cable support category.
(2) Fault nodes use the deduplicated fault name, i.e. each fault name has only one fault node. Referring to fig. 4(b), oil leakage faults occur in 34 equipment components. If a separate node were defined for the oil leakage of each component, the amount of data involved in fault retrieval would be large and system performance would drop significantly; in addition, the interface display would be cluttered and the graph inconvenient to view.
(3) When faults with the same name occur in different equipment components, the fault reasons and maintenance measures differ. The component identification and the component name are therefore added to the fault reason node, see fig. 4(b), to mark which faulty component the reason belongs to, thereby establishing a unique path of "equipment component-fault-reason-solution measure".
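The node definition rules above can be illustrated with a minimal Python sketch. The record layout, field names, and sample rows are hypothetical and are not taken from the overhaul manual; the sketch only shows how the choice of identity key yields one shared fault node per fault name while keeping reason nodes specific to a component.

```python
from dataclasses import dataclass

# Hypothetical manual rows: (part_id, part_name, fault_name, cause).
ROWS = [
    ("20-2", "flange", "oil leakage", "aged sealing gasket"),
    ("31-5", "flange", "oil leakage", "loose bolts"),
    ("31-5", "flange", "oil leakage", "aged sealing gasket"),
]


@dataclass(frozen=True)
class Node:
    label: str   # node label, e.g. "part", "fault", "cause"
    key: tuple   # identity key; two nodes are equal only if label and key match


def build_nodes(rows):
    """Return the set of nodes and the set of part-fault-cause paths."""
    nodes, paths = set(), set()
    for part_id, part_name, fault_name, cause in rows:
        # Rule (1): same-name components stay distinct via their identifier.
        part = Node("part", (part_id, part_name))
        # Rule (2): one fault node per fault name, shared by all components.
        fault = Node("fault", (fault_name,))
        # Rule (3): the cause node carries the component id and name.
        cause_node = Node("cause", (part_id, part_name, cause))
        nodes.update((part, fault, cause_node))
        paths.add((part, fault, cause_node))
    return nodes, paths


nodes, paths = build_nodes(ROWS)
faults = [n for n in nodes if n.label == "fault"]
causes = [n for n in nodes if n.label == "cause"]
print(len(faults), len(causes), len(paths))  # 1 3 3
```

Because the two flanges share a single "oil leakage" node, the path from a component to its own causes is only unambiguous when the cause node records which component it belongs to.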
C3. The invention defines important relationship attributes, aiming to extend the search capability of the knowledge graph:
(1) Severity of failure: referring to fig. 4(c), the severity of a same-name fault may differ between devices; for example, damage to a critical equipment component and damage to a general equipment component differ greatly in severity. According to C2, there is only one "damage" fault node in the knowledge graph, so a "fault severity" (serious) attribute is added to the "equipment component-fault" relationship.
(2) Priority of failure cause: referring to fig. 4(d), the fault causes in the overhaul manual are listed in priority order, and the causes must be checked strictly in that order during a knowledge graph search. The invention therefore adds a priority attribute to the "equipment fault-fault cause" relationship and, likewise, to the "fault cause-maintenance measure" relationship.
(3) Encyclopedia synonym similarity attribute: referring to fig. 4(e), the content of the overhaul manual is limited, so the method uses encyclopedia knowledge to expand the power knowledge graph and uses a semantic model to calculate the similarity between equipment component names and encyclopedia knowledge entries. To help maintenance personnel find the associated encyclopedia knowledge, a similarity attribute is added to the "encyclopedia entry-equipment component" relationship.
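The three relationship attributes can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the dictionary layout, the sample values, the convention that a smaller priority number is checked first, the toy vectors standing in for the semantic model, and the threshold of 0.8 are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# "equipment component-fault" relationship carrying a severity attribute.
part_fault = {"part_id": "20-2", "fault": "oil leakage", "severity": "serious"}

# "equipment fault-fault cause" relationships carrying a priority attribute.
# Assumption: a smaller number means the cause is checked earlier.
fault_causes = [
    {"part_id": "20-2", "cause": "loose bolts", "priority": 2},
    {"part_id": "20-2", "cause": "aged sealing gasket", "priority": 1},
]


def ordered_causes(edges, part_id):
    """Causes for one component, in the order they should be checked."""
    own = [e for e in edges if e["part_id"] == part_id]
    return [e["cause"] for e in sorted(own, key=lambda e: e["priority"])]


# Toy vectors standing in for the output of the semantic model.
part_vector = [1.0, 0.0, 1.0]
entry_vectors = {"flange": [1.0, 0.0, 1.0], "insulator": [0.0, 1.0, 0.0]}
THRESHOLD = 0.8  # assumed value, not specified in the text

# "encyclopedia entry-equipment component" relationships with similarity.
encyclopedia_edges = []
for name, vector in entry_vectors.items():
    similarity = cosine_similarity(part_vector, vector)
    if similarity > THRESHOLD:
        encyclopedia_edges.append(
            {"entry": name, "part_id": "20-2", "similarity": round(similarity, 4)}
        )

print(ordered_causes(fault_causes, "20-2"))
print(encyclopedia_edges)
```

Storing these values on the relationships rather than on the shared fault node is what allows one "oil leakage" node to carry a different severity and a different cause ordering for each component.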
C4. Electric power encyclopedia knowledge is introduced according to the output of step B to expand the electric power knowledge graph. Because the encyclopedia knowledge consists of large amounts of text, it is not suited to storage in the neo4j database and is instead saved in a mongo document database. When a neo4j retrieval is performed in the Python backend, the encyclopedia knowledge of semantically similar entries in mongo is associated with the result.
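The split between the graph store and the document store can be sketched as follows. To keep the example self-contained, a plain dictionary stands in for the mongo collection and a list of tuples stands in for the "encyclopedia" relationships returned by a neo4j query; the entry titles, texts, and similarity values are hypothetical.

```python
# Stand-in for the mongo collection: entry title -> long encyclopedia text.
ENCYCLOPEDIA_DOCS = {
    "flange": "Long encyclopedia text about flanges ...",
    "gasket": "Long encyclopedia text about gaskets ...",
}

# Stand-in for "encyclopedia" relationships returned by a graph query:
# (entry title, part_id, similarity).
GRAPH_EDGES = [
    ("flange", "20-2", 0.97),
    ("gasket", "20-2", 0.84),
    ("flange", "31-5", 0.97),
]


def encyclopedia_for_part(part_id, edges=GRAPH_EDGES, docs=ENCYCLOPEDIA_DOCS):
    """Join graph hits with document-store text, most similar entry first."""
    hits = sorted(
        (edge for edge in edges if edge[1] == part_id),
        key=lambda edge: edge[2],
        reverse=True,
    )
    return [
        {"entry": title, "similarity": similarity, "text": docs[title]}
        for title, _, similarity in hits
        if title in docs  # skip entries whose text is missing from the store
    ]


for item in encyclopedia_for_part("20-2"):
    print(item["entry"], item["similarity"])
```

In a deployment along the lines described, the dictionary lookup would presumably be replaced by a query against the mongo collection keyed on the entry title, so that the graph holds only the short title and similarity while the long text stays in the document database.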
After the electric power knowledge graph is constructed, the following query is executed:
match (i)-[r1:encyclopedia]-(p)-[r2:fault]-(f)-[r3:`fault cause`]-(r)-[r4:`maintenance of fault`]-(z) where p.part_id = '20-2' and f.fault_desc = 'oil leakage' and r.part_id = '20-2' return i, p, f, r, z, r1, r2, r3, r4
Referring to fig. 5, the causes, maintenance measures and encyclopedia knowledge for the oil leakage fault of the flange equipment component in the category 'power transformation facility-primary equipment-main transformer-pressure relief valve' can be obtained.
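As a hedged illustration of how such a retrieval might be issued from the Python backend, the sketch below builds a parameterized form of the query above and passes it to the official neo4j Python driver. The function names, connection arguments, and the use of query parameters are assumptions; the relationship type names follow the translated query, with backticks around those that contain spaces.

```python
def build_fault_query(part_id, fault_desc):
    """Return the Cypher text and its parameters for one component fault."""
    query = (
        "MATCH (i)-[r1:encyclopedia]-(p)-[r2:fault]-(f)"
        "-[r3:`fault cause`]-(r)-[r4:`maintenance of fault`]-(z) "
        "WHERE p.part_id = $part_id AND f.fault_desc = $fault_desc "
        "AND r.part_id = $part_id "
        "RETURN i, p, f, r, z, r1, r2, r3, r4"
    )
    return query, {"part_id": part_id, "fault_desc": fault_desc}


def run_fault_query(uri, user, password, part_id, fault_desc):
    """Execute the query; needs the neo4j driver and a running database."""
    # Imported lazily so that build_fault_query works without the driver.
    from neo4j import GraphDatabase

    query, params = build_fault_query(part_id, fault_desc)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]


query, params = build_fault_query("20-2", "oil leakage")
print(query)
print(params)
```

Passing the component identifier and fault description as parameters, rather than formatting them into the query string, avoids quoting problems and lets the database reuse the query plan across components.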
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.

Claims (6)

1. A method for constructing a power equipment overhaul knowledge graph based on small-scale data is characterized by comprising the following steps:
1) crawling a knowledge encyclopedia website to construct an electric power basic corpus data set, wherein the electric power basic corpus data set at least comprises an overhaul manual and encyclopedia knowledge crawled according to basic words in the overhaul manual; wherein crawling the knowledge encyclopedia website to construct the electric power basic corpus data set specifically comprises:
1-a) sorting the equipment category names and component names of the overhaul manual to generate an overhaul equipment vocabulary T_Handbook;
1-b) adding to said T_Handbook other electric power vocabularies and all entries of the electric power industry term list in the knowledge encyclopedia, and crawling encyclopedia knowledge to obtain encyclopedia entries T_Encyclopedic;
1-c) collecting a data set T_Collecting of machine learning competitions in the electric power industry;
1-d) taking said T_Handbook, T_Encyclopedic and T_Collecting as the electric power basic corpus data set;
2) according to the electric power basic corpus data set and the general dictionary, counting word frequency to construct an electric power field subject dictionary; the method specifically comprises the following steps:
2-a) preliminarily segmenting the electric power basic corpus data set according to a stop-word dictionary;
2-b) setting the maximum word length L of the electric power field topic dictionary, taking phrases with word lengths from 2 to L as candidate words, and counting the word frequency of each candidate word;
2-c) filtering the candidate words by using the general dictionary, and selecting the candidate words that appear in the general dictionary as the vocabulary of the electric power field topic dictionary, wherein the part of speech of each word is its part of speech in the general dictionary;
3) generating a power semantic word vector conversion model according to the power basic corpus document data set, and calculating semantic similarity between basic words in the overhaul manual and the encyclopedia knowledge according to the conversion model; the method specifically comprises the following steps:
3-a) using the electric power field subject dictionary to perform word segmentation on the electric power basic corpus data set to obtain a word-segmented corpus C_WordSegmentation;
3-b) constructing a power semantic word vector conversion word2vec model according to said C_WordSegmentation;
3-c) calculating the semantic similarity between the basic vocabulary in the overhaul manual and the encyclopedia knowledge according to the conversion model;
4) judging whether the semantic similarity is greater than a threshold value, if so, establishing a semantic association relation between the overhaul manual and the encyclopedia knowledge, and constructing a knowledge graph, which specifically comprises the following steps:
determining three elements of an entity, a node tag and an attribute contained in a knowledge graph structure according to a graph database; the entities comprise node entities and relationship entities, and the node labels refer to the types of the nodes and are used for grouping the nodes; the attributes comprise node attributes and relationship attributes, the node attributes describe additional information of the node entities, and the relationship attributes describe additional information of the relationship entities.
2. The method of claim 1, wherein the general dictionary is a general dictionary containing 3 million words.
3. The method of claim 1, wherein the node labels comprise: an encyclopedia entry node, an equipment item node, an equipment component node, a fault reason node and a fault maintenance node.
4. The method of claim 1, wherein nodes with the same name but different meanings are defined as distinct nodes; the fault node uses the deduplicated fault name; and when faults with the same name occur in different equipment components, the component identification and the component name are added to the fault reason node.
5. The method of claim 1, wherein the relationship attributes include a failure severity attribute, a failure cause priority attribute, and an encyclopedia similarity attribute.
6. The method of claim 1, wherein the encyclopedia knowledge is saved using a mongo document database.
CN202110370413.9A 2021-04-07 2021-04-07 Electric power equipment maintenance knowledge graph construction method based on small-scale data Expired - Fee Related CN113157860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110370413.9A CN113157860B (en) 2021-04-07 2021-04-07 Electric power equipment maintenance knowledge graph construction method based on small-scale data

Publications (2)

Publication Number Publication Date
CN113157860A CN113157860A (en) 2021-07-23
CN113157860B CN113157860B (en) 2022-03-11

Family

ID=76888762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110370413.9A Expired - Fee Related CN113157860B (en) 2021-04-07 2021-04-07 Electric power equipment maintenance knowledge graph construction method based on small-scale data

Country Status (1)

Country Link
CN (1) CN113157860B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048148A (en) * 2022-01-13 2022-02-15 广东拓思软件科学园有限公司 Crowdsourcing test report recommendation method and device and electronic equipment
CN115905575A (en) * 2023-01-09 2023-04-04 海乂知信息科技(南京)有限公司 Semantic knowledge graph construction method, electronic equipment and storage medium
CN116521852B (en) * 2023-06-26 2023-09-19 南京实创信息技术有限公司 Deep learning-based intelligent mapping device and mapping method for power equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104038368A (en) * 2014-05-15 2014-09-10 国家电网公司 Device alarm full-data notification system and method for electric-power communication network
CN109033284A (en) * 2018-07-12 2018-12-18 国网福建省电力有限公司 The power information operational system database construction method of knowledge based map
CN109635127A (en) * 2019-02-20 2019-04-16 云南电网有限责任公司信息中心 A kind of power equipment portrait knowledge mapping construction method based on big data technology
CN111737496A (en) * 2020-06-29 2020-10-02 东北电力大学 Power equipment fault knowledge map construction method
CN112002411A (en) * 2020-08-20 2020-11-27 杭州电子科技大学 Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN112347271A (en) * 2020-12-04 2021-02-09 国网天津市电力公司电力科学研究院 Auxiliary defect entry method for power distribution Internet of things equipment based on character semantic recognition
CN112612902A (en) * 2020-12-23 2021-04-06 国网浙江省电力有限公司电力科学研究院 Knowledge graph construction method and device for power grid main device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8452725B2 (en) * 2008-09-03 2013-05-28 Hamid Hatami-Hanza System and method of ontological subject mapping for knowledge processing applications

Also Published As

Publication number Publication date
CN113157860A (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220311