CN112613314A - Electric power communication network knowledge graph construction method based on BERT model - Google Patents
Electric power communication network knowledge graph construction method based on BERT model
- Publication number
- CN112613314A (application number CN202011588999.8A)
- Authority
- CN
- China
- Prior art keywords
- communication network
- power communication
- knowledge graph
- knowledge
- bert
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention relates to a method for constructing a knowledge graph of a power communication network based on a BERT model, comprising the following steps: S1, building an original document library: selecting data from different sources and with different structures as a basis, and constructing an original document library; S2, label migration: annotating the original knowledge data of the power communication network, and performing label migration on the basis of the annotation information; S3, semantic feature extraction: training based on a BERT model to extract semantic features from the structured document data of the power communication network; S4, constructing a knowledge graph: performing named entity recognition on the extracted semantic features based on a BiLSTM-CRF model, extracting knowledge concepts and their interconnections, and forming a knowledge graph in combination with knowledge modeling. The method effectively addresses typical problems such as abundant professional terminology and scarce training corpora, improves the accuracy of key steps such as named entity recognition, and ultimately improves the performance of the knowledge graph.
Description
Technical Field
The invention relates to the technical field of informatization, automation and intelligent maintenance of power communication networks, and in particular to a method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model.
Background
An important adjunct of a modern power network is its supporting power communication network. Compared with the much larger general-purpose Internet, the power communication network remains relatively limited in scale, and the services it carries are more specialized. However, as more and more new technologies such as the Internet of Things and artificial intelligence are introduced into power grid construction, and as various devices transmit signals into the grid to meet business requirements, the structure of the power communication network has become more complex. Unlike the operation data of the main power grid, the power communication network carries a wide range of information, from environmental conditions to various on-site monitoring signals and the corresponding content. In such a situation, various unknown service operations may cause potential failures, and maintenance operators of the power communication network face increasingly challenging problems. When a fault occurs, they must consult a large number of specific documents on the relevant equipment and services to locate it. This severely limits the efficiency of fault location and resolution.
For similar problems in the field of general knowledge search, knowledge graph technology can effectively improve the efficiency of knowledge retrieval. As a typical artificial intelligence technology, the knowledge graph was first proposed to improve the question-understanding capability of search engines. Its core idea is to describe existing concepts, entities and events, and the relations among them, in a structured way. Constructing a knowledge graph mainly comprises the steps of collecting, processing, extracting and representing knowledge. Owing to these capabilities, knowledge graphs are widely applied to related tasks such as general knowledge retrieval, search engines, recommendation algorithms and intelligent storage systems. In contrast to general-purpose knowledge graphs, applying the technology to a specific field and building a small knowledge graph with a higher degree of specialization has been a research trend in recent years. A key challenge for knowledge graph construction in a specialized field is the existence of many specific terms and concepts in a relatively small corpus. Traditional word-embedding methods used for entity recognition, such as Word2Vec, still focus on feature extraction for single words rather than on contextual semantic information, which limits their ability to distinguish specific concepts and to express the relationships between them for further application.
Therefore, to address the above problems, the invention introduces the knowledge graph into the operation and maintenance scenario of the power communication network and provides a power communication network knowledge graph construction method based on BERT and a BiLSTM-CRF entity recognition model. Because the maintenance data of a power communication network come from multiple sources and have different structures, the method combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) to model the relationships among named entities, and finally forms a set of interrelated knowledge concepts that are combined into a library to support query requests. The method effectively addresses typical problems such as abundant professional terminology and scarce training corpora, improves the accuracy of key steps such as named entity recognition, and ultimately improves the performance of the knowledge graph.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a power communication network knowledge graph construction method based on BERT and a BiLSTM-CRF entity recognition model. Because the maintenance data of a power communication network come from multiple sources and have different structures, the method combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) to model the relationships among named entities, and finally forms a set of interrelated knowledge concepts that are combined into a library to support query requests. The method effectively addresses typical problems such as abundant professional terminology and scarce training corpora, improves the accuracy of key steps such as named entity recognition, and ultimately improves the performance of the knowledge graph.
To solve the above technical problem, the invention adopts the following technical solution: the power communication network knowledge graph construction method based on BERT and a BiLSTM-CRF entity recognition model specifically comprises the following steps:
S1, building an original document library: selecting data from different sources and with different structures as a basis, and constructing an original document library;
S2, label migration: annotating the original knowledge data of the power communication network, and performing label migration on the basis of the annotation information;
S3, semantic feature extraction: training based on a BERT model to extract semantic features from the structured document data of the power communication network;
S4, constructing a knowledge graph: performing named entity recognition on the extracted semantic features based on a BiLSTM-CRF model, extracting knowledge concepts and their interconnections, and forming a knowledge graph in combination with knowledge modeling.
With this technical solution, since the maintenance data of the power communication network come from multiple sources and have different structures, the method first aligns the raw data into a unified framework. Then, a Bidirectional Encoder Representations from Transformers (BERT) model is used as the basic feature extraction method for the words in the above documents. The method combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) to model the relationships among named entities, and finally forms a set of interrelated knowledge concepts, that is, a knowledge graph organized as a library to support query requests. The method effectively addresses typical problems such as abundant professional terminology and scarce training corpora, improves the accuracy of key steps such as named entity recognition, and ultimately improves the performance of the knowledge graph.
In a further improvement, the method further comprises step S5, constructing applications: constructing a process standardization application and a fault handling guidance application based on the professional knowledge graph of the power communication network. Regarding process standardization, because of the importance of the power grid, all operations are performed strictly according to the standard procedures for stable operation and switching operations. As grid models and power communication network tasks grow more complex, following standard procedures requires reviewing an increasing amount of relevant documentation. On the basis of the knowledge graph, maintenance personnel can be assisted in completing tasks such as decision guidance, instruction verification and stability limit calculation with a more standardized process flow, which further reduces workload and operational risk.
As a preferred technical solution of the present invention, step S1 specifically comprises: selecting power communication network operation and maintenance data, equipment operation history records, existing power communication network operating regulations and system guidelines as the basis; and preprocessing these inputs, which range from structured and semi-structured data to unstructured data, the preprocessing including word segmentation and reduction processing, so that they are converted into unified structured data to facilitate further extraction.
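As a rough illustration of step S1, the following Python sketch converts three hypothetical inputs (a structured record, a semi-structured key-value line and a free-text sentence) into one unified record format. The `UnifiedRecord` type, the `to_unified` helper, the sample data and the regex-based tokenizer are all assumptions made for this example; the source does not name a word segmenter, and real Chinese-language documents would need a proper one.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class UnifiedRecord:
    source: str        # hypothetical name of the originating data source
    tokens: List[str]  # segmented, normalized words


def segment(text: str) -> List[str]:
    """Toy word segmentation: lowercase, then split on non-word characters."""
    return [t for t in re.split(r"[^\w]+", text.lower()) if t]


def to_unified(source: str, raw: Union[Dict[str, str], str]) -> UnifiedRecord:
    """Flatten structured (dict) or textual input into one record format."""
    if isinstance(raw, dict):
        # Structured data: keep both field names and values as text.
        text = " ".join(f"{key} {value}" for key, value in raw.items())
    else:
        # Semi-structured or unstructured text: drop key/value punctuation.
        text = raw.replace("=", " ").replace(";", " ")
    return UnifiedRecord(source=source, tokens=segment(text))


library = [
    to_unified("device_history", {"device": "OLT-3", "event": "Link Down"}),
    to_unified("ops_log", "port=GE0/1;status=Alarm"),
    to_unified("regulation", "Check the optical power before replacing the fiber."),
]

for record in library:
    print(record.source, record.tokens)
```

Every input ends up as a list of tokens with a source tag, which is the property the later labeling and feature-extraction steps rely on.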
As a preferred technical solution of the present invention, step S2 specifically comprises: first, annotating term concepts in the unified structured data obtained in step S1; then diffusing and propagating the label information on the annotated words by means of a K-nearest-neighbor algorithm and synonym-transformation data augmentation, so that entity recognition is guided by supervision information and an annotated structured document is obtained.
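The label propagation in step S2 can be sketched as follows. This is a minimal illustration, not the patented procedure: the two-dimensional word vectors, the label names `DEVICE` and `FAULT`, the synonym table and the helper names are all hypothetical, and the source does not specify the distance measure (Euclidean distance is assumed here).

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def knn_label(query: Vector, labeled: Dict[str, Tuple[Vector, str]], k: int = 3) -> str:
    """Return the majority label among the k labeled words nearest to `query`."""
    nearest = sorted(labeled.values(), key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def propagate(
    labeled: Dict[str, Tuple[Vector, str]],
    unlabeled: Dict[str, Vector],
    synonyms: Dict[str, List[str]],
    k: int = 3,
) -> Dict[str, str]:
    """Spread manual labels to synonyms first, then to unlabeled words via KNN."""
    result = {word: label for word, (_, label) in labeled.items()}
    # Synonym transformation: a synonym inherits the label of its seed word.
    for seed, alternatives in synonyms.items():
        if seed in result:
            for alt in alternatives:
                result.setdefault(alt, result[seed])
    # K-nearest-neighbor diffusion for the remaining unlabeled words.
    for word, vector in unlabeled.items():
        result.setdefault(word, knn_label(vector, labeled, k))
    return result


# Hypothetical manually labeled seed words with toy feature vectors.
labeled = {
    "router": ((0.9, 0.1), "DEVICE"),
    "switch": ((0.8, 0.2), "DEVICE"),
    "timeout": ((0.1, 0.9), "FAULT"),
    "outage": ((0.2, 0.8), "FAULT"),
}
unlabeled = {"gateway": (0.85, 0.2), "packet_loss": (0.15, 0.85)}
synonyms = {"outage": ["blackout"]}

labels = propagate(labeled, unlabeled, synonyms, k=3)
print(labels)
```

Only a small seed set needs manual annotation; the remaining words receive labels from their neighbors, which is how the method compensates for a small training corpus.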
As a preferred technical solution of the present invention, step S3 specifically comprises: extracting word feature vectors from the annotated structured document obtained in step S2 with a BERT model, wherein the BERT model is obtained by pre-training on a massive corpus and can be used to extract information from the documents to obtain word vectors, thereby realizing semantic feature extraction.
As a preferred technical solution of the present invention, the specific steps of extracting the word feature vectors with the BERT model in step S3 are as follows: let the input $X$ consist of $L$ vectors of $C$ channels each; the BERT model adopts a multi-head attention mechanism. First, each $C$-channel vector $x \in X$ is mapped to $h$ low-dimensional subspaces to obtain low-dimensional vectors $x' \in \mathbb{R}^{C_{sub}}$, with $h \times C_{sub} = C$ ensured, so that the results of the different low-dimensional subspaces, after concatenation, are restored to the original space to form a $C$-channel vector as the output. In each low-dimensional subspace, each vector $x'$ is projected by mapping matrices into three vectors $q$, $k$ and $v$; an attention matrix is obtained by cross-correlating $q$ and $k$, and $v$ is then weighted by the attention matrix to obtain the output in the low-dimensional subspace, which is expressed in matrix form as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{C_{sub}}}\right)V$$
where $Q$, $K$ and $V$ are the matrices formed by the vector groups $q$, $k$ and $v$, respectively.
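The multi-head attention computation described above can be sketched in plain Python. Several details are assumptions made for the example rather than statements from the source: the mapping to each subspace is implemented as a simple channel slice, the scores are scaled by the square root of the subspace width as in the standard Transformer, and the projection matrices in the usage example are identity matrices chosen only to keep the numbers easy to follow.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention within one low-dimensional subspace."""
    scale = math.sqrt(len(q[0]))  # assumed scaling by sqrt(C_sub)
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # attention matrix, rows sum to 1
    return matmul(weights, v)


def multi_head(x: Matrix, h: int, wq: List[Matrix], wk: List[Matrix], wv: List[Matrix]) -> Matrix:
    """Split C channels into h subspaces, attend in each, then concatenate."""
    c = len(x[0])
    assert c % h == 0, "h * C_sub must equal C"
    c_sub = c // h
    heads = []
    for i in range(h):
        # Assumption: the mapping to subspace i is a plain channel slice.
        sub = [row[i * c_sub:(i + 1) * c_sub] for row in x]
        q, k, v = matmul(sub, wq[i]), matmul(sub, wk[i]), matmul(sub, wv[i])
        heads.append(attention(q, k, v))
    # Concatenate the h sub-outputs per position, restoring C channels.
    return [sum((head[t] for head in heads), []) for t in range(len(x))]


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]  # L=3, C=4
out = multi_head(x, 2, [identity] * 2, [identity] * 2, [identity] * 2)
print(len(out), len(out[0]))
```

The output has the same shape as the input ($L$ positions, $C$ channels), and each output vector is a weighted mixture of the value vectors at all positions, which is what lets every word representation depend on its full context.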
As a preferred technical solution of the present invention, in step S4 the semantic features extracted with the BERT model in step S3 are input into a BiLSTM-CRF model for named entity recognition, specifically:
S41: first, a bidirectional LSTM (BiLSTM) method is adopted: a forward LSTM and a backward LSTM are computed separately over each word sequence, and the outputs at the same position are then combined; the unidirectional LSTM uses four gate functions (an input gate, an input modulation gate, a forget gate and an output gate) together with a memory cell to modulate the time-series signal, so the output of the BiLSTM at the current position incorporates all information in both the forward and backward directions;
S42: then, to model the relationship between adjacent labels, a conditional random field (CRF) is adopted as a supplement to the BiLSTM; based on the random field principle, the CRF uses the adjacent entity concepts related to the current entity to smooth the recognition result; the scores output by the BiLSTM serve as the input of the CRF, and the highest-scoring label sequence is taken as the final prediction, so that the semantics of each entity are output and the corresponding relations are preserved in the final prediction;
S43: knowledge modeling is then completed by means of disambiguation and alignment, and the knowledge is finally stored as a knowledge graph in the form of key-value pairs.
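The CRF decoding in step S42 can be illustrated with a small Viterbi search. This is a sketch under stated assumptions: the BIO label set, the per-position emission scores (standing in for BiLSTM outputs) and the transition scores are invented for the example, and the source does not state which decoding algorithm is used; Viterbi is the usual choice for linear-chain CRFs.

```python
from typing import Dict, List


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the label sequence with the highest total emission + transition score."""
    labels = list(emissions[0])
    score = dict(emissions[0])  # best score of any path ending in each label
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            prev = max(labels, key=lambda p: score[p] + transitions[p][cur])
            new_score[cur] = score[prev] + transitions[prev][cur] + emit[cur]
            pointers[cur] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda label: score[label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical BiLSTM scores for the words "optical", "fiber", "failed".
emissions = [
    {"B": 2.0, "I": 0.5, "O": 1.0},
    {"B": 1.0, "I": 0.9, "O": 1.2},
    {"B": 0.1, "I": 0.2, "O": 2.0},
]
# Hypothetical transition scores: transitions[previous][current].
transitions = {
    "B": {"B": -1.0, "I": 1.0, "O": 0.0},
    "I": {"B": -1.0, "I": 0.5, "O": 0.0},
    "O": {"B": 0.2, "I": -5.0, "O": 0.5},
}

greedy = [max(e, key=e.get) for e in emissions]
decoded = viterbi(emissions, transitions)
print(greedy)   # ['B', 'O', 'O']
print(decoded)  # ['B', 'I', 'O']
```

Picking the best label independently at each position splits the term "optical fiber" after its first word, whereas the transition scores let the neighboring labels pull the second word into the entity. This is the smoothing effect attributed to the CRF above.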
Compared with the prior art, the invention has the following beneficial effects: the method introduces the BERT and BiLSTM-CRF models into the problem of constructing a power communication network knowledge graph; the BERT method fully extracts the semantic information hidden in the context of a term, which reduces term ambiguity and improves recognition accuracy; ultimately, the accuracy of the concepts and association relations in the whole knowledge graph is improved.
Drawings
The technical scheme of the invention is further described by combining the accompanying drawings as follows:
FIG. 1 shows the construction process of the power communication network knowledge graph in the method of the invention for constructing a power communication network knowledge graph based on BERT plus a BiLSTM-CRF entity recognition model;
FIG. 2 shows the process of semantic information extraction by the BERT model in the method of the invention;
FIG. 3 shows the knowledge-graph-supported application framework for power communication network process standardization in the method of the invention;
FIG. 4 shows the knowledge-graph-supported application framework for power communication network fault handling guidance in the method of the invention.
Detailed Description
For the purpose of enhancing the understanding of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and examples, which are provided for the purpose of illustration only and are not intended to limit the scope of the present invention.
Embodiment: as shown in FIG. 1, the method for constructing the knowledge graph of the power communication network based on BERT plus the BiLSTM-CRF entity recognition model specifically comprises the following steps:
S1, building an original document library: selecting data from different sources and with different structures as a basis, and constructing an original document library. Step S1 specifically comprises: selecting power communication network operation and maintenance data, equipment operation history records, existing power communication network operating regulations and system guidelines as the basis; and preprocessing these inputs, which range from structured and semi-structured data to unstructured data, the preprocessing including word segmentation and reduction processing, so that they are converted into unified structured data for further extraction;
S2, label migration: after data in a unified format are obtained, named entity recognition must be performed on the document content to extract terms and concepts; the original knowledge data of the power communication network are annotated, and label migration is performed on the basis of the annotation information. Step S2 specifically comprises: first, annotating term concepts in the unified structured data obtained in step S1; to further improve the accuracy of entity recognition, part of the term concepts are annotated manually before feature extraction, and the label information on the annotated words is then diffused and propagated by means of a K-nearest-neighbor algorithm and synonym-transformation data augmentation, so that entity recognition is guided by supervision information and an annotated structured document is obtained;
S3, semantic feature extraction: training based on a BERT model to extract semantic features from the structured document data of the power communication network. As shown in FIG. 2, word feature vectors are extracted from the annotated structured document obtained in step S2 with a BERT model. The BERT model is generally obtained by pre-training on a massive corpus and can accurately extract information from the documents to obtain high-quality word vectors, which facilitates the extraction and classification of entities and ultimately improves entity recognition accuracy. In the knowledge extraction step, traditional entity recognition relies mainly on word-embedding methods such as Word2Vec to map natural language into a feature space; one drawback is that such static word embeddings cannot express the multiple meanings a word takes on in different contexts. The BERT model extracts bidirectional features from sentences by introducing the Transformer structure. The method takes full account of the characteristics of the Chinese language and of the professional field: sentences are segmented with Chinese words rather than characters as the basic units, and the input of the BERT model is generated by randomly masking words in the sentences. As shown in FIG. 2, the core building block of BERT is the Transformer structure, which uses an attention mechanism similar to that in human language understanding. The specific steps of extracting the word feature vectors with the BERT model are as follows: let the input $X$ consist of $L$ vectors of $C$ channels each; the BERT model adopts a multi-head attention mechanism. First, each $C$-channel vector $x \in X$ is mapped to $h$ low-dimensional subspaces to obtain low-dimensional vectors $x' \in \mathbb{R}^{C_{sub}}$, with $h \times C_{sub} = C$ ensured, so that the results of the different low-dimensional subspaces, after concatenation, are restored to the original space to form a $C$-channel vector as the output. In each low-dimensional subspace, each vector $x'$ is projected by mapping matrices into three vectors $q$, $k$ and $v$; an attention matrix is obtained by cross-correlating $q$ and $k$, and $v$ is then weighted by the attention matrix to obtain the output in the low-dimensional subspace, which is expressed in matrix form as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{C_{sub}}}\right)V$$
where $Q$, $K$ and $V$ are the matrices formed by the vector groups $q$, $k$ and $v$, respectively;
S4, constructing a knowledge graph: performing named entity recognition on the extracted semantic features based on a BiLSTM-CRF model, extracting accurate knowledge concepts and their interconnections, and forming a knowledge graph in combination with knowledge modeling;
In step S4, the semantic features extracted by the BERT model in step S3 are input into a BiLSTM-CRF model for named entity recognition. The LSTM structure is a typical implementation of a recurrent neural network (RNN); it consists of an input gate, an input modulation gate, a forget gate and an output gate, and uses a memory cell to explicitly carry short-term memory along the time sequence; in addition, the learnable parameters in the different gate structures represent the long-term memory learned from the training data. However, a unidirectional LSTM cannot fully take into account the context on both sides of a word, so the method adopts a bidirectional LSTM (BiLSTM); the specific steps of named entity recognition and knowledge modeling to form the knowledge graph are as follows:
S41: first, a bidirectional LSTM (BiLSTM) method is adopted: a forward LSTM and a backward LSTM are computed separately over each word sequence, and the outputs at the same position are then combined; the unidirectional LSTM uses four gate functions (an input gate, an input modulation gate, a forget gate and an output gate) together with a memory cell to modulate the time-series signal, so the output of the BiLSTM at the current position incorporates all information in both the forward and backward directions;
S42: then, to model the relationship between adjacent labels, a conditional random field (CRF) is adopted as a supplement to the BiLSTM; based on the random field principle, the CRF uses the adjacent entity concepts related to the current entity to smooth the recognition result; the scores output by the BiLSTM serve as the input of the CRF, and the highest-scoring label sequence is taken as the final prediction, so that the semantics of each entity are output and the corresponding relations are preserved in the final prediction;
S43: on the basis of entity recognition, knowledge modeling is completed by means of the disambiguation and alignment shown in FIG. 1, and the knowledge is finally stored as a knowledge graph in the form of key-value pairs;
S5, constructing applications: constructing a process standardization application and a fault handling guidance application based on the professional knowledge graph of the power communication network. Supported by the professional knowledge graph constructed in steps S1 to S4, the method builds the power communication network process standardization application framework shown in FIG. 3 and the power communication network fault handling guidance application framework shown in FIG. 4. Regarding process standardization, because of the importance of the power grid, all operations are performed strictly according to the standard procedures for stable operation and switching operations; as grid models and power communication network tasks grow more complex, following standard procedures requires reviewing an increasing amount of relevant documentation. On the basis of the knowledge graph, the system shown in FIG. 3 can help maintenance personnel complete tasks such as decision guidance, instruction verification and stability limit calculation with a more standardized process flow, which further reduces workload and operational risk. Regarding fault handling guidance, fault handling becomes more intricate as the underlying structure of the power communication network grows more complex. One failure may lead to a series of hidden risks, and different failures may show quite similar symptoms. In this situation, simple handling based on experience alone may introduce additional problems. The knowledge-graph-based fault handling guidance shown in FIG. 4 can provide more accurate fault handling recommendations and guidance; for example, when a fiber fails, the knowledge graph automatically searches for all concepts associated with the fiber patch cord and gives detailed operating guidance and prompts based on the specific parameters and historical symptoms.
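The key-value storage of step S43 and the associated-concept lookup used for fault handling guidance can be sketched as follows. Everything concrete here is hypothetical: the triples, the relation names, the `inverse_` prefix for reverse edges and the two-hop search limit are illustration choices, since the source does not describe the storage schema or the search strategy in detail.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Key: entity name. Value: list of (relation, neighboring entity) pairs.
Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(triples: Iterable[Tuple[str, str, str]]) -> Graph:
    """Store (head, relation, tail) triples as key-value pairs, in both directions."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((f"inverse_{relation}", head))
    return dict(graph)


def related_concepts(graph: Graph, start: str, max_hops: int = 2) -> Set[str]:
    """Breadth-first search for all concepts within `max_hops` of `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for _, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen - {start}


# Hypothetical knowledge extracted from maintenance documents.
triples = [
    ("fiber patch cord", "connects", "optical transceiver"),
    ("fiber patch cord", "has_fault", "high attenuation"),
    ("high attenuation", "handled_by", "clean and reseat connector"),
    ("router", "has_fault", "routing loop"),
]
kg = build_graph(triples)
print(sorted(related_concepts(kg, "fiber patch cord")))
```

Starting from the faulty component, the lookup returns the connected equipment, the known fault symptom and the handling step, while unrelated entries such as the router fault are left out.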
It is obvious to those skilled in the art that the present invention is not limited to the above embodiments; various insubstantial modifications of the method concept and technical solution of the invention, or direct application of the concept and technical solution to other situations without modification, all fall within the scope of protection of the present invention.
Claims (7)
1. A method for constructing a power communication network knowledge graph based on a BERT plus BiLSTM-CRF entity recognition model, characterized by comprising the following steps:
S1 Building an original document library: selecting data of different sources and structures as the basis, and building an original document library;
S2 Label migration: labeling the original knowledge data of the power communication network, and carrying out label migration on the basis of the labeling information;
S3 Semantic feature extraction: training based on a BERT model to extract semantic features from the structured document data of the power communication network;
S4 Constructing the knowledge graph: performing named entity recognition on the extracted semantic features based on a BiLSTM-CRF model, extracting knowledge concepts and their interconnections, and forming the knowledge graph in combination with knowledge modeling.
2. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model as claimed in claim 1, further comprising step S5, application construction: building a process-standardization application and a fault-handling guidance application based on the professional knowledge graph of the power communication network.
3. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model according to claim 2, wherein step S1 specifically comprises: taking power communication network operation and maintenance data, equipment operation history records, existing power communication network operating regulations and system guidelines as the basis, and preprocessing these inputs, which range from structured and semi-structured data to unstructured data, wherein the preprocessing comprises word segmentation and reduction processing, so that the inputs are converted into unified structured data to facilitate further extraction.
4. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model according to claim 3, wherein step S2 specifically comprises: first, performing term-concept labeling on the unified structured data obtained in step S1; then diffusing and propagating the label information on the labeled words using a K-nearest-neighbor algorithm and synonym-transformation data augmentation, so that entity recognition is guided by supervision information, thereby obtaining a labeled structured document.
5. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model as claimed in claim 4, wherein step S3 specifically comprises: extracting word feature vectors from the labeled structured document obtained in step S2 using a BERT model, wherein the BERT model is pre-trained on a massive corpus and can extract information from the document to obtain word vectors, thereby realizing semantic feature extraction.
6. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model as claimed in claim 5, characterized in that the specific steps of extracting the word feature vectors with the BERT model in step S3 are as follows: let the input X consist of L vectors of C channels each, the BERT model adopting a multi-head attention mechanism; first, each C-channel vector x ∈ X is mapped to h low-dimensional subspaces to obtain low-dimensional vectors $x' \in \mathbb{R}^{C_{sub}}$, with $h \times C_{sub} = C$ ensured, so that the results from the different low-dimensional subspaces, after concatenation, are restored to the original space and form a C-channel vector as output; on each low-dimensional subspace, each vector x' is projected through mapping matrices to obtain three groups of vectors q, k and v, an attention matrix is obtained by cross-correlating q and k, and v is weighted by the attention matrix to obtain the output on that subspace, which is expressed in matrix form by the following formula:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{C_{sub}}}\right)V$$
where Q, K and V are the matrix forms of the vector groups formed by q, k and v, respectively.
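As a rough illustration of the multi-head attention step in claim 6, the sketch below maps L vectors of C channels into h subspaces of C_sub = C / h channels, derives q, k and v in each subspace, and concatenates the per-head outputs back to C channels. It assumes the standard scaled dot-product form softmax(QK^T / sqrt(C_sub))V and uses random placeholder matrices where a trained BERT would supply learned weights; the function names are hypothetical, and the output projection that standard Transformer layers apply after concatenation is omitted because the claim does not mention it.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    """Scaled dot-product attention on one subspace: softmax(Q K^T / sqrt(C_sub)) V."""
    c_sub = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(c_sub) for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def random_matrix(rows, cols, rng):
    """Placeholder weights (assumption); a trained model would use learned matrices."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, heads, rng):
    """Map L vectors of C channels to L vectors of C channels via `heads` subspaces."""
    c = len(x[0])
    if c % heads != 0:
        raise ValueError("C must be divisible by h so that h * C_sub == C")
    c_sub = c // heads
    head_outputs = []
    for _ in range(heads):
        x_sub = matmul(x, random_matrix(c, c_sub, rng))      # x -> x' in the subspace
        q = matmul(x_sub, random_matrix(c_sub, c_sub, rng))  # query projection
        k = matmul(x_sub, random_matrix(c_sub, c_sub, rng))  # key projection
        v = matmul(x_sub, random_matrix(c_sub, c_sub, rng))  # value projection
        head_outputs.append(attention(q, k, v))
    # Concatenate the per-head outputs position by position to restore C channels.
    return [[value for head in head_outputs for value in head[i]] for i in range(len(x))]


if __name__ == "__main__":
    x = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
    y = multi_head_attention(x, heads=2, rng=random.Random(0))
    print(len(y), len(y[0]))  # L positions, C channels
```

Because each head works in C_sub = C / h channels, concatenating the h head outputs yields exactly C channels again, which is the constraint h × C_sub = C stated in the claim.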
7. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model of claim 4, wherein in step S4 the semantic features extracted by the BERT model in step S3 are input into the BiLSTM-CRF model for named entity recognition, specifically:
S41: first, a bidirectional LSTM (BiLSTM) is used: a forward LSTM and a backward LSTM are computed for each word sequence, and their outputs at the same position are combined; each unidirectional LSTM uses four gate functions (an input gate, an input modulation gate, a forget gate and an output gate) together with a memory cell to modulate the sequential signal, so that the BiLSTM output at the current position incorporates information from both the forward and backward directions;
S42: then, to model the relationship between adjacent labels, a conditional random field (CRF) is used to complement the BiLSTM; based on random field principles, the CRF smooths the recognition result using the adjacent entities related to the current entity; the scores output by the BiLSTM serve as the input of the CRF, and the label sequence with the highest score is the final prediction, so that the semantics of each entity are output and the corresponding relations are preserved in the final prediction;
S43: finally, knowledge modeling is completed by means of disambiguation and alignment, and the knowledge is stored as a knowledge graph in the form of key-value pairs.
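The CRF step in S42 can be sketched as Viterbi decoding over per-position label scores. The example below assumes the BiLSTM has already produced an emission score for every label at every position and that the CRF holds a transition score for every pair of adjacent labels; the label set, the score values and the function name `viterbi_decode` are illustrative assumptions, not values from the patent.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence and its score.

    emissions[t][j]   : score of label j at position t (assumed BiLSTM output)
    transitions[i][j] : score of moving from label i to label j (assumed CRF parameters)
    """
    n = len(labels)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label for ending at label j in position t.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    best_last = max(range(n), key=lambda j: score[j])
    best_score = score[best_last]
    # Follow the backpointers from the last position to the first.
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[i] for i in path], best_score


# Hypothetical BIO label set for an equipment entity.
LABELS = ["O", "B-EQUIP", "I-EQUIP"]
# Rows: previous label, columns: next label. "O" followed by "I-EQUIP" is heavily penalized.
TRANSITIONS = [
    [0.0, 0.0, -10.0],
    [0.0, -1.0, 1.0],
    [0.0, -1.0, 0.5],
]
# Illustrative emission scores for a three-token sequence.
EMISSIONS = [
    [0.1, 2.0, 0.3],
    [1.0, 0.2, 0.9],
    [2.0, 0.1, 0.5],
]

if __name__ == "__main__":
    tags, total = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)
    print(tags)  # ['B-EQUIP', 'I-EQUIP', 'O']
```

In this toy input, picking the best label independently at each position would tag the second token as "O"; the transition scores instead favor continuing the entity, which is the smoothing effect of adjacent labels that S42 attributes to the CRF.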
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011588999.8A CN112613314A (en) | 2020-12-29 | 2020-12-29 | Electric power communication network knowledge graph construction method based on BERT model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011588999.8A CN112613314A (en) | 2020-12-29 | 2020-12-29 | Electric power communication network knowledge graph construction method based on BERT model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112613314A (en) | 2021-04-06 |
Family
ID=75248656
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011588999.8A Pending CN112613314A (en) | 2020-12-29 | 2020-12-29 | Electric power communication network knowledge graph construction method based on BERT model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112613314A (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111660A (en) * | 2021-04-22 | 2021-07-13 | 脉景(杭州)健康管理有限公司 | Data processing method, device, equipment and storage medium |
CN113139069A (en) * | 2021-05-14 | 2021-07-20 | 上海交通大学 | Knowledge graph construction-oriented Chinese text entity identification method and system for power failure |
CN113449526A (en) * | 2021-08-27 | 2021-09-28 | 杭萧钢构股份有限公司 | Method and system for analyzing applicability of steel structure production scheduling strategy |
CN113569016A (en) * | 2021-09-27 | 2021-10-29 | 北京语言大学 | Bert model-based professional term extraction method and device |
CN113779255A (en) * | 2021-09-13 | 2021-12-10 | 广州汇通国信科技有限公司 | Identification method and device based on LSTM neural network and knowledge graph |
CN113806554A (en) * | 2021-09-14 | 2021-12-17 | 上海云思智慧信息技术有限公司 | Knowledge graph construction method for massive conference texts |
CN113836940A (en) * | 2021-09-26 | 2021-12-24 | 中国南方电网有限责任公司 | Knowledge fusion method and device in electric power metering field and computer equipment |
CN114004230A (en) * | 2021-09-23 | 2022-02-01 | 杭萧钢构股份有限公司 | Industrial control scheduling method and system for producing steel structure |
CN114154505A (en) * | 2021-12-07 | 2022-03-08 | 国网四川省电力公司经济技术研究院 | Named entity identification method for power planning review field |
CN114168745A (en) * | 2021-11-30 | 2022-03-11 | 大连理工大学 | Knowledge graph construction method for production process of ethylene oxide derivative |
CN114707005A (en) * | 2022-06-02 | 2022-07-05 | 浙江建木智能系统有限公司 | Knowledge graph construction method and system for ship equipment |
CN115048492A (en) * | 2022-06-17 | 2022-09-13 | 广东电网有限责任公司 | Method, device and equipment for processing defect information of power equipment and storage medium |
CN115168603A (en) * | 2022-06-27 | 2022-10-11 | 天翼爱音乐文化科技有限公司 | Automatic feedback response method, device and storage medium for color ring back tone service process |
CN115238688A (en) * | 2022-08-15 | 2022-10-25 | 广州市刑事科学技术研究所 | Electronic information data association relation analysis method, device, equipment and storage medium |
CN116091045A (en) * | 2023-02-28 | 2023-05-09 | 武汉烽火技术服务有限公司 | Knowledge-graph-based communication network operation and maintenance method and operation and maintenance device |
CN116644192A (en) * | 2023-05-30 | 2023-08-25 | 中国民用航空飞行学院 | Knowledge graph construction method based on reliability of aircraft parts |
CN117012185A (en) * | 2023-06-20 | 2023-11-07 | 国网山东省电力公司泗水县供电公司 | Power grid dispatching method and system based on knowledge graph |
CN117151117A (en) * | 2023-10-30 | 2023-12-01 | 国网浙江省电力有限公司营销服务中心 | Automatic identification method, device and medium for power grid lightweight unstructured document content |
CN117875414A (en) * | 2023-12-06 | 2024-04-12 | 中新金桥数字科技(北京)有限公司 | Knowledge graph model construction method |
CN117874755A (en) * | 2024-03-13 | 2024-04-12 | 中国电子科技集团公司第三十研究所 | System and method for identifying hidden network threat users |
CN117993050A (en) * | 2023-12-27 | 2024-05-07 | 清华大学 | Building design method and system based on knowledge-enhanced diffusion model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287334A (en) * | 2019-06-13 | 2019-09-27 | 淮阴工学院 | A kind of school's domain knowledge map construction method based on Entity recognition and attribute extraction model |
CN111159385A (en) * | 2019-12-31 | 2020-05-15 | 南京烽火星空通信发展有限公司 | Template-free universal intelligent question-answering method based on dynamic knowledge graph |
CN111241837A (en) * | 2020-01-04 | 2020-06-05 | 大连理工大学 | Theft case legal document named entity identification method based on anti-migration learning |
CN111488734A (en) * | 2020-04-14 | 2020-08-04 | 西安交通大学 | Emotional feature representation learning system and method based on global interaction and syntactic dependency |
CN111737496A (en) * | 2020-06-29 | 2020-10-02 | 东北电力大学 | Power equipment fault knowledge map construction method |
2020-12-29: CN application CN202011588999.8A (patent/CN112613314A/en), status: active, Pending
Non-Patent Citations (2)
Title |
---|
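Wu Jun et al.: "Research on Chinese professional term extraction based on a BERT-embedded BiLSTM-CRF model", Journal of the China Society for Scientific and Technical Information (《情报学报》), vol. 39, no. 04, pages 409 - 418 * |
Li Junqing et al.: "Imputation of missing wind power data with an LSTM network based on random forest importance", Electrical & Energy Management Technology (《电器与能效管理技术》), no. 13, pages 47 - 52 * |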
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111660A (en) * | 2021-04-22 | 2021-07-13 | 脉景(杭州)健康管理有限公司 | Data processing method, device, equipment and storage medium |
CN113139069A (en) * | 2021-05-14 | 2021-07-20 | 上海交通大学 | Knowledge graph construction-oriented Chinese text entity identification method and system for power failure |
CN113449526B (en) * | 2021-08-27 | 2022-02-08 | 杭萧钢构股份有限公司 | Method and system for analyzing applicability of steel structure production scheduling strategy |
CN113449526A (en) * | 2021-08-27 | 2021-09-28 | 杭萧钢构股份有限公司 | Method and system for analyzing applicability of steel structure production scheduling strategy |
CN113779255A (en) * | 2021-09-13 | 2021-12-10 | 广州汇通国信科技有限公司 | Identification method and device based on LSTM neural network and knowledge graph |
CN113806554A (en) * | 2021-09-14 | 2021-12-17 | 上海云思智慧信息技术有限公司 | Knowledge graph construction method for massive conference texts |
CN114004230A (en) * | 2021-09-23 | 2022-02-01 | 杭萧钢构股份有限公司 | Industrial control scheduling method and system for producing steel structure |
CN113836940A (en) * | 2021-09-26 | 2021-12-24 | 中国南方电网有限责任公司 | Knowledge fusion method and device in electric power metering field and computer equipment |
CN113836940B (en) * | 2021-09-26 | 2024-04-12 | 南方电网数字电网研究院股份有限公司 | Knowledge fusion method and device in electric power metering field and computer equipment |
CN113569016A (en) * | 2021-09-27 | 2021-10-29 | 北京语言大学 | Bert model-based professional term extraction method and device |
CN114168745A (en) * | 2021-11-30 | 2022-03-11 | 大连理工大学 | Knowledge graph construction method for production process of ethylene oxide derivative |
CN114168745B (en) * | 2021-11-30 | 2022-08-09 | 大连理工大学 | Knowledge graph construction method for production process of ethylene oxide derivative |
CN114154505A (en) * | 2021-12-07 | 2022-03-08 | 国网四川省电力公司经济技术研究院 | Named entity identification method for power planning review field |
CN114154505B (en) * | 2021-12-07 | 2024-07-16 | 国网四川省电力公司经济技术研究院 | Named entity identification method oriented to power planning review field |
CN114707005A (en) * | 2022-06-02 | 2022-07-05 | 浙江建木智能系统有限公司 | Knowledge graph construction method and system for ship equipment |
CN114707005B (en) * | 2022-06-02 | 2022-10-25 | 浙江建木智能系统有限公司 | Knowledge graph construction method and system for ship equipment |
CN115048492A (en) * | 2022-06-17 | 2022-09-13 | 广东电网有限责任公司 | Method, device and equipment for processing defect information of power equipment and storage medium |
CN115168603A (en) * | 2022-06-27 | 2022-10-11 | 天翼爱音乐文化科技有限公司 | Automatic feedback response method, device and storage medium for color ring back tone service process |
CN115168603B (en) * | 2022-06-27 | 2023-04-07 | 天翼爱音乐文化科技有限公司 | Automatic feedback response method, device and storage medium for color ring back tone service process |
CN115238688A (en) * | 2022-08-15 | 2022-10-25 | 广州市刑事科学技术研究所 | Electronic information data association relation analysis method, device, equipment and storage medium |
CN116091045A (en) * | 2023-02-28 | 2023-05-09 | 武汉烽火技术服务有限公司 | Knowledge-graph-based communication network operation and maintenance method and operation and maintenance device |
CN116644192A (en) * | 2023-05-30 | 2023-08-25 | 中国民用航空飞行学院 | Knowledge graph construction method based on reliability of aircraft parts |
CN117012185A (en) * | 2023-06-20 | 2023-11-07 | 国网山东省电力公司泗水县供电公司 | Power grid dispatching method and system based on knowledge graph |
CN117151117A (en) * | 2023-10-30 | 2023-12-01 | 国网浙江省电力有限公司营销服务中心 | Automatic identification method, device and medium for power grid lightweight unstructured document content |
CN117151117B (en) * | 2023-10-30 | 2024-03-01 | 国网浙江省电力有限公司营销服务中心 | Automatic identification method, device and medium for power grid lightweight unstructured document content |
CN117875414A (en) * | 2023-12-06 | 2024-04-12 | 中新金桥数字科技(北京)有限公司 | Knowledge graph model construction method |
CN117993050A (en) * | 2023-12-27 | 2024-05-07 | 清华大学 | Building design method and system based on knowledge-enhanced diffusion model |
CN117993050B (en) * | 2023-12-27 | 2024-09-17 | 清华大学 | Building design method and system based on knowledge-enhanced diffusion model |
CN117874755A (en) * | 2024-03-13 | 2024-04-12 | 中国电子科技集团公司第三十研究所 | System and method for identifying hidden network threat users |
CN117874755B (en) * | 2024-03-13 | 2024-05-10 | 中国电子科技集团公司第三十研究所 | System and method for identifying hidden network threat users |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112613314A (en) | Electric power communication network knowledge graph construction method based on BERT model | |
WO2022037256A1 (en) | Text sentence processing method and device, computer device and storage medium | |
CN111177393B (en) | Knowledge graph construction method and device, electronic equipment and storage medium | |
WO2021121198A1 (en) | Semantic similarity-based entity relation extraction method and apparatus, device and medium | |
CN111737476A (en) | Text processing method and device, computer readable storage medium and electronic equipment | |
US20210217504A1 (en) | Method and apparatus for verifying medical fact | |
CN112100332A (en) | Word embedding expression learning method and device and text recall method and device | |
CN111522839A (en) | Natural language query method based on deep learning | |
CN112131883B (en) | Language model training method, device, computer equipment and storage medium | |
CN110647632B (en) | Image and text mapping technology based on machine learning | |
WO2022088671A1 (en) | Automated question answering method and apparatus, device, and storage medium | |
CN113779225B (en) | Training method of entity link model, entity link method and device | |
WO2023137918A1 (en) | Text data analysis method and apparatus, model training method, and computer device | |
CN115115914B (en) | Information identification method, apparatus and computer readable storage medium | |
CN114676255A (en) | Text processing method, device, equipment, storage medium and computer program product | |
CN113705218A (en) | Event element gridding extraction method based on character embedding, storage medium and electronic device | |
WO2022134793A1 (en) | Method and apparatus for extracting semantic information in video frame, and computer device | |
CN114880991B (en) | Knowledge graph question-answering question-sentence entity linking method, device, equipment and medium | |
CN113704434A (en) | Knowledge base question and answer method, electronic equipment and readable storage medium | |
CN115114419A (en) | Question and answer processing method and device, electronic equipment and computer readable medium | |
CN111931503B (en) | Information extraction method and device, equipment and computer readable storage medium | |
CN117556048A (en) | Artificial intelligence-based intention recognition method, device, equipment and medium | |
CN111737951B (en) | Text language incidence relation labeling method and device | |
CN112199954A (en) | Disease entity matching method and device based on voice semantics and computer equipment | |
CN114925681B (en) | Knowledge graph question-answering question-sentence entity linking method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210406 |