CN112613314A - Electric power communication network knowledge graph construction method based on BERT model - Google Patents

Electric power communication network knowledge graph construction method based on BERT model

Info

Publication number
CN112613314A
Authority
CN
China
Prior art keywords
communication network
power communication
knowledge graph
knowledge
bert
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011588999.8A
Other languages
Chinese (zh)
Inventor
吴海洋
陈鹏
李伟
戴勇
蒋春霞
顾彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202011588999.8A
Publication of CN112613314A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for constructing a knowledge graph of a power communication network based on a BERT model, comprising the following steps: S1, building an original document library: selecting data of different sources and structures as a basis and constructing an original document library; S2, label migration: labeling the original knowledge data of the power communication network and performing label migration on the basis of the labeling information; S3, semantic feature extraction: training based on a BERT model to extract semantic features from the structured document data of the power communication network; S4, constructing a knowledge graph: performing named entity recognition on the extracted semantic features with a BiLSTM-CRF model, extracting knowledge concepts and their interconnections, and forming a knowledge graph in combination with knowledge modeling. The method effectively addresses typical problems such as abundant technical terms and scarce training corpora, improves the accuracy of key steps such as named entity recognition, and ultimately improves the performance of the knowledge graph.

Description

Electric power communication network knowledge graph construction method based on BERT model
Technical Field
The invention relates to the technical field of informatization, automation and intelligent maintenance of power communication networks, and in particular to a method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model.
Background
An important auxiliary component of a modern power grid is its supporting power communication network. Compared with the much larger general-purpose internet, the power communication network remains relatively limited in scale, and the services it carries are more specialized. However, as new technologies such as the Internet of Things and artificial intelligence are introduced into power grid construction, and as various devices feed signals into the grid to meet business requirements, the structure of the power communication network has become more complex. Unlike the operating data of the power grid itself, the power communication network draws on a wide range of information, from environmental conditions to various on-site monitoring signals and their associated content. In this situation, unknown service operations may cause latent faults, and maintenance operators of the power communication network face increasingly challenging problems. When a fault occurs, they must consult a large number of documents specific to the relevant equipment and services in order to locate it, which severely limits the efficiency of fault location and resolution.
For similar problems in the field of general knowledge search, knowledge graph technology can effectively improve the efficiency of knowledge retrieval. As a typical artificial intelligence technology, the knowledge graph was first proposed to improve the query-understanding capability of search engines. Its core idea is to describe existing concepts, entities and events, and the relations among them, in a structured way. Constructing a knowledge graph mainly involves collecting, processing, extracting and representing knowledge. Owing to these capabilities, knowledge graphs are widely applied to tasks such as general knowledge retrieval, search engines, recommendation algorithms and intelligent storage systems. In contrast to general-purpose knowledge graphs, a research trend in recent years has been to apply the technology to a specific field and build a smaller, more specialized knowledge graph. A key challenge for knowledge graph construction in a specialized field is that many field-specific terms and concepts occur in a relatively small corpus. Traditional entity recognition relies on word-embedding methods such as Word2Vec, which still focus on the features of individual words rather than on contextual semantic information; this limits their ability to distinguish specific concepts and to express the relationships between them for further applications.
Therefore, to address these problems, the invention introduces the knowledge graph into the operation and maintenance scenario of the power communication network and provides a method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model. Because the maintenance data of a power communication network come from multiple sources and have different structures, the method combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) to model the relationships among named entities, and finally forms a series of interrelated knowledge concepts that are assembled into a library supporting query requests. The method effectively addresses typical problems such as abundant technical terms and scarce training corpora, improves the accuracy of key steps such as named entity recognition, and ultimately improves the performance of the knowledge graph.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model. Because the maintenance data of a power communication network come from multiple sources and have different structures, the method combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) to model the relationships among named entities, and finally forms a series of interrelated knowledge concepts that are assembled into a library supporting query requests. The method effectively addresses typical problems such as abundant technical terms and scarce training corpora, improves the accuracy of key steps such as named entity recognition, and ultimately improves the performance of the knowledge graph.
To solve the above technical problem, the invention adopts the following technical scheme: the method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model specifically comprises the following steps:
S1, building an original document library: selecting data of different sources and structures as a basis and constructing an original document library;
S2, label migration: labeling the original knowledge data of the power communication network and performing label migration on the basis of the labeling information;
S3, semantic feature extraction: training based on a BERT model to extract semantic features from the structured document data of the power communication network;
S4, constructing a knowledge graph: performing named entity recognition on the extracted semantic features with a BiLSTM-CRF model, extracting knowledge concepts and their interconnections, and forming a knowledge graph in combination with knowledge modeling.
With this technical scheme, because the maintenance data of the power communication network come from multiple sources and have different structures, the method first aligns the raw data into a unified framework. A Bidirectional Encoder Representations from Transformers (BERT) model is then used as the basic feature extraction method for the words in these documents. The method combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) to model the relationships between named entities, and finally forms a series of interrelated knowledge concepts, that is, a knowledge graph assembled as a library that supports query requests. The method effectively addresses typical problems such as abundant technical terms and scarce training corpora, improves the accuracy of key steps such as named entity recognition, and ultimately improves the performance of the knowledge graph.
In a further improvement, the method also comprises step S5, constructing applications: building a process standardization application and a fault handling guidance application based on the specialized knowledge graph of the power communication network. Regarding process standardization, because of the importance of the power grid, all operations are performed strictly according to standard procedures for stable operation and switching operations. As grid models and power communication network tasks grow more complex, following standard procedures requires consulting an ever larger body of documentation. On the basis of the knowledge graph, maintenance personnel can be assisted with tasks such as decision guidance, instruction verification and stability limit calculation, and a more standardized process flow is followed, which further reduces workload and operational risk.
As a preferred technical solution of the invention, step S1 specifically comprises: selecting power communication network operation and maintenance data, equipment operation history records, and existing power communication network operating regulations and system guidelines as the basis, and preprocessing these inputs, which range from structured and semi-structured to unstructured data; the preprocessing includes word segmentation and reduction, so that the text is converted into unified structured data for further extraction.
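The preprocessing in step S1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `UnifiedRecord` shape, the source names, the regular-expression tokenizer (standing in for a real Chinese word segmenter), and the `STOP_WORDS` set, which is only one possible reading of "reduction".

```python
import re
from dataclasses import dataclass

# Hypothetical stop-word list; the patent does not say what "reduction" removes.
STOP_WORDS = {"the", "a", "of", "is"}

@dataclass
class UnifiedRecord:
    source: str   # where the text came from (assumed field)
    tokens: list  # segmented and reduced words

def segment(text):
    # Stand-in tokenizer; a real system would use a Chinese word segmenter.
    return re.findall(r"[\w-]+", text.lower())

def reduce_tokens(tokens):
    # Drop words that carry no domain meaning.
    return [t for t in tokens if t not in STOP_WORDS]

def to_unified(source, raw):
    if isinstance(raw, dict):
        # Structured input: flatten field names and values into text.
        text = " ".join(f"{k} {v}" for k, v in raw.items())
    else:
        # Semi-structured or unstructured input: treat as plain text.
        text = str(raw)
    return UnifiedRecord(source, reduce_tokens(segment(text)))

records = [
    to_unified("maintenance_log", {"device": "OLT-01", "status": "alarm"}),
    to_unified("regulation", "The fiber patch cord is replaced after a fault."),
]
```

Both inputs end up as the same record type, which is the property the later steps rely on.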
As a preferred technical solution of the invention, step S2 specifically comprises: first labeling term concepts in the unified structured data obtained in step S1, and then diffusing and propagating the label information of the labeled words by means of a K-nearest-neighbor algorithm and synonym-replacement data augmentation, so that entity recognition is guided by supervision information and a labeled structured document is obtained.
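A rough sketch of the label migration in step S2 follows. It assumes that each word already has some vector representation; the two-dimensional toy vectors, the label names, the majority-vote rule and the synonym table are all hypothetical, since the patent does not specify them.

```python
import numpy as np
from collections import Counter

def knn_propagate(labeled_vecs, labels, unlabeled_vecs, k=3):
    """Give each unlabeled word the majority label of its k nearest labeled words."""
    labeled_vecs = np.asarray(labeled_vecs, dtype=float)
    out = []
    for v in np.asarray(unlabeled_vecs, dtype=float):
        dist = np.linalg.norm(labeled_vecs - v, axis=1)  # Euclidean distance
        nearest = np.argsort(dist)[:k]
        votes = Counter(labels[i] for i in nearest)
        out.append(votes.most_common(1)[0][0])
    return out

def synonym_augment(tokens, tags, synonyms):
    """Create extra labeled sentences by swapping a word for a synonym, keeping the tags."""
    augmented = []
    for i, tok in enumerate(tokens):
        for syn in synonyms.get(tok, []):
            augmented.append((tokens[:i] + [syn] + tokens[i + 1:], list(tags)))
    return augmented

# Toy 2-D vectors and labels, made up for illustration.
labeled = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
labels = ["DEVICE", "DEVICE", "DEVICE", "FAULT", "FAULT", "FAULT"]
predicted = knn_propagate(labeled, labels, [[0.05, 0.05], [0.95, 0.95]], k=3)

extra = synonym_augment(["fiber", "break"], ["DEVICE", "FAULT"],
                        {"fiber": ["optical fiber"]})
```

The two functions address the small-corpus problem from different sides: the first spreads a few manual labels to nearby unlabeled words, the second multiplies the labeled sentences themselves.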
As a preferred technical solution of the invention, step S3 specifically comprises: extracting word feature vectors from the labeled structured document obtained in step S2 with a BERT model; the BERT model is pre-trained on a massive corpus and can extract information from the document to obtain word vectors, thereby realizing semantic feature extraction.
As a preferred technical solution of the invention, the specific steps of extracting word feature vectors with the BERT model in step S3 are as follows: let the input X consist of L vectors of C channels each; the BERT model adopts a multi-head attention mechanism. First, each C-channel vector x ∈ X is mapped to h low-dimensional subspaces to obtain the low-dimensional vectors
$$x' \in \mathbb{R}^{C_{sub}}$$
with h × C_sub = C guaranteed; after concatenation, the results from the different low-dimensional subspaces are restored to the original space to form a C-channel vector as the output. In each low-dimensional subspace, every vector x' is projected by mapping matrices into three vectors q, k and v; an attention matrix is obtained by cross-correlating q and k, and v is then weighted by the attention matrix to give the output in that subspace, which is expressed in matrix form as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{C_{sub}}}\right) V$$
where Q, K and V are the matrices formed by stacking the vectors q, k and v, respectively.
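The multi-head attention described above can be sketched in NumPy as follows. This is an illustration under assumptions rather than the patented implementation: the softmax normalization and the scaling by the square root of C_sub follow the standard Transformer formulation, the projection matrices are random, and the final output projection used in full Transformer layers is omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, h):
    """X: (L, C). Wq, Wk, Wv: (C, C) projections, split into h heads of width C // h."""
    L, C = X.shape
    assert C % h == 0, "h * C_sub must equal C"
    c_sub = C // h

    def split(M):
        # Project, then split the channel axis into h subspaces: (h, L, c_sub).
        return (X @ M).reshape(L, h, c_sub).transpose(1, 0, 2)

    Q, K, V = split(Wq), split(Wk), split(Wv)
    # Attention matrix per head from the correlation of q and k: (h, L, L).
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(c_sub))
    out = A @ V  # weight v by the attention matrix
    # Concatenate the heads back into one C-channel vector per position.
    return out.transpose(1, 0, 2).reshape(L, C), A

rng = np.random.default_rng(0)
L_len, C, h = 5, 8, 2
X = rng.normal(size=(L_len, C))
Wq, Wk, Wv = (rng.normal(size=(C, C)) for _ in range(3))
Y, A = multi_head_attention(X, Wq, Wk, Wv, h)
```

The output `Y` has the same shape as the input, and each row of each head's attention matrix sums to one, so every output position is a weighted average over all positions of the sequence.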
As a preferred technical solution of the invention, in step S4 the semantic features extracted by the BERT model in step S3 are input into a BiLSTM-CRF model for named entity recognition, specifically:
S41: first, a bidirectional LSTM (BiLSTM) is used to compute a forward LSTM and a backward LSTM over each word sequence, and the outputs at the same position are then combined; a unidirectional LSTM uses four gate functions (an input gate, an input modulation gate, a forget gate and an output gate) together with a memory cell to modulate the time-series signal, whereas the output of the BiLSTM at the current position incorporates all information from both the forward and backward directions;
S42: then, to model the relationships between adjacent labels, a conditional random field (CRF) is used to complement the BiLSTM; based on the random field principle, the CRF uses the neighboring entity concepts related to the current entity to smooth the recognition result; the scores output by the BiLSTM serve as the input of the CRF, and the highest-scoring label sequence is taken as the final prediction, so that the semantics of each entity are output and the corresponding relationships are preserved in the final prediction;
S43: knowledge modeling is then completed by means of disambiguation and alignment, and the knowledge is finally stored as a knowledge graph in the form of key-value pairs.
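The CRF decoding in step S42 can be sketched with a Viterbi search over per-position label scores. The tag set, the emission scores (standing in for BiLSTM outputs) and the transition scores below are invented for illustration; a trained CRF would learn the transition matrix from the labeled data.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (T, N) per-position label scores, e.g. from a BiLSTM.
    transitions: (N, N), transitions[i, j] = score of moving from label i to label j.
    Returns the label index sequence with the highest total score."""
    T, N = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        # total[i, j]: best score ending in label i at t-1, followed by label j at t.
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Follow the back-pointers from the best final label.
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

tags = ["O", "B-DEV", "I-DEV"]
emissions = np.array([[2.0, 1.0, 0.0],
                      [0.0, 2.0, 1.9],
                      [0.0, 0.5, 2.0],
                      [2.0, 0.0, 1.0]])
transitions = np.zeros((3, 3))
transitions[0, 2] = -10.0  # assumed penalty: O -> I-DEV is not a valid BIO transition
path = [tags[i] for i in viterbi_decode(emissions, transitions)]
```

Scoring whole sequences rather than single positions is what lets the labels of neighboring words influence each other, which is the smoothing effect the CRF adds on top of the BiLSTM.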
Compared with the prior art, the invention has the following beneficial effects: the method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model introduces the BERT and BiLSTM-CRF models into the problem of constructing a power communication network knowledge graph; the BERT method fully extracts the semantic information hidden in the context of a term, which reduces term ambiguity and improves recognition accuracy; ultimately, the accuracy of the concepts and association relations in the whole knowledge graph is improved.
Drawings
The technical scheme of the invention is further described below with reference to the accompanying drawings:
FIG. 1 shows the knowledge graph construction process of the method of the invention for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model;
FIG. 2 shows the semantic information extraction process of the BERT model in the method of the invention;
FIG. 3 shows the knowledge-graph-supported process standardization application framework for the power communication network in the method of the invention;
FIG. 4 shows the knowledge-graph-supported fault handling guidance application framework for the power communication network in the method of the invention.
Detailed Description
For the purpose of enhancing the understanding of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and examples, which are provided for the purpose of illustration only and are not intended to limit the scope of the present invention.
Embodiment: as shown in FIG. 1, the method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model specifically comprises the following steps:
S1, building an original document library: selecting data of different sources and structures as a basis and constructing an original document library; step S1 specifically comprises: selecting power communication network operation and maintenance data, equipment operation history records, and existing power communication network operating regulations and system guidelines as the basis, and preprocessing these inputs, which range from structured and semi-structured to unstructured data; the preprocessing includes word segmentation and reduction, so that the text is converted into unified structured data for further extraction;
S2, label migration: once data in a unified format have been obtained, named entity recognition must be performed on the document content to extract terms and concepts; the original knowledge data of the power communication network are labeled, and label migration is performed on the basis of the labeling information; step S2 specifically comprises: first labeling term concepts in the unified structured data obtained in step S1; to further improve the accuracy of entity recognition, part of the term concepts are labeled manually before feature extraction, and the label information on the labeled words is then diffused and propagated by means of a K-nearest-neighbor algorithm and synonym-replacement data augmentation, so that entity recognition is guided by supervision information and a labeled structured document is obtained;
S3, semantic feature extraction: training based on a BERT model to extract semantic features from the structured document data of the power communication network; as shown in FIG. 2, word feature vectors are extracted from the labeled structured document obtained in step S2 with a BERT model; the BERT model is generally pre-trained on a massive corpus and can accurately extract information from the document to obtain high-quality word vectors, which facilitates the extraction and classification of entities and ultimately improves entity recognition accuracy; in the knowledge extraction step, traditional entity recognition relies mainly on word-embedding methods such as Word2Vec to map natural language into a feature space, and one shortcoming is that a static word embedding cannot express the multiple meanings a word takes in different contexts; the BERT model extracts bidirectional features from sentences by introducing the Transformer structure; the method takes full account of the characteristics of the Chinese language and of the professional field, segments sentences using Chinese words rather than characters as the basic units, and generates the input of the BERT model by randomly masking words in the sentences; as shown in FIG. 2, the core building block of BERT is the Transformer structure, which takes advantage of an attention mechanism similar to the one at work in human language understanding; the specific steps of extracting word feature vectors with the BERT model are as follows: let the input X consist of L vectors of C channels each; the BERT model adopts a multi-head attention mechanism; first, each C-channel vector x ∈ X is mapped to h low-dimensional subspaces to obtain the low-dimensional vectors
$$x' \in \mathbb{R}^{C_{sub}}$$
with h × C_sub = C guaranteed; after concatenation, the results from the different low-dimensional subspaces are restored to the original space to form a C-channel vector as the output; in each low-dimensional subspace, every vector x' is projected by mapping matrices into three vectors q, k and v; an attention matrix is obtained by cross-correlating q and k, and v is then weighted by the attention matrix to give the output in that subspace, which is expressed in matrix form as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{C_{sub}}}\right) V$$
where Q, K and V are the matrices formed by stacking the vectors q, k and v, respectively;
S4, constructing a knowledge graph: performing named entity recognition on the extracted semantic features with a BiLSTM-CRF model, extracting accurate knowledge concepts and their interconnections, and forming a knowledge graph in combination with knowledge modeling;
in step S4, the semantic features extracted by the BERT model in step S3 are input into a BiLSTM-CRF model for named entity recognition; the LSTM structure is a typical implementation of a recurrent neural network (RNN) and consists of an input gate, an input modulation gate, a forget gate and an output gate, with a memory cell that explicitly carries information along the time sequence as short-term memory; in addition, the learnable parameters in the different gate structures represent the long-term memory expressed by the training data. However, a unidirectional LSTM cannot fully take into account the context on both sides of a word, so the method adopts a bidirectional LSTM (BiLSTM); the specific steps of entity recognition and knowledge modeling to form the knowledge graph are as follows:
S41: first, a bidirectional LSTM (BiLSTM) is used to compute a forward LSTM and a backward LSTM over each word sequence, and the outputs at the same position are then combined; a unidirectional LSTM uses four gate functions (an input gate, an input modulation gate, a forget gate and an output gate) together with a memory cell to modulate the time-series signal, whereas the output of the BiLSTM at the current position incorporates all information from both the forward and backward directions;
S42: then, to model the relationships between adjacent labels, a conditional random field (CRF) is used to complement the BiLSTM; based on the random field principle, the CRF uses the neighboring entity concepts related to the current entity to smooth the recognition result; the scores output by the BiLSTM serve as the input of the CRF, and the highest-scoring label sequence is taken as the final prediction, so that the semantics of each entity are output and the corresponding relationships are preserved in the final prediction;
S43: on the basis of entity recognition, knowledge modeling is completed by means of the disambiguation and alignment shown in FIG. 1, and the knowledge is finally stored as a knowledge graph in the form of key-value pairs;
S5, constructing applications: building a process standardization application and a fault handling guidance application based on the specialized knowledge graph of the power communication network; supported by the specialized knowledge graph constructed in steps S1 to S4, the method builds the process standardization application framework shown in FIG. 3 and the fault handling guidance application framework shown in FIG. 4; regarding process standardization, because of the importance of the power grid, all operations are executed strictly according to standard procedures for stable operation and switching operations; as grid models and power communication network tasks grow more complex, following standard procedures requires consulting an ever larger body of documentation. On the basis of the knowledge graph, the system shown in FIG. 3 can assist maintenance personnel with tasks such as decision guidance, instruction verification and stability limit calculation, and a more standardized process flow is followed, which further reduces workload and operational risk. Regarding fault handling guidance, fault handling becomes more delicate as the underlying structure of the power communication network grows increasingly complex. One fault may give rise to a series of hidden risks, and different faults may show quite similar symptoms. In such cases, simple experience-based handling may introduce additional problems. The knowledge-graph-based fault handling guidance shown in FIG. 4 can provide more accurate fault handling recommendations and guidance; for example, when an optical fiber fails, the knowledge graph automatically searches for all concepts associated with the fiber patch cord and gives detailed operating guidance and prompts based on the specific parameters and the history of fault symptoms.
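The key-value storage of step S43 and the fault lookup described in step S5 can be sketched together. The alias table, the relation names and the stored facts are all hypothetical and serve only to show how alignment collapses different surface forms onto one key before a query such as the fiber patch cord example is answered.

```python
from collections import defaultdict

# Hypothetical alias table used for disambiguation and alignment (not from the patent).
ALIASES = {"patch cord": "fiber patch cord", "fibre patch cord": "fiber patch cord"}

def canonical(name):
    # Map every surface form of an entity onto one canonical key.
    name = name.strip().lower()
    return ALIASES.get(name, name)

class KnowledgeGraph:
    def __init__(self):
        # key: canonical entity name, value: list of (relation, canonical entity)
        self.store = defaultdict(list)

    def add(self, head, relation, tail):
        h, t = canonical(head), canonical(tail)
        if (relation, t) not in self.store[h]:  # skip duplicates after alignment
            self.store[h].append((relation, t))

    def related(self, entity):
        # Return every concept linked to the entity, whatever alias was used.
        return list(self.store.get(canonical(entity), []))

kg = KnowledgeGraph()
kg.add("Patch cord", "has_fault", "high insertion loss")
kg.add("fiber patch cord", "handled_by", "clean and reseat connector")
kg.add("fibre patch cord", "has_fault", "high insertion loss")  # same fact, other alias
hits = kg.related("patch cord")
```

A production system would more likely use a graph database; the dictionary form is used here only because the text says the knowledge is stored as key-value pairs.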
It is obvious to those skilled in the art that the present invention is not limited to the above embodiments, and it is within the scope of the present invention to adopt various insubstantial modifications of the method concept and technical scheme of the present invention, or to directly apply the concept and technical scheme of the present invention to other occasions without modification.

Claims (7)

1. A method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model, characterized by comprising the following steps:
S1, building an original document library: selecting data of different sources and structures as a basis and constructing an original document library;
S2, label migration: labeling the original knowledge data of the power communication network and performing label migration on the basis of the labeling information;
S3, semantic feature extraction: training based on a BERT model to extract semantic features from the structured document data of the power communication network;
S4, constructing a knowledge graph: performing named entity recognition on the extracted semantic features with a BiLSTM-CRF model, extracting knowledge concepts and their interconnections, and forming a knowledge graph in combination with knowledge modeling.
2. The method for constructing a power communication network knowledge graph based on BERT and a BiLSTM-CRF entity recognition model according to claim 1, further comprising step S5, constructing applications: building a process standardization application and a fault handling guidance application based on the specialized knowledge graph of the power communication network.
3. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model according to claim 2, wherein the step S1 specifically comprises: the method comprises the steps of selecting power communication network operation and maintenance data, equipment operation historical records, existing power communication network operation regulations and system guidelines as bases, and conducting data preprocessing on the input of the power communication network operation and maintenance data, the equipment operation historical records, the existing power communication network operation regulations and the system guidelines from structured data, semi-structured data to unstructured data, wherein the preprocessing comprises word segmentation and reduction processing, so that the words are converted into unified structured data to facilitate further extraction.
4. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model according to claim 3, wherein step S2 specifically comprises: first, term-concept annotation is performed on the unified structured data obtained in step S1; then the label information of the annotated words is diffused and propagated using a K-nearest-neighbor algorithm and synonym-substitution data augmentation, yielding entity identification guided by supervision information and thereby the labeled structured document.
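The label migration of step S2 can be sketched as follows, assuming each word already has a feature vector. The toy 2-D vectors, the label names `DEVICE` and `FAULT`, the synonym table, and the functions `knn_propagate` and `augment` are hypothetical; the claim names only the K-nearest-neighbor algorithm and synonym substitution, not their details.

```python
from collections import Counter
from math import dist
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def knn_propagate(labeled: Dict[str, Tuple[Vector, str]],
                  unlabeled: Dict[str, Vector], k: int = 3) -> Dict[str, str]:
    """Give each unlabeled word the majority label of its k nearest labeled words."""
    result = {}
    for word, vec in unlabeled.items():
        nearest = sorted(labeled.values(), key=lambda item: dist(vec, item[0]))[:k]
        votes = Counter(label for _, label in nearest)
        result[word] = votes.most_common(1)[0][0]
    return result


# Hypothetical synonym table used for data augmentation.
SYNONYMS = {"router": ["gateway"], "outage": ["interruption"]}


def augment(tokens: List[str], tags: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Create extra labeled sentences by swapping one word for a synonym."""
    samples = []
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok, []):
            samples.append((tokens[:i] + [syn] + tokens[i + 1:], list(tags)))
    return samples


labeled = {
    "router": ((0.90, 0.10), "DEVICE"),
    "switch": ((0.80, 0.20), "DEVICE"),
    "multiplexer": ((0.85, 0.15), "DEVICE"),
    "outage": ((0.10, 0.90), "FAULT"),
    "alarm": ((0.20, 0.80), "FAULT"),
    "interruption": ((0.15, 0.85), "FAULT"),
}
unlabeled = {"gateway": (0.88, 0.12), "failure": (0.12, 0.88)}

propagated = knn_propagate(labeled, unlabeled, k=3)
augmented = augment(["router", "reports", "outage"], ["DEVICE", "O", "FAULT"])
print(propagated)
print(augmented)
```

The synonym swap keeps the tag sequence unchanged, so every augmented sentence is labeled without further manual annotation.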
5. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model as claimed in claim 4, wherein step S3 specifically comprises: extracting word feature vectors from the labeled structured document obtained in step S2 using a BERT model, wherein the BERT model is obtained by pre-training on a massive corpus and can extract information from the document to obtain word-level feature vectors, thereby realizing semantic feature extraction.
6. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model as claimed in claim 5, characterized in that the specific steps of extracting the word feature vectors with the BERT model in step S3 are as follows: let the input X consist of L vectors of C channels each, the BERT model adopting a multi-head attention mechanism; first, each C-channel vector x ∈ X is mapped to h low-dimensional subspaces to obtain low-dimensional subspace vectors x' with C_sub channels each (original formula image FDA0002868261250000021), such that h × C_sub = C; the results from the different low-dimensional subspaces are concatenated and thereby restored to the original space, forming a C-channel vector as the output; in each low-dimensional subspace, each vector x' is projected by mapping matrices into three vectors q, k and v; an attention matrix is obtained by cross-correlating q and k, and v is weighted by the attention matrix to obtain the output on that subspace, which is expressed in matrix form by the following formula:
[Formula image FDA0002868261250000022; not reproduced in the extracted text]
where Q, K and V are the matrices formed from the vectors q, k and v, respectively.
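The multi-head attention described in claim 6 can be sketched in plain Python as below. The claim's formula image is not available in this text, so the sketch assumes the standard Transformer form softmax(Q K^T / sqrt(C_sub)) V; the scaling factor, the softmax normalization, the random weights, and the helper names are assumptions, and the output projection used in standard BERT is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_head(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """One subspace: project to Q, K, V, correlate Q with K, then weight V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(w_q[0]))              # sqrt(C_sub), assumed scaling
    k_t = [list(col) for col in zip(*k)]        # transpose of K
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = softmax_rows(scores)              # L x L attention matrix
    return matmul(weights, v), weights


def multi_head_attention(x: Matrix, heads: List[Tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """Run every head and concatenate the results back to C channels."""
    outputs = [attention_head(x, *weights)[0] for weights in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
L, C, h = 3, 4, 2
C_sub = C // h                                  # so that h * C_sub == C
x = random_matrix(L, C, rng)
heads = [tuple(random_matrix(C, C_sub, rng) for _ in range(3)) for _ in range(h)]
y = multi_head_attention(x, heads)
_, attn = attention_head(x, *heads[0])
print(len(y), len(y[0]))
```

Each head works in a C_sub-dimensional subspace, and concatenating the h head outputs restores the C-channel width, matching the constraint h × C_sub = C stated in the claim.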
7. The method for constructing the knowledge graph of the power communication network based on the BERT plus BiLSTM-CRF entity recognition model of claim 4, wherein in step S4 the semantic features extracted by the BERT model in step S3 are input into the BiLSTM-CRF model for named entity recognition, specifically:
S41: first, a bidirectional LSTM (BiLSTM) is applied: a forward LSTM and a backward LSTM are computed over each word sequence, and their outputs at the same position are then combined; each unidirectional LSTM modulates the time-series signal using four gate functions (an input gate, an input modulation gate, a forget gate and an output gate) together with a memory cell, so that the BiLSTM output at the current position incorporates all information in both the forward and backward directions;
S42: then, to model the relationships between adjacent labels, a conditional random field (CRF) is adopted as a supplement to the BiLSTM; based on the random field principle, the CRF smooths the recognition result using the adjacent entities related to the current entity; the scores output by the BiLSTM serve as the input of the CRF, and the label sequence with the highest score is taken as the final prediction, so that the semantics of each entity are output and the corresponding relations are preserved in the final prediction;
S43: finally, knowledge modeling is completed by means of disambiguation and alignment, and the knowledge is stored as a knowledge graph in the form of key-value pairs.
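Steps S42 and S43 can be illustrated with a small sketch: Viterbi decoding picks the highest-scoring label sequence from per-token scores plus label-transition scores, and the recognized entities are then stored as key-value pairs. The BIO tag set, all score values, the alias table used for alignment, and the function names are hypothetical; Viterbi is the usual decoder for a linear-chain CRF, but the claim does not name a decoding algorithm.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the tag sequence maximizing emission plus transition scores."""
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            row[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            ptr[cur] = prev
        best.append(row)
        back.append(ptr)
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptr in reversed(back):          # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return path[::-1]


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (entity text, entity type) pairs."""
    entities, current, kind = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and current and tag[2:] == kind:
            current.append(tok)
            continue
        if current:
            entities.append((" ".join(current), kind))
        current, kind = ([tok], tag[2:]) if tag.startswith("B-") else ([], None)
    if current:
        entities.append((" ".join(current), kind))
    return entities


# Hypothetical alignment table mapping surface forms to canonical names.
ALIASES = {"otn device": "OTN equipment"}


def store(graph: Dict[str, dict], entities: List[Tuple[str, str]]) -> None:
    """Align each entity to its canonical key and store it as a key-value pair."""
    for name, kind in entities:
        key = ALIASES.get(name.lower(), name)
        graph.setdefault(key, {"type": kind, "mentions": []})["mentions"].append(name)


tokens = ["OTN", "device", "fails"]
tags = ["B-DEV", "I-DEV", "O"]
emissions = [  # hypothetical BiLSTM scores per token
    {"B-DEV": 2.0, "I-DEV": 0.5, "O": 0.1},
    {"B-DEV": 0.3, "I-DEV": 1.0, "O": 1.2},
    {"B-DEV": 0.1, "I-DEV": 0.2, "O": 2.0},
]
transitions = {("B-DEV", "I-DEV"): 1.0, ("O", "I-DEV"): -5.0}

decoded = viterbi(emissions, transitions, tags)
graph: Dict[str, dict] = {}
store(graph, extract_entities(tokens, decoded))
print(decoded, graph)
```

For the second token, the per-token scores alone favor `O`, but the transition score from `B-DEV` to `I-DEV` makes the sequence-level decoder choose `I-DEV`; this is the smoothing effect of adjacent labels that S42 attributes to the CRF.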

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011588999.8A CN112613314A (en) 2020-12-29 2020-12-29 Electric power communication network knowledge graph construction method based on BERT model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011588999.8A CN112613314A (en) 2020-12-29 2020-12-29 Electric power communication network knowledge graph construction method based on BERT model

Publications (1)

Publication Number Publication Date
CN112613314A true CN112613314A (en) 2021-04-06

Family

ID=75248656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011588999.8A Pending CN112613314A (en) 2020-12-29 2020-12-29 Electric power communication network knowledge graph construction method based on BERT model

Country Status (1)

Country Link
CN (1) CN112613314A (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111660A (en) * 2021-04-22 2021-07-13 脉景(杭州)健康管理有限公司 Data processing method, device, equipment and storage medium
CN113139069A (en) * 2021-05-14 2021-07-20 上海交通大学 Knowledge graph construction-oriented Chinese text entity identification method and system for power failure
CN113449526A (en) * 2021-08-27 2021-09-28 杭萧钢构股份有限公司 Method and system for analyzing applicability of steel structure production scheduling strategy
CN113569016A (en) * 2021-09-27 2021-10-29 北京语言大学 Bert model-based professional term extraction method and device
CN113779255A (en) * 2021-09-13 2021-12-10 广州汇通国信科技有限公司 Identification method and device based on LSTM neural network and knowledge graph
CN113806554A (en) * 2021-09-14 2021-12-17 上海云思智慧信息技术有限公司 Knowledge graph construction method for massive conference texts
CN113836940A (en) * 2021-09-26 2021-12-24 中国南方电网有限责任公司 Knowledge fusion method and device in electric power metering field and computer equipment
CN114004230A (en) * 2021-09-23 2022-02-01 杭萧钢构股份有限公司 Industrial control scheduling method and system for producing steel structure
CN114154505A (en) * 2021-12-07 2022-03-08 国网四川省电力公司经济技术研究院 Named entity identification method for power planning review field
CN114168745A (en) * 2021-11-30 2022-03-11 大连理工大学 Knowledge graph construction method for production process of ethylene oxide derivative
CN114707005A (en) * 2022-06-02 2022-07-05 浙江建木智能系统有限公司 Knowledge graph construction method and system for ship equipment
CN115048492A (en) * 2022-06-17 2022-09-13 广东电网有限责任公司 Method, device and equipment for processing defect information of power equipment and storage medium
CN115168603A (en) * 2022-06-27 2022-10-11 天翼爱音乐文化科技有限公司 Automatic feedback response method, device and storage medium for color ring back tone service process
CN115238688A (en) * 2022-08-15 2022-10-25 广州市刑事科学技术研究所 Electronic information data association relation analysis method, device, equipment and storage medium
CN116091045A (en) * 2023-02-28 2023-05-09 武汉烽火技术服务有限公司 Knowledge-graph-based communication network operation and maintenance method and operation and maintenance device
CN116644192A (en) * 2023-05-30 2023-08-25 中国民用航空飞行学院 Knowledge graph construction method based on reliability of aircraft parts
CN117012185A (en) * 2023-06-20 2023-11-07 国网山东省电力公司泗水县供电公司 Power grid dispatching method and system based on knowledge graph
CN117151117A (en) * 2023-10-30 2023-12-01 国网浙江省电力有限公司营销服务中心 Automatic identification method, device and medium for power grid lightweight unstructured document content
CN117875414A (en) * 2023-12-06 2024-04-12 中新金桥数字科技(北京)有限公司 Knowledge graph model construction method
CN117874755A (en) * 2024-03-13 2024-04-12 中国电子科技集团公司第三十研究所 System and method for identifying hidden network threat users
CN117993050A (en) * 2023-12-27 2024-05-07 清华大学 Building design method and system based on knowledge-enhanced diffusion model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287334A (en) * 2019-06-13 2019-09-27 淮阴工学院 A kind of school's domain knowledge map construction method based on Entity recognition and attribute extraction model
CN111159385A (en) * 2019-12-31 2020-05-15 南京烽火星空通信发展有限公司 Template-free universal intelligent question-answering method based on dynamic knowledge graph
CN111241837A (en) * 2020-01-04 2020-06-05 大连理工大学 Theft case legal document named entity identification method based on anti-migration learning
CN111488734A (en) * 2020-04-14 2020-08-04 西安交通大学 Emotional feature representation learning system and method based on global interaction and syntactic dependency
CN111737496A (en) * 2020-06-29 2020-10-02 东北电力大学 Power equipment fault knowledge map construction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
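吴俊 et al.: "Research on Chinese professional term extraction based on a BERT-embedded BiLSTM-CRF model", 《情报学报》, vol. 39, no. 04, pages 409 - 418 *
李俊卿 et al.: "Completion of missing wind power data with an LSTM network based on random forest importance", 《电器与能效管理技术》, no. 13, pages 47 - 52 *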

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111660A (en) * 2021-04-22 2021-07-13 脉景(杭州)健康管理有限公司 Data processing method, device, equipment and storage medium
CN113139069A (en) * 2021-05-14 2021-07-20 上海交通大学 Knowledge graph construction-oriented Chinese text entity identification method and system for power failure
CN113449526B (en) * 2021-08-27 2022-02-08 杭萧钢构股份有限公司 Method and system for analyzing applicability of steel structure production scheduling strategy
CN113449526A (en) * 2021-08-27 2021-09-28 杭萧钢构股份有限公司 Method and system for analyzing applicability of steel structure production scheduling strategy
CN113779255A (en) * 2021-09-13 2021-12-10 广州汇通国信科技有限公司 Identification method and device based on LSTM neural network and knowledge graph
CN113806554A (en) * 2021-09-14 2021-12-17 上海云思智慧信息技术有限公司 Knowledge graph construction method for massive conference texts
CN114004230A (en) * 2021-09-23 2022-02-01 杭萧钢构股份有限公司 Industrial control scheduling method and system for producing steel structure
CN113836940A (en) * 2021-09-26 2021-12-24 中国南方电网有限责任公司 Knowledge fusion method and device in electric power metering field and computer equipment
CN113836940B (en) * 2021-09-26 2024-04-12 南方电网数字电网研究院股份有限公司 Knowledge fusion method and device in electric power metering field and computer equipment
CN113569016A (en) * 2021-09-27 2021-10-29 北京语言大学 Bert model-based professional term extraction method and device
CN114168745A (en) * 2021-11-30 2022-03-11 大连理工大学 Knowledge graph construction method for production process of ethylene oxide derivative
CN114168745B (en) * 2021-11-30 2022-08-09 大连理工大学 Knowledge graph construction method for production process of ethylene oxide derivative
CN114154505A (en) * 2021-12-07 2022-03-08 国网四川省电力公司经济技术研究院 Named entity identification method for power planning review field
CN114154505B (en) * 2021-12-07 2024-07-16 国网四川省电力公司经济技术研究院 Named entity identification method oriented to power planning review field
CN114707005A (en) * 2022-06-02 2022-07-05 浙江建木智能系统有限公司 Knowledge graph construction method and system for ship equipment
CN114707005B (en) * 2022-06-02 2022-10-25 浙江建木智能系统有限公司 Knowledge graph construction method and system for ship equipment
CN115048492A (en) * 2022-06-17 2022-09-13 广东电网有限责任公司 Method, device and equipment for processing defect information of power equipment and storage medium
CN115168603A (en) * 2022-06-27 2022-10-11 天翼爱音乐文化科技有限公司 Automatic feedback response method, device and storage medium for color ring back tone service process
CN115168603B (en) * 2022-06-27 2023-04-07 天翼爱音乐文化科技有限公司 Automatic feedback response method, device and storage medium for color ring back tone service process
CN115238688A (en) * 2022-08-15 2022-10-25 广州市刑事科学技术研究所 Electronic information data association relation analysis method, device, equipment and storage medium
CN116091045A (en) * 2023-02-28 2023-05-09 武汉烽火技术服务有限公司 Knowledge-graph-based communication network operation and maintenance method and operation and maintenance device
CN116644192A (en) * 2023-05-30 2023-08-25 中国民用航空飞行学院 Knowledge graph construction method based on reliability of aircraft parts
CN117012185A (en) * 2023-06-20 2023-11-07 国网山东省电力公司泗水县供电公司 Power grid dispatching method and system based on knowledge graph
CN117151117A (en) * 2023-10-30 2023-12-01 国网浙江省电力有限公司营销服务中心 Automatic identification method, device and medium for power grid lightweight unstructured document content
CN117151117B (en) * 2023-10-30 2024-03-01 国网浙江省电力有限公司营销服务中心 Automatic identification method, device and medium for power grid lightweight unstructured document content
CN117875414A (en) * 2023-12-06 2024-04-12 中新金桥数字科技(北京)有限公司 Knowledge graph model construction method
CN117993050A (en) * 2023-12-27 2024-05-07 清华大学 Building design method and system based on knowledge-enhanced diffusion model
CN117993050B (en) * 2023-12-27 2024-09-17 清华大学 Building design method and system based on knowledge-enhanced diffusion model
CN117874755A (en) * 2024-03-13 2024-04-12 中国电子科技集团公司第三十研究所 System and method for identifying hidden network threat users
CN117874755B (en) * 2024-03-13 2024-05-10 中国电子科技集团公司第三十研究所 System and method for identifying hidden network threat users

Similar Documents

Publication Publication Date Title
CN112613314A (en) Electric power communication network knowledge graph construction method based on BERT model
WO2022037256A1 (en) Text sentence processing method and device, computer device and storage medium
CN111177393B (en) Knowledge graph construction method and device, electronic equipment and storage medium
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
CN111737476A (en) Text processing method and device, computer readable storage medium and electronic equipment
US20210217504A1 (en) Method and apparatus for verifying medical fact
CN112100332A (en) Word embedding expression learning method and device and text recall method and device
CN111522839A (en) Natural language query method based on deep learning
CN112131883B (en) Language model training method, device, computer equipment and storage medium
CN110647632B (en) Image and text mapping technology based on machine learning
WO2022088671A1 (en) Automated question answering method and apparatus, device, and storage medium
CN113779225B (en) Training method of entity link model, entity link method and device
WO2023137918A1 (en) Text data analysis method and apparatus, model training method, and computer device
CN115115914B (en) Information identification method, apparatus and computer readable storage medium
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
CN113705218A (en) Event element gridding extraction method based on character embedding, storage medium and electronic device
WO2022134793A1 (en) Method and apparatus for extracting semantic information in video frame, and computer device
CN114880991B (en) Knowledge graph question-answering question-sentence entity linking method, device, equipment and medium
CN113704434A (en) Knowledge base question and answer method, electronic equipment and readable storage medium
CN115114419A (en) Question and answer processing method and device, electronic equipment and computer readable medium
CN111931503B (en) Information extraction method and device, equipment and computer readable storage medium
CN117556048A (en) Artificial intelligence-based intention recognition method, device, equipment and medium
CN111737951B (en) Text language incidence relation labeling method and device
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN114925681B (en) Knowledge graph question-answering question-sentence entity linking method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210406