CN116578717A - Multi-source heterogeneous knowledge graph construction method for electric power marketing scene

Publication number
CN116578717A
Authority
CN
China
Prior art keywords
knowledge
entity
graph
power marketing
power
Legal status
Pending
Application number
CN202310475107.0A
Other languages
Chinese (zh)
Inventor
徐道磊
郑皓文
侯劲松
蒋明
周明
张靖
韩学民
卓文合
刘辉舟
李周
马俊杰
欧阳昱
刘军
薛晓茹
方锐
周婕
唐轶轩
路宇
张迪
Current Assignee
Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multi-source heterogeneous knowledge graph construction method for an electric power marketing scene, comprising the following steps: performing knowledge extraction on the multi-source heterogeneous corpus of the electric power marketing system; identifying the corpus after knowledge extraction by using a mixed multi-granularity algorithm to obtain power marketing knowledge entities; selecting a graph neural network to establish associations between the power marketing knowledge entities, and performing knowledge fusion storage on the associations; and performing knowledge graph verification on the stored power marketing knowledge entities, the knowledge graph construction being completed once the verification is finished. By using the graph neural network to establish relations between entities at the document level, the invention realizes document-level relation reasoning and solves the discourse-level relation extraction task. The mixed multi-granularity entity recognition method improves the recognition accuracy of Chinese word granularity. An automatic knowledge fusion method oriented to the electric power marketing field is also provided, which solves the problem of improper credibility assignment in traditional knowledge fusion and ensures the validity and accuracy of the fusion result.

Description

Multi-source heterogeneous knowledge graph construction method for electric power marketing scene
Technical Field
The invention relates to the technical field of natural language processing (NLP), in particular to a multi-source heterogeneous knowledge graph construction method for an electric power marketing scene.
Background
The knowledge graph can establish semantic links among different knowledge points and provides good data support for intelligent question answering. When a knowledge graph is applied to an electric power marketing system, users' questions can be answered at the level of the semantic links between knowledge points, which solves the problem of untimely and insufficiently personalized customer service feedback in online marketing.
Methods for constructing a knowledge graph in the electric power marketing field include manual construction and automatic construction. The manual method uses unstructured text data such as power documents and equipment data and, under the guidance of domain experts, completes the knowledge graph through ontology construction, data acquisition, entity extraction, relation extraction and homologous data fusion. This method ensures the professional quality of the knowledge graph in the electric power marketing field, but it is time-consuming and laborious; different experts may understand the same knowledge point differently, so the precision and consistency of the underlying data are difficult to guarantee. The automatic method extracts entities, entity relations and entity attributes from semi-structured text on the network to construct the knowledge graph automatically. It saves considerable construction cost and builds the graph quickly, but the accuracy of the data cannot be ensured, so the requirements of the electric power marketing field cannot be met.
However, the electric power marketing knowledge graph in the prior art is usually based on a single data source. Although users' questions can be answered at the level of the semantic links between knowledge points, the efficiency and accuracy of graph construction cannot be ensured, and users' requirements cannot be fully covered.
Disclosure of Invention
According to a first aspect of the present invention, there is provided a multi-source heterogeneous knowledge graph construction method for a power marketing scenario, including:
step 1, performing knowledge extraction on the multi-source heterogeneous corpus of an electric power marketing system;
step 2, identifying the corpus after knowledge extraction by using a mixed multi-granularity algorithm to obtain power marketing knowledge entities;
step 3, selecting a graph neural network to establish associations between the power marketing knowledge entities, and performing knowledge fusion storage on the associations;
step 4, performing knowledge graph verification on the stored power marketing knowledge entities, and completing the construction of the knowledge graph after the verification is finished.
Further, in the step 2, the method specifically includes:
at the input layer of the mixed multi-granularity algorithm, inputting the character vector codes and position vector codes of the corpus after knowledge extraction;
performing mixed multi-granularity entity identification at the fully connected layer;
at the output layer, identifying the sequence labels of the character-granularity prediction, and outputting the multi-granularity power marketing knowledge entities.
Further, the performing hybrid multi-granularity entity identification includes:
acquiring character-level character vector representations and two corresponding position vector representations of the input corpus text;
segmenting the corpus text with word segmenters of different granularities to form a plurality of words, and then splicing the words onto the original sentence to obtain a multi-granularity segmentation;
obtaining a corresponding word vector representation and two corresponding position vector representations for each word;
wherein the character vector representations, the word vector representations and the position vector representations are parameters learnable in training, and the position vector representations comprise a start position and an end position.
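As a rough illustration of the splicing step described above, the following Python sketch appends segmented words to the character sequence of a sentence and records a start and an end position for every token. The name build_lattice, the two toy segmenters and the sample sentence are hypothetical and are not taken from the invention; real segmentation tools of different granularities would take their place.

```python
from typing import Callable, List, Tuple

# (text, start position, end position); positions are inclusive character indices.
Token = Tuple[str, int, int]


def build_lattice(sentence: str,
                  segmenters: List[Callable[[str], List[str]]]) -> List[Token]:
    """Return character tokens followed by every distinct multi-character word."""
    # Character-level tokens: start and end positions coincide.
    tokens: List[Token] = [(ch, i, i) for i, ch in enumerate(sentence)]
    seen = set()
    for segment in segmenters:
        cursor = 0
        for word in segment(sentence):
            # Assumes each segmenter returns words in sentence order.
            start = sentence.index(word, cursor)
            end = start + len(word) - 1
            cursor = end + 1
            if len(word) > 1 and (word, start, end) not in seen:
                seen.add((word, start, end))
                tokens.append((word, start, end))
    return tokens


# Toy stand-ins for a coarse-grained and a fine-grained segmenter (hypothetical).
def coarse(_: str) -> List[str]:
    return ["电力营销", "系统"]


def fine(_: str) -> List[str]:
    return ["电力", "营销", "系统"]


lattice = build_lattice("电力营销系统", [coarse, fine])
for token in lattice:
    print(token)
```

Each token keeps its own start and end index, so characters and words of different lengths can share one input sequence without losing their alignment to the original sentence.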
Further, in the step 3, the method specifically includes:
constructing a homogeneous network in the graph neural network by using a node construction function, a dynamic reasoner and a classifier;
establishing a document structure relation of the electric power marketing knowledge entity through the homogeneous network;
and performing relation classification on the document structure relation so as to store the document structure relation hierarchically.
Further, establishing a document structure relationship of the power marketing knowledge entity through the homogeneous network includes:
performing word vector coding on the power marketing knowledge entity by using the node construction function, and extracting the word vector representations and position vector representations on the shortest dependency path as nodes;
inducing the extracted nodes into a document structure through the dynamic reasoner, updating the representation of the nodes in the document structure based on information propagation and performing iterative optimization on the representation;
the nodes perform vector coding by using a pre-trained natural language processing model, and model long-distance dependency relations.
Further, the entity nodes of the power marketing knowledge entity in the homogeneous network are generated according to the extracted nodes.
Further, the performing relationship classification includes:
the classifier receives node information transmitted by the dynamic reasoner and acquires the document structure relation of the nodes;
the classifier invokes a classification model to judge whether the association exists between two adjacent entities in the document structural relationship of the node;
if so, identifying the specific relation type and labeling its category;
if not, ignoring the pair and selecting other adjacent entities for association judgment.
Further, the relationship types include: single entity multi-attribute type, conditional constraint type, comparison type and maximum value type.
Further, in the step 3, performing knowledge fusion storage includes:
carrying out probability optimization on the associated nodes between the power marketing knowledge entities by using a power knowledge graph embedding technology based on a conceptual graph model to obtain corresponding knowledge characterization;
carrying out knowledge fusion processing on the knowledge representation by utilizing a heterogeneous knowledge fusion algorithm;
and carrying out storage management on the knowledge after fusion processing by combining the knowledge management of distributed data, link tracking and a rule engine.
Further, in the step 4, performing the knowledge-graph verification includes:
acquiring triple knowledge from the stored knowledge graph, together with the text from which each piece of knowledge was successfully entered;
obtaining an embedded representation of the triple knowledge by using a power knowledge graph embedding technology, wherein each triple takes the form (entity, relation, entity) or (entity, attribute, value);
training a bidirectional multi-view model on the triple knowledge and the corresponding text from which the knowledge was successfully entered, and outputting a power knowledge verification model after training is finished;
when the power knowledge is newly added, the power knowledge verification model is utilized to verify the power knowledge and output a verification value, the verification value is compared with a set threshold value, and whether the newly added power knowledge is allowed to be input into a power knowledge graph is judged;
if the check value is more than or equal to the threshold value, allowing the newly-added power knowledge to be input into the power knowledge graph;
if the check value is smaller than the threshold value, the newly-added power knowledge is not allowed to be input into the power knowledge graph;
the value range of the check value is [0,1], and the threshold parameter is set to be 0.56 so as to meet the requirement of the accuracy of the map construction.
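The admission rule in the verification step reduces to a single comparison. The Python sketch below shows only that comparison, using the 0.56 threshold stated above; the function name admit_new_knowledge is hypothetical, and the check value is assumed to be supplied by the trained power knowledge verification model, which is not modeled here.

```python
CHECK_THRESHOLD = 0.56  # threshold value stated in the text


def admit_new_knowledge(check_value: float,
                        threshold: float = CHECK_THRESHOLD) -> bool:
    """Return True if newly added power knowledge may enter the graph."""
    # The check value is defined on [0, 1]; anything else is rejected as invalid.
    if not 0.0 <= check_value <= 1.0:
        raise ValueError("check value must lie in [0, 1]")
    # Admission requires the check value to be at least the threshold.
    return check_value >= threshold


for value in (0.30, 0.56, 0.90):
    print(value, admit_new_knowledge(value))
```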
Compared with the prior art, the invention has the following beneficial effects. On one hand, the invention uses the graph neural network to establish relations between entities at the document level, thereby realizing document-level relation reasoning and solving the discourse-level relation extraction task. The mixed multi-granularity entity recognition method improves the recognition accuracy of Chinese word granularity. An automatic knowledge fusion method oriented to the electric power marketing field is also provided; it solves the problems of improper credibility assignment and of results whose importance cannot be effectively reflected in traditional knowledge fusion, ensures the validity and accuracy of fusion results, improves semantic computation efficiency, solves the data sparsity problem and completes heterogeneous information fusion. On the other hand, the power knowledge verification technology based on the bidirectional multi-view model further strengthens the representation by taking into account the information of the triples in the original text, verifies newly added power knowledge and ensures the reliability of entry.
It should be understood that all combinations of the foregoing concepts, as well as additional concepts described in more detail below, may be considered a part of the inventive subject matter of the present disclosure as long as such concepts are not mutually inconsistent. In addition, all combinations of claimed subject matter are considered part of the disclosed inventive subject matter.
The foregoing and other aspects, embodiments, and features of the present teachings will be more fully understood from the following description, taken together with the accompanying drawings. Other additional aspects of the invention, such as features and/or advantages of the exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of the embodiments according to the teachings of the invention.
Drawings
The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the invention will now be described, by way of example, with reference to the accompanying drawings.
FIG. 1 is a schematic flow diagram of the multi-source heterogeneous knowledge graph construction method for a power marketing scene according to the present invention;
FIG. 2 is a schematic diagram of the Transformer encoder model according to the present invention;
FIG. 3 is a schematic diagram of the pre-trained natural language processing model according to the present invention;
FIG. 4 is a diagram of the knowledge graph embedding framework according to the present invention;
FIG. 5 is a diagram of the variational autoencoder architecture according to the present invention;
FIG. 6 is a diagram of the knowledge fusion framework according to the present invention;
FIG. 7 is a schematic diagram of the RNN model for text feature extraction according to the present invention.
Detailed Description
For a better understanding of the technical content of the present invention, specific examples are set forth below, along with the accompanying drawings.
Aspects of the invention are described in this disclosure with reference to the drawings, in which are shown a number of illustrative embodiments. The embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be understood that the various concepts and embodiments described above, as well as those described in more detail below, may be implemented in any of a number of ways, as the disclosed concepts and embodiments are not limited to any implementation. Additionally, some aspects of the disclosure may be used alone or in any suitable combination with other aspects of the disclosure.
At present, the power marketing knowledge graph in the prior art is usually based on a single data source. Although users' queries can be answered at the level of the semantic links between knowledge points, the efficiency and accuracy of graph construction cannot be guaranteed and users' requirements cannot be fully covered, which hinders technical development in the power marketing field. This embodiment therefore provides a multi-source heterogeneous knowledge graph construction method for the power marketing scene to solve the above problems.
According to an embodiment of the invention, in combination with the flowchart shown in fig. 1, a multi-source heterogeneous knowledge graph construction method for a power marketing scene comprises the following steps:
step 1, performing knowledge extraction on the multi-source heterogeneous corpus of an electric power marketing system;
step 2, identifying the corpus after knowledge extraction by using a mixed multi-granularity algorithm to obtain power marketing knowledge entities;
step 3, selecting a graph neural network to establish associations between the power marketing knowledge entities, and performing knowledge fusion storage on the associations;
step 4, performing knowledge graph verification on the stored power marketing knowledge entities, and completing the knowledge graph construction after the verification is finished.
It should be noted that the multi-source heterogeneous knowledge graph construction method provided in this embodiment is carried out automatically and in real time by a computer program. Once the knowledge fusion storage step is completed, the knowledge graph is preliminarily constructed. To further improve the validity and accuracy of the construction result, and to ensure that real-time corpus data is updated accurately, a knowledge graph verification step is added: on the basis of the preliminary knowledge graph, newly added power knowledge, if any, is verified, so as to ensure the integrity and accuracy of the graph construction.
Preferably, to address the reliance of the power marketing knowledge graph on a single data source, this embodiment incorporates multi-source heterogeneous technology to obtain corpus data from a plurality of different sources in the power marketing field, for example mixed data (structured and unstructured) and discrete (distributed) data, and extracts it in a unified and standardized manner. The constructed homogeneous network is used to establish document structure associations among the extracted entities, which improves the efficiency of power marketing knowledge graph construction. The heterogeneous information fusion technology solves the problem of improper credibility assignment in knowledge fusion in the prior art and ensures the validity and accuracy of knowledge entity fusion for the multi-source heterogeneous corpus data.
The implementation and/or effects of certain examples of the present invention are described in more detail below in conjunction with FIGS. 2-7 and some preferred or alternative examples of the present invention.
[ knowledge extraction ]
In the foregoing step 1, knowledge extraction is performed on the multi-source heterogeneous corpus of the electric power marketing system, which specifically includes:
collecting multi-source heterogeneous corpus data from an electric power marketing system by utilizing a data mining technology, and preprocessing the multi-source heterogeneous corpus data;
the preprocessing comprises: classifying the multi-source heterogeneous corpus data, cleaning abnormal and repeated data, and integrating the data;
the preprocessed multi-source heterogeneous corpus data comprises four types, namely files with different sources, files with different lengths, files with different formats and files with different structures, and the corresponding knowledge extraction operation is carried out on the corpus data with different categories;
when extracting files of different sources, mixing the files of different sources to train a large-scale pre-training model, after the pre-training is finished, respectively setting extraction models in the corpus of each source, and extracting knowledge from the corpus of different sources through each extraction model to obtain knowledge;
the extraction model is a fine-tuned pre-training model; the fine-tuning modifies the learning parameters in the pre-training model so that the multi-source learning parameters become learning parameters for a specified source. For example, if the multi-source learning parameters in the original pre-training model cover 1 to 10, the model can learn from, and identify knowledge in, source types 1 to 10; after fine-tuning, the source learning parameter in an extraction model takes a single value from 1 to 10, and if it is 1, only the first type of source is learned and its knowledge identified;
when extracting files of different lengths, a named entity recognition algorithm combined with a recurrent neural network is used to extract knowledge entities from short texts, a hierarchical long short-term memory network is used to capture long-distance text dependencies between single paragraphs and long paragraphs, and knowledge is extracted;
when extracting files of different formats: if the file is in text format, a text representation in word vector form is obtained and the knowledge in it is identified; if the file is in picture format, a picture vector representation is obtained through a picture representation model and each picture is stored as an independent entity; if the file combines pictures and text, the picture representation and the text representation are spliced, a type vector is added to indicate which kind of representation each vector is, and a fusion model is used to extract the knowledge in the spliced vector;
when the files with different structures are extracted, entity representation and knowledge extraction are carried out according to field attributes if the files are structured texts, and knowledge is extracted by utilizing a mixed multi-granularity entity recognition technology if the files are unstructured texts;
and after the knowledge extraction is finished, carrying out mixed multi-granularity entity recognition on the extracted corpus data so as to improve the accuracy of entity recognition.
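For files that combine pictures and text, the splicing with a type vector can be pictured as follows. This is a minimal Python sketch under stated assumptions: the one-hot type vectors, the helper names tag and splice, and the two-dimensional sample vectors are all hypothetical, text and picture representations are assumed to share one dimension, and the fusion model that consumes the spliced sequence is not shown.

```python
from typing import List

Vector = List[float]

TEXT_TYPE: Vector = [1.0, 0.0]   # hypothetical type vector marking a text representation
IMAGE_TYPE: Vector = [0.0, 1.0]  # hypothetical type vector marking a picture representation


def tag(vectors: List[Vector], type_vector: Vector) -> List[Vector]:
    """Append the type vector to every representation in the list."""
    return [vec + type_vector for vec in vectors]


def splice(text_vectors: List[Vector], image_vectors: List[Vector]) -> List[Vector]:
    """Concatenate typed text and picture representations into one sequence."""
    return tag(text_vectors, TEXT_TYPE) + tag(image_vectors, IMAGE_TYPE)


text_repr = [[0.1, 0.2], [0.3, 0.4]]   # two token vectors (toy values)
image_repr = [[0.5, 0.6]]              # one picture vector (toy values)
fused_input = splice(text_repr, image_repr)
print(fused_input)
```

Appending the type vector is only one way to carry the modality marker; adding a learned type embedding to each vector would serve the same purpose.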
It should be noted that, in this embodiment, the possibility that multi-source heterogeneous corpus data exists in the electric power marketing field is considered, the multi-source heterogeneous corpus data is preprocessed, and is divided into four categories, and corresponding knowledge extraction operations are adopted for different categories, so as to solve the problems of low entity extraction efficiency, low accuracy and sparse data of corpus data with different sources and different formats in the prior art.
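The preprocessing described above (classification, cleaning of abnormal and repeated data, integration) might look like the Python sketch below. Everything specific in it is an assumption made for illustration: the Record fields, the rule that empty content counts as abnormal, and grouping by source as the classification key are not prescribed by the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    source: str       # originating system (hypothetical field)
    fmt: str          # "text", "picture" or "mixed" (hypothetical values)
    structured: bool  # True for structured text, False for unstructured text
    content: str


def preprocess(records: List[Record]) -> Dict[str, List[Record]]:
    """Drop abnormal and repeated records, then group the rest by source."""
    seen = set()
    by_source: Dict[str, List[Record]] = {}
    for rec in records:
        text = rec.content.strip()
        if not text:                 # assumption: empty content is abnormal data
            continue
        key = (rec.source, text)
        if key in seen:              # repeated data within the same source
            continue
        seen.add(key)
        by_source.setdefault(rec.source, []).append(rec)
    return by_source


raw = [
    Record("crm", "text", False, "bill query"),
    Record("crm", "text", False, "bill query "),   # duplicate after trimming
    Record("crm", "text", False, "   "),           # abnormal: empty
    Record("metering", "text", True, "kwh=12"),
]
grouped = preprocess(raw)
print({source: len(items) for source, items in grouped.items()})
```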
Preferably, the embodiment provides training a large-scale pre-training model for mixed files of different sources to improve generalization capability of the model, and independently setting corresponding extraction models on corpus of each source to improve knowledge extraction accuracy of corpus of each source.
[ hybrid Multi-granularity identification Power marketing knowledge entity ]
In the step 2, the corpus after knowledge extraction is identified by using a mixed multi-granularity algorithm to obtain power marketing knowledge entities, which specifically comprises the following steps:
inputting the character vector codes and position vector codes of the corpus after knowledge extraction at the input layer of the mixed multi-granularity algorithm;
the character vector codes and position vector codes are produced by an encoder that adopts a Transformer (sequence generation neural network) structure based on a self-attention mechanism and a fully connected layer, as shown in FIG. 2, so as to fuse the information of character granularity, word granularity and phrase granularity;
at the fully connected layer, character-level character vector representations and two corresponding position vector representations are obtained for the input corpus text (i.e., the character vector codes and position vector codes).
Wherein the position vector is calculated according to the following formula:
where pos represents the position of the word, i represents the dimension, PE(pos, 2i) represents the start position, and PE(pos, 2i+1) represents the end position;
segmenting the corpus text with word segmenters of different granularities to form a plurality of words, and then splicing the words onto the original sentence to obtain a multi-granularity segmentation;
obtaining a corresponding word vector representation and two corresponding position vector representations for each word;
the character vector representations, the word vector representations and the position vector representations are parameters learnable in training, and the position vector representations comprise a start position and an end position;
performing character-granularity prediction (BIO) on the obtained character vector representations, word vector representations and position vector representations, labeling each of them, and arranging the labels in sequence order to obtain sequence labels, for example: performing character-granularity prediction on the representation corresponding to the character "good" to obtain a begin (B) / inside (I) / non-entity (O) sequence tag;
identifying the sequence labels of the character-granularity prediction at the output layer, and outputting the identified power marketing knowledge entities;
and storing the output power marketing knowledge entity into a graph neural network so as to quickly establish entity association.
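The position-vector formula referred to above is not reproduced in this text. Its symbols (pos, the dimension index i, and the paired terms PE(pos, 2i) and PE(pos, 2i+1)) match the sinusoidal position encoding commonly used with Transformer encoders, namely PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). Assuming that this is the intended formula, it can be computed as in the Python sketch below; the function name position_vector and the symbol d_model (the encoding dimension d) are assumptions that do not appear in the original.

```python
import math
from typing import List


def position_vector(pos: int, d_model: int) -> List[float]:
    """Sinusoidal position encoding for one position (assumed formula)."""
    if d_model % 2:
        raise ValueError("d_model is assumed to be even")
    pe = [0.0] * d_model
    for i in range(d_model // 2):
        # Every sine/cosine pair shares one frequency.
        angle = pos / (10000 ** (2 * i / d_model))
        pe[2 * i] = math.sin(angle)      # PE(pos, 2i)
        pe[2 * i + 1] = math.cos(angle)  # PE(pos, 2i+1)
    return pe


# A word spanning characters 2..3 would receive one vector for its start
# position and one for its end position.
start_vec = position_vector(2, d_model=8)
end_vec = position_vector(3, d_model=8)
print(start_vec[:2], end_vec[:2])
```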
Preferably, the corpus data after knowledge extraction contains entities of several granularities, such as character granularity, word granularity and phrase granularity. The traditional entity recognition method works directly at character granularity; it introduces no word segmentation information, so its Chinese recognition accuracy is limited. By contrast, this embodiment selects an encoder with a Transformer structure based on a self-attention mechanism and a fully connected layer to encode the corpus text, and uses character vector representations, word vector representations and the corresponding position vector representations to accommodate the various granularities, thereby solving the problem of limited Chinese recognition accuracy in the traditional entity recognition method and improving the recognition accuracy of Chinese word granularity.
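Turning a predicted BIO sequence into entities is a standard decoding step, sketched below in Python. The function name decode_bio, the lenient handling of an I tag that has no preceding B tag, and the sample sentence with its tags are assumptions for illustration; the tags would in practice come from the output layer of the recognition model.

```python
from typing import List, Optional, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (entity text, start index, end index) spans from B/I/O tags."""
    entities: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity begins; close any entity still open.
            if start is not None:
                entities.append(("".join(chars[start:i]), start, i - 1))
            start = i
        elif tag == "I":
            # Lenient choice: an I without a preceding B opens an entity.
            if start is None:
                start = i
        else:
            # "O" (non-entity) closes any open entity.
            if start is not None:
                entities.append(("".join(chars[start:i]), start, i - 1))
                start = None
    if start is not None:
        entities.append(("".join(chars[start:]), start, len(tags) - 1))
    return entities


sentence = list("查询电费账单")
predicted = ["O", "O", "B", "I", "B", "I"]
spans = decode_bio(sentence, predicted)
print(spans)
```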
[ establishing a knowledge entity relationship ]
In the step 3, the selecting the graph neural network to establish the association between the electric power marketing knowledge entities specifically includes:
constructing a homogeneous network in the graph neural network by using a node construction function, a dynamic reasoner and a classifier;
reading the stored electric power marketing knowledge entity through a homogeneous network and establishing a document structure relation between the entities;
the method comprises the steps of carrying out word vector coding on an electric power marketing knowledge entity by using a node construction function, extracting word vector representation and position vector representation on a shortest dependent path as nodes, and generating entity nodes of the electric power marketing knowledge entity in a homogeneous network according to the extracted nodes;
inducing the extracted nodes into a document structure through a dynamic reasoner, updating the representation of the nodes in the document structure based on information propagation, performing iterative optimization on the representation, and transmitting the finally optimized document structure (the end of node updating) to a classifier to perform relationship type judgment;
referring to fig. 3, the nodes perform vector coding by using a pre-trained natural language processing model, and model long-distance dependency relationships to perform iterative optimization;
the classifier receives the node information transmitted by the dynamic reasoner, acquires the document structure relation of the nodes, and invokes the classification model to judge whether an association exists between two adjacent entities in the document structure relation of the nodes;
if so, identifying the specific relation type and labeling its category, for example: the relation types comprise a single entity multi-attribute type, a conditional constraint type, a comparison type and a maximum value type, and if the identified relation type is the comparison type, it is marked with the corresponding label;
if not, ignoring the pair and selecting other adjacent entities for association judgment;
and storing the category labels of the associated entity relations in the form of association nodes, so that entity associations in the document structure can be traversed quickly.
It should be noted that this embodiment builds a homogeneous network to establish the document structure relations between entities, and the relations are differentiated automatically by an attention mechanism, which improves the efficiency of graph construction. Compared with a traditional convolutional neural network or recurrent neural network, the homogeneous network can establish relations between entities at the document level and thus supports document-level relation reasoning.
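The adjacent-entity judgment and the storage of association nodes described above can be sketched as follows. This is a minimal illustration only: the `Classifier` signature, the string labels in `RELATION_TYPES`, and the rule-based `toy_classifier` are assumptions standing in for the trained classification model, which the text does not specify.

```python
from typing import Callable, Dict, List, Optional, Tuple

# The four relation types named in the text; the label strings are illustrative.
RELATION_TYPES = (
    "single_entity_multi_attribute",
    "condition_constraint",
    "comparison",
    "maximum_value",
)

# Hypothetical classifier signature: returns a relation type, or None when
# the two entities are judged to have no association.
Classifier = Callable[[str, str], Optional[str]]


def build_association_nodes(
    entities: List[str], classify: Classifier
) -> Dict[Tuple[str, str], str]:
    """Judge each pair of adjacent entities and keep only the associated pairs."""
    associations: Dict[Tuple[str, str], str] = {}
    for left, right in zip(entities, entities[1:]):
        relation = classify(left, right)
        if relation is None:
            continue  # no association: ignore and move on to the next adjacent pair
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        associations[(left, right)] = relation  # stored as an association node
    return associations


def toy_classifier(left: str, right: str) -> Optional[str]:
    """Rule-based stand-in for the trained classification model (demo only)."""
    if "price" in left and "price" in right:
        return "comparison"
    return None


nodes = build_association_nodes(
    ["peak price", "valley price", "meter"], toy_classifier
)
print(nodes)
```

Keying the stored associations by entity pair is one simple way to make later traversal of entity associations a dictionary lookup; other storage layouts would serve equally well.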
[ knowledge entity fusion store ]
In the step 3, knowledge fusion storage is performed, including:
referring to fig. 4, probability optimization is performed on the associated nodes between the power marketing knowledge entities by using a power knowledge graph embedding technology based on a conceptual graph model, so as to obtain corresponding knowledge characterization;
carrying out knowledge fusion processing on the knowledge characterization by utilizing a heterogeneous knowledge fusion algorithm;
and carrying out storage management on the knowledge after fusion processing by combining the knowledge management of distributed data, link tracking and a rule engine.
Further, the power knowledge graph embedding technology based on the conceptual graph model acquires knowledge characterization, which specifically comprises the following steps:
adding semantic features among triples in the knowledge graph by using a graph embedding technology;
using a Markov random field and a VAE (variational autoencoder) model, building a probability graph network in which the added semantic features, in the form of variables, serve as nodes in a probability space;
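specifically, the Markov random field satisfies the two standard conditions of positivity and the Markov property:
$P(f) > 0, \quad \forall f \in \mathbb{F}$
$P(f_i \mid f_{S - \{i\}}) = P(f_i \mid f_{N_i})$
where $P(f)$ is the probability that the event $X = f$ occurs; $f = \{f_1, f_2, \ldots, f_n\}$ is called a state of the random field $X$; the set of all states of the random field $X$ is denoted $\mathbb{F}$, i.e. the state space; and the neighborhood system of the base point set $S$ is defined as $N = \{N_i \mid \forall i \in S\}$,
where $N_i$ is the set of all base points adjacent to base point $i$, and $N_i$ satisfies $i \notin N_i$ and $i \in N_j \Leftrightarrow j \in N_i$.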
Referring to FIG. 5, an example of the VAE model architecture is shown, where $x_i^{(j)}$ represents the j-th feature of the i-th data point;
dividing the probability graph network into maximal cliques (a maximal clique is a clique to which no further node of the graph can be added while still forming a clique; for example, {X1, X3} is a maximal clique if adding X2 no longer yields a clique), and building an optimization function to optimize the conversion of bytecodes into message objects when data flows in and of message objects into bytecodes when data flows out;
based on a loopy belief propagation algorithm, propagating the influence among nodes in the probability graph network as messages;
mapping the operations in the probability space (namely the probability graph network) into the embedding space, and establishing distance relations between the hidden variables so as to optimize the embedding vectors in the embedding space, thereby obtaining suitable embedding positions and the corresponding knowledge characterization.
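The clique decomposition step can be illustrated with a small sketch. It uses the basic Bron-Kerbosch enumeration, which the text does not name and which is chosen here only for illustration; the three-node graph (edges X1-X3 and X2-X3) is an assumed example built to match the {X1, X3} case mentioned above.

```python
from typing import Dict, FrozenSet, Iterator, Set

Graph = Dict[str, Set[str]]  # undirected graph as an adjacency map


def is_clique(graph: Graph, nodes: Set[str]) -> bool:
    """True when every pair of distinct nodes in `nodes` is connected."""
    return all(b in graph[a] for a in nodes for b in nodes if a != b)


def maximal_cliques(graph: Graph) -> Iterator[FrozenSet[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> Iterator[FrozenSet[str]]:
        if not p and not x:
            yield frozenset(r)  # r cannot be extended by any node
            return
        for v in list(p):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later branches

    yield from expand(set(), set(graph), set())


# Assumed example graph: X1-X3 and X2-X3 are connected, X1-X2 is not.
graph: Graph = {"X1": {"X3"}, "X2": {"X3"}, "X3": {"X1", "X2"}}
cliques = set(maximal_cliques(graph))
print(cliques)
print(is_clique(graph, {"X1", "X2", "X3"}))
```

In this graph {X1, X3} is maximal because adding X2 breaks the clique property (X1 and X2 are not connected).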
Still further, the knowledge fusion processing is performed by using a heterogeneous knowledge fusion algorithm, which specifically includes:
using BM25 (an algorithm for scoring the similarity between a query and a document) to retrieve the optimal knowledge candidate set from the multiple types of power knowledge, forming the knowledge set to be fused;
performing model abstraction on the knowledge set to be fused using D-S evidence theory (the theory of belief functions) to obtain the confidence of the knowledge, and calculating the confidence $P_{O_{Kj}}$ of each knowledge item $O_{Kj}$ by a formula in which $K$ is a single input knowledge item, $O_{Kj}$ is one retrieved knowledge item, $K_{Vec}$ is the vector representation corresponding to the single input knowledge item, and $O_{Kj\,Vec}$ is the vector representation corresponding to the retrieved knowledge item;
sorting the confidences, and obtaining a confidence score by calculating the similarity to the existing text;
and selecting the corresponding interval range for storage according to the confidence score, completing the knowledge fusion and forming a preliminary electric power marketing knowledge graph structure.
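The BM25 retrieval step at the start of the fusion procedure can be sketched as below. This is a plain Okapi BM25 scorer written from the textbook formula; the parameter values `k1 = 1.5` and `b = 0.75`, the whitespace-style token lists, and the toy documents are assumptions, since the text gives no tuning or tokenization details.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query (docs must be non-empty)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy knowledge items, assumed for illustration.
docs = [
    ["peak", "tariff", "rules"],
    ["meter", "installation", "guide"],
    ["tariff", "adjustment", "notice"],
]
scores = bm25_scores(["peak", "tariff"], docs)
# Candidate set ordered from best to worst match.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Taking the top entries of `ranking` would give a candidate set to pass on to the confidence calculation.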
The confidence score lies in [0, 1.0]. This embodiment divides it into intervals that are not strictly disjoint (adjacent intervals share their endpoints), set as follows:
higher: [0.75, 1.0], medium: [0.25, 0.75], lower: [0, 0.25];
it should be noted that the knowledge fusion in this embodiment may be implemented by a recurrent neural network or a natural language processing model.
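The interval selection can be sketched as a simple lookup. The sketch assumes all three intervals are closed, so a score that falls exactly on a shared endpoint (0.25 or 0.75) matches both neighbouring intervals; how the embodiment resolves such ties is not stated, so the function simply returns every matching label.

```python
from typing import List, Tuple

# (label, lower bound, upper bound), taken as closed intervals (assumption).
INTERVALS: List[Tuple[str, float, float]] = [
    ("higher", 0.75, 1.0),
    ("medium", 0.25, 0.75),
    ("lower", 0.0, 0.25),
]


def confidence_levels(score: float) -> List[str]:
    """Return the label of every interval that contains the score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must lie in [0, 1.0]")
    return [name for name, low, high in INTERVALS if low <= score <= high]


print(confidence_levels(0.5))
print(confidence_levels(0.75))
```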
As a preferred embodiment, knowledge fusion is implemented with a recurrent neural network (as shown in figs. 6 and 7). The input sequence (i.e. the confidence-ranked sequence) is processed through the internal state of the recurrent neural network, so that the network memorizes previously input information and applies it to the calculation of the current output; that is, the input of the hidden layer comprises not only the output of the input layer but also the output of the hidden layer at the previous time step. The calculation formulas are as follows:
$O_t = g(V \cdot S_t)$
$S_t = f(U \cdot X_t + W \cdot S_{t-1})$
where $X_t$ is the input value received by the network at time t, $S_t$ is the value of the hidden layer at time t, $O_t$ is the output value of the network at time t, U is the weight matrix from the input layer to the hidden layer, V is the weight matrix from the hidden layer to the output layer, and W is the weight matrix applied to the previous value of the hidden layer when it is fed back as part of the current input.
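The two recurrence formulas can be traced with a small pure-Python sketch. The choices f = tanh, g = identity, a zero initial hidden state, and the weight values below are all assumptions made for illustration; the text does not fix the activation functions or the dimensions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for small dense matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(inputs: List[Vector], U: Matrix, W: Matrix, V: Matrix) -> List[Vector]:
    """Run the recurrence over a sequence and return every output O_t."""
    hidden = [0.0] * len(W)  # S_0, assumed to be the zero vector
    outputs = []
    for x_t in inputs:
        pre = [a + b for a, b in zip(matvec(U, x_t), matvec(W, hidden))]
        hidden = [math.tanh(p) for p in pre]   # S_t = f(U·X_t + W·S_{t-1}), f = tanh
        outputs.append(matvec(V, hidden))      # O_t = g(V·S_t), g = identity
    return outputs


# Illustrative weights: 1 input unit, 2 hidden units, 1 output unit.
U = [[1.0], [0.5]]
W = [[0.0, 0.1], [0.2, 0.0]]
V = [[1.0, -1.0]]
# A confidence-ranked input sequence (values assumed).
outs = rnn_forward([[0.9], [0.6], [0.2]], U, W, V)
print(outs)
```

Because W feeds the previous hidden state back in, the same input value can produce different outputs at different positions in the sequence, which is the memory effect the paragraph describes.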
It should be noted that, in this embodiment, through knowledge fusion, different knowledge entities are linked to align the entities, so that the problem of low quality of single knowledge graph data is solved.
Preferably, this embodiment performs probability optimization on the association nodes between power marketing knowledge entities using a Markov random field and a VAE model architecture to obtain the corresponding knowledge characterization. By the undirected-graph property of the Markov random field, each node in the graph network represents one variable or a group of variables, and each edge between nodes represents a dependency between two variables. Through the mutual influence of the nodes, the relations between nodes can be propagated in space, so that the relations between heterogeneous knowledge in this embodiment can be expressed by the positions of vectors in space. This improves the effectiveness and accuracy with which the correct relation distribution of multi-source heterogeneous knowledge is recognized, and provides an accurate parameter basis for the subsequent confidence calculation, thereby improving the accuracy of heterogeneous information fusion.
Preferably, this embodiment provides an automatic knowledge fusion method for the electric power marketing field based on D-S evidence theory. Compared with traditional knowledge fusion methods, the method of this embodiment solves the problems that reliability is improperly distributed and that the importance of results cannot be effectively reflected, ensures the validity and accuracy of the fusion results, improves the efficiency of semantic calculation, solves the problem of data sparseness, and completes the heterogeneous information fusion.
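For readers unfamiliar with D-S evidence theory, the sketch below shows the textbook Dempster rule of combination for two sources of evidence. It is not the confidence formula of this embodiment, which is not reproduced in the text; the frame of discernment {valid, invalid} and the mass values are assumptions chosen only to show how belief is redistributed after conflicting mass is discarded.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]  # mass function over subsets of the frame


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, intersect focal sets, renormalize."""
    combined: Dict[FrozenSet[str], float] = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


VALID = frozenset({"valid"})
INVALID = frozenset({"invalid"})
EITHER = VALID | INVALID  # the whole frame, i.e. "don't know"

# Two assumed evidence sources about one candidate knowledge item.
source_a: Mass = {VALID: 0.6, EITHER: 0.4}
source_b: Mass = {VALID: 0.7, INVALID: 0.1, EITHER: 0.2}
fused = combine(source_a, source_b)
print(fused)
```

Mass left on the whole frame expresses ignorance rather than disbelief, which is the feature that distinguishes belief functions from a single probability value.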
[ knowledge-graph verification ]
In the step 4, the knowledge graph verification is performed, which specifically includes:
based on n-grams (a statistical language model that slides a window of size n over the text content in bytes to form a sequence of byte fragments of length n), screening candidate entities from the preliminarily constructed graph by means of word embedding;
combining the context of each candidate entity to construct a low-dimensional mapping of the candidate entity;
identifying the low-dimensional mappings of the candidate entities, screening out the corresponding entities in the preliminary power marketing knowledge graph, and completing context-based power entity linking;
acquiring the triplet knowledge in the preliminarily constructed graph after entity linking, extracting the triplet knowledge of one linked entity, and acquiring the entity in the knowledge graph together with its attributes and relations;
performing low-dimensional space mapping representation of the original text (without entity linking) and the triplet knowledge based on a semantic representation technology;
judging, in combination with the acquired attributes and relations of the entity, whether the attribute or relation contained in a newly added triplet (namely, power knowledge updated in real time by the electric power marketing system) belongs to the acquired attributes or relations of the entity; if so, replacing and updating the attribute or relation in the triplet, and if not, submitting it for manual verification;
acquiring, from the preliminarily constructed graph, the triplet knowledge and the text in which each successfully recorded knowledge item is located;
obtaining the embedded characterization of the triplet knowledge using the power knowledge graph embedding technology, wherein the embedded characterization has the form (entity, relation, entity) or (entity, attribute, value);
training the bidirectional multi-view model on the triplet knowledge and the text in which it is located, and obtaining the electric power knowledge verification model when training is finished;
when power knowledge is newly added, verifying it with the power knowledge verification model, outputting a check value, comparing the check value with a set threshold, and judging whether the newly added power knowledge is allowed to be entered into the power knowledge graph;
if the check value is greater than or equal to the threshold, the newly added power knowledge is allowed to be entered into the power knowledge graph;
if the check value is less than the threshold, the newly added power knowledge is not allowed to be entered into the power knowledge graph;
the check value ranges over [0,1], and the threshold parameter is set to 0.56 to meet the accuracy requirement of graph construction.
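The byte-level sliding window used by the n-gram step at the start of this procedure can be sketched as follows. The UTF-8 encoding and the function name `byte_ngrams` are assumptions; the text only states that a window of size n slides over the content in bytes.

```python
from typing import List


def byte_ngrams(text: str, n: int, encoding: str = "utf-8") -> List[bytes]:
    """Slide a window of n bytes over the encoded text, one byte at a time."""
    if n <= 0:
        raise ValueError("n must be positive")
    data = text.encode(encoding)
    # A text shorter than n bytes yields no fragments.
    return [data[i:i + n] for i in range(len(data) - n + 1)]


fragments = byte_ngrams("tariff", 3)
print(fragments)
```

For Chinese text a single character usually occupies three bytes in UTF-8, so a byte window can cut through a character; a character-level window would be an alternative design if that matters for the embedding step.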
It should be noted that the power knowledge verification model is obtained by training the bidirectional multi-view model on the triplet knowledge together with the text in which the successfully recorded knowledge is located. Verifying newly added power knowledge means operating on the triplets in that knowledge; the output check value is in essence a probability of similarity to the knowledge entities in the graph, and comparing it with the set threshold decides whether the newly added power knowledge is entered into the recorded graph. To improve the accuracy of graph construction, the threshold in this embodiment is set to 0.56, so that only knowledge with a higher probability of similarity to the knowledge in the graph is entered, which ensures the reliability of the recorded knowledge.
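The threshold comparison can be sketched as a small gate. The threshold 0.56 and the [0,1] range come from the text; the function names and the sample triplets with their check values are hypothetical, standing in for the output of the trained verification model.

```python
from typing import List, Tuple

THRESHOLD = 0.56  # threshold value stated in the embodiment

Triplet = Tuple[str, str, str]


def admit(check_value: float, threshold: float = THRESHOLD) -> bool:
    """Allow entry into the graph when the check value reaches the threshold."""
    if not 0.0 <= check_value <= 1.0:
        raise ValueError("check value must lie in [0, 1]")
    return check_value >= threshold


def filter_new_knowledge(scored: List[Tuple[Triplet, float]]) -> List[Triplet]:
    """Keep only the newly added triplets whose check value passes the gate."""
    return [triplet for triplet, value in scored if admit(value)]


# Hypothetical check values, as if produced by the verification model.
scored = [
    (("residential user", "applies", "tiered tariff"), 0.81),
    (("meter", "located_in", "unknown"), 0.40),
    (("peak period", "price", "0.6"), 0.56),
]
accepted = filter_new_knowledge(scored)
print(accepted)
```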
It should be noted that, compared with traditional graph auditing methods, this embodiment considers the information of the individual triplet in relation to the original text and proposes a power knowledge verification technique combined with a bidirectional multi-view model, which further strengthens the characterization form, verifies newly added knowledge, ensures the reliability of the entered knowledge, and improves the accuracy of graph construction.
Preferably, the verification process of step 4 is provided to further improve the validity and accuracy of the graph construction result. Because power knowledge in the electric power marketing system is updated in real time, and in order to ensure that real-time corpus updates are captured accurately, that is, to avoid errors in graph construction caused by missing newly added power knowledge, this embodiment adds the verification step on top of the preliminary knowledge graph construction and uses a program polling mechanism to check whether newly added power knowledge exists, thereby ensuring the integrity and accuracy of the graph construction.
The entity extraction and recognition algorithm, the knowledge fusion technology, the entity linking processing and the bidirectional multi-view model verification method for knowledge entities can be implemented with the prior art and are not described in detail in this example.
While the invention has been described with reference to preferred embodiments, it is not intended to be limiting. Those skilled in the art will appreciate that various modifications and adaptations can be made without departing from the spirit and scope of the present invention. Accordingly, the scope of the invention is defined by the appended claims.

Claims (10)

1. The multi-source heterogeneous knowledge graph construction method for the electric power marketing scene is characterized by comprising the following steps of:
step 1, knowledge extraction is carried out on multi-source heterogeneous corpus of an electric power marketing system;
step 2, recognizing the corpus after knowledge extraction by utilizing a mixed multi-granularity algorithm to obtain electric power marketing knowledge entities;
step 3, selecting a graph neural network to establish associations between the power marketing knowledge entities, and carrying out knowledge fusion storage on the associations;
step 4, performing knowledge graph verification on the stored power marketing knowledge entities, and completing the construction of the knowledge graph after the verification is finished.
2. The method for constructing a multi-source heterogeneous knowledge graph for a power marketing scene according to claim 1, wherein in the step 2, the method specifically comprises:
at the input layer of the mixed multi-granularity algorithm, word vector coding and position vector coding of the corpus after knowledge extraction are input;
performing mixed multi-granularity entity identification on the full connection layer;
at the output layer, the sequence labels of the word granularity predictions are identified, and the multi-granularity power marketing knowledge entity is output.
3. The multi-source heterogeneous knowledge graph construction method for a power marketing scenario of claim 2, wherein the performing hybrid multi-granularity entity identification comprises:
acquiring word vector representations of word levels and two corresponding position vector representations of the input corpus text;
segmenting the corpus text by using word segmentation devices with different granularities to form a plurality of words, and then splicing the words to the original sentence to obtain multi-granularity word segmentation;
obtaining a corresponding word vector representation and two corresponding position vector representations for each word;
wherein the word vector representations and the position vector representations are parameters that are learnable in training, and the position vector representations comprise a start position and an end position.
4. The method for constructing a multi-source heterogeneous knowledge graph for a power marketing scenario according to claim 1 or 2, wherein in the step 3, the method specifically comprises:
constructing a homogeneous network in the graph neural network by using a node construction function, a dynamic reasoner and a classifier;
establishing a document structure relation of the electric power marketing knowledge entity through the homogeneous network;
and performing relationship classification on the document structure relation so as to store the document structure relation hierarchically.
5. The multi-source heterogeneous knowledge graph construction method for a power marketing scenario of claim 4, wherein establishing a document structure relationship of the power marketing knowledge entity through the homogeneous network comprises:
performing word vector coding on the electric marketing knowledge entity by using a node construction function, and extracting word vector representation and position vector representation on the shortest dependence path as nodes;
inducing the extracted nodes into a document structure through the dynamic reasoner, updating the representation of the nodes in the document structure based on information propagation and performing iterative optimization on the representation;
the nodes perform vector coding by using a pre-trained natural language processing model, and model long-distance dependency relations.
6. The multi-source heterogeneous knowledge graph construction method for a power marketing scenario of claim 5, wherein entity nodes of the power marketing knowledge entity in the homogeneous network are generated from the extracted nodes.
7. The multi-source heterogeneous knowledge graph construction method for a power marketing scenario of claim 4, wherein the performing relationship classification comprises:
the classifier receives node information transmitted by the dynamic reasoner and acquires the document structure relation of the nodes;
the classifier invokes a classification model to judge whether the association exists between two adjacent entities in the document structural relationship of the node;
if so, identifying its specific relation type for category identification;
if not, neglecting, and selecting other adjacent entities to carry out association judgment.
8. The multi-source heterogeneous knowledge graph construction method for a power marketing scenario of claim 7, wherein the relationship type comprises: single entity multi-attribute type, conditional constraint type, comparison type and maximum value type.
9. The method for constructing a multi-source heterogeneous knowledge graph for a power marketing scene according to claim 1, wherein in the step 3, knowledge fusion storage is performed, comprising:
carrying out probability optimization on the associated nodes between the power marketing knowledge entities by using a power knowledge graph embedding technology based on a conceptual graph model to obtain corresponding knowledge characterization;
carrying out knowledge fusion processing on the knowledge representation by utilizing a heterogeneous knowledge fusion algorithm;
and carrying out storage management on the knowledge after fusion processing by combining the knowledge management of distributed data, link tracking and a rule engine.
10. The method for constructing a multi-source heterogeneous knowledge graph for a power marketing scenario according to claim 1 or 9, wherein in the step 4, the knowledge graph verification is performed, and the method comprises:
acquiring, from the stored knowledge graph, the triplet knowledge and the text in which each successfully recorded knowledge item is located;
obtaining an embedded representation in the triplet knowledge by utilizing a power knowledge graph embedding technology, wherein the embedded representation is (entity, relation, entity) or (entity, attribute, value);
training the bidirectional multi-view model on the triplet knowledge and the corresponding text in which the successfully recorded knowledge is located, and obtaining an electric power knowledge verification model after training is finished;
when the power knowledge is newly added, the power knowledge verification model is utilized to verify the power knowledge and output a verification value, the verification value is compared with a set threshold value, and whether the newly added power knowledge is allowed to be input into a power knowledge graph is judged;
if the check value is more than or equal to the threshold value, allowing the newly-added power knowledge to be input into the power knowledge graph;
if the check value is smaller than the threshold value, the newly-added power knowledge is not allowed to be input into the power knowledge graph;
the value range of the check value is [0,1], and the threshold parameter is set to be 0.56 so as to meet the accuracy requirement of the graph construction.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310475107.0A CN116578717A (en) 2023-04-28 2023-04-28 Multi-source heterogeneous knowledge graph construction method for electric power marketing scene

Publications (1)

Publication Number Publication Date
CN116578717A true CN116578717A (en) 2023-08-11

Family

ID=87542411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310475107.0A Pending CN116578717A (en) 2023-04-28 2023-04-28 Multi-source heterogeneous knowledge graph construction method for electric power marketing scene

Country Status (1)

Country Link
CN (1) CN116578717A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116913460A (en) * 2023-09-13 2023-10-20 福州市迈凯威信息技术有限公司 Marketing business compliance judgment and analysis method for pharmaceutical instruments and inspection reagents
CN116913460B (en) * 2023-09-13 2023-12-29 福州市迈凯威信息技术有限公司 Marketing business compliance judgment and analysis method for pharmaceutical instruments and inspection reagents
CN117194682A (en) * 2023-11-07 2023-12-08 国网浙江省电力有限公司营销服务中心 Method, device and medium for constructing knowledge graph based on power grid related file
CN117194682B (en) * 2023-11-07 2024-03-01 国网浙江省电力有限公司营销服务中心 Method, device and medium for constructing knowledge graph based on power grid related file
CN117725232A (en) * 2024-02-18 2024-03-19 中国电子科技集团公司第十五研究所 Multi-mode knowledge graph verification method and device, electronic equipment and storage medium
CN117725232B (en) * 2024-02-18 2024-04-26 中国电子科技集团公司第十五研究所 Multi-mode knowledge graph verification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108897857B (en) Chinese text subject sentence generating method facing field
CN108416058B (en) Bi-LSTM input information enhancement-based relation extraction method
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN116578717A (en) Multi-source heterogeneous knowledge graph construction method for electric power marketing scene
CN111324728A (en) Text event abstract generation method and device, electronic equipment and storage medium
CN112800170A (en) Question matching method and device and question reply method and device
CN111339765B (en) Text quality assessment method, text recommendation method and device, medium and equipment
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN113535917A (en) Intelligent question-answering method and system based on travel knowledge map
CN112100332A (en) Word embedding expression learning method and device and text recall method and device
CN113032568A (en) Query intention identification method based on bert + bilstm + crf and combined sentence pattern analysis
CN117151220B (en) Entity link and relationship based extraction industry knowledge base system and method
CN112487190A (en) Method for extracting relationships between entities from text based on self-supervision and clustering technology
CN116127090B (en) Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
CN116383399A (en) Event public opinion risk prediction method and system
CN115688784A (en) Chinese named entity recognition method fusing character and word characteristics
CN113761868A (en) Text processing method and device, electronic equipment and readable storage medium
CN114880307A (en) Structured modeling method for knowledge in open education field
CN115098673A (en) Business document information extraction method based on variant attention and hierarchical structure
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN111930892B (en) Scientific and technological text classification method based on improved mutual information function
CN113705207A (en) Grammar error recognition method and device
CN117609421A (en) Electric power professional knowledge intelligent question-answering system construction method based on large language model
CN117112786A (en) Rumor detection method based on graph attention network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination