CN115329101A - Electric power Internet of things standard knowledge graph construction method and device - Google Patents


Info

Publication number
CN115329101A
Authority
CN
China
Prior art keywords
standard
things
power internet
relation
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211082212.XA
Other languages
Chinese (zh)
Inventor
高辉
王倩倩
杨璐彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for constructing a standard knowledge graph of the electric power Internet of things. The method comprises: acquiring a standard text set of the electric power Internet of things and preprocessing it; performing semi-supervised extraction of hierarchical relations between entities, together with entity recognition, on the preprocessed standard text set through a pre-constructed word vector model; mapping the extracted hierarchical relations between entities onto a pre-constructed preliminary knowledge graph by a mapping learning method; and automatically learning the features between entities and the entity-attribute relations based on a graph neural network, combining the graph structure and the node features to train a classifier, performing semi-supervised classification on the nodes to be classified of the graph, and thereby constructing the standard knowledge graph of the electric power Internet of things.

Description

Electric power Internet of things standard knowledge graph construction method and device
Technical Field
The invention relates to a method and a device for constructing a standard knowledge graph of an electric power Internet of things, and belongs to the technical field of knowledge graph construction.
Background
With the advancement of national standardization, standard documents are increasingly abundant. They are stored in databases as images, text and other forms, which makes them difficult to classify clearly and to use fully. Constructing a knowledge graph is a mainstream way to improve the utilization and classification efficiency of power Internet of things standard knowledge. The knowledge graph, a concept proposed by Google in 2012, is essentially a high-quality knowledge base in which text data is represented as entity triples, so that the relationships in the data are displayed explicitly in graph form. The power Internet of things standards comprise a large number of power-related guideline standards and technical specification standards, and as the power Internet of things develops rapidly, the standards develop with it and the number of standard texts grows. At present, however, the standard texts suffer from irregular data, unclear classification and varied sentence structures, and existing power Internet of things knowledge graphs are incomplete, with incomplete node relations. Constructing a complete standard knowledge graph of the power Internet of things is therefore a problem to be solved.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides a method and a device for constructing a standard knowledge graph of an electric power internet of things, so that a complete standard knowledge graph of the electric power internet of things is constructed, and the construction accuracy and the model efficiency of the knowledge graph are improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the invention provides a method for constructing a standard knowledge graph of an electric power internet of things, which comprises the following steps:
acquiring a standard text set of the power Internet of things, and preprocessing the standard text set;
performing hierarchical relation extraction and entity identification between entities in a semi-supervised form on the preprocessed standard text set through a pre-constructed word vector model;
mapping the hierarchical relationship among the extracted entities to a pre-constructed preliminary knowledge graph by a mapping learning method;
automatically learning the features between entities and the entity-attribute relations based on a graph neural network, combining the graph structure and node features to train a classifier, performing semi-supervised classification on the nodes to be classified of the graph, and constructing the standard knowledge graph of the power Internet of things.
Further, preprocessing the standard text set includes: performing word segmentation, labeling, named entity recognition and merging, and corpus format conversion on the standard text set.
Further, preprocessing the standard text set includes the following processes:
storing the standard text data of the power Internet of things in a database, and converting all data in the database directly into structured data;
extracting the relationships among the power Internet of things standard entities, between entities and attributes, and between attributes, according to the requirements of the power Internet of things standard knowledge graph, to form entity, attribute and relationship triples;
taking a short sentence in the power Internet of things standard documents as the unit, labeling the selected entities, scanning the documents processed in the previous step, and finally obtaining all sentences containing more than two entity fields.
Further, performing semi-supervised hierarchical relation extraction and entity recognition on the preprocessed standard text set through the pre-constructed word vector model comprises:
constructing a word vector model, and training it with the Skip-gram model;
for the relations among the power Internet of things standard entities, a single-pass clustering method is used to generate the extraction patterns from the seed set; each pass over the seed set yields a new set of hierarchical relations, and at the same time 3 types of context feature vectors are formed;
initially, a new cluster consists of a single instance; the input of the algorithm is a list of power Internet of things standard relation instances; the similarity between any instance $i_n$ and each cluster $Cl_j$ is computed and compared with a similarity threshold $T_{Sim}$: if $T_{Sim}$ does not exceed the similarity to a cluster, the seed instance $i_n$ is added to that cluster; if $T_{Sim}$ exceeds all the similarities, a new cluster $C_m$ is created; the remaining relation instances of the seed set are then processed in the same way,
the similarity between the seed instance $i_n$ and the cluster $Cl_j$ is written $\mathrm{Sim}(i_n, Cl_j)$; it is obtained by computing the similarity between $i_n$ and the instances in $Cl_j$: if most of the similarity values are greater than the threshold $T_{Sim}$, the maximum value is returned, otherwise 0 is returned; the similarity between two instances is calculated as follows:
$\mathrm{Sim}(S_i, S_j) = \alpha \cdot \cos(\mathrm{BEF}_i, \mathrm{BEF}_j) + \beta \cdot \cos(\mathrm{BET}_i, \mathrm{BET}_j) + \gamma \cdot \cos(\mathrm{AFT}_i, \mathrm{AFT}_j)$ (1)
wherein α, β and γ are the weights of the vectors, BEF is the vector of the words before the first entity, BET is the vector of the words between the two entities, and AFT is the vector of the words after the second entity;
the input is the power Internet of things standard instances, and the output is the relation patterns.
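The single-pass clustering just described can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary keys `"BEF"`, `"BET"`, `"AFT"`, the default weights and the default threshold are assumptions, and the sketch joins an instance only to its best-matching cluster.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def instance_sim(s_i, s_j, alpha=0.2, beta=0.6, gamma=0.2):
    """Formula (1): weighted sum of cosines over the BEF, BET and AFT vectors.
    The weight values are illustrative assumptions."""
    return (alpha * cosine(s_i["BEF"], s_j["BEF"])
            + beta * cosine(s_i["BET"], s_j["BET"])
            + gamma * cosine(s_i["AFT"], s_j["AFT"]))

def cluster_sim(instance, cluster, t_sim):
    """Return the maximum similarity if most members exceed t_sim, otherwise 0."""
    sims = [instance_sim(instance, member) for member in cluster]
    above = sum(1 for s in sims if s > t_sim)
    return max(sims) if above > len(sims) / 2 else 0.0

def single_pass_cluster(instances, t_sim=0.6):
    """One pass over the instances: join the best cluster or open a new one."""
    clusters = []
    for inst in instances:
        scores = [cluster_sim(inst, c, t_sim) for c in clusters]
        best = max(scores, default=0.0)
        if best >= t_sim:
            clusters[scores.index(best)].append(inst)
        else:
            clusters.append([inst])  # a new cluster holds a single instance
    return clusters
```

Each resulting cluster would then stand for one extraction pattern; how a cluster is summarized into a pattern is not specified here and is left out of the sketch.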
Further, all patterns are scored. A pattern is scored mainly according to the relations of the power Internet of things standard instances it extracts, and the scoring classes are P, N and U: when the relation of an extracted instance is equal to that in the seed set, the relation is positive and is added to the set P; if the relation of the extracted instance differs from that in the seed set, the relation is negative and is added to the set N; if the extracted relation only partly matches the seed set or is unknown, it is added to the set U;
the confidence of a pattern is calculated as follows:
Figure BDA0003833452590000031
wherein $W_n$ and $W_m$ are the weights of N and U respectively,
the confidence weight of the pattern is calculated as follows:
Figure BDA0003833452590000041
where ξ is the pattern that extracted standard instance i, and $C_i$ is the context of standard instance i.
Further, performing semi-supervised classification on the nodes to be classified of the graph and constructing the power Internet of things standard knowledge graph includes: inputting the constructed preliminary power Internet of things standard knowledge graph into the graph convolution layer, extracting the feature information of all nodes directly from the preliminary knowledge graph and inputting it into the feature extraction layer, and training the relation triple classifier to obtain the complete feature information of the nodes of the power Internet of things standard knowledge graph.
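The feature extraction layer is described in the embodiment as TF-IDF vectorization of the node text followed by SVD dimension reduction. A minimal sketch of the TF-IDF part only is given here. The IDF formula image is not reproduced in this text, so the variant log(N / df) is an assumption, as are the function name and the tokenized input format.

```python
import math
from collections import Counter

def tf_idf_matrix(docs):
    """Build a TF-IDF matrix from token lists (one list per graph node).

    Returns (vocab, rows), where rows[i][j] is the TF-IDF weight of
    vocab[j] in docs[i]. Assumed IDF variant: log(N / df).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for tok in vocab:
            tf = counts[tok] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[tok])
            row.append(tf * idf)  # formula (8): TF-IDF = TF * IDF
        rows.append(row)
    return vocab, rows
```

A token that occurs in every node text gets weight 0 under this variant, which is why most entries of the matrix are zero or near zero for a large vocabulary.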
Further, the relation triples obtained based on the analytic hierarchy process, the word vectors and the graph neural network are stored in a Neo4j graph database using the py2neo library, and the construction of the power Internet of things standard knowledge graph is completed using the open-source visualization library ECharts.
In a second aspect, the invention provides a device for constructing a standard knowledge graph of an electric power internet of things, which comprises:
the preprocessing unit is used for acquiring a standard text set of the power Internet of things and preprocessing the standard text set;
the hierarchical relation extraction unit is used for carrying out hierarchical relation extraction and entity identification between entities in a semi-supervised form on the preprocessed standard text set through a word vector model which is constructed in advance;
the mapping unit is used for mapping the hierarchical relationship among the extracted entities into a pre-constructed preliminary knowledge graph by a mapping learning method;
and the graph construction unit is used for automatically learning the features between entities and the entity-attribute relations based on a graph neural network, combining the graph structure and node features to train a classifier, performing semi-supervised classification on the nodes to be classified of the graph, and constructing the standard knowledge graph of the power Internet of things.
In a third aspect, the invention provides a device for constructing a standard knowledge graph of an electric power internet of things, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of the preceding claims.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any one of the preceding claims.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a method and a device for constructing a standard knowledge graph of the electric power Internet of things. Standard texts of the electric power Internet of things are collected, and the standard text set undergoes data preprocessing such as word segmentation, labeling, named entity recognition and merging, and corpus format conversion; hierarchical relations between entities are extracted and recognized in a semi-supervised manner from the standard text information through word vectors, and the hierarchical relations between entities are constructed automatically by a mapping learning method; a graph neural network automatically learns the features among entities and identifies the relations between entities and their attributes, the graph structure and node features are combined to train a classifier, and the unclassified nodes in the knowledge graph are classified in a semi-supervised manner; relation triples are thereby obtained and stored in a graph database, completing the construction of the power Internet of things knowledge graph and improving both the accuracy of knowledge graph construction and the efficiency of the model.
Drawings
Fig. 1 is a flowchart of a method for constructing a standard knowledge graph of an electric power internet of things according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
The embodiment introduces a method for constructing a standard knowledge graph of an electric power internet of things, which comprises the following steps:
acquiring a standard text set of the power Internet of things, and preprocessing the standard text set;
performing hierarchical relation extraction and entity identification between entities in a semi-supervised form on the preprocessed standard text set through a pre-constructed word vector model;
mapping the hierarchical relation between the extracted entities to a pre-constructed preliminary knowledge graph by a mapping learning method;
automatically learning characteristics between entities and entity attribute relations based on a graph neural network, naturally combining graph structures and node characteristics to train a classifier, carrying out semi-supervised classification on nodes to be classified of the graph, and constructing a standard knowledge graph of the power internet of things.
The application of the power Internet of things standard knowledge graph construction method provided by this embodiment involves the following steps:
Step 1: Preprocess the power Internet of things standard text set
(1) The standard text data of the power Internet of things is stored in a database, and all data in the database is converted directly into structured data.
(2) A knowledge graph is composed of triples, so the power Internet of things standard knowledge graph is represented directly as V = (head, relation, tail), i.e., V = (head entity, set of relations between entities, tail entity), wherein the head and tail entities together constitute the entity set, and the relation set is {r1, r2, ...}.
(3) According to the requirements of the power Internet of things standard knowledge graph, the relations among the power Internet of things standard entities, between entities and attributes, and between attributes are extracted to form entity, attribute and relation triples.
(4) A short sentence in the power Internet of things standard documents is taken as the unit, the selected entities are labeled, the documents processed in the previous step are scanned, and all sentences containing more than two entity fields are finally obtained.
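The triple representation and the sentence filter of Step 1 can be sketched as follows. The names `Triple` and `sentences_with_entities`, the substring matching and the sample data are hypothetical. The minimum entity count is left as a parameter with an assumed default of 2, since a relation instance needs at least an entity pair.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One knowledge graph fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str

def sentences_with_entities(sentences, entities, min_entities=2):
    """Keep sentences that mention at least `min_entities` known entities.

    Matching is plain substring search, an assumption made for brevity;
    a real pipeline would use the entity labels from the annotation step.
    """
    kept = []
    for sentence in sentences:
        found = [e for e in entities if e in sentence]
        if len(found) >= min_entities:
            kept.append((sentence, found))
    return kept
```

For example, with the entities `"smart meter"` and `"gateway"`, the sentence `"smart meter connects to the gateway"` is kept and could yield `Triple("smart meter", "connects to", "gateway")`, while a sentence mentioning only one of them is dropped.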
Step 2: construction of electric power internet of things standard knowledge graph based on analytic hierarchy process
Two main subprograms used for constructing the knowledge graph based on the hierarchical lexical method are that a feature selection algorithm is used for determining a proper feature set at each decision point, and a supervised learning algorithm is used for constructing a classifier for the decision. The analytic hierarchy process uses a directed non-circular graph to encode domain-defining conditional independence assumptions that allow the distribution to be described as the product of a small local interaction model. Each variable (feature) X is a node in the network, and the arc connecting the two nodes indicates that there is an interdependent relationship between the two variables. The use of a hierarchy focuses on distributions with more uniform features, allowing us to direct both selected features and dependent models to local classification tasks.
Step 3: Perform semi-supervised extraction of domain entity hierarchical relations and entity recognition on the power Internet of things standard text information based on word vectors:
(1) Google's open-source word2vec toolkit is used, and the Skip-gram model is selected to train the word vector model.
(2) For the relations between the power Internet of things standard instances obtained, a single-pass clustering method is used to generate the extraction patterns from the seed set; each pass over the seed set yields a new set of relations, and at the same time 3 types of context feature vectors are formed.
(3) Initially, a new cluster consists of a single instance. The input of the algorithm is a list of power Internet of things standard relation instances. The similarity between any instance $i_n$ and each cluster $Cl_j$ is computed and compared with a similarity threshold $T_{Sim}$. If $T_{Sim}$ does not exceed the similarity to a cluster, the seed instance $i_n$ is added to that cluster. If $T_{Sim}$ exceeds all the similarities, a new cluster $C_m$ is created, and the remaining relation instances of the seed set are processed in the same way.
(4) The similarity between the seed instance $i_n$ and the cluster $Cl_j$ is written $\mathrm{Sim}(i_n, Cl_j)$. It is obtained by computing the similarity between $i_n$ and the instances in $Cl_j$: if most of the similarity values are greater than the threshold $T_{Sim}$, the maximum value is returned, otherwise 0 is returned. The similarity between two instances is calculated as follows.
$\mathrm{Sim}(S_i, S_j) = \alpha \cdot \cos(\mathrm{BEF}_i, \mathrm{BEF}_j) + \beta \cdot \cos(\mathrm{BET}_i, \mathrm{BET}_j) + \gamma \cdot \cos(\mathrm{AFT}_i, \mathrm{AFT}_j)$ (1)
Where α, β and γ are the weights of the vectors, BEF is the vector of the words before the first entity, BET is the vector of the words between the two entities, and AFT is the vector of the words after the second entity.
(5) Input: power Internet of things standard instances $Instances = \{i_1, i_2, \dots, i_n\}$
Output: relation patterns $Patterns = \{\}$
Figure BDA0003833452590000081
(6) To prevent incorrect matches during matching and to improve matching accuracy, all patterns are scored. A pattern is scored mainly according to the relations of the power Internet of things standard instances it extracts, and the scoring classes are P, N and U. When the relation of an extracted instance is equal to that in the seed set, the relation is positive and is added to the set P; if the relation of the extracted instance differs from that in the seed set, the relation is negative and is added to the set N; if the extracted relation only partly matches the seed set or is unknown, it is added to the set U.
(7) The confidence of a pattern is calculated as follows:
Figure BDA0003833452590000082
wherein $W_n$ and $W_m$ are the weights of N and U respectively,
the confidence weight of the pattern is calculated as follows:
Figure BDA0003833452590000091
where ξ is the pattern that extracted standard instance i, and $C_i$ is the context of standard instance i.
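The pattern scoring of items (6) and (7) can be illustrated with a small sketch. The confidence formula image is not reproduced in this text, so the ratio |P| / (|P| + W_n·|N| + W_m·|U|), a form common in bootstrapped relation extraction, is an assumption here and not necessarily the patent's formula. The seed format (head entity mapped to its known tail entity), the P/N/U assignment rule and the default weights are also assumptions.

```python
def score_pattern(extracted_pairs, seeds):
    """Count positive (P), negative (N) and unknown (U) extractions of one pattern.

    seeds maps a head entity to its known tail entity (assumed format).
    """
    p = n = u = 0
    for head, tail in extracted_pairs:
        if head not in seeds:
            u += 1          # nothing known about this head entity
        elif seeds[head] == tail:
            p += 1          # agrees with the seed set
        else:
            n += 1          # contradicts the seed set
    return p, n, u

def pattern_confidence(p, n, u, w_n=1.0, w_m=0.0):
    """Assumed confidence ratio: P / (P + W_n * N + W_m * U)."""
    denom = p + w_n * n + w_m * u
    return p / denom if denom > 0 else 0.0
```

With this form, raising `w_m` penalizes patterns that extract many pairs the seed set knows nothing about, while `w_m = 0` ignores them.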
Step 4: Automatically construct the hierarchical organization of domain entities by the mapping-based learning method:
(1) The word vector model contains a large number of semantic and lexical relations, and a mapping method is used for hierarchical relation recognition and entity recognition among the power Internet of things standard entity words. A transition matrix maps words to their hypernyms. K-means clustering is used, with the original data set $S = (S_1, S_2, \dots, S_i, \dots)$, where $S_i$ is the offset vector of an entity pair (x, y);
(2) Randomly select K cluster centroid points S1, S2, ...;
(3) Compute the dissimilarity between each unselected element and the centroids of the K clusters, so that the value of
Figure BDA0003833452590000092
is minimized, where $\mu_i$ is the mean of cluster $C_i$;
(4) Compute the clustering result, and recompute the centroids of the K clusters selected previously;
(5) Repeat steps (3) and (4) until the clustering result no longer changes, then stop and output the result;
(6) Given an entity x and its hypernym y, let $y = \Phi_k x$. The transition matrix $\Phi_k$ is not easy to compute directly, so its optimal solution is found by gradient descent, using the formula:
Figure BDA0003833452590000093
wherein $N_k$ is the number of entity pairs in the k-th cluster $C_k$;
(7) The hierarchical relations between standard entities can be regarded as a graph structure in which the nodes are standard entities and the edges represent the hierarchical relations between them, as follows:
Figure BDA0003833452590000101
where L denotes the list of upper-level and lower-level words of the entities, x, y and z denote upper-level words in the list L, and
Figure BDA0003833452590000102
indicates a relationship between upper and lower levels.
(8) If the Euclidean distance $d(\Phi_k x, y)$ is less than a given threshold δ, then y is a hypernym of x:
$d(\Phi_k x, y) = \lVert \Phi_k x - y \rVert_2 < \delta$ (6)
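Items (6) and (8) can be sketched for a single cluster as follows. The objective image is not reproduced in this text, so the sketch assumes the mean squared error of $\Phi_k x - y$ over the cluster's entity pairs. The function names, learning rate, epoch count and identity initialization are likewise assumptions.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def fit_projection(pairs, dim, lr=0.1, epochs=500):
    """Gradient descent on the assumed objective
    (1/N_k) * sum ||phi x - y||^2 over the (x, y) pairs of one cluster."""
    phi = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    n = len(pairs)
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(dim)]
        for x, y in pairs:
            err = [p - t for p, t in zip(matvec(phi, x), y)]
            for i in range(dim):
                for j in range(dim):
                    grad[i][j] += 2.0 * err[i] * x[j] / n
        for i in range(dim):
            for j in range(dim):
                phi[i][j] -= lr * grad[i][j]
    return phi

def is_hypernym(phi, x, y, delta):
    """Formula (6): accept y as a hypernym of x when ||phi x - y||_2 < delta."""
    diff = [p - t for p, t in zip(matvec(phi, x), y)]
    return math.sqrt(sum(d * d for d in diff)) < delta
```

In the described method one such matrix would be fitted per K-means cluster of offset vectors, and a candidate pair would be tested with the matrix of the cluster its offset falls into.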
Step 5: Form a semi-supervised node feature classification model that automatically learns the features between entities based on the graph neural network:
(1) Labeled and unlabeled samples are input into a graph convolutional network (GCN) model, which combines node features and graph structure. Labeled nodes are assigned to their corresponding nodes, and unlabeled nodes are mixed in among the surrounding labeled nodes; the information is then propagated over the preliminary graph through a multilayer neural network, and the node features are updated and refined layer by layer, so that the labels of the unlabeled nodes can be predicted.
(2) The related information and text data of the preliminary knowledge graph are input into the graph convolution layer, and all node information in the knowledge graph is then input into the feature extraction layer, thereby obtaining useful information about the nodes of the standard graph.
(3) The feature extraction layer consists of two parts: standard text vectorization and data optimization;
(4) The first part vectorizes the standard text of all the imported nodes in the knowledge graph using the TF-IDF algorithm: the term frequency (TF) and inverse document frequency (IDF) are determined first, and the corresponding TF-IDF is then calculated, with the formulas:
Figure BDA0003833452590000103
$\mathrm{TF\text{-}IDF}(x) = \mathrm{TF}(x) \cdot \mathrm{IDF}(x)$ (8)
(5) The original knowledge graph nodes are converted into a sparse TF-IDF matrix, which represents the features of the nodes;
(6) When the TF-IDF vocabulary is large, the feature dimension of every node is correspondingly large, which indirectly degrades the classification of relations in the next layer. SVD dimension reduction is therefore applied here: it keeps most of the information in the data while removing noise and redundant information, so the data is optimized and the classification result improves. The formula is:
Figure BDA0003833452590000111
(7) The preliminary power Internet of things standard knowledge graph, composed of n nodes $V = (v_1, v_2, \dots, v_n)$ and m edges $E = (e_1, e_2, \dots, e_m)$, is subjected to feature classification by the graph neural network;
(8) The adjacency matrix of the preliminary power Internet of things standard knowledge graph is $A = (a_{ij})$ (10)
The degree matrix is $D = \mathrm{diag}\big(\sum_{j \neq i} a_{ij}\big)$ (11), and the regularized Laplacian matrix is:
Figure BDA0003833452590000112
(9) The Laplacian matrix is eigendecomposed into two parts, the eigenvalues and the eigenvectors, and the spectral convolution is defined as:
$g_\theta * x = U g(\psi) U^{T} x$ (13)
wherein x is the signal on the nodes (one scalar per node), U is the matrix of eigenvectors obtained from the decomposition, and ψ denotes the eigenvalues;
(10) Because eigendecomposition is computationally expensive, the resulting convolution kernel is dense (global), and its number of parameters grows in proportion to the number of nodes in the graph, a relaxation is proposed: a Chebyshev polynomial is introduced into the GCN model to approximate $g(\psi)$. The Chebyshev polynomial is defined as:
Figure BDA0003833452590000113
(11) The GCN model approximates g (ψ) using a first order Chebyshev polynomial as shown:
Figure BDA0003833452590000121
Let
Figure BDA0003833452590000122
Then the single layer convolution operation is:
Figure BDA0003833452590000123
(12) The two-layer graph convolutional network is
Figure BDA0003833452590000124
Figure BDA0003833452590000125
Wherein W denotes the weights of the relations between entities in the neural network; the network has two layers in total, $W^{(1)}$ is the first-layer weight and $W^{(2)}$ is the second-layer weight; X is the input of the first layer; $H^{(1)}$ is the output of the first layer and also the input of the second layer; $H^{(2)}$ is the output of the second layer; ReLU is a nonlinear activation function; the output layer contains the softmax function; and
Figure BDA0003833452590000126
is a single layer convolution operation, i.e. a convolution kernel, i.e.:
Figure BDA0003833452590000127
(13) The feature matrix is transformed by the two graph convolution layers, and the resulting convolution output is passed to the output layer;
(14) The convolution result produced by the graph convolution layers is sent to the output layer of the graph convolutional network, where the softmax function yields the class labels of the n nodes; $y_i$ denotes the class label of node i (i = 1, 2, 3, …, n), and the node feature classification result is:
$Z = \mathrm{softmax}(H^{(2)})$ (20)
(15) The model is evaluated with the cross-entropy loss function over the labeled samples:
Figure BDA0003833452590000128
wherein y is L Is a collection of nodes containing labels.
(16) And finally, all the nodes to be classified contain class labels, so that semi-supervised classification of the node characteristics in the preliminarily constructed standard knowledge graph of the power internet of things is realized.
(17) Two main sub-procedures are used according to the analytic hierarchy process, a feature selection algorithm to determine the appropriate set of features at each decision point, and a supervised learning algorithm to build classifiers for the decisions. The standard knowledge graph of the power internet of things is completely constructed through the word vector and the graph convolution neural network.
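The forward pass of the two-layer network in items (12) to (15) can be sketched in plain Python. Several formula images are not reproduced in this text, so the sketch assumes the widely used renormalized propagation matrix $\tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ for the single-layer convolution. All function names are hypothetical, and weight training is omitted.

```python
import math

def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalized_adjacency(adj):
    """Assumed propagation matrix: D~^{-1/2} (A + I) D~^{-1/2}."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def gcn_forward(adj, x, w1, w2):
    """H1 = ReLU(A_hat X W1), H2 = A_hat H1 W2, Z = softmax(H2) as in (20)."""
    a_hat = normalized_adjacency(adj)
    h1 = relu(matmul(a_hat, matmul(x, w1)))
    h2 = matmul(a_hat, matmul(h1, w2))
    return softmax_rows(h2)

def masked_cross_entropy(z, labels):
    """Cross-entropy over labeled nodes only; labels maps node index to class index."""
    return -sum(math.log(z[i][c]) for i, c in labels.items())
```

Only the labeled nodes contribute to the loss, while every node, labeled or not, takes part in the propagation; that is what makes the classification semi-supervised.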
Step 6: a domain knowledge display method based on the graph database Neo4j, comprising:
storing, in batches, the relation triples obtained from the analytic hierarchy process, the word vectors and the graph neural network into a Neo4j graph database using the py2neo library, and completing and visualizing the power Internet of Things standard knowledge graph with the open-source visualization library ECharts.
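One way the storage step could look is sketched below. The function only builds a parameterised Cypher `MERGE` statement for each triple, so it runs without a database; the `Entity` label, the `name` property, the fallback type `RELATED_TO` and the sample triples are illustrative assumptions, not details given in the text.

```python
import re

def triple_to_cypher(head, relation, tail):
    """Build a parameterised Cypher MERGE statement for one (head, relation, tail) triple."""
    # Relationship types cannot be passed as Cypher parameters, so the type is
    # normalised and backtick-quoted; entity names travel as query parameters.
    rel_type = re.sub(r"\W+", "_", relation.strip()).strip("_").upper() or "RELATED_TO"
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:`{rel_type}`]->(t)"
    )
    return query, {"head": head, "tail": tail}

# Hypothetical triples; real ones would come from the extraction steps.
triples = [
    ("smart meter", "belongs to", "sensing layer"),
    ("concentrator", "forwards data to", "master station"),
]
statements = [triple_to_cypher(*t) for t in triples]

# With py2neo installed and a Neo4j server running, each statement could be
# executed roughly as follows (URI and credentials are placeholders):
#   from py2neo import Graph
#   graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
#   for query, params in statements:
#       graph.run(query, params)
```

Using `MERGE` rather than `CREATE` keeps repeated batch loads from duplicating entities that appear in several triples.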
Example 2
This embodiment provides a device for constructing a power Internet of Things standard knowledge graph, comprising:
a preprocessing unit, configured to acquire a standard text set of the power Internet of Things and preprocess the standard text set;
a hierarchical relation extraction unit, configured to perform hierarchical relation extraction and entity recognition between entities in a semi-supervised form on the preprocessed standard text set through a pre-constructed word vector model;
a mapping unit, configured to map the extracted hierarchical relations between entities into a pre-constructed preliminary knowledge graph by a mapping learning method;
and a graph construction unit, configured to automatically learn the features between entities and entity attribute relations based on a graph neural network, naturally combine the graph structure and the node features to train a classifier, and perform semi-supervised classification on the nodes of the graph to be classified, so as to construct the power Internet of Things standard knowledge graph.
Example 3
The embodiment provides a device for constructing a standard knowledge graph of an electric power internet of things, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to embodiment 1.
Example 4
The present embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method according to embodiment 1.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the technical principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for constructing a power Internet of Things standard knowledge graph, characterized by comprising the following steps:
acquiring a standard text set of the power Internet of Things, and preprocessing the standard text set;
performing hierarchical relation extraction and entity recognition between entities in a semi-supervised form on the preprocessed standard text set through a pre-constructed word vector model;
mapping the extracted hierarchical relations between entities into a pre-constructed preliminary knowledge graph by a mapping learning method;
automatically learning the features between entities and entity attribute relations based on a graph neural network, naturally combining the graph structure and the node features to train a classifier, and performing semi-supervised classification on the nodes of the graph to be classified, so as to construct the power Internet of Things standard knowledge graph.
2. The power Internet of Things standard knowledge graph construction method according to claim 1, characterized in that preprocessing the standard text set comprises: performing word segmentation, labeling, named entity recognition, merging and corpus format conversion on the standard text set.
3. The power Internet of Things standard knowledge graph construction method according to claim 2, wherein preprocessing the standard text set comprises the following processes:
storing the power Internet of Things standard text data in a database, and directly converting all data in the database into structured data;
extracting, according to the requirements of the power Internet of Things standard knowledge graph, the relations between power Internet of Things standard entities, between entities and attributes, and between attributes, to form entity, attribute and relation triples;
taking the short sentences in the power Internet of Things standard documents as units, labeling the selected entities, scanning the documents processed in the previous step, and finally obtaining all sentences that contain two or more entity fields.
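The sentence selection at the end of claim 3 can be sketched as follows. This is a minimal illustration that assumes entities are matched by plain substring lookup and that short sentences are delimited by common Chinese and English punctuation; the function names, the delimiter set and the sample text are hypothetical, and the entity threshold is left as a parameter because the wording on whether exactly two entities qualify is ambiguous.

```python
import re

def split_short_sentences(text):
    """Split text into short sentences on common Chinese and English delimiters."""
    return [s.strip() for s in re.split(r"[。；;！!？?\n]+", text) if s.strip()]

def sentences_with_entity_pairs(text, entities, min_entities=2):
    """Return (sentence, found_entities) for sentences that mention enough entities."""
    selected = []
    for sentence in split_short_sentences(text):
        found = [e for e in entities if e in sentence]
        if len(found) >= min_entities:
            selected.append((sentence, found))
    return selected

# Hypothetical sample text and entity list.
text = (
    "The smart meter reports to the concentrator; "
    "The gateway is installed indoors; "
    "The concentrator forwards data to the master station"
)
entities = ["smart meter", "concentrator", "master station", "gateway"]
result = sentences_with_entity_pairs(text, entities)
```

Only sentences kept by this filter can yield candidate relation instances, since a relation needs at least two entities in the same sentence.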
4. The power Internet of Things standard knowledge graph construction method according to claim 1, wherein performing hierarchical relation extraction and entity recognition between entities in a semi-supervised form on the preprocessed standard text set through the pre-constructed word vector model comprises the following steps:
constructing a word vector model, and training the word vector model with the Skip-gram model;
for the relations between power Internet of Things standard entities, using a single-pass clustering method to generate the extraction patterns that the seed set is to follow, wherein each extraction pass over the seed set yields a new type of hierarchical relation and, at the same time, three kinds of standard context feature vectors are formed;
initially, a new cluster consists of a single instance; the input of the algorithm is a list of power Internet of Things standard relation instances; the similarity between any instance $i_n$ and each cluster $Cl_j$ is then computed and compared with the similarity threshold $T_{Sim}$; if the threshold $T_{Sim}$ does not exceed the similarity, the seed instance $i_n$ is added to the corresponding cluster; if the threshold $T_{Sim}$ is greater than or equal to all the similarities, a new cluster $C_m$ is created, and traversal of the seed-set relation instances continues;
the similarity between a seed instance $i_n$ and a cluster $Cl_j$ is computed by $Sim(i_n, Cl_j)$, which calculates the similarity between $i_n$ and each instance in the cluster $Cl_j$; if most of the similarity values are greater than the threshold $T_{Sim}$, the similarity is returned, otherwise 0 is returned; the similarity between two instances is calculated as:
$Sim(S_n, S_j) = \alpha \cdot \cos(BEF_i, BEF_j) + \beta \cdot \cos(BET_i, BET_j) + \gamma \cdot \cos(AFT_i, AFT_j)$ (1)
where α, β and γ are the weights of the vectors, BEF is the words before the first entity, BET is the words between the two entities, and AFT is the words after the second entity;
the input is the power Internet of Things standard instances, and the output is the relation patterns.
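The single-pass clustering and the similarity of formula (1) can be sketched as follows. The sketch assumes each relation instance is a dictionary holding three context vectors under the keys `BEF`, `BET` and `AFT`, that the cluster similarity returned is the maximum pairwise similarity, and that the weights and threshold take the illustrative values shown; none of these specifics is stated in the claim.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def instance_similarity(a, b, alpha=0.2, beta=0.6, gamma=0.2):
    """Formula (1): weighted sum of the cosines of the BEF, BET and AFT contexts."""
    return (alpha * cosine(a["BEF"], b["BEF"])
            + beta * cosine(a["BET"], b["BET"])
            + gamma * cosine(a["AFT"], b["AFT"]))

def cluster_similarity(instance, cluster, threshold):
    """Return the maximum similarity if most members exceed the threshold, else 0."""
    sims = [instance_similarity(instance, member) for member in cluster]
    above = sum(1 for s in sims if s > threshold)
    return max(sims) if above > len(sims) / 2 else 0.0

def single_pass_cluster(instances, threshold=0.6):
    """Assign each instance to the best matching cluster or open a new cluster."""
    clusters = []
    for inst in instances:
        scores = [cluster_similarity(inst, c, threshold) for c in clusters]
        best = max(scores, default=0.0)
        if scores and best >= threshold:
            clusters[scores.index(best)].append(inst)
        else:
            clusters.append([inst])  # the first instance always opens a cluster
    return clusters

# Toy 2-dimensional context vectors (hypothetical).
inst1 = {"BEF": [1.0, 0.0], "BET": [1.0, 0.0], "AFT": [1.0, 0.0]}
inst2 = {"BEF": [1.0, 0.1], "BET": [1.0, 0.1], "AFT": [1.0, 0.0]}
inst3 = {"BEF": [0.0, 1.0], "BET": [0.0, 1.0], "AFT": [0.0, 1.0]}
clusters = single_pass_cluster([inst1, inst2, inst3])
```

Each resulting cluster groups instances with similar contexts and can then be summarised as one relation pattern.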
5. The power Internet of Things standard knowledge graph construction method according to claim 4, characterized in that:
all patterns are scored, the score of a pattern being based mainly on the relations of the power Internet of Things standard instances it extracts, with the score classes P, N and U; when the relation of an extracted instance between power Internet of Things standard entities is identical to one in the seed set, the relation is positive and is added to the set P; when the relation of an extracted power Internet of Things standard instance differs from the seed set, the relation is negative and is added to the set N; when the relation of an extracted power Internet of Things standard instance is in the seed set but is not completely identical, or is unknown, it is added to the set U;
the confidence of a pattern is calculated as follows:
Figure FDA0003833452580000031
where $W_n$ and $W_m$ are the weights of N and U, respectively;
the confidence weight of the pattern is calculated as follows:
Figure FDA0003833452580000032
where ξ is a pattern that extracts the standard instance i, and $C_i$ is the context of the standard instance i.
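The two confidence formulas of claim 5 are given only as figures, so the following Python sketch rests on an assumption: it uses the form common in bootstrapping relation extraction, where the pattern confidence is $|P| / (|P| + W_n|N| + W_m|U|)$ and the confidence of an extracted instance is $1 - \prod_{\xi}(1 - Conf(\xi) \cdot Sim(C_i, \xi))$. The function names and default weights are hypothetical.

```python
def pattern_confidence(num_positive, num_negative, num_unknown, w_n=2.0, w_m=0.1):
    """Assumed form: |P| / (|P| + W_n * |N| + W_m * |U|)."""
    denom = num_positive + w_n * num_negative + w_m * num_unknown
    return num_positive / denom if denom else 0.0

def instance_confidence(pattern_scores):
    """Assumed form: 1 - prod(1 - Conf(xi) * Sim(C_i, xi)) over extracting patterns.

    pattern_scores is an iterable of (pattern_confidence, context_similarity) pairs,
    one pair for each pattern xi that extracted the instance.
    """
    remaining = 1.0
    for conf, sim in pattern_scores:
        remaining *= 1.0 - conf * sim
    return 1.0 - remaining

# A pattern with 8 positive, 1 negative and 10 unknown extractions.
conf_p = pattern_confidence(8, 1, 10)
# An instance extracted by two patterns of confidence 0.5 with perfect context match.
conf_i = instance_confidence([(0.5, 1.0), (0.5, 1.0)])
```

Under this assumed form, negative extractions are penalised more heavily than unknown ones, and an instance supported by several reliable patterns scores higher than one supported by a single pattern.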
6. The power Internet of Things standard knowledge graph construction method according to claim 1, wherein performing semi-supervised classification on the nodes of the graph to be classified to construct the power Internet of Things standard knowledge graph comprises: inputting the constructed preliminary power Internet of Things standard knowledge graph into the graph convolution layer, directly extracting the feature information of all nodes from the preliminary power Internet of Things standard knowledge graph, inputting the feature information into the feature extraction layer, and training the relation triple classifier to obtain the complete feature information of the nodes of the power Internet of Things standard knowledge graph.
7. The power Internet of Things standard knowledge graph construction method according to claim 1, wherein the py2neo library is used to store the relation triples obtained from the analytic hierarchy process, the word vectors and the graph neural network into a Neo4j graph database, and the open-source visualization library ECharts is used to complete construction of the power Internet of Things standard knowledge graph.
8. A device for constructing a power Internet of Things standard knowledge graph, characterized by comprising:
a preprocessing unit, configured to acquire a standard text set of the power Internet of Things and preprocess the standard text set;
a hierarchical relation extraction unit, configured to perform hierarchical relation extraction and entity recognition between entities in a semi-supervised form on the preprocessed standard text set through a pre-constructed word vector model;
a mapping unit, configured to map the extracted hierarchical relations between entities into a pre-constructed preliminary knowledge graph by a mapping learning method;
and a graph construction unit, configured to automatically learn the features between entities and entity attribute relations based on a graph neural network, naturally combine the graph structure and the node features to train a classifier, and perform semi-supervised classification on the nodes of the graph to be classified, so as to construct the power Internet of Things standard knowledge graph.
9. A device for constructing a power Internet of Things standard knowledge graph, characterized by comprising a processor and a storage medium;
the storage medium is configured to store instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202211082212.XA 2022-09-06 2022-09-06 Electric power Internet of things standard knowledge graph construction method and device Pending CN115329101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211082212.XA CN115329101A (en) 2022-09-06 2022-09-06 Electric power Internet of things standard knowledge graph construction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211082212.XA CN115329101A (en) 2022-09-06 2022-09-06 Electric power Internet of things standard knowledge graph construction method and device

Publications (1)

Publication Number Publication Date
CN115329101A true CN115329101A (en) 2022-11-11

Family

ID=83930418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211082212.XA Pending CN115329101A (en) 2022-09-06 2022-09-06 Electric power Internet of things standard knowledge graph construction method and device

Country Status (1)

Country Link
CN (1) CN115329101A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116127090A (en) * 2022-12-28 2023-05-16 中国航空综合技术研究所 Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
CN116127090B (en) * 2022-12-28 2023-11-21 中国航空综合技术研究所 Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
CN117667890A (en) * 2023-12-01 2024-03-08 中国标准化研究院 Knowledge base construction method and system for standard digitization

Similar Documents

Publication Publication Date Title
CN109189925B (en) Word vector model based on point mutual information and text classification method based on CNN
CN109635291B (en) Recommendation method for fusing scoring information and article content based on collaborative training
CN108388651B (en) Text classification method based on graph kernel and convolutional neural network
WO2018218708A1 (en) Deep-learning-based public opinion hotspot category classification method
CN110674850A (en) Image description generation method based on attention mechanism
CN115329101A (en) Electric power Internet of things standard knowledge graph construction method and device
CN113360582B (en) Relation classification method and system based on BERT model fusion multi-entity information
CN111858940A (en) Multi-head attention-based legal case similarity calculation method and system
CN115269865A (en) Knowledge graph construction method for auxiliary diagnosis
CN112199501A (en) Scientific and technological information text classification method
CN114911945A (en) Knowledge graph-based multi-value chain data management auxiliary decision model construction method
CN114722820A (en) Chinese entity relation extraction method based on gating mechanism and graph attention network
CN114580638A (en) Knowledge graph representation learning method and system based on text graph enhancement
CN114265935A (en) Science and technology project establishment management auxiliary decision-making method and system based on text mining
CN114997288A (en) Design resource association method
CN112836051A (en) Online self-learning court electronic file text classification method
CN116467443A (en) Topic identification-based online public opinion text classification method
CN116610818A (en) Construction method and system of power transmission and transformation project knowledge base
CN112784017B (en) Archive cross-modal data feature fusion method based on main affinity expression
CN113837307A (en) Data similarity calculation method and device, readable medium and electronic equipment
CN111951079B (en) Credit rating method and device based on knowledge graph and electronic equipment
CN117291190A (en) User demand calculation method based on emotion dictionary and LDA topic model
CN115964468A (en) Rural information intelligent question-answering method and device based on multilevel template matching
CN115905554A (en) Chinese academic knowledge graph construction method based on multidisciplinary classification
CN115129890A (en) Feedback data map generation method and generation device, question answering device and refrigerator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination