CN115934955A - Electric power standard knowledge graph construction method, knowledge question answering system and device - Google Patents


Info

Publication number: CN115934955A
Authority: CN (China)
Prior art keywords: sequence, submodel, vector, text, knowledge
Legal status (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed): Pending
Application number: CN202211320954.1A
Other languages: Chinese (zh)
Inventors: 周育忠, 林正平, 王冕, 涂亮, 杨宇亮
Current assignee (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list): CSG Electric Power Research Institute; Guizhou Power Grid Co Ltd
Original assignee: CSG Electric Power Research Institute; Guizhou Power Grid Co Ltd
Application filed by CSG Electric Power Research Institute, Guizhou Power Grid Co Ltd filed Critical CSG Electric Power Research Institute
Priority to CN202211320954.1A
Publication of CN115934955A
Legal status: Pending

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a power standard knowledge graph construction method, a knowledge question answering system, and a device. The method comprises: constructing an ontology structure of the power standard knowledge graph from collected power standard data, the ontology structure comprising entities, attributes, and relationships among entities; acquiring basic data containing power standard knowledge, and performing knowledge extraction on the basic data to extract entities, attributes, and relationships among entities; and performing knowledge fusion based on the extracted knowledge, storing the fused knowledge, and constructing the power standard knowledge graph. Through the models designed for knowledge extraction from text information and from image information, the invention effectively solves the difficulty of extracting power standard knowledge, ensuring both the reliability and the efficiency of the extraction.

Description

Electric power standard knowledge graph construction method, knowledge question answering system and device
Technical Field
The invention relates to the technical field of electric power, in particular to a construction method of an electric power standard knowledge graph, a knowledge question-answering system and a knowledge question-answering device.
Background
A knowledge graph combines the theories and methods of subjects such as mathematics, graphics, information visualization technology, and information science with methods such as bibliometric citation analysis and co-occurrence analysis, and uses visualized graphs to vividly display the core structure, development history, frontier fields, and overall knowledge framework of a discipline, achieving multi-disciplinary fusion. Through data mining, information processing, knowledge measurement, and graph drawing, it can display a complex knowledge field, reveal the dynamic development laws of the knowledge field, and provide a practical and valuable reference for subject research.
There are many applications based on knowledge graphs, such as intelligent question answering, personalized recommendation, knowledge reasoning, and visualization. A knowledge question answering system, like a search engine, is an information retrieval tool, but it can understand and process natural language questions at the semantic level and directly return the answer to the question, realizing semantic retrieval. If a knowledge graph is used as the knowledge source of the question answering system, a knowledge-base question answering system is formed: it can accept questions in natural language form, understand the meaning of the questions through semantic analysis, and then query the knowledge base and return the answers.
At present, obtaining relevant knowledge of the power industry generally depends on search engines, and no intelligent question answering system for this vertical field has appeared. The reason is that the high difficulty of extracting relevant knowledge during the construction of a power standard knowledge graph makes constructing the knowledge graph itself difficult.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned and/or existing problems in the prior art related to the power industry.
Therefore, the problem to be solved by the invention is how to extract relevant knowledge in the process of constructing the power standard knowledge graph.
In order to solve the technical problems, the invention provides the following technical scheme:
In a first aspect, an embodiment of the present invention provides a power standard knowledge graph construction method, which comprises:
constructing an ontology structure of the power standard knowledge graph from the acquired power standard data, wherein the ontology structure comprises entities, attributes, and relationships among entities;
acquiring basic data containing power standard knowledge, and performing knowledge extraction on the basic data to extract entities, attributes, and relationships among entities;
and performing knowledge fusion based on the extracted knowledge, storing the fused knowledge, and constructing the power standard knowledge graph.
In a preferred embodiment of the power standard knowledge graph construction method of the present invention, acquiring basic data containing power standard knowledge and performing knowledge extraction on the basic data comprises:
preprocessing the basic data to obtain a plurality of pieces of text information, or a plurality of pieces of text information and at least one piece of image information;
for each piece of text information, segmenting the text information and inputting it into a Bert submodel to obtain a corresponding vector sequence; inputting the vector sequence into a BGRU submodel, which outputs a state matrix revealing the score of each label corresponding to each word in the text information; inputting the state matrix into a CRF submodel and calculating the optimal label sequence, realizing the extraction of entities and the extraction of attributes;
for each piece of image information, inputting the image information into an externally invoked formula recognition tool to obtain converted text information; processing the converted text information to obtain at least one formula text; inputting the formula texts together into a WordBert submodel to obtain a corresponding vector sequence; inputting the vector sequence into a BGRU submodel, which outputs a state matrix revealing the score of each label corresponding to each formula text in the converted text information; inputting the state matrix into a CRF submodel and calculating the optimal label sequence, realizing the extraction of attributes;
and processing the vector sequences of the extracted entities and attributes and inputting them into a relation extraction submodel, realizing the extraction of the relationships among entities.
In a preferred embodiment of the power standard knowledge graph construction method of the present invention, the knowledge extraction for each piece of text information comprises:
segmenting the text information to obtain participle text w of length n;
inputting the participle text w = ([CLS], w_1, w_2, …, w_n, [SEP]) into the Bert submodel to obtain the vector sequence l = (l_0, l_1, l_2, …, l_n, l_{n+1}) corresponding to the participle text w, l_i ∈ R^{n×L}, where i ∈ [0, n+1]; the vector sequence l = (l_0, l_1, l_2, …, l_n, l_{n+1}) is the hidden state corresponding to the participle text w in the last layer of the Bert submodel, [CLS] is the start token, [SEP] is the end token, and L is the hidden-state dimension of the Bert submodel;
taking each word vector l_i in the vector sequence l = (l_0, l_1, l_2, …, l_n, l_{n+1}) as the input of each time step in the BGRU submodel;
calculating, from the hidden state sequence h→ output by the forward GRU and the hidden state sequence h← output by the reverse GRU in the BGRU submodel, the hidden state sequence h_{n+1} corresponding to the vector sequence l, h_{n+1} ∈ R^{n×H}, where H is the hidden-state dimension of the BGRU submodel;
mapping the hidden state sequence h_{n+1} from dimension H to dimension k, where k is the number of labels;
calculating the label score of each participle for each of the k labels to obtain the state matrix E = (e_0, e_1, e_2, …, e_n, e_{n+1}), where each e_i ∈ R^k is a column vector;
and inputting the state matrix into the CRF submodel and calculating the optimal label sequence.
In a preferred embodiment of the power standard knowledge graph construction method of the present invention, inputting the state matrix into the CRF submodel and calculating the optimal label sequence comprises:
inputting the state matrix E = (e_0, e_1, e_2, …, e_n, e_{n+1}) into the CRF submodel;
calculating the total score of each label sequence ŷ = (ŷ_0, ŷ_1, …, ŷ_{n+1}) based on the constraint matrix F introduced in the CRF submodel and the input state matrix E:

S(ŷ) = Σ_{i=0}^{n+1} α · E_{i,ŷ_i} + Σ_{j=0}^{n} F_{ŷ_j,ŷ_{j+1}}

where F ∈ R^{(k+2)×(k+2)}, S(ŷ) denotes the total score of the label sequence ŷ, α is an adjustment factor, E_{i,ŷ_i} denotes the probability that the i-th participle is classified into the ŷ_i-th label in the state matrix E, and F_{ŷ_j,ŷ_{j+1}} denotes the probability of transitioning from the j-th label ŷ_j in the label sequence ŷ to the (j+1)-th label ŷ_{j+1};
based on the total score S(ŷ) of each label sequence ŷ, calculating the optimal label sequence y*:

y* = argmax_{ŷ ∈ Y_w} S(ŷ)

where Y_w is the set of all possible label sequences.
In a preferred embodiment of the power standard knowledge graph construction method of the present invention, the knowledge extraction for each piece of image information comprises:
identifying the converted text information and determining whether the target symbol "=" is present;
if the target symbol "=" is not present, determining the converted text information to be a single formula text;
if the target symbol "=" is present, splitting the converted text information at the target symbol "=" to obtain a plurality of formula texts;
inputting the formula text combination v = ([CLS], v_1, v_2, …, v_m, [SEP]) into the WordBert submodel to obtain the vector sequence l = (l_0, l_1, l_2, …, l_m, l_{m+1}) corresponding to the formula text combination v, l_i ∈ R^{m×L}, where i ∈ [0, m+1]; the vector sequence l = (l_0, l_1, l_2, …, l_m, l_{m+1}) is the hidden state corresponding to the formula text combination v in the last layer of the WordBert submodel, [CLS] is the start token, [SEP] is the end token, and L is the hidden-state dimension of the WordBert submodel;
taking each formula vector l_i in the vector sequence l = (l_0, l_1, l_2, …, l_m, l_{m+1}) as the input of each time step in the BGRU submodel; calculating, from the hidden state sequence h→ output by the forward GRU and the hidden state sequence h← output by the reverse GRU in the BGRU submodel, the hidden state sequence h_{m+1} corresponding to the vector sequence l, h_{m+1} ∈ R^{m×H}, where H is the hidden-state dimension of the BGRU submodel; mapping the hidden state sequence h_{m+1} from dimension H to dimension k, where k is the number of labels; calculating the label score of each formula for each of the k labels to obtain the state matrix E = (e_0, e_1, e_2, …, e_m, e_{m+1}), where each e_i ∈ R^k is a column vector;
and inputting the state matrix into the CRF submodel and calculating the optimal label sequence.
In a preferred embodiment of the power standard knowledge graph construction method of the present invention, inputting the state matrix into the CRF submodel and calculating the optimal label sequence comprises:
inputting the state matrix E = (e_0, e_1, e_2, …, e_m, e_{m+1}) into the CRF submodel;
calculating the total score of each label sequence ŷ = (ŷ_0, ŷ_1, …, ŷ_{m+1}) based on the input state matrix E:

S(ŷ) = Σ_{i=0}^{m+1} E_{i,ŷ_i}

where S(ŷ) denotes the total score of the label sequence ŷ and E_{i,ŷ_i} denotes the probability that the i-th formula text is classified into the ŷ_i-th label in the state matrix E;
based on the total score S(ŷ) of each label sequence ŷ, calculating the optimal label sequence y*:

y* = argmax_{ŷ ∈ Y_v} S(ŷ)

where Y_v is the set of all possible label sequences.
In a preferred embodiment of the power standard knowledge graph construction method of the present invention, processing the vector sequences of the extracted entities and attributes and inputting them into the relation extraction submodel to realize the extraction of relationships among entities comprises:
based on the extracted entities, marking the corresponding vectors in the vector sequence l = (l_0, l_1, l_2, …, l_n, l_{n+1}) corresponding to the participle text w;
inputting the marked vector sequence l' into the relation extraction submodel;
for the marked vectors in the vector sequence l', pairing all marked vectors with one another, so that each marked vector has a paired combination relationship with every other marked vector;
for each marked vector pair with a combination relationship, splicing the two marked vectors of the pair to obtain a combined vector;
calculating the score of each combined vector under each relation category;
and obtaining the optimal score corresponding to each combined vector, sorting the optimal scores, eliminating the last optimal score in the sorting, and, for each remaining optimal score, determining that the corresponding relation category holds between the entities corresponding to that combined vector, realizing the extraction of relationships among entities.
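The pairwise combination, splicing, and scoring steps above can be sketched as a small numpy toy; the marked vectors, the scoring matrix W, and the relation categories here are random stand-ins for illustration, not the patent's trained relation extraction submodel:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8            # hidden dimension of each marked (entity/attribute) vector
num_rel = 3      # number of relation categories (illustrative)

# Marked vectors for three extracted items (stand-ins for Bert outputs).
marked = {"entity_A": rng.normal(size=d),
          "entity_B": rng.normal(size=d),
          "attr_C": rng.normal(size=d)}

# Hypothetical scoring parameters: one weight row per relation category.
W = rng.normal(size=(num_rel, 2 * d))

# Pair every marked vector with every other one, splice (concatenate), score.
names = list(marked)
pair_best = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        combined = np.concatenate([marked[names[i]], marked[names[j]]])
        scores = W @ combined                 # score under each relation category
        best_cat = int(np.argmax(scores))
        pair_best.append((names[i], names[j], best_cat, float(scores[best_cat])))

# Sort pairs by their optimal score and eliminate the last (lowest) one.
pair_best.sort(key=lambda t: t[3], reverse=True)
kept = pair_best[:-1]
for a, b, cat, s in kept:
    print(f"{a} --relation {cat}--> {b} (score {s:.2f})")
```

The elimination of the lowest-ranked pair mirrors the description's pruning step; a real system would likely threshold or train a "no relation" class instead.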
In a second aspect, an embodiment of the present invention provides a power standard knowledge question answering system, which comprises:
a data layer, comprising a pre-constructed power standard knowledge graph and a word segmentation dictionary constructed based on the entities and attributes in the power standard knowledge graph;
a Web layer, used for receiving question information from a user and for generating and displaying answer information based on the query result of the query layer, wherein the question information is in natural language form;
and a query layer, used for converting the question information into Cypher query statements, sending the Cypher query statements to the Neo4j graph database for querying, and acquiring the query results.
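As a minimal sketch of how such a query layer might turn a question into a Cypher statement, assuming dictionary-based term matching; the dictionary terms, node labels, and query shape are illustrative assumptions, not the patent's actual implementation:

```python
# Toy word-segmentation dictionary built from entities/attributes in the graph
# (terms and labels here are made up for illustration).
seg_dict = {"建筑物防雷设计规范": "Standard", "避雷线": "Index"}

def question_to_cypher(question: str) -> str:
    """Match dictionary terms in the question and build a Cypher query."""
    for term, label in seg_dict.items():
        if term in question:
            # Query the graph for the matched node and its neighbours.
            return (f"MATCH (n:{label} {{name: '{term}'}})-[r]->(m) "
                    f"RETURN n, type(r), m")
    return "MATCH (n) RETURN n LIMIT 10"   # fallback when nothing matches

q = question_to_cypher("避雷线的要求是什么?")
print(q)
```

In a full system this string would be sent to Neo4j through an official driver and the result rendered by the Web layer.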
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, and where: the processor, when executing the computer program, performs any of the steps of the above-described method.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein: which when executed by a processor performs any of the steps of the above-described method.
The invention has the following beneficial effects. For knowledge extraction from text information, the designed model realizes joint extraction of the entities and attributes in power standard knowledge, ensuring both the reliability and the efficiency of knowledge extraction. For knowledge extraction from image information, the designed model effectively solves the difficulty of extracting power standard knowledge (data such as numerical limits and calculation methods are represented by formula images, for which the prior art cannot realize effective knowledge extraction), guaranteeing both the extraction of the relevant knowledge in formula images and the reliability of that extraction. Moreover, the designed WordBert submodel is used for formula text without any word segmentation operation, which reduces processing, effectively preserves information, and solves the problem of wrong formula information extraction caused by word segmentation in the traditional Bert model. The vector sequences of the extracted entities and attributes are processed and then input into the relation extraction submodel to realize the extraction of relationships among entities; since the vector sequence produced by the Bert submodel can be reused for the subsequent relation processing, and relation extraction can proceed after the corresponding processing, the workload of knowledge extraction is effectively reduced (no repeated entity extraction process is required), and, because the entities are already determined, relation extraction achieves twice the result with half the effort.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
FIG. 1 is a flow chart of a power standard knowledge graph construction method.
FIG. 2 is a schematic diagram of a power standard knowledge graph building model.
FIG. 3 is a schematic diagram of a power standard knowledge question-answering system.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected" and "connected" in the present invention are to be construed broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to fig. 1 and 2, a first embodiment of the present invention provides a power standard knowledge graph construction method, including:
s100: and constructing an ontology structure of the power standard knowledge graph through the collected power standard data, wherein the ontology structure comprises entities, attributes and relationships among the entities.
It should be noted that, considering the field of the power standard knowledge graph, the ontology structure may be constructed in a combined top-down and bottom-up manner: part of the ontology structure is designed in advance, for example the power standard name (e.g., a building lightning protection design code), indexes (e.g., lightning protection devices), and lower-level indexes (e.g., lightning conductors), and new ontology structures are found and added during the subsequent knowledge extraction process.
S200: acquiring basic data containing power standard knowledge, and performing knowledge extraction on the basic data to extract entities, attributes, and relationships among entities.
It should be noted that acquiring basic data containing power standard knowledge may be implemented by collecting documents, crawling web pages, and the like. For example, information about power standard knowledge can be crawled from web pages, or obtained from an already constructed data set (since power standard knowledge belongs to a very vertical field and the knowledge is relatively stable). The basic data containing power standard knowledge may be plain text data (e.g., a Word document, a PDF document, a TXT document, etc.) or a combination of text data and formula images (e.g., a PDF document containing formulas, a Word document containing formula images, etc.), and the basic data may also be a document obtained by processing and sorting data crawled from web pages.
It should be noted that, for text data in the base data, the text data in the base data may be split into a plurality of text information based on the sentence separator.
It should be noted that the acquired basic data may be preprocessed to obtain a plurality of text messages, or obtain a plurality of text messages and at least one image message.
Further, if formula images exist in the basic data, each formula image in the basic data may be processed to obtain corresponding image information. For example, a formula image can be input into Mathpix to obtain the recognized formula in LaTeX format; MathType can then be used to convert the LaTeX into MathML format, i.e., a plain-text format, from which a Word document can be obtained.
Further, for each piece of image information, a number may be assigned to the image information, and the same number assigned to all text information corresponding to the paragraph in which the formula image is located and to the adjacent paragraphs in the text data, so as to establish an association between the image information and the text information. In this way, an association is established between the text information and the image information, which makes it convenient to subsequently determine the entity object to which an attribute belongs, ensuring the accuracy and reliability of the knowledge graph.
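A minimal sketch of this numbering scheme; the paragraph list and the `<formula image>` marker are illustrative stand-ins for a parsed document:

```python
# Paragraphs of a document after preprocessing; index 2 stands in for a
# formula image (real data would come from parsing the source document).
paragraphs = ["text A", "text B", "<formula image>", "text C", "text D"]

links = {}          # number -> {"image": idx, "texts": [idx, ...]}
next_no = 1
for i, p in enumerate(paragraphs):
    if p == "<formula image>":
        # Give the image and the adjacent text paragraphs the same number.
        neighbours = [j for j in (i - 1, i + 1)
                      if 0 <= j < len(paragraphs)
                      and paragraphs[j] != "<formula image>"]
        links[next_no] = {"image": i, "texts": neighbours}
        next_no += 1

print(links)
```

A later attribute extracted from image 2 can then be attached to the entities found in paragraphs 1 and 3 via the shared number.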
It should be noted that, in order to implement knowledge extraction (joint extraction of entities and attributes) on text information, for each text information, the text information may be segmented and input to the Bert submodel, so as to obtain a corresponding vector sequence.
Specifically, the text information is segmented to obtain participle text w of length n, and the participle text w = ([CLS], w_1, w_2, …, w_n, [SEP]) is input into the Bert submodel to obtain the vector sequence l = (l_0, l_1, l_2, …, l_n, l_{n+1}) corresponding to the participle text w, l_i ∈ R^{n×L},
where i ∈ [0, n+1]; the vector sequence l = (l_0, l_1, l_2, …, l_n, l_{n+1}) is the hidden state corresponding to the participle text w in the last layer of the Bert submodel, [CLS] is the start token, [SEP] is the end token, and L is the hidden-state dimension of the Bert submodel (e.g., 100 dimensions, 200 dimensions, etc.).
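The resulting shapes can be illustrated with a toy in numpy, where random vectors stand in for the Bert submodel's last-layer hidden states (the participles and the dimension L are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
L_dim = 16                                   # hidden-state dimension L (toy)

participles = ["防雷", "装置", "应", "接地"]   # participle text w, length n = 4
tokens = ["[CLS]"] + participles + ["[SEP]"]  # n + 2 positions

# Random vectors stand in for the last-layer hidden states l_0 .. l_{n+1}.
vocab = {tok: rng.normal(size=L_dim) for tok in set(tokens)}
l_seq = np.stack([vocab[tok] for tok in tokens])

print(l_seq.shape)   # (n + 2, L) = (6, 16)
```

Each row l_i then feeds one time step of the BGRU submodel described below.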
Further, after the vector sequence l output by the Bert submodel is obtained, the vector sequence l can be input into the BGRU submodel, and the BGRU submodel outputs a state matrix for revealing scores of the labels corresponding to the words in the text information.
Specifically, each word vector l_i in the vector sequence l = (l_0, l_1, l_2, …, l_n, l_{n+1}) is taken as the input of each time step in the BGRU submodel (n+2 time steps are needed); then, from the hidden state sequence h→ output by the forward GRU and the hidden state sequence h← output by the reverse GRU in the BGRU submodel, the hidden state sequence h_{n+1} corresponding to the vector sequence l is calculated, h_{n+1} ∈ R^{n×H}, where H is the hidden-state dimension of the BGRU submodel.
It should be noted that the hidden state sequence h→ output by the forward GRU and the hidden state sequence h← output by the reverse GRU are added bit-wise and averaged (to further improve precision, a bit-wise weighted addition may also be used) to obtain the hidden state sequence h_{n+1}.
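The bit-wise averaging (and the weighted variant mentioned above) can be sketched in a few lines; the arrays here are random stand-ins for the forward and reverse GRU hidden state sequences, and the weight w is a hypothetical tunable parameter:

```python
import numpy as np

rng = np.random.default_rng(1)
n, H = 4, 8                        # sequence length and BGRU hidden dimension

h_fwd = rng.normal(size=(n, H))    # stand-in for forward-GRU hidden states
h_bwd = rng.normal(size=(n, H))    # stand-in for reverse-GRU hidden states

# Bit-wise (element-wise) addition followed by averaging.
h = (h_fwd + h_bwd) / 2.0

# Bit-wise weighted variant (w is a hypothetical tunable weight).
w = 0.7
h_weighted = w * h_fwd + (1.0 - w) * h_bwd

print(h.shape)
```

Averaging keeps the combined sequence at dimension H, unlike the concatenation used in many BiGRU/BiLSTM taggers, which would double it.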
Further, the hidden state sequence h_{n+1} is mapped from dimension H to dimension k, where k is the number of labels, and the label score of each participle for each of the k labels is calculated to obtain the state matrix E = (e_0, e_1, e_2, …, e_n, e_{n+1}), where each e_i ∈ R^k is a column vector;
the state matrix is then input into the CRF submodel and the optimal label sequence is calculated, realizing the extraction of entities and attributes.
Specifically, the state matrix E = (e_0, e_1, e_2, …, e_n, e_{n+1}) is input into the CRF submodel, and, based on the constraint matrix F introduced into the CRF submodel and the input state matrix E, where F ∈ R^{(k+2)×(k+2)}, the total score of each label sequence ŷ = (ŷ_0, ŷ_1, …, ŷ_{n+1}) is calculated using the following formula:

S(ŷ) = Σ_{i=0}^{n+1} α · E_{i,ŷ_i} + Σ_{j=0}^{n} F_{ŷ_j,ŷ_{j+1}}

wherein S(ŷ) denotes the total score of the label sequence ŷ, α is an adjustment factor, E_{i,ŷ_i} denotes the probability that the i-th participle is classified into the ŷ_i-th label in the state matrix E, and F_{ŷ_j,ŷ_{j+1}} denotes the probability of transitioning from the j-th label ŷ_j in the label sequence ŷ to the (j+1)-th label ŷ_{j+1}.
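A toy numpy rendering of this total score, with a brute-force search over all label sequences in place of the usual Viterbi decoding; for simplicity the start/end rows of the constraint matrix F are omitted (so F is k×k here), and all values are random stand-ins:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_tok, k = 3, 2                      # 3 positions, 2 labels (toy sizes)
alpha = 0.5                          # adjustment factor (illustrative value)

E = rng.normal(size=(n_tok, k))      # emission scores (state matrix, toy)
F = rng.normal(size=(k, k))          # constraint/transition matrix (toy)

def total_score(y):
    """S(y) = sum_i alpha * E[i, y_i] + sum_j F[y_j, y_{j+1}]."""
    emit = sum(alpha * E[i, y[i]] for i in range(n_tok))
    trans = sum(F[y[j], y[j + 1]] for j in range(n_tok - 1))
    return emit + trans

# Brute-force argmax over all possible label sequences (fine at toy scale;
# a real CRF decoder would use the Viterbi algorithm instead).
best = max(itertools.product(range(k), repeat=n_tok), key=total_score)
print(best, total_score(best))
```

Scaling α up or down shifts the balance between the per-token scores in E and the transition constraints in F.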
Then, based on the total score S(ŷ) of each label sequence ŷ, the following formula can be used to calculate the optimal label sequence y*:

y* = argmax_{ŷ ∈ Y_w} S(ŷ)

wherein Y_w is the set of all possible label sequences.
In addition, in order to guarantee the applicability of the introduced constraint matrix F, the following loss function can be added in the CRF submodel; in the training phase, the constraint matrix F is learned by minimizing this loss function:

Loss = -S(y) + log Σ_{ŷ ∈ Y_w} exp(S(ŷ))

wherein y is the correct label sequence and Y_w is the set of all possible label sequences.
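Assuming the loss is the standard CRF negative log-likelihood described above, it can be computed exactly at toy scale by enumerating all label sequences (the sizes, α, and the "correct" sequence are random stand-ins):

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(3)
n_tok, k, alpha = 3, 2, 0.5          # toy sizes and adjustment factor
E = rng.normal(size=(n_tok, k))      # emission scores (state matrix, toy)
F = rng.normal(size=(k, k))          # constraint/transition matrix (toy)

def total_score(y):
    return (sum(alpha * E[i, y[i]] for i in range(n_tok))
            + sum(F[y[j], y[j + 1]] for j in range(n_tok - 1)))

y_true = (0, 1, 0)                   # hypothetical correct label sequence

# Loss = -S(y_true) + log sum over all sequences of exp(S(y)).
all_scores = [total_score(y) for y in itertools.product(range(k), repeat=n_tok)]
log_z = math.log(sum(math.exp(s) for s in all_scores))
loss = -total_score(y_true) + log_z
print(loss)
```

Since the partition term log_z is at least S(y_true), this loss is always non-negative and reaches zero only when all probability mass sits on the correct sequence; in training, gradients through the F terms are what shape the learned constraint matrix.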
The method realizes joint extraction of the entities and attributes in power standard knowledge through the designed model, which can ensure both the reliability and the efficiency of knowledge extraction. Because the Bert + BGRU + CRF model construction is adopted, word segmentation can be performed first and the Bert model then used for processing, realizing joint extraction of entities and attributes, improving the accuracy of entity and attribute extraction, and reducing the difficulty of model design. The introduced constraint matrix F is used to constrain the state matrix E, which can prevent illegal label sequences from being output. Moreover, the adjustment factor α introduced when calculating the total score of each label sequence ŷ gives the method stronger applicability in joint entity and attribute extraction, ensures the accuracy of entity and attribute extraction, and overcomes the problem caused by the difference in the constraint matrix F required for entity extraction versus attribute extraction (if attributes and entities adopt a constraint matrix of the same standard, entity extraction precision may be high while attribute extraction precision is low, or entity extraction precision low while attribute extraction precision is high).
In order to realize knowledge extraction (extraction of attributes) from image information, each piece of image information can be input into an externally called formula recognition sub-tool to obtain converted text information. The converted text information can then be processed to obtain at least one formula text.
It should be noted that the converted text information can be recognized to determine whether the target symbol "=" is present. If the target symbol "=" does not exist, the converted text information is determined to be a single formula text; if the target symbol "=" exists, the converted text information is split at each target symbol "=" to obtain a plurality of formula texts (for example, with 4 target symbols "=", the text splits into 5 formula texts).
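The splitting rule above can be sketched in a few lines; the function name is illustrative, not from the source:

```python
def split_formula_text(converted: str) -> list[str]:
    """Split converted formula text on the target symbol '='.
    With no '=' the whole string is one formula text; with s
    occurrences of '=' it splits into s + 1 formula texts."""
    if "=" not in converted:
        return [converted]
    return converted.split("=")

parts = split_formula_text("I=U/R")   # two formula texts: "I" and "U/R"
```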
The attributes involved in a formula can be split into an attribute identification part (for example, the symbolic representation of the attribute) and an attribute definition part (for example, the value definition of the attribute, the parameter value range definition, and the like); an intermediate derivation process containing the attribute may even be present.
For each formula text, the formula texts can be input together into the WordBert submodel to obtain a corresponding vector sequence. Here, the formula texts are those obtained by splitting the text information converted from the same image data.
Specifically, the formula text combination $v = ([CLS], v_1, v_2, \ldots, v_m, [SEP])$ is input into the WordBert submodel to obtain the vector sequence $l = (l_0, l_1, l_2, \ldots, l_m, l_{m+1})$ corresponding to the formula text combination $v$, $l_i \in R^{m \times L}$, where $i \in [0, m+1]$; the vector sequence $l$ is the hidden state corresponding to the formula text combination $v$ in the last layer of the WordBert submodel, $[CLS]$ is the start token, $[SEP]$ is the end token, and $L$ is the hidden-state dimension of the WordBert submodel (e.g., 100 or 200, consistent with the hidden-state dimension of the Bert submodel).
The obtained vector sequence can then be input into the BGRU submodel, which outputs a state matrix revealing the tag scores corresponding to each formula text in the converted text information.
Specifically, each formula vector $l_i$ in the vector sequence $l = (l_0, l_1, l_2, \ldots, l_m, l_{m+1})$ is taken as the input of one time step of the BGRU submodel. From the hidden state sequence $\overrightarrow{h}$ output by the forward GRU and the hidden state sequence $\overleftarrow{h}$ output by the backward GRU, the hidden state sequence $h_{m+1}$ corresponding to the vector sequence $l$ is calculated, $h_{m+1} \in R^{m \times H}$, where $H$ is the hidden-state dimension of the BGRU submodel. The hidden state sequence $h_{m+1}$ is then mapped from $H$ dimensions to $k$ dimensions, where $k$ is the number of tags, and the tag score of each formula under the $k$ tags is calculated to obtain the state matrix $E = (e_0, e_1, e_2, \ldots, e_m, e_{m+1})$, where each $e_i \in R^k$ is a column vector. This process is similar to the operation of the BGRU submodel described above and is not repeated here.
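The concatenate-then-project step can be sketched with NumPy. Random arrays stand in for the trained GRU outputs and the learned projection weights; note the projection here maps a concatenated $2H$-dimensional state to $k$ scores, an implementation detail the description leaves open:

```python
import numpy as np

rng = np.random.default_rng(0)
m, H, k = 4, 6, 5   # formula count, per-direction hidden size, tag count

# Stand-ins for the forward and backward GRU hidden states (one H-dim
# vector per time step); a real model computes these with trained GRUs.
h_fwd = rng.normal(size=(m, H))
h_bwd = rng.normal(size=(m, H))

# Concatenate the two directions (2H features per step), then map to
# k tag scores with a linear layer (random weights stand in for
# learned parameters) to obtain the state matrix E.
h = np.concatenate([h_fwd, h_bwd], axis=1)       # shape (m, 2H)
W = rng.normal(size=(2 * H, k))
b = np.zeros(k)
E = h @ W + b                                    # state matrix, shape (m, k)
```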
After the state matrix $E = (e_0, e_1, e_2, \ldots, e_m, e_{m+1})$ is obtained, it can be input into the CRF submodel to compute the optimal tag sequence.
Specifically, the state matrix $E = (e_0, e_1, e_2, \ldots, e_m, e_{m+1})$ is input into the CRF submodel; based on the input state matrix $E$, the total score of each tag sequence $\tilde{y}$ is calculated:

$$s(\tilde{y}) = \sum_{i=0}^{m+1} E_{i,\tilde{y}_i} \qquad (4)$$

where $s(\tilde{y})$ represents the total score of the tag sequence $\tilde{y}$, and $E_{i,j}$ represents the probability that the $i$-th component in the state matrix $E$ is classified into the $j$-th tag.
Based on the total score $s(\tilde{y})$ of each tag sequence, the optimal tag sequence $y^*$ is calculated:

$$y^* = \arg\max_{\tilde{y} \in Y} s(\tilde{y})$$

where $Y$ is the set of all possible tag sequences.
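Because this score has no transition term, the argmax factorizes over positions, so decoding reduces to a per-row argmax of the state matrix. A toy sketch (illustrative values only):

```python
# With no transition term, the total score of a tag sequence is a sum
# of independent per-position emission scores, so the optimal tag
# sequence is just the per-position argmax over the state matrix E.
E = [[0.2, 0.7, 0.1],
     [0.6, 0.3, 0.1],
     [0.1, 0.1, 0.8]]
best = [max(range(len(row)), key=row.__getitem__) for row in E]
```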
It should be noted that, in this embodiment, the state matrix $E = (e_0, e_1, \ldots, e_{m+1})$ determined from the vector sequence $l = (l_0, l_1, \ldots, l_{m+1})$ output by the WordBert submodel and the state matrix $E = (e_0, e_1, \ldots, e_{n+1})$ determined from the vector sequence $l = (l_0, l_1, \ldots, l_{n+1})$ output by the Bert submodel use different ways of computing the total score of a tag sequence $\tilde{y}$, because a differentiated calculation gives better results for the two kinds of state matrix. Of course, the state matrix determined from the WordBert vector sequence could also use equation (1) to compute the total score, since equation (1) likewise introduces the adjustment factor $\alpha$ to account for the difference between entities and attributes (in particular, formula-type text information). Relatively speaking, when equation (1) is used only for attribute extraction its effect is slightly inferior to that of equation (4); but when entities and attributes are processed together without differentiation, computing the total score of the tag sequence $\tilde{y}$ with equation (1) performs much better.
Thereby, the extraction of the attribute based on the formula image can be realized.
In this mode, the designed model realizes extraction of the relevant knowledge (all of which belongs to attributes) contained in formula images within power standard knowledge, effectively solving the difficulty of extracting such knowledge (formula images carry data on numerical limits, calculation methods, and other relevant information, for which the prior art cannot achieve effective knowledge extraction), while ensuring both the extraction of relevant knowledge from formula images and the reliability of that extraction. In addition, no word segmentation operation is needed: the WordBert submodel is trained not on segmented text but on whole sentences (in particular formulas, characters, operators, and the like), which greatly improves the accuracy of formula-type attribute extraction. Because the designed WordBert submodel is applied to formula text without word segmentation, processing is reduced, information is effectively retained, and the problem of erroneous formula information extraction caused by word segmentation in the traditional Bert model is avoided.
Furthermore, after the entities and attributes have been extracted, the vector sequences of the extracted entities and attributes can be processed and then input into the relation extraction submodel to extract the relationships between entities.

Because the relation extraction reuses the vector sequence produced by the Bert submodel, the relationships can be extracted after only the corresponding processing, which effectively reduces the workload of knowledge extraction (no repeated entity extraction is required); and since the entities are already determined, the relation extraction achieves twice the result with half the effort.
Specifically, based on the extracted entities, the corresponding vectors in the vector sequence $l = (l_0, l_1, l_2, \ldots, l_n, l_{n+1})$ of the segmented text $w$ are marked.

For example, if in the vector sequence $l = (l_0, l_1, l_2, \ldots, l_n, l_{n+1})$ the participles corresponding to $l_1$, $l_3$, and $l_5$ are extracted as entities, the corresponding vectors can be marked to obtain the marked vector sequence $l' = (l_0, l'_1, l_2, l'_3, l_4, l'_5, \ldots, l_n, l_{n+1})$.
The marked vector sequence l' may then be input into the relation extraction submodel. The relationship extraction submodel here belongs to a relationship extraction model based on Bert.
For the marked vectors in the vector sequence $l'$, all marker vectors are grouped pairwise (binary mutual grouping), so that each marker vector forms a paired combination with every other marker vector; each such pairing is a marker vector pair.
Following the previous example, for the marked vector sequence $l' = (l_0, l'_1, l_2, l'_3, l_4, l'_5, \ldots, l_n, l_{n+1})$, all marker vectors $(l'_1, l'_3, l'_5)$ are grouped pairwise to obtain three marker vector pairs: $(l'_1, l'_3)$, $(l'_1, l'_5)$, and $(l'_3, l'_5)$.
For each marker vector pair in a combination relationship, the two marker vectors of the pair are spliced to obtain a combined vector. The splicing here can be: joining the two marker vectors of the pair end to end. For example, the marker vector pair $(l'_1, l'_3)$ is spliced into the combined vector $l_{c1}$, the pair $(l'_1, l'_5)$ into $l_{c2}$, and the pair $(l'_3, l'_5)$ into $l_{c3}$.
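The pairwise grouping and end-to-end splicing can be sketched with `itertools.combinations`; the 4-dimensional toy vectors and the `l1`/`l3`/`l5` keys are illustrative stand-ins for the marked hidden-state vectors:

```python
import itertools

# Toy marker vectors for the entity positions l'1, l'3, l'5.
marked = {"l1": [1.0, 0.0, 0.0, 0.0],
          "l3": [0.0, 1.0, 0.0, 0.0],
          "l5": [0.0, 0.0, 1.0, 0.0]}

# Binary mutual grouping: every unordered pair of marker vectors, each
# pair spliced end to end into one combined vector.
pairs = list(itertools.combinations(sorted(marked), 2))
combined = {p: marked[p[0]] + marked[p[1]] for p in pairs}
```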
Then, the score of each combined vector under each relation category can be calculated to obtain a corresponding score vector $p_x \in R^q$, where $q$ is the number of relation categories.
The optimal score can then be determined from each score vector $p_x$; the relation category corresponding to the optimal score represents the relation category for that combined vector. The optimal scores are sorted, and the optimal score ranked last is removed.

For each remaining optimal score, the relationship between the entities corresponding to the combined vector is determined to be of the corresponding relation category, realizing the extraction of inter-entity relationships. In this way, the pairwise grouping of multiple entities extracts the relationships between entities quickly, efficiently, and accurately, while taking every entity pair into account.
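The best-score selection and tail-pruning step can be sketched as follows. The per-category scores are toy numbers standing in for a learned classifier's output, and treating the last-ranked pair as "no relation" follows the pruning rule described above:

```python
# Toy per-category scores for each combined vector (q = 4 relation
# categories); in the model these would come from a learned classifier.
scores = {("l1", "l3"): [0.1, 2.0, 0.3, 0.2],
          ("l1", "l5"): [0.5, 0.1, 1.2, 0.4],
          ("l3", "l5"): [0.2, 0.1, 0.1, 0.3]}

# Keep the best category and score per pair, sort by best score, and
# drop the last-ranked pair (treated as having no real relation).
best = {p: max(enumerate(s), key=lambda c: c[1]) for p, s in scores.items()}
kept = sorted(best, key=lambda p: best[p][1], reverse=True)[:-1]
relations = {p: best[p][0] for p in kept}   # pair -> relation category index
```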
For the correspondence between attributes and entities, the entity and its attributes can be associated during the joint extraction; alternatively, after the entities and attributes are determined, the attributes can be attributed to entities; the attribution relationship between entities and attributes can also be extracted from web pages by means of a wrapper (for example, by inputting a URL, crawling the page with a tool, and using the wrapper to extract the attributes of an entity provided by the page and attribute them to that entity).
It should be noted that, for the data sources of power standard knowledge, for each document (especially documents whose content belongs to normative files, such as the overvoltage protection design specification of industrial and civil power devices, the grounding design specification of industrial and civil power devices, the lightning protection design specification of buildings, the design specification of power devices in explosion and fire hazard places, etc.), the title can be extracted separately as a basic entity object, and key attributes such as formulation time, application scenario, and publishing unit can be extracted, serving as important factors in applications of the power standard knowledge graph such as subsequent intelligent question answering and personalized recommendation.
S300: and performing knowledge fusion based on the extracted knowledge, and storing the knowledge after the knowledge fusion by adopting a Neo4j graph database so as to construct the power standard knowledge graph.
It should be noted that there are many ways of knowledge fusion, mainly requiring entity alignment and entity disambiguation. For example, the Jaccard algorithm based on string similarity can be employed to achieve entity alignment and entity disambiguation.
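A character n-gram Jaccard similarity of the kind mentioned can be sketched as below; the bigram size and any alignment threshold are illustrative choices, not values fixed by the source:

```python
def jaccard(a: str, b: str, n: int = 2) -> float:
    """Character n-gram Jaccard similarity; two entity mentions can be
    treated as the same node when the score exceeds a chosen threshold
    (the threshold and n-gram size here are illustrative choices)."""
    def grams(s):
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0
```

For example, `jaccard("transformer", "transformers")` is high because the two surface forms share nearly all bigrams, so they would typically be aligned to one entity.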
Further, a strategy of extracting and storing simultaneously can be adopted: the knowledge extraction results are temporarily stored in memory as JSON-format data, and then submitted to the Neo4j graph database through Python's py2neo library to realize persistent storage.
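This extract-then-persist flow might look roughly as follows. The triple content is invented for illustration, and the py2neo handoff (which assumes a reachable local Neo4j instance with these credentials) is left as comments:

```python
import json

# Stage extraction results in memory as JSON before persisting.
triples = [{"head": "GB 50057", "relation": "applies_to",
            "tail": "building lightning protection"}]
staged = json.dumps(triples, ensure_ascii=False)

# Persisting with py2neo would then look roughly like this
# (assumes a local Neo4j instance; shown as comments only):
#   from py2neo import Graph, Node, Relationship
#   graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
#   for t in json.loads(staged):
#       h = Node("Entity", name=t["head"])
#       x = Node("Entity", name=t["tail"])
#       graph.merge(Relationship(h, t["relation"], x), "Entity", "name")
```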
In this way, the construction of the power standard knowledge graph can be realized.
Further, the present embodiment also provides an electric power standard knowledge question-answering system, which includes:
the data layer comprises a pre-constructed power standard knowledge graph and a word segmentation dictionary constructed based on entities and attributes in the power standard knowledge graph;
the Web layer is used for receiving question information of a user and generating and displaying answer information based on a query result of the query layer, wherein the question information is in natural language form;
and the query layer is used for converting the question information into Cypher query sentences, sending the Cypher query sentences to the Neo4j graph database for query and acquiring query results.
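A minimal sketch of the query layer's question-to-Cypher conversion, assuming the entity and attribute have already been picked out of the question with the word segmentation dictionary (the label `Entity`, the property name, and the example values are all hypothetical):

```python
def to_cypher(entity: str, attribute: str) -> str:
    """Template-based conversion of a parsed question into a Cypher
    query; extracting the entity/attribute from the question (via the
    word segmentation dictionary) is assumed to have happened already."""
    return (f"MATCH (e:Entity {{name: '{entity}'}}) "
            f"RETURN e.{attribute} AS answer")

query = to_cypher("GB 50057", "publish_unit")
```

The resulting string would be sent to the Neo4j graph database, and the query result used by the Web layer to build the answer.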
The embodiment also provides a computer device, applicable to the power standard knowledge graph construction method, including:

a memory and a processor; the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions to realize the power standard knowledge graph construction method provided by the above embodiment.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and an input means connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for communicating with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
The present embodiment also provides a storage medium on which a computer program is stored, which, when executed by a processor, implements the power standard knowledge graph construction method set forth in the above embodiments.
The storage medium proposed by the present embodiment belongs to the same inventive concept as the power standard knowledge graph construction method proposed by the above embodiments; technical details not described in detail here can be found in the above embodiments, and the present embodiment has the same beneficial effects.
Example 2
Referring to fig. 1 to 3, a second embodiment of the present invention provides a method for constructing a power standard knowledge graph, and scientific demonstration is performed through experiments in order to verify the beneficial effects of the present invention.
In this embodiment, 984 labeled basic data items are used to construct a data set, which is divided in a 7:2:1 ratio into a training set (689 items), a verification set (197 items), and a test set (98 items) for training, verifying, and testing the model; precision rate, recall rate, and F1 value are taken as evaluation indexes to verify the effect of the model:
(1) The precision rate $P$ represents the accuracy of model prediction, and the calculation formula is:

$$P = \frac{|M \cap T|}{|M|}$$

where $M$ represents the sample set the model predicts to be positive and $T$ represents the sample set that is truly positive.
(2) The recall ratio $R$ represents the comprehensiveness of model prediction, and the calculation formula is:

$$R = \frac{|M \cap T|}{|T|}$$
(3) The F1 value combines the precision $P$ and the recall $R$, and the calculation formula is:

$$F1 = \frac{2PR}{P + R}$$
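A minimal sketch of these three metrics, computed from a predicted-positive set `M` and a truly-positive set `T` (toy sets only):

```python
def evaluate(predicted: set, truth: set):
    """Precision, recall and F1 from the predicted-positive set M and
    the truly-positive set T, as defined above."""
    tp = len(predicted & truth)
    p = tp / len(predicted)
    r = tp / len(truth)
    return p, r, 2 * p * r / (p + r)

# Consistency check on the reported figures: P≈0.84 and R≈0.90
# combine to F1≈0.87.
f1 = 2 * 0.84 * 0.90 / (0.84 + 0.90)
```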
based on the effect verification on the model, the obtained relevant evaluation data is as follows: the precision rate P is approximately equal to 0.84, the recall rate R is approximately equal to 0.90, and F1 is approximately equal to 0.87. Therefore, the model is well represented, and the effect of extracting the power standard knowledge is good.
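As a side note, the 7:2:1 division of the 984 samples can be reproduced with proportional slicing; the shuffle seed is an assumption, since the embodiment does not specify how the data were ordered:

```python
import random

data = list(range(984))            # stand-in for the 984 labelled samples
random.Random(42).shuffle(data)    # seed chosen arbitrarily for the sketch

# 7:2:1 split with rounded boundaries, matching the 689/197/98
# partition reported in the embodiment.
n = len(data)
train = data[: round(n * 0.7)]
val = data[round(n * 0.7): round(n * 0.9)]
test = data[round(n * 0.9):]
```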
And based on the constructed power standard knowledge graph, a power standard knowledge question-answering system can be further constructed.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (10)

1. A power standard knowledge graph construction method, characterized by comprising the following steps:
constructing an ontology structure of the power standard knowledge graph through the acquired power standard data, wherein the ontology structure comprises entities, attributes and relationships among the entities;
acquiring basic data containing power standard knowledge, extracting the basic data to extract entities, attributes and relationships among the entities;
and performing knowledge fusion based on the extracted knowledge, storing the fused knowledge, and constructing a power standard knowledge map.
2. The power standard knowledge-graph construction method of claim 1, wherein: the acquiring of basic data containing power standard knowledge and the knowledge extraction of the basic data comprise,
preprocessing the basic data to obtain a plurality of text messages or obtain a plurality of text messages and at least one image message;
for each text message, segmenting the text message and inputting the segmented text message into a Bert submodel to obtain a corresponding vector sequence, inputting the vector sequence into a BGRU submodel, outputting a state matrix for revealing scores of labels corresponding to words in the text message, inputting the state matrix into a CRF submodel, calculating an optimal label sequence, and realizing extraction of an entity and extraction of attributes;
inputting the image information into a formula identification submodel called from the outside aiming at each image information to obtain converted text information, processing the converted text information to obtain at least one formula text, inputting each formula text into a WordBert submodel together to obtain a corresponding vector sequence, then inputting the vector sequence into a BGRU submodel, outputting a state matrix for revealing each label score corresponding to each formula text in the converted text information, inputting the state matrix into a CRF submodel, calculating an optimal label sequence, and realizing the extraction of attributes;
and processing the vector sequences of the extracted entities and attributes and then inputting the processed vector sequences into the relation extraction submodel to realize the extraction of the relation between the entities.
3. The power standard knowledge graph construction method according to claim 2, wherein: the knowledge extraction method for each text message includes,
segmenting the text information to obtain a segmented text w with the length of n;
inputting the segmented text $w = ([CLS], w_1, w_2, \ldots, w_n, [SEP])$ into the Bert submodel to obtain the vector sequence $l = (l_0, l_1, l_2, \ldots, l_n, l_{n+1})$ corresponding to the segmented text $w$, $l_i \in R^{n \times L}$, wherein $i \in [0, n+1]$, the vector sequence $l$ is the hidden state corresponding to the segmented text $w$ in the last layer of the Bert submodel, $[CLS]$ is the start token, $[SEP]$ is the end token, and $L$ is the hidden-state dimension of the Bert submodel;
taking each word vector $l_i$ in the vector sequence $l = (l_0, l_1, l_2, \ldots, l_n, l_{n+1})$ as the input of each time step in the BGRU submodel;

calculating, from the hidden state sequence $\overrightarrow{h}$ output by the forward GRU in the BGRU submodel and the hidden state sequence $\overleftarrow{h}$ output by the backward GRU, the hidden state sequence $h_{n+1}$ corresponding to the vector sequence $l$, $h_{n+1} \in R^{n \times H}$, wherein $H$ is the hidden-state dimension of the BGRU submodel;

mapping the hidden state sequence $h_{n+1}$ from $H$ dimensions to $k$ dimensions, wherein $k$ is the number of tags;

calculating the tag score of each participle under the $k$ tags to obtain the state matrix $E = (e_0, e_1, e_2, \ldots, e_n, e_{n+1})$, wherein each $e_i \in R^k$ is a column vector;
and inputting the state matrix into a CRF submodel, and calculating an optimal label sequence.
4. The power standard knowledge graph construction method of claim 3, wherein: inputting the state matrix into a CRF submodel, calculating an optimal label sequence comprises,
inputting the state matrix $E = (e_0, e_1, e_2, \ldots, e_n, e_{n+1})$ into the CRF submodel;

calculating, based on the constraint matrix $F$ introduced in the CRF submodel and the input state matrix $E$, the total score of each tag sequence $\tilde{y}$:

$$s(\tilde{y}) = \sum_{i=0}^{n+1} E_{i,\tilde{y}_i} + \alpha \sum_{i=0}^{n} F_{\tilde{y}_i,\tilde{y}_{i+1}} \qquad (1)$$

wherein $F \in R^{(k+2) \times (k+2)}$, $s(\tilde{y})$ represents the total score of the tag sequence $\tilde{y}$, $\alpha$ is an adjustment factor, $E_{i,j}$ represents the probability that the $i$-th participle is classified into the $j$-th tag in the state matrix $E$, and $F_{j,j+1}$ represents the probability of the $j$-th tag in the tag sequence $\tilde{y}$ transitioning to the $(j+1)$-th tag;

calculating, based on the total score $s(\tilde{y})$ of each tag sequence, the optimal tag sequence $y^*$:

$$y^* = \arg\max_{\tilde{y} \in Y} s(\tilde{y})$$

wherein $Y$ is the set of all possible tag sequences.
5. The power standard knowledge graph construction method of claim 4, wherein: the knowledge extraction method for each image information includes,
identifying the converted text information, and determining whether a target symbol "=" exists;
if the target symbol "=" does not exist, determining the converted text information as a formula text;
if the target symbol "=" exists, splitting the converted text information by using the target symbol "=" to obtain a plurality of formula texts;
inputting the formula text combination $v = ([CLS], v_1, v_2, \ldots, v_m, [SEP])$ into the WordBert submodel to obtain the vector sequence $l = (l_0, l_1, l_2, \ldots, l_m, l_{m+1})$ corresponding to the formula text combination $v$, $l_i \in R^{m \times L}$, wherein $i \in [0, m+1]$, the vector sequence $l$ is the hidden state corresponding to the formula text combination $v$ in the last layer of the WordBert submodel, $[CLS]$ is the start token, $[SEP]$ is the end token, and $L$ is the hidden-state dimension of the WordBert submodel;

taking each formula vector $l_i$ in the vector sequence $l = (l_0, l_1, l_2, \ldots, l_m, l_{m+1})$ as the input of each time step in the BGRU submodel; calculating, from the hidden state sequence $\overrightarrow{h}$ output by the forward GRU in the BGRU submodel and the hidden state sequence $\overleftarrow{h}$ output by the backward GRU, the hidden state sequence $h_{m+1}$ corresponding to the vector sequence $l$, $h_{m+1} \in R^{m \times H}$, wherein $H$ is the hidden-state dimension of the BGRU submodel; mapping the hidden state sequence $h_{m+1}$ from $H$ dimensions to $k$ dimensions, wherein $k$ is the number of tags; calculating the tag score of each formula under the $k$ tags to obtain the state matrix $E = (e_0, e_1, e_2, \ldots, e_m, e_{m+1})$, wherein each $e_i \in R^k$ is a column vector;
and inputting the state matrix into a CRF submodel, and calculating an optimal label sequence.
6. The power standard knowledge graph construction method of claim 5, wherein: inputting the state matrix into a CRF submodel, calculating an optimal label sequence comprises,
inputting the state matrix $E = (e_0, e_1, e_2, \ldots, e_m, e_{m+1})$ into the CRF submodel;

calculating, based on the input state matrix $E$, the total score of each tag sequence $\tilde{y}$:

$$s(\tilde{y}) = \sum_{i=0}^{m+1} E_{i,\tilde{y}_i} \qquad (4)$$

wherein $s(\tilde{y})$ represents the total score of the tag sequence $\tilde{y}$, and $E_{i,j}$ represents the probability that the $i$-th component in the state matrix $E$ is classified into the $j$-th tag;

calculating, based on the total score $s(\tilde{y})$ of each tag sequence, the optimal tag sequence $y^*$:

$$y^* = \arg\max_{\tilde{y} \in Y} s(\tilde{y})$$

wherein $Y$ is the set of all possible tag sequences.
7. The power standard knowledge graph construction method of claim 6, wherein: the vector sequence of the extracted entities and attributes is processed and then input into a relation extraction submodel to realize the extraction of the relation between the entities,
marking, based on the extracted entities, the corresponding vectors in the vector sequence $l = (l_0, l_1, l_2, \ldots, l_n, l_{n+1})$ corresponding to the segmented text $w$;
inputting the marked vector sequence l' into a relation extraction submodel;
performing binary mutual grouping on all the marker vectors aiming at the marker vectors with the markers in the vector sequence l' so as to enable each marker vector and other marker vectors to have a paired combination relationship;
splicing two mark vectors of the mark vector pair to obtain a combined vector aiming at each mark vector pair with a combination relation;
calculating the score of each combination vector under each relation category;
and respectively obtaining the optimal scores corresponding to each combined vector, sorting, eliminating the last optimal score in the sorting, and determining the relationship between the entities with the corresponding relationship categories between the entities corresponding to the combined vectors aiming at each residual optimal score to realize the extraction of the relationship between the entities.
8. An electric power standard knowledge question-answering system based on the electric power standard knowledge graph construction method of any one of claims 1 to 7, characterized by comprising:
the data layer comprises a pre-constructed power standard knowledge graph and a word segmentation dictionary constructed based on entities and attributes in the power standard knowledge graph;
the Web layer is used for receiving question information of a user and generating and displaying answer information based on a query result of the query layer, wherein the question information is in a natural language form;
and the query layer is used for converting the question information into Cypher query sentences, sending the Cypher query sentences to the Neo4j graph database for query and acquiring query results.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that: the processor, when executing the computer program, performs the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202211320954.1A 2022-10-26 2022-10-26 Electric power standard knowledge graph construction method, knowledge question answering system and device Pending CN115934955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211320954.1A CN115934955A (en) 2022-10-26 2022-10-26 Electric power standard knowledge graph construction method, knowledge question answering system and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211320954.1A CN115934955A (en) 2022-10-26 2022-10-26 Electric power standard knowledge graph construction method, knowledge question answering system and device

Publications (1)

Publication Number Publication Date
CN115934955A true CN115934955A (en) 2023-04-07

Family

ID=86654939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211320954.1A Pending CN115934955A (en) 2022-10-26 2022-10-26 Electric power standard knowledge graph construction method, knowledge question answering system and device

Country Status (1)

Country Link
CN (1) CN115934955A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186232A (en) * 2023-04-26 2023-05-30 中国电子技术标准化研究院 Standard knowledge intelligent question-answering implementation method, device, equipment and medium
CN117493645A (en) * 2023-12-29 2024-02-02 同略科技有限公司 Big data-based electronic archive recommendation system
CN117493645B (en) * 2023-12-29 2024-04-12 同略科技有限公司 Big data-based electronic archive recommendation system

Similar Documents

Publication Title
CN111221939B (en) Scoring method and device and electronic equipment
CN108182177A (en) A kind of mathematics knowledge-ID automation mask method and device
CN115934955A (en) Electric power standard knowledge graph construction method, knowledge question answering system and device
CN110929038A (en) Entity linking method, device, equipment and storage medium based on knowledge graph
CN110825875A (en) Text entity type identification method and device, electronic equipment and storage medium
JP7295189B2 (en) Document content extraction method, device, electronic device and storage medium
CN110825867B (en) Similar text recommendation method and device, electronic equipment and storage medium
CN112541122A (en) Recommendation model training method and device, electronic equipment and storage medium
CN116402063B (en) Multi-modal irony recognition method, apparatus, device and storage medium
CN111563384A (en) Evaluation object identification method and device for E-commerce products and storage medium
CN114722069A (en) Language conversion method and device, electronic equipment and storage medium
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN114238571A (en) Model training method, knowledge classification method, device, equipment and medium
CN114595327A (en) Data enhancement method and device, electronic equipment and storage medium
CN115577095A (en) Graph theory-based power standard information recommendation method
CN111126610A (en) Topic analysis method, topic analysis device, electronic device and storage medium
CN110399547B (en) Method, apparatus, device and storage medium for updating model parameters
CN113987125A (en) Text structured information extraction method based on neural network and related equipment thereof
Lubis et al. Topic discovery of online course reviews using LDA with leveraging reviews helpfulness
Jian et al. An end-to-end algorithm for solving circuit problems
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
CN113822040A (en) Subjective question marking and scoring method and device, computer equipment and storage medium
CN113010657B (en) Answer processing method and answer recommendation method based on answer text
CN112085091B (en) Short text matching method, device, equipment and storage medium based on artificial intelligence
CN112559711A (en) Synonymous text prompting method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination