CN115455169A - Knowledge graph question-answering method and system based on vocabulary knowledge and semantic dependence - Google Patents


Info

Publication number
CN115455169A
CN115455169A (application CN202211342154.XA; granted as CN115455169B)
Authority
CN
China
Prior art keywords
feature vector
query entity
entity set
question text
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211342154.XA
Other languages
Chinese (zh)
Other versions
CN115455169B (en)
Inventor
嵇望
安毫亿
陈默
张羽
梁青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yuanchuan Xinye Technology Co ltd
Original Assignee
Hangzhou Yuanchuan Xinye Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yuanchuan Xinye Technology Co ltd filed Critical Hangzhou Yuanchuan Xinye Technology Co ltd
Priority to CN202211342154.XA priority Critical patent/CN115455169B/en
Publication of CN115455169A publication Critical patent/CN115455169A/en
Application granted granted Critical
Publication of CN115455169B publication Critical patent/CN115455169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/367 Ontology
    • G06F40/126 Character encoding
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/35 Discourse or dialogue representation
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a knowledge graph question-answering method and system based on lexical knowledge and semantic dependency. The method comprises the following steps: performing word segmentation and encoding on a target question text to obtain a question text feature vector; performing named entity recognition, keyword extraction, and encoding on the target question text to obtain a query entity set feature vector for the query entity set; performing syntactic parsing and encoding on the target question text to obtain a syntactic dependency feature vector for the query entity set; extracting a subgraph of the candidate answer set from the knowledge graph based on the query entity set and encoding it to obtain a subgraph feature vector; and obtaining the answer to the target question text from the knowledge graph according to the four feature vectors. The method and system address the low accuracy of existing knowledge graph question answering based on question query entity sets: by combining entity lexical knowledge and syntactic information in the question, multiple feature vectors are fused, and the fused features improve the accuracy of knowledge graph question answering.

Description

Knowledge graph question-answering method and system based on vocabulary knowledge and semantic dependence
Technical Field
The application relates to the technical field of natural language processing, in particular to a knowledge graph question-answering method and a knowledge graph question-answering system based on vocabulary knowledge and semantic dependency.
Background
With the advent of the big data era, helping users quickly find the information they need in massive amounts of data has become important, and Knowledge Graphs (KGs) store such massive information in structured triple form. Knowledge graph question answering (KBQA) exploits the rich semantic association information of a knowledge graph to deeply understand a user's question and return an answer, and has attracted wide attention from academia and industry in recent years. Knowledge-graph-based question answering is widely applied in fields such as healthcare, finance, and government affairs. Users are no longer satisfied with single-hop questions about entity attributes and increasingly express complex multi-hop question-answering needs.
Most existing knowledge graph question-answering systems build the query entity set from named entities alone; key phrases in the question are not fully mined and used to expand the entity set, which easily causes entity loss and lowers the accuracy of obtaining correct answers through the query entity set.
At present, no effective solution has been proposed in the related art for the low accuracy of existing knowledge graph question answering based on question query entity sets.
Disclosure of Invention
The embodiments of the present application provide a knowledge graph question-answering method and system based on lexical knowledge and semantic dependency, aiming at least to solve the problem in the related art that existing knowledge graph question answering based on question query entity sets has low accuracy.
In a first aspect, an embodiment of the present application provides a knowledge graph question and answer method based on lexical knowledge and semantic dependency, where the method includes:
performing word segmentation and coding on a target question text to obtain a question text feature vector of the target question text;
conducting named entity recognition and keyword extraction on the target question text to obtain a query entity set in the target question text, and calculating to obtain a query entity set characteristic vector based on the query entity set;
a syntactic analysis tool is adopted to carry out syntactic analysis on the target question text to obtain a syntactic dependency relationship of the query entity set, and the syntactic dependency relationship is encoded to obtain a syntactic dependency feature vector of the query entity set;
extracting a subgraph of a candidate answer set from a knowledge graph based on the query entity set, and coding the subgraph of the candidate answer set to obtain a subgraph feature vector of the candidate answer set;
performing attention-based feature fusion on the subgraph feature vector with the question text feature vector, the syntactic dependency feature vector, and the query entity set feature vector, respectively, to obtain a final feature vector;
and obtaining an answer of the target question text from the knowledge graph based on the final feature vector.
In some embodiments, calculating the query entity set feature vector based on the query entity set includes:
coding the query entity set to obtain an embedded feature vector of the query entity set;
acquiring the basic unit category of the query entity set from a preset common sense knowledge base, and coding the basic unit category to obtain a category feature vector of the query entity set;
and performing feature fusion based on an attention mechanism on the embedded feature vector and the category feature vector to obtain a query entity set feature vector.
In some embodiments, performing attention-based feature fusion on the sub-graph feature vector, the question text feature vector, the syntax dependency feature vector, and the query entity set feature vector, respectively, to obtain a final feature vector includes:
performing attention-based feature fusion on the subgraph feature vector and the question text feature vector to obtain a first fusion feature vector;
performing feature fusion based on an attention mechanism on the sub-graph feature vector and the syntactic dependency feature vector to obtain a second fusion feature vector;
performing attention-based feature fusion on the subgraph feature vector and the query entity set feature vector to obtain a third fusion feature vector;
and fusing the first fusion characteristic vector, the second fusion characteristic vector and the third fusion characteristic vector based on the weighted average to obtain a final characteristic vector.
In some embodiments, encoding the syntactic dependency, obtaining a syntactic dependency feature vector for the query entity set includes:
and encoding the syntactic dependency relationships through a GCN (Graph Convolutional Network) to obtain the syntactic dependency feature vector of the query entity set.
In some embodiments, performing word segmentation and coding on a target question text to obtain a question text feature vector of the target question text includes:
and performing word segmentation and coding on the target question text through a BiGRU network to obtain a question text feature vector of the target question text.
In some embodiments, encoding a sub-graph of the candidate answer set to obtain a sub-graph feature vector of the candidate answer set includes:
and coding the subgraph of the candidate answer set through an R-GCN relation graph convolutional neural network to obtain a subgraph feature vector of the candidate answer set.
In some embodiments, encoding the set of query entities and obtaining the embedded feature vectors of the set of query entities comprises:
and coding the query entity set through a TransE vectorization tool to obtain the embedded characteristic vector of the query entity set.
In some embodiments, obtaining the basic unit category of the query entity set from a preset common sense knowledge base, and encoding the basic unit category to obtain the category feature vector of the query entity set includes:
acquiring the basic unit categories of the query entity set from the HowNet common sense knowledge base;
and encoding the basic unit categories through PCA (principal component analysis) and one-hot encoding to obtain the category feature vector of the query entity set.
In some embodiments, performing attention-based feature fusion on the embedded feature vector and the category feature vector to obtain a query entity set feature vector comprises:
and performing feature fusion on the embedded feature vector and the category feature vector by adopting a Concat Attention mechanism to obtain a feature vector of the query entity set.
In a second aspect, the embodiment of the present application provides a knowledge graph question-answering system based on lexical knowledge and semantic dependency, which includes a first branch module, a second branch module, a third branch module, a branch fusion module, and a prediction judgment module;
the first branch module is used for performing word segmentation and coding on a target question text to obtain a question text feature vector of the target question text;
the second branch module is used for carrying out named entity identification and keyword extraction on the target question text to obtain a query entity set in the target question text, and calculating to obtain a query entity set feature vector based on the query entity set;
the third branch module is used for carrying out syntactic analysis on the target question text by adopting a syntactic analysis tool to obtain a syntactic dependency relationship of the query entity set, and coding the syntactic dependency relationship to obtain a syntactic dependency feature vector of the query entity set;
the branch fusion module is used for extracting a sub-graph of a candidate answer set from a knowledge graph based on the query entity set and coding the sub-graph of the candidate answer set to obtain a sub-graph feature vector of the candidate answer set;
performing attention-based feature fusion on the subgraph feature vector with the question text feature vector, the syntactic dependency feature vector, and the query entity set feature vector, respectively, to obtain a final feature vector;
and the prediction judgment module is used for obtaining the answer of the target question text from the knowledge graph according to the final feature vector.
Compared with the related art, the knowledge graph question-answering method and system based on lexical knowledge and semantic dependency provided by the embodiments of the present application perform word segmentation and encoding on a target question text to obtain its question text feature vector; perform named entity recognition and keyword extraction on the target question text to obtain a query entity set, and compute a query entity set feature vector from it; parse the target question text with a syntactic analysis tool to obtain the syntactic dependency relationships of the query entity set, and encode them to obtain a syntactic dependency feature vector; extract a subgraph of the candidate answer set from the knowledge graph based on the query entity set, and encode it to obtain a subgraph feature vector; perform attention-based feature fusion on the subgraph feature vector with the question text feature vector, the syntactic dependency feature vector, and the query entity set feature vector, respectively, to obtain a final feature vector; and obtain the answer to the target question text from the knowledge graph based on the final feature vector. This solves the problem that existing knowledge graph question answering based on question query entity sets has low accuracy: combining entity lexical knowledge and syntactic information in the question fuses multiple feature vectors, and the fused features improve the accuracy of knowledge graph question answering.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of steps of a knowledge-graph question-answering method based on lexical knowledge and semantic dependencies, according to an embodiment of the application;
FIG. 2 is a schematic diagram of a structure of a knowledge-graph question-answer model according to an embodiment of the present application;
FIG. 3 is a block diagram of a knowledge-graph question-answering system based on lexical knowledge and semantic dependencies according to an embodiment of the present application;
fig. 4 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Description of the drawings: 31. first branch module; 32. second branch module; 33. third branch module; 34. branch fusion module; 35. prediction judgment module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by one of ordinary skill in the art that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The use of the terms "including," "comprising," "having," and any variations thereof herein, is meant to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The embodiment of the present application provides a knowledge-graph question-answering method based on lexical knowledge and semantic dependency, fig. 1 is a flow chart of steps of the knowledge-graph question-answering method based on lexical knowledge and semantic dependency according to the embodiment of the present application, as shown in fig. 1, the method includes the following steps:
step S102, performing word segmentation and coding on a target question text to obtain a question text feature vector of the target question text;
specifically, the target question text is subjected to word segmentation and coding through the BiGRU network, and a question text feature vector of the target question text is obtained.
Preferably, for the target question text Q = (w_1, w_2, …, w_M), a BiGRU network is used for word segmentation and encoding to obtain the word-level question text feature vector a = (a_1, a_2, …, a_N); the encoding process can be written as:

a_t = [GRU_fwd(w_t, h_{t-1}) ; GRU_bwd(w_t, h_{t+1})]

where GRU_fwd and GRU_bwd denote the forward and backward GRU passes and [ ; ] denotes concatenation.
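As a rough illustration of this step, the following is a minimal numpy sketch of a bidirectional GRU encoder. All weights are random placeholders and the dimensions are assumptions; it follows the standard GRU update equations, not the patent's (unpublished) parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_H = 8, 6  # word-embedding size, hidden size (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_gru(d_in, d_h):
    # One weight matrix per gate, acting on the concatenation [h_prev; x].
    return {g: rng.normal(scale=0.1, size=(d_h, d_h + d_in)) for g in ("z", "r", "n")}

def gru_step(params, h, x):
    hx = np.concatenate([h, x])
    z = sigmoid(params["z"] @ hx)                           # update gate
    r = sigmoid(params["r"] @ hx)                           # reset gate
    n = np.tanh(params["n"] @ np.concatenate([r * h, x]))   # candidate state
    return (1 - z) * n + z * h

def bigru_encode(words):
    fwd, bwd = make_gru(D_IN, D_H), make_gru(D_IN, D_H)
    h = np.zeros(D_H); fs = []
    for x in words:                  # left-to-right pass
        h = gru_step(fwd, h, x); fs.append(h)
    h = np.zeros(D_H); bs = []
    for x in reversed(words):        # right-to-left pass
        h = gru_step(bwd, h, x); bs.append(h)
    bs.reverse()
    # a_t = [forward state ; backward state], one vector per word
    return np.stack([np.concatenate([f, b]) for f, b in zip(fs, bs)])

question = rng.normal(size=(5, D_IN))   # 5 toy "word embeddings"
a = bigru_encode(question)
print(a.shape)   # one 2*D_H-dimensional feature per word
```

In practice a framework-provided bidirectional GRU layer would replace this hand-rolled cell; the sketch only shows where the concatenated forward/backward states in the formula above come from.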
step S104, conducting named entity identification and keyword extraction on the target question text to obtain a query entity set in the target question text, and calculating to obtain a query entity set characteristic vector based on the query entity set;
specifically, step S104 further includes the steps of:
Step S41: perform named entity recognition and keyword extraction on the target question text, then merge and deduplicate the results to obtain the query entity set L = (e_1, e_2, …, e_N) in the target question text;
S42, coding the query entity set to obtain an embedded feature vector of the query entity set;
s43, acquiring the basic unit category of the query entity set from a preset common sense knowledge base, and coding the basic unit category to obtain a category feature vector of the query entity set;
and S44, performing feature fusion based on an attention mechanism on the embedded feature vector and the category feature vector to obtain a feature vector of the query entity set.
Step S42 preferably encodes the query entity set with the TransE (Translating Embedding) vectorization tool to obtain the embedded feature vector b = (b_1, b_2, …, b_N) of the query entity set.
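To illustrate the TransE idea behind this step: entities and relations are embedded so that head + relation ≈ tail holds for true triples. The vectors below are hand-picked toy values, not trained embeddings, and the entity names are illustrative.

```python
import numpy as np

def transe_score(h, r, t):
    # Lower L2 distance between h + r and t means a more plausible triple.
    return float(np.linalg.norm(h + r - t))

emb = {
    "Hangzhou": np.array([1.0, 0.0]),
    "China":    np.array([1.0, 1.0]),
    "Paris":    np.array([5.0, 5.0]),
}
located_in = np.array([0.0, 1.0])   # toy relation vector

good = transe_score(emb["Hangzhou"], located_in, emb["China"])
bad = transe_score(emb["Hangzhou"], located_in, emb["Paris"])
assert good < bad   # the true tail entity lies closer to h + r
```

Trained TransE embeddings of the query entities would serve directly as the embedded feature vectors b_i.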
Step S43 preferably obtains the basic unit categories of the query entity set (such as everything, component, attribute, time, space, attribute value, event, etc.) from the HowNet common sense knowledge base, and encodes the basic unit categories through PCA (principal component analysis) and one-hot encoding to obtain the category feature vector c = (c_1, c_2, …, c_N) of the query entity set.
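A minimal numpy sketch of this one-hot-then-PCA pipeline, under assumed toy categories and an assumed target dimension (PCA is implemented directly via SVD rather than a library call):

```python
import numpy as np

# Toy subset of HowNet-style basic unit categories (illustrative names).
categories = ["everything", "component", "attribute", "time", "space"]
cat_index = {c: i for i, c in enumerate(categories)}

def one_hot(cat):
    v = np.zeros(len(categories))
    v[cat_index[cat]] = 1.0
    return v

entity_cats = ["time", "space", "attribute", "time"]   # toy query entity set
X = np.stack([one_hot(c) for c in entity_cats])

def pca(X, k):
    Xc = X - X.mean(axis=0)                  # center the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                     # project onto top-k components

c_vecs = pca(X, k=2)   # one low-dimensional category feature per entity
print(c_vecs.shape)
```

Entities sharing a category map to identical category feature vectors, which is the signal the fusion step exploits.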
Preferably, in step S44, a Concat Attention mechanism is adopted to fuse the embedded feature vector and the category feature vector into the query entity set feature vector u = (u_1, u_2, …, u_N). [The full fusion-process formula is rendered as an image in the original.] The attention scoring function in the fusion formula takes the form of an additive model, s(b_i, c_i) = v^T tanh(W b_i + U c_i); a dot-product model, a scaled dot-product model, a bilinear model, or the like may be used instead and is not described in detail here.
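A sketch of additive-attention fusion over the entity vectors. The additive scoring form follows the text above; the weight shapes and the exact fusion output (attention-weighted concatenation of b_i and c_i) are assumptions, since the patent's fusion formula is only given as an image.

```python
import numpy as np

rng = np.random.default_rng(1)
N, DB, DC, DA = 4, 6, 3, 5   # entities, dims of b, c, attention space (assumed)

b = rng.normal(size=(N, DB))   # embedded feature vectors (stand-in)
c = rng.normal(size=(N, DC))   # category feature vectors (stand-in)
W, U = rng.normal(size=(DA, DB)), rng.normal(size=(DA, DC))
v = rng.normal(size=DA)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Additive scoring: s(b_i, c_i) = v^T tanh(W b_i + U c_i)
scores = np.array([v @ np.tanh(W @ b[i] + U @ c[i]) for i in range(N)])
alpha = softmax(scores)   # attention weights over the query entities

# Assumed fusion: weight each concatenated [b_i ; c_i] by its attention score.
u = np.stack([alpha[i] * np.concatenate([b[i], c[i]]) for i in range(N)])
print(u.shape)
```

Swapping the scoring line for a dot-product or bilinear form changes only `scores`; the rest of the fusion is unaffected, which is why the patent treats the scoring function as interchangeable.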
In step S104, the HowNet common sense knowledge base supplements the basic unit category features of the query entity set for model construction, fully mining the deep features of the question, enriching the feature information among query entities, improving the accuracy of obtaining correct answers from the candidate entity set, and thus improving user experience.
Step S106, carrying out syntax analysis on the target question text by adopting a syntax analysis tool to obtain a syntax dependence relationship of the query entity set, and coding the syntax dependence relationship to obtain a syntax dependence feature vector of the query entity set;
preferably, step S106 further comprises the steps of:
Step S61: parse the target question text with an open-source syntactic analysis tool. Define an undirected graph G = (V, E) as the dependency syntax tree of the target question text Q = (w_1, w_2, …, w_M), where V = (v_1, v_2, …, v_M) and E are the corresponding node and edge sets of the graph. Each node v_p in the syntax tree represents a word w_p in the sentence, and if an edge (v_p, v_q) belongs to E, there is a directed syntactic arc between w_p and w_q. The syntactic dependency relationships of the query entity set are then obtained from this tree.
Step S62: encode the syntactic dependency relationships through a GCN (Graph Convolutional Network) to obtain the syntactic dependency feature vector d = (d_1, d_2, …, d_T) of the query entity set, where in the r-th layer of the GCN the convolution vector of node v can be expressed as:

h_v^(r+1) = f( Σ_{u ∈ N(v)} ( W^(r) h_u^(r) + b^(r) ) )

where W and b are the corresponding weight and bias, N(v) is the neighborhood set of v (which includes v itself), and f is an activation function.
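The layer above can be sketched in a few lines of numpy. The toy dependency tree, random weights, and ReLU choice for f are illustrative assumptions; arcs are treated as undirected edges with self-loops so that N(v) includes v.

```python
import numpy as np

rng = np.random.default_rng(2)
n, D = 4, 4                        # 4 words, feature dimension 4 (assumed)
edges = [(0, 1), (1, 2), (1, 3)]   # toy dependency arcs

A = np.eye(n)                      # self-loops: N(v) includes v itself
for p, q in edges:
    A[p, q] = A[q, p] = 1.0        # treat syntactic arcs as undirected

H = rng.normal(size=(n, D))        # input node features (one per word)
W = rng.normal(scale=0.5, size=(D, D))
bias = np.zeros(D)

def gcn_layer(A, H, W, bias):
    # Aggregate neighbors, apply shared weights and bias, then f = ReLU.
    return np.maximum(0.0, A @ H @ W + bias)

H1 = gcn_layer(A, H, W, bias)
print(H1.shape)
```

Stacking several such layers lets each word's vector absorb information from words several dependency arcs away, which is what makes the encoding useful for the query entities.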
Step S106 supplements model construction with the syntactic dependency feature vector of the query entity set, fully mining the deep features of the question, enriching the feature information among query entities, improving the accuracy of obtaining correct answers from the candidate entity set, and thus improving user experience.
Step S108, extracting subgraphs of the candidate answer set from the knowledge graph based on the query entity set, and coding the subgraphs of the candidate answer set to obtain subgraph feature vectors of the candidate answer set;
preferably, step S108 further comprises the steps of:
step S81, extracting subgraphs of the candidate answer set from the knowledge graph based on the query entity set.
Step S82: encode the subgraph of the candidate answer set through an R-GCN (Relational Graph Convolutional Network) to obtain the subgraph feature vector of the candidate answer set. For an entity in the knowledge graph (i.e., in the subgraph of the candidate answer set), its relational network can be represented as G = (V, E, R), where entity v_i ∈ V, relation (v_i, r, v_j) ∈ E, and r ∈ R denotes a relation type. The R-GCN convolution vector of entity (node) v_i is expressed as:

h_i^(l+1) = σ( Σ_{r ∈ R} Σ_{j ∈ N_i^r} (1 / c_{i,r}) W_r^(l) h_j^(l) + W_0^(l) h_i^(l) )

where N_i^r denotes the set of neighbor nodes of v_i under relation r in the relation set R, c_{i,r} is a normalization hyperparameter that can be set manually or learned, and σ denotes an activation function (e.g., ReLU).
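A numpy sketch of one R-GCN layer over a toy candidate-answer subgraph. The triples, relation names, and random weights are placeholders; c_{i,r} is taken as the neighbor count, one common choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n, D = 4, 3
# Toy (head, relation, tail) triples over two relation types (illustrative).
triples = [(0, "capital_of", 1), (2, "capital_of", 1), (0, "borders", 3)]
relations = ["capital_of", "borders"]

H = rng.normal(size=(n, D))                                  # entity features
W_r = {r: rng.normal(scale=0.5, size=(D, D)) for r in relations}
W_0 = rng.normal(scale=0.5, size=(D, D))                     # self-loop weight

def rgcn_node(i):
    out = W_0 @ H[i]                              # self-loop term W_0 h_i
    for r in relations:
        # Neighbors of v_i under relation r (both arc directions).
        nbrs = [h for (h, rel, t) in triples if rel == r and t == i]
        nbrs += [t for (h, rel, t) in triples if rel == r and h == i]
        if nbrs:
            c_ir = len(nbrs)                      # normalization constant
            out += sum(W_r[r] @ H[j] for j in nbrs) / c_ir
    return np.maximum(0.0, out)                   # sigma = ReLU

H1 = np.stack([rgcn_node(i) for i in range(n)])
print(H1.shape)
```

The relation-specific weights W_r are what distinguish R-GCN from the plain GCN used in step S62: the same neighbor contributes differently depending on which relation type connects it.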
Step S110, performing attention-based feature fusion on the subgraph feature vector with the question text feature vector, the syntactic dependency feature vector, and the query entity set feature vector, respectively, to obtain a final feature vector;
Preferably, step S110 further comprises the steps of:
Step S1101: perform attention-based feature fusion on the subgraph feature vector and the question text feature vector to obtain a first fusion feature vector. [The attention-mechanism formula is rendered as an image in the original.]
Step S1102: perform attention-based feature fusion on the subgraph feature vector and the syntactic dependency feature vector to obtain a second fusion feature vector. [Formula likewise rendered as an image.]
Step S1103: perform attention-based feature fusion on the subgraph feature vector and the query entity set feature vector to obtain a third fusion feature vector. [Formula likewise rendered as an image.]
Step S1104: fuse the first, second, and third fusion feature vectors by weighted averaging to obtain the final feature vector.
Step S110 uses the attention mechanism to focus model construction on important features such as the question text feature vector, the syntactic dependency feature vector, and the query entity set feature vector, shortening the time the model takes to obtain an answer and improving the efficiency of knowledge graph question answering.
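Step S1104's weighted average is simple enough to show directly. The three input vectors and the weights below are toy values; in the patent the weights could be fixed or learned.

```python
import numpy as np

m1 = np.array([1.0, 2.0, 3.0])   # subgraph x question fusion (toy)
m2 = np.array([0.0, 1.0, 0.0])   # subgraph x syntactic-dependency fusion (toy)
m3 = np.array([2.0, 0.0, 1.0])   # subgraph x entity-set fusion (toy)

weights = np.array([0.5, 0.25, 0.25])   # illustrative mixing weights
# Weighted average of the three fusion feature vectors.
final = (weights[0] * m1 + weights[1] * m2 + weights[2] * m3) / weights.sum()
print(final)
```

Because the weights sum to one here, the division is a no-op; dividing by the weight sum keeps the average well defined when unnormalized (e.g., learned) weights are used instead.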
And step S112, obtaining the answer of the target question text from the knowledge graph based on the final feature vector.
It should be noted that fig. 2 is a schematic structural diagram of the knowledge graph question-answer model according to an embodiment of the present application. As shown in fig. 2, the model implements steps S102 to S112 above and has a training phase and an application phase; the training phase preferably uses the negative log-likelihood as the loss function and saves the model with the best F1 score on the development set, after which the model can accurately find answers in the knowledge graph and reply to the user.
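The negative log-likelihood objective mentioned above can be sketched as follows: candidate-answer scores are turned into probabilities with a softmax, and the loss is the negative log-probability of the gold answer. The scores and gold indices are toy values.

```python
import numpy as np

def nll_loss(scores, gold):
    # Numerically stable softmax over the candidate answer set.
    e = np.exp(scores - scores.max())
    probs = e / e.sum()
    return -float(np.log(probs[gold]))   # negative log-likelihood of gold

scores = np.array([2.0, 0.5, -1.0])      # toy candidate-answer scores
loss_good = nll_loss(scores, gold=0)     # model ranks the gold answer first
loss_bad = nll_loss(scores, gold=2)      # model ranks the gold answer last
assert loss_good < loss_bad              # better ranking gives lower loss
```

Minimizing this loss during the training phase pushes the final feature vector to score the correct knowledge graph entity above the other candidates.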
Through steps S102 to S112 of the embodiment of the present application, the problem that existing knowledge graph question answering based on question query entity sets has low accuracy is solved: multi-feature-vector fusion is performed by combining entity lexical knowledge and syntactic information in the question, and the fused features improve the accuracy of knowledge graph question answering.
It should be noted that the steps illustrated in the above flow diagrams or in the flow diagrams of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flow diagrams, in some cases the steps may be performed in an order different from that shown here.
An embodiment of the present application provides a knowledge-graph question-answering system based on lexical knowledge and semantic dependency. Fig. 3 is a structural block diagram of the knowledge-graph question-answering system based on lexical knowledge and semantic dependency according to the embodiment of the present application. As shown in fig. 3, the system includes a first branch module 31, a second branch module 32, a third branch module 33, a branch fusion module 34 and a prediction judgment module 35;
the first branch module 31 is configured to perform word segmentation and coding on the target question text to obtain a question text feature vector of the target question text;
the second branch module 32 is configured to perform named entity identification and keyword extraction on the target question text to obtain a query entity set in the target question text, and calculate to obtain a query entity set feature vector based on the query entity set;
the third branch module 33 is configured to perform syntax parsing on the target question text by using a syntax analysis tool to obtain a syntax dependency relationship of the query entity set, and encode the syntax dependency relationship to obtain a syntax dependency feature vector of the query entity set;
the branch fusion module 34 is configured to extract subgraphs of the candidate answer set from the knowledge graph based on the query entity set, and encode the subgraphs of the candidate answer set to obtain subgraph feature vectors of the candidate answer set;
and is further configured to perform attention-based feature fusion on the sub-graph feature vector with the question text feature vector, the syntactic dependency feature vector and the query entity set feature vector respectively to obtain a final feature vector;
and the prediction judgment module 35 is configured to obtain an answer of the target question text from the knowledge graph according to the final feature vector.
Through the first branch module 31, the second branch module 32, the third branch module 33, the branch fusion module 34 and the prediction judgment module 35 in the embodiment of the present application, the problem of low precision in conventional knowledge-graph question answering based on a question's query entity set is solved; multiple feature vectors are fused by combining the entity lexical knowledge and syntactic information in the question, and the precision of knowledge-graph question answering is improved on the basis of the fused features.
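As a rough illustration of how the five modules above compose, the following sketch wires up toy stand-ins for the first two branches (all internals are placeholders for illustration, not the patented implementations; modules 33 to 35 are omitted):

```python
def first_branch(question):
    """Module 31 stand-in: segment and encode the question
    (token lengths stand in for a BiGRU encoding)."""
    return [float(len(tok)) for tok in question.split()]

def second_branch(question, entity_vocab):
    """Module 32 stand-in: obtain the query entity set by dictionary
    lookup (a placeholder for NER + keyword extraction)."""
    return [tok for tok in question.split() if tok in entity_vocab]

def run_pipeline(question, entity_vocab):
    """Compose modules 31 and 32; modules 33-35 (syntactic parsing,
    branch fusion, prediction judgment) are omitted from this sketch."""
    return {
        "question_vec": first_branch(question),
        "entities": second_branch(question, entity_vocab),
    }

result = run_pipeline("who founded Apple", {"Apple"})
```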
The above modules may be functional modules or program modules, and may be implemented by software or hardware. Modules implemented in hardware may all be located in the same processor, or may be distributed across different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiment and optional implementation manners, and details of this embodiment are not described herein again.
In addition, in combination with the knowledge-graph question-answering method based on lexical knowledge and semantic dependency in the above embodiments, an embodiment of the present application may provide a storage medium for implementation. A computer program is stored on the storage medium; when executed by a processor, the computer program implements the knowledge-graph question-answering method based on lexical knowledge and semantic dependency of any of the above embodiments.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method for knowledge-graph question-answering based on lexical knowledge and semantic dependencies. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In one embodiment, an electronic device is provided, which may be a server; fig. 4 is a schematic diagram of its internal structure according to an embodiment of the present application. As shown in fig. 4, the electronic device comprises a processor, a network interface, an internal memory and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program and a database. The processor is used for providing computing and control capabilities, the network interface is used for communicating with an external terminal through a network connection, the internal memory provides an environment for the running of the operating system and the computer program, the computer program is executed by the processor to implement a knowledge-graph question-answering method based on lexical knowledge and semantic dependency, and the database is used for storing data.
Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only a portion of the configuration associated with the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is specific and detailed, but not to be understood as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A knowledge graph question-answering method based on vocabulary knowledge and semantic dependency is characterized by comprising the following steps:
performing word segmentation and coding on a target question text to obtain a question text characteristic vector of the target question text;
carrying out named entity identification and keyword extraction on the target question text to obtain a query entity set in the target question text, and calculating to obtain a query entity set characteristic vector based on the query entity set;
syntax analysis is carried out on the target question text by adopting a syntax analysis tool to obtain syntax dependence of the query entity set, and the syntax dependence is encoded to obtain syntax dependence characteristic vectors of the query entity set;
extracting a subgraph of a candidate answer set from a knowledge graph based on the query entity set, and coding the subgraph of the candidate answer set to obtain a subgraph feature vector of the candidate answer set;
performing attention-based feature fusion on the sub-graph feature vector, the question text feature vector, the syntax dependency feature vector and the query entity set feature vector respectively to obtain a final feature vector;
and obtaining an answer of the target question text from the knowledge graph based on the final feature vector.
2. The method of claim 1, wherein computing a query entity set feature vector based on the query entity set comprises:
coding the query entity set to obtain an embedded feature vector of the query entity set;
acquiring the basic unit category of the query entity set from a preset common sense knowledge base, and coding the basic unit category to obtain a category feature vector of the query entity set;
and performing feature fusion based on an attention mechanism on the embedded feature vector and the category feature vector to obtain a query entity set feature vector.
3. The method of claim 1, wherein performing attention-based feature fusion on the sub-graph feature vector, the question text feature vector, the syntactic dependency feature vector, and the query entity set feature vector, respectively, to obtain a final feature vector comprises:
performing feature fusion based on an attention mechanism on the sub-graph feature vector and the question text feature vector to obtain a first fusion feature vector;
performing feature fusion based on an attention mechanism on the sub-graph feature vector and the syntactic dependency feature vector to obtain a second fusion feature vector;
performing feature fusion based on an attention mechanism on the sub-graph feature vector and the query entity set feature vector to obtain a third fusion feature vector;
and fusing the first fusion characteristic vector, the second fusion characteristic vector and the third fusion characteristic vector based on weighted average to obtain a final characteristic vector.
4. The method of claim 1, wherein encoding the syntactic dependency, obtaining the syntactic dependency feature vector for the query entity set comprises:
and coding the syntactic dependency relationship through a GCN graph convolution neural network to obtain the syntactic dependency feature vector of the query entity set.
5. The method of claim 1, wherein the step of segmenting and encoding the target question text to obtain the question text feature vector of the target question text comprises the steps of:
and performing word segmentation and coding on the target question text through a BiGRU network to obtain a question text feature vector of the target question text.
6. The method of claim 1, wherein encoding a sub-graph of the set of candidate answers, resulting in a sub-graph feature vector for the set of candidate answers comprises:
and coding the subgraph of the candidate answer set through an R-GCN relation graph convolutional neural network to obtain a subgraph feature vector of the candidate answer set.
7. The method of claim 2, wherein encoding the set of query entities to obtain embedded feature vectors for the set of query entities comprises:
and coding the query entity set through a TransE vectorization tool to obtain the embedded characteristic vector of the query entity set.
8. The method of claim 2, wherein obtaining the basic unit category of the query entity set from a predetermined common sense knowledge base and encoding the basic unit category to obtain the category feature vector of the query entity set comprises:
acquiring the basic unit type of the query entity set from a Hownet common sense knowledge base;
and coding the basic unit category through PCA principal component analysis and a one-hot coding tool to obtain a category feature vector of the query entity set.
9. The method of claim 2, wherein performing attention-based feature fusion on the embedded feature vector and the category feature vector to obtain a query entity set feature vector comprises:
and performing feature fusion on the embedded feature vector and the category feature vector by means of Concat concatenation to obtain the query entity set feature vector.
10. A knowledge map question-answering system based on vocabulary knowledge and semantic dependency is characterized by comprising a first branch module, a second branch module, a third branch module, a branch fusion module and a prediction judgment module;
the first branch module is used for performing word segmentation and coding on a target question text to obtain a question text feature vector of the target question text;
the second branch module is used for carrying out named entity identification and keyword extraction on the target question text to obtain a query entity set in the target question text, and calculating to obtain a query entity set feature vector based on the query entity set;
the third branch module is used for carrying out syntactic analysis on the target question text by adopting a syntactic analysis tool to obtain a syntactic dependency relationship of the query entity set, and coding the syntactic dependency relationship to obtain a syntactic dependency feature vector of the query entity set;
the branch fusion module is used for extracting a sub-graph of a candidate answer set from a knowledge graph based on the query entity set and coding the sub-graph of the candidate answer set to obtain a sub-graph feature vector of the candidate answer set;
and is further configured to perform attention-based feature fusion on the sub-graph feature vector with the question text feature vector, the syntactic dependency feature vector and the query entity set feature vector respectively to obtain a final feature vector;
and the prediction judgment module is used for obtaining an answer of the target question text from the knowledge graph according to the final characteristic vector.
CN202211342154.XA 2022-10-31 2022-10-31 Knowledge graph question-answering method and system based on vocabulary knowledge and semantic dependence Active CN115455169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211342154.XA CN115455169B (en) 2022-10-31 2022-10-31 Knowledge graph question-answering method and system based on vocabulary knowledge and semantic dependence


Publications (2)

Publication Number Publication Date
CN115455169A true CN115455169A (en) 2022-12-09
CN115455169B CN115455169B (en) 2023-04-18

Family

ID=84310971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211342154.XA Active CN115455169B (en) 2022-10-31 2022-10-31 Knowledge graph question-answering method and system based on vocabulary knowledge and semantic dependence

Country Status (1)

Country Link
CN (1) CN115455169B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115659058A (en) * 2022-12-30 2023-01-31 杭州远传新业科技股份有限公司 Method and device for generating questions

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457431A (en) * 2019-07-03 2019-11-15 深圳追一科技有限公司 Answering method, device, computer equipment and the storage medium of knowledge based map
CN112364132A (en) * 2020-11-12 2021-02-12 苏州大学 Similarity calculation model and system based on dependency syntax and method for building system
CN113254659A (en) * 2021-02-04 2021-08-13 天津德尔塔科技有限公司 File studying and judging method and system based on knowledge graph technology
CN114090748A (en) * 2021-11-04 2022-02-25 海信电子科技(武汉)有限公司 Question and answer result display method, device, equipment and storage medium
US20220198154A1 (en) * 2020-04-03 2022-06-23 Tencent Technology (Shenzhen) Company Limited Intelligent question answering method, apparatus, and device, and computer-readable storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHANG, Y ET AL.: "MKGN: A Multi-Dimensional Knowledge Enhanced Graph Network for Multi-Hop Question and Answering", IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS *
LIU, Feng et al.: "Entity Relation Classification Based on Multi-head Attention and Bi-LSTM", Computer Systems & Applications (《计算机系统应用》) *
ZHANG, Cui et al.: "Research on Relation Extraction Incorporating Syntactic Dependency Tree Attention", Guangdong Communication Technology (《广东通信技术》) *


Also Published As

Publication number Publication date
CN115455169B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
US11017178B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
KR102491172B1 (en) Natural language question-answering system and learning method
CN112149400B (en) Data processing method, device, equipment and storage medium
CN110866098B (en) Machine reading method and device based on transformer and lstm and readable storage medium
CN112214593A (en) Question and answer processing method and device, electronic equipment and storage medium
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN112215008A (en) Entity recognition method and device based on semantic understanding, computer equipment and medium
US11170169B2 (en) System and method for language-independent contextual embedding
CN109710921B (en) Word similarity calculation method, device, computer equipment and storage medium
CN115455169B (en) Knowledge graph question-answering method and system based on vocabulary knowledge and semantic dependence
CN115062134A (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN111177404A (en) Knowledge graph construction method and device of home decoration knowledge and computer equipment
CN113342944B (en) Corpus generalization method, apparatus, device and storage medium
CN114372454A (en) Text information extraction method, model training method, device and storage medium
CN116467412A (en) Knowledge graph-based question and answer method, system and storage medium
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium
CN114491076B (en) Data enhancement method, device, equipment and medium based on domain knowledge graph
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium
CN111507098B (en) Ambiguous word recognition method and device, electronic equipment and computer-readable storage medium
CN115270746A (en) Question sample generation method and device, electronic equipment and storage medium
CN115145980A (en) Dialog reply generation method and device, electronic equipment and storage medium
CN114090778A (en) Retrieval method and device based on knowledge anchor point, electronic equipment and storage medium
CN112749251B (en) Text processing method, device, computer equipment and storage medium
CN114238715A (en) Question-answering system based on social aid, construction method, computer equipment and medium
CN110175331B (en) Method and device for identifying professional terms, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant