CN110717025B - Question answering method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110717025B
Authority
CN
China
Prior art keywords
template
natural language
question
language question
language
Legal status
Active
Application number
CN201910950761.6A
Other languages
Chinese (zh)
Other versions
CN110717025A (en)
Inventor
周丽芳
尹存祥
骆金昌
方军
钟辉强
吴晓晖
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910950761.6A
Publication of CN110717025A
Application granted
Publication of CN110717025B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a question answering method and device, electronic equipment, and a storage medium, and relates to the field of knowledge base question answering. The specific implementation scheme is as follows. First, a natural language question template that matches a natural language question is determined; the natural language question template can match at least one type of natural language question. Next, the first language question template corresponding to that natural language question template is looked up, according to the correspondence between natural language question templates and first language question templates. Finally, the first language question template is used to generate a first language question corresponding to the natural language question. Each natural language question template adopted in the embodiments of the application can match at least one type of natural language question. As a result, business retrieval requirements can be met without writing a large number of templates, which saves labor and time costs.

Description

Question answering method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to the field of Knowledge Base Question Answering (KB-QA).
Background
A knowledge base stores a large number of entities and relationships in the form of triples. When a user asks a natural language question, the question needs to be converted into a language that the knowledge base understands. Most existing question answering systems are implemented through semantic parsing, vector modeling, or similar approaches. Vector modeling offers poor controllability and accuracy. When semantic parsing is applied to more complex business data, string-based regular-expression templates are usually used to convert questions. A regular template can generally correspond to only one phrasing of a natural language question, so a large number of regular templates must be written to meet business retrieval requirements.
Disclosure of Invention
The embodiments of the invention provide a question answering method and a question answering device, intended to solve at least the technical problems in the prior art described above.
In a first aspect, an embodiment of the present invention provides a question answering method, including:
determining a natural language question template matched with the natural language question, wherein the natural language question template can be matched with at least one type of natural language question;
searching a first language question template corresponding to the natural language question template according to the corresponding relation between the natural language question template and the first language question template;
and generating a first language question corresponding to the natural language question by adopting the first language question template.
Because one natural language question template can match at least one type of natural language question, there is no need to write a separate template for each natural language question. Business retrieval requirements can therefore be met without writing a large number of templates, which saves labor and time costs.
In one embodiment, the first language question template is a SPARQL language question template and the first language question is a question described in SPARQL language.
Using a SPARQL question template allows the corresponding SPARQL question to be generated, which facilitates querying and retrieval.
In one embodiment, before determining the natural language question template matching the natural language question, the method further comprises:
setting a plurality of rule templates, where each rule template includes a correspondence between a natural language question template and a first language question template.
In one embodiment, the rule template is set by:
collecting a plurality of phrasings of at least one type of natural language question;
analyzing the keywords in the various phrasings, and setting vocabulary matching rules according to the keywords; analyzing the entities or attributes in the various phrasings, and setting part-of-speech matching rules according to the entities or attributes;
constructing a natural language question template according to the vocabulary matching rules and the part-of-speech matching rules;
and constructing a first language question template corresponding to the natural language question template, according to the natural language question template and the construction rules of the first language question template.
Each rule template covers multiple phrasings of at least one type of natural language question. This avoids writing a large number of rule templates and improves efficiency.
In one embodiment, the method further comprises:
collecting statistics on the natural language questions that no natural language question template can match, and updating and maintaining the rule templates according to the statistics.
Updating and maintaining the rule templates improves the robustness of the question answering system.
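A minimal sketch of the statistics step follows. It assumes that a question log is available as a list of strings and that some matcher reports whether any template fits. `matches_any_template` and `covered` are hypothetical stand-ins, and stripping whitespace is an assumed normalization. Unmatched questions are counted so that the most frequent misses can be reviewed first when templates are updated.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def collect_unmatched(
    questions: Iterable[str],
    matches_any_template: Callable[[str], bool],
    top_n: int = 10,
) -> List[Tuple[str, int]]:
    """Count questions that no template matched and return the most frequent ones."""
    misses = Counter(
        q.strip()  # light normalization so trivially different copies are merged
        for q in questions
        if not matches_any_template(q)
    )
    return misses.most_common(top_n)


# Hypothetical matcher: pretend only employee-count questions are covered by a template.
def covered(question: str) -> bool:
    return "employees" in question


log = [
    "Company A has how many employees?",
    "Who founded Company A?",
    "Who founded Company A? ",
    "Where is Company B listed?",
]
print(collect_unmatched(log, covered))
# [('Who founded Company A?', 2), ('Where is Company B listed?', 1)]
```

Ranking misses by frequency is one plausible way to prioritize template maintenance. The application does not specify how the statistics are aggregated.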
In one embodiment, the method further comprises:
acquiring initial data;
structuring the initial data to obtain structured entity data, and storing the entity names in the structured entity data in the format of a custom dictionary;
defining an ontology description of the knowledge base according to the structured entity data, and adding to each attribute of the ontology description the meaning of that attribute;
and converting the structured entity data into data in Resource Description Framework (RDF) format according to the ontology description.
Data in RDF format can be conveniently queried and retrieved with SPARQL questions.
In one embodiment, the method further comprises:
searching the data in resource description framework format with the first language question, to obtain an answer to the natural language question.
In one embodiment, the method further comprises: extracting the attributes of the ontology description, and using the extracted attributes as SPARQL primitives;
determining a natural language question template that matches a natural language question comprises:
segmenting the natural language question according to the SPARQL primitives and the entity names, to obtain a plurality of tokens;
checking each natural language question template for a match, using the order of the tokens and the part of speech or vocabulary value of each token, and determining the natural language question template that matches the natural language question; a natural language question template comprises a plurality of tokens in a fixed order, together with the part of speech or vocabulary value of each token.
The natural language question is segmented with the SPARQL primitives and entity names, and the resulting tokens are used to check the natural language question templates. This improves query accuracy.
In one embodiment, checking each natural language question template using the order of the tokens and the part of speech or vocabulary value of each token comprises:
checking each natural language question template for a match, using the order of the tokens and the part of speech or vocabulary value of each token, in combination with preset rule inference.
Combining the match check with rule inference makes it easier to query and retrieve answers to complex questions.
In a second aspect, an embodiment of the present invention provides a question answering device, including:
the matching module is configured to determine a natural language question template that matches a natural language question, where the natural language question template can match at least one type of natural language question;
the searching module is configured to look up the first language question template corresponding to the natural language question template, according to the correspondence between natural language question templates and first language question templates;
and the generating module is configured to generate, with the first language question template, a first language question corresponding to the natural language question.
In one embodiment, the first language question template is a SPARQL language question template and the first language question is a question described in SPARQL language.
In one embodiment, the device further comprises:
a setting module configured to set a plurality of rule templates, where each rule template includes a correspondence between a natural language question template and a first language question template.
In one embodiment, the device further comprises:
an acquisition module configured to acquire initial data;
a structuring module configured to structure the initial data to obtain structured entity data, and to store the entity names in the structured entity data in the format of a custom dictionary;
an ontology description definition module configured to define the ontology description of the knowledge base according to the structured entity data, and to add to each attribute of the ontology description the meaning of that attribute;
and a format conversion module configured to convert the structured entity data into data in resource description framework format according to the ontology description.
In one embodiment, the device further comprises:
a reply module configured to search the data in resource description framework format with the first language question, to obtain an answer to the natural language question.
In one embodiment, the device further comprises a primitive creation module configured to extract the attributes of the ontology description and to use the extracted attributes as SPARQL primitives;
the matching module is configured to: segment the natural language question according to the SPARQL primitives and the entity names, to obtain a plurality of tokens; check each natural language question template for a match, using the order of the tokens and the part of speech or vocabulary value of each token; and determine the natural language question template that matches the natural language question. A natural language question template comprises a plurality of tokens in a fixed order, together with the part of speech or vocabulary value of each token.
In one embodiment, the entity name includes a stock name of a listed company;
when performing word segmentation, the matching module preferentially uses the stock names of listed companies to segment the natural language question.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the question answering methods.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any one of the methods of question answering.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are provided for a better understanding of the present solution and do not limit the present application. In the drawings:
Fig. 1 is a first schematic flow chart of an implementation of a question answering method according to an embodiment of the present application;
Fig. 2 is a second schematic flow chart of an implementation of a question answering method according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of the implementation of step S101 of a question answering method according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of how rule templates are set in a question answering method according to an embodiment of the present application;
Fig. 5 is a schematic overall implementation flow chart of a question answering method according to an embodiment of the present application;
Fig. 6 is a first structural diagram of a question answering device according to an embodiment of the present application;
Fig. 7 is a second structural diagram of a question answering device according to an embodiment of the present application;
Fig. 8 is a block diagram of an electronic device for implementing a question answering method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings. The description includes various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
An embodiment of the present application provides a question answering method. Fig. 1 is a first schematic flow chart of an implementation of the question answering method according to the embodiment of the present application. The method includes:
step S101: determining a natural language question template matched with the natural language question, wherein the natural language question template can be matched with at least one type of natural language question;
step S102: searching a first language question template corresponding to the natural language question template according to the corresponding relation between the natural language question template and the first language question template;
step S103: generating, with the first language question template, a first language question corresponding to the natural language question.
In a possible embodiment, the first language is SPARQL; accordingly, the first language question template is a SPARQL language question template (hereinafter, SPARQL template), and the first language question is a question described in the SPARQL language (hereinafter, SPARQL question).
SPARQL (SPARQL Protocol and RDF Query Language) is a query language and data access protocol developed for the Resource Description Framework (RDF). RDF is a data model for describing data, in which a fact is typically expressed in three parts, called a triple. A triple consists of a subject, a predicate, and an object, and reads like a simple sentence.
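The following dependency-free Python sketch makes the triple model concrete. It stores triples as tuples and evaluates a conjunction of triple patterns, which is the core of a SPARQL basic graph pattern. Each pattern extends the variable bindings produced by the previous one. This is a toy illustration under simplifying assumptions, not a SPARQL engine. The `:companyA` resources and their values are made-up sample data.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None on a conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns, joining bindings pattern by pattern."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


graph = [
    (":companyA", ":name", "Company A"),
    (":companyA", ":location", "Beijing"),
    (":companyB", ":name", "Company B"),
]
print(query(graph, [("?s", ":name", "Company A"), ("?s", ":location", "?o1")]))
# [{'?s': ':companyA', '?o1': 'Beijing'}]
```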
Fig. 2 is a schematic diagram illustrating an implementation flow of a question answering method according to an embodiment of the present application. In the question answering method provided in the embodiment of the present application, before step S101, the method may further include:
step S200: setting a plurality of rule templates, where each rule template includes a correspondence between a natural language question template and a first language question template.
In step S200, the first language question template may be a SPARQL question template. Among the rule templates preset in step S200, the natural language question template in each rule template matches at least one type of natural language question. Moreover, for a given type of question, the template can match multiple phrasings of that question. In the prior art, one natural language question template corresponds to only one phrasing of a natural language question. Compared with that approach, the embodiments of the present application can significantly reduce the number of rule templates and save labor and time costs.
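The sketch below illustrates steps S101 to S103 under stated assumptions. It uses a single hypothetical rule template. The natural language side is one regular expression that covers several phrasings. The SPARQL side is a `string.Template` with an `$entity` slot. The template id, the regex, and the SPARQL predicates (`:name`, `:employee`) are illustrative only and are not taken from the application's actual rule set.

```python
import re
from string import Template
from typing import Dict, Optional, Tuple

# Hypothetical natural language question template: one regex covering several phrasings.
NL_TEMPLATES: Dict[str, "re.Pattern[str]"] = {
    "employee_count": re.compile(
        r"(?i)^(?:(?P<e1>.+?) has how many (?:employees|staff)"
        r"|what is (?P<e2>.+?)'s number of (?:employees|staff))\?$"
    ),
}

# Correspondence between NL templates and SPARQL (first language) templates, as set in step S200.
SPARQL_TEMPLATES: Dict[str, Template] = {
    "employee_count": Template('select ?o1 where { ?s :name "$entity" . ?s :employee ?o1 . }'),
}


def match_template(question: str) -> Optional[Tuple[str, str]]:
    """Step S101: find the NL template that matches and extract the entity slot."""
    for template_id, pattern in NL_TEMPLATES.items():
        m = pattern.match(question.strip())
        if m:
            return template_id, m.group("e1") or m.group("e2")
    return None


def to_sparql(question: str) -> Optional[str]:
    matched = match_template(question)
    if matched is None:
        return None  # no template fits; a candidate for the unmatched-question statistics
    template_id, entity = matched
    sparql_template = SPARQL_TEMPLATES[template_id]   # step S102: look up the correspondence
    return sparql_template.substitute(entity=entity)  # step S103: generate the SPARQL question


print(to_sparql("Company A has how many employees?"))
print(to_sparql("What is Company B's number of staff?"))
```

Because `string.Template` substitutes only `$`-prefixed names, the braces and `?` variables of SPARQL pass through unchanged. `str.format` would need every brace escaped.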
One implementation of converting a natural language question to a SPARQL question is introduced above. In order to implement the knowledge base question answering, the embodiment of the application also discloses an implementation mode of how to prepare the knowledge base. As shown in fig. 2, the embodiment of the present application further includes:
step S201: initial data is acquired.
Step S202: structuring the initial data to obtain structured entity data, and storing the entity names in the structured entity data in the format of a custom dictionary.
Taking a knowledge base for an enterprise as an example, in one embodiment, enterprise data may be integrated from multiple data dimensions, such as an enterprise knowledge graph, patent information, public opinion information, legal documents, etc., and used as initial data. Then, the initial data is structured to obtain structured entity data, for example, entity data in JSON (JavaScript Object Notation) format. To improve the accuracy of named entity recognition and the accuracy of subsequent part-of-speech tagging, stock names of listed companies, as well as all other structured entity names, may be stored in a defined format in a custom dictionary (e.g., a jieba custom dictionary).
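jieba's custom dictionary is a UTF-8 text file with one entry per line, in the form `word [freq] [tag]`. `jieba.load_userdict` can load such a file. A minimal sketch for writing entity names in that format follows. The default frequency of 10000 and the `nz` tag are assumptions; `nz` matches the part of speech that this description later uses for company entities. The sample names are hypothetical.

```python
from typing import Iterable, List


def userdict_lines(names: Iterable[str], freq: int = 10000, tag: str = "nz") -> List[str]:
    """Format entity names as jieba custom-dictionary lines: 'word freq tag'."""
    lines: List[str] = []
    seen = set()
    for name in names:
        name = name.strip()
        if not name or name in seen:
            continue  # skip blanks and duplicates
        seen.add(name)
        lines.append(f"{name} {freq} {tag}")
    return lines


def write_userdict(path: str, names: Iterable[str]) -> None:
    """Write a UTF-8 dictionary file that jieba.load_userdict(path) could then load."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(userdict_lines(names)) + "\n")


print(userdict_lines(["AcmeTech", "BetaHoldings", "AcmeTech", " "]))
# ['AcmeTech 10000 nz', 'BetaHoldings 10000 nz']
```

A high frequency makes it more likely that the segmenter keeps a dictionary word whole. The best value depends on the corpus and is worth tuning.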
Step S203: defining an ontology description of the knowledge base according to the structured entity data, and adding to each attribute of the ontology description the meaning of that attribute.
The terms "entity" and "ontology description" are briefly introduced here. An ontology defines terms and the relationships between terms. A specific, concrete thing is called an entity. An ontology description defines a class of entities, the composition between classes of entities, and their associations. For example, companies form a class of entities. Entities of this class share a uniform set of descriptive features, such as address and legal representative. These features are called attributes in the knowledge base, and all the attributes of the company entity class constitute the ontology description of the knowledge base.
In one embodiment, the ontology description may be defined automatically using vocabulary from the Web Ontology Language (OWL) specification, such as owl:Class and owl:ObjectProperty.
For question answering over a Chinese knowledge base, the meaning of an attribute can be given as a Chinese gloss. For example, the Chinese glosses of the "name" attribute (two Chinese synonyms for "name") may be added to the comment (description) part of that attribute. This makes it easier to query knowledge base attributes automatically later on.
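For illustration, the sketch below emits a small ontology description in Turtle. It produces one `owl:Class` and one property per attribute, and stores the attribute's gloss in `rdfs:comment`. Several details are assumptions for the example: the namespace `http://example.org/kb#`, the use of `owl:DatatypeProperty` for literal-valued attributes, and the English glosses, which stand in for Chinese ones.

```python
from typing import Dict

PREFIXES = "\n".join([
    "@prefix : <http://example.org/kb#> .",
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
])


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def ontology_turtle(class_name: str, attribute_glosses: Dict[str, str]) -> str:
    """Emit a class plus one property per attribute, with its gloss in rdfs:comment."""
    blocks = [PREFIXES, f":{class_name} a owl:Class ."]
    for attribute, gloss in attribute_glosses.items():
        blocks.append(
            f":{attribute} a owl:DatatypeProperty ;\n"
            f"    rdfs:domain :{class_name} ;\n"
            f"    rdfs:comment {turtle_literal(gloss)} ."
        )
    return "\n\n".join(blocks) + "\n"


print(ontology_turtle("Company", {
    "location": "address, location",
    "legalRepresentative": "legal representative",
}))
```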
Step S204: converting the structured entity data into data in RDF format according to the ontology description.
In one embodiment, the structured JSON formatted entity data can be directly converted into RDF formatted data. According to the embodiment of the application, the data format conversion from JSON data to RDF data can be realized by adopting an RDFlib toolkit of Python language.
Other structured relational data may also need conversion, such as Structured Query Language (SQL) tables or DataFrame data loaded from comma-separated values (CSV) files. If the data volume is large, the data can first be converted into JSON data and then into RDF data. If the data volume is small, a virtual JSON conversion can be performed first, after which the data is converted into RDF data.
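The application performs the JSON-to-RDF conversion with rdflib. The sketch below is a dependency-free stand-in that shows the same mapping: it turns a JSON array of entity records into N-Triples lines. The `id` field becomes the subject IRI, every other key becomes a predicate, and each value becomes a plain string literal. The base IRI, the `id` field name, and the omission of datatype annotations (such as `xsd:integer`) are assumptions. JSON keys are also assumed to be valid IRI local names.

```python
import json
from typing import Any, List

BASE_IRI = "http://example.org/kb#"  # hypothetical namespace


def iri(local_name: str) -> str:
    return f"<{BASE_IRI}{local_name}>"


def literal(value: Any) -> str:
    """Plain N-Triples string literal with backslashes, quotes, and newlines escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def json_to_ntriples(json_text: str, id_field: str = "id") -> List[str]:
    """Map each JSON record to triples: id -> subject, key -> predicate, value -> object."""
    lines: List[str] = []
    for record in json.loads(json_text):
        subject = iri(str(record[id_field]))
        for key, value in record.items():
            if key == id_field:
                continue
            values = value if isinstance(value, list) else [value]  # lists become repeated triples
            for item in values:
                lines.append(f"{subject} {iri(key)} {literal(item)} .")
    return lines


sample = '[{"id": "companyA", "name": "Company A", "employee": 12000}]'
print("\n".join(json_to_ntriples(sample)))
```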
The above steps S201 to S204 implement the establishment of the knowledge base. On the premise of establishing the knowledge base and converting the natural language question into the first language question (for example, SPARQL question), as shown in fig. 2, the embodiment of the present application may further include:
step S205: searching the data in Resource Description Framework (RDF) format with the first language question, to obtain an answer to the natural language question.
In one implementation, the above embodiment may further include: extracting the attributes of the ontology description, and using the extracted attributes as SPARQL primitives.
Fig. 3 is a schematic flow chart of the implementation of step S101 of a question answering method according to an embodiment of the present application. As shown in Fig. 3, based on the configured SPARQL primitives and the entity names saved in step S202, step S101 may include:
step S301: segmenting the natural language question according to the SPARQL primitives and the entity names, to obtain a plurality of tokens.
Step S302: checking each natural language question template for a match, using the order of the tokens and the part of speech or vocabulary value of each token, and determining the natural language question template that matches the natural language question; a natural language question template comprises a plurality of tokens in a fixed order, together with the part of speech or vocabulary value of each token.
Specifically, the jieba custom dictionary can be loaded with the jieba tool for Python, and the entity names in that dictionary are used when segmenting the natural language question. This allows the entity names and keywords in the question to be identified. The SPARQL primitives are exactly the attributes of the ontology description, and step S203 added their textual definitions to the comment part of each attribute. Therefore, the SPARQL primitive corresponding to each keyword (usually Chinese) in the natural language question can be determined. The determined SPARQL primitives are then used to compose the corresponding SPARQL question.
For example:
In the knowledge base, the Chinese glosses "address, location" are added in advance to the attribute "location", and the Chinese glosses "legal representative, legal-person representative" are added in advance to the attribute "legalRepresentative". These attributes are extracted as SPARQL primitives.
A natural language question input by a user is received, as follows:
"What is the address of Company A, and who is its legal representative?"
The natural language question is segmented. The entity name is identified as "Company A", and the keywords are "address" and "legal representative". The corresponding SPARQL primitives can be determined from the keywords: "address" and "legal representative" map to the attribute fields "location" and "legalRepresentative" respectively. The following SPARQL question is then generated automatically:
    select ?o1 ?o2 where {
      ?s :name "Company A" .
      ?s :location ?o1 .
      ?s :legalRepresentative ?o2 .
    }
In this way, retrieval queries are generated automatically, and attributes can be combined flexibly.
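The walkthrough above can be sketched in Python. jieba segments Chinese character strings. To keep this example readable in English, the same longest-match idea (forward maximum matching against a custom dictionary) is applied to whitespace-separated words instead. Several pieces are hypothetical: the dictionary contents, the keyword-to-primitive table (which stands in for lookups against attribute comments), and the output layout.

```python
import re
from typing import Dict, List

# Hypothetical custom dictionary: normalized entity name -> canonical name.
ENTITIES: Dict[str, str] = {"company a": "Company A", "company b": "Company B"}
# Hypothetical keyword -> SPARQL primitive table, standing in for attribute-comment lookups.
KEYWORD_TO_PRIMITIVE: Dict[str, str] = {
    "address": ":location",
    "location": ":location",
    "legal representative": ":legalRepresentative",
}
VOCAB = set(ENTITIES) | set(KEYWORD_TO_PRIMITIVE)
MAX_WORDS = max(len(entry.split()) for entry in VOCAB)


def segment(question: str) -> List[str]:
    """Forward maximum matching over words: take the longest dictionary phrase at each position."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    tokens: List[str] = []
    i = 0
    while i < len(words):
        for size in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            if size == 1 or phrase in VOCAB:  # unknown single words pass through as tokens
                tokens.append(phrase)
                i += size
                break
    return tokens


def build_sparql(question: str) -> str:
    tokens = segment(question)
    entity = next((ENTITIES[t] for t in tokens if t in ENTITIES), None)
    if entity is None:
        raise ValueError("no known entity in the question")
    primitives: List[str] = []
    for token in tokens:
        primitive = KEYWORD_TO_PRIMITIVE.get(token)
        if primitive and primitive not in primitives:  # first-mention order, no repeats
            primitives.append(primitive)
    variables = [f"?o{n}" for n in range(1, len(primitives) + 1)]
    body = [f'  ?s :name "{entity}" .']
    body += [f"  ?s {p} {v} ." for p, v in zip(primitives, variables)]
    return "select " + " ".join(variables) + " where {\n" + "\n".join(body) + "\n}"


print(build_sparql("What is the address of Company A, and who is its legal representative?"))
```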
For a knowledge base question-answer for a company, in one embodiment, the entity name may comprise a stock name of a listed company. Accordingly, in the step S301, when performing the word segmentation process, the stock name of the listed company can be preferentially used to perform the word segmentation process on the natural language question, so as to improve the accuracy and response speed of the search of the listed company.
The reason for this approach is as follows. When querying a company, users often do not enter its full name. Simply segmenting the company name the user entered and then performing fuzzy matching therefore does not work well. For example, for "National Agrotechnology Co., Ltd.", users often type only "National Agrotechnology" in their question. A basic segmenter tends to split this into two words, "National Agro" and "Technology". "Technology" is a high-frequency word that appears in many company names, so matching performs poorly. To address this, the present application uses the stock names of listed companies as priority matching items. A stock name is often exactly the abbreviated company name that users tend to search for, so retrieval accuracy and response speed can be greatly improved.
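A minimal sketch of stock-name-first segmentation follows. It again uses word-level forward maximum matching in English as a stand-in for Chinese segmentation. At each position the stock-name dictionary is consulted before the general vocabulary, so a high-frequency word such as "technology" cannot split a stock name apart. "Acme Technology" and both dictionaries are hypothetical.

```python
import re
from typing import Container, Dict, List, Optional, Set, Tuple

# Hypothetical dictionaries; real ones would come from the custom dictionary of step S202.
STOCK_NAMES: Dict[str, str] = {"acme technology": "Acme Technology Co., Ltd."}
GENERAL_VOCAB: Set[str] = {"technology", "how many", "employees"}


def longest_match(words: List[str], i: int, vocab: Container[str], max_words: int) -> Optional[Tuple[str, int]]:
    """Return the longest phrase in `vocab` starting at words[i], with its length in words."""
    for size in range(min(max_words, len(words) - i), 0, -1):
        phrase = " ".join(words[i:i + size])
        if phrase in vocab:
            return phrase, size
    return None


def segment(question: str, stock_names: Optional[Dict[str, str]] = None) -> Tuple[List[str], List[str]]:
    """Segment a question, trying stock names before the general vocabulary at every position."""
    stock_names = stock_names or {}
    words = re.findall(r"[a-z0-9]+", question.lower())
    stock_max = max((len(name.split()) for name in stock_names), default=1)
    general_max = max(len(entry.split()) for entry in GENERAL_VOCAB)
    tokens: List[str] = []
    companies: List[str] = []
    i = 0
    while i < len(words):
        hit = longest_match(words, i, stock_names, stock_max)
        if hit is not None:
            companies.append(stock_names[hit[0]])  # resolve the abbreviation to the full name
        else:
            hit = longest_match(words, i, GENERAL_VOCAB, general_max) or (words[i], 1)
        tokens.append(hit[0])
        i += hit[1]
    return tokens, companies


question = "How many employees does Acme Technology have?"
print(segment(question))               # without stock names: "acme" and "technology" split
print(segment(question, STOCK_NAMES))  # stock names take priority
```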
In addition, the matching check in step S302 may also take preset rule inference into account. That is, in one embodiment, each natural language question template is checked for a match using the order of the tokens and the part of speech or vocabulary value of each token, in combination with preset rule inference.
Rule inference can describe relationships that attributes cannot express directly, and combining rule inference with rule templates allows complex natural language questions to be answered. Consider the natural language question "Is Company A a big company?". One can define in advance that "a company with more than 10000 employees is a big company", and define the ontology tag of "big company" as ":BigCompany". Its inverse relationship, "small company", is specified as ":SmallCompany". For the question of whether a certain company is a small company, the system can derive the "small company" tag from the definition of the "big company" tag. It then converts the question into a corresponding SPARQL query statement in combination with a rule template. The following is a simple expression of these inference rules:
rule bigCompany: (?s :employee ?m) (?m > 10000) -> (?s rdf:type :BigCompany)
rule Inverse: (?s :BigCompany ?m) -> (?m :SmallCompany ?s)
With rule inference, when the user asks "Is Company A a big company?", the question can be converted into a SPARQL query for the number of employees of Company A, for example:
    select ?o1 where {
      ?s :name "Company A" .
      ?s :employee ?o1 .
    }
After this SPARQL query retrieves the number of employees of Company A, whether Company A is a big company can be inferred from the data and the rule.
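The inference step can be sketched as a single forward-chaining pass of the bigCompany rule over (subject, predicate, object) tuples. One assumption concerns how the inverse rule is meant to be used: here, "small company" means any company with a known employee count that is not inferred to be big. The sample counts are made up.

```python
from typing import List, Set, Tuple, Union

Triple = Tuple[str, str, Union[str, int]]


def infer_big_companies(triples: List[Triple], threshold: int = 10000) -> Set[Triple]:
    """One forward-chaining pass of: (?s :employee ?m) (?m > threshold) -> (?s rdf:type :BigCompany)."""
    return {
        (s, "rdf:type", ":BigCompany")
        for s, p, o in triples
        if p == ":employee" and isinstance(o, int) and o > threshold
    }


def is_big_company(triples: List[Triple], company: str) -> bool:
    known = set(triples) | infer_big_companies(triples)
    return (company, "rdf:type", ":BigCompany") in known


def is_small_company(triples: List[Triple], company: str) -> bool:
    # Assumption: "small" means an employee count is known and the big-company rule did not fire.
    has_count = any(s == company and p == ":employee" for s, p, _ in triples)
    return has_count and not is_big_company(triples, company)


# Made-up results of the employee-count SPARQL query.
data: List[Triple] = [(":companyA", ":employee", 12000), (":companyB", ":employee", 300)]
print(is_big_company(data, ":companyA"), is_small_company(data, ":companyB"))  # True True
```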
Fig. 4 is a schematic diagram illustrating a flow of implementing a setting mode of a rule template in a question answering method according to an embodiment of the present application, where the flow includes:
step S401: collecting a plurality of phrasings of at least one type of natural language question;
step S402: analyzing the keywords in the various phrasings, and setting vocabulary matching rules according to the keywords; analyzing the entities or attributes in the various phrasings, and setting part-of-speech matching rules according to the entities or attributes;
step S403: constructing a natural language question template according to the vocabulary matching rules and the part-of-speech matching rules;
step S404: constructing a first language question template corresponding to the natural language question template, according to the natural language question template and the construction rules of the first language question template.
The embodiments of the application can use the Refo module for Python to perform object-level regular-expression matching on the words of a natural language question. Each word object matched with Refo can carry a part-of-speech value and a vocabulary value. For example, the question "Company A has how many employees?" represents a class of questions that ask how many employees a company has.
With the embodiment shown in fig. 4, possible questions of this type are first collected, such as:
Company A has how many employees?
Company A has how many staff?
What is Company A's number of employees?
……
Next, the keywords in each possible phrasing are analyzed. Here, "employee" and "staff" are both keywords, so the vocabulary matching rule can be set as: keyword = W("employee|staff"). The attributes or entities in each possible phrasing are also analyzed. Here, "company A" is the entity of the question; its actual wording varies from question to question, but its part of speech (an organization name) is fixed. Based on this analysis, the part-of-speech matching rule can be set as: company_entity = W(pos="nz").
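The two rules can be sketched as word-level predicates. This is not Refo's API: `Word`, `W` and `positions` are hypothetical stand-ins written with the standard `re` module, the English tokens are glosses of the Chinese words kept in the Chinese word order, and the part-of-speech tags other than "nz" are assumed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A segmented word: its vocabulary value and part-of-speech tag."""
    token: str
    pos: str


class W:
    """Hypothetical word predicate: `token` and `pos` are regexes matched in full."""

    def __init__(self, token: str = ".*", pos: str = ".*") -> None:
        self.token = re.compile(token)
        self.pos = re.compile(pos)

    def __call__(self, word: Word) -> bool:
        return bool(self.token.fullmatch(word.token) and self.pos.fullmatch(word.pos))


# The two rules from the text.
keyword = W("employee|staff")   # vocabulary matching rule
company_entity = W(pos="nz")    # part-of-speech matching rule


def positions(rule: W, words: list) -> list:
    """Indices of the words that satisfy a single word-level rule."""
    return [i for i, w in enumerate(words) if rule(w)]


# "A company / has / how many / employee", glossed in the Chinese word order.
question = [Word("A company", "nz"), Word("has", "v"),
            Word("how many", "m"), Word("employee", "n")]
```

Here `positions(company_entity, question)` returns `[0]` and `positions(keyword, question)` returns `[3]`: the entity rule ignores the wording and checks only the tag, while the keyword rule checks only the vocabulary value.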
A natural language question template can then be constructed from the part-of-speech matching rule and the vocabulary matching rule. For the above question, the template is constructed as follows:
{company entity .* employee/staff} -> [company_entity + .* + keyword];
The part before "->" indicates that a question the template can match must contain a company entity and the word "employee/staff", with the company entity appearing before "employee/staff". The part after "->" expresses the company entity and "employee/staff" through the part-of-speech matching rule and the vocabulary matching rule described above.
Next, the rule that generates the SPARQL query is constructed. The above [company_entity + .* + keyword] is converted into the corresponding Refo expression, for example:
W(pos="nz") + Star(Any(), greedy=False) + W("employee|staff");
This Refo expression specifies that a question matched for this SPARQL template must contain an entity whose part of speech is "nz", followed, possibly after other words, by a keyword whose vocabulary value is "employee" or "staff".
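The concatenation with `Star(Any(), greedy=False)` can be approximated by a small backtracking matcher. The sketch below is a hand-rolled illustration, not Refo itself: `STAR`, `match_at` and `search` are hypothetical names, and the returned list holds the indices of the words matched by the word-level elements.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    token: str  # vocabulary value
    pos: str    # part-of-speech tag


class W:
    """Word predicate: token and pos regexes must match the whole value."""

    def __init__(self, token=".*", pos=".*"):
        self.token, self.pos = re.compile(token), re.compile(pos)

    def __call__(self, word):
        return bool(self.token.fullmatch(word.token) and self.pos.fullmatch(word.pos))


# Stands in for Star(Any(), greedy=False): zero or more words, as few as possible.
STAR = object()


def match_at(pattern, words, p, i, hits):
    """Match pattern[p:] against words starting at index i, recording matched indices."""
    if p == len(pattern):
        return True
    element = pattern[p]
    if element is STAR:
        # Non-greedy: try consuming 0 words first, then 1, 2, ...
        return any(match_at(pattern, words, p + 1, j, hits)
                   for j in range(i, len(words) + 1))
    if i < len(words) and element(words[i]):
        hits.append(i)
        if match_at(pattern, words, p + 1, i + 1, hits):
            return True
        hits.pop()  # backtrack
    return False


def search(pattern, words):
    """Return the indices of words matched by predicate elements, or None."""
    for start in range(len(words)):
        hits = []
        if match_at(pattern, words, 0, start, hits):
            return hits
    return None


employee_pattern = [W(pos="nz"), STAR, W("employee|staff")]
```

For the glossed question `[Word("A company", "nz"), Word("has", "v"), Word("how many", "m"), Word("employee", "n")]`, `search(employee_pattern, ...)` returns `[0, 3]`; a question with no "nz" word returns `None`.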
The corresponding SPARQL template generated by this rule is:
select ?y where {
  ?x :name "company name" .
  ?x :employnum ?y .
}
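Once a question matches, the SPARQL template can be instantiated by substituting the matched entity into the company-name slot. A minimal sketch using `str.format`; the `{company_name}` slot name and the escaping helper are assumptions, and the doubled braces keep SPARQL's own braces literal.

```python
# Hypothetical instantiation of the employee-count template above.
SPARQL_TEMPLATE = (
    "select ?y where {{\n"
    '  ?x :name "{company_name}" .\n'
    "  ?x :employnum ?y .\n"
    "}}"
)


def fill_template(template: str, company_name: str) -> str:
    """Fill the company-name slot, escaping it as a SPARQL string literal."""
    # Escape backslashes first, then double quotes.
    safe = company_name.replace("\\", "\\\\").replace('"', '\\"')
    return template.format(company_name=safe)
```

`fill_template(SPARQL_TEMPLATE, "A company")` yields the template with `"A company"` in the name position and single braces around the graph pattern.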
The example above shows one rule template corresponding to one type of natural language question. In embodiments of the present application, one rule template can also correspond to multiple types of natural language questions. For example:
for the natural language question "How many companies has company A invested in?", possible phrasings of this type of question are first collected, for example:
Company A has invested in several companies?
How many companies has company A invested in?
Company A holds shares in several companies?
How many companies does company A hold shares in?
……
Next, the relations and keywords in each possible phrasing are analyzed. Here, "invest/hold shares" is a relation name, and "how many" and "several" are keywords, so the vocabulary matching rule can be set as keyword = W("how many|several"). The attributes or entities in each possible phrasing are also analyzed. Here, "company A" is the entity of the question; its actual wording varies, but its part of speech (an organization name) is fixed. Based on this analysis, the part-of-speech matching rule can be set as: company_entity = W(pos="nz").
A natural language question template can then be constructed from the relation, the part-of-speech matching rule and the vocabulary matching rule. For the above question, the template is constructed as follows:
{company entity .* invest/hold shares .* how many/several} -> [company_entity + .* + invest/hold shares + .* + keyword];
Next, the rule that generates the SPARQL query is constructed. The above [company_entity + .* + invest/hold shares + .* + keyword] is converted into the corresponding Refo expression, for example:
W(pos="nz") + Star(Any(), greedy=False) + W("invest|hold shares") + Star(Any(), greedy=False) + W("how many|several");
This Refo expression specifies that a matching question must contain, in this order, an entity whose part of speech is "nz", the relation named "invest/hold shares", and a keyword whose vocabulary value is "how many" or "several".
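The same word-level pattern can also be approximated with Python's standard `re` module by encoding each word as a `<token/pos>` unit. This is a swapped-in technique for illustration, not how Refo works internally; `encode`, `INVEST_COUNT` and `match_invest_count` are hypothetical names, and tokens are assumed to contain no `<`, `>` or `/`.

```python
import re


def encode(words):
    """Encode (token, pos) pairs as '<token/pos>' units.

    Assumes tokens and tags contain no '<', '>' or '/'.
    """
    return "".join(f"<{token}/{pos}>" for token, pos in words)


# Star(Any(), greedy=False): zero or more whole word units, as few as possible.
ANY_WORDS = r"(?:<[^<>]*>)*?"

INVEST_COUNT = re.compile(
    r"<(?P<entity>[^<>/]+)/nz>"                       # W(pos="nz")
    + ANY_WORDS
    + r"<(?P<relation>invest|hold shares)/[^<>]+>"    # W("invest|hold shares")
    + ANY_WORDS
    + r"<(?:how many|several)/[^<>]+>"                # W("how many|several")
)


def match_invest_count(words):
    """Return (entity, relation) if the question fits the pattern, else None."""
    m = INVEST_COUNT.search(encode(words))
    return (m.group("entity"), m.group("relation")) if m else None
```

Because every unit starts with `<`, a relation such as "invest" only matches a whole word, never a substring of another token.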
The corresponding SPARQL template generated by this rule is:
select (count(distinct ?y) as ?count) where {
  ?x :name "company name" .
  ?x :invest ?y .
}
In the above templates, the relation name (i.e., "invest/hold shares") in the natural language question template and in the corresponding SPARQL template can be replaced with other relation names, namely:
after the replacement, the natural language question template is: {company entity .* relation .* how many/several} -> [company_entity + .* + relation + .* + keyword];
The corresponding SPARQL template is:
select (count(distinct ?y) as ?count) where {
  ?x :name "company name" .
  ?x :relation_name ?y .
}
In this way, a single natural language question template can match multiple types of natural language questions, namely all questions that count the other entities having a certain relationship with a company entity. The template does not restrict which relationship those other entities have with the company entity.
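Filling the relation slot of the generalized template can be sketched as follows. The `RELATION_PROPERTIES` mapping from relation words to ontology properties is hypothetical, and restricting property names to Python identifiers is a conservative assumption that keeps arbitrary text out of the query.

```python
# Hypothetical mapping from relation words in questions to ontology properties.
# "invest" and "hold shares" share one property, mirroring "invest/hold shares".
RELATION_PROPERTIES = {"invest": "invest", "hold shares": "invest"}


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_count_query(company_name: str, relation: str) -> str:
    """Instantiate the generalized counting template for one relation property."""
    if not relation.isidentifier():
        # Conservative check (assumption): only identifier-like names are spliced in.
        raise ValueError(f"unsupported relation name: {relation!r}")
    return (
        "select (count(distinct ?y) as ?count) where {\n"
        f"  ?x :name {sparql_string(company_name)} .\n"
        f"  ?x :{relation} ?y .\n"
        "}"
    )


def query_for(company_name: str, relation_word: str) -> str:
    """Map the relation word found in the question to its property, then build the query."""
    return build_count_query(company_name, RELATION_PROPERTIES[relation_word])
```

Supporting a new relation then only requires a new dictionary entry, not a new template, which is the point of the generalized template.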
With the preset rule templates, the input natural language question is segmented into words, and the order of the resulting words, together with the part of speech or vocabulary value of each word, is compared with the content of each natural language question template in the rule templates. If a natural language question template is hit, the corresponding SPARQL query can be generated from the SPARQL template that corresponds to that natural language question template.
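The comparison loop can be sketched over already-segmented `(token, pos)` pairs. Because every gap in the templates above is `Star(Any(), greedy=False)`, each template reduces to an ordered subsequence test, for which taking the earliest matching word per slot is sufficient. `RuleTemplate`, `TEMPLATES` and `to_sparql` are hypothetical names, and entity names are assumed to come from the user-defined dictionary, so they are not escaped here.

```python
import re


class RuleTemplate:
    """Hypothetical rule template: ordered word slots plus a SPARQL template.

    Each slot is (name, token_regex, pos_regex); slots may be separated by any
    number of other words, as Star(Any(), greedy=False) allows.
    """

    def __init__(self, slots, sparql):
        self.slots = [(name, re.compile(tok), re.compile(pos)) for name, tok, pos in slots]
        self.sparql = sparql

    def match(self, words):
        """Return {slot name: token} if the slots occur in order, else None."""
        captured, i = {}, 0
        for name, tok, pos in self.slots:
            # Skip words until one satisfies this slot.
            while i < len(words) and not (tok.fullmatch(words[i][0])
                                          and pos.fullmatch(words[i][1])):
                i += 1
            if i == len(words):
                return None
            if name is not None:
                captured[name] = words[i][0]
            i += 1
        return captured


TEMPLATES = [
    RuleTemplate([("company", ".*", "nz"), (None, "employee|staff", ".*")],
                 'select ?y where {{ ?x :name "{company}" . ?x :employnum ?y . }}'),
    RuleTemplate([("company", ".*", "nz"), (None, "invest|hold shares", ".*"),
                  (None, "how many|several", ".*")],
                 'select (count(distinct ?y) as ?count) where '
                 '{{ ?x :name "{company}" . ?x :invest ?y . }}'),
]


def to_sparql(words, templates=TEMPLATES):
    """Try each template in turn; return the first generated query or None."""
    for template in templates:
        slots = template.match(words)
        if slots is not None:
            return template.sparql.format(**slots)
    return None  # no template hit
```

A `None` result marks a question that no template covers.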
In addition, because querying based on rule templates is limited to what the rule templates define, embodiments of the present application may further include the following, so that the rule templates of the system are easier to maintain and update: counting the natural language questions that cannot be matched by any natural language question template, and updating and maintaining the rule templates according to the statistics. Specifically, high-frequency questions not covered by the existing natural language question templates can be recorded, so that questions whose occurrence frequency exceeds a preset threshold can later be added to the corresponding rule templates.
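The statistics step can be sketched with `collections.Counter`. `UnmatchedLog` is a hypothetical name, the threshold value is deployment-specific, and the normalization (trimming, dropping trailing ASCII or full-width question marks, lowercasing) is an assumption so that trivial variants are counted together.

```python
from collections import Counter


class UnmatchedLog:
    """Counts natural language questions that matched no question template."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # preset frequency threshold (value is an assumption)
        self.counts = Counter()

    def record(self, question: str) -> None:
        # Assumed normalization: trim whitespace and trailing question marks, lowercase.
        key = question.strip().rstrip("?？").lower()
        self.counts[key] += 1

    def candidates(self):
        """Questions whose frequency exceeds the threshold, most frequent first."""
        return [(q, n) for q, n in self.counts.most_common() if n > self.threshold]
```

With a threshold of 2, a question recorded three times is returned by `candidates()` while a one-off question is not, giving maintainers a ranked list of templates to add.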
Fig. 5 is a flowchart of the overall implementation of the question answering method according to an embodiment of the present application. The flow includes:
Step S501: obtain and preprocess the data to obtain structured entity data.
Step S502: define an ontology description of the knowledge base according to the structured entity data. After the ontology description is defined, steps S503, S504 and S505 can each be performed.
Step S503: convert the structured entity data into RDF-format data according to the ontology description.
Step S504: configure the SPARQL primitives according to the ontology description.
Step S505: define the rule templates and rule reasoning according to the ontology description.
Step S506: convert the natural language question into a SPARQL query according to the rule templates and rule reasoning defined in step S505 and the SPARQL primitives configured in step S504.
Step S507: use the SPARQL query obtained in step S506 to query the RDF-format data obtained in step S503, thereby performing the search for the natural language question.
In actual use, the method may further include step S508: update and maintain the rule templates. This makes the rule templates more robust and keeps the query interface concise.
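Steps S503 and S504 can be sketched together: structured records are serialized as RDF in N-Triples form, keeping only the attributes defined by the ontology description, and each attribute's annotated meaning becomes a SPARQL primitive. The ontology content, the `http://example.org/kb/` namespace and all function names are hypothetical; the rdf:type and xsd:integer IRIs are the standard W3C ones.

```python
# Hypothetical ontology description from step S502: the class name plus each
# attribute annotated with its meaning.
ONTOLOGY = {
    "class": "Company",
    "attributes": {"name": "company name", "employee": "number of employees"},
}
BASE = "http://example.org/kb/"  # placeholder namespace (assumption)
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"


def literal(value) -> str:
    """Serialize a value as an N-Triples literal (integers typed as xsd:integer)."""
    if type(value) is int:
        return f'"{value}"^^{XSD_INTEGER}'
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def to_ntriples(entity_id: str, record: dict, ontology: dict) -> list:
    """Step S503 sketch: one entity record -> N-Triples lines, ontology attributes only."""
    subject = f"<{BASE}{entity_id}>"
    triples = [f"{subject} {RDF_TYPE} <{BASE}{ontology['class']}> ."]
    for attr, value in record.items():
        if attr in ontology["attributes"]:
            triples.append(f"{subject} <{BASE}{attr}> {literal(value)} .")
    return triples


def sparql_primitives(ontology: dict) -> dict:
    """Step S504 sketch: map each attribute's annotated meaning to its property name."""
    return {meaning: attr for attr, meaning in ontology["attributes"].items()}
```

Attributes absent from the ontology (for example a free-form `note` field) are dropped, so the RDF data and the primitives stay consistent with one ontology description.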
The method provided by the embodiments of the present application can be applied to knowledge base question answering in any domain, such as companies (especially listed companies), literature and movies. For constructing the ontology description, the corresponding ontology description can be identified and generated automatically from the RDF data stored in a graph database, so that data in various formats such as JSON and SQL can be described automatically. By annotating attributes with their meanings, basic attribute retrieval can be achieved for all entities, which saves the manual effort of defining part of the rule templates. For knowledge base question answering in the enterprise domain, the initial data used in the embodiments may include relevant data such as a company's basic business registration information, patent information, lawsuits and public opinion. Question answering over such data is more flexible and practical and supports complex queries such as filtering and aggregation. In the SPARQL generation part, basic SPARQL primitives are defined in advance, so query statements for simple select queries can be spliced directly, reducing the need to write a rule template for each question. The stock names of listed companies are given priority during word segmentation, which can effectively improve the accuracy of retrieval and querying. The method is highly general: simple entity attribute retrieval can be achieved merely by changing the original data, and a flexible and efficient knowledge base question answering system can be built with the help of user-defined rule reasoning and rule templates.
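Splicing a simple select query directly from primitives, without any rule template, can be sketched as below. The primitive meanings, the property names `regCapital`/`legalPerson` and the entity set are hypothetical examples of basic business information, not values from the original data.

```python
# Hypothetical primitives: annotated attribute meaning -> ontology property name.
PRIMITIVES = {"registered capital": "regCapital", "legal representative": "legalPerson"}
# Hypothetical entity names loaded from the user-defined dictionary.
ENTITY_NAMES = {"A company", "B company"}


def splice_simple_select(question: str):
    """Splice a simple select query from one entity name and one primitive, if both occur."""
    # Prefer longer entity names so a name is not shadowed by a shorter one it contains.
    entity = next((e for e in sorted(ENTITY_NAMES, key=len, reverse=True) if e in question), None)
    prop = next((p for meaning, p in PRIMITIVES.items() if meaning in question), None)
    if entity is None or prop is None:
        return None  # not a simple attribute lookup: fall back to the rule templates
    return (
        "select ?v where {\n"
        f'  ?s :name "{entity}" .\n'
        f"  ?s :{prop} ?v .\n"
        "}"
    )
```

Only questions that need more than one entity-attribute lookup (counting, filtering, aggregation) then have to go through a rule template.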
An embodiment of the present application further provides a question answering device. Fig. 6 is a schematic structural diagram of a question answering device according to an embodiment of the present application; the device includes:
a matching module 601, configured to determine a natural language question template matching a natural language question, where the natural language question template is capable of matching at least one type of natural language question;
a searching module 602, configured to look up the first language question template corresponding to the natural language question template according to the correspondence between natural language question templates and first language question templates;
a generating module 603, configured to generate a first language question corresponding to the natural language question by using the first language question template.
In one embodiment, the first language question template is a SPARQL language question template and the first language question is a question described in SPARQL language.
Fig. 7 is a schematic structural diagram of a second question answering device according to an embodiment of the present application, and as shown in fig. 7, the device includes: a matching module 601, a searching module 602, a generating module 603 and a setting module 704; the matching module 601, the searching module 602, and the generating module 603 have the same functions as the corresponding modules in the above embodiments, and are not described again.
A setting module 704, configured to set a plurality of rule templates, where each rule template includes a corresponding relationship between a natural language question template and a first language question template.
In one embodiment, as shown in fig. 7, the apparatus further comprises:
an obtaining module 705, configured to obtain initial data;
a structuring module 706, configured to structure the initial data to obtain structured entity data, and to store the entity names in the structured entity data in the format of a user-defined dictionary;
an ontology description definition module 707, configured to define an ontology description of the knowledge base according to the structured entity data, and to annotate each attribute of the ontology description with its meaning;
a format conversion module 708, configured to convert the structured entity data into data in the Resource Description Framework (RDF) format according to the ontology description.
In one embodiment, the above apparatus further comprises:
a reply module 709, configured to query the RDF-format data using the first language question to obtain an answer to the natural language question.
In one embodiment, the above apparatus further comprises:
a primitive creation module 710, configured to extract the attributes of the ontology description and use the extracted attributes as SPARQL primitives;
the matching module 601 is configured to: perform word segmentation on the natural language question according to the SPARQL primitives and the entity names to obtain a plurality of segmented words; perform matching detection on each natural language question template by using the order of the segmented words and the part of speech or vocabulary value of each segmented word, and determine the natural language question template that matches the natural language question; wherein the natural language question template comprises a plurality of segmented words in a fixed order and the part of speech or vocabulary value of each segmented word.
In one embodiment, the entity names include the stock names of listed companies; when performing word segmentation, the matching module 601 gives the stock names of listed companies priority when segmenting the natural language question.
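Priority segmentation can be sketched as forward maximum matching in which the stock-name dictionary is consulted before the general dictionary. This is one common dictionary-based segmentation technique chosen for illustration; the patent does not name a segmenter. The dictionaries are hypothetical: 平安银行 is used as an example stock name (Ping An Bank), and the other entries are 平安 (safe), 银行 (bank), 员工 (employee), 多少 (how many) and 银行员工 (bank employee).

```python
def segment(text, stock_names, general_words, max_len=8):
    """Forward maximum matching that tries stock names before the general dictionary."""
    tokens, i = [], 0
    while i < len(text):
        piece = None
        for vocab in (stock_names, general_words):  # priority order
            # Longest candidate first within each dictionary.
            for size in range(min(max_len, len(text) - i), 0, -1):
                if text[i:i + size] in vocab:
                    piece = text[i:i + size]
                    break
            if piece:
                break
        piece = piece or text[i]  # unknown character: emit it on its own
        tokens.append(piece)
        i += len(piece)
    return tokens


STOCKS = {"平安银行"}
GENERAL = {"平安", "银行", "员工", "多少", "银行员工"}
```

On 平安银行员工多少 ("how many employees does Ping An Bank have"), `segment(..., STOCKS, GENERAL)` keeps the stock name whole and returns `["平安银行", "员工", "多少"]`, whereas with an empty stock dictionary the general dictionary splits it as `["平安", "银行员工", "多少"]` and the company entity is lost.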
For the functions of the modules in the devices of the embodiments of the present application, refer to the corresponding descriptions of the method above; they are not repeated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 8 is a block diagram of an electronic device for implementing the question answering method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in Fig. 8, the electronic device includes one or more processors 801, a memory 802, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a graphical user interface (GUI) on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 8 takes one processor 801 as an example.
The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the question answering method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the question-answering method provided by the present application.
The memory 802, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the matching module 601, the search module 602, and the generation module 603 shown in fig. 6) corresponding to the question answering method in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the question-answering method in the above-described method embodiments.
The memory 802 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the electronic device for the question answering method, and the like. Further, the memory 802 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 may optionally include memory located remotely from the processor 801, which may be connected to the electronic device for the question answering method over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the question answering method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means, and are exemplified by a bus in fig. 8.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the question answering method; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or other input devices. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solutions of the embodiments of the present application, one natural language question template can match at least one type of natural language question, which avoids writing a separate template for each natural language question; business retrieval needs can be met without writing a large number of templates, saving labor and time. Using the SPARQL language question template, the corresponding SPARQL query can be generated, which facilitates querying and retrieval. Converting the initial data into RDF-format data makes it convenient to query and retrieve with SPARQL queries. Segmenting the natural language question using the SPARQL primitives and the entity names, and using the resulting segmented words for matching detection against the natural language question templates, can improve query accuracy. Giving priority to the stock names of listed companies when segmenting the natural language question can improve the accuracy of queries against a company knowledge base.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A question-answering method, comprising:
acquiring initial data;
performing structuring processing on the initial data to obtain structured entity data, and storing the entity names in the structured entity data in the format of a user-defined dictionary;
defining an ontology description of a knowledge base according to the structured entity data, and annotating each attribute of the ontology description with the meaning of that attribute;
converting the structured entity data into data in a resource description framework format according to the ontology description;
extracting the attributes of the ontology description, and taking the extracted attributes as SPARQL primitives;
performing word segmentation on a natural language question according to the SPARQL primitives and the entity names to obtain a plurality of segmented words;
performing matching detection on each natural language question template by using the order of the plurality of segmented words and the part of speech or vocabulary value of each segmented word, and determining the natural language question template that matches the natural language question; wherein the natural language question template comprises a plurality of segmented words in a fixed order and the part of speech or vocabulary value of each segmented word, and can match at least one type of natural language question;
searching a first language question template corresponding to the natural language question template according to the corresponding relation between the natural language question template and the first language question template;
generating a first language question corresponding to the natural language question by using the first language question template, wherein the first language question template is a SPARQL language question template, and the first language question is a question described by using a SPARQL language;
the natural language question template is constructed by: collecting a plurality of expressions of at least one type of natural language question; analyzing the keywords in the plurality of expressions, and setting a vocabulary matching rule according to the keywords; analyzing the entities or attributes in the plurality of expressions, and setting a part-of-speech matching rule according to the entities or attributes; and constructing the natural language question template according to the vocabulary matching rule and the part-of-speech matching rule.
2. The method of claim 1, further comprising, before determining the natural language question template that matches the natural language question:
setting a plurality of rule templates, wherein each rule template comprises a correspondence between a natural language question template and a first language question template.
3. The method of claim 2, wherein the rule template is set by:
constructing a first language question template corresponding to the natural language question template according to the natural language question template and a construction rule of the first language question template.
4. The method of claim 2, further comprising:
counting the natural language questions that cannot be matched by any natural language question template, and updating and maintaining the rule templates according to the counting result.
5. The method of claim 1, further comprising:
searching the data in the resource description framework format using the first language question to obtain an answer to the natural language question.
6. The method of claim 1, wherein performing matching detection on each natural language question template by using the order of the plurality of segmented words and the part of speech or vocabulary value of each segmented word comprises:
performing, in combination with preset rule reasoning, matching detection on each natural language question template by using the order of the plurality of segmented words and the part of speech or vocabulary value of each segmented word.
7. A question answering device, comprising:
the acquisition module is used for acquiring initial data;
the structuring module is used for structuring the initial data to obtain structured entity data, and storing the entity names in the structured entity data in the format of a user-defined dictionary;
the ontology description definition module is used for defining the ontology description of the knowledge base according to the structured entity data, and annotating each attribute of the ontology description with its meaning;
the format conversion module is used for converting the structured entity data into data in a resource description framework format according to the ontology description;
the primitive creation module is used for extracting the attributes of the ontology description and taking the extracted attributes as SPARQL primitives;
the matching module is used for performing word segmentation on the natural language question according to the SPARQL primitives and the entity names to obtain a plurality of segmented words; performing matching detection on each natural language question template by using the order of the plurality of segmented words and the part of speech or vocabulary value of each segmented word, and determining the natural language question template that matches the natural language question; wherein the natural language question template comprises a plurality of segmented words in a fixed order and the part of speech or vocabulary value of each segmented word, can match at least one type of natural language question, and is constructed by: collecting a plurality of expressions of at least one type of natural language question; analyzing the keywords in the plurality of expressions, and setting a vocabulary matching rule according to the keywords; analyzing the entities or attributes in the plurality of expressions, and setting a part-of-speech matching rule according to the entities or attributes; and constructing the natural language question template according to the vocabulary matching rule and the part-of-speech matching rule;
the search module is used for searching a first language question template corresponding to a natural language question template according to the corresponding relation between the natural language question template and the first language question template, wherein the first language question template is a SPARQL language question template, and the first language question is a question described by a SPARQL language;
and the generating module is used for generating the first language question corresponding to the natural language question by adopting the first language question template.
8. The apparatus of claim 7, wherein the first language question template is a SPARQL language question template and the first language question is a question described in SPARQL language.
9. The apparatus of claim 8, further comprising:
the setting module is used for setting a plurality of rule templates, each rule template comprising a correspondence between a natural language question template and a first language question template.
10. The apparatus of claim 7, further comprising:
the reply module is used for searching the data in the resource description framework format using the first language question to obtain an answer to the natural language question.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910950761.6A CN110717025B (en) 2019-10-08 2019-10-08 Question answering method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110717025A CN110717025A (en) 2020-01-21
CN110717025B CN110717025B (en) 2022-08-12

Family

ID=69212208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910950761.6A Active CN110717025B (en) 2019-10-08 2019-10-08 Question answering method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110717025B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256853A (en) * 2020-10-30 2021-01-22 深圳壹账通智能科技有限公司 Question generation method, device, equipment and computer readable storage medium
CN113535987B (en) * 2021-09-13 2022-01-21 杭州涂鸦信息技术有限公司 Linkage rule matching method and related device
CN114428788B (en) * 2022-01-28 2024-08-13 腾讯科技(深圳)有限公司 Natural language processing method, device, equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108959358A (en) * 2018-05-14 2018-12-07 北京大学 An end-user listening data access method and system based on an ontology model
CN110147436A (en) * 2019-03-18 2019-08-20 清华大学 A hybrid automatic question-answering method based on a pedagogical knowledge graph and text

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US8996555B2 (en) * 2012-11-26 2015-03-31 Sap Se Question answering framework for structured query languages
CN105868313B (en) * 2016-03-25 2019-02-12 浙江大学 A knowledge graph question answering system and method based on template matching technology
CN109033063B (en) * 2017-06-09 2022-02-25 微软技术许可有限责任公司 Machine inference method based on knowledge graph, electronic device and computer readable storage medium
CN109710737B (en) * 2018-12-21 2021-01-22 神思电子技术股份有限公司 Intelligent reasoning method based on structured query

Also Published As

Publication number Publication date
CN110717025A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
JP7223785B2 (en) TIME-SERIES KNOWLEDGE GRAPH GENERATION METHOD, APPARATUS, DEVICE AND MEDIUM
US20230142217A1 (en) Model Training Method, Electronic Device, And Storage Medium
KR102694765B1 (en) Event extraction method, device, electronic equipment and storage medium
CN111709247B (en) Data set processing method and device, electronic equipment and storage medium
US20210216717A1 (en) Method, electronic device and storage medium for generating information
US11775859B2 (en) Generating feature vectors from RDF graphs
US10332012B2 (en) Knowledge driven solution inference
EP3671526B1 (en) Dependency graph based natural language processing
CN110717025B (en) Question answering method and device, electronic equipment and storage medium
KR102485129B1 (en) Method and apparatus for pushing information, device and storage medium
CN109947921B (en) Intelligent question-answering system based on natural language processing
CN111611468B (en) Page interaction method and device and electronic equipment
US20200356726A1 (en) Dependency graph based natural language processing
CN110555205B (en) Negative semantic recognition method and device, electronic equipment and storage medium
US20220129448A1 (en) Intelligent dialogue method and apparatus, and storage medium
CN113220836A (en) Training method and device of sequence labeling model, electronic equipment and storage medium
CN111767334B (en) Information extraction method, device, electronic equipment and storage medium
US20210209112A1 (en) Text query method and apparatus, device and storage medium
Miao et al. A dynamic financial knowledge graph based on reinforcement learning and transfer learning
US20230127654A1 (en) Method for knowledge answering, and method for generating knowledge answering system
Barbieri et al. A natural language querying interface for process mining
CN110795456B (en) Map query method and device, computer equipment and storage medium
EP4120101A1 (en) Concepts and link discovery system
CN116226478B (en) Information processing method, model training method, device, equipment and storage medium
CN113221566B (en) Entity relation extraction method, entity relation extraction device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant