WO2017010652A1

WO2017010652A1 - Automatic question and answer method and device therefor

Info

Publication number: WO2017010652A1
Application number: PCT/KR2016/002275
Authority: WO
Inventors: 이근배; 박선영; 김병수; 심효섭; 한상도
Original assignee: 포항공과대학교 산학협력단
Priority date: 2015-07-15
Filing date: 2016-03-08
Publication date: 2017-01-19
Also published as: KR101678787B1

Abstract

An automatic question and answer device and a method therefor are disclosed. The automatic question and answer device comprises: a semantic parsing module for generating a first question sentence expressed in a formal language from an inputted natural language question sentence, and extracting, from a database, a first answer sentence to the first question sentence; and a question pattern template module for generating a second question sentence by applying, to a natural language question sentence, a question template included in a question pattern to which the natural language question sentence corresponds, among predetermined question patterns, and extracting, from the database, a second answer to the second question sentence. Accordingly, a user's request for information can be identified and an answer having high suitability can be outputted despite variations such as a change in word order or a replacement of a word in a natural language question sentence.

Description

Automatic query response method and device

The present invention relates to an automatic query response method and apparatus, and more particularly, to an automatic query response method and apparatus that identify an information request from a natural language query sentence and extract information suited to that request from a knowledge-base-type database.

Recently, large knowledge bases such as Freebase and DBpedia have been released through community activities. A knowledge base is composed of triples of the form <object, relationship, entity>, each of which is an atomic unit of fragmentary knowledge. These triples can be used as a resource for meeting a user's information needs.
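As an illustration only, the triple structure described above can be sketched as a list of tuples. The names `triples` and `lookup`, the sample facts, and the (subject, relation, object) field order are assumptions made for this sketch and do not come from the disclosure.

```python
# Hypothetical in-memory knowledge base: each triple is one atomic fact.
triples = [
    ("Abraham_Lincoln", "spouse", "Mary_Todd_Lincoln"),
    ("Abraham_Lincoln", "birthPlace", "Hodgenville"),
]


def lookup(subject, relation):
    """Return every value stored for the given subject and relation."""
    return [o for s, r, o in triples if s == subject and r == relation]


spouses = lookup("Abraham_Lincoln", "spouse")
```

An information need such as "who is the wife of abraham lincoln?" then reduces to finding the right subject and relation to pass to such a lookup.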

The source of information in automatic query response methods based on conventional information retrieval technology is large text collections. Because paragraphs retrieved from these texts are provided as the response to an information request, an information-retrieval-based method can broaden the range of requests it can handle simply by adding text, but the accuracy of its responses is relatively low.

In contrast, the knowledgebase-based automatic query response method is relatively accurate because a suitable response is retrieved from a highly structured knowledgebase. However, since only knowledge that is embedded directly in a knowledge base by a person is the target, the range of the knowledge base-based automatic query response method is relatively narrow. Because of these characteristics, knowledge base based automatic query response and information retrieval based automatic query response can complement each other.

A knowledge-base-based automatic query response system must extract from the knowledge base the data that meets the information need expressed in the user's query. To do this, it must correctly identify the user's intended information request from the natural language query sentence and generate a formal language query based on it.

However, even when a knowledge-base-based automatic query response method is used, the user's information request may fail to be identified from the input natural language query sentence because of the nature of that sentence. This error may occur when a natural language query makes a relatively simple request for information.

An object of the present invention for solving the above problems is to provide an automatic query response method and apparatus that identify an information request from a natural language query sentence and extract information suited to that request from a knowledge-base-type database.

According to an embodiment of the present invention for achieving the above object, an automatic query response method performed in an automatic query response device comprises: dividing an input natural language query sentence into one or more phrases; converting a word included in each of the phrases into a formal language; generating a first query sentence by combining the phrases converted into the formal language according to a predefined grammar relating to the formal language; extracting a first response sentence for the first query sentence from a database composed of a plurality of query sentence-response sentence pairs expressed in the formal language; extracting a query pattern corresponding to the natural language query sentence from predefined query patterns, based on a database composed of a plurality of sample query sentences; generating a second query sentence by applying a template corresponding to the extracted query pattern to the natural language query sentence; and extracting a second response sentence for the second query sentence.

Here, the automatic query response method may further include primarily displaying the first response sentence and additionally displaying the second response sentence.

The dividing into phrases may include: dividing the natural language query sentence into word units; and generating the phrases by combining the words, wherein a word of the original sentence may be omitted in the combining process.

Here, the formal language may express the natural language query sentence in a formalized structure that is not sensitive to word order or vocabulary changes.

Here, in the converting to the formal language, the word may be converted into an attribute and an entity name of the formal language.

The generating of the first query sentence may include: generating one or more query sentence candidates expressed in the formal language; and selecting, as the first query sentence, the query sentence candidate having the highest sum of similarities, evaluated from co-occurrence information, of the formal language vocabulary included in the query sentence candidate.

In the selecting of the first query sentence, in order to evaluate the query sentence candidate, a candidate evaluation model trained based on a database composed of pairs of natural language query sentence-correct query sentences may be used.

In the extracting of the query pattern, the query pattern may be extracted in consideration of at least one of: whether the natural language query sentence includes a predefined surface expression; whether a chunk includes a predefined vocabulary item when the natural language query sentence is analyzed into chunks; the number of chunks; and the types of the chunks.

The template may include: a slot information template for extracting slot information about a formal language corresponding to the natural language query sentence; and a query template for converting the natural language query sentence into the second query sentence using the slot information.

According to another embodiment of the present invention, an automatic query response apparatus includes: a semantic parsing module that generates a first query sentence expressed in a formal language from an input natural language query sentence and extracts a first response sentence for the first query sentence from a database composed of a plurality of query sentence-response sentence pairs expressed in the formal language; and a query pattern template module that generates a second query sentence by applying, to the natural language query sentence, a query template included in the query pattern that corresponds to the natural language query sentence among predefined query patterns, and extracts a second response to the second query sentence from the database.

The semantic parsing module may include: a parser that divides the input natural language query sentence into word units and combines the words to generate one or more phrases, and that may omit words of the original sentence in the recombination process; a candidate generation module that converts the phrases into formal language phrases expressed in a formal language and generates one or more query sentence candidates by combining the formal language phrases based on a predefined grammar relating to the formal language; a candidate evaluation module that selects, as the first query sentence, the query sentence candidate having the highest sum of similarities, evaluated from co-occurrence information, of the formal language vocabulary included in the query sentence candidate; and an output module that extracts a first response sentence for the first query sentence.

The query pattern template module may include: a pattern extraction module configured to extract a query pattern corresponding to the natural language query sentence from predefined query patterns; a template application module for generating a second query sentence by applying a template included in the query pattern to the natural language query sentence; and an output module for extracting a second response to the second query sentence.

Here, the automatic query response device may further include a display unit that primarily displays the first response sentence and additionally displays the second response sentence.

Here, the candidate generation module may convert the word into an attribute and an entity name of a formal language.

Here, the output module may convert the first query sentence or the second query sentence into a query that conforms to SPARQL, a standard for querying the database composed of a plurality of query sentence-response sentence pairs expressed in the formal language.

Here, the candidate evaluation module may use a candidate evaluation model trained on a database composed of pairs of natural language query sentences and correct query sentences to evaluate the query sentence candidates.

Here, the pattern extraction module may extract the query pattern in consideration of at least one of: whether the natural language query sentence includes a predefined surface expression; whether a chunk includes a predefined vocabulary item when the natural language query sentence is analyzed into chunks; the number of chunks; and the types of the chunks.

According to the present invention, a highly suitable response can be output by identifying the user's information request even when the natural language query sentence varies, for example through a change in word order or the replacement of a word.

FIG. 1 is a block diagram of an automatic query response device according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating an automatic query response method according to an embodiment of the present invention.

As the present invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the present invention to specific embodiments, it should be understood to include all modifications, equivalents, and substitutes included in the spirit and scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component. The term and / or includes a combination of a plurality of related items or any item of a plurality of related items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between. On the other hand, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that there is no other component in between.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, and it is to be understood that they do not exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be construed as having meanings consistent with their meanings in the context of the related art, and shall not be construed in an idealized or excessively formal sense unless expressly so defined in this application.

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings. In the following description of the present invention, the same reference numerals are used for the same elements in the drawings, and redundant descriptions of the same elements will be omitted.

Referring to FIG. 1, an automatic query response system according to an exemplary embodiment of the present invention includes an automatic query response device 100, a database 200, a candidate evaluation model trainer 310, and a candidate evaluation model 320.

Here, the database 200 may include a database composed of pairs of query sentences and response sentences expressed in a formal language, a database composed of a plurality of sample query sentences, a database composed of pairs of natural language query sentences and correct query sentences, and dictionary databases such as a natural language-attribute dictionary and an entity name dictionary. In addition, the database 200 according to the embodiment of the present invention is a database in the form of a knowledge base.

The automatic query response apparatus 100 according to the embodiment of the present invention includes a semantic parsing module 110 and a query pattern template module 120.

The semantic parsing module 110 may generate a first query sentence expressed in a formal language from an input natural language query sentence, and may extract a first response sentence for the first query sentence from a database composed of a plurality of query sentence-response sentence pairs expressed in the formal language.

The query pattern template module 120 may generate a second query sentence by applying, to the natural language query sentence, the query template included in the query pattern that corresponds to the natural language query sentence among the predefined query patterns, and may extract a second response to the second query sentence from the database 200.

Here, the process of extracting the first response sentence by the semantic parsing module 110 and the process of extracting the second response sentence by the query pattern template module 120 may occur simultaneously or sequentially regardless of the order. In addition, the automatic query response apparatus 100 according to an embodiment of the present invention may simultaneously display or sequentially display the first response sentence and the second response sentence to the user using a display unit (not shown).

The automatic query response apparatus 100 may display a second response sentence that is distinguished from the first response sentence for the input natural language query sentence. Accordingly, the user may treat the first response sentence as the response to the natural language query sentence, and may check and refer to the second response sentence if the first response sentence is not suitable. Therefore, from the user's point of view, the query pattern template module 120 plays the supplementary role of extracting a second response to the natural language query sentence.

The semantic parsing module 110 may include a parser 111, a candidate generation module 112, a candidate evaluation module 113, and an output module 114.

The semantic parsing module 110 may derive a formal semantic expression from a natural language query sentence. The formal semantic expression of a natural language query sentence may be represented in a formal language. Here, the formal language may express a natural language query sentence in a formal structure that is not sensitive to changes in word order or vocabulary. Accordingly, the semantic parsing module 110 may generate the first query sentence expressed in the formal language from the natural language query sentence. Specifically, the semantic parsing module 110 generates the first query sentence from the input natural language query sentence and extracts the first response to the first query sentence from a database composed of a plurality of query sentence-response sentence pairs expressed in the formal language.

The parser 111 divides the input natural language query sentence into word units and recombines the words to generate one or more phrases. In this process, the parser 111 may omit a word included in the natural language query sentence. That is, among the words constituting the natural language query sentence, words that do not significantly affect its meaning may be omitted. Here, a phrase refers to a sequence of words taken from the natural language query sentence. In generating a phrase, the order in which the words appear in the natural language query sentence must be maintained. For example, if the natural language query sentence is "who is the wife of abraham lincoln?", phrases such as "who", "is", ..., "who is" may be generated.
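The phrase generation described above can be sketched roughly as follows. This is a minimal illustration, not the parser of the disclosure: the function name `generate_phrases`, the `max_skip` parameter, and the choice to allow omission only of interior words of a span are all assumptions.

```python
from itertools import combinations


def generate_phrases(sentence, max_skip=1):
    """Enumerate candidate phrases from a query sentence.

    A phrase is an ordered sequence of words from the sentence. Up to
    `max_skip` interior words of a span may be omitted (an assumption
    used here to model word omission), but word order never changes.
    """
    words = sentence.lower().rstrip("?").split()
    phrases = set()
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            span = words[start:end]
            phrases.add(" ".join(span))
            # Optionally drop up to max_skip interior words of the span.
            interior = range(1, len(span) - 1)
            for k in range(1, max_skip + 1):
                for dropped in combinations(interior, k):
                    kept = [w for i, w in enumerate(span) if i not in dropped]
                    phrases.add(" ".join(kept))
    return phrases


phrases = generate_phrases("who is the wife of abraham lincoln?")
```

With the example sentence, the result contains "who", "is", and "who is", as well as a phrase such as "wife abraham lincoln" in which "of" has been omitted.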

The candidate generation module 112 may generate one or more query sentence candidates by converting the generated phrases into formal language phrases expressed in a formal language and combining the formal language phrases based on a predefined grammar relating to the formal language. That is, in the candidate generation module 112, the portions of the natural language query sentence (divided into a plurality of phrases by the parser 111) that correspond to entity names and to attributes may be converted into knowledge base vocabulary, which is the corresponding formal language.

When a generated phrase can be converted into more than one knowledge base vocabulary item, the item with the higher similarity, as evaluated from the co-occurrence information included in the dictionary database, may be selected. Here, co-occurrence refers to the phenomenon in which words are used together in a single document or sentence. In other words, linguistic elements such as forms, morphemes, and phonemes appear together in the same sentence, phrase, or word without grammatical deviation. Grammatical elements that stand in such a relationship are called co-occurring expressions, and the relationship itself is called a co-occurrence relation.
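One way to picture this selection is sketched below. The text does not state how similarity is computed from co-occurrence information, so the relative-frequency score used here is an assumption, as are the sample `alignments` data and the names `similarity` and `best_vocabulary`.

```python
from collections import Counter

# Hypothetical (phrase, knowledge base attribute) pairs observed together.
alignments = [
    ("wife of", "spouse"), ("wife of", "spouse"), ("wife of", "parent"),
    ("married to", "spouse"), ("born in", "birthPlace"),
]
pair_counts = Counter(alignments)
phrase_counts = Counter(phrase for phrase, _ in alignments)


def similarity(phrase, vocab):
    """Assumed score: share of the phrase's occurrences aligned to vocab."""
    if phrase_counts[phrase] == 0:
        return 0.0
    return pair_counts[(phrase, vocab)] / phrase_counts[phrase]


def best_vocabulary(phrase, candidates):
    """Pick the knowledge base vocabulary item with the highest similarity."""
    return max(candidates, key=lambda v: similarity(phrase, v))


chosen = best_vocabulary("wife of", ["parent", "spouse"])
```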

After each phrase has been converted to knowledge base vocabulary, one or more query sentence candidates corresponding to formal semantic expressions may be generated according to the combination grammar between knowledge base vocabulary items, which is the grammar relating to the formal language.

The dictionary database used for the conversion into knowledge base vocabulary may include a natural language-attribute dictionary and an entity name dictionary. The natural language-attribute dictionary is a dictionary database that pairs natural language phrases with knowledge base attribute vocabulary. Information is extracted from a large amount of text by an information extraction tool, the extracted information is aligned with the actual knowledge base to generate pairs of natural language phrases and attribute vocabulary, and the co-occurrence information obtained in this way is used for similarity evaluation.

The entity name dictionary is a dictionary database constructed by collecting a knowledge base entity name vocabulary.

Here, the grammar for the formal language may be implemented as a combination rule dictionary. A combination rule dictionary is a dictionary that includes a small number of derivation rules for composing the formal semantic representations of minimal units of the sentence into a formal semantic representation of the entire query sentence.

The candidate evaluation module 113 may select, as the first query sentence, the query sentence candidate having the highest sum of similarities, evaluated from co-occurrence information, of the formal language vocabulary included in the query sentence candidate. That is, for the formal semantic expressions of the query sentence candidates generated by the candidate generation module 112, the candidate evaluation module 113 sums the similarities previously evaluated from the co-occurrence information and selects, as the first query sentence, the candidate whose formal semantic expression has the highest sum.
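A minimal sketch of this selection step follows. The `Candidate` class, the expression notation, and the similarity values are hypothetical; only the rule of choosing the candidate with the highest sum of similarities comes from the description.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    expression: str      # formal semantic expression (notation is hypothetical)
    similarities: list   # one similarity per converted vocabulary item


def select_first_query(candidates):
    """Return the candidate whose similarities have the highest sum."""
    return max(candidates, key=lambda c: sum(c.similarities))


candidates = [
    Candidate("spouse(abraham_lincoln)", [0.67, 0.90]),
    Candidate("parent(abraham_lincoln)", [0.33, 0.90]),
]
first_query = select_first_query(candidates)
```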

In addition, the candidate evaluation module 113 may use a candidate evaluation model learned through a database composed of pairs of natural language question sentence-correct query sentences to evaluate a query sentence candidate.

The output module 114 may convert the first query sentence or the second query sentence into a query that conforms to SPARQL, a standard for querying a database composed of a plurality of query sentence-response sentence pairs expressed in a formal language.
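The conversion to a SPARQL query might look roughly like the sketch below, which assumes the query sentence has already been reduced to one entity name and one attribute. The function `to_sparql`, the `kb:` prefix, and the `http://example.org/kb/` namespace are placeholders, not identifiers from the disclosure or from any real knowledge base.

```python
def to_sparql(entity, attribute, base="http://example.org/kb/"):
    """Build a single-triple-pattern SPARQL SELECT query as a string."""
    return (
        f"PREFIX kb: <{base}>\n"
        "SELECT ?answer WHERE {\n"
        f"  kb:{entity} kb:{attribute} ?answer .\n"
        "}"
    )


query = to_sparql("Abraham_Lincoln", "spouse")
```

The resulting string could then be sent to whatever SPARQL endpoint hosts the knowledge base; that step is outside the scope of this sketch.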

The candidate evaluation model trainer 310 trains the candidate evaluation model 320. That is, the candidate evaluation model trainer 310 uses machine learning to train a model for evaluating candidate formal semantic expressions from a database composed of pairs of natural language query sentences and correct query sentences.

The query pattern template module 120 extracts a response from the database by extracting the query pattern corresponding to the input natural language query sentence, based on a plurality of sample query sentences, and applying a template included in the query pattern to the natural language query sentence. The query pattern template module 120 may include a pattern extraction module 121, a template application module 122, and an output module 123.

The pattern extraction module 121 checks feature values of the natural language query sentence and extracts the query pattern whose feature values match. As feature values, the pattern extraction module 121 may consider at least one of: whether the natural language query sentence includes a predefined surface expression; whether a chunk includes a predefined vocabulary item when the natural language query sentence is analyzed into chunks; the number of chunks; and the types of the chunks. Here, a chunk is composed of words in a sentence that are semantically or grammatically related to each other. In other words, a chunk is a sequence of words in a sentence that includes a core vocabulary item indicating a function or role.
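The four feature values can be pictured with the sketch below. It assumes the sentence has already been chunked by some external chunker; the chunk labels, the `extract_features` function, and the sample expression and vocabulary sets are illustrative assumptions.

```python
def extract_features(sentence, chunks, expressions, core_vocab):
    """Compute the feature values used to match a query pattern.

    chunks: list of (chunk_type, [words]) pairs produced by a chunker
    that is assumed to exist outside this sketch.
    """
    text = sentence.lower()
    return {
        "has_expression": any(e in text for e in expressions),
        "chunk_has_vocab": any(
            w in core_vocab for _, words in chunks for w in words
        ),
        "chunk_count": len(chunks),
        "chunk_types": tuple(t for t, _ in chunks),
    }


chunks = [
    ("WH", ["who"]), ("VP", ["is"]),
    ("NP", ["the", "wife"]), ("PP", ["of", "abraham", "lincoln"]),
]
features = extract_features(
    "Who is the wife of Abraham Lincoln?", chunks, {"who is"}, {"wife"}
)
```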

The template application module 122 may convert a natural language query sentence into a second query sentence expressed in a formal language by applying a template corresponding to the extracted query pattern to the natural language query sentence.

The template applied here may include a slot information template for extracting slot information about the formal language corresponding to the natural language query sentence, and a query template for converting the natural language query sentence into the second query sentence using the slot information.
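As a rough illustration of the two templates, the sketch below uses a regular expression as the slot information template and a format string as the query template. Both representations, the slot names `attribute` and `entity`, and the output notation are assumptions; the disclosure does not specify how the templates are encoded.

```python
import re

# Hypothetical slot information template: captures attribute and entity.
slot_template = re.compile(
    r"^who is the (?P<attribute>\w+) of (?P<entity>[\w ]+?)\??$"
)
# Hypothetical query template in an invented formal notation.
query_template = "({attribute} {entity})"


def apply_template(sentence):
    """Fill the query template with slots extracted from the sentence."""
    match = slot_template.match(sentence.lower().strip())
    if match is None:
        return None  # the sentence does not fit this pattern
    slots = {
        name: value.strip().replace(" ", "_")
        for name, value in match.groupdict().items()
    }
    return query_template.format(**slots)


second_query = apply_template("Who is the wife of Abraham Lincoln?")
```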

Hereinafter, an automatic query response method according to an embodiment of the present invention will be described.

Referring to FIG. 2, the parser 111 may divide the input natural language query sentence into word units and combine the divided words to generate one or more phrases (S211). Here, the parser 111 may omit words of the original sentence in the recombination process.

The candidate generation module 112 may generate one or more query sentence candidates by converting a word included in each phrase into a formal language and combining the formal language phrases based on the grammar of the predefined formal language (S212, S213).

In addition, the candidate evaluation module 113 may select, as the first query sentence, the query sentence candidate having the highest sum of similarities, evaluated from co-occurrence information, of the formal language vocabulary included in the query sentence candidate.

The output module 114 may extract a first response sentence for the first query sentence from a database composed of a plurality of query sentence-response sentence pairs expressed in a formal language (S214).

Next, the automatic query response device 100 may perform the following additional procedure so that it can primarily display the first response sentence and additionally display the second response sentence. Whether the second response sentence is displayed may be left to the user's choice.

The pattern extraction module 121 may extract a query pattern corresponding to a natural language query sentence from a predefined query pattern based on a database composed of a plurality of sample query sentences (S221). Here, the database consisting of a plurality of sample query sentences may be implemented in the form of a pattern dictionary. The pattern dictionary may be manually implemented in advance. One entry in the pattern dictionary includes a sentence pattern rule, a slot information template, and a query template.

Sentence pattern rules are further divided into lexical patterns, chunk-type patterns, and in-chunk patterns. A lexical pattern is a rule that determines a match directly from the presence of specific vocabulary. A chunk-type pattern is a rule that determines a match from the number and types of chunks obtained by chunking the natural language query sentence. An in-chunk pattern is a rule that determines a match from whether a word with a part of speech specified in the rule exists among the elements of a chunk.
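A pattern dictionary entry and its three kinds of sentence pattern rule could be modeled as in the sketch below. The `PatternEntry` fields, the chunk and part-of-speech labels, and the requirement that all three rules hold at once are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatternEntry:
    lexical: set          # words that must be present in the sentence
    chunk_types: tuple    # required number and types of chunks, in order
    in_chunk_pos: set     # POS tags, one of which must occur inside a chunk
    slot_template: str    # placeholder for the slot information template
    query_template: str   # placeholder for the query template


def matches(entry, chunks):
    """chunks: list of (chunk_type, [(word, pos), ...]) pairs."""
    words = {w for _, tokens in chunks for w, _ in tokens}
    tags = {p for _, tokens in chunks for _, p in tokens}
    return (
        entry.lexical <= words                                   # lexical rule
        and tuple(t for t, _ in chunks) == entry.chunk_types     # chunk-type rule
        and bool(entry.in_chunk_pos & tags)                      # in-chunk rule
    )


entry = PatternEntry({"who"}, ("WH", "VP", "NP", "PP"), {"NNP"}, "...", "...")
chunks = [
    ("WH", [("who", "WP")]), ("VP", [("is", "VBZ")]),
    ("NP", [("the", "DT"), ("wife", "NN")]),
    ("PP", [("of", "IN"), ("abraham", "NNP"), ("lincoln", "NNP")]),
]
is_match = matches(entry, chunks)
```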

Next, the template application module 122 may generate a second query sentence by applying a template included in the extracted query pattern to the natural language query sentence (S222).

Next, the output module 123 may extract a second response sentence for the second query sentence from a database composed of a plurality of query sentence-response sentences expressed in a formal language (S223).

As described above, for an input natural language query sentence, the automatic query response device 100 according to the embodiment of the present invention displays the first response to the first query sentence using the semantic parsing module 110, and may additionally display the second response to the second query sentence using the query pattern template module 120. Here, the method using the semantic parsing module 110 is suited to natural language query sentences in which various components are combined, and the method using the query pattern template module 120 is suited to natural language query sentences of a simple form. Therefore, according to the present invention, a response can be extracted by different methods depending on the form of the natural language query sentence, and a response matching the user's information request can be output.
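The overall flow of the two modules can be summarized in the following sketch, in which both modules are passed in as plain functions. The name `answer_query`, the dictionary keys, and the stub modules are hypothetical stand-ins for the semantic parsing module 110 and the query pattern template module 120.

```python
def answer_query(question, semantic_parsing, pattern_template):
    """Run both modules and label which response is shown first."""
    first = semantic_parsing(question)    # first response sentence
    second = pattern_template(question)   # second response sentence
    return {"primary": first, "additional": second}


# Stub modules used only to make the sketch runnable.
result = answer_query(
    "who is the wife of abraham lincoln?",
    semantic_parsing=lambda q: "Mary Todd Lincoln",
    pattern_template=lambda q: "Mary Todd Lincoln",
)
```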

The methods according to the invention can be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer readable medium. Computer-readable media may include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded on the computer readable medium may be those specially designed and constructed for the present invention, or may be known and available to those skilled in computer software.

Examples of computer readable media include hardware devices that are specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. Examples of program instructions include machine language code, such as produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware device described above may be configured to operate with at least one software module to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

An automatic query response method performed in an automatic query response device, the method comprising:

dividing an input natural language query sentence into one or more phrases;

converting a word included in each of the phrases into a formal language;

generating a first query sentence by combining the phrases converted into the formal language according to a predefined grammar relating to the formal language;

extracting a first response sentence for the first query sentence from a database composed of a plurality of query sentence-response sentence pairs expressed in the formal language;

extracting a query pattern corresponding to the natural language query sentence from predefined query patterns, based on a database composed of a plurality of sample query sentences;

generating a second query sentence by applying a template corresponding to the extracted query pattern to the natural language query sentence; and

extracting a second response sentence for the second query sentence.
The method according to claim 1,

further comprising

primarily displaying the first response sentence and additionally displaying the second response sentence.
The method according to claim 1,

wherein the dividing into phrases comprises:

dividing the natural language query sentence into word units; and

combining the words to generate the phrases,

wherein a word of the original sentence may be omitted in the combining process.
The method according to claim 1,

wherein the formal language

expresses the natural language query sentence in a formal structure that is not sensitive to changes in word order or vocabulary.
The method according to claim 4,

wherein, in the converting into the formal language,

the word is converted into an attribute and an entity name of the formal language.
The method according to claim 1,

wherein the generating of the first query sentence comprises:

generating one or more query sentence candidates expressed in the formal language; and

selecting, as the first query sentence, the query sentence candidate having the highest sum of similarities, evaluated from co-occurrence information, of the formal language vocabulary included in the query sentence candidate.
The method according to claim 6,

wherein, in the selecting of the first query sentence,

a candidate evaluation model trained on a database composed of pairs of natural language query sentences and correct query sentences is used to evaluate the query sentence candidates.
The method according to claim 1,

Wherein extracting the query pattern comprises:

Extracting the query pattern in consideration of at least one of: whether the natural language query sentence includes a predefined expression form; whether a chunk includes a predefined vocabulary item when the natural language query sentence is analyzed in chunks; the number of chunks; and the type of each chunk.
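The four criteria listed above can be gathered into a feature record, as in the sketch below. It assumes the chunks are already available from some chunker (not shown) and treats the predefined expression as a simple surface substring; the `Chunk` type, the chunk labels, and `pattern_features` are illustrative names rather than anything defined in the claims.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    kind: str  # hypothetical chunk label such as "NP" or "VP"


def pattern_features(question, chunks, expressions, vocabulary):
    """Collect the features a query pattern could be matched against."""
    lowered = question.lower()
    return {
        "has_expression": any(e in lowered for e in expressions),
        "chunk_has_vocabulary": any(
            word in chunk.text.lower().split()
            for chunk in chunks
            for word in vocabulary
        ),
        "chunk_count": len(chunks),
        "chunk_types": tuple(chunk.kind for chunk in chunks),
    }


chunks = [
    Chunk("Who", "WH"),
    Chunk("is", "VP"),
    Chunk("the author", "NP"),
    Chunk("of Hamlet", "PP"),
]
features = pattern_features(
    "Who is the author of Hamlet?", chunks, {"who is"}, {"author"}
)
```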
The method according to claim 1,

Wherein the template comprises:

A slot information template for extracting slot information about a formal language corresponding to the natural language query sentence; and

A query template for converting the natural language query sentence into the second query sentence using the slot information.
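One way to picture the pairing of a slot information template with a query template is the sketch below. It assumes the slot information template is a regular expression with named groups and the query template is a format string producing a triple-like expression; both choices, along with the names `PATTERN` and `apply_template`, are illustrative assumptions and are not stated in the claims.

```python
import re

# Hypothetical query pattern holding both templates.
PATTERN = {
    # Slot information template: named groups capture the slot values.
    "slot_template": re.compile(
        r"^who (?:is|was) the (?P<relation>\w+) of (?P<entity>.+?)\??$",
        re.IGNORECASE,
    ),
    # Query template: filled with the extracted slot values.
    "query_template": "<{entity}, {relation}, ?answer>",
}


def apply_template(pattern, question):
    """Extract slots from the question and fill the query template."""
    match = pattern["slot_template"].match(question.strip())
    if match is None:
        return None  # the question does not fit this pattern
    return pattern["query_template"].format(**match.groupdict())


second_query = apply_template(PATTERN, "Who is the author of Hamlet?")
```

A question that does not fit the pattern, such as "How tall is Everest?", yields `None`, so another pattern could be tried.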
An automatic query response device comprising:

A semantic parsing module that generates a first query sentence expressed in a formal language from an input natural language query sentence, and extracts a first response sentence for the first query sentence from a database composed of a plurality of query sentence-response sentence pairs expressed in the formal language; and

A query pattern template module that generates a second query sentence by applying, to the natural language query sentence, a query template included in the query pattern that corresponds to the natural language query sentence among predefined query patterns, and extracts a second response to the second query sentence from the database.
The device according to claim 10,

Wherein the semantic parsing module comprises:

A parser that divides the input natural language query sentence into word units and combines the words to generate one or more phrases, wherein existing words may be omitted in the combining process;

A candidate generation module that converts the phrases into formal language phrases expressed in the formal language, and generates one or more query sentence candidates by combining the formal language phrases based on a predefined grammar for the formal language;

A candidate evaluation module that selects, as the first query sentence, the query sentence candidate having the highest sum of similarities evaluated from co-occurrence information of the formal language expressions included in the query sentence candidate; and

An output module that extracts the first response sentence for the first query sentence.
The device according to claim 10,

Wherein the query pattern template module comprises:

A pattern extraction module that extracts a query pattern corresponding to the natural language query sentence from among predefined query patterns;

A template application module that generates the second query sentence by applying a template included in the query pattern to the natural language query sentence; and

An output module that extracts the second response to the second query sentence.
The device according to claim 10,

Wherein the automatic query response device displays the first response sentence first and additionally displays the second response sentence.
The device according to claim 11,

Wherein the formal language expresses the natural language query sentence in a formal structure that is insensitive to changes in word order or vocabulary.
The device according to claim 11,

Wherein the candidate generation module converts the words into attributes and entity names of the formal language.
The device according to claim 11 or 12,

Wherein the output module converts the first query sentence or the second query sentence into a query sentence that conforms to the SPARQL standard, which is a standard for querying a database composed of a plurality of query sentence-response sentence pairs expressed in the formal language.
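As a rough illustration of the conversion to SPARQL, the sketch below renders an entity and relation as a single-triple SELECT query. The `to_sparql` function, the `?answer` variable name, and the `http://example.org/` IRI prefix are placeholders chosen for this example; the claims do not specify the mapping from formal-language symbols to knowledge base identifiers.

```python
def to_sparql(entity, relation, prefix="http://example.org/"):
    """Render an (entity, relation) pair as a SPARQL SELECT query.

    The IRI prefix is a placeholder; a real knowledge base would
    supply its own identifiers for entities and relations.
    """
    return (
        "SELECT ?answer WHERE { "
        f"<{prefix}{entity}> <{prefix}{relation}> ?answer . "
        "}"
    )


query = to_sparql("Hamlet", "author")
```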
The device according to claim 11,

Wherein the candidate evaluation module comprises a candidate evaluation model, trained on a database composed of pairs of a natural language query sentence and a correct query sentence, for evaluating the query sentence candidates.
The device according to claim 12,

Wherein the pattern extraction module extracts the query pattern in consideration of at least one of: whether the natural language query sentence includes a predefined expression form; whether a chunk includes a predefined vocabulary item when the natural language query sentence is analyzed in chunks; the number of chunks; and the type of each chunk.
The device according to claim 12,

Wherein the template comprises:

A slot information template for extracting slot information about a formal language corresponding to the natural language query sentence; and

A query template for converting the natural language query sentence into the second query sentence using the slot information.