WO2020100553A1 - Question answering data generation device and question answering data generation method - Google Patents

Question answering data generation device and question answering data generation method

Info

Publication number
WO2020100553A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
response
response data
sentence
slot
Prior art date
Application number
PCT/JP2019/041828
Other languages
English (en)
Japanese (ja)
Inventor
敬一 松澤
光雄 早坂
Original Assignee
株式会社日立製作所
Application filed by 株式会社日立製作所
Publication of WO2020100553A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/80: Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/55: Rule-based translation
    • G06F40/56: Natural language generation

Definitions

  • The present invention relates to a question answering data generation device and a question answering data generation method, and in particular to a device and method suitable for generating the response data used in a question answering system in which an information processing device automatically responds to a questioner.
  • Another method is to create, in advance, response data indicating the correspondence between question content and response content for the document group, check the questioner's input against the question content in the response data, and return the corresponding response.
  • With this method, a person can confirm that the correspondence between question and response is correct when creating the data, and the system can clearly indicate which question a response corresponds to when it responds, so that the questioner can verify its correctness.
  • a plurality of data formats and a method of narrowing down the question content have been proposed in order to check the input of the questioner and the response data.
  • The latter method, which creates response data indicating the correspondence between question content and response content in advance for the document group, is excellent in that the response data creator can confirm the relationship between question and response in advance.
  • the quality and quantity of response data must be maintained, and the response data creator needs some work.
  • “High quality of response data” in the present specification means that the question answering system needs only a small amount of work to recognize the questioner's question, and that it can give a quick and clear answer, for example by returning accurate and sufficient information as the response sentence.
  • Patent Document 1 discloses a method of extracting a description that matches a predefined sentence pattern in a document and rearranging the sentences to generate a question sentence and a response sentence.
  • Patent Document 2 discloses a method of extracting a keyword from a sentence or a chart in a document and substituting the keyword into a template of a predefined question sentence to form a question / answer relationship in which the keyword becomes an answer.
  • Patent Document 3 discloses a technique for creating an answer sentence for a factual question based on a rule / answer table and a regular expression rule table.
  • The question/answer relationship is generated by focusing on only one word or one sentence in the document. Therefore, it is not possible to generate a question/answer relationship that associates multiple descriptions at different positions in the document, and the question/answer relationships that can be generated are limited. As a result, it may not be possible to return accurate and sufficient information in response to a question.
  • the confirmer needs to do the work of correcting it. Therefore, it can be said that the amount of work of the confirmer and the quality of the response data have a trade-off relationship.
  • An object of the present invention is to provide a question answering data generating device and a question answering data generating method that can generate high quality answering data without requiring much labor for correction and confirmation.
  • The question answering data generating device of the present invention is preferably a device that generates response data for a question answering system in which an information processing device automatically returns a response to a question. The device holds a response data generation pattern consisting of an extraction pattern of structure information and a response data template to which the text of the question and the response is applied. It analyzes an input document, generates structure information of the document, performs pattern matching between the structure indicated by the structure information and the extraction pattern of the response data generation pattern, extracts from the document the text that matches the pattern indicated by the extraction pattern, and applies the extracted text to the response data template to generate response data.
  • FIG. 3 is a diagram showing an example of structure information of a sentence according to the first embodiment.
  • FIG. 7 is a diagram showing an example of a response data generation pattern according to the first embodiment.
  • FIG. 6 is a diagram showing an example of response data according to the first embodiment. 6 is a flowchart showing a response data generation process of the first embodiment.
  • FIG. 9 is a diagram showing an example of a response data generation pattern according to the second embodiment. It is a figure which shows an example of a response sentence mapping table. It is a figure which shows an example of the response data of Embodiment 2.
  • 9 is a flowchart showing a response data generation process of the second embodiment. 17 is a flowchart showing a process of copying / changing a response data template (S865 of FIG. 17) according to the second embodiment.
  • the question answering system 100 has a form in which a question answering device 120 and a question answering data generating device 130 are connected by a network 5.
  • the questioner 110 sends a question sentence 112 describing the question content to the question answering device 120 via the question answering terminal 111, and receives the answer sentence 113.
  • a series of questions and answers is as follows.
  • the questioner 110 inputs a question content through a voice, an input device, an operation on a screen, a gesture, etc.
  • the question answering terminal 111 sends the question matter to the question answering device 120 as a question sentence 112.
  • the question sentence 112 is expressed in a natural language textual representation such as a question sentence, a word, or an expression similar thereto (such as a selection number in an option described in a sentence), or a format that can be converted into it.
  • the question answering device 120 searches the answer database 121 for an answer sentence corresponding to the question sentence having a meaning close to that of the question sentence 112, and if found, sends the answer sentence to the questioner.
  • When the question answering terminal 111 receives the answer sentence 113 from the question answering device 120, it notifies the questioner of the answer sentence 113 through a screen or voice, and the series of operations for answering the question is completed.
  • The data stored in the response database 121, which is referred to in this series of question answering operations, is created by the question answering data generation device 130.
  • the question response data generation device 130 holds a document database 131 that stores one or a plurality of documents 140 and a pattern database 132 that stores one or a plurality of response data generation patterns 141.
  • the response data generation pattern 141 is composed of a specific pattern appearing in the document 140 (such as a hierarchical structure of chapters or a dependency relation of words in a sentence) and a template of response data corresponding to the pattern.
  • The question answering data generation device 130 extracts a portion matching the pattern described in the response data generation pattern 141 from the document 140 in the document database 131, applies the words and phrases included in that portion to the template, and generates the data to be stored in the response database 121.
  • The question answering device 120 can be realized by a general information processing device as shown in FIG. 2, and has a hardware configuration in which a CPU (Central Processing Unit) 210, a main memory 220, a network interface 230, and a storage interface 240 are connected by a bus.
  • the CPU 210 executes various programs loaded in the memory 220 and controls each component of the question answering device 120.
  • the main memory 220 holds a program stored in the HDD 250 and necessary work data at the time of execution.
  • The network interface 230 is an interface device for transmitting and receiving data between the question answering device 120 and another computer (the question answering terminal 111 or the question answering data generating device 130).
  • Examples include a NIC (Network Interface Card) and a wireless LAN (Local Area Network) adapter.
  • the storage interface 240 is an interface device with the auxiliary storage device for reading and writing data on the auxiliary storage device.
  • An example is an HBA (Host Bus Adapter).
  • The auxiliary storage device connected to the storage interface 240 is a relatively large-capacity storage device that stores data for a long period of time. Examples include an HDD (Hard Disk Drive), an SSD (Solid State Drive), an optical disk, a magnetic disk, and a magnetic tape.
  • FIG. 2 illustrates the HDD 250 as an example of the auxiliary storage device.
  • A question answering program 221 is installed in the HDD 250, with a one-question-one-answer type response program 222, a scenario branching type response program 223, and a drill-down type response program 224 as its subordinate programs.
  • These programs are programs that operate according to each generation method of the response sentence 113 shown in the first to third embodiments described later.
  • the response data management program 225 is a program for managing the response database 121 in the HDD 250 and reading and writing the data stored in the response database 121 and the response history database 122.
  • the HDD 250 holds a response database 121 and a response history database 122.
  • the answer database 121 is a database that stores data used by the question answer program 221 to determine the answer sentence 113 for the question sentence 112.
  • The response history database 122 is a database that stores how often the response data stored in the response database 121 has been used in past question/response exchanges. The response history database 122 may be realized as a log of question/answer exchanges, or as a usage counter held for each response data entry stored in the response database 121. Alternatively, instead of holding the usage counter in a response history database 122 separate from the response database 121, a counter may be added to each response data entry in the response database 121.
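The counter-based variant of the response history can be sketched as follows. This is a minimal illustration, not the patent's implementation: the in-memory `Counter`, the function `record_use`, and the use of the entry numbers 431 to 433 as keys are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the response history database 122:
# one usage counter per response data entry, keyed by an entry id.
usage_counter = Counter()


def record_use(entry_id: int) -> None:
    """Increment the counter of the entry that was used for a response."""
    usage_counter[entry_id] += 1


# Simulate three question/response exchanges.
for used_entry in (431, 432, 431):
    record_use(used_entry)

print(usage_counter.most_common())
```

A `Counter` returns 0 for entries that were never used, which keeps later statistics code free of special cases.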
  • the question / answer data generation device 130 includes a response data generation unit 360 and a storage unit 350 as functional units.
  • The response data generation unit 360 is a functional unit that generates response data and has, as lower functional units, the structure analysis unit 370, the text analysis unit 380, the pattern matching processing unit 385, the data generation related unit 390, and the response data management unit 395.
  • The structure analysis unit 370 analyzes the structure of the document 140 and has, as lower functional units, a layout analysis unit 371, a chapter hierarchy analysis unit 372, a table format analysis unit 373, and a diagram format analysis unit 374.
  • The layout analysis unit 371 is a functional unit that analyzes the layout of the document.
  • The chapter hierarchy analysis unit 372 is a functional unit that analyzes the hierarchy of chapters of the document 140.
  • The table format analysis unit 373 is a functional unit that analyzes the format of the tables described in the document 140.
  • The diagram format analysis unit 374 is a functional unit that analyzes the format of the diagrams described in the document 140.
  • the structure analysis unit 370 is not limited to these, and may include another functional unit that analyzes the structure of the document.
  • The text analysis unit 380 is a functional unit that analyzes text information by paying attention to the meaning and content of sentences. It has, as lower functional units, a morpheme analysis unit 381, a dependency analysis unit 382, an anaphora analysis unit 383, and a regular expression unit 384.
  • The morpheme analysis unit 381 is a functional unit that analyzes the morphemes (the smallest units having meaning in linguistics) in the document 140.
  • The dependency analysis unit 382 is a functional unit that analyzes the relationships between words in the document 140.
  • The anaphora analysis unit 383 is a functional unit that analyzes semantic content such as what the pronouns in the document 140 refer to.
  • The regular expression unit 384 is a functional unit that analyzes the text of the document 140 using regular expressions.
  • the text analysis unit 380 is not limited to these, and can include other functional units that analyze text information.
  • The pattern matching processing unit 385 performs matching between the document 140 and a response data generation pattern (described later).
  • The data generation related unit 390 is a functional unit related to the function of generating response data and has, as lower functional units, a match data statistics unit 391, a generation permission determination unit 392, an output data changing unit 393, and a synonym/paraphrase expansion unit 394.
  • The match data statistics unit 391 is a functional unit that counts the number of times a slot value (word) appears (detailed in the second embodiment).
  • the generation permission / inhibition determination unit 392 is a functional unit that determines whether or not it is worth generating the response data in order to improve the quality of the response data.
  • the output data changing unit 393 is a functional unit for rewriting the template (described in detail in the second and third embodiments).
  • The synonym/paraphrase expansion unit 394 is a functional unit that expands a question sentence into paraphrases using synonyms, or replaces a phrase in the response data with a synonym or a similar word.
  • The response data management unit 395 is a functional unit that, at a later date, deletes response data once generated and manages its hierarchy.
  • the response data management unit 395 may be included in the question response device 120 instead of the question response data generation device 130.
  • the storage unit 350 is a functional unit that stores information.
  • the storage unit 350 holds a document database 131, a pattern database 132, matched data 133, and a synonym / paraphrase dictionary 134.
  • The document database 131 is a database that holds the documents 140.
  • The pattern database 132 is a database that holds pattern information for generating response data.
  • The matched data 133 is data obtained by performing, on the document 140, the pattern matching for generating response data.
  • The synonym/paraphrase dictionary 134 is a thesaurus that holds the synonyms and similar words used by the synonym/paraphrase expansion unit 394.
  • The question answering data generation device 130, like the question answering device 120, can be realized by a general information processing device as shown in FIG.
  • The hardware configuration of the question answering data generation device 130 is similar to that of the question answering device 120.
  • The HDD 250 of the question answering data generation device 130 has a response data generation program 260.
  • the response data generation program 260 is a program that realizes the function of the response data generation unit 360.
  • the response data generation program 260 has, as subordinate programs, a structure analysis program 261, a text analysis program 262, a pattern matching processing program 263, a data generation related program 264, and a response data management program 265.
  • The structure analysis program 261, the text analysis program 262, the pattern matching processing program 263, the data generation related program 264, and the response data management program 265 are programs that realize the functions of the structure analysis unit 370, the text analysis unit 380, the pattern matching processing unit 385, the data generation related unit 390, and the response data management unit 395, respectively.
  • The HDD 250 of the question answering data generation device 130 also stores a document database 131, a pattern database 132, matched data 133, and a synonym/paraphrase dictionary 134.
  • the question-and-answer data generation device of the present embodiment generates response data for one-question-one-answer question-answer.
  • the one-question-one-answer type question response is a response in which the question of the questioner 110 is individually captured and the system does not analyze the relationship before and after.
  • the questioner 110 asks a question regarding year-end adjustment of tax processing, and the question / answer data generation device 130 generates response data based on a year-end adjustment manual.
  • The question/answer table 400 is a table used by the one-question-one-answer type response program 222 for question answering, and one or more such tables are stored in the response database 121.
  • The question/answer table 400 lists and stores the correspondence between a question sentence 410 and a response sentence 420, with one entry per row. For example, in the question/answer table 400 shown in FIG. 5, three question sentence/response sentence entries 431, 432, and 433 are registered.
  • Upon receipt of the question sentence 112, the one-question-one-answer type response program 222 searches the entries 431, 432, and 433 of the question/answer table 400 for an entry whose question sentence 410 is close to the question sentence 112.
  • “close” is measured, for example, when the numbers of words are the same and the semantic distances of the words are close. If there is an entry having a close question, the response sentence 420 of the entry is output as the response of the one-question-one-answer program 222.
  • the information of the entry referred to at that time is stored in the response history database 122.
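The lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the table contents are placeholders, and Jaccard word overlap with a threshold of 0.5 is a stand-in chosen for the example, since the text does not fix how the semantic distance between words is measured.

```python
import re
from typing import Optional

# Hypothetical question/answer table 400: (question sentence 410, response sentence 420).
QA_TABLE = [
    ("What is year-end adjustment?", "(response sentence of entry 431)"),
    ("When is the submission deadline?", "(response sentence of entry 432)"),
]


def tokenize(text: str) -> set:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def closeness(a: str, b: str) -> float:
    """Jaccard word overlap, an assumed stand-in for semantic closeness."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def answer(question: str, threshold: float = 0.5) -> Optional[str]:
    """Return the response of the closest entry, or None if nothing is close enough."""
    best = max(QA_TABLE, key=lambda entry: closeness(question, entry[0]))
    if closeness(question, best[0]) >= threshold:
        return best[1]
    return None


print(answer("what is the year-end adjustment"))
print(answer("How do I cook rice?"))
```

Returning `None` when no entry is close enough lets the caller decide how to handle unanswerable questions instead of returning a poorly matched response.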
  • the document 140 includes a metadata section 510 and a document body 520.
  • the metadata unit 510 holds information about the document itself, not the description inside the document, in a format in which the correspondence relationship between the item name 511 and its value 512 is listed.
  • The metadata entry 513 stores the document name, the metadata entry 514 stores the file name, and the metadata entry 515 stores the last update date.
  • the document body 520 stores data such as actual texts, figures, and tables that make up the document 140.
  • the document body 520 generally has a structure. This structure is defined by the position / content / size / decoration of the text, and by separating them with ruled lines.
  • the document body 520 represents one chapter having a title 530 as a title, and it can be considered that the chapter has two sections indicated by section titles 540 and 550.
  • bullets 542 are arranged after the section text 541.
  • a table caption 552 and a table 553 are arranged after the section text 551.
  • this document body 520 shows a hierarchical structure in which a section comes after a chapter and a section text comes after the section.
  • The structure information 600 of this embodiment is information obtained by analyzing the structure of the document 140, and is expressed in the form of a tree structure in the example shown in FIG. 7.
  • the structure information 600 is a tree structure formed by a node group having the root node 610 as a root.
  • the relations that have the inclusion relation in the document are expressed as parent-child relations.
  • the root node 610 has a node 620 corresponding to the metadata 510 and a node 630 corresponding to the document body as child nodes.
  • the node 620 corresponding to the metadata has the nodes 621, 622 and 623 corresponding to the metadata entries 513, 514 and 515 as child nodes.
  • The node 630 corresponding to the document body has a node 640 corresponding to the chapter as a child node, and the node 640 corresponding to the chapter has nodes 641 and 650 corresponding to the sections as child nodes.
  • The nodes 641 and 650 corresponding to the sections have, as child nodes according to the contents of each section, the nodes 642 and 651 corresponding to the section text, the node 643 corresponding to the bullets, the node 660 corresponding to the table, and so on.
  • The node 643 corresponding to the bullets has nodes 644, 645, 646 corresponding to the respective items of the bulleted list.
  • The node 660 corresponding to the table has nodes 661, 664, 667 corresponding to each row forming the table, and the nodes 661, 664, 667 corresponding to the rows have, as child nodes, the nodes 662, 663, 664, 665, 668, 669 corresponding to each cell forming the row.
  • the table may take different representation methods on the structural information. For example, a node corresponding to a column forming a table may be a child node of a node corresponding to the table, and a node corresponding to the column may have a node corresponding to each cell forming the column as a child node. Further, regardless of the order of columns and rows, all cells constituting the table may be used as the nodes corresponding to the table and may be the child nodes of the table.
  • Each node can store not only the hierarchy name (chapter, section, table, etc.) of the part of the document it corresponds to, but also text and structure-based information (the page number in the document; the chapter, section, or table number; the text position; and font information).
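As a rough sketch, the tree-structured structure information described above could be held in a small node class like the one below. The class name `Node`, its fields, and the sample texts are assumptions made for illustration; the reference numerals of the figure are not modeled.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    kind: str        # hierarchy name, e.g. "chapter", "section", "table"
    text: str = ""   # text held by the node, if any
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child (an inclusion relation in the document) and return it."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Visit this node and all descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


# Build a tiny document: metadata plus one chapter with one section.
root = Node("root")
meta = root.add(Node("metadata"))
meta.add(Node("metadata entry", "document name: Example manual"))
body = root.add(Node("body"))
chapter = body.add(Node("chapter", "Chapter title"))
section = chapter.add(Node("section", "Section title"))
section.add(Node("section text", "Text of the section."))
bullets = section.add(Node("bullets"))
for item_text in ("first item", "second item"):
    bullets.add(Node("item", item_text))

kinds = [node.kind for node in root.walk()]
print(kinds)
```

Further attributes such as page number or font information could be added as extra fields on the node without changing the traversal.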
  • a description that matches a predefined pattern, that is, a subtree of the tree structure is extracted and a response sentence is generated.
  • the response data generation pattern 700 is a pattern applied to generate response data.
  • The response data generation pattern 700 is composed of three patterns 710, 711 and 712 as shown in FIG. Each of the patterns 710, 711, and 712 consists of an extraction pattern 720 corresponding to a part of the tree structure of the structure information, and a response data template 730 that is the source of the question/response pair generated when a description matching the pattern is extracted.
  • the extraction pattern description 721 describes the information of the extraction pattern 720.
  • the structure to be extracted is shown by describing the hierarchical name of the node having a parent-child relationship in the tree structure and the text as a pair.
  • the hierarchy name 722 “section” and the hierarchy name 724 “section body” have a parent-child relationship.
  • The slot 723 “<word>” and the slot 725 “<meaning>” are described in association with the respective hierarchy names. This indicates that, in the extracted structure, the text of the corresponding node will be substituted into these slots.
  • a slot is a pattern expression that indicates that a specific value is assigned to that portion when generating response data.
  • the extraction pattern description 741 is the same as the extraction pattern description 721 in that it has a plurality of layer names 742, 743, 745. However, it is different in that the type of text (for example, a number) corresponding to the portion is described in the slots 746 and 747, and the text other than the slot is included.
  • the node associated with the hierarchical name 745 of the subtree extracted by the main extraction pattern 720 must have the correspondence between the text in the node and the slot. Wildcards, regular expressions, and other techniques can be used as the technique for establishing the correspondence relationship between such texts and slots.
  • the response data template 730 is described as a pair of question sentence and response sentence.
  • These question / answer sentences can include slots appearing in the extraction pattern 720 in the sentence.
  • In the extracted subtree, if there is a text associated with a slot in the extraction pattern 720, that text is substituted into the same slot in the response data template to generate the response sentence.
  • The response data template 730 can include the aggregated content of a plurality of subtrees associated with the same extraction pattern 720. For example, in the response sentence 736, a text listing the plurality of texts associated with the slot “<item>” 766 in the extraction pattern example 761 is substituted for the slot “<item:list>” 767.
  • a description for processing the slot output method may be added to the response data template 730.
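Slot substitution into a response data template can be sketched as follows. The slot syntax `<name>` and `<name:list>` mirrors the notation in the text, but the function `fill`, the comma-separated list rendering, and the sample templates are assumptions for illustration only.

```python
import re
from typing import Dict, List, Union

SlotValue = Union[str, List[str]]

# A slot is "<name>" or "<name:modifier>", e.g. "<word>" or "<item:list>".
SLOT = re.compile(r"<(\w+)(?::(\w+))?>")


def fill(template: str, values: Dict[str, SlotValue]) -> str:
    """Replace every slot in the template with the text matched for it."""

    def substitute(match: re.Match) -> str:
        name, modifier = match.group(1), match.group(2)
        value = values[name]
        if modifier == "list":
            # Aggregate the texts of several matched subtrees into one listing.
            items = value if isinstance(value, list) else [value]
            return ", ".join(items)
        return value[0] if isinstance(value, list) else value

    return SLOT.sub(substitute, template)


question = fill("What is <word>?", {"word": "year-end adjustment"})
listing = fill("The items are: <item:list>.", {"item": ["item A", "item B"]})
print(question)
print(listing)
```

Keeping the modifier inside the slot name means the same extracted values can be rendered singly or as a list, depending only on the template.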
  • the document structure is represented by a tree structure, but another representation format may be used as long as the partial structure can be represented.
  • the table in the document may be expressed in the form of a multidimensional array instead of the tree structure.
  • the response data 900 is generated from the document 140 and the corresponding document structure 600 using the response data generation pattern 700.
  • The response data entries 931 and 932 are examples generated as a result of the nodes 641 and 650 corresponding to sections, and their child nodes, being associated with the pattern 710.
  • The response sentence in entry 932 contains a table. This is because the description “Table 2” included in the node 651 refers to the node 660, which corresponds to the table 553 in the document, so the table 553 is included in the response sentence by the replacement process described later.
  • the response data entries 933 and 934 are examples in which the nodes 664 and 667 corresponding to rows and their child nodes are generated as a result of being associated with the pattern 741.
  • the response data entry 935 is an example generated as a result of the nodes 664 and 667 corresponding to rows and their child nodes being associated with the pattern 761.
  • The response data generation unit 360 in the question answering data generation device 130 generates, from the document group stored in the document database 131, the response data 900 in the question/answer table format that is stored in the response database 121 used by the question answering device 120.
  • the process shown between the loop start S810 and the loop end S840 is repeated for each input document 140. Further, if there is a document for which the response data generation process has already been executed in the document group, only the unexecuted document may be targeted.
  • The layout analysis unit 371, the chapter hierarchy analysis unit 372, the table format analysis unit 373, and the diagram format analysis unit 374, which are lower-level functional units of the structure analysis unit 370 of the question answering data generation device 130, analyze the document 140 and convert it into a tree-structure representation like the document structure 600 shown in FIG. 7 (S815).
  • Existing techniques can be used to convert the document 140 into a tree-structured representation. For example, as a method of dividing a document file of a format that does not hold information about paragraphs corresponding to the layout analysis unit 371 into paragraphs, there is a method of treating sentences located near each other as the same paragraph.
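The idea of treating lines located near each other as the same paragraph can be illustrated with a simple gap threshold. This is a hypothetical sketch: the input format (a vertical position plus text per line) and the threshold `max_gap` are assumptions, and real layout analysis would consider more signals.

```python
from typing import List, Optional, Tuple

Line = Tuple[float, str]  # (vertical position of the line on the page, text)


def group_paragraphs(lines: List[Line], max_gap: float = 15.0) -> List[str]:
    """Merge consecutive lines whose vertical distance is at most max_gap."""
    paragraphs: List[List[str]] = []
    previous_y: Optional[float] = None
    for y, text in sorted(lines):
        # A large vertical gap (or the first line) starts a new paragraph.
        if previous_y is None or y - previous_y > max_gap:
            paragraphs.append([])
        paragraphs[-1].append(text)
        previous_y = y
    return [" ".join(parts) for parts in paragraphs]


sample = [(100.0, "First line,"), (112.0, "continued."),
          (150.0, "Second paragraph,"), (162.0, "continued.")]
result = group_paragraphs(sample)
print(result)
```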
  • the text analysis unit 380 analyzes the text information held by each node for the tree structure representation of the converted document 140 (S820).
  • the morpheme analysis unit 381, the dependency analysis unit 382, the anaphora analysis unit 383, and the like included in the text analysis unit 380 perform processing according to their respective functions.
  • the pattern matching processing unit 385 extracts, for each pattern stored in the pattern database 132, a subtree that matches the extraction pattern 720 from the tree structure representation of the document 140 (S825).
  • the method described in the above Dongwon Lee paper can be used to extract a node group in which the relationships between the nodes match.
  • the text at each node of the extracted subtree is collated with the text or slot in the extraction pattern 720 to determine whether or not a correspondence can be obtained. If the correspondence cannot be obtained, it is considered that the subtree cannot be extracted.
  • a regular expression or the like can be used for this matching process.
  • the extraction pattern 720 and the subtree are paired and stored in the matched data 133 (S835).
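The matching step can be sketched for the simplest case of a parent/child extraction pattern. Note the assumptions: nodes are plain dictionaries, the pattern is a pair of hierarchy names with regular expressions whose named groups act as slots, and a naive traversal replaces the tree-matching method cited in the text.

```python
import re
from typing import Dict, Iterator, List


def walk(node: dict) -> Iterator[dict]:
    """Visit a node and all of its descendants."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


# Hypothetical extraction pattern: hierarchy name plus a regular expression
# for the parent and the child; named groups play the role of slots.
PATTERN = {
    "parent": ("section", r"(?P<word>.+)"),
    "child": ("section text", r"(?P<meaning>.+)"),
}


def match_pattern(root: dict, pattern: dict) -> List[Dict[str, str]]:
    """Return one slot assignment per parent/child pair matching the pattern."""
    parent_kind, parent_re = pattern["parent"]
    child_kind, child_re = pattern["child"]
    matched = []
    for node in walk(root):
        if node["kind"] != parent_kind:
            continue
        parent_match = re.fullmatch(parent_re, node.get("text", ""))
        if parent_match is None:
            continue  # text does not correspond to the slot: no extraction
        for child in node.get("children", []):
            if child["kind"] != child_kind:
                continue
            child_match = re.fullmatch(child_re, child.get("text", ""))
            if child_match is not None:
                matched.append({**parent_match.groupdict(), **child_match.groupdict()})
    return matched


tree = {"kind": "chapter", "text": "Chapter title", "children": [
    {"kind": "section", "text": "Term A", "children": [
        {"kind": "section text", "text": "Meaning of term A."}]},
    {"kind": "section", "text": "Term B", "children": [
        {"kind": "table", "text": ""}]},
]}
matches = match_pattern(tree, PATTERN)
print(matches)
```

The second section is skipped because it has no child with the hierarchy name "section text", which corresponds to the case where no correspondence is obtained and the subtree is not extracted.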
  • The processing from S850 to S885 is performed by the data generation related unit 390 for each extraction pattern 720 when the matched data 133 contains a plurality of subtrees that correspond to that extraction pattern 720.
  • The slots in the response data template 730 selected in S870 are filled and the response data is output (S875).
  • From the response data template 730, not just one but a plurality of response data may be output from a single subtree.
  • By using the synonym / paraphrase dictionary 134, it is possible to output response data in which words in the response data are replaced with synonyms or the word order is changed.
  • The response data template 730 may include other information that can be generated from the document structure 600 analyzed in S815 and the matched data 133 stored in S835. For example, it can be used to enumerate a list of chapter titles in a document or to include the number of items in a table in a response sentence.
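  • Slot filling and synonym-based variation (S875) can be sketched as below. The template syntax, the dictionary layout, and the function names are assumptions for illustration; the actual format of the synonym / paraphrase dictionary 134 is not given in the text.

```python
import re

def fill_template(template, slots):
    """Replace each <slot> marker with its value (KeyError if one is missing)."""
    return re.sub(r"<([^<>]+)>", lambda m: slots[m.group(1)], template)

def expand_with_synonyms(sentence, synonyms):
    """Return the sentence plus variants with one word replaced by a synonym.

    synonyms: dict mapping a word to a list of alternatives (hypothetical
    stand-in for the synonym / paraphrase dictionary).
    """
    variants = [sentence]
    for word, alternatives in synonyms.items():
        if word in sentence:
            for alt in alternatives:
                variants.append(sentence.replace(word, alt))
    return variants
```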
  • The response sentence output in S875 is rewritten (S880).
  • When the response sentence includes a description indicating another position in the document, such as “above”, “table 2”, or “page 180”, the tree structure of the corresponding document 140 is referred to and the sentence or table indicated by that description is acquired. The description is then replaced with the acquired content, or the content is added to the end of the response sentence, so that the referenced part of the document 140 appears in the response sentence.
  • A description such as “above”, which indicates a position relative to where the word appears, may be replaced with a description indicating an absolute position, for example, a page number or a paragraph number.
  • S890 is a supplementary process. Skipping it does not change the response contents of the question answering system, but it affects the reduction of computer resources (CPU usage time and memory / storage medium capacity used) of the inquiry response data generation device.
  • The generation permission / inhibition determination unit 392 of the data generation related unit 390 checks whether an answer sentence is grammatically incorrect or inappropriate for use in a question answering system, and deletes response data that includes such an answer sentence.
  • The data generated by the response data generation process shown in FIG. 10 may be incorrect from the following two viewpoints.
  • One is a case where the question sentence and the answer sentence do not correspond to each other, and the other is a case where the question sentence itself is unnatural in grammar or meaning.
  • One cause is that, for example, in a complicated table, it cannot be determined from the structure information alone whether the first row or the first column holds the item titles.
  • The response data management unit 395 deletes response data from the response database 121, after it has been generated and put into use by the question response program 221, depending on the usage status of each generation result. For example, referring to the response history database 122, a question / answer pair that has not been used for a certain period of time or more can be considered to fall under the latter of the two viewpoints above: its question text is itself unnatural, so there is no possibility that it will be regarded as similar to a question sentence 112. Since such a pair has no utility value, the response data management unit 395 deletes it.
  • The response data management unit 395 may divide the response data in the response database 121 into a plurality of groups according to the usage status of each generation result. For example, question / answer pairs are divided into groups with high, medium, and low frequency of use, and statistical information is acquired for each group. This statistical information can be used as teacher data to estimate the frequency of use when subsequent response data is created.
  • As described above, the response data to be stored in the response database 121 can be generated from the pattern and the document 140.
  • In addition to assigning words to the slots in the response data template 730, high-quality response sentences can be generated by including in the response sentence descriptions based on the analyzed document structure and on the plurality of subtrees corresponding to the extraction pattern, and by deleting redundant response data.
  • Embodiment 2 of the present invention will be described below with reference to FIGS. 11 to 17.
  • The question-and-answer data generation device of this embodiment generates the response data of a scenario branching type system.
  • The response data of the scenario branching type system is response data created by assuming a scenario for the questions of the questioner 110 and branching the questions according to that scenario.
  • The response data of the scenario branching type system is used when a question is answered by the scenario branching type response program 223.
  • In the first embodiment, when the questioner 110 sends the question sentence 112 to the question answering device 120, the answer is returned to the questioner 110 as a response sentence 113, and the question answering is completed in a single exchange.
  • In the scenario branching type system, by contrast, the questioner 110 and the question answering device 120 exchange the question sentence 112 and the response sentence 113 a plurality of times, and the question answering device 120 finally returns an answer by narrowing down the content of the question of the questioner 110.
  • The scenario branch diagram 1000 expresses a question scenario as a tree structure diagram. As shown in FIG. 11, it consists of, for example, states 1010, 1020, 1030, 1031, 1040, 1041, 1042, 1043, 1050, 1051, 1052, 1053, 1054, and 1055 and the state transition relationships connecting them.
  • Here, the questioner 110 is a customer of a bank who asks questions regarding a bank account.
  • The state transitions in the case of inquiring about the business hours for opening an ordinary savings account will be described as an example.
  • State transition starts from the initial state 1010, and then transitions to the subsequent state 1020.
  • Because the response sentence “What is your desired work?” is set in this state, the scenario branching response program 222 returns “What is your desired work?” to the questioner 110 as the response sentence 113.
  • A question sentence is set in each of the transition destination states 1030 and 1031.
  • The scenario branching response program 222 prompts the questioner 110 to make the next input.
  • The question sentence 112 is compared with the question sentences set in the transition destination states 1030 and 1031, and the state transitions to the closer one.
  • The closeness between sentences can be evaluated by the number of matching words, the edit distance, the distance between vector representations of words or sentences, and the like.
  • The scenario branching response program 222 may prompt the questioner 110 to input again.
  • If the question sentence 112 is “about opening an account”, the question sentence set in the former of the states 1030 and 1031 shares more words with it, and therefore the state transitions to the state 1030.
  • The next step is the state 1040, where the response sentence “What is your interest?” is returned, and the questioner 110 inputs “business hours” in response.
  • The next step then transitions to the state 1051. Since the state 1051 has no further transition destinations set, once the response sentence “starting at 10:00 am on weekdays ...” set in the state 1051 has been returned, this question-and-answer exchange is completed. In the course of these state transitions, the information of each entry referred to is stored in the response history database 122.
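  • The closeness measures named above (number of matching words and edit distance) and the choice of the closer transition destination can be sketched as follows. The question sentences attached to the states in the usage note and the `min_overlap` threshold are hypothetical; returning `None` stands for the case in which the program prompts the questioner to input again.

```python
def word_overlap(a, b):
    """Number of distinct words shared by two sentences (case-insensitive)."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def closest_state(question, candidates, min_overlap=1):
    """Pick the candidate state whose question sentence shares most words.

    candidates: dict mapping a state ID to the question sentence set there.
    Returns None when no candidate reaches min_overlap (assumed threshold).
    """
    best = max(candidates, key=lambda s: word_overlap(question, candidates[s]))
    if word_overlap(question, candidates[best]) < min_overlap:
        return None
    return best
```

  • For instance, with the assumed sentences `{1030: "opening an account", 1031: "closing an account"}`, the input “about opening an account” selects state 1030 because it shares three words with the former and two with the latter.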
  • The scenario description table 1100 is a table representation of the scenario represented by the scenario branch diagram 1000. It is stored in the response database 121 and is referred to by the scenario branch response program 223.
  • The entries 1120 to 1132 of the scenario description table 1100 are in one-to-one correspondence with the states in the scenario branch diagram 1000. Therefore, if the scenario description table 1100 can be generated, question answering according to the scenario shown in the scenario branch diagram 1000 becomes possible.
  • Each entry of the scenario description table 1100 has a state ID 1110, a question sentence 1111, a response sentence 1112, and a transition destination state ID 1113.
  • Each entry moves to the state given by the transition destination state ID 1113 and responds with the response sentence 1112.
  • The state ID in each entry refers to the state of the transition source.
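  • One possible reading of the scenario description table is sketched below: an entry applies when the current state equals its state ID and the input resembles its question sentence, after which its response sentence is returned and the state moves to its transition destination. The field names, the state IDs, and the word-overlap matching are assumptions for illustration, not the table format of the specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    state_id: str               # transition source state
    question: Optional[str]     # expected input; None means "no input needed"
    response: str               # sentence returned to the questioner
    next_id: Optional[str]      # transition destination; None ends the dialog

def step(table, state, user_input):
    """Return the best-matching entry for the current state, or None."""
    words = set(user_input.lower().split())
    best, best_score = None, -1
    for e in table:
        if e.state_id != state:
            continue
        if e.question is None:
            score = 0
        else:
            score = len(words & set(e.question.lower().split()))
            if score == 0:
                continue  # no shared word: this entry does not apply
        if score > best_score:
            best, best_score = e, score
    return best

# Hypothetical table modeled on the bank-account example.
TABLE = [
    Entry("start", None, "What is your desired work?", "a"),
    Entry("a", "opening an account", "What is your interest?", "b"),
    Entry("a", "closing an account", "What is your interest?", "c"),
    Entry("b", "business hours", "Starting at 10:00 am on weekdays ...", None),
]
```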
  • The structure information 1900 of the present embodiment is information obtained by analyzing the structure of the document 140, and is expressed in the form of a tree structure as shown in FIG. 13.
  • A node 1910 (text) is provided under the root node, and the child nodes under the root node are nodes 1920, 1940, and 1950 representing chapters.
  • The child nodes below the node 1920 include nodes 1921, 1919, and 1930 that represent sections.
  • As child nodes of the node 1921, there are nodes 1922 and 1924 representing items.
  • As a child node of the node 1922, there is a node 1923 representing the item body.
  • The present embodiment is the same as the first embodiment in that a description matching a predefined pattern, that is, a subtree of the tree structure shown in the structure information 1900, is extracted and response data is generated.
  • In the present embodiment, however, questions and responses are made based on the scenarios shown in FIGS. 11 and 12, and therefore the format of the response data generation pattern is different.
  • The response data generation pattern 1200 has entries for the extraction pattern 720 and the response data template 1230, as shown in FIG. 14.
  • The extraction pattern 720 describes content that matches a part of the tree structure in the structure information 1900 of FIG. 13, similar to the response data generation pattern 700 of FIG. 8 of the first embodiment.
  • The extraction pattern description 1221 describes an example that actually matches a part of the structure information 1900.
  • The response data template 1230 included in the response data generation pattern 1200 holds data matched to the scenario description table 1100.
  • A scenario description table template 1231 is described in the response data template 1230 of this embodiment.
  • The scenario description table template 1231 has a state ID 1110, a question sentence 1111, a response sentence 1112, and a transition destination state ID 1113, like the scenario description table 1100 shown in FIG. 12.
  • The slots used in the extraction pattern example 1221 can be included in the contents of the question sentence 1111 and the response sentence 1112.
  • The state ID 1110 and the transition destination state ID 1113 do not contain the IDs of specific states but hold the tentative values <a>, <b>, and <c>. This is because, when a plurality of subtrees correspond to the same pattern, different IDs are generated and assigned to <a>, <b>, and <c> for each subtree, which prevents IDs from overlapping between different subtrees.
  • The response data template 1230 also has a plurality of entries 1240, 1241, 1242, and 1243 generated corresponding to the subtree.
  • The same ID is generated and assigned to occurrences of the same tentative value <a>, <b>, or <c> in different entries.
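  • The assignment of fresh IDs to the tentative values can be sketched as follows. The tuple layout `(state ID, question, response, transition destination)`, the `s1, s2, ...` ID format, and the concrete ID `"initial"` are assumptions for this example.

```python
import itertools
import re

PLACEHOLDER = re.compile(r"^<[a-z]>$")

def instantiate(template, counter):
    """Copy a template for one subtree, replacing <a>, <b>, ... with new IDs.

    The same placeholder maps to the same ID within one call, while the
    shared counter keeps IDs distinct across calls (i.e. across subtrees).
    """
    mapping = {}

    def resolve(value):
        if value is None or not PLACEHOLDER.match(value):
            return value  # concrete IDs such as "initial" are kept as-is
        if value not in mapping:
            mapping[value] = f"s{next(counter)}"
        return mapping[value]

    return [(resolve(s), q, r, resolve(n)) for s, q, r, n in template]
```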
  • The response sentence mapping table 1400 is a table showing the correspondence of slot values when response data is generated based on the response data generation pattern 1200.
  • The value 1430 indicates the presence or absence of the corresponding response sentence.
  • The slot values 1420, 1421, and 1422 are not necessarily filled; they may be blank or an asterisk (a value indicating an arbitrary value). This occurs when a regular expression is used to associate a subtree in which the text or node corresponding to the slot does not exist.
  • FIG. 16 is a diagram showing an example of response data of the present embodiment.
  • Data in the format of the scenario description table, as shown in FIG. 16, is generated as the response data 2000.
  • The response data generation program 360 in the question response data generation device 130 generates a scenario description table 1100 to be stored in the response database 121 used by the question response device 120 from the document group stored in the document database 131.
  • The response data generation processing of the present embodiment is almost the same as that shown in the flowchart of FIG. 10 of the first embodiment, but differs in that, as shown in FIG. 17, the processes S865 and S870 are inserted between S850 and S875. Only the differences are described below.
  • The output data changing unit 393 rewrites the content of the response data template 730 corresponding to the extraction pattern 720 using the document structure 600 analyzed in S815 and the matched data 133 stored in S835, and creates different response data templates 730.
  • The response sentence mapping table 1400 shown in FIG. 15 (a) is the table before rearrangement, and the response sentence mapping table 1450 shown in FIG. 15 (b) is the table after the slots are rearranged.
  • The response sentence mapping table 1450 is obtained by changing the arrangement of items 1410, 1411 and 1412 in the response sentence mapping table 1400.
  • In the response sentence mapping table 1400, the values are first classified by the item 1410 (<work>) and then by the item 1411 (<item>), whereas in the response sentence mapping table 1450 the item 1411 (<item>) is used for classification first.
  • In the response sentence mapping table 1450, in the example of the ranges 1460, 1461, and 1462 in the value 1430, once the value of the item 1411 (<item>) is fixed, the value that the item 1412 (<account name>) can take is determined at that point.
  • The scenario description table template 1231 is changed or copied correspondingly in S1320 and S1325.
  • FIG. 19 is a diagram illustrating an example of changing the scenario description table template 1231 in the response data generation pattern 1200 shown in FIG. 14. FIG. 19 (a) shows the scenario description table template 1500 before change, FIG. 19 (b) shows the entry rearrangement scenario description table template 1520, and FIG. 19 (c) shows the post-entry reduction scenario description table template 1540.
  • The entry 1510 has “<a>” in the transition destination state ID 1113, and the entry 1511 has the same “<a>” in the state ID.
  • The entry 1511 has a slot <work> in the question text. From this, it is estimated that the response sentence 1112 of the entry 1510 prompts the user to input the slot <work> in the question sentence 1111 of the entry 1511. Similarly, the response sentence 1112 of the entry 1511 is presumed to prompt input of the slot <account name> in the question sentence 1111 of the entry 1512, and the response sentence 1112 of the entry 1512 is presumed to prompt input of the slot <item> in the question sentence 1111 of the entry 1513.
  • The entry rearrangement scenario description table template 1520 shows the template obtained by rearranging the pre-change scenario description table template 1500 in the case where the order of fixing the slots defined in S1315, shown in the example of FIG. 15B, is “<item> → <account name> → <work>”.
  • The procedure for rearranging the pre-change scenario description table template 1500 to create the entry rearrangement scenario description table template 1520 is as follows.
  • For each slot, an entry having a response sentence 1112 that asks for its contents and an entry having a question sentence 1111 that receives its contents can be estimated. Therefore, as the response sentence of the entry 1530, which has the initial state as its state ID 1110, the response sentence 1112 inquiring about the slot <item>, which is to be fixed first, is set. For the transition destination state ID 1113 “<a>” of the entry 1530, the subsequent entry 1531 has the same “<a>” as its state ID 1110.
  • As the question sentence 1111 of the entry 1531, the question sentence 1111 of the entry 1513, which is the question sentence for receiving the slot <item>, is set.
  • In this way, a response sentence 1112 corresponding to a slot is set in one entry, and the question sentence 1111 corresponding to that slot is set in the subsequent entry whose state ID 1110 corresponds to the transition destination state ID 1113 of that entry. This is repeated in the defined order. In the response sentence 1112 of the entry 1533, at which all slots are fixed, the response sentence 1112 of the last entry 1513 (which has the end state as its transition destination state ID 1113) in the original pre-change scenario description table template 1500 is set.
  • In the response sentence mapping table 1450 of FIG. 15B, as in the example of the ranges 1460, 1461, and 1462 in the value 1430, the value that the item 1412 can take is sometimes determined uniquely once the value of the item 1411 is fixed. It is therefore determined whether the response sentence mapping table 1450 of FIG. 15B contains such a case. If it does, the process proceeds to S1335.
  • In such a case, the value of the slot <account name> and the subsequent state transition destination are each uniquely determined.
  • Accordingly, the post-entry reduction scenario description table template 1540 shown in FIG. 19C is obtained by deleting the entry that asks for the slot <account name> from the entry rearrangement scenario description table template 1520 shown in FIG. 19B.
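  • The check for slots whose value becomes unique once another slot is fixed can be sketched as follows. The row format (one dictionary of slot values per matched subtree) and the sample values in the usage note are assumptions for illustration, not the contents of FIG. 15.

```python
def determined_values(rows, fixed_slot, target_slot):
    """For each value of fixed_slot, report target_slot's value if it is unique.

    rows: list of dicts mapping slot names to values (hypothetical stand-in
    for the rows of the response sentence mapping table).
    Returns a dict {fixed value: the single possible target value}.
    """
    seen = {}
    for row in rows:
        seen.setdefault(row[fixed_slot], set()).add(row[target_slot])
    return {k: next(iter(v)) for k, v in seen.items() if len(v) == 1}
```

  • For the fixed values that appear in the result, the entry asking for the target slot carries no information and is a candidate for removal, which corresponds to the reduction from template 1520 to template 1540.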
  • The entry rearrangement scenario description table template 1520 shown in FIG. 19B is an example.
  • The response data template can be updated / changed using the document structure information and the information of the plurality of subtrees corresponding to the extraction pattern 720.
  • An optimum template is selected, according to the slot values, from the plurality of response data templates 1230 generated by the process of copying / changing the response data template shown in FIG. 18.
  • In the response sentence mapping table 1450 shown in FIG. 15B, the post-entry reduction scenario description table template 1540 of FIG. 19C is selected for a subtree whose slot <item> is “business hours”, and the entry rearrangement scenario description table template 1520 of FIG. 19B is selected for a subtree whose slot <item> is “required document”. This makes it possible to generate the response data 2000, as shown in FIG. 16, based on a scenario in which unnecessary response / input entries are omitted.
  • The response data management unit 395 can also update the entry rearrangement scenario description table template 1520 by referring to the response history database 122 after the operation of the question response program 221 is started.
  • At the stage of executing the process of copying / changing the response data template of FIG. 18, the value 1430 of the response sentence mapping table 1400 of FIG. 15A holds only a true / false value indicating the presence or absence of a response sentence for the slot values.
  • Once operation has started, however, the usage frequency of each response sentence is known from the response history database 122. The order of fixing the slots can therefore be rearranged by using this usage frequency instead of the true / false value as the value 1430.
  • In step 890 of FIG. 8, states that make substantially the same transition are combined into one. This reduces the number of states in the scenario branch diagram 1000 and the number of entries in the corresponding scenario description table 1100, so the response data can be reduced.
  • Two examples of combining states are given below.
  • One is the merging of subtrees.
  • The correspondence between the question sentence and the response sentence may be exactly the same.
  • For example, state 1051 and state 1054 have the same content, and state 1053 and state 1055 have the same content.
  • Correspondingly, the entries 1125 and 1130 match, and the entries 1129 and 1132 match.
  • In this case, the entry 1125 and the entry 1130 may be combined into a single entry (the state ID 1110 can store not only a single value but also a plurality of values). Further, if such entries have a transition destination, unifying the transition destination state ID into one makes it unnecessary to hold separate entries for each of the plurality of subtrees.
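  • The combination of entries with identical content can be sketched as below. The tuple layout and the sample contents in the usage note are hypothetical; the sketch stores the multiple state IDs of a combined entry as a tuple, which is one way to realize a state ID 1110 holding a plurality of values.

```python
def merge_entries(entries):
    """Combine entries whose question, response and destination are identical.

    entries: list of (state_id, question, response, next_id) tuples.
    Returns a list of (state_ids, question, response, next_id), where
    state_ids is a tuple holding every state ID that shared that content.
    """
    merged = {}
    for state_id, question, response, next_id in entries:
        merged.setdefault((question, response, next_id), []).append(state_id)
    # dict preserves insertion order, so the first occurrence fixes the order
    return [(tuple(ids), q, r, n) for (q, r, n), ids in merged.items()]
```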
  • One scenario description table template 1231 (or a table obtained by changing the scenario description table template 1231) is output for each set of chapter / section / item.
  • A document may have multiple sections for a chapter and multiple items for a section. Therefore, for the entries 1240, 1241, and 1242 relating to the inquiry about the slot values included in the chapter or section, the response data is output as many times as there are items.
  • According to the present embodiment, by creating a pattern according to a question scenario in the pattern database, it is possible to generate response data based on the question scenario.
  • The quality of the generated response data is improved by using the analyzed document structure and the plurality of subtrees corresponding to the extraction pattern to update / change the contents of the response data template.
  • The question / answer data generation device of this embodiment generates the answer data of a drill-down type question / answer system.
  • The questioner 110 and the question answering device 120 repeatedly exchange the question sentence 112 and the answer sentence 113 a plurality of times, as in the scenario branching type question answering system according to the second embodiment.
  • The two systems are also the same in that the question answering device 120 returns an answer by narrowing down the question contents of the questioner 110, fixing the values of a plurality of slot items at each exchange and returning the final answer when the values of the required slots are fixed.
  • However, the method of determining the slot values and the structure of the response data used for that purpose are different.
  • The answer data of the drill-down type question answering system is used when question answering is performed by the drill-down type answer program 224.
  • The name “drill-down type” comes from the fact that the slot values are narrowed down and fixed. In the following, the differences from the first and second embodiments are mainly described.
  • The answer data used in the drill-down type question answering system is composed of the question answering table 1600 shown in FIG. 20 and the slot attribute table 1650 shown in FIG. 21.
  • The question / answer table 1600 holds pairs of slot values and the answer sentence finally returned for the question.
  • Each entry 1630 to 1636 of the question / answer table 1600 has a response sentence 1620 corresponding to the slot group 1610.
  • The slot group 1610 has a plurality of slots 1611, 1612, 1613.
  • The questioner 110 and the question answering device 120 repeatedly exchange the question sentence 112 and the response sentence 113, and the drill-down type answer program 224 acquires the slot values from the question sentences 112 among them. Then, when there is an entry in the question answer table 1600 in which the values of the slots 1611, 1612, and 1613 match, the corresponding answer sentence 1620 is returned, and the question answering is finished.
  • The entry 1632 sets an asterisk (*) as the value corresponding to the slot 1613. This indicates that if the values of the other slots 1611 and 1612 match the values obtained from the input, the value corresponding to the slot 1613 does not matter (it may be undetermined).
  • The slot value in each entry is not limited to a single value or a value indicating an undetermined value; a plurality of values may be enumerated, or a description allowing a plurality of values, such as a regular expression, may be used.
  • The values of all slots are undefined at the start.
  • The question sentence 112 from the questioner 110 is analyzed and the values of the slots 1611, 1612, and 1613 are acquired.
  • A method for acquiring the slot value from the question sentence 112 is disclosed in, for example, Patent Document 3.
  • If an entry in the question answer table 1600 matches the acquired slot values, the answer sentence 1620 corresponding to that entry is returned. In addition, information on the entry referred to at that time is stored in the response history database 122.
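  • The lookup in the question answer table, including the asterisk, enumerated values, and regular expressions, can be sketched as follows. The table layout (a list of pairs of expected slot values and answer sentence) and the slot names are assumptions made for this example.

```python
import re

def slot_matches(expected, actual):
    """Check one slot of a table entry against the value fixed so far."""
    if expected == "*":
        return True                      # value does not matter
    if actual is None:
        return False                     # slot still undetermined
    if isinstance(expected, re.Pattern):
        return expected.fullmatch(actual) is not None
    if isinstance(expected, (list, tuple, set, frozenset)):
        return actual in expected        # enumerated alternatives
    return expected == actual

def lookup(table, slots):
    """Return the answer of the first entry whose slots all match, else None.

    table: list of (expected_slots, answer) pairs; slots: dict of the
    values acquired from the question sentences so far.
    """
    for expected, answer in table:
        if all(slot_matches(v, slots.get(k)) for k, v in expected.items()):
            return answer
    return None
```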
  • If no matching entry exists, the drill-down type response program 224 sends an inquiry back to the questioner 110 as the response sentence 113 so that the value of an undetermined slot can be fixed. At this time, the drill-down type response program 224 uses the slot attribute table 1650 shown in FIG. 21 to generate the response sentence 113 of the inquiry.
  • The slot attribute table 1650 has entries 1680 to 1683 for each slot. Exceptionally, it may include an entry 1680 that does not correspond to a slot. Each entry is composed of a set of an empty slot item 1661, an empty slot priority 1662, and a response sentence 1670.
  • For the slots whose values are undetermined, the drill-down response program 224 searches the slot attribute table 1650 for the entries whose empty slot item 1661 matches those slots and obtains the value of the priority 1662 in each entry (in the example of FIG. 21, 0 has the highest priority and 3 has the lowest priority).
  • The slot having the highest priority 1662 is selected, and the response sentence 1670 in the corresponding entry is returned as the response sentence 113 to prompt the questioner 110 to input the value of that slot.
  • The entries 1680 to 1683 may include an entry 1680 that does not correspond to a slot.
  • The entry 1680 includes a greeting message to be output when the question and answer exchange first begins.
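  • The choice of the next inquiry from the slot attribute table can be sketched as below. The table is modeled as `(slot name, priority, response sentence)` triples, with `None` as the slot name for the greeting entry; these representations and the sample sentences are assumptions, and a smaller number means a higher priority as in the example of FIG. 21.

```python
def next_prompt(slot_attributes, slots):
    """Return the response sentence for the undetermined slot of top priority.

    slot_attributes: list of (slot_name, priority, response) triples.
    slots: dict of slot values fixed so far (missing or None = undetermined).
    Returns None when every slot already has a value.
    """
    pending = [
        (priority, response)
        for name, priority, response in slot_attributes
        if name is not None and slots.get(name) is None
    ]
    if not pending:
        return None
    return min(pending, key=lambda p: p[0])[1]
```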
  • Compared with the scenario branching type question answering system of the second embodiment, the order of fixing the slot values is different.
  • In the scenario branching type, the slot values can be fixed only in the order defined in the scenario branch diagram 1000.
  • To fix them in a different order, the scenario branch diagram 1000 itself must be rewritten accordingly.
  • In the drill-down type, the order of fixing the slot values is arbitrary. For example, it is assumed that the drill-down response program 224 outputs a response sentence 1670 that prompts the user to input an account name according to the entry 1681 of the slot attribute table 1650.
  • Even in this case, the value of the slot <item> can be fixed first. Further, the values of a plurality of slots can be determined from one question sentence 112.
  • The response data generation pattern of this embodiment will be described with reference to FIG.
  • As in the first embodiment, a description that matches a predefined pattern, that is, a subtree of the tree structure, is extracted and response data is generated.
  • However, since the response data consists of the question response table 1600 and the slot attribute table 1650, the format of the response data generation pattern is different.
  • The extraction pattern 720 included in the response data generation pattern 1700 describes content that matches a part of the tree structure in the structure information 1900.
  • The extraction pattern description 1721 shows an example that actually matches a part of the structure information 1900.
  • The response data template 1730 included in the response data generation pattern 1700 holds data for generating the question response table 1600 and the slot attribute table 1650.
  • The response data generation pattern 1700 has a question response table template 1740 and a slot attribute table template 1760 as the response data template 1730 for data generation.
  • The question and answer table template 1740 has a slot group 1610 and corresponding answer sentences 1620.
  • The slot group 1610 has a plurality of slots 1611, 1612, 1613.
  • The entry 1750 of the question answer table template 1740 shows the entry generated by substituting the value of each slot acquired from the text when a subtree associated with the extraction pattern example 1721 is extracted from the document structure 600.
  • The slot attribute table template 1760 has a slot item 1661, a priority 1662, and a response statement 1670.
  • The response statement 1670 of each entry 1770 to 1773 of the slot attribute table template 1760 can include placeholders 1780 and 1781 into which a set of slot values is substituted.
  • FIG. 23 is a flowchart showing the process of copying / changing the response data template of the third embodiment (S865 of FIG. 17).
  • The response data generation flow 800 is used to generate response data, as in the first and second embodiments.
  • From the slot attribute table template 1760, only one slot attribute table 1650 is generated for each extraction pattern example 1721. This is because the response sentence that prompts input for a slot whose value is undetermined does not depend on which slot values have already been fixed.
  • The response statement 1670 includes placeholders 1780 and 1781 into which a set of slot values is substituted. For example, the list of slot values obtained in the process of acquiring statistical information is substituted for the placeholders 1780 and 1781 in the response sentence 1670 to generate a response sentence.
  • The response data generation process of the present embodiment uses the same process as the flow 800, like the response data generation process of the second embodiment shown in FIG. 17. The response data generation program 360 in the question response data generation device 130 generates the response data (question response table 1600 and slot attribute table 1650) stored in the response database 121 used by the question response device 120 from the document group stored in the document database 131.
  • S1310 and S1315 executed in the template change flow 1800 are the same as in the template change and copy processing shown in FIG. 18 of the second embodiment.
  • The contents of the slot attribute table template 1760 are changed according to the order of fixing the slot values determined in S1310 and S1315. Suppose, as with the response sentence mapping table 1450 of the second embodiment, that the order of fixing the slots determined in S1310 and S1315 is “<item> → <account name> → <work>”. In that case, the priority 1662 in the slot attribute table template 1760 is adjusted to the determined order, and that value of the priority 1662 is set as the standard value.
  • The second example is the generation and subdivision of slots based on statistical data.
  • Slots 1611, 1612, and 1613 are already set in the question response table template 1740.
  • The extracted subtrees can also be used to generate slots. This is done, for example, when too many values can be assigned to a slot and it is desirable to group them and treat the groups as independent slots. For example, it is assumed that a plurality of values corresponding to the slot <item> in the document structure 600 can be classified by word or meaning.
  • The slot <item> can then be subdivided into <item: procedure> and <item: condition>, and the question answer table template 1740 and the slot attribute table template 1760 can be divided accordingly.
  • When the slots are subdivided in this manner, fine-grained response data can be created by setting different priorities in the respective subdivided slot attribute table templates 1760.
  • The response data management unit 395 can also update the slot attribute table 1650 shown in FIG. 21 with reference to the response history database 122 after the operation of the question response program 221 is started.
  • At the time of executing the process of copying / changing the response data template, the value 1430 of the response statement mapping table 1400 shown in FIG. 15 of the second embodiment holds only a true / false value indicating the presence or absence of a response statement for the slot values.
  • After operation has started, the usage frequency of each response sentence is known from the response history database 122. Therefore, by using this usage frequency instead of the true / false value as the value 1430, it is possible, for example, to arrange the slot values output to the placeholders 1780 and 1781 in descending order of frequency of use, or to set a high priority value in the priority 1662 for a slot whose value tends to be fixed at an early stage according to the question / answer history.
  • The number of entries in the question response table 1600 shown in FIG. 21 can be reduced by combining substantially identical states into one.
  • The entries 1631 and 1634 have the same contents except for the slot 1612 <work>.
  • Both entries can be combined into a single entry.
  • FIG. 24 is a diagram showing an example of the question response table generated by the response data generation process.
  • FIG. 25 is a diagram showing an example of the slot attribute table generated by the response data generation process.
  • The response data of this embodiment consists of the question response table 2100 shown in FIG. 24 and the slot attribute table 2150 shown in FIG. 25.
  • The question response table 2100 shown in FIG. 24 and the slot attribute table 2150 shown in FIG. 25 are equivalent to the question response table 1600 shown in FIG. 20 and the slot attribute table 1650 shown in FIG. 21, in that they can answer questions over the same range. However, because duplicates are removed and priorities are changed based on the document structure, the question response table 2100 and the slot attribute table 2150 of the present embodiment have a smaller amount of data (fewer table rows) than the question response table 1600 and the slot attribute table 1650, and realize a more appropriate order of questions that takes priority into account.
  • The amount of data is reduced and the response data is generated with a more appropriate question order that takes priority into account, which makes it possible to improve the quality of the response data.
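The two optimizations described above, combining entries that differ only in one slot and deriving priorities from the order in which slots are fixed, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the embodiment: the entry layout (a dict with `slots` and `response`), the `WILDCARD` marker, the function names `merge_entries` and `priorities_from_order`, the sample rows, and the convention that a larger number means a higher priority are all hypothetical. As an extra safety assumption, entries are merged only when the group covers every value of the slot that appears in the table, so the merged table answers the same range of questions.

```python
from collections import defaultdict

WILDCARD = "*"  # hypothetical marker meaning "any value of this slot"


def merge_entries(entries, slot):
    """Combine entries that are identical except for `slot`.

    Assumption: every entry defines every slot. A group is merged only if
    it covers all values of `slot` seen in the table.
    """
    all_values = {e["slots"][slot] for e in entries}
    groups = defaultdict(list)
    for e in entries:
        # Key on everything except the slot being collapsed.
        others = tuple(sorted((k, v) for k, v in e["slots"].items() if k != slot))
        groups[(others, e["response"])].append(e)

    merged = []
    for (others, response), group in groups.items():
        values = {e["slots"][slot] for e in group}
        if len(group) > 1 and values == all_values:
            merged.append({"slots": {**dict(others), slot: WILDCARD},
                           "response": response})
        else:
            merged.extend(group)
    return merged


def priorities_from_order(slot_order):
    """Give the slot fixed first the largest priority value (assumed convention)."""
    n = len(slot_order)
    return {slot: n - i for i, slot in enumerate(slot_order)}


# Hypothetical sample rows; not taken from the figures.
entries = [
    {"slots": {"item": "reset password", "work": "office"}, "response": "Use the portal."},
    {"slots": {"item": "reset password", "work": "remote"}, "response": "Use the portal."},
    {"slots": {"item": "request laptop", "work": "office"}, "response": "Ask the help desk."},
    {"slots": {"item": "request laptop", "work": "remote"}, "response": "File a shipping form."},
]

merged = merge_entries(entries, "work")  # 3 rows instead of 4
priorities = priorities_from_order(["item", "account name", "work"])
```

In this sketch the two "reset password" rows collapse into one row whose `work` slot is the wildcard, while the "request laptop" rows stay separate because their responses differ.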

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a question-and-answer data generation device that: holds an extraction pattern for structure information of a document, and a response data generation model made up of questions and a response data template to which response sentences for the questions are applied; analyzes an input document; generates document structure information; performs pattern matching between the structure indicated by the structure information of the input document and the extraction pattern of the response data generation model; extracts the text of the document that matches the pattern indicated by the extraction pattern; and generates response data by applying the extracted text to the response data template. Thus, for the response data used by a question-and-answer system in which an information processing apparatus automatically returns a response to a question, high-quality response data can be generated without requiring much human effort for correction and confirmation.
PCT/JP2019/041828 2018-11-13 2019-10-25 Question-and-answer data generation device and question-and-answer data generation method WO2020100553A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-212590 2018-11-13
JP2018212590A JP7163143B2 (ja) 2018-11-13 2018-11-13 Question-and-answer data generation device and question-and-answer data generation method

Publications (1)

Publication Number Publication Date
WO2020100553A1 (fr) 2020-05-22

Family

ID=70730828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/041828 WO2020100553A1 (fr) 2018-11-13 2019-10-25 Question-and-answer data generation device and question-and-answer data generation method

Country Status (2)

Country Link
JP (1) JP7163143B2 (fr)
WO (1) WO2020100553A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113190767A (zh) * 2021-04-27 2021-07-30 维沃移动通信(深圳)有限公司 Information response method and device
JP7416665B2 (ja) 2020-06-12 2024-01-17 株式会社日立製作所 Dialogue system and control method for dialogue system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102410068B1 (ko) * 2021-08-11 2022-06-22 주식회사 보인정보기술 Method for generating question-answer pairs based on a natural language model, and device for performing the method
JP7347559B2 (ja) 2022-02-24 2023-09-20 沖電気工業株式会社 Dialogue knowledge creation device and dialogue knowledge creation program
WO2024015252A1 (fr) * 2022-07-11 2024-01-18 Pryon Incorporated Supervised summarization and structuring of unstructured documents
WO2024014383A1 (fr) * 2022-07-13 2024-01-18 ソニーグループ株式会社 Information processing device, information processing method, terminal device, and terminal program
CN117371404B (zh) * 2023-12-08 2024-02-27 城云科技(中国)有限公司 Method and device for generating text question-answer data pairs

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0877179A (ja) * 1994-09-02 1996-03-22 Fujitsu Ltd Document index generation device
JP2012079161A (ja) * 2010-10-04 2012-04-19 National Institute Of Information & Communication Technology Natural language sentence generation device and computer program

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
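JP3908634B2 (ja) * 2002-09-11 2007-04-25 株式会社東芝 Search support method and search support device
JP3992642B2 (ja) * 2003-05-01 2007-10-17 日本電信電話株式会社 Voice scenario generation method, voice scenario generation device, and voice scenario generation program
JP2008145769A (ja) * 2006-12-11 2008-06-26 Hitachi Ltd Dialogue scenario generation system, and method and program therefor
KR20160014463A (ko) * 2014-07-29 2016-02-11 삼성전자주식회사 Server, information providing method of server, display device, control method of display device, and information providing system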

Also Published As

Publication number Publication date
JP2020080025A (ja) 2020-05-28
JP7163143B2 (ja) 2022-10-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19885721; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19885721; Country of ref document: EP; Kind code of ref document: A1)