WO2005041068A1 - System and method for question answering document search - Google Patents
System and method for question answering document search
- Publication number
- WO2005041068A1 (application PCT/JP2004/015719)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- question
- document
- extracted
- type
- score
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99942—Manipulating data structure, e.g. compression, compaction, compilation
- Y10S707/99943—Generating database or data structure, e.g. via user interface
- Y10S707/99944—Object-oriented database structure
- Y10S707/99945—Object-oriented database structure processing
- Y10S707/99948—Application of database or data structure, e.g. distributed, multimedia, or image
Definitions
- The present invention relates to a system and a method for question answering document search that analyze the semantic role (SR) of a question given by a user as a search request in question form, extract a description that answers the question, and are suitable for presenting the extracted description to the user via a display screen.
- Japanese Patent Application Laid-Open No. 8-255172 discloses the following document search technology.
- A sentence or other information is extracted as an excerpt sentence (excerpt sentence data) from the document data (original sentence data) constituting each relevant document.
- The excerpt sentence data is extracted in advance, for each sentence pattern, from the original sentence data of each document stored in the original sentence database. Sentence patterns are the various viewpoints or criteria on which the extraction is based.
- The excerpt sentence data extracted for each sentence pattern is stored in a database (excerpt sentence database) in document units.
- Japanese Patent Laying-Open No. 2002-132811 discloses the following question answering type document search technology.
- A search request is given by the user to the search system (a question answering document search system) in the form of a question.
- A search request in the form of a question asks, for example, "What is the price of XXX?"
- The query is a natural language search request, that is, a question.
- A set of search terms and a question type are determined from the query.
- A set of related documents is retrieved from the document collection using the search terms. The answer (a word) to the question is then extracted from the related document set, and the pair of the extracted answer and the document containing the answer (or the document number of that document) is presented to the user by the search system as the answer to the question.
- As described above, in the document search technology described in Document 1 (hereinafter referred to as the first document search technology), a list of the excerpt sentence data that matches the sentence pattern selected by the user is displayed from among the excerpt sentence data extracted from the document data of the retrieved documents. As a result, an excerpt sentence (summary) that the user is likely to need can be displayed, and the load on the user in performing a document search can be reduced.
- However, the excerpt sentence data used as the excerpt sentence (summary) is extracted in advance, for each sentence pattern, from the original sentence data of each document stored in the original sentence database. For this reason, the first document search technology cannot respond to changes in sentence patterns.
- In the question answering document search technology described in Patent Document 2 (hereinafter referred to as the second document search technology), a direct answer to a question (a natural language search request) is presented to the user together with the document on which the answer is based. Therefore, the user can confirm the reliability of the answer.
- In the second document search technology, there is no need to prepare in advance the data to be used as answers. For this reason, the addition or change of a question type can be handled easily. However, with the second document search technology, the question type cannot be determined when the question consists only of keywords or is ambiguous. In such a case, the answer result (search result) cannot be presented to the user.
- An object of the present invention is to be able to present to the user both a list of first summaries, extracted from documents retrieved by a keyword search method using keywords extracted from a question, and a list of second summaries, corresponding to answers to the question, extracted from the retrieved documents by a question answering search method.
- The present invention provides a question answering document search system that executes a document search in response to a search request in question form.
- The system includes search means for searching for related documents based on keywords extracted from a question given as a question-form search request, extracting from each retrieved document a description related to the keywords as a first summary, and obtaining a document search result including a list of the extracted first summaries.
- The system also includes question type determining means for analyzing the semantic role of the question and determining the question type of the question.
- The system further includes summary extraction means. Among the original sentence data of the documents to be searched stored in the database, this means uses the original sentence data constituting each document indicated by the document search result obtained by the search means. From that data it extracts descriptions that match the question type determined by the question type determining means as second summaries, and obtains a list of the second summaries.
- FIG. 1 is a block diagram showing a configuration of a computer system for realizing a question answering document search system according to an embodiment of the present invention.
- FIG. 2 is a block diagram showing a configuration of a question answering document search system realized by the computer system of FIG. 1.
- FIG. 3 is a flowchart showing a processing procedure of a search device 22 in the embodiment.
- FIG. 4 is a flowchart showing a processing procedure of a question type determination unit 231 in the embodiment.
- FIG. 5 is a flowchart showing a processing procedure of a summary extraction unit 232 in the embodiment.
- FIG. 6 is a diagram for explaining question type determination by a question type determination unit 231 performed using the type determination dictionary 204.
- FIG. 7 is a diagram for explaining summary extraction by a summary extraction unit 232 performed using a type determination dictionary 204.
- FIG. 8 is a view showing an example of a display screen in the embodiment.
- FIG. 1 is a block diagram showing a hardware configuration of a computer system for realizing a question answering document search system according to an embodiment of the present invention.
- the computer system shown in FIG. 1 includes a CPU 1, a storage device 2, a display device 3, and an input device 4.
- the CPU 1 performs various processes related to document search and controls the entire system.
- The storage device 2 includes, for example, a main memory and a disk drive (for example, a hard disk drive). How the main memory and the disk drive are used and divided is not directly related to the present invention, and therefore the description is omitted.
- the storage device 2 is used to store various programs executed by the CPU 1.
- The storage device 2 is also used to store an original sentence database 201, a word index 202, a morphological dictionary 203, and a type determination dictionary 204.
- the original sentence database 201 stores original sentence data (document data) constituting each of a plurality of documents to be searched.
- the word index 202 is index information used to search for a document from a keyword.
- The word index 202 indicates, for each word included in the documents to be searched, the word itself, the documents that contain the word, and the positions of the word within those documents.
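The index layout described above can be sketched as a small inverted index. This is only an illustration under stated assumptions: the function names, the nested-dictionary layout, and the whitespace tokenization are hypothetical (Japanese text would first need morphological analysis to be split into words), and the sample documents are made up.

```python
from collections import defaultdict


def build_word_index(documents):
    """Map each word to {doc_id: [positions]} (hypothetical layout)."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Assumption: words are separated by whitespace.
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def documents_containing(index, keyword):
    """Return the ids of the documents that contain the keyword."""
    return set(index.get(keyword.lower(), {}))


# Made-up sample documents.
docs = {
    1: "XXX price is 1.25 million yen",
    2: "XXX goes on sale in December",
}
word_index = build_word_index(docs)
```

A lookup such as `documents_containing(word_index, "price")` then answers the question "which documents contain this keyword" without scanning the documents themselves.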
- The morphological dictionary 203 is a dictionary used for morphological analysis of a question (a question expressed in natural language) given as a question-form search request.
- The morphological dictionary 203 includes, for each morpheme, a set of the morpheme and part-of-speech information indicating the part of speech (POS) of the morpheme.
- the type determination dictionary 204 is used to analyze the semantic role of the question and determine the type of the question (question type).
- The display device 3 includes a display, typically a liquid crystal display, and a display controller for controlling the display.
- the display device 3 is used to display an input field for inputting a document search request (for example, a question-type document search request), a search result for the search request, and the like.
- The input device 4 includes a keyboard and a mouse. The input device 4 is used for inputting a document search request by a user's operation and for various selections.
- FIG. 2 is a block diagram showing a configuration of a question answering document search system realized by the computer system of FIG. 1.
- This question answering document search system mainly includes an interface 21, a search device 22, and an extraction device 23.
- The question answering document search system also includes the original sentence database 201, the word index 202, the morphological dictionary 203, and the type determination dictionary 204 shown in FIG. 1.
- The interface 21, the search device 22, and the extraction device 23 are realized by the CPU 1 shown in FIG. 1 executing a question answering document search program.
- the interface 21 has a function of receiving a search request (here, a search request in the form of a question) from a user and passing the search request to the search device 22.
- the interface 21 also has a function of receiving a search result from the search device 22 and passing the search result and a search request corresponding to the search result to the extraction device 23.
- The interface 21 further has a function of receiving a list of summaries that match the search request from the extraction device 23 and displaying that list, together with the search result from the search device 22, on the search result list screen of the display device 3.
- the interface 21 includes a display order determination unit 210.
- The search device 22 has a keyword extraction function of extracting keywords from the question-form search request passed from the interface 21.
- The search device 22 also has a document search function of searching for documents that include the extracted keywords, using the word index 202.
- a search using this keyword is called a keyword search.
- Each retrieved document is given a score based on the appearance rate of the keywords (that is, a score indicating the degree of relevance to the keywords is calculated).
- The search device 22 selects the top M (M is an integer greater than 1) documents from the scored documents and passes the search result, including the list of titles and summaries of the selected documents, to the interface 21.
- The extraction device 23 includes a question type determination unit 231 and a summary extraction unit 232.
- The question type determination unit 231 determines the question type of the question by analyzing, based on the type determination dictionary 204, the semantic role of the question-form search request passed from the interface 21 (that is, the semantic role of the question).
- the summary extraction unit 232 specifies a sentence structure specific to the question type determined by the question type determination unit 231 based on the type determination dictionary 204.
- the summary extraction unit 232 also extracts the sentence having the specified sentence structure from the original text data of up to M documents indicated by the search result passed from the interface 21.
- The summary extraction unit 232 further scores each extracted sentence and selects the top N (N is an integer satisfying N < M) sentences as summaries suitable for the question. The selected N summaries are passed to the interface 21.
- FIG. 3 is a flowchart showing a processing procedure of the search device 22.
- FIG. 4 is a flowchart showing a processing procedure of the question type determination unit 231.
- FIG. 5 is a flowchart showing a processing procedure of the summary extraction unit 232.
- FIG. 6 is a diagram for explaining question type determination by the question type determination unit 231 performed using the type determination dictionary 204.
- FIG. 7 is a diagram for explaining summary extraction by the summary extraction unit 232 performed using the type determination dictionary 204.
- FIG. 8 is a diagram showing an example of a display screen.
- a question input field 81 is displayed on the display screen of the display device 3 as shown in FIG.
- This field 81 is used to enter a search request in the form of a question.
- Assume that the user has performed an operation for inputting a question, as a search request in question form, into the question input field 81.
- This operation is performed using the input device 4.
- Here, a natural language question 82 inquiring about the price of XXX, "How much is the price of XXX?", is entered.
- The question 82 is entered in Japanese.
- The question 82 input from the input device 4 is passed to the interface 21 as a question-form search request.
- the interface 21 passes the search request to the search device 22.
- The search device 22 performs a morphological analysis of the question-form search request passed from the interface 21, that is, the question 82, based on the morphological dictionary 203 (step S1).
- For example, the question 82, "XXX no nedan wa ikura", is morphologically analyzed as "/XXX<noun> + /no<adjunct> + /nedan<noun> + /wa<adjunct> + /ikura<adverb>".
- Here, <noun>, <adjunct>, and <adverb> indicate that the corresponding morpheme is a noun, an adjunct, and an adverb, respectively.
- the search device 22 extracts a keyword included in the question based on the result of the morphological analysis (step S2).
- Here, the keywords whose part of speech is noun, that is, "XXX" and "nedan" (that is, "price"), are extracted.
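Keyword extraction from a morphological analysis result can be sketched as follows. The "/surface<pos>" notation, the function names, and the choice to keep only nouns are assumptions made for illustration; a real system would take its tagged morphemes from a morphological analyzer rather than from a hand-written string.

```python
def parse_analysis(result):
    """Parse "/surface<pos> + /surface<pos>" into (surface, pos) pairs."""
    pairs = []
    for token in result.split("+"):
        token = token.strip().lstrip("/")
        surface, _, pos = token.partition("<")
        pairs.append((surface.strip(), pos.rstrip(">").strip()))
    return pairs


def extract_keywords(pairs, wanted_pos=("noun",)):
    """Keep the morphemes whose part of speech is in wanted_pos."""
    return [surface for surface, pos in pairs if pos in wanted_pos]


# Hand-written analysis of the question "XXX no nedan wa ikura".
analysis = "/XXX<noun> + /no<adjunct> + /nedan<noun> + /wa<adjunct> + /ikura<adverb>"
keywords = extract_keywords(parse_analysis(analysis))
```

Passing `wanted_pos=("noun", "adverb")` would also keep "ikura", which is the kind of keyword set the question type determination works on.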
- the search device 22 performs a document search by a so-called keyword search method for searching for a document including the keyword extracted from the question 82 (step S3).
- the search device 22 searches for a document including a keyword by referring to the word index 202.
- the document search method using the word index 202 is conventionally well known as a method for searching a document including a keyword at high speed, and is not directly related to the present invention.
- the search device 22 scores all the searched documents (step S4).
- scoring is performed for each retrieved document based on the appearance rate of keywords in the document.
- various methods of scoring the retrieved documents are conventionally known. For example, it is also possible to assign a score to each keyword term in advance and score the retrieved documents.
- The search device 22 selects, as document search results, the M documents with the highest scores among all the retrieved documents, in descending order of score (step S5). If the number of retrieved documents is less than M, all retrieved documents are selected. It is also possible to select, in descending order of score, only the documents whose score exceeds a certain value.
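Scoring and top-M selection (steps S4 and S5) might look like the sketch below. The text does not give the scoring formula, so the appearance rate is assumed here to be the fraction of a document's words that are keywords; the function names, the optional `min_score` threshold, and the sample data are likewise hypothetical.

```python
def appearance_rate(words, keywords):
    """Assumed score: fraction of the document's words that are keywords."""
    if not words:
        return 0.0
    return sum(1 for word in words if word in keywords) / len(words)


def select_top_documents(documents, keywords, m, min_score=None):
    """Return up to m (doc_id, score) pairs in descending order of score."""
    scored = [(appearance_rate(words, keywords), doc_id)
              for doc_id, words in documents.items()]
    if min_score is not None:
        # Optional variant: keep only documents above a certain score.
        scored = [(score, doc_id) for score, doc_id in scored if score > min_score]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, score) for score, doc_id in scored[:m]]


# Made-up, already tokenized documents.
documents = {
    "d1": ["XXX", "nedan", "wa", "takai"],
    "d2": ["XXX", "wa", "atarashii", "desu"],
    "d3": ["XXX", "nedan"],
}
top = select_top_documents(documents, {"XXX", "nedan"}, m=2)
```

When fewer than `m` documents are retrieved, the slice simply returns all of them, which matches the behavior described for step S5.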
- The search device 22 extracts, as a summary (first summary), a description related to the keywords, for example a sentence including a keyword, from each of the documents selected in score order (here, M documents) (step S6).
- the extraction of the first summary is performed for each of the M documents selected in the order of score by referring to the original sentence data stored in the original sentence database 201 and constituting the document.
- the search device 22 passes the search result including the first summary of each of the M documents selected in the order of the score to the interface 21 (Step S7).
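One simple way to obtain a first summary is to take the first sentence of a document that mentions a keyword. The sketch below assumes English text split on sentence-final punctuation and a case-insensitive substring test; the function name and the sample text are hypothetical, and the actual extraction criterion may differ.

```python
import re


def first_summary(text, keywords):
    """Return the first sentence that contains any keyword, or None."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        if any(keyword.lower() in lowered for keyword in keywords):
            return sentence
    return None


# Made-up document text.
text = "This page lists new products. XXX goes on sale in December. Orders open soon."
summary = first_summary(text, ["XXX"])
```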
- the interface 21 passes the search result to the extraction device 23 together with the search request in the above-described question format.
- The question type determination unit 231 of the extraction device 23 performs a morphological analysis on the question-form search request passed from the interface 21, that is, the question 82 (step S11).
- As a result, the morphological analysis result 61 for the question 82, "XXX no nedan wa ikura" (that is, "How much is the price of XXX?"), is obtained, namely "/XXX<noun> + /no<adjunct> + /nedan<noun> + /wa<adjunct> + /ikura<adverb>".
- The question type determination unit 231 extracts the keywords included in the question 82 based on the morphological analysis result 61 (step S12).
- Here, the extracted keywords include the adverb "ikura" (that is, "how much") and the noun "nedan" (that is, "price").
- the type determination dictionary 204 stores, for each predetermined question type, question type determination rule information serving as a keyword for determining the question type.
- the type determination dictionary 204 stores question type determination rule information including question type determination rule information 204a and 204b, as shown in FIG.
- the question type determination rule information 204a is used to determine a question type regarding a person.
- This information 204a includes question type information indicating a question type regarding a person and word information unique to the question type regarding the person, for example, “who”.
- The information 204a indicates that the question type is determined to be "person" when "who" is included in the set of keywords extracted from the question.
- The question type determination rule information 204b is used to determine a question type related to money, such as a price or an amount.
- This information 204b includes question type information indicating a question type related to money, and word information specific to the question type related to the money, for example, “price”, “price”, “amount” and “how much”.
- Japanese information is used as the information 204b.
- The information 204b includes a determination condition described as "(price | price | amount) & how much".
- "|" indicates an OR condition.
- "&" indicates an AND condition.
- The information 204b indicates that the question type is "money" if the set of keywords extracted from the question includes at least one of "price", "price", or "amount" and also includes "how much". "Price", "price", and "amount" are synonyms.
- After performing step S12, the question type determination unit 231 performs pattern matching 62, as shown in FIG. 6, between the keywords extracted from the question and the question type determination rule information stored in the type determination dictionary 204.
- The question type determination unit 231 analyzes the semantic role of the question by using the pattern matching 62 and determines the question type representing the semantic role (step S13). Here, "price" and "how much" among the keywords extracted from the question match "(price | price | amount) & how much" included in the question type determination rule information 204b. In this case, the question type determination unit 231 determines that the question type is "money".
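The rule matching described above can be sketched with each rule stored as a list of AND-ed groups, where each group is a set of OR-ed words. This representation, the names `RULES` and `determine_question_type`, and the use of English words in place of the Japanese dictionary entries are assumptions for illustration only.

```python
# Each rule: (question type, [OR-group, OR-group, ...]); all groups must match.
RULES = [
    ("person", [{"who"}]),
    # Corresponds to the condition "(price | price | amount) & how much".
    ("money", [{"price", "amount"}, {"how much"}]),
]


def determine_question_type(keywords, rules=RULES):
    """Return the first question type whose condition the keywords satisfy."""
    keyword_set = set(keywords)
    for question_type, and_groups in rules:
        if all(keyword_set & or_group for or_group in and_groups):
            return question_type
    return None  # Question type could not be determined.


question_type = determine_question_type(["XXX", "price", "how much"])
```

Returning `None` covers the case of a keyword-only or ambiguous question, for which no question type can be determined.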
- The question type determination unit 231 notifies the summary extraction unit 232 in the extraction device 23 of the determined question type (step S14).
- The summary extraction unit 232 selects one unprocessed document from among the M documents indicated by the search result passed from the interface 21 to the extraction device 23, and retrieves the original sentence data constituting the selected document from the original sentence database 201 (step S21).
- The summary extraction unit 232 performs a morphological analysis on the retrieved original sentence data based on the morphological dictionary 203 (step S22).
- Here, the original sentence data is assumed to include the sentence 71, "XXX is released on December 1 and the price is from 1.25 million yen.", and the morphological analysis of step S22 yields the morphological analysis result 72.
- In addition to the above-described question type determination rule information, the type determination dictionary 204 stores, for each predetermined question type, sentence structure information indicating the sentence structure of sentences that match the question type.
- Here, the stored sentence structure information includes sentence structure information 204c unique to the question type related to a person and sentence structure information 204d unique to the question type related to money.
- The sentence structure information 204c indicates a Japanese sentence structure common to sentences (descriptions) recommended as conforming to the question type when the question type is "person".
- The sentence structure information 204d indicates a sentence structure common to sentences (descriptions) recommended as conforming to the question type when the question type is "money", written in the form "<numeral>/<en | manen | ...".
- A part of the sentence structure information 204d uses the Japanese romaji notations "en" (that is, "yen"), "manen" (that is, "ten thousand yen"), "oku" (that is, "hundred million"), and "doru" (that is, "dollar").
- That is, if a sentence includes the sentence structure "numeral + (yen or ten thousand yen or (hundred million + noun + yen) or dollar) + classifier" indicated by the sentence structure information 204d, the sentence can be extracted as a sentence that matches the question type regarding money.
- In step S23, the summary extraction unit 232 performs, based on the morphological analysis result 72 of step S22, pattern matching 73 between each sentence of the original sentence data retrieved in step S21 and the sentence structure information unique to the question type determined by the question type determination unit 231.
- Here, the pattern matching 73 is performed between each sentence and the sentence structure information 204d, which is selected from the sentence structure information stored for each question type in the type determination dictionary 204.
- the summary extraction unit 232 extracts the matched sentence as a candidate for a sentence that matches the question type (that is, a sentence that matches the semantic role indicated by the question) (step S24).
- In the sentence 71, "XXX is released on December 1 and the price is from 1.25 million yen.", the portion "1.25 million yen" matches the sentence structure indicated by the sentence structure information 204d.
- The sentence 71 is therefore extracted as a candidate for a sentence that matches the question type.
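The sentence structure matching for the "money" question type can be approximated with a regular expression. This is a simplified stand-in, not the dictionary-based matching itself: it works on English text instead of tagged Japanese morphemes, and the pattern, the function names, and the second sample sentence are assumptions.

```python
import re

# Assumed stand-in for the money sentence structure: a numeral followed by a unit.
MONEY_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?\s*(?:million yen|yen|dollars?)")


def split_sentences(text):
    """Split text after sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_candidates(text, pattern=MONEY_PATTERN):
    """Return the sentences that contain the question type's structure."""
    return [s for s in split_sentences(text) if pattern.search(s)]


# The first sentence is the example sentence; the second is made up.
text = ("XXX is released on December 1 and the price is from 1.25 million yen. "
        "It comes in three colors.")
candidates = extract_candidates(text)
```

The bare numeral in "December 1" does not match because no currency unit follows it, so only sentences that actually state an amount become candidates.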
- The summary extraction unit 232 extracts, for example, nouns as keywords from the question 82 (step S25).
- The summary extraction unit 232 selects, from the candidates extracted in step S24 (here, the sentence 71), the candidates that include a keyword extracted in step S25 (step S26).
- Here, "XXX" and "price" are extracted as keywords from the question 82, "How much is the price of XXX?". "XXX" is included in the sentence 71 ("XXX is released on December 1 and the price is from 1.25 million yen."). Therefore, in step S26, the sentence 71 is selected.
- In this way, the summary extraction unit 232 selects, from the sentences of the documents retrieved by the search device 22, the sentences that have a sentence structure specific to the question type of the question and include a keyword extracted from the question (steps S23 to S26).
- The synonyms of "price" extracted as a keyword from the question, such as "amount", can also be used as keywords. These synonyms are stored in the type determination dictionary 204, where they are included in the question type determination rule information 204b relating to money.
- The summary extraction unit 232 scores each selected sentence based on, for example, the appearance rate of the keywords, as in step S4 (step S27).
- The summary extraction unit 232 repeats the above steps S21 to S27 for each of the M documents indicated by the search result (step S28).
- The summary extraction unit 232 then selects, in descending order of score, the top N sentences (N is an integer that satisfies N < M) from the scored sentences (candidates) as the recommended summaries (second summaries) that match the question (step S29). If the number of scored sentences is less than N, all the scored sentences are selected. It is also possible to select, in descending order of score, only the sentences whose score exceeds a certain value.
- The summary extraction unit 232 passes the summaries selected in score order (here, the top N second summaries) to the interface 21 (step S30).
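Keyword filtering, scoring, and top-N selection of candidate sentences (steps S26 to S29) can be sketched together. The score is again assumed to be the fraction of a sentence's words that are keywords, and the tokenization, function names, and the second and third sample sentences are hypothetical.

```python
def tokenize(sentence):
    """Lowercase the words and strip surrounding punctuation (assumed tokenization)."""
    return [word.strip(".,?!\"").lower() for word in sentence.split()]


def score_sentence(sentence, keywords):
    """Assumed score: fraction of the sentence's words that are keywords."""
    words = tokenize(sentence)
    return sum(word in keywords for word in words) / len(words) if words else 0.0


def select_second_summaries(candidates, keywords, n):
    """Keep candidates containing a keyword and return the top n by score."""
    keywords = {keyword.lower() for keyword in keywords}
    scored = [(score_sentence(s, keywords), s) for s in candidates]
    scored = [(score, s) for score, s in scored if score > 0]  # must contain a keyword
    scored.sort(key=lambda item: -item[0])
    return [s for _, s in scored[:n]]


# The first sentence is the example sentence; the others are made up.
candidates = [
    "XXX is released on December 1 and the price is from 1.25 million yen.",
    "Shipping costs 500 yen.",
    "The XXX price is 900 dollars.",
]
second_summaries = select_second_summaries(candidates, {"XXX", "price"}, n=2)
```

Adding synonyms such as "amount" to the keyword set before the call widens the set of candidates that pass the keyword filter.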
- The interface 21 displays the search result previously passed from the search device 22 and the second summaries passed from the summary extraction unit 232 on the display screen of the display device 3 by means of the display controller of the display device 3.
- The search result passed from the search device 22, that is, the search result including the list of the first summaries of the documents selected in score order, is displayed in the first area 83 of the display screen.
- The second summaries passed from the summary extraction unit 232, that is, the list of the second summaries selected in score order, are displayed in the second area 84 of the display screen.
- the display order determination unit 210 of the interface 21 determines the display order of the first summaries.
- the display order is determined based on the score calculated when the search device 22 searches for a related document.
- the interface 21 causes the first summary list to be displayed in the first area 83 of the display screen so as to have the determined display order (ie, the score order).
- the display order determination unit 210 determines the display order of the second summary.
- The display order is determined in score order, based on the score calculated when each second summary is extracted by the summary extraction unit 232.
- the interface 21 displays a list of the second summaries in the second area 84 of the display screen so as to have the determined display order (that is, score order).
- The list of first or second summaries is divided into groups for display.
- The first or second summaries corresponding to the group with the highest scores are displayed first.
- The display is then switched to the first or second summaries corresponding to the group with the next highest scores.
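Ordering a summary list by score and dividing it into display groups can be sketched as below. The `group_size` parameter, the function name, and the sample scores are hypothetical; the text does not state how the group boundaries are chosen.

```python
def order_and_group(summaries, group_size):
    """Sort (summary, score) pairs by descending score and split into groups."""
    ordered = sorted(summaries, key=lambda item: item[1], reverse=True)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


# Made-up summaries with scores.
summaries = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
groups = order_and_group(summaries, group_size=2)
```

The first group holds the highest-scoring summaries and would be shown first; switching the display moves on to the next group.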
- If the search device 22 is configured to pass to the interface 21 a list in which the first summaries (and titles) are already arranged in score order, the display order determination unit 210 can determine the display order of the first summaries (and titles) in score order without being aware of the scores.
- Similarly, if the summary extraction unit 232 is configured to pass to the interface 21 a list in which the second summaries are already arranged in score order, the display order determination unit 210 can determine the display order of the second summaries in score order without being aware of the scores.
- As described above, documents are searched by a keyword search method using keywords extracted from the question that is input to the question input field 81 as a question-form search request. Then, a first summary, which is a description related to the keywords, is extracted from each of the top M documents among the retrieved documents. The extracted first summaries are displayed in the first area 83 of the display screen in score order. In addition, descriptions corresponding to answers that match the question type are extracted from each of the M documents. This question type is determined by analyzing the semantic role of the question using a question answering search method.
- The top N descriptions (sentences) among the descriptions extracted from the M documents are selected as the second summaries corresponding to the answer to the question.
- the extracted second summary is displayed in the second area 84 of the display screen in the order of score.
- the list of second summaries is explicitly presented to the user. Thus, users can easily access the information they are looking for from the second summary list. Further, in the present embodiment, since the question answering search is performed only on the documents indicated by the document search result, degradation of the response time of the question answering search can be suppressed. Further, in the present embodiment, two types of summary lists having different properties, namely the first summary list and the second summary list, can each be referred to in score order, so the user can easily find and access the desired information. Here, when the user performs an operation of selecting a desired summary from the first or second summary list, the document corresponding to that summary can be displayed.
- a description related to the keyword could be extracted from a document as a first summary and displayed as an alternative to the second summary only when no second summary can be found in that document, which is one of the documents indicated by the document search result. However, this presentation method cannot distinguish between the first summary and the second summary. It is also conceivable to display the first and second summaries extracted from the same document as a set. However, with this display method, whether the display order is the score order calculated when the documents are searched or the score order calculated when the second summaries are extracted, either the first summaries or the second summaries are not ordered by score. This makes the display difficult for users to use.
- the display device 3, the input device 4, and the processing section (the interface 21, the search device 22, and the extraction device 23), which performs a document search according to the document search request input from the input device 4, are assumed to exist in the same computer system.
- the display device 3 and the input device 4 may be provided in, for example, a client terminal.
- the processing section may be provided in, for example, a search server computer connected to the client terminal via a network.
- the original database 201 may be provided in a database server computer connected to the search server computer via, for example, a network.
- a list of first summaries is extracted from the documents retrieved by a keyword search method using keywords extracted from a question.
- a list of second summaries corresponding to the answer to the question is extracted from the retrieved documents using a question answering search method.
- both lists extracted in this manner can be presented to the user, and the user can easily access the information they are looking for.
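The two-stage flow described above (keyword search over all documents, then question answering restricted to the top M documents, with the two summary lists ranked independently) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the stopword list that stands in for morphological analysis, the two toy question types, and the scoring by keyword counts.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list, standing in for morphological analysis.
STOPWORDS = {"when", "was", "were", "the", "a", "an", "is", "did", "in", "of", "how", "many"}

# Hypothetical question types and the surface patterns accepted as an answer.
ANSWER_PATTERNS = {
    "DATE": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),  # a four-digit year
    "NUMBER": re.compile(r"\b\d+\b"),
}


@dataclass
class Summary:
    doc_id: str
    text: str
    score: int


def extract_keywords(question):
    """Lowercase alphanumeric tokens of the question, minus stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", question.lower()) if t not in STOPWORDS]


def question_type(question):
    """Toy rule-based question typing; returns None when no rule applies."""
    q = question.lower()
    if q.startswith("when"):
        return "DATE"
    if q.startswith("how many"):
        return "NUMBER"
    return None


def sentences(text):
    """Split text into sentences after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def two_stage_search(question, documents, m, n):
    """Return (first_summaries, second_summaries), each sorted by its own score."""
    keywords = extract_keywords(question)

    # Stage 1: keyword search over all documents, keeping the top M by score.
    ranked = []
    for doc_id, text in documents.items():
        score = sum(text.lower().count(k) for k in keywords)
        if score > 0:
            ranked.append((score, doc_id))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    top_m = ranked[:m]

    # First summaries: one keyword-bearing sentence per document,
    # listed in document-score order.
    first = []
    for score, doc_id in top_m:
        for s in sentences(documents[doc_id]):
            if any(k in s.lower() for k in keywords):
                first.append(Summary(doc_id, s, score))
                break

    # Stage 2: question answering restricted to the same M documents.
    second = []
    pattern = ANSWER_PATTERNS.get(question_type(question))
    if pattern is not None:
        for _, doc_id in top_m:
            for s in sentences(documents[doc_id]):
                if pattern.search(s):
                    hits = sum(1 for k in keywords if k in s.lower())
                    second.append(Summary(doc_id, s, hits))
        second.sort(key=lambda x: (-x.score, x.doc_id))

    # Second summaries: top N answer-bearing sentences, in answer-score order.
    return first, second[:n]


if __name__ == "__main__":
    docs = {
        "d1": "The bridge opened in 1932.",
        "d2": "The bridge is old. The bridge is long. Work on the bridge ended in 1930.",
        "d3": "Cats sleep a lot.",
    }
    first, second = two_stage_search("When was the bridge opened?", docs, m=2, n=2)
    for s in first:
        print("first :", s)
    for s in second:
        print("second:", s)
```

With the sample documents in the demo, the first list is ordered d2, d1 (document score) while the second list is ordered d1, d2 (answer score), which is the situation the description gives as the reason for showing the two lists in separate areas rather than as per-document pairs.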
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/572,458 US7587420B2 (en) | 2003-10-24 | 2004-10-22 | System and method for question answering document retrieval |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003364949A JP3820242B2 (ja) | 2003-10-24 | 2003-10-24 | Question answering document retrieval system and question answering document retrieval program |
JP2003-364949 | 2003-10-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005041068A1 (ja) | 2005-05-06 |
Family
ID=34510140
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/015719 WO2005041068A1 (ja) | 2003-10-24 | 2004-10-22 | System and method for question answering document retrieval |
Country Status (4)
Country | Link |
---|---|
US (1) | US7587420B2 (ja) |
JP (1) | JP3820242B2 (ja) |
CN (1) | CN100535898C (ja) |
WO (1) | WO2005041068A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
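JP2016133919A (ja) * | 2015-01-16 | 2016-07-25 | 日本電信電話株式会社 | Question answering method, device, and program |
CN108920488A (zh) * | 2018-05-14 | 2018-11-30 | 平安科技(深圳)有限公司 | Natural language processing method and device combining multiple systems |
CN111241267A (zh) * | 2020-01-10 | 2020-06-05 | 科大讯飞股份有限公司 | Abstract extraction and abstract extraction model training method, related device, and storage medium |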
Families Citing this family (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007099812A1 (ja) * | 2006-03-01 | 2007-09-07 | Nec Corporation | Question answering device, question answering method, and question answering program |
US20100287162A1 (en) * | 2008-03-28 | 2010-11-11 | Sanika Shirwadkar | method and system for text summarization and summary based query answering |
US7966316B2 (en) * | 2008-04-15 | 2011-06-21 | Microsoft Corporation | Question type-sensitive answer summarization |
US8332394B2 (en) | 2008-05-23 | 2012-12-11 | International Business Machines Corporation | System and method for providing question and answers with deferred type evaluation |
US8275803B2 (en) | 2008-05-14 | 2012-09-25 | International Business Machines Corporation | System and method for providing answers to questions |
US8984398B2 (en) * | 2008-08-28 | 2015-03-17 | Yahoo! Inc. | Generation of search result abstracts |
JP5816936B2 (ja) | 2010-09-24 | 2015-11-18 | インターナショナル・ビジネス・マシーンズ・コーポレーション International Business Machines Corporation | Method, system, and computer program for automatically generating answers to questions |
US8892550B2 (en) | 2010-09-24 | 2014-11-18 | International Business Machines Corporation | Source expansion for information retrieval and information extraction |
US9569724B2 (en) * | 2010-09-24 | 2017-02-14 | International Business Machines Corporation | Using ontological information in open domain type coercion |
US8943051B2 (en) | 2010-09-24 | 2015-01-27 | International Business Machines Corporation | Lexical answer type confidence estimation and application |
EP2616926A4 (en) | 2010-09-24 | 2015-09-23 | Ibm | PROVISION OF QUESTIONS AND ANSWERS WITH DELAYED ASSESSMENT ON THE BASIS OF TEXT WITH LIMITED STRUCTURE |
US20120078062A1 (en) | 2010-09-24 | 2012-03-29 | International Business Machines Corporation | Decision-support application and system for medical differential-diagnosis and treatment using a question-answering system |
EP2622510A4 (en) | 2010-09-28 | 2017-04-05 | International Business Machines Corporation | Providing answers to questions using logical synthesis of candidate answers |
EP2622428A4 (en) | 2010-09-28 | 2017-01-04 | International Business Machines Corporation | Providing answers to questions using hypothesis pruning |
US8738617B2 (en) | 2010-09-28 | 2014-05-27 | International Business Machines Corporation | Providing answers to questions using multiple models to score candidate answers |
CN102456060A (zh) * | 2010-10-28 | 2012-05-16 | 株式会社日立制作所 | Information processing device and information processing method |
WO2013142493A1 (en) * | 2012-03-19 | 2013-09-26 | Mayo Foundation For Medical Education And Research | Analyzing and answering questions |
US9229974B1 (en) | 2012-06-01 | 2016-01-05 | Google Inc. | Classifying queries |
US10614725B2 (en) | 2012-09-11 | 2020-04-07 | International Business Machines Corporation | Generating secondary questions in an introspective question answering system |
US9244952B2 (en) | 2013-03-17 | 2016-01-26 | Alation, Inc. | Editable and searchable markup pages automatically populated through user query monitoring |
US20140344259A1 (en) * | 2013-05-15 | 2014-11-20 | Google Inc. | Answering people-related questions |
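CN103577558B (zh) * | 2013-10-21 | 2017-04-26 | 北京奇虎科技有限公司 | Device and method for optimizing the search ranking of question-answer pairs |
CN103577556B (zh) * | 2013-10-21 | 2017-01-18 | 北京奇虎科技有限公司 | Device and method for obtaining the degree of association of question-answer pairs |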
US20150186527A1 (en) * | 2013-12-26 | 2015-07-02 | Iac Search & Media, Inc. | Question type detection for indexing in an offline system of question and answer search engine |
US10061861B2 (en) | 2014-08-19 | 2018-08-28 | Intuit Inc. | Common declarative representation of application content and user interaction content processed by a user experience player |
US10175997B2 (en) * | 2014-11-26 | 2019-01-08 | Intuit Inc. | Method and system for storage retrieval |
US9678936B2 (en) | 2014-11-26 | 2017-06-13 | Intuit Inc. | Dynamic user experience workflow |
US10891696B2 (en) * | 2014-11-26 | 2021-01-12 | Intuit Inc. | Method and system for organized user experience workflow |
US10417717B2 (en) | 2014-11-26 | 2019-09-17 | Intuit Inc. | Method and system for generating dynamic user experience |
WO2016122575A1 (en) * | 2015-01-30 | 2016-08-04 | Hewlett-Packard Development Company, L.P. | Product, operating system and topic based recommendations |
US9953265B2 (en) | 2015-05-08 | 2018-04-24 | International Business Machines Corporation | Visual summary of answers from natural language question answering systems |
US10402035B1 (en) | 2015-07-29 | 2019-09-03 | Intuit Inc. | Content-driven orchestration of multiple rendering components in user interfaces of electronic devices |
US10732782B1 (en) | 2015-07-29 | 2020-08-04 | Intuit Inc. | Context-aware component styling in user interfaces of electronic devices |
US10802660B1 (en) | 2015-07-29 | 2020-10-13 | Intuit Inc. | Metadata-driven binding of platform-agnostic content to platform-specific user-interface elements |
CN106909573A (zh) * | 2015-12-23 | 2017-06-30 | 北京奇虎科技有限公司 | Method and device for evaluating the quality of question-answer pairs |
US10572726B1 (en) * | 2016-10-21 | 2020-02-25 | Digital Research Solutions, Inc. | Media summarizer |
JP6789860B2 (ja) * | 2017-03-14 | 2020-11-25 | ヤフー株式会社 | Information providing device, information providing method, and information providing program |
US10127323B1 (en) * | 2017-07-26 | 2018-11-13 | International Business Machines Corporation | Extractive query-focused multi-document summarization |
US10878193B2 (en) * | 2018-05-01 | 2020-12-29 | Kyocera Document Solutions Inc. | Mobile device capable of providing maintenance information to solve an issue occurred in an image forming apparatus, non-transitory computer readable recording medium that records an information processing program executable by the mobile device, and information processing system including the mobile device |
US20200210855A1 (en) * | 2018-12-28 | 2020-07-02 | Robert Bosch Gmbh | Domain knowledge injection into semi-crowdsourced unstructured data summarization for diagnosis and repair |
US11238027B2 (en) * | 2019-03-22 | 2022-02-01 | International Business Machines Corporation | Dynamic document reliability formulation |
US11586973B2 (en) | 2019-03-22 | 2023-02-21 | International Business Machines Corporation | Dynamic source reliability formulation |
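KR20210043884A (ko) * | 2019-10-14 | 2021-04-22 | 삼성전자주식회사 | Electronic device and control method thereof |
JP7168963B2 (ja) * | 2020-04-28 | 2022-11-10 | 株式会社Askプロジェクト | Natural language processing device and natural language processing method |
JP7112107B2 (ja) * | 2020-04-28 | 2022-08-03 | 株式会社Askプロジェクト | Natural language processing device and natural language processing method |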
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04281566A (ja) * | 1991-03-08 | 1992-10-07 | Toshiba Corp | Document retrieval device |
JP2002132811A (ja) * | 2000-10-19 | 2002-05-10 | Nippon Telegr & Teleph Corp <Ntt> | Question answering method, question answering system, and recording medium storing a question answering program |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08255172A (ja) | 1995-03-16 | 1996-10-01 | Toshiba Corp | Document retrieval system |
US7058624B2 (en) * | 2001-06-20 | 2006-06-06 | Hewlett-Packard Development Company, L.P. | System and method for optimizing search results |
- 2003
  - 2003-10-24 JP JP2003364949A patent/JP3820242B2/ja not_active Expired - Lifetime
- 2004
  - 2004-10-22 CN CNB2004800313320A patent/CN100535898C/zh not_active Expired - Fee Related
  - 2004-10-22 WO PCT/JP2004/015719 patent/WO2005041068A1/ja active Application Filing
  - 2004-10-22 US US10/572,458 patent/US7587420B2/en not_active Expired - Fee Related
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
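JP2016133919A (ja) * | 2015-01-16 | 2016-07-25 | 日本電信電話株式会社 | Question answering method, device, and program |
CN108920488A (zh) * | 2018-05-14 | 2018-11-30 | 平安科技(深圳)有限公司 | Natural language processing method and device combining multiple systems |
CN111241267A (zh) * | 2020-01-10 | 2020-06-05 | 科大讯飞股份有限公司 | Abstract extraction and abstract extraction model training method, related device, and storage medium |
CN111241267B (zh) * | 2020-01-10 | 2022-12-06 | 科大讯飞股份有限公司 | Abstract extraction and abstract extraction model training method, related device, and storage medium |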
Also Published As
Publication number | Publication date |
---|---|
US20070073683A1 (en) | 2007-03-29 |
JP3820242B2 (ja) | 2006-09-13 |
JP2005128873A (ja) | 2005-05-19 |
CN100535898C (zh) | 2009-09-02 |
US7587420B2 (en) | 2009-09-08 |
CN1871605A (zh) | 2006-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2005041068A1 (ja) | System and method for question answering document retrieval | |
JP3429184B2 (ja) | Text structure analysis device, abstracting device, and program recording medium | |
Al-Saleh et al. | Automatic Arabic text summarization: a survey | |
JP2810650B2 (ja) | Method and apparatus for automatically extracting a subset of sentences from the sentences of a natural language document | |
US6957213B1 (en) | Method of utilizing implicit references to answer a query | |
US5794177A (en) | Method and apparatus for morphological analysis and generation of natural language text | |
US20060282414A1 (en) | Question answering system, data search method, and computer program | |
JP2010257488A (ja) | System and method for interactive search query refinement | |
AU2003243989A1 (en) | Method and system for retrieving confirming sentences | |
WO2002048921A1 (en) | Method and apparatus for searching a database and providing relevance feedback | |
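JPH11102374A (ja) | Method and apparatus for displaying database documents | |
JPH03172966A (ja) | Similar document retrieval device | |
KR100396826B1 (ko) | Word cluster management apparatus and method for query processing in information retrieval | |
JP4967037B2 (ja) | Information retrieval device, information retrieval method, terminal device, and program | |
JP2000200281A (ja) | Information retrieval device, information retrieval method, and recording medium storing an information retrieval program | |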
US20050033569A1 (en) | Methods and systems for automatically identifying gene/protein terms in medline abstracts | |
US8082240B2 (en) | System for retrieving information units | |
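JP4162223B2 (ja) | Natural sentence retrieval device, method, and program | |
JP4499179B1 (ja) | Terminal device | |
JP2009086903A (ja) | Search service device | |
JP4009937B2 (ja) | Document retrieval device, document retrieval program, and medium storing a document retrieval program | |
JPH08129554A (ja) | Relational expression extraction device and relational expression retrieval device | |
KR20030006201A (ko) | Integrated natural language question-answering system for automatic homepage search | |
JP5439028B2 (ja) | Information retrieval device, information retrieval method, and program | |
JPH07134720A (ja) | Method and apparatus for presenting related information in a document creation system |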
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 200480031332.0 Country of ref document: CN | |
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW | |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG | |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase | Ref document number: 2007073683 Country of ref document: US Ref document number: 10572458 Country of ref document: US | |
122 | Ep: pct application non-entry in european phase | ||
WWP | Wipo information: published in national office | Ref document number: 10572458 Country of ref document: US |