KR20030006201A

KR20030006201A - Integrated Natural Language Question-Answering System for Automatic Retrieving of Homepage

Info

Publication number: KR20030006201A
Application number: KR1020010041890A
Authority: KR
Inventors: 서정연; 이근배
Original assignee: 서정연; 이근배
Priority date: 2001-07-12
Filing date: 2001-07-12
Publication date: 2003-01-23

Abstract

PURPOSE: An integrated natural language question-answering system is provided that attaches recommended questions to each web page in advance, searches for the questions most similar to a user's input question, and suggests them, so that the recommended questions closely match the user's question. CONSTITUTION: A morphological analysis unit (110) receives a user's natural language query from a natural language query input unit (100) and analyzes it into morphemes. A partial parser (120) determines the modification relations between the analyzed morphemes. An integrated query indexer (200) stores the web documents to be searched and the representative queries of web pages in an index database. An integrated query searcher (300) applies lexico-syntactic patterns to the morphological and syntactic analysis results to extract a list of similar queries, answers, and documents. An integrated result processing unit (400) merges and manages the results of the integrated query searcher (300).

Description

Integrated Natural Language Question-Answering System for Automatic Retrieving of Homepage

The present invention relates to a natural language question-answering system, and more particularly to an integrated natural language question-answering system that handles descriptive queries effectively, so that ordinary users can conveniently and accurately find the information they want on a homepage regardless of the form of the query.

Conventional question-answering retrieval systems extract answer candidates, score them, and filter out unnecessary information at search time, which makes their response time very slow. Because of this time constraint, they also fail to reflect the context surrounding an answer candidate effectively. To address these problems, an example of a natural language question-answering system that recommends answers to short-answer queries, that is, queries that can be answered briefly, can be found in the present applicant's Patent Application No. 2001-12071, entitled "Natural Language Question-Answering Retrieval System Using a Paragraph-Level Real-Time Answer Index."

However, such a system cannot handle descriptive queries that ask about a 'method' or a 'reason' and therefore cannot be answered in short-answer form (for example, "I am trying to re-enroll."). Such a query cannot be answered with a single noun phrase; each step of the procedure has to be explained descriptively.

In addition, in the natural language search offered by conventional search sites (for example, naver.com), the question-answering system that recommends similar questions for a descriptive query builds its questions by classifying whole websites (homepages) only. As a result, the similar questions recommended for an input question are too broad or differ from what the user wants, so the system cannot give the response that best matches the user's query.

An object of the present invention is to provide a descriptive natural language question-answering system that uses a high-precision vertical technique for searching specific material. For each web page of a specific website, the questions that the page can answer are attached in advance, and the questions most similar to the user's input question are retrieved and presented, so that the recommended questions are semantically very close to the user's question.

Another object of the present invention is to provide an integrated question-answering system that can process both descriptive queries and short-answer queries, and that recommends similar queries by comparing the user's query with a pre-built set of representative queries.

A further object of the present invention is to provide a representative-query construction system with which administrators write expected example queries, through web surfing or analysis of existing users' query logs, and link them to homepages.

FIG. 1 is a block diagram of the integrated natural language question-answering system according to the present invention.

FIG. 2 is a structural diagram of the integrated query indexer according to the present invention.

FIG. 3 is a structural diagram of the integrated query searcher according to the present invention.

FIG. 4 is a flowchart of the descriptive query-response processing method according to the present invention.

FIG. 5 is a flowchart of the short-answer query-response processing method according to the present invention.

FIG. 6 is a detailed flowchart of the paragraph segmentation step of FIG. 5, which determines the paragraph containing the answer from a document containing an answer candidate.

FIG. 7 is a block diagram of the representative-query construction system according to the present invention.

FIG. 8 is a flowchart of the representative-query construction method according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

110: morphological analyzer
120: partial parser
200: integrated query indexer
201: descriptive query indexer
202: short-answer query indexer
300: integrated query searcher
301: descriptive query searcher
302: short-answer query searcher
400: integrated result processor

To achieve the above objects, the natural language question-answering system of the present invention comprises: a morphological analyzer that analyzes the user's input natural language query into morphemes; a partial parser that determines the modification relations between the morphemes; an integrated query searcher consisting of a short-answer query searcher, which recommends answers and documents for short-answer queries according to the results of the morphological and syntactic analysis, and a descriptive query searcher, which recommends the most similar queries for descriptive queries; an integrated query indexer consisting of a short-answer query indexer, which extracts answer candidates from the web documents to be searched, scores them, and stores them in an answer index database, and a descriptive query indexer, which composes the intent of each web page to be searched into a virtual document, extracts the essential terms of the virtual document, scores those terms, and stores them in a descriptive query index database; and an integrated result processor that merges the results of the integrated query searcher so as to recommend pre-built similar queries together with the answer for a short-answer query, and similar queries for a descriptive query.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of the integrated natural language question-answering system according to the present invention. As shown in FIG. 1, the system includes: a morphological analyzer (110) that receives the user's natural language query from a natural language query input unit (100) and analyzes it into morphemes; a partial parser (120) that determines the modification relations between the analyzed morphemes; an integrated query indexer (200) that stores the web documents to be searched and the representative queries of web pages in index databases; an integrated query searcher (300) that applies lexico-syntactic patterns to the results of the morphological and syntactic analysis to extract a list of similar queries, answers, and documents; and an integrated result processor (400) that merges and processes the results of the integrated query searcher (300).

FIG. 2 is a structural diagram of the integrated query indexer (200) according to the present invention.

In FIG. 2, the integrated query indexer (200) consists of a descriptive query indexer (201) and a short-answer query indexer (202).

The descriptive query indexer (201) includes: a virtual document composer (210) that composes the set of queries sharing the same intent for a web page to be searched into a single virtual document; a morphological analyzer (212) that analyzes the virtual document into morphemes; a partial parser (214) that determines the modification relations between the analyzed morphemes; a content word extractor (216) that extracts the content words of the virtual document using the results of the morphological analyzer (212) and the partial parser (214); a score assigner (218) that assigns differential weights to the extracted content words according to their term frequency (TF), inverse document frequency (IDF), and part of speech; and a descriptive index database (220) that stores the content words and their scores.

The short-answer query indexer (202) includes: a morphological analyzer (222) that analyzes the morphemes of the sentences in the input web documents; a partial parser (224) that determines the modification relations between the analyzed morphemes; an answer candidate extractor (226) that extracts answer candidates by applying a domain dictionary and answer-type rules to the parsed text; a paragraph segmenter (228) that determines the extent of the context surrounding each extracted candidate; a score assigner (230) that computes the degree of association between the terms within the segmented paragraph and the answer candidate and assigns scores to the surrounding words; and a classification store (232) that classifies the type of the answer candidate and stores it together with the surrounding words in an answer index database (234). The integrated query indexer (200) preferably runs on the server system of a web search site.

FIG. 3 is a structural diagram of the integrated query searcher (300) according to the present invention.

In FIG. 3, the integrated query searcher (300) consists of a descriptive query searcher (301) and a short-answer query searcher (302). Both take as input the result of analyzing the user's natural language query with the morphological analyzer (110) and the partial parser (120) of FIG. 1.

The descriptive query searcher (301) includes: an essential term extractor (310) that uses lexico-syntactic patterns to classify the key words of the user's query into essential terms and optional terms and extracts them; a representative query composer (312) that uses the information in the descriptive index database (220) to extract the representative queries containing the essential terms; a similarity calculator (314) that computes the similarity between the extracted representative queries and the user's query; and a representative query ranker (316) that ranks the results of the similarity calculator.

The short-answer query searcher (302) includes: an intent analyzer (318) that uses lexico-syntactic patterns to determine the intent of the user's query; and a similarity calculator (320) that extracts candidate answers using the information in the answer index database (234) and computes the similarity between the query and each answer candidate. The short-answer query searcher (302) preferably further includes a general document searcher (322) that searches web documents in response to the user's natural language query using the information in a document index database (326), and a similarity integrator (328) that combines the similarity obtained by the general document search with the similarity between the answer candidates and the query. The integrated query searcher (300) is preferably installed to run on the server computer of a web search site.

The integrated question-answering system is now described in more detail with reference to FIGS. 4 and 5. As shown in FIG. 4, the descriptive query indexer (201) performs a virtual document composition step (S2), in which the virtual document composer (210) composes the set of queries sharing the same intent for a web page to be searched into a single virtual document (for example, user queries with the same intent, such as "I am trying to re-enroll." and "How do I submit a re-enrollment application?", are written into one document). It then performs a step (S3) in which the paragraphs of the virtual document are morphologically analyzed and parsed by the morphological analyzer (212) and the partial parser (214).

After step S3, the content word extractor (216) extracts the content words of the virtual document by part of speech (S4). Once the content words have been extracted, each is weighted by the product TF*IDF (S5), where TF is the frequency with which the content word is used within one document and IDF reflects the reciprocal of the number of documents that use the content word. The weight is then adjusted according to the part of speech of each content word (S6); for example, proper nouns are given a higher weight than common nouns.
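The weighting in steps S5 and S6 can be sketched as follows. This is only an illustration under stated assumptions: the part-of-speech multipliers in `POS_WEIGHT`, the smoothed logarithmic form of IDF, and the function name `score_content_words` are hypothetical, since the description specifies only the TF*IDF product and that proper nouns outrank common nouns.

```python
import math
from collections import Counter

# Hypothetical multipliers; the description only says that proper nouns
# receive a higher weight than common nouns.
POS_WEIGHT = {"proper_noun": 1.5, "common_noun": 1.0}


def score_content_words(virtual_docs):
    """Score content words per virtual document.

    virtual_docs: list of documents, each a list of (word, pos) pairs.
    Returns one {word: score} dict per document.
    """
    n_docs = len(virtual_docs)
    doc_freq = Counter()
    for doc in virtual_docs:
        doc_freq.update({word for word, _ in doc})  # count each word once per document

    scored = []
    for doc in virtual_docs:
        tf = Counter(word for word, _ in doc)
        pos_of = dict(doc)
        scores = {}
        for word, count in tf.items():
            # Smoothed logarithmic IDF is an assumption, not the patent's formula.
            idf = math.log(1 + n_docs / doc_freq[word])
            scores[word] = count * idf * POS_WEIGHT.get(pos_of[word], 1.0)  # S5 then S6
        scored.append(scores)
    return scored
```

With two virtual documents that both contain "re-enrollment", a proper noun appearing in only one of them scores higher than the shared common noun.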

The weights assigned in steps S5 and S6 and the score given to each content word are stored, together with the content word, in the descriptive index database (220) (S7).

Meanwhile, the descriptive query searcher (301) shown in FIG. 4 has the user's natural language query, received through the natural language query input unit (100) of FIG. 1, analyzed linguistically by the morphological analyzer (110) and the partial parser (120). The essential term extractor (310) then applies lexico-syntactic patterns (H) to the analysis result, classifies the terms into essential terms and optional terms, and extracts the essential terms (S8). An essential term is a key word of the user's query that must appear in the document or sentence being searched; an optional term is a word that helps determine the intent of the user's query.

Whether a term is essential is determined by the syntactic elements and forms around it. For example, in the phrase "중국에 대한 기사" ("an article about China"), '중국' ('China') becomes the target of the query because of the syntactic element '~에 대한' ('about ~') and is extracted as an essential term.
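A minimal sketch of this one pattern is given below. It works on the raw string with a regular expression, which is an assumption made for brevity; the system described here applies lexico-syntactic patterns to the output of the morphological analyzer and partial parser. The names `ABOUT_PATTERN` and `split_terms` are hypothetical.

```python
import re

# One hypothetical lexico-syntactic pattern: "<X>에 대한" ("about <X>") marks X
# as the target of the query. A real pattern set would run on analyzer output.
ABOUT_PATTERN = re.compile(r"(\S+?)에 대한")


def split_terms(query):
    """Return (essential_terms, optional_terms) for a query string."""
    essential = ABOUT_PATTERN.findall(query)
    remainder = ABOUT_PATTERN.sub(" ", query)
    optional = remainder.split()
    return essential, optional
```

Applied to "중국에 대한 기사", the sketch yields '중국' as the essential term and '기사' ('article') as an optional term.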

Once the essential and optional terms of the user query have been extracted in step S8, the representative query composer (312) builds a representative query set (S9) from the representative queries of each homepage that were built in advance in the descriptive index database (220), so that their similarity to the user query can be measured and the closest ones recommended. Representative queries are built in advance for each homepage so that users can obtain information from that homepage (for example, the homepage for 'academic administration' has a query containing the essential term 're-enrollment', such as "I want to re-enroll; what should I do?").

Once the representative query set has been built in step S9, the similarity calculator (314) removes the representative queries that do not contain the essential terms and, for those that do, counts the index terms shared between the representative query and the user's query (S10). A representative query that shares more index terms with the user query is given a higher similarity.

When representative queries recommended in step S10 share the same number of index terms, similarity is assigned according to the product of the weights of the user's query terms and the weights of the representative query's index terms (S11). In step S11, the weight of a user query term is determined by its syntactic importance: scores are assigned from high to low in the order topic word, subject, object. If the weight products are also equal, the representative query with fewer index terms is given the higher score.

After step S11, queries that are linked to the same homepage and share the same intent are merged into the single representative query with the highest score (S12). The representative query ranker (316) then ranks the representative queries from highest to lowest score, using the scores assigned in steps S10 to S12 (S13).
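Steps S10 to S13 can be read as a lexicographic ordering: number of shared index terms first, then the weight product, then fewer index terms. The sketch below follows that reading. The `RepQuery` structure, the function name, and the choice to sum the per-term weight products are assumptions made for illustration; the description does not fix how the products are aggregated.

```python
from dataclasses import dataclass


@dataclass
class RepQuery:
    text: str
    page: str            # homepage the query is linked to
    intent_id: int       # hypothetical intent identifier
    term_weights: dict   # index term -> weight stored in the index


def rank_representative_queries(user_weights, essential, candidates):
    """user_weights: query term -> syntactic weight; essential: set of terms."""
    scored = []
    for rq in candidates:
        terms = set(rq.term_weights)
        if not essential <= terms:   # S10: drop queries missing an essential term
            continue
        shared = terms & set(user_weights)
        # S11 tie-break; summing the per-term products is an assumption.
        weight = sum(user_weights[t] * rq.term_weights[t] for t in shared)
        scored.append(((len(shared), weight, -len(terms)), rq))

    best = {}                        # S12: keep one query per (page, intent)
    for key, rq in scored:
        slot = (rq.page, rq.intent_id)
        if slot not in best or key > best[slot][0]:
            best[slot] = (key, rq)

    ranked = sorted(best.values(), key=lambda item: item[0], reverse=True)  # S13
    return [rq for _, rq in ranked]
```

Using a tuple as the sort key keeps the three criteria strictly ordered, so a later criterion only matters when the earlier ones tie.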

As shown in FIG. 5, the short-answer query indexer (202) morphologically analyzes and partially parses the paragraphs of each target document using the morphological analyzer (222) and the partial parser (224) of FIG. 2 (S14). After this language processing, the answer candidate extractor (226) extracts the candidates to be recommended as answers (S15). It uses a domain dictionary (A), which comprises dictionaries of person names, place names, organization names, and the like that hold the information for each answer type, and answer-type rules (B), which comprise regular grammars that can recognize items such as homepage addresses and e-mail addresses. The types of answer candidates to be extracted are defined in advance. The short-answer query searcher (302) can generate answers only for user queries that match a defined type; when a user query of an undefined type is entered, it recommends related documents as a conventional search system does.
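The candidate extraction of step S15 might look like the sketch below. The dictionary entries, the two regular expressions, and the function name are hypothetical stand-ins for the domain dictionary (A) and the answer-type rules (B); the patterns are deliberately simple and are not meant to validate addresses.

```python
import re

# Hypothetical domain dictionary (A): answer type -> known entries.
DOMAIN_DICTIONARY = {"person": {"Kim Minsu"}, "place": {"Seoul"}}

# Hypothetical answer-type rules (B): answer type -> regular grammar.
ANSWER_TYPE_RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "homepage": re.compile(r"https?://[^\s)]+"),
}


def extract_answer_candidates(sentence):
    """Return (answer_type, text, start_offset) triples found in a sentence."""
    found = []
    for answer_type, entries in DOMAIN_DICTIONARY.items():
        for entry in entries:
            start = sentence.find(entry)
            if start != -1:
                found.append((answer_type, entry, start))
    for answer_type, pattern in ANSWER_TYPE_RULES.items():
        for match in pattern.finditer(sentence):
            found.append((answer_type, match.group(), match.start()))
    return sorted(found, key=lambda item: item[2])  # order by position in the sentence
```

Each candidate carries its answer type, which is what later selects the answer index database to store it in.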

Once the answer candidates have been extracted, the paragraph segmenter (228) performs paragraph segmentation (S16): it determines the range of surrounding sentences that can affect a candidate by using markers of contextual connection, such as anaphoric expressions and lexical chain information (C).

The words within the paragraph that can affect the answer candidate are then extracted, and the score assigner (230) scores each word (S18) using information on whether the word is in apposition to the answer candidate (D), its positional distance from the candidate (E), part-of-speech information (F), lexical chain information (G), and so on.
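One way to combine the four cues (D) to (G) into a single score is sketched below. Every constant and the additive form are illustrative assumptions; the description names the cues but does not say how they are combined. `CONTEXT_POS_WEIGHT` and `score_context_word` are hypothetical names.

```python
# Hypothetical part-of-speech multipliers for surrounding words (cue F).
CONTEXT_POS_WEIGHT = {"proper_noun": 1.5, "common_noun": 1.0, "verb": 0.8}


def score_context_word(is_appositive, distance, pos, in_lexical_chain):
    """Score one surrounding word relative to an answer candidate.

    distance: number of tokens between the word and the candidate (cue E).
    """
    score = 1.0 / (1 + distance)      # closer words count more (assumption)
    if is_appositive:                 # cue D: in apposition to the candidate
        score += 1.0
    if in_lexical_chain:              # cue G: linked to the candidate by a lexical chain
        score += 0.5
    return score * CONTEXT_POS_WEIGHT.get(pos, 1.0)
```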

After step S18, the classification store (232) selects the corresponding one of the answer index databases (234) according to the type of the answer currently extracted, and stores the surrounding words in it (S20). The surrounding words serve as the keys of the answer index database (234); the stored content includes the score assigned to each word together with information such as the position of the answer candidate within the document.

Step S16 is described in detail with reference to FIG. 6. First, a document and a sentence containing an answer candidate are selected and extracted (S141). Next, it is decided how many sentences before and after the candidate sentence make up the maximum paragraph size (S142, S143). Once the maximum paragraph size is fixed, the anaphoric expressions and lexical chains present in each sentence are examined to decide whether the sentence should be included in the current paragraph (S144, S145). For example, if a lexical chain exists between the candidate sentence and the preceding sentence, or anaphora can be observed between them, the preceding sentence is included in the current paragraph. The decision between the candidate sentence and the following sentence is made in the same way.
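The window-then-link procedure of FIG. 6 can be sketched as below. The predicate `is_linked` is a hypothetical placeholder for the lexical-chain and anaphora checks, and extending the paragraph one adjacent sentence at a time until a link is missing is an assumption about how the check generalizes beyond the immediately neighboring sentence.

```python
def segment_paragraph(sentences, candidate_idx, max_before, max_after, is_linked):
    """Return (start, end) sentence indices of the paragraph around a candidate.

    max_before / max_after bound the paragraph size (S142, S143).
    is_linked(a, b) reports a lexical chain or anaphoric link between two
    adjacent sentences (S144, S145); it is a hypothetical callback.
    """
    start = candidate_idx
    while (start > 0
           and candidate_idx - start < max_before
           and is_linked(sentences[start - 1], sentences[start])):
        start -= 1

    end = candidate_idx
    while (end < len(sentences) - 1
           and end - candidate_idx < max_after
           and is_linked(sentences[end], sentences[end + 1])):
        end += 1

    return start, end
```

A crude stand-in for `is_linked` is word overlap between the two sentences; a real implementation would rely on the anaphora and lexical-chain resources the description refers to as (C).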

Meanwhile, the short-answer query searcher (302) shown in FIG. 5 has the user's natural language query, received through the natural language query input unit (100) of FIG. 1, analyzed linguistically by the morphological analyzer (110) and the partial parser (120). The user intent analyzer (318) then applies lexico-syntactic patterns (H) to the analysis result to determine the intent of the user's query (S21).

After step S21, the similarity calculator (320) selects the one of the answer index databases (234) that matches the query intent and generates answer candidates using the user's query terms as keys (S22). The similarity between the answer candidates and the user's query is computed from the scores of the words stored in the answer index database (234). When the similarity computation is finished, the candidates are ranked and answers are recommended (S23).
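A sketch of steps S22 and S23 follows, assuming the answer index is held as a nested mapping from intent to surrounding word to scored candidates. That layout, the function name, and the choice to sum the stored scores are assumptions; the description says only that the similarity is computed from the stored word scores.

```python
from collections import defaultdict


def recommend_answers(answer_indexes, intent, query_terms, top_n=3):
    """Rank answer candidates for a query.

    answer_indexes: intent -> {surrounding word -> [(candidate, score), ...]}
    """
    index = answer_indexes.get(intent, {})   # pick the database matching the intent
    totals = defaultdict(float)
    for term in query_terms:                 # query terms act as lookup keys
        for candidate, score in index.get(term, []):
            totals[candidate] += score       # summing scores is an assumption
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

An intent with no matching database returns an empty list, which corresponds to the fallback case in which only related documents are recommended.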

After step S23, a general document search may additionally be performed. The general document searcher (322) and the document similarity analyzer (324) search web documents in response to the user's natural language query, using the information in the document index database (326), and a document-query similarity is obtained separately by an ordinary document retrieval method (S24). The similarity integrator (328) then combines this document-query similarity with the answer-query similarity from step S22 and re-ranks the documents (S25).
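The description does not say how the two similarities are combined in step S25. The sketch below assumes a simple linear interpolation with a hypothetical mixing parameter `alpha`, and assumes each document's answer-query similarity has already been reduced to one number per document.

```python
def rerank_documents(doc_similarity, answer_similarity, alpha=0.5):
    """Blend two similarity maps (doc id -> score) and return doc ids, best first.

    alpha weights the document-query similarity; (1 - alpha) weights the
    answer-query similarity. Linear interpolation is an assumption.
    """
    combined = {}
    for doc_id in set(doc_similarity) | set(answer_similarity):
        combined[doc_id] = (alpha * doc_similarity.get(doc_id, 0.0)
                            + (1 - alpha) * answer_similarity.get(doc_id, 0.0))
    return sorted(combined, key=lambda doc_id: combined[doc_id], reverse=True)
```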

For a descriptive natural language query, the integrated result processor (400) presents to the user the list of top-ranked similar queries output by the descriptive query searcher (301). For a short-answer query, it presents the top-ranked similar queries output by the descriptive query searcher (301) together with the top-ranked list of answers output by the short-answer query searcher (302) and the top-ranked documents that contain an answer or are linked to a recommended similar query. When the user selects, among the presented similar queries, the representative query closest to his or her own query, the system links to the web page connected to that representative query (that is, the page that contains the answer).
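The merging behavior of the result processor can be sketched as a small dispatch on query type. The function name, the dictionary layout of the response, and the assumption that the query type has already been decided upstream are all hypothetical.

```python
def build_response(query_type, similar_queries, answers=(), documents=(), top_n=5):
    """Assemble what is shown to the user.

    query_type: "descriptive" or "short_answer" (assumed to be decided upstream).
    All inputs are expected to be ranked best first.
    """
    response = {"similar_queries": list(similar_queries)[:top_n]}
    if query_type == "short_answer":
        # Short-answer queries also get ranked answers and supporting documents.
        response["answers"] = list(answers)[:top_n]
        response["documents"] = list(documents)[:top_n]
    return response
```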

FIG. 7 is a block diagram of the representative-query construction system.

While surfing the Internet through a representative-query construction web browser (520) that can access Internet homepages, the administrator enters into a representative-query construction query set composer (540) either representative queries for homepages that users would find useful, or valuable queries taken from the user query log accumulated in the integrated question-answering system. The resulting query set is provided as input to the descriptive query indexer (201), and after processing by the descriptive query indexer (201) the representative queries are stored in the descriptive index database (220).

FIG. 8 is a flowchart showing the flow of the system that lets a user or administrator build representative queries easily.

The administrator composes a query set with the representative-query construction query set composer (540), drawing on homepage surfing or on the existing user queries accumulated in the natural language question-answering system (S41), and decides whether to add the homepage's address, intent, and representative queries to the descriptive index database (220) (S42, S43, S44). If the homepage is not to be registered, the deletion steps (S47, S48, S49) are performed. If the homepage is to be registered in the descriptive index database (220), the query set composed with the query set composer (540) is provided as input to the descriptive query indexer (201) (S45) and is stored in the descriptive index database (220) through the processing of the descriptive query indexer (201) (S46).

In addition, because a single page can have representative queries with several kinds of intention, queries are stored separately by intention. For example, to find the page for 'Academic Affairs Administration', users might ask (1) "I would like to re-enroll.", (2) "How do I submit a re-enrollment application?", or (3) "How many times can I take a leave of absence?" Here (1) and (2) express the same intention, while (3) expresses a different intention but leads to the same page. The same ID is assigned to representative queries on such a shared page so that the descriptive query indexer (201) merges queries that point to the same page and have the same intention into a single representative query.
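
The grouping described above can be illustrated with a minimal Python sketch. The function name `group_by_intention`, the page path, the intention labels, and the sequential integer IDs are all hypothetical; the text does not specify how IDs are formed or how intentions are labelled.

```python
from collections import defaultdict


def group_by_intention(labelled_queries):
    """Group (page, intention, query) triples so that queries sharing a page
    and an intention fall under one ID (the ID scheme is an assumption)."""
    groups = defaultdict(list)
    for page, intention, query in labelled_queries:
        groups[(page, intention)].append(query)
    # Number the groups in order of first appearance.
    return {
        group_id: {"page": page, "intention": intention, "queries": queries}
        for group_id, ((page, intention), queries) in enumerate(groups.items(), start=1)
    }


# The three example queries from the text, with hypothetical labels.
example = [
    ("/academic-affairs", "re-enrollment", "I would like to re-enroll."),
    ("/academic-affairs", "re-enrollment", "How do I submit a re-enrollment application?"),
    ("/academic-affairs", "leave-of-absence", "How many times can I take a leave of absence?"),
]
grouped = group_by_intention(example)
```

In this sketch, queries (1) and (2) end up under one ID and query (3) under another, while both groups refer to the same page.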

As described above, the integrated natural language question-answering system according to the present invention effectively handles descriptive queries, which existing natural language question-answering systems have difficulty processing, so that general users can conveniently and accurately find the information they want regardless of the form of their query. By recommending pre-built similar queries together with the correct answer to a user's query, it lets users find the desired information quickly and easily.

The present invention has been shown and described above with reference to specific preferred embodiments. However, the present invention is not limited to those embodiments, and those of ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the technical idea of the invention as set forth in the following claims.

Claims

In a natural language question-answer system,

A morpheme analyzer for morphologically analyzing a user's natural language query;

A partial parser for determining modification relationships between the morphemes;

A descriptive query indexer that composes the intention of a search target web page into a virtual document, extracts essential terms of the virtual document, assigns scores to the essential terms, and stores them in a descriptive index database; and

A descriptive query searcher that ranks and recommends the representative queries most similar to the user's descriptive natural language query, using the results of the morpheme analyzer and the partial parser and the index results of the descriptive query indexer.

In a natural language question-answer system,

An integrated query indexer composed of a short-answer query indexer, which extracts correct answer candidates from search target web documents, assigns scores to them, and stores them in a correct answer index database, and a descriptive query indexer, which composes the intention of a search target web page into a virtual document, extracts essential terms of the virtual document, assigns scores to them, and stores them in a descriptive index database;

An integrated query searcher composed of a short-answer query searcher, which recommends correct answers and documents for a short-answer query using the results of morpheme analysis and partial parsing and the index results of the integrated query indexer, and a descriptive query searcher, which recommends the most similar queries for a descriptive query; and

An integrated result processor that integrates the results of the integrated query searcher so as to recommend pre-built similar queries together with correct answers for a short-answer query, and to recommend similar queries for a descriptive query.

3. The system of claim 1 or 2, wherein the descriptive query indexer comprises: a virtual document builder which composes queries having the same intention for a search target web page into one virtual document;

A morpheme analyzer for morphologically analyzing the virtual document;

A partial parser for determining modification relationships between the analyzed morphemes;

A content word extractor for extracting content words of the virtual document using the results of the morpheme analyzer and the partial parser;

A score assigner for assigning scores to the content words extracted by the content word extractor, applying differential weights according to their TF and IDF in the virtual document and their parts of speech; and

A descriptive index database for storing the content words.

The system of claim 1 or 2, wherein the descriptive query searcher comprises: an essential term extractor that takes as input the result of analyzing the user's natural language query with the morpheme analyzer and the partial parser, and uses lexico-syntactic patterns to classify and extract the key words of the user's query as essential terms and optional terms;

A representative query generator for extracting representative queries that include the essential terms, using information in the descriptive index database;

A similarity calculator that calculates the similarity between each representative query extracted by the representative query generator and the user's query; and

A representative query ranker for ranking the results of the similarity calculator.

5. The system of claim 2, wherein the short-answer query indexer comprises: a morpheme analyzer for morphologically analyzing the search target web document;

A correct answer candidate extractor which extracts correct answer candidates from the analyzed text using an area dictionary and correct answer type rules;

A paragraph separator for determining the range of surrounding context for the candidates extracted by the correct answer candidate extractor;

A score assigner that scores the surrounding words by calculating the degree of association between the words in the separated paragraph and the correct answer candidate; and

A classification store for classifying the correct answer candidates by type and storing them in a correct answer index database.

The system of claim 2, wherein the short-answer query searcher comprises: an intention analyzer that takes as input the result of analyzing the user's natural language query with the morpheme analyzer and the partial parser, and determines the intention of the natural language query using lexico-syntactic patterns; and

A similarity calculator for extracting correct answer candidates using information in the correct answer index database and calculating the similarity between the query and each correct answer candidate.

The system of claim 6, further comprising: a general document searcher for searching web documents in response to the user's natural language query;

A document similarity analyzer that extracts correct answer candidates using information in a document index database and calculates the similarity between the query and each correct answer candidate; and

A similarity integrator for integrating the similarity between the query and the correct answer candidate with the similarity between the document and the query calculated by the document similarity analyzer.

The system of claim 2, wherein the integrated result processor presents to the user, for a descriptive natural language query, a ranked list of the top similar queries output from the descriptive query searcher, and, for a short-answer query, the ranked top similar queries output from the descriptive query searcher together with a ranked list of the top correct answers output from the short-answer query searcher and a plurality of top-ranked documents linked to a recommended similar query.

Morphologically analyzing the input natural language query of the user;

Performing partial parsing to determine modification relationships between the morphemes;

Composing the intention of a search target web page into a virtual document and storing it in a descriptive index database;

Ranking and recommending the most similar representative queries for the descriptive natural language query; and

Linking to the homepage connected to the representative query selected by the user from among the recommended representative queries.

Morphologically analyzing the user's natural language query;

Extracting correct answer candidates from search target web documents, assigning scores, and storing the candidates in a correct answer index database;

Recommending a correct answer and a document for a short-answer natural language query;

Ranking and recommending the most similar representative queries for a descriptive natural language query;

Integrating and presenting a similar query list, correct answers, and documents to the user; and

Linking to the homepage connected to the similar query selected by the user from the recommended similar query list.

The method of claim 9 or 10, wherein the storing in the descriptive index database comprises:

Composing queries having the same intention for the search target web page into one virtual document;

Performing morpheme analysis and partial parsing on the virtual document to determine parts of speech and extract content words; and

Assigning scores to the content words by applying differential weights according to the TF * IDF and part of speech of each content word in the virtual document.
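
As an illustration of the scoring step in this claim, the following Python sketch multiplies a term's TF * IDF by a part-of-speech weight. The weight table `POS_WEIGHT`, the fallback weight, and the smoothed IDF form `log(1 + N/df)` are assumptions made for the example; the claim only states that weights differ by TF * IDF and part of speech.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights; the source gives no concrete values.
POS_WEIGHT = {"noun": 1.0, "verb": 0.7, "adjective": 0.5}
DEFAULT_POS_WEIGHT = 0.3


def score_content_words(virtual_docs):
    """virtual_docs: one list of (word, pos) content words per virtual document.
    Returns one {word: score} dict per virtual document."""
    n_docs = len(virtual_docs)
    doc_freq = Counter()
    for doc in virtual_docs:
        doc_freq.update({word for word, _ in doc})

    scored = []
    for doc in virtual_docs:
        term_freq = Counter(word for word, _ in doc)
        pos_of = dict(doc)
        scored.append({
            word: term_freq[word]
            * math.log(1.0 + n_docs / doc_freq[word])  # smoothed IDF (assumed form)
            * POS_WEIGHT.get(pos_of[word], DEFAULT_POS_WEIGHT)
            for word in term_freq
        })
    return scored


docs = [
    [("re-enroll", "verb"), ("application", "noun")],
    [("leave", "noun"), ("application", "noun")],
]
scores = score_content_words(docs)
```

In this toy input, "application" occurs in both virtual documents, so its IDF is lower than that of the terms specific to one document.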

The method of claim 9 or 10, wherein the step of ranking and recommending the most similar representative queries for the descriptive natural language query comprises:

Performing morpheme analysis and parsing on the user's natural language query and classifying and extracting the key words of the user's query as essential terms and optional terms using lexico-syntactic patterns;

Extracting representative queries that include the essential terms using the information stored in the descriptive index database;

Calculating a similarity between each extracted representative query and the user's query; and

Ranking the results of the similarity calculation.

The method of claim 12, wherein the step of classifying and extracting the key words of the user's query as essential terms and optional terms includes extracting the essential terms using the syntactic elements and forms around each term.

The method of claim 12, wherein calculating the similarity between the extracted representative query and the user's query comprises:

Removing representative queries that do not include the essential terms;

Calculating the number of index terms shared between the user's query and each representative query that includes the essential terms;

Multiplying the weights of the query terms and the index terms when the numbers of shared index terms are equal;

Assigning a higher score to the representative query having fewer index terms when the products of the weights are equal; and

Selecting the representative query having the highest score from among queries that have the same intention and are linked to the same homepage.
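
The tie-breaking order in this claim can be sketched in Python as a tuple sort key. The data layout (dicts with `text`, `page`, `intention`, and `terms`) is hypothetical, and summing the per-term weight products over shared terms is one assumed reading of "multiplying the weights"; the claim does not spell out the exact combination.

```python
def rank_representative_queries(user_terms, essential_terms, candidates):
    """user_terms: {term: weight} for the user's query.
    essential_terms: set of terms every surviving candidate must contain.
    candidates: dicts with "text", "page", "intention", "terms" ({term: weight})."""

    def sort_key(candidate):
        shared = set(user_terms) & set(candidate["terms"])
        weight_product = sum(user_terms[t] * candidate["terms"][t] for t in shared)
        # 1) more shared index terms, 2) larger weight product, 3) fewer index terms
        return (len(shared), weight_product, -len(candidate["terms"]))

    # Drop representative queries that lack an essential term.
    kept = [c for c in candidates if essential_terms <= set(c["terms"])]
    kept.sort(key=sort_key, reverse=True)

    # Keep only the best-scoring query per (homepage, intention) pair.
    best = {}
    for candidate in kept:
        best.setdefault((candidate["page"], candidate["intention"]), candidate)
    return list(best.values())


ranked = rank_representative_queries(
    {"re-enroll": 1.0, "application": 0.5},
    {"re-enroll"},
    [
        {"text": "A", "page": "/p", "intention": "i1",
         "terms": {"re-enroll": 1.0, "application": 0.5, "submit": 0.25}},
        {"text": "B", "page": "/p", "intention": "i1",
         "terms": {"re-enroll": 1.0, "application": 0.5}},
        {"text": "C", "page": "/p", "intention": "i2", "terms": {"leave": 1.0}},
    ],
)
```

Here A and B tie on shared terms and on weight product, so B, which has fewer index terms, is preferred; C is removed because it lacks the essential term.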

15. The method of claim 14, wherein multiplying the weights of the query terms and the index terms when the numbers of shared index terms are equal comprises assigning differential weights according to the parts of speech of the query terms.

The method of claim 10, wherein storing in the correct answer index database comprises:

Performing morpheme analysis and partial parsing on the input document;

Extracting correct answer candidates from the language-analyzed document;

Determining the range of sentences affecting each correct answer candidate;

Assigning scores to the words around the correct answer candidate; and

Classifying the correct answer candidates by correct answer type and storing the surrounding words in a correct answer index database.

The method of claim 10, wherein recommending a correct answer and a document for the short-answer natural language query comprises:

Identifying the user's intention using lexico-syntactic patterns based on the results of morpheme analysis and parsing of the user's natural language query;

Selecting the correct answer index database corresponding to the user's intention and generating correct answer candidates using the terms of the user's query as keys;

Recommending correct answers by calculating the similarity between the correct answer candidates and the user's query and ranking them; and

Re-ranking the correct answer candidates by combining the candidate-query similarity with the document-query similarity obtained from the document index database of a general information retriever.
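
A minimal Python sketch of the re-ranking step follows. The claim does not say how the two similarities are combined; the linear interpolation and the parameter `alpha` used here are assumptions chosen only to make the idea concrete.

```python
def rerank_candidates(candidates, alpha=0.5):
    """candidates: (answer, candidate_query_similarity, document_query_similarity).
    Combines the two similarities by linear interpolation (an assumed scheme)
    and returns (answer, combined_score) pairs, best first."""
    combined = [
        (answer, alpha * cand_sim + (1.0 - alpha) * doc_sim)
        for answer, cand_sim, doc_sim in candidates
    ]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


reranked = rerank_candidates([("A", 0.75, 0.0), ("B", 0.5, 1.0)])
```

With equal weighting, candidate B overtakes A because its source document matches the query much better, even though A's own similarity is higher.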

18. The method of claim 16, wherein extracting the correct answer candidates comprises using an area dictionary containing information corresponding to each correct answer type, and correct answer type rules that include a regular grammar for recognizing homepage addresses and e-mail addresses.
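
One way to picture a regular grammar for homepage and e-mail addresses is a pair of regular expressions, as in the Python sketch below. The two patterns are deliberately simplified assumptions, not the rules used in the patent, and they do not cover every valid address.

```python
import re

# Simplified, assumed patterns for illustration only.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+|www\.[^\s<>\"']+")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_address_candidates(text):
    """Return (answer_type, string) pairs for homepage and e-mail addresses."""
    candidates = [
        ("homepage", match.group().rstrip(".,;)"))  # trim trailing punctuation
        for match in URL_PATTERN.finditer(text)
    ]
    candidates += [("email", match.group()) for match in EMAIL_PATTERN.finditer(text)]
    return candidates


found = extract_address_candidates(
    "Contact admin@example.ac.kr or visit http://www.example.ac.kr/apply."
)
```

The answer type labels `homepage` and `email` are hypothetical names for the correct answer types mentioned in the claim.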

The method of claim 16, wherein determining the range of sentences comprises:

Selecting and extracting a document and a sentence containing a correct answer candidate;

Determining, with the sentence containing the correct answer candidate as the reference, how many sentences before and after it make up the maximum paragraph size; and

Determining, within the maximum paragraph size, whether to include each sentence in the current paragraph according to whether a lexical chain is present in the sentence.
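
The paragraph-range decision can be sketched in Python as a window that grows outward from the candidate sentence. Treating a lexical (vocabulary) chain as "the neighbouring sentence shares at least one content word with the paragraph so far" is a simplifying assumption, as are the function name and the default `max_size`.

```python
def paragraph_range(sentences, candidate_idx, max_size=3):
    """sentences: one set of content words per sentence.
    Returns inclusive (start, end) sentence indices around the candidate."""
    start = end = candidate_idx
    vocabulary = set(sentences[candidate_idx])
    grew = True
    while grew and end - start + 1 < max_size:
        grew = False
        # Extend backwards if the previous sentence continues the chain.
        if start > 0 and vocabulary & sentences[start - 1]:
            start -= 1
            vocabulary |= sentences[start]
            grew = True
        # Extend forwards if there is room and the next sentence continues the chain.
        if (end - start + 1 < max_size and end + 1 < len(sentences)
                and vocabulary & sentences[end + 1]):
            end += 1
            vocabulary |= sentences[end]
            grew = True
    return start, end


doc = [
    {"library", "hours"},
    {"registrar", "office"},
    {"office", "phone"},
    {"phone", "number"},
    {"weather"},
]
span = paragraph_range(doc, candidate_idx=2, max_size=3)
```

Sentences that share no vocabulary with the growing paragraph, such as the first and last ones in this toy document, stay outside the range.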

20. The method of claim 16, wherein the surrounding words stored in the correct answer index database are used as keys of the database, and the contents stored in the correct answer index database include the position of the correct answer candidate in the document together with the scores assigned to the words around the correct answer candidate.
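
The index layout in this claim resembles an inverted index keyed by surrounding words. The Python sketch below uses an in-memory dict as a stand-in for the correct answer index database; the record fields and the sample values are hypothetical.

```python
from collections import defaultdict


def build_answer_index(records):
    """records: (doc_id, position, candidate, {surrounding_word: score}).
    Returns {surrounding_word: [posting, ...]} where each posting carries the
    candidate, its document, its position, and the word's score."""
    index = defaultdict(list)
    for doc_id, position, candidate, scored_words in records:
        for word, score in scored_words.items():
            index[word].append({
                "candidate": candidate,
                "doc": doc_id,
                "position": position,
                "score": score,
            })
    return index


answer_index = build_answer_index([
    ("doc1", 42, "registrar@example.ac.kr", {"registrar": 0.9, "contact": 0.6}),
    ("doc2", 7, "www.example.ac.kr", {"contact": 0.4}),
])
```

Looking up a query term such as "contact" then returns every candidate that had that word in its surrounding context, together with the score and position needed for ranking.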

A representative query construction system comprising:

A web browser for representative query construction that can access Internet homepages;

A query set builder for representative query construction, for creating representative queries of an accessed homepage while surfing the Internet; and

A descriptive query indexer that composes the intention of a homepage into a virtual document, extracts essential terms of the virtual document, assigns scores to the essential terms, and stores them in a descriptive index database.

22. The representative query construction system of claim 21, wherein the system stores in the descriptive index database the homepages of valuable representative queries selected from the user query log accumulated in a natural language question-answering system.

Creating a representative query of a homepage accessed while surfing the Internet;

Adding or deleting the address, intention, and representative query of the homepage;

Indexing the representative query of the homepage; and

Storing the indexed representative query in a descriptive index database.

24. The method of claim 23, wherein creating the representative query further comprises creating, as a representative query, a query from the user query log accumulated in the natural language question-answering system.