KR100434688B1

KR100434688B1 - Natural Language Question-Answering Search System for Integrated Access to Database, FAQ, and Web Site

Info

Publication number: KR100434688B1
Application number: KR10-2000-0028345A
Authority: KR
Inventors: 서정연; 이근배
Original assignee: 주식회사 다이퀘스트
Priority date: 2000-05-25
Filing date: 2000-05-25
Publication date: 2004-06-04
Also published as: KR20010107111A

Abstract

Unlike search systems that merely index the documents on a website and present the retrieved results, the present invention provides an integrated question-answering search method that combines website document processing with FAQ list search and interactive DB search, distributing the user's query to each of these areas so that the best retrieved answer can be returned. The method comprises the steps of: (i) performing morphological and partial syntactic analysis of an input natural language query; (ii) determining whether the analyzed query is suitable for interactive DB search and, if so, retrieving an answer by interactive DB search; (iii) applying bottom-up natural language information analysis to a query that is not suitable for interactive DB search; (iv) determining whether the query analyzed by the bottom-up natural language information analysis is suitable for FAQ list search and, if so, retrieving an answer by FAQ list search; (v) retrieving an answer by website question-answering search when the query is not suitable for FAQ list search or the user requests a website search; and (vi) integrating the answers obtained from the interactive DB search, the FAQ list search, and the website search and selecting the best answer.

Description

Natural Language Question-Answering Search System for Integrated Access to Database, FAQ, and Web Site

The present invention relates to an integrated question-answering search system that searches an interactive DB, a FAQ (Frequently Asked Questions) list, and a website, and more particularly to a question-answering system and method that handle every search as natural language question answering so that users can freely retrieve the information they want.

Information retrieval systems that process vast numbers of web documents and extract and present only the information matching a user's request are now widely used. Internet search systems that accept everyday natural language rather than keywords are also widely available. In general, however, it is very difficult to extract exactly the documents a user wants from a huge collection of web documents, and even where natural language search is available, current search systems fall well short of returning an accurate answer to the user's specific question.

To solve these problems, an object of the present invention is to provide an integrated question-answering search system and method which, unlike search systems that merely index the documents on a website and present the retrieved results, combine website document processing with FAQ list search and interactive DB search, and distribute the user's query to each of these areas so that the best retrieved answer can be returned.

The present invention organically integrates bottom-up techniques, such as statistical information retrieval and statistical language analysis, with top-down techniques, such as a linguistic knowledge base and question-answer modeling.

FIG. 1 is a conceptual diagram of the integrated interactive DB, FAQ list, and website question-answering search system according to the present invention.

FIG. 2a is an overall block diagram of a search system to which the present invention is applied, and FIG. 2b is a detailed block diagram of the search system of FIG. 2a.

FIGS. 3a and 3b are flowcharts illustrating the integrated question-answering search method of the present invention.

FIG. 4 is a detailed flowchart of the interactive DB search step of FIG. 3.

FIG. 5 is a detailed flowchart of the FAQ list search step of FIG. 3.

FIG. 6 is a detailed flowchart of the website question-answering search step of FIG. 3.

To achieve the above object, an integrated question-answering search system according to an embodiment of the present invention comprises: (i) an interactive DB search system; (ii) a FAQ list search system that searches a list of answers to frequently asked questions, either for queries that the interactive DB search system could not answer or at the user's request; and (iii) a website question-answering system that performs question-answering search over a website, either for queries that the interactive DB search system or the FAQ list search system could not answer or at the user's request.

According to the present invention, the interactive DB search system comprises: (i) a morphological analyzer that analyzes the user's natural language query into morphemes; (ii) a syntactic analyzer that determines the modification relations between the analyzed morphemes; (iii) an assigner that assigns PLO tags and semantic codes to the analyzed phrases; (iv) a lexico-syntactic pattern matcher that maps the phrases carrying PLO tags and semantic codes to an SQL statement; (v) a retriever that accesses the appropriate DB for the query expressed as the SQL statement; and (vi) an answer generator that finds the best answer in the accessed DB.

In an embodiment of the present invention, the bottom-up natural language information retriever comprises: (i) a morphological/syntactic analyzer that analyzes the user's natural language query into morphemes and phrases on the basis of a morpheme and syntax dictionary and statistical language information; (ii) a compound noun processor that processes the analyzed query into compound nouns on the basis of single-word and compound noun dictionaries and co-occurrence information between nouns; (iii) an information retriever that performs retrieval using a stopword dictionary and the weights of indexed nouns; and (iv) a semantic discrimination matcher that performs keyword matching on the user's query using Korean WordNet semantic information.

The top-down information filter comprises: (v) a question type filter that identifies the intent of a question using lexico-syntactic patterns, which express the lexical, part-of-speech, syntactic, and semantic information of a query sentence as regular expressions, together with a classification of question forms into about 30 types; and (vi) a word refinement filter that builds a dictionary of words specific to a particular search domain and uses it to adjust weights to suit the characteristics of that domain.

The FAQ list search system also has a FAQ editor that includes (vii) a knowledge base updater, which updates the index used by the statistical natural language analyzer, the noun-verb relation information used by the semantic filter, and the specialized domain dictionary used by the word refinement filter, and which semi-automatically updates the lexico-syntactic patterns and question type definitions used by the question type filter.

According to the present invention, the website question-answering system comprises a bottom-up natural language information retriever and a top-down answer sentence extractor. The bottom-up natural language information retriever comprises: (i) a morphological/syntactic analyzer that analyzes the user's natural language query into morphemes and phrases on the basis of a morpheme and syntax dictionary and statistical language information; (ii) a compound noun processor that processes the analyzed query into compound nouns on the basis of single-word and compound noun dictionaries and co-occurrence information between nouns; (iii) an information retriever that performs retrieval using a stopword dictionary and the weights of indexed nouns; and (iv) a semantic discrimination matcher that performs keyword matching on the user's query using Korean WordNet semantic information. The top-down answer sentence extractor comprises: (v) a morphological/syntactic analyzer that analyzes the user's natural language query into morphemes and phrases on the basis of a morpheme and syntax dictionary and statistical language information; (vi) a PLO recognizer and semantic code assigner that assigns PLO tags and semantic codes to the analyzed phrases; (vii) a pattern matcher that determines the question type by applying lexico-syntactic patterns to the phrases carrying PLO information and semantic codes; and (viii) a top-down information filter that uses heuristics such as semantic similarity, IS-A syntactic structure, clue words, and position within the document.

The integrated question-answering search method according to an embodiment of the present invention comprises the following steps, each carried out by the web server of the search site.

The web server performs morphological and partial syntactic analysis of the user's natural language query entered from a terminal.

The web server determines whether the analyzed query is suitable for interactive DB search and, if so, performs an interactive DB search consisting of: (a) determining the modification relations between the analyzed morphemes; (b) assigning PLO tags and semantic codes to the analyzed phrases; (c) mapping the phrases carrying PLO tags and semantic codes to an SQL statement; and (d) accessing a predetermined DB with the query expressed as the SQL statement and generating an answer.

When the analyzed query is judged unsuitable for interactive DB search, the web server performs bottom-up natural language information analysis consisting of: (e) analyzing the user's natural language query into morphemes and phrases on the basis of a morpheme and syntax dictionary and statistical language information; (f) processing the analyzed query into compound nouns on the basis of single-word and compound noun dictionaries and co-occurrence information between nouns; (g) performing retrieval using a stopword dictionary and the weights of indexed nouns; and (h) performing keyword matching on the user's query using Korean WordNet semantic information.

The web server determines whether the query analyzed in the bottom-up natural language information analysis step is suitable for FAQ list search and, if so, performs top-down information filtering that extracts the answer sentence by (i) identifying the intent of the question using lexico-syntactic patterns, which express the lexical, part-of-speech, syntactic, and semantic information of the query sentence as regular expressions, together with a classification of question forms into a number of types, and (j) building a dictionary of words specific to the search domain and using it to adjust weights to suit the characteristics of that domain.

When the query is not suitable for FAQ list search or the user requests a website search, the web server performs, after step (e), a website question-answering search that extracts the answer sentence by (k) assigning PLO tags and semantic codes to the analyzed phrases, (l) determining the question type by applying lexico-syntactic patterns to the phrases carrying PLO information and semantic codes, and (m) applying top-down filtering with heuristics such as semantic similarity, IS-A syntactic structure, clue words, and position within the document.

In summary, an answer is generated from the user's query as follows. The morphological/syntactic analysis of the user's query is first passed to the DB search system; if it returns a result, that result is given to the user, and otherwise the FAQ list search system is asked to search. If the FAQ search system returns a result, that result is given to the user, and otherwise the website question-answering system is asked to search and its result is given to the user. Depending on the domain to which the system is applied, all three systems may instead be queried simultaneously, with all results presented to the user or only some of them presented according to priority.
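
The fallback order summarized above (DB search, then FAQ search, then website search) and the alternative of querying all three systems at once can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `cascade_search` and `search_all` and the three toy backends are hypothetical, and a backend is assumed to signal "no result" by returning `None`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: a backend takes the query text and returns an answer,
# or None when it finds nothing.
Backend = Callable[[str], Optional[str]]


def cascade_search(query: str,
                   backends: Sequence[Tuple[str, Backend]]) -> Optional[Tuple[str, str]]:
    """Try each backend in priority order; return (name, answer) for the first hit."""
    for name, search in backends:
        answer = search(query)
        if answer is not None:
            return name, answer
    return None


def search_all(query: str,
               backends: Sequence[Tuple[str, Backend]]) -> List[Tuple[str, str]]:
    """Alternative mode: query every backend and keep all hits in priority order."""
    hits = []
    for name, search in backends:
        answer = search(query)
        if answer is not None:
            hits.append((name, answer))
    return hits


# Hypothetical toy backends standing in for systems 10, 20, and 30.
def db_search(q: str) -> Optional[str]:
    return "09:00-18:00" if "opening hours" in q else None


def faq_search(q: str) -> Optional[str]:
    return "Use the 'Change password' page." if "password" in q else None


def web_search(q: str) -> Optional[str]:
    return f"Top web passage for: {q}"


BACKENDS = [("db", db_search), ("faq", faq_search), ("web", web_search)]

if __name__ == "__main__":
    print(cascade_search("how do I change my password", BACKENDS))
    print(search_all("password opening hours", BACKENDS))
```

Keeping the backends in an ordered list means that changing the priority, or switching between the cascade and the query-everything mode, does not require touching the backends themselves.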

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 shows the structure of an integrated natural language question-answering search system according to the present invention, which analyzes the user's natural language query and searches an interactive DB, a FAQ list, and a website to generate an answer. As shown, the integrated natural language question-answering system comprises: an interactive DB search system 10; a FAQ list search system 20 that searches a list of answers to frequently asked questions, either for queries that the interactive DB search system 10 could not answer or at the user's request; and a website question-answering system 30 that performs question-answering search over a website, either for queries that the interactive DB search system 10 or the FAQ list search system 20 could not answer or at the user's request.

Table 1. Example of a domain dictionary

Dictionary entry                                            Semantic category
Gardner (가드너)                                             (@city|@person)
lecturer (강사)                                              (@position)
lecture (강의)                                               (%lecture)
New York (뉴욕)                                              (@city)
dollar (달러)                                                (@unit_money)
Sogang University (서강대학교)                                (@organization|@building)
Security Pacific National Bank (씨큐리티퍼시픽내쇼날은행)       (@company)
human (인간)                                                 (%person)
KG                                                          (%unit_weight)
CM                                                          (%unit_length)

In the semantic categories of Table 1, '@X' means that the entry belongs to semantic category 'X', and '%X' means that the entry is itself the semantic category 'X'. '@X|@Y' means that the entry belongs to both semantic category 'X' and semantic category 'Y'. Semantic categories may be chosen freely, as in Table 1, or the sense codes given in WordNet may be assigned as they are.
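
The category notation of Table 1 can be read mechanically. The sketch below shows one way to parse it; the function name `parse_categories` and the relation labels `member_of` and `is_category` are hypothetical names introduced here for illustration, and only a few entries of Table 1 are reproduced.

```python
from typing import List, Tuple


def parse_categories(spec: str) -> List[Tuple[str, str]]:
    """Parse a Table 1 category string such as '(@city|@person)'.

    '@X' is read as "the entry belongs to category X" and '%X' as
    "the entry is the category X"; '|' separates alternatives.
    """
    parsed = []
    for item in spec.strip().strip("()").split("|"):
        item = item.strip()
        if item.startswith("@"):
            parsed.append(("member_of", item[1:]))
        elif item.startswith("%"):
            parsed.append(("is_category", item[1:]))
        else:
            raise ValueError(f"unrecognized category marker in {item!r}")
    return parsed


# A few headwords from Table 1, kept in their original Korean form.
DOMAIN_DICTIONARY = {
    "가드너": "(@city|@person)",
    "강의": "(%lecture)",
    "뉴욕": "(@city)",
    "달러": "(@unit_money)",
}

if __name__ == "__main__":
    for headword, spec in DOMAIN_DICTIONARY.items():
        print(headword, parse_categories(spec))
```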

The FAQ list search system 20 comprises a bottom-up natural language information retriever 21, a top-down information filter 22, and a FAQ editor 23. The bottom-up natural language information retriever 21 comprises: a morphological/syntactic analyzer 210 that analyzes the user's natural language query into morphemes and phrases on the basis of a morpheme and syntax dictionary and statistical language information; a compound noun processor 211 that processes the analyzed query into compound nouns on the basis of single-word and compound noun dictionaries and co-occurrence information between nouns (the probability that the two nouns in question appear adjacent to each other in a document); an information retriever 212 that performs retrieval using a stopword dictionary and the weights of indexed nouns; and a semantic discrimination matcher 213 that performs keyword matching on the user's query using Korean WordNet semantic information. The compound noun processor reduces vocabulary mismatch between the words in the input query and the words in the sentences being searched. For example, when the input query is the compound noun "정보검색" (information retrieval) and the sentence being searched is "정보를 검색하다" (to retrieve information), or vice versa, the adjacently co-occurring '정보' (information) and '검색' (retrieval) are combined into the single compound noun '정보검색', so that the two are judged to share the word '정보검색'. This resolves the vocabulary mismatch and improves retrieval performance.
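
The compound noun step can be illustrated with a short sketch. It assumes the nouns have already been extracted by the morphological analyzer and that co-occurrence information is available as a probability per adjacent noun pair; the greedy left-to-right merge, the `threshold` parameter, and the name `merge_compound_nouns` are assumptions made for illustration and are not taken from the patent.

```python
from typing import Dict, List, Tuple


def merge_compound_nouns(nouns: List[str],
                         cooccurrence: Dict[Tuple[str, str], float],
                         threshold: float = 0.5) -> List[str]:
    """Greedily merge adjacent nouns whose co-occurrence probability
    reaches the threshold, scanning left to right."""
    merged = []
    i = 0
    while i < len(nouns):
        pair = (nouns[i], nouns[i + 1]) if i + 1 < len(nouns) else None
        if pair is not None and cooccurrence.get(pair, 0.0) >= threshold:
            merged.append(nouns[i] + nouns[i + 1])
            i += 2
        else:
            merged.append(nouns[i])
            i += 1
    return merged


# Hypothetical co-occurrence table; the value 0.8 is illustrative only.
COOC = {("정보", "검색"): 0.8}

if __name__ == "__main__":
    query_nouns = ["정보검색"]           # query already contains the compound
    sentence_nouns = ["정보", "검색"]    # nouns from "정보를 검색하다"
    print(merge_compound_nouns(query_nouns, COOC))
    print(merge_compound_nouns(sentence_nouns, COOC))
```

After merging, both the query and the sentence contain the single term '정보검색', which is the match the paragraph above describes.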

The top-down information filter 22 comprises: a question type filter 220 that identifies the intent of a question using lexico-syntactic patterns, which express the lexical, part-of-speech, syntactic, and semantic information of a query sentence as regular expressions, together with a classification of question forms into about 30 types; and a word refinement filter 221 that builds a dictionary of words specific to a particular search domain and uses it to adjust weights to suit the characteristics of that domain. The question type filter analyzes the question form of the input query and of each candidate FAQ question and discards every candidate whose question form differs from that of the query. For example, if the user's query is "How do I change my password?" and the candidate FAQ questions are "How to change a password" and "Where do I report a lost password?", the user's query and the first FAQ question ask about a method while the second FAQ question asks about a place, so only the first question remains and the second is removed. For the word refinement filter, a person manually analyzes the target FAQs in advance, defines the words judged important for each FAQ sentence, and assigns them weights. For example, if the user's query is "비밀번호 변경" (change password) and the candidate FAQ questions are "비밀번호를 바꾸려면" (to change a password), whose key word is '비밀' (secret), and "아이디 변경" (change ID), whose key word is '아이디' (ID), the word '비밀' in the user's query is a key word of the first FAQ question, so the first FAQ question receives a higher weight than the second.
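
The two filters can be sketched on English glosses of the examples above. This is a deliberately simplified illustration: the real lexico-syntactic patterns operate on morphological, part-of-speech, and semantic information and cover about 30 question types, whereas the two regular expressions below, the `boost` value, and all function names are hypothetical stand-ins.

```python
import re
from typing import Iterable, List, Set

# Hypothetical, heavily simplified stand-ins for lexico-syntactic patterns.
QUESTION_TYPE_PATTERNS = [
    ("place", re.compile(r"\bwhere\b", re.IGNORECASE)),
    ("method", re.compile(r"\bhow\b", re.IGNORECASE)),
]


def question_type(text: str) -> str:
    """Return the first question type whose pattern matches, else 'unknown'."""
    for qtype, pattern in QUESTION_TYPE_PATTERNS:
        if pattern.search(text):
            return qtype
    return "unknown"


def filter_by_question_type(query: str, faq_questions: Iterable[str]) -> List[str]:
    """Keep only FAQ questions whose question type equals that of the query."""
    qtype = question_type(query)
    return [q for q in faq_questions if question_type(q) == qtype]


def keyword_weight(query_words: Set[str], faq_keywords: Set[str],
                   boost: float = 2.0) -> float:
    """Add a boost for each manually defined FAQ key word found in the query."""
    return sum(boost for word in faq_keywords if word in query_words)


if __name__ == "__main__":
    faqs = ["How to change a password", "Where do I report a lost password?"]
    print(filter_by_question_type("How do I change my password?", faqs))
    print(keyword_weight({"change", "password"}, {"password"}))
    print(keyword_weight({"change", "password"}, {"id"}))
```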

The FAQ editor 23 includes a knowledge base updater 230, which updates the index used by the statistical natural language analyzer, the noun-verb relation information used by the semantic filter, and the specialized domain dictionary used by the word refinement filter, and which semi-automatically updates the lexico-syntactic patterns and question type definitions used by the question type filter.

The website question-answering system 30 comprises a bottom-up natural language information retriever 31 and a top-down natural language information retriever 32 that performs its search on the results received from the bottom-up natural language information retriever 31. The bottom-up natural language information retriever 31 comprises: a morphological/syntactic analyzer 310 that analyzes the user's natural language query into morphemes and phrases on the basis of a morpheme and syntax dictionary and statistical language information; a compound noun processor 311 that processes the analyzed query into compound nouns on the basis of single-word and compound noun dictionaries and co-occurrence information between nouns; an information retriever 312 that performs retrieval using a stopword dictionary and the weights of indexed nouns; and a semantic discrimination matcher 313 that performs keyword matching on the user's query using Korean WordNet semantic information. The top-down answer sentence extractor 32 comprises: a morphological/syntactic analyzer 320 that analyzes the user's natural language query into morphemes and phrases on the basis of a morpheme and syntax dictionary and statistical language information; a pattern matcher 321 that assigns PLO tags and semantic codes to the analyzed phrases and determines the question type by applying lexico-syntactic patterns to the phrases carrying PLO information and semantic codes; and a top-down information filter 322 that uses heuristics such as semantic similarity, IS-A syntactic structure, clue words, and position within the document.

The top-down information filter assigns scores in advance to predefined features such as semantic similarity, IS-A syntactic structure, clue words, and position within the sentence, and, when those features appear in a candidate answer sentence, computes the score of each sentence as a weighted sum of the form

$$ts_i = A\,f_{i1} + B\,f_{i2} + C\,f_{i3} + D\,f_{i4}$$

where $ts_i$ is the score of sentence $i$ and $f_{ij}$ is the score of the $j$-th feature of sentence $i$. That is, $f_{i1}$ is the score for the first feature of sentence $i$ and $f_{i2}$ is the score for the second feature. A, B, C, and D are the weighting constants used when summing the feature scores and are determined experimentally for each domain of application. The top n sentences by this score are extracted as answer sentences.

FIG. 2a schematically shows the overall configuration of a search system to which the present invention is applied. In FIG. 2a, user terminals (U_1, ..., U_n), each a computer, are connected through the Internet 5 to a search site web server 6. The web server 6 of the search site has a user interface 7 and a document refinement and automatic classification system 8, together with a database DB1 that supports integrated search of multiple clients' web servers and relational databases, a database DB2 for website question-answering search, a database DB3 for interactive DB search, and a database DB4 for FAQ list search. The search site web server 6 includes an operator terminal M.

FIG. 2b shows the internal configuration of the search site web server 6. The user interface 7 includes a natural language dialogue query analyzer 7₁ and an SQL generator 7₂. The document refinement and automatic classification system 8 includes a natural language information search engine 8₂. The web server 8 includes the interactive DB search system 10, which analyzes the type of the user's natural language query to obtain appropriate search results, the FAQ list search system 20, which provides FAQs related to the inquiry, and the website question-answering search system 30, which recommends answers or links for the question. The details of the interactive DB search system 10, the FAQ search system 20, and the website question-answering search system 30 are as given in the description of the individual components above. The web server 8 also includes an integrated answer generator 9 that finds the best answer in the accessed DB according to the type of the user's natural language query.
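
The sentence scoring performed by the top-down information filter 322 (a weighted sum of per-feature scores followed by selection of the top n sentences) can be sketched as follows. The weight values and feature scores below are made-up illustrations; the patent states only that the constants are determined experimentally per domain. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sentence_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the feature scores of one sentence."""
    return sum(w * f for w, f in zip(weights, features))


def top_n_sentences(candidates: Sequence[Tuple[str, Sequence[float]]],
                    weights: Sequence[float], n: int) -> List[str]:
    """Rank (sentence, features) pairs by score and return the n best sentences."""
    ranked = sorted(candidates,
                    key=lambda c: sentence_score(c[1], weights),
                    reverse=True)
    return [sentence for sentence, _ in ranked[:n]]


if __name__ == "__main__":
    # Illustrative weights standing in for the constants A, B, C, D.
    weights = (0.4, 0.3, 0.2, 0.1)
    # Feature order assumed here: semantic similarity, IS-A structure,
    # clue word, position.
    candidates = [
        ("sentence 1", (1.0, 0.0, 1.0, 0.0)),
        ("sentence 2", (0.0, 1.0, 0.0, 1.0)),
        ("sentence 3", (1.0, 1.0, 1.0, 1.0)),
    ]
    print(top_n_sentences(candidates, weights, n=2))
```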

The operation of the search system configured as above, and the integrated natural language question-answering search method of the present invention for an interactive DB, a FAQ list, and a website, are now described in detail with reference to FIG. 3a. As shown in FIG. 3a, in step S11 the natural language query entered by the user is analyzed, using the morphological/syntactic analysis dictionary and statistical language information, into morphemes (the smallest units that carry a meaning in Korean grammar) and a partial syntactic structure. In step S12 the analyzed query is then distributed to the appropriate search area, that is, to the system 10, 20, or 30 for interactive DB search, FAQ list search, or website search.

In step S13, a query distributed to the interactive DB search system 10 is assigned semantic codes drawn from the Korean WordNet information/domain dictionary extended with the schema information of the DB. The form of the SQL (Structured Query Language) statement, a pre-stored interactive database query, is then determined for the coded query by the lexico-syntactic pattern determiner 130, whose patterns are written as regular expressions over the vocabulary, parts of speech, PLO information, syntactic information, and semantic codes of the query sentence. In step S14, the SQL query thus determined is run against the interactive database to find the best response. When the interactive DB search system 10 is used as a standalone system, it has the structure shown in FIG. 4.

Meanwhile, in step S15, a query distributed to the FAQ list and website query systems 20 and 30 is further classified as a FAQ query or a website query using a bottom-up natural language information retrieval method, which processes compound nouns using single-word and compound-noun dictionaries and then performs a search using a stopword dictionary and the weights of the indexed nouns.

A query classified as a FAQ list query is assigned semantic codes from the Korean WordNet information/domain dictionary. The coded query passes through the top-down FAQ filter 22 by way of the lexico-syntactic pattern determiner, whose patterns are written as regular expressions over the vocabulary, parts of speech, PLO information, syntactic information, and semantic codes of the query sentence, and a response is found (S16). When the FAQ list search system 20 is used as a standalone system, it has the structure shown in FIG. 1 or FIG. 5.

On the other hand, in step S17 a query classified for website answer-document search undergoes PLO recognition and semantic code assignment, and its question type is determined using lexico-syntactic patterns. It then passes through the top-down correct-answer sentence extractor 32, which comprises the top-down information filter 322 using heuristics such as semantic similarity, ISA-type syntactic structure, clue words, and position within the document, and the correct answer sentence is found. When the website query-response system 30 is used as a standalone system, it has the structure shown in FIG. 6.

Then, in step S18, the interactive DB response, the FAQ response obtained from the top-down FAQ filter 22, and the correct answer sentence obtained from the top-down correct-answer sentence extractor 32 are integrated to find the best response.

The procedural order in which the interactive DB, the FAQ list, and the website are searched in the integrated natural language query-response system is shown as a flowchart in FIG. 3B. As shown in FIG. 3B, a user query is input to the integrated natural language query-response system according to the present invention (S21). The system first performs morphological and partial syntactic analysis on the query input in step S21, and then determines whether the analyzed query is suitable for interactive DB search.

If the query is suitable for the interactive DB, an interactive DB search is performed (S22). The query is then passed to the FAQ list search module, and in step S23 it is determined whether the user requests a FAQ list search in order to obtain more information than the response from the interactive DB search. If so, a FAQ list search is performed on the user query in step S25. If not, processing passes to step S28, the integrated response selection step.

In step S26, it is determined whether the user requests a website search; if so, the website is searched and an appropriate response is extracted (S27). If the user does not request a website search, the search results are presented to the user through the integrated response selection step S28.

Meanwhile, if the user query in step S21 is not suitable for the interactive DB, bottom-up natural language information analysis is performed, and it is determined whether the analyzed query is suitable for FAQ list search (S24). If it is, the FAQ list is searched in step S25. If it is not, the website is searched and an appropriate response is extracted in step S27. The website may likewise be searched when the user requests a website search (S26). The integrated search results for the user's query are thus presented to the user through integrated response selection (S28).
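By way of illustration only, the S21 to S28 flow described above may be sketched as follows. This is one reading of the flowchart, not the patented implementation: the `Request` class, the `route` function, and all of its callable parameters are hypothetical names introduced here, and the suitability tests and searchers are passed in as stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Request:
    query: str
    wants_faq: bool = False      # S23: user asks for more than the DB answer
    wants_website: bool = False  # S26: user asks for a website search


Search = Callable[[str], str]


def route(req: Request,
          fits_db: Callable[[str], bool],
          fits_faq: Callable[[str], bool],
          search_db: Search,
          search_faq: Search,
          search_web: Search,
          select_best: Callable[[List[Tuple[str, str]]], object]):
    """Dispatch a query to the DB, FAQ, and website searchers (hypothetical sketch)."""
    responses: List[Tuple[str, str]] = []
    faq_searched = False
    if fits_db(req.query):                                    # decision after S21
        responses.append(("db", search_db(req.query)))        # S22
        if req.wants_faq:                                     # S23
            responses.append(("faq", search_faq(req.query)))  # S25
            faq_searched = True
    elif fits_faq(req.query):                                 # S24
        responses.append(("faq", search_faq(req.query)))      # S25
        faq_searched = True
    else:
        responses.append(("web", search_web(req.query)))      # S27
    if faq_searched and req.wants_website:                    # S26
        responses.append(("web", search_web(req.query)))      # S27
    return select_best(responses)                             # S28
```

Passing the searchers as callables keeps the sketch independent of how each of the systems 10, 20, and 30 is actually built.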

For example, if the user query to the integrated natural language query-response search system is "What is Assistant Manager Kim Jung-min's phone number?", the relevant interactive DB is searched and the requested information is provided. If the user query is one that users frequently ask, such as "My video player is broken; how do I get after-sales service?", the FAQ list is searched for an appropriate response. If the user query is one for which a particular document on the website is likely to be the answer, such as "Where is the nearest dealer to Sillim-dong?", the website is searched and a document with suitable content is provided as the answer.

Each of the interactive DB search method, the FAQ list search method, and the website query-response search method is described in more detail below.

First, the interactive DB search method is described in detail with reference to FIG. 4. According to the present invention, a user query input to the interactive DB search system 10 undergoes morphological analysis and partial syntactic analysis to produce syntactic units (for example, noun phrases) (S110). Each unit is then assigned PLO information, which indicates whether a noun phrase that is a proper noun belongs to the person, location, or organization category, and semantic codes drawn from the Korean WordNet and domain dictionary extended with the schema information of the DB (S120). After PLO recognition and semantic code assignment, the user query is mapped to a predefined SQL statement by a lexico-syntactic pattern matcher whose patterns are written as regular expressions over the vocabulary, parts of speech, PLO information, syntactic information, and semantic codes of the query sentence (S130). Once an SQL statement is selected, a response is extracted through DB access (S140), and a suitable response sentence is generated from a predefined response template (S150).
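As a minimal sketch of steps S120 to S150, assuming an English toy lexicon in place of the Korean WordNet and domain dictionary, the mapping from an annotated query to a predefined SQL statement and response template might look as follows. The lexicons, the `staff` table, the tag format `word/TAG`, and the function names are all hypothetical and chosen only for illustration.

```python
import re
import sqlite3

# Toy stand-ins (assumptions) for the PLO dictionary and schema-aware semantic codes.
PLO_LEXICON = {"Kim": "PERSON"}
SEMANTIC_CODES = {"phone": "COL_PHONE", "number": "COL_PHONE"}

# Each entry: pattern over the annotated query, predefined SQL, response template.
PATTERNS = [
    (re.compile(r"(?P<name>\w+)/PERSON\b.*/COL_PHONE\b"),
     "SELECT phone FROM staff WHERE name = ?",
     "{name}'s phone number is {value}."),
]


def annotate(query):
    """S120: attach a PLO tag or semantic code to every token as word/TAG."""
    tagged = []
    for token in re.findall(r"\w+", query):
        tag = PLO_LEXICON.get(token) or SEMANTIC_CODES.get(token.lower()) or "O"
        tagged.append(f"{token}/{tag}")
    return " ".join(tagged)


def answer(query, conn):
    """S130-S150: pick a predefined SQL statement, run it, fill the template."""
    annotated = annotate(query)
    for pattern, sql, template in PATTERNS:
        match = pattern.search(annotated)
        if match:
            row = conn.execute(sql, (match.group("name"),)).fetchone()
            if row is not None:
                return template.format(name=match.group("name"), value=row[0])
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, phone TEXT)")
conn.execute("INSERT INTO staff VALUES ('Kim', '555-0100')")
print(answer("What is Kim's phone number?", conn))
```

The SQL text is fixed in advance and only the matched name is bound as a parameter, which mirrors the idea of selecting among predefined statements rather than generating SQL freely.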

Next, the FAQ list search method, which makes it easy to find answers to frequently asked questions (FAQ), is described with reference to FIG. 5. Since the structure of the FAQ list search system 20 has been described above, a detailed description is omitted here. The FAQ list search system obtains suitable question-answer pairs by way of the bottom-up natural language information retriever 21 and the top-down information filter 22.

Bottom-up information retrieval analyzes the query through morphological/syntactic analysis (S210) and processes compound nouns using single-word and compound-noun dictionaries (S211). A search is then performed using a stopword dictionary and the weights of the indexed nouns (S212). Keyword matching is performed with sense-disambiguating matching based on Korean WordNet semantic information (S213).
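The steps S211 to S213 can be illustrated with the following simplified sketch. It assumes whitespace-style English tokens instead of Korean morphological analysis, a two-word compound dictionary, and a plain term-to-concept mapping standing in for WordNet-based matching; the function names and data layouts are hypothetical.

```python
import re


def extract_terms(query, compounds, stopwords, concept_of):
    """Turn a query into index terms (simplified stand-in for S210-S213)."""
    tokens = re.findall(r"\w+", query.lower())
    terms, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in compounds:                 # S211: merge a known compound noun
            terms.append(compounds[pair])
            i += 2
        else:
            if tokens[i] not in stopwords:    # S212: drop stopwords
                terms.append(tokens[i])
            i += 1
    # S213: map each term to a shared concept so that synonyms match.
    return [concept_of.get(term, term) for term in terms]


def search(terms, index):
    """Score documents by summing the weights of matching index terms (S212)."""
    scores = {}
    for term in terms:
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Here `index` maps each term to a dictionary of document ids and noun weights, so a FAQ entry indexed under "repair" is still found when the user writes "fix", provided `concept_of` links the two.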

Systems that rely on keyword matching alone fail when a query uses a different keyword with a similar meaning. Through sense-disambiguating matching based on WordNet semantic information, the FAQ list search system 20 can resolve errors arising from the use of different keywords by means of sense disambiguation and abstraction-based inference.

Meanwhile, the top-down information filter 22 resolves errors caused by discarding information other than nouns, and improves search performance by refining words based on a dictionary that reflects the characteristics of the search domain. The top-down information filtering system pinpoints the intent of the user's question, and thereby improves search performance, through question-type filtering that uses lexico-syntactic patterns, which express the vocabulary, part-of-speech, syntactic, and semantic information of the query sentence as regular expressions, together with a classification of question forms into some 30 types (S220). In addition, word refinement builds and uses a dictionary of words specific to a particular search domain and adjusts weights to suit the characteristics of that domain (S221). The FAQ list search system 20 has a FAQ editor 23, which uses the collected user queries to make the FAQ list search system 20 robust and adds new question-answer pairs to the FAQ list. The FAQ editor 23 also automatically updates the index used by the statistical information retrieval system, the noun-verb relation information used in semantic filtering, and the specialized domain dictionary used in word refinement, and semi-automatically updates the lexico-syntactic patterns and question type definitions used in question-type filtering (S230).
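A rough sketch of question-type filtering (S220) followed by domain-specific weight adjustment (S221) is given below. The three English question types, their regular expressions, and the additive boost are assumptions made for illustration; the description above refers to some 30 types defined by lexico-syntactic patterns, which are not reproduced here.

```python
import re

# Hypothetical, heavily reduced question-type inventory.
QUESTION_TYPES = [
    ("Person", re.compile(r"\bwho\b", re.I)),
    ("Method", re.compile(r"\bhow (do|can|to)\b", re.I)),
    ("Location", re.compile(r"\bwhere\b", re.I)),
]


def question_type(text):
    """Return the first matching question type label, or None."""
    for label, pattern in QUESTION_TYPES:
        if pattern.search(text):
            return label
    return None


def top_down_filter(query, candidates, domain_weights):
    """Filter (faq_question, score) pairs by question type, then re-weight.

    Candidates whose question type differs from the query's are dropped
    (S220); shared words found in the domain dictionary add to the
    bottom-up score (S221).
    """
    wanted = question_type(query)
    query_words = set(re.findall(r"\w+", query.lower()))
    kept = []
    for faq_question, score in candidates:
        if wanted is not None and question_type(faq_question) != wanted:
            continue
        shared = query_words & set(re.findall(r"\w+", faq_question.lower()))
        boost = sum(domain_weights.get(word, 0.0) for word in shared)
        kept.append((faq_question, score + boost))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept
```

In this sketch the domain dictionary is simply a word-to-weight mapping, so retuning the filter for another search domain amounts to swapping `domain_weights`.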

Next, the website query-response search method is described with reference to FIG. 6. As shown in FIG. 6, top-down filtering (32) is performed for the user's natural language query on the top N answer documents retrieved by the bottom-up natural language information retriever (21, 31) shown in FIG. 5.

In the bottom-up natural language information retriever (21, 31), the user's natural language query first undergoes morphological analysis and partial syntactic analysis and is grouped into syntactic units. Each syntactic unit is assigned PLO information and a semantic code. The user query, thus annotated, is passed to the pattern matcher module, which determines the question type (S321). To determine the question type, the module uses lexico-syntactic patterns written as regular expressions over the vocabulary, parts of speech, PLO information, syntactic information, and semantic codes of the query sentence (S321). The pattern matcher reads the lexico-syntactic patterns, which are stored as text, and filters for sentences that can be the correct answer for the question type of the current query (S322). Table 1 shows examples of lexico-syntactic patterns. In Table 1, a word beginning with $ denotes the Korean WordNet synset to which the word belongs together with its hyponyms, and % denotes synonyms.

Question type: Person

Patterns:
(@사람|%사람)(jCs|jOc){0,1}(\?){0,1}$
(@사람|%사람)(jCs){0,1}(\?){0,1}(.+(jOm){0,1}){0,1}누구.*$
(@사람|%사람)(.+(jOm){0,1}){0,1}%이름(jCs|jOc){0,1}(\?){0,1}$
(@사람|%사람)(.+(jOm){0,1}){0,1}%이름(jCs){0,1}(무엇|어떻게되).*$
누구jCs
누구jOm

Symbols:
$: hyponym
%: synonym
{ }: repetition
|: alternation
+: one or more repetitions
*: zero or more repetitions
$: end of sentence
jOm: adnominal case particle
jCs: nominative case particle

In the patterns, 사람 means "person", 누구 "who", 이름 "name", 무엇 "what", and 어떻게되 "how is".

<Table 1> Examples of lexico-syntactic patterns

The top N documents retrieved by the bottom-up natural language information retriever (21, 31) also undergo PLO recognition and semantic code assignment. After the user query and the retrieved documents have been preprocessed in this way, the website query-response search system 30 extracts from the retrieved documents the sentences presumed to be correct answers. Answer extraction uses the question type found by the pattern matcher module, PLO information, semantic similarity, and various heuristics. The question type and PLO information act as a filter that keeps only sentences that could be the correct answer and removes the rest from the candidates. Heuristics such as semantic similarity, ISA-type syntactic structure, clue words, and position within the document assign scores to the sentences that pass the filter so that the sentence most likely to be the correct answer is selected. The website query-response search system 30 also re-ranks the top N documents found by the bottom-up natural language information retriever (21, 31) by assigning low scores to documents that contain no correct answer sentence.
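The filter-then-score procedure above can be sketched as follows, under stated assumptions: each sentence carries precomputed feature scores, the sentence score is the weighted sum of those features with weight constants corresponding to A, B, C, and D, and a document is re-ranked by its best surviving sentence. The mapping of the four heuristics to the four feature slots, the `Sentence` type, and the function names are assumptions, not details confirmed by the text.

```python
from typing import NamedTuple


class Sentence(NamedTuple):
    doc_id: str
    text: str
    plo_tags: frozenset   # PLO categories recognized in the sentence
    features: tuple       # (f1, f2, f3, f4), assumed precomputed heuristic scores


def extract_answers(sentences, expected_plo, weights, n):
    """Keep sentences with the expected PLO category, score them, return top n."""
    candidates = [s for s in sentences if expected_plo in s.plo_tags]   # filter
    scored = [(sum(w * f for w, f in zip(weights, s.features)), s)      # weighted sum
              for s in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


def rerank_documents(doc_ids, scored):
    """Re-rank documents by their best answer sentence.

    Documents with no surviving sentence get a score of 0.0 and therefore
    sink below documents that contain a candidate answer.
    """
    best = {}
    for score, sentence in scored:
        best[sentence.doc_id] = max(best.get(sentence.doc_id, 0.0), score)
    return sorted(doc_ids, key=lambda d: best.get(d, 0.0), reverse=True)
```

Because Python's sort is stable, documents without any answer sentence keep their original bottom-up order relative to one another.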

Unlike a search system that merely indexes documents on the web and presents the retrieved results, the present invention is a search system that integrates FAQ list search and interactive DB search with web document processing. By organically combining bottom-up natural language processing techniques, such as statistical information retrieval and statistical language analysis, with top-down natural language processing techniques, such as linguistic knowledge bases and question-answer modeling, it enables users to find the information they want conveniently and accurately.

Claims

delete

Performing, by a web server of a search site, morphological and partial syntactic analysis of a user's natural language query input from a terminal;

Determining, by the web server of the search site, whether the analyzed query is suitable for an interactive DB search and, if it is suitable, performing an interactive DB search comprising: (a) determining, by the web server of the search site, the modification relationships between the analyzed morphemes; (b) assigning, by the web server of the search site, PLO tags and semantic codes to the parsed phrases; (c) mapping, by the web server of the search site, the phrases to which the PLO tags and semantic codes are assigned to an SQL statement; and (d) accessing, by the web server of the search site, a predetermined DB and generating a response to the query described in the SQL statement;

When the web server of the search site determines that the analyzed query is not suitable for an interactive DB search, performing bottom-up natural language information analysis comprising: (e) analyzing, by the web server of the search site, the user's natural language query into morphemes and phrases based on a morpheme/syntax dictionary and statistical language information; (f) processing, by the web server of the search site, the query thus analyzed into compound nouns based on single-word and compound-noun dictionaries and co-occurrence information between nouns; (g) performing, by the web server of the search site, a search using a stopword dictionary and the weights of indexed nouns; and (h) performing, by the web server of the search site, keyword matching on the user query using Korean WordNet semantic information;

Determining, by the web server of the search site, whether the query analyzed in the bottom-up natural language information analysis step is suitable for FAQ list search and, if it is suitable, performing a top-down information filtering step of extracting the correct answer sentence by (i) identifying the intent of the question using lexico-syntactic patterns, which express the vocabulary, part-of-speech, syntactic, and semantic information of the query sentence as regular expressions, and a classification of question forms into a plurality of types, and (j) constructing a dictionary of words dependent on the search domain and using it to adjust weights according to the characteristics of the domain; and

Determining, by the web server of the search site, whether the query is unsuitable for the FAQ list search method or whether the user requests a website search and, after performing step (e), performing a website query-response search that extracts correct answer sentences by (k) assigning PLO tags and semantic codes to the analyzed phrases, (l) determining the question type using lexico-syntactic patterns, and (m) performing top-down filtering using heuristics such as semantic similarity, ISA-type syntactic structure, clue words, and position within the document. An integrated natural language query-response search method comprising the above steps.

delete

6. The method of claim 5, wherein the top-down information filtering step further comprises: (n) a FAQ editing step in which the web server of the search site updates the index used by the statistical natural language analyzer, the noun-verb relation information used in semantic filtering, and the specialized domain dictionary used in word refinement filtering, and updates the lexico-syntactic patterns and question type definitions used in question-type filtering.

delete

(i) morphologically analyzing, by a web server of a search site, a user's natural language query from a terminal; (ii) determining, by the web server of the search site, the modification relationships between the analyzed morphemes; (iii) assigning, by the web server of the search site, PLO tags and semantic codes to the parsed phrases; (iv) mapping, by the web server of the search site, the phrases to which the PLO tags and semantic codes are assigned to an SQL statement; (v) accessing, by the web server of the search site, a suitable DB for the query described in the SQL statement; and (vi) extracting, by the web server of the search site, a response through the DB access and generating a response sentence from a predefined response template,

wherein the step of determining the modification relationships between the analyzed morphemes comprises:

extracting from each word, based on the result of the morphological analysis, two representative morphemes consisting of a real morpheme and a formal morpheme; and

selecting the representative morphemes according to the rules set out below,

and wherein the step of assigning the semantic codes comprises:

analyzing the target domain to determine semantic categories;

manually collecting words corresponding to each of the semantic categories and storing them in a database; and

assigning a semantic code to each morpheme, noun phrase, or verb phrase of the input sentence based on the database;

the foregoing constituting an interactive DB search method,

wherein the rules are:

● The leftmost morpheme of a word is, by default, the real morpheme of that word.

● The rightmost morpheme of a word is the formal morpheme of that word.

● In a word consisting of only one morpheme, that morpheme serves as both the real morpheme and the formal morpheme.

● Commas, exclamation marks, periods, and special symbols affect the structure of sentences, so they are treated like formal morphemes.

● When a commander or an investigator comes to the beginning of a word, the next statement is treated as a real morpheme.

A bottom-up natural language information retrieval step comprising:

(i) analyzing, by a web server of a search site, a user's natural language query into morphemes and phrases based on a morpheme/syntax dictionary and statistical language information;

(ii) processing, by the web server of the search site, the query analyzed into morphemes and phrases into compound nouns based on single-word and compound-noun dictionaries and co-occurrence information between nouns;

(iii) performing, by the web server of the search site, a search using a stopword dictionary and the weights of indexed nouns; and

(iv) performing, by the web server of the search site, keyword matching on the user query using Korean WordNet semantic information;

a top-down information filtering step comprising:

(v) identifying, by the web server of the search site, the intent of the question using lexico-syntactic patterns, which express vocabulary, part-of-speech, phrase, and semantic information as regular expressions, and a classification of question forms; and

(vi) constructing, by the web server of the search site, a dictionary of words dependent on the search domain and using it to adjust weights according to the characteristics of the domain; and

(vii) a FAQ editing step in which the web server of the search site updates the index used by the statistical natural language analyzer, the noun-verb relation information used in semantic filtering, and the specialized domain dictionary used in word refinement filtering, and updates the lexico-syntactic patterns and question type definitions used in question-type filtering. A method for searching a list of frequently asked questions (FAQ) comprising the above steps.

(Ii) the web server of the search site processing the query analyzed into morphemes and phrases into compound nouns based on single-word and compound-noun dictionaries and co-occurrence information between nouns;

(Iii) a bottom-up natural language information retrieval step comprising the web server of the search site performing keyword matching on the user query using Korean WordNet semantic information; and

(I) assigning, by the web server of the search site after performing step (i), PLO tags and semantic codes to the parsed phrases;

(Iii) determining, by the web server of the search site, a question type using lexico-syntactic patterns together with the PLO information and semantic codes; and

(Iii) the web server of the search site performing top-down filtering using heuristics such as semantic similarity, ISA-type syntactic structure, clue words, and position within the document.