KR100434902B1

KR100434902B1 - Knowledge base custom made information offer system and service method thereof

Info

Publication number: KR100434902B1
Application number: KR10-2000-0050160A
Authority: KR
Inventors: 인소란; 박영택
Original assignee: 주식회사 에이전트엑스퍼트
Priority date: 2000-08-28
Filing date: 2000-08-28
Publication date: 2004-06-07
Also published as: KR20020017076A

Abstract

A knowledge-based customized information retrieval system and a service method thereof are disclosed. The system comprises a customized information search server that provides a search service and at least one customer terminal that uses the search service. The customized information search server includes: an information collecting unit that, when a search term is received from a customer terminal or periodically, accesses a plurality of information-providing websites, retrieves documents containing the search term, and applies a knowledge-based method to the retrieved documents to build the web page structure of the websites as an information ontology; a search information database that stores the documents retrieved by the information collecting unit; and a customized information providing unit that stores words related to the search term as attribute information, removes garbage documents by applying a knowledge-based method to the search term, the attribute information, and the retrieved documents, determines the priority of the documents from the differentiated weights set among the attribute words through the knowledge-based method, and either provides the documents to the customer terminal in real time according to the determined priority or, at the request of the customer terminal, provides the documents sorted by priority. The information ontology DB speeds up real-time search, and weighting the retrieved documents in a knowledge-based manner makes it possible to provide customized information better suited to the customer's needs.

Description

Knowledge base custom made information offer system and service method

The present invention relates to a search system that retrieves and provides information requested by a customer over the Internet. More particularly, it relates to a knowledge-based customized information retrieval system, and a service method thereof, that searches in real time for the information a customer wants, extracts only high-quality information matching various conditions, and applies knowledge-based weights so that information more accurately matching the customer's needs can be retrieved and provided as customized information.

A conventional information retrieval system simply accepts a customer's query and outputs to the user a common result extracted by a common search system. Because a general search system does little more than classify results around the domain of the query received from the customer, customers have difficulty obtaining customized information differentiated according to their own tastes and characteristics. In addition, because a conventional system searches only on the query the customer entered, the search range becomes broad and large discrepancies arise between the information the customer wants and the extracted results, so the accuracy and reliability of the retrieved information are low.

Furthermore, a conventional information retrieval system relies on the search systems used by the information-providing sites when searching for a query received from a customer, so the accuracy of the retrieved information is low and real-time provision of information is difficult. Yet when the request is for information that has just been created or was created only recently, such as newspaper articles, real-time retrieval is highly important.

In addition, a conventional customized information retrieval system searches for the information a customer wants without first classifying the information-providing sites. It therefore searches every area of a site regardless of the specific location of the information to be extracted, which reduces both the speed of information extraction and the efficiency of the work.

Therefore, a retrieval system is needed that searches for and provides highly reliable and accurate information in real time.

The technical problem addressed by the present invention is to provide a knowledge-based customized information retrieval system that can retrieve and provide customer-requested information more accurately and quickly by applying knowledge-based weights.

Another technical problem addressed by the present invention is to provide an information retrieval method performed in the knowledge-based customized information retrieval system.

FIG. 1 is a system diagram showing the overall configuration of a knowledge-based customized information retrieval system according to the present invention.

FIG. 2 is a block diagram showing each part of the customized information search server (14) of FIG. 1 in more detail.

FIG. 3 shows an example of a screen used when a customer searches for information in the system of FIG. 1.

FIG. 4 shows an example of a screen displaying search results from the customized information search server (14).

FIG. 5 is a flowchart showing one embodiment of a method by which the customized information search server (14) retrieves information requested from a customer terminal (10).

FIG. 6 is a flowchart showing the knowledge-based weighting step (step 135) of FIG. 5 in detail.

To achieve the above object, a knowledge-based customized information retrieval system according to the present invention comprises a customized information search server that provides a search service, and at least one customer terminal that is connected to the customized information search server through a communication network and uses the search service it provides. The customized information search server includes: an information collecting unit that, when a search term is received from a customer terminal or periodically, accesses a plurality of information-providing websites, retrieves documents containing the search term, and applies a knowledge-based method to the retrieved documents to build the web page structure of the websites as an information ontology; a search information database that stores the documents retrieved by the information collecting unit; and a customized information providing unit that stores words related to the search term as attribute information, removes garbage documents by applying a knowledge-based method to the search term, the attribute information, and the retrieved documents, determines the priority of the documents from the differentiated weights set among the attribute words through the knowledge-based method, and either provides the documents to the customer terminal in real time according to the determined priority or, at the request of the customer terminal, provides the documents sorted by priority.

To achieve the other object, in a computer network system in which a customized information search server providing an information retrieval service and at least one customer terminal provided to each customer are connected through a communication network, a knowledge-based customized information service method mediated by the customized information search server comprises: step (a) of receiving a search term requested from a customer terminal; step (b) of determining whether an information ontology has been built for the search term; step (c) of, if the information ontology has been built in step (b), referring to the information ontology to directly access the locations on the information-providing websites where information on the search term is registered, performing a real-time search, and storing the retrieved documents in a search information database; step (d) of removing garbage documents from the documents stored in the search information database according to the importance of the attribute information assigned through a knowledge-based method, weighting each document through the knowledge-based method to determine the priority of the retrieved documents, and providing the retrieved documents to the customer terminal according to the priority; step (e) of, if the information ontology has not been built in step (b), accessing the URL of each information-providing website, searching the whole of the information-providing websites in real time, and storing the retrieved documents in the search information database; and step (f) of analyzing the information structure of the documents retrieved in step (e) to build the information ontology for the search term.
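Steps (a) through (f) above can be read as a cache-style control flow. The following is only a minimal sketch under stated assumptions: the class `SearchServer`, the callables `fetch_at`, `crawl_all`, and `rank`, and the dictionary-based stores are hypothetical names introduced for illustration, and running step (d) after the full-crawl branch is an assumption that the text does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class SearchServer:
    # Hypothetical in-memory stand-ins for the information ontology DB
    # and the search information database.
    ontology: dict = field(default_factory=dict)   # search term -> known locations
    search_db: dict = field(default_factory=dict)  # search term -> retrieved documents

    def handle_request(self, term, fetch_at, crawl_all, rank):
        """Process one search term received from a customer terminal (step a)."""
        if term in self.ontology:                        # step (b): ontology exists?
            # step (c): go straight to the recorded locations
            docs = fetch_at(term, self.ontology[term])
            self.search_db[term] = docs
        else:
            # step (e): search every listed website in full
            docs = crawl_all(term)
            self.search_db[term] = docs
            # step (f): keep only where the documents were found
            self.ontology[term] = sorted({d["location"] for d in docs})
        # step (d): filter, weight, and order the stored documents (assumed
        # to run after either branch)
        return rank(term, self.search_db[term])
```

With this shape, the first request for a term pays for the full crawl and every later request for the same term uses the recorded locations.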

To achieve the other object, in a computer network system in which a search server that is linked to a specific website and retrieves and provides information on the linked site is connected through a communication network to at least one customer terminal provided to each customer, a customized information retrieval method comprises: step (a) of building an information ontology for the linked site; step (b) of referring to the information ontology to directly access the locations on the information-providing websites where information on the search term is registered, periodically retrieving documents, and storing the retrieved documents in a search information database; step (c) of removing garbage documents from the documents stored in the search information database according to the importance of the attribute information assigned through a knowledge-based method, weighting each document through the knowledge-based method, and determining and storing the priority of the retrieved documents; and step (d) of, when a request for information is received from a customer terminal, extracting the documents stored in priority order in step (c) and providing them to the customer terminal.

Hereinafter, a knowledge-based customized information retrieval system and a service method thereof according to the present invention are described with reference to the accompanying drawings.

FIG. 1 is a system diagram showing the overall configuration of a knowledge-based customized information retrieval system according to the present invention. The customized information search server (14) is a computer installed at the company that operates the customized information providing system, and the customer terminal (10) is the computer of a person who receives the information retrieval service using the system operated on the customized information search server (14). The customer terminal (10) is connected to the customized information search server (14) through a communication network (12) such as the Internet.

The customized information search server (14) is equipped with programs and databases for retrieving the information requested by customers. The customer terminals (10) have an Internet browser (for example, Netscape or Internet Explorer) capable of displaying web content in HTML (Hyper Text Markup Language) form. The Internet browser allows each computer to access and display the contents of the information retrieval HTML templates on the customized information search server (14). The HTML templates of the customized information search server (14) include a main web (WWW) page to be displayed to the user.

Referring to FIG. 1, to use the service provided by the customized information providing system, a customer at the customer terminal (10) first enters his or her personal information and sets an ID and a password to register as a member. When a registered customer uses the customer terminal (10) to request an information search from the customized information search server (14), the information collecting unit (14a) searches for and collects the requested information using a predetermined search driver. The information a customer may request covers any kind of subject, including persons, institutions, companies, schools, and topics. The customized information search server (14) takes as its search targets all information-providing websites available over the Internet, such as Internet newspapers, Internet magazines, Internet broadcasts, and papers, and stores the URLs (Uniform Resource Locators) of the targeted information-providing websites in a database.

The customized information search server (14) builds in advance, as a knowledge-based database, attribute information that can more specifically qualify the search terms a customer may request. When a search term requested through the customer terminal (10) is received over the communication network (12), the customized information search server (14) searches a plurality of information-providing websites for documents containing the search term and uses the attribute information for the search term to extract, from the retrieved documents, only those useful to the customer. The customized information search server (14) also assigns a weight to each extracted document according to the attribute information. The higher the weight, the more closely the document matches the customer's needs. The attribute information for a search term may be built up by the customized information search server (14) through learning from repeated searches, or may be entered directly by an expert. Specifically, the customized information search server (14) comprises an information collecting unit (14a), a search information database (DB, 14b), and a knowledge-based customized information providing unit (14c).

When a search term requested from the customer terminal (10) is received over the communication network (12), the information collecting unit (14a) searches a plurality of information-providing websites for documents containing the search term and stores the retrieved documents in the search information DB (14b).

The customized information providing unit (14c) maintains, as a knowledge-based DB, attribute information that can specifically qualify a requested search term, and uses the knowledge-based DB to extract and provide only those documents useful to the customer from among the documents retrieved and stored in the search information DB (14b). As described above, attribute information is information that qualifies a search term more specifically. If the search term is a specific person, the person's affiliation, occupation, English name, Chinese-character name, nickname, and the like are stored together with the person's name. If the search term is the name of an institution, the ministry or department name, nickname, English name, Chinese-character name, and the like are stored together with the institution name as attribute information. If the search term is a company name, the department name, nickname, English name, Chinese-character name, and the like are stored with the company name, and if the search term is a school name, the region, nickname, English name, Chinese-character name, and the like are stored with the school name. For a topic, important keywords that can specifically qualify the topic are stored together as attribute information. Because the knowledge-based DB stores these various kinds of attribute information that qualify a search term, retrieval of useless information, that is, garbage information, can be minimized, and customized information containing only documents useful to the customer can be provided. For example, when searching for information on a specific person, people who share the same name can easily be told apart by attribute information such as affiliation, occupation, English name, Chinese-character name, and nickname.
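The role of attribute information in telling namesakes apart can be illustrated with a small sketch. The record layout `AttributeRecord`, the functions `is_garbage` and `remove_garbage`, and the filtering rule itself (a document is kept only if it mentions the search term and at least one stored attribute value) are assumptions made for illustration; the text does not specify the exact rule, and the names and sample values below are invented.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeRecord:
    """One knowledge-base entry: a search term plus the words that qualify it."""
    term: str
    attributes: dict = field(default_factory=dict)  # attribute name -> value


def is_garbage(document_text, record):
    # Assumed rule: a document is garbage if it lacks the search term, or if
    # it mentions the term without any of the stored attribute values.
    if record.term not in document_text:
        return True
    return not any(value in document_text for value in record.attributes.values())


def remove_garbage(documents, record):
    """Return only the documents that pass the attribute check, in order."""
    return [doc for doc in documents if not is_garbage(doc, record)]
```

For a hypothetical person entry with affiliation "Hanguk University" and occupation "professor", an article about a footballer of the same name would be dropped because none of the stored attribute values appear in it.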

Next, the knowledge-based customized information providing unit (14c) uses the attribute information to remove garbage documents and extract only the documents useful to the customer. It then assigns a weight to each useful document using the attribute information, the document's location information, and the like. Here, the location information of an extracted document is information about where the document sits within the website that provided it. For example, if a document was extracted from an Internet newspaper, its location can serve as an indicator of its importance, so the importance weight may vary with the document's location. The weight may also be determined by how many words representing the search term and its attribute information stored in the knowledge-based DB appear in the document's content, and by how they appear in relation to one another. This weight is a measure of how well the document fits the customer's needs; the higher the weight, the more useful the information is to the customer. A weight may also be assigned to each piece of attribute information. That is, even if words corresponding to attribute information appear the same number of times in extracted documents, the documents' weights may differ according to the weights of the attribute information. Furthermore, when words corresponding to highly weighted attribute information appear bundled together in an extracted document, the weight increases further. In this way, the priority of an extracted document is determined by the combined effect of the analyses based on the document's location and on the attribute information.
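The weighting described above combines a location component, weighted attribute-word counts, and a bonus when highly weighted attribute words appear together. The sketch below is one possible reading, not the patented formula: the additive combination, the `1 / (1 + depth)` location term, the `high_weight` threshold, the sentence-level meaning of "bundled together", and all function and parameter names are assumptions.

```python
def score_document(text, depth, term, attribute_weights,
                   high_weight=2.0, bundle_bonus=1.0):
    """Combine location, term, and attribute evidence into one weight."""
    # Location: a shallower (more prominent) position contributes more.
    location_score = 1.0 / (1 + depth)
    # Occurrences of the search term itself.
    term_score = float(text.count(term))
    # Attribute words count in proportion to their own weights, so equal
    # occurrence counts can still yield different document weights.
    attribute_score = sum(text.count(word) * weight
                          for word, weight in attribute_weights.items())
    # Bonus when two or more highly weighted attribute words share a sentence.
    high_words = [w for w, weight in attribute_weights.items()
                  if weight >= high_weight]
    bundled = any(sum(word in sentence for word in high_words) >= 2
                  for sentence in text.split("."))
    return (location_score + term_score + attribute_score
            + (bundle_bonus if bundled else 0.0))


def rank_documents(documents, term, attribute_weights):
    """Sort documents (dicts with 'text' and 'depth') by descending weight."""
    return sorted(
        documents,
        key=lambda d: score_document(d["text"], d["depth"], term, attribute_weights),
        reverse=True,
    )
```

Any monotone combination would serve the same purpose; the additive form is chosen here only because it keeps each contribution easy to inspect.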

Meanwhile, the information collecting unit (14a) can build an information ontology DB using the data stored in the search information DB (14b). Here, the information ontology represents the location, within a website, of a document retrieved by the information collecting unit (14a). For example, to search in real time for a search term requested by a customer, the information collecting unit (14a) connects in real time to an information-providing website using the target URL and then searches section by section. Within each section, the information collecting unit (14a) searches depth by depth to find the desired information. Documents containing the search term are retrieved through this procedure, and each retrieved document is stored in the search information DB (14b) together with its location information. The location information includes the website name of the retrieved document (URL name, for example Dong-A Ilbo, Hankyoreh, Chosun Ilbo), the section name (for example celebrity, sports, entertainment), the depth, and the like. The depth is, for a newspaper for example, the unit by which published articles are classified in a vertical hierarchy: the lower the depth, the more prominent the article, and the higher the depth, the less prominent the article.
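The location information described here (website name, section name, depth) maps naturally onto a small record type. The following sketch assumes hypothetical names `DocumentLocation`, `StoredDocument`, and `extract_locations`, and assumes that depth is an integer where a lower value means a more prominent article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentLocation:
    site: str      # website (URL) name
    section: str   # section name, e.g. "sports"
    depth: int     # vertical level within the section; lower is more prominent


@dataclass
class StoredDocument:
    text: str
    location: DocumentLocation


def extract_locations(stored_documents):
    """Keep only the distinct locations, most prominent (lowest depth) first."""
    locations = {doc.location for doc in stored_documents}
    return sorted(locations, key=lambda loc: (loc.depth, loc.site, loc.section))
```

Making the location record frozen lets it be placed in a set, so several documents found at the same place collapse into a single ontology entry.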

The information collecting unit (14a) extracts only the location information of the retrieved articles to build the information ontology DB. Once the information ontology DB has been built, when a search for the same search term is requested again, the unit can refer to the information ontology DB, go directly to the relevant location, and retrieve the relevant documents, which shortens the time needed for the search. If no information ontology were built, the information collecting unit (14a) would have to go to each URL on every search and examine all articles depth by depth in each section, which takes a long time.

The description so far has covered the case in which the customized information search server (14), when an information search for a search term is requested from the customer terminal (10), retrieves the information in real time and provides it to the customer terminal (10). However, the customized information search server (14) can also periodically retrieve information on a specific search term and store it in an internal database, and then, when the customer terminal (10) requests information on that search term, extract the data for the search term from the internal database and provide it to the customer terminal. For example, when the customized information search server (14) is linked to the homepage of a particular celebrity, the server periodically retrieves and stores information about that celebrity and, when a customer requests the information, extracts and provides the stored content.

FIG. 2 is a block diagram showing each part of the customized information search server (14) of FIG. 1 in more detail. Referring to FIG. 2, the information collecting unit (14a) comprises a URL list DB (20), a search driver (30), an information ontology DB (40), and an information ontology construction unit (50). The knowledge-based customized information providing unit (14c) comprises a knowledge-based DB (60), a detailed information DB (70), a customer DB (80), and a knowledge-based weighting unit (90).

Referring to FIG. 2, the URL list DB (20) stores the URL names of a plurality of information-providing websites. As described above, information-providing websites include Internet newspapers, Internet magazines, Internet broadcasts, and all kinds of documents provided over the Internet, and their URL names are stored in the URL list DB (20).

The information ontology DB (40) is a database that stores the locations of the information customers have requested. It can be built by using the search driver (30) to analyze the web page structure of the retrieved information, that is, the URL name, section name, and depth, and storing the analysis results in the information ontology DB (40) as ontology information. Experts in the relevant field can also update the information ontology. For example, a sports expert can learn, from experts who hold information about sports or athletes, which section of which site carries the latest information, and update the information ontology DB (40) accordingly.

The search driver 30 searches the information providing sites addressed by the URLs stored in the URL list DB 20 to find the information requested by the customer. Specifically, the search driver 30 comprises a meta search driver 32 and a real-time search driver 34. The meta search driver 32 searches for the desired information using the search engine operated by each of the information providing websites listed in the URL list DB 20. It is therefore used to find information that was registered before the time of the search. The real-time search driver 34 searches the information providing websites listed in the URL list DB 20 in real time to find the latest information. Before searching each site, the real-time search driver 34 consults the information ontology DB 40 to check whether an ontology has been built for the desired information. If one has been built, the driver can access the corresponding location directly, which shortens the search time. If no information ontology has been built, however, the search driver 30 must search every article at every depth of every section of every URL, which takes a long time.
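The difference between the two cases can be illustrated with a minimal Python sketch. The in-memory dictionaries, the `Location` tuple, the example host names, and the function name `plan_realtime_search` are all hypothetical stand-ins for the information ontology DB 40 and the URL list DB 20; the patent does not specify a data layout.

```python
from typing import NamedTuple

class Location(NamedTuple):
    url: str
    section: str
    depth: int

# Hypothetical stand-in for the information ontology DB 40: term -> known locations.
ONTOLOGY = {
    "Park Chan-ho": [Location("news.example.com", "sports/baseball", 2)],
}

# Hypothetical stand-in for the URL list DB 20: url -> {section: maximum depth}.
SITES = {
    "news.example.com": {"sports/baseball": 3, "politics": 2},
    "daily.example.org": {"sports": 2},
}

def plan_realtime_search(term, ontology=ONTOLOGY, sites=SITES):
    """Return the locations a real-time search would have to visit."""
    known = ontology.get(term)
    if known:
        # Ontology built: go straight to the recorded locations.
        return list(known)
    # No ontology: visit every depth of every section of every URL.
    return [
        Location(url, section, depth)
        for url, sections in sites.items()
        for section, max_depth in sections.items()
        for depth in range(1, max_depth + 1)
    ]
```

With these sample tables, a term that has an ontology entry yields a single location to visit, while an unknown term yields all seven section and depth combinations.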

In this way, the search driver 30 can use the meta search driver 32 and the real-time search driver 34 to search simultaneously for information registered at each site before the time of the search and for the latest information, and it stores the search results in the search information DB 14b.

The ontology construction unit 50 extracts, from the information stored in the search information DB 14b, only the information that indicates where information is located, builds an information ontology from it, and stores the result in the information ontology DB 40, thereby updating the information ontology DB 40. Specifically, the ontology construction unit 50 comprises a document structure preprocessor 52 and an ontology generator 54.

The document structure preprocessor 52 extracts only the location information of each retrieved document. The ontology generator 54 then uses the location information extracted from each document to generate a document structure running from upper-level URLs to lower-level URLs, and stores the generated structure in the information ontology DB 40.
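As a rough illustration of generating a structure from upper-level to lower-level URLs, the following Python sketch nests document locations into a tree. The nested-dictionary representation and the function name `build_ontology` are assumptions made for this example; the patent does not describe the stored format.

```python
from urllib.parse import urlparse

def build_ontology(doc_urls):
    """Nest document locations from the site host down to lower-level paths."""
    tree = {}
    for doc_url in doc_urls:
        parsed = urlparse(doc_url)
        # The host is the top-level node.
        node = tree.setdefault(parsed.netloc, {})
        # Each non-empty path segment is one level deeper in the site.
        for segment in (s for s in parsed.path.split("/") if s):
            node = node.setdefault(segment, {})
    return tree
```

Documents that share a section end up under the same branch, so the depth of a document is simply its distance from the host node.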

Next, the knowledge base DB 60 of the knowledge-based customized information providing unit 14c stores the search terms requested by customers and the attribute information for those search terms. As described above, each piece of attribute information may have a weight.

The detailed information DB 70 stores hint information that describes in more detail the attribute information stored in the knowledge base DB 60. For example, "first album", "second album", "concert", and "album" are hint information for a singer. That is, if such words are detected in a retrieved document, the document can be recognized as an article about a singer even if the word "singer" itself does not appear in it.
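The role of hint information can be sketched in a few lines of Python. The `HINTS` table and the function `infer_attributes` are hypothetical, and simple substring matching is assumed here in place of whatever matching the actual system uses.

```python
# Hypothetical stand-in for the detailed information DB 70: attribute -> hint words.
HINTS = {"singer": {"first album", "second album", "concert", "album"}}

def infer_attributes(text, hints=HINTS):
    """Return the attributes a document is about, judged by the attribute word or its hints."""
    lowered = text.lower()
    return {
        attribute
        for attribute, words in hints.items()
        if attribute in lowered or any(word in lowered for word in words)
    }
```

A document mentioning a "second album" and a "concert" is classified under "singer" even though that word never occurs in it.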

The customer DB 80 stores information about each customer, namely the customer ID and password.

The knowledge-based weighting unit 90 uses the attribute information to remove garbage documents from the retrieved documents stored in the search information DB 14b, so that only useful documents are extracted, and assigns weights to the extracted documents using the attribute information and the location information of each document. The knowledge-based weighting unit 90 then provides the weighted documents to the customer terminal 10 through the communication network 12 as customized information. Specifically, the knowledge-based weighting unit 90 comprises a document editing unit 92, a weighting unit 94, and a knowledge-based document DB 96.

The document editing unit 92 refers to the search information DB 14b to extract the documents that contain the search term entered by the customer, and removes stopwords such as postpositional particles from the extracted documents. It then edits the documents using the document location and the attribute information extracted by applying the knowledge-based method. Specifically, the document editing unit 92 classifies the extracted documents by attribute of the search target (alias, region, keyword, hint, and so on) and extracts the location of each document.

The weighting unit 94 assigns a weight to each document using the location information and the results of the analysis performed by the document editing unit 92 according to the attribute information. For example, the priority of an extracted document can be determined by the number of times attribute words extracted through the knowledge-based method are mentioned, the relationships among the extracted attribute words, the importance of the attribute to which each extracted word belongs, the URL, and the depth. That is, a document is treated as more important and given higher priority the more often attribute words are mentioned, the more often related attribute words appear closely connected, the more important the attribute words that appear, and the shallower the depth. The weighting unit 94 also determines the priority of each document from its assigned weight, sorts the documents by priority, and stores them in the document DB 96. As described above, the documents are either provided to the customer terminal 10 in real time or, when a request arrives from the customer terminal 10, extracted by reference to the document DB 96 and provided to the customer terminal 10.

FIG. 3 shows an example of the screen used when a customer searches for information in the system shown in FIG. 1. Referring to FIG. 3, when the customer requests an information search, the customized information search server 14 provides the customer terminal 10 with a screen, as shown in FIG. 3, on which the customer can enter the "name", "occupation", "affiliation", "keyword", "alias", "English name", and "Chinese-character name" of the person to be searched for. The customer enters "Park Chan-ho" as the search target and, as attribute information, "baseball" in the "occupation" field and "Dodgers" in the "affiliation" field. In the "keyword" field the customer enters keywords that can identify "Park Chan-ho", such as "LA", "Major League", "strikeout", "save", and "pitcher", and in the "alias" field enters "Korean Express". After entering data in the "English name" and "Chinese-character name" fields, the customer clicks the "OK" button to send the entered content to the customized information search server 14.

FIG. 4 shows an example of a screen presenting the search results of the customized information search server 14. The customized information search server 14 searches the information providing websites, extracts all articles in which "Park Chan-ho" appears, and generates a weight for each article using factors such as the location of the article and the number of times attribute information appears in its content. It then displays each article's title, weight (W), source, and date, as shown in FIG. 4. In FIG. 4 the customized information search server 14 has sorted the retrieved articles by date, but they can also be sorted by weight or by information providing site. When the customer clicks a desired article among those displayed on the customer terminal 10 as in FIG. 4, the customer can view its content.

Referring to FIGS. 1, 2, and 5, the customized information search server 14 receives customer information such as a customer ID and password from the customer terminal 10 (step 100). The customized information search server 14 confirms that the customer is registered and receives a search term from the customer terminal 10 (step 105). At this point, the customized information search server 14 may also receive attribute information for the search term from the customer terminal 10.

The customized information search server 14 checks whether an information ontology for the search term received from the customer terminal 10 has been built in the information ontology DB 40 (step 110). As described above, the information ontology for a search term is the location information indicating where documents containing the search term are registered at the information providing websites. If step 110 finds that the information ontology has been built in the information ontology DB 40, the real-time search driver 34 of the customized information search server 14 can refer to the information ontology DB 40 and directly access the locations of documents containing the search term to retrieve them. In addition, the meta search driver 32 of the customized information search server 14 searches the documents registered at the information providing sites before the time of the search for documents containing the search term (step 125), and stores the retrieved documents in the search information DB 14b (step 130).

If, on the other hand, step 110 finds that no information ontology has been built, the server accesses the URL of each information providing website and searches all documents registered at those websites in real time (step 115), analyzes the location structure of the retrieved documents to update the information ontology DB 40 (step 120), and stores the documents retrieved in step 115 in the search information DB 14b (step 130).

After step 130, the documents stored in the search information DB are analyzed, garbage documents are removed so that useful documents are extracted, the extracted useful documents are weighted (step 135), and the weighted useful documents are provided to the customer terminal in real time through the communication network (step 140).

FIG. 6 is a flowchart showing the weighting step (step 135) of FIG. 5 in detail.

Referring to FIGS. 2 and 6, the document editing unit 92 removes stopwords such as postpositional particles from the documents stored in the search information DB 14b and, referring to the knowledge base DB 60, uses the attribute information to remove useless garbage information so that only documents useful to the customer are extracted (step 135a). After step 135a, the document editing unit 92 uses the attribute information stored in the knowledge base DB 60 to classify the extracted documents by attribute (alias, region, keyword, hint, and so on) and re-edits the documents (step 135b), and then extracts the location of each extracted document (step 135c).
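Steps 135a and 135b might be sketched as follows in Python. The garbage criterion used here, that a document naming the search term but containing no attribute word at all is discarded, is an assumption made for illustration; the patent does not state the exact rule. The function name `edit_documents` and the dictionary layout are likewise hypothetical, and stopword removal is omitted for brevity.

```python
def edit_documents(docs, search_term, attributes):
    """Filter and classify documents.

    docs: list of document texts.
    attributes: {attribute name: set of attribute words} (hypothetical layout).
    """
    kept = []
    for text in docs:
        if search_term not in text:
            continue  # does not mention the search target at all
        # Step 135b: group the attribute words found in the text by attribute.
        matched = {
            name: sorted(word for word in words if word in text)
            for name, words in attributes.items()
        }
        matched = {name: words for name, words in matched.items() if words}
        if not matched:
            # Assumed garbage rule (step 135a): same name, but no supporting attribute word.
            continue
        kept.append({"text": text, "by_attribute": matched})
    return kept
```

Under this assumed rule, an article about a different person who happens to share the name "Park Chan-ho" would be dropped because none of the baseball-related attribute words occur in it.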

After step 135c, the weighting unit 94 assigns a weight to each document using the location information and the results of the analysis performed by the document editing unit 92 according to the attribute information (step 135d). As described above, the location of an extracted document reflects its importance, and the weight is determined by how many words representing the search term and the attribute information stored for it in the knowledge base DB appear in the document's content and by how they appear in relation to one another. Each piece of attribute information may also carry its own weight. That is, even if words corresponding to attribute information appear the same number of times in two extracted documents, the documents' weights may differ according to the weights of the attribute information. Moreover, when words corresponding to highly weighted attribute information appear grouped together in an extracted document, the weight increases further. In this way, the priority of an extracted document is determined by the combined effect of its location and the results of the attribute-based analysis.
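One possible way to combine these factors is sketched below in Python. The patent names the factors (mention count, per-attribute weight, grouping of attribute words, and depth) but gives no formula, so the specific combination here, including the sentence-level co-occurrence bonus, the division by depth, and the `pair_bonus` parameter, is purely an assumption for illustration.

```python
import re

def document_weight(text, depth, attribute_weights, pair_bonus=1.0):
    """Score one document. attribute_weights: {attribute word: weight}; depth must be >= 1."""
    lowered = text.lower()
    # Frequency term: each mention counts by the weight of its attribute word.
    score = sum(lowered.count(word.lower()) * weight
                for word, weight in attribute_weights.items())
    # Grouping term: bonus for each sentence containing two or more attribute words,
    # scaled by the heaviest word present (assumed rule).
    for sentence in re.split(r"[.!?]", lowered):
        present = [weight for word, weight in attribute_weights.items()
                   if word.lower() in sentence]
        if len(present) >= 2:
            score += pair_bonus * max(present)
    # Location term: shallower documents score higher (assumed rule).
    return score / depth

def rank(docs, attribute_weights):
    """docs: list of (title, text, depth). Returns titles in descending weight order."""
    scored = [(document_weight(text, depth, attribute_weights), title)
              for title, text, depth in docs]
    return [title for _, title in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

With weights of 2.0 for "Dodgers" and 1.0 for "pitcher", a depth-1 article that uses both words in one sentence outranks a depth-1 article that mentions them in separate sentences, which in turn outranks the first article's text placed at depth 2.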

After step 135d, the priority of each document is determined according to its assigned weight, and the documents are stored in the document DB 96 (step 135e).

The description so far has covered the method in which information is searched in real time when a search is requested from the customer terminal 10 and the results are provided to the customer terminal 10. As described above, however, the customized information search server 14 may also be linked to the homepage of a particular person, institution, company, school, or the like. In that case, referring to FIGS. 1 and 2, the customized information search server 14 uses the name of the linked person, institution, company, or school as the search term and builds an information ontology for it. It then refers to the information ontology to directly access the locations at the information providing websites where information about the search term is registered, searches for documents periodically, and stores the retrieved documents in the search information DB 14b.

Next, the attribute information stored in the knowledge base DB 60 is used to remove wrongly retrieved garbage documents from the documents stored in the search information DB 14b so that only useful documents are extracted. Each extracted document is weighted using its location information and the attribute information, and the priority of the documents is determined from the assigned weights and stored in the document DB 96. The weighted documents are thus kept in the document DB 96, and when a customer visits the homepage and requests related information, the customized information search server 14 extracts the weighted documents stored in the document DB 96 and provides them to the customer terminal 10.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed fashion.

As described above, the customized information retrieval system and method according to the present invention build the web page structure of information providing websites in advance as an information ontology, which speeds up real-time search, and build in advance, as a knowledge base DB, the attribute information that narrows down the search term a customer wants to search for, which minimizes the retrieval of garbage information. In addition, weights are assigned using the attribute information and the location information of the retrieved documents, so the customer can obtain information suited to his or her needs by selecting documents with high weights.

Claims

A knowledge-based customized information retrieval system that provides search results to a customer terminal in response to a search request from the customer terminal connected through a communication network, the system comprising:

an information collecting unit that receives from the customer terminal a search request including a search term and attribute information qualifying the search term, accesses a plurality of information providing websites either when the search request is received from the customer terminal or periodically, searches for documents containing the search term, and builds the web page structure of the websites as an information ontology;

a search information database that stores the documents retrieved by the information collecting unit; and

a customized information providing unit that extracts useful documents from the retrieved documents based on the attribute information, determines the priority of the extracted documents based on differentiated weights among the attribute words constituting the attribute information, the weights being set through a knowledge-based method, and either sorts the extracted documents by the determined priority and provides them to the customer terminal, or provides the extracted documents sorted by the determined priority to the customer terminal in response to an information request received from the customer terminal.

(Deleted)

The system of claim 1, wherein the information collecting unit comprises:

a URL list storage unit that stores URL information of each of the information providing websites;

an information ontology database;

a search driver that, when a search term and attribute information are received from the customer terminal through the communication network, or periodically, extracts documents containing the search term using the search system operated by each of the information providing websites, also refers to the information ontology database to directly access the location of documents for the search term and extract documents in real time, and stores the extracted documents in the search information database; and

an information ontology construction unit that analyzes the web page structure, which is the location information of the retrieved documents stored in the search information database, and stores the analysis result in the information ontology database as an information ontology.

The system of claim 1, wherein the customized information providing unit comprises:

a knowledge base database in which search terms that frequently appear at information providing sites and words related to them are stored as attribute information; and

a knowledge-based weighting unit that assigns a weight to each document stored in the search information database using the search term, the association between the documents and the attribute information extracted using a knowledge-based method, and the importance of each piece of attribute information.

The system of claim 4, wherein the knowledge-based weighting unit comprises:

a document editing unit that removes stopwords such as postpositional particles from the retrieved documents stored in the search information database and removes garbage documents using the search term and the attribute information;

a weighting unit that, after the document editing unit has removed the garbage documents, weights each document using the search term, the attribute information, the document location information, the importance of the attribute information, and the degree of association between attributes; and

a document database that stores the documents weighted by the weighting unit.

A knowledge-based customized information service method for providing search results to a customer terminal in response to a search request from the customer terminal connected through a communication network, the method comprising:

(a) receiving from the customer terminal a search request including a search term and attribute information qualifying the search term;

(b) determining whether an information ontology for the search term has been built;

(c) if the information ontology has been built, referring to the information ontology to directly access the location at the information providing websites where information corresponding to the search term is registered, searching in real time, and storing the result in a search information database;

(d) extracting the documents to be provided to the customer terminal by removing garbage documents from the documents stored in the search information database for the search term, based on the importance of the attribute information set through a knowledge-based method, determining the priority of the extracted documents based on the weights assigned through the knowledge-based method among the attribute words constituting the attribute information, and providing the extracted documents to the customer terminal;

(e) if the information ontology has not been built, accessing the URL of each of the information providing websites, searching the information providing websites in real time based on the search term, and storing the result in the search information database; and

(f) analyzing the information structure of the documents obtained by searching the information providing websites in real time and building an information ontology for the search term.

(Deleted)

The method of claim 6, wherein step (d) comprises:

(d1) removing stopwords such as postpositional particles and removing garbage information using the attribute information related to the search term;

(d2) extracting location information of the retrieved documents;

(d3) generating a weight for each retrieved document using the search term, the attribute information, and the location information; and

(d4) determining the priority of the documents according to the weights generated in step (d3).

(Deleted)