KR20010082984A

KR20010082984A - A system for retrieving world wide web and a method for storing, viewing and using the search result

Info

Publication number: KR20010082984A
Application number: KR1020000008579A
Authority: KR
Inventors: Bong-rae Park (봉래 박)
Original assignee: Bong-rae Park (박봉래); Nawanet Co., Ltd. ((주)나와넷)
Priority date: 2000-02-22
Filing date: 2000-02-22
Publication date: 2001-08-31
Also published as: KR100379635B1

Abstract

PURPOSE: A system for searching World Wide Web pages and a method for storing, viewing, and using the search results are provided so that a user can access desired information accurately and rapidly by using hyperlink information on the World Wide Web and the tree structure of web pages. CONSTITUTION: A hyperlink/index word extracting unit (1500) extracts, from a distributed database (1100), the hyperlinks that point to a specific web page, selects index words from the link text of those hyperlinks, forms an index word set keyed by the default URL of that web page, and stores the set in a hyperlink file (1600). An index word expanding unit (1400) reads one index word set from the hyperlink file (1600), retrieves from the distributed database (1100) the web page whose address is the key of that set, selects important index words from that page, and appends them to the set, thereby expanding it. An index word weighting unit (1300) receives the index word set from the index word expanding unit (1400), extracts the location information of the web page addressed by the set and of its subpages connected down to depth 'n' of the tree structure, and retrieves those subpages from the distributed database (1100) based on that location information. A query processing unit receives a user query (1220) through a user interface (1200), extracts index words by analyzing the query, reads from a conversion file (1700) the set of location information corresponding to each extracted index word, and processes that set.

Description

A SYSTEM FOR RETRIEVING WORLD WIDE WEB AND A METHOD FOR STORING, VIEWING AND USING THE SEARCH RESULT

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a search engine that facilitates web surfing on the Internet and, in particular, to a retrieval system that uses the hyperlink information on World Wide Web pages and the tree structure of web pages to let a user access desired information accurately and quickly.

Techniques for parsing, indexing, and searching documents in a database have long been well known.

In recent years, a distributed database has emerged in the form of the World Wide Web. Documents in this web database take the form of pages accessible via the Internet. Thus, anyone with a communication link to the Internet can access a web page.

Web pages are distributed across millions of different computer systems around the world. Internet users want to locate the particular web pages that contain information of interest to them.

These web pages are accessed via the Internet as follows.

When a user specifies, in a WWW browser, the location information (URL: uniform resource locator) of a web page containing the information to be viewed, the document indicated by that location information is transmitted from the website to the user's computer. The web browser on the user's computer then renders it in a predetermined format and displays it on the screen.

However, with this approach the user cannot access the target document without knowing its URL. Moreover, because the amount of information on the Internet is enormous, simply following the links between documents does not guarantee that the target document will be reached. A search engine that provides document retrieval is therefore required.

When the search engine receives a search request from a user computer, it starts a search program through the Common Gateway Interface (CGI). The search program queries an index database and converts the search results received from the database into HTML (HyperText Markup Language) format. The generated HTML document is then sent to the user computer. In this way, the user can browse the URLs that were found.
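As a rough illustration of the conventional flow just described (query the index database, then turn the hits into an HTML page), the sketch below uses a hypothetical in-memory dictionary `INDEX_DB` in place of a real index database and a hypothetical function `search_to_html`. A real CGI program would also read the query from the request environment and emit HTTP headers; both are omitted here.

```python
import html

# Hypothetical in-memory index database: index word -> list of (URL, weight).
INDEX_DB = {
    "python": [("http://www.python.org/", 0.9), ("http://example.com/py/", 0.4)],
    "web": [("http://www.w3.org/", 0.8)],
}

def search_to_html(query: str) -> str:
    """Look up each query word in the index and render the hits as an HTML list."""
    hits = {}
    for word in query.lower().split():
        for url, weight in INDEX_DB.get(word, []):
            hits[url] = hits.get(url, 0.0) + weight  # accumulate weight per URL
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    items = "\n".join(
        f'<li><a href="{html.escape(url)}">{html.escape(url)}</a></li>'
        for url, _ in ranked
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"

print(search_to_html("python web"))
```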

However, conventional search engines such as ALTAVISTA (a representative keyword-based search engine) extract every index word from the document of each of the many web pages on the Internet and index each page by all of those words.

As a result, index words that do not represent the characteristic information of a web page (hereinafter defined as stop words) are also extracted. This inflates the index database, lengthens the time needed to build it, and makes it difficult to lead users precisely to the information they want.

The present invention relates to an indexing system that indexes web pages primarily by structure-based retrieval, supplemented by conventional content-based retrieval.

A first object of the present invention is to provide a method of selecting and indexing only those web pages whose URL structure allows the page to be accessed even when the file name is omitted from the URL hierarchy.
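The following minimal sketch shows one way the "default URL" test in the first object might be approximated. It assumes that a page is reachable without its file name when the URL already ends in a directory (`/`) or names one of a few common index files; the `DEFAULT_FILES` set and the `default_url` function are hypothetical, and real web servers are configured individually.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed set of file names a server typically returns for a directory URL;
# the patent does not enumerate them.
DEFAULT_FILES = {"index.html", "index.htm", "default.htm", "default.html", "index.php"}

def default_url(url: str) -> Optional[str]:
    """Return the directory-style default URL for url, or None if the page
    cannot be reached with its file name omitted (under the assumptions above)."""
    parts = urlsplit(url)
    if parts.query or parts.fragment:
        return None  # dynamic or fragment URLs are not treated as default URLs here
    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    if filename and filename.lower() not in DEFAULT_FILES:
        return None  # names a specific, non-default file
    return urlunsplit((parts.scheme, parts.netloc.lower(), directory + "/", "", ""))
```

Under these assumptions, `http://host/dir/index.html` and `http://host/dir/` map to the same default URL, so hyperlinks written either way can be grouped together. A path without a trailing slash, such as `http://host/dir`, is conservatively treated as a file.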

A second object of the present invention is to provide a technique that, when one or more hyperlinks in a plurality of external web pages point to the same specific web page, extracts index words from the link text of those hyperlinks and indexes that web page with the extracted index words.
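A minimal sketch of the second object follows. It assumes hyperlinks have already been extracted as `(source_url, target_url, link_text)` triples and groups link-text words under the page they point to. The function name is hypothetical, whitespace splitting stands in for real Korean/English word segmentation, and "external" is approximated simply as "not a self-link".

```python
from collections import defaultdict

def index_by_link_text(links):
    """links: iterable of (source_url, target_url, link_text) triples.
    Returns target_url -> list of words taken from the link texts of
    hyperlinks on other pages that point to it."""
    index = defaultdict(list)
    for source, target, text in links:
        if source == target:
            continue  # a page's link to itself is not an external hyperlink
        index[target].extend(text.lower().split())
    return dict(index)
```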

A third object of the present invention is to provide a method of selecting the index words that define a web page by taking into account the tag structure of the page, the topicality of each word, and whether the word is registered in a general-purpose vocabulary dictionary.

A fourth object of the present invention is to provide a method that, when indexing a specific web page with specific index words, assigns weights to the index words by checking how often they occur not only in that web page but also in the subpages linked to it in the tree structure.

A fifth object of the present invention is to provide a method of displaying the web document with the highest weight for a given query on the user computer non-stop, that is, without the user first having to select it.

A sixth object of the present invention is to provide a method of storing the addresses of web documents retrieved from a search engine server on a specific Internet server linked with the user computer, so that they can easily be reused.

To achieve the above objects, an indexing system for World Wide Web pages according to a first aspect of the present invention includes: a communication interface for acquiring a web page via the Internet from its location information (URL); hyperlink retrieval means for sequentially extracting, one at a time, every hyperlink on the web page acquired by the communication interface; hyperlink selecting means for selecting only hyperlinks having a default URL, that is, a URL (uniform resource locator) in which the file name is omitted, or through which the web page can still be accessed if the file name is omitted; collecting means for collecting, from the World Wide Web (WWW) database, hyperlinks having the same default URL; and index word extracting means for extracting index words from the link text of the hyperlinks collected by the collecting means.

The index word extracting means comprises: an index word extraction module that forms a plurality of "location information: index word list" entries from the hyperlinks having the default URL; a first storage file for storing and accumulating the "location information: index word list" entries; an index module that searches all index word lists in the first storage file corresponding to a specific location, counts the occurrence frequency of each index word, and weights each index word according to that frequency, thereby forming a "location information: index word (weight) list"; a second storage file for storing the "location information: index word (weight) list"; a conversion module that converts the "location information: index word (weight) list" of the second storage file into an "index word: location information (weight) list"; and a third storage file for storing the "index word: location information (weight) list".
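The three storage files and the two modules can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: each file is modeled as an in-memory Python structure, and the weight of an index word is taken to be its relative frequency among all link-text words collected for that URL, since the text only says that the weight is derived from the occurrence frequency.

```python
from collections import Counter, defaultdict

def build_inverted_index(first_file):
    """first_file: list of (url, [index words]) records, one per hyperlink,
    modeling the "location information: index word list" entries.
    Returns the "index word: location information (weight) list" mapping."""
    # Index module: gather every list for the same URL and count word frequency.
    counts = defaultdict(Counter)
    for url, words in first_file:
        counts[url].update(words)
    # Second storage file: url -> {word: weight}; weight = relative frequency (assumption).
    second_file = {}
    for url, counter in counts.items():
        total = sum(counter.values())
        second_file[url] = {w: c / total for w, c in counter.items()}
    # Conversion module: invert to word -> [(url, weight)], highest weight first.
    third_file = defaultdict(list)
    for url, weights in second_file.items():
        for word, weight in weights.items():
            third_file[word].append((url, weight))
    for postings in third_file.values():
        postings.sort(key=lambda p: p[1], reverse=True)
    return dict(third_file)
```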

Accordingly, the web page indicated by the default URL can be indexed by the extracted index words.

In general, the hypertext of World Wide Web pages contains hyperlinks that connect web pages to one another, so that users can follow the links and browse and search pages in a non-sequential manner. Each hyperlink contains the location information (URL) pointing to a web page together with text that represents that page. Therefore, extracting index words from the link text of hyperlinks excludes stop words and yields more accurate index words with less indexing work than extracting them from the text of the web page itself.

An indexing system for World Wide Web pages according to a second aspect of the present invention includes: hyperlink/index word extracting means that acquires web pages from the distributed database on the Internet based on location information (URL) and extracts index words from the link text of hyperlinks on those pages to form "location information: index word set" entries; a hyperlink/index word file for storing the "location information: index word set" entries; index word expanding means that reads a "location information: index word set" from the hyperlink/index word file, extracts key index words from the text of the web page corresponding to the read location information, and adds them to the index word set; index word weighting means that acquires the subpages connected to the web page in the tree structure down to depth N, searches those subpages and the web page to check how often each index word in the expanded set occurs, and weights the index words of the set according to those frequencies; and an index file for storing the "location information: index word (weight) set" produced by the index word weighting means.

The second aspect of the present invention extracts key index words from the document of the web page being indexed and adds them to the index words extracted from hyperlink information, thereby expanding the index word set, and then assigns weights to the expanded index words using both the web page being indexed and its subpages. This enables more precise and accurate retrieval.
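The expansion step can be sketched as below. The patent does not say here how a "key index word" is chosen from the page text, so this sketch substitutes a simple stand-in: the `k` most frequent words after removing a small hypothetical stop-word list. `expand_index_set` and `STOP_WORDS` are illustrative names only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the patent does not specify one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}

def expand_index_set(index_set, page_text, k=3):
    """Return index_set plus the k most frequent non-stop words of page_text.
    Frequency is a stand-in for the unspecified 'key index word' criterion."""
    words = [
        w for w in re.findall(r"[a-z0-9]+", page_text.lower())
        if w not in STOP_WORDS
    ]
    key_words = [w for w, _ in Counter(words).most_common(k)]
    return set(index_set) | set(key_words)
```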

An index word weighting method according to a third aspect of the present invention, for indexing web pages using the distributed database on the Internet, comprises: forming and storing a "location information: index word set" in which a plurality of index words characterizing a specific web page are associated with that page's location information; retrieving the location information of the subpages connected in the tree structure, down to depth N, to the web page indicated by the location information of the "location information: index word set"; acquiring from the distributed database the subpages indicated by the location information retrieved in that step; and counting how often the index words of the "location information: index word set" occur in the web page and its subpages, then assigning each index word a weight according to its occurrence frequency.
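The weighting method of the third aspect amounts to a breadth-first walk of the page's tree down to depth N while counting index-word occurrences. The sketch assumes two caller-supplied functions, `fetch_text(url)` and `child_links(url)`, as hypothetical stand-ins for fetching a page's text and its subpage links from the distributed database, and it uses the raw occurrence count as the weight.

```python
from collections import deque

def weight_index_words(root_url, index_words, fetch_text, child_links, depth_n):
    """Count occurrences of each (lowercase) index word in the page at root_url
    and in its subpages down to depth_n; return word -> count as the weight.

    fetch_text(url) must return the page text as a string, and
    child_links(url) must return a list of subpage URLs (both hypothetical)."""
    counts = {w: 0 for w in index_words}
    seen = {root_url}
    queue = deque([(root_url, 0)])
    while queue:
        url, depth = queue.popleft()
        tokens = fetch_text(url).lower().split()
        for w in index_words:
            counts[w] += tokens.count(w)
        if depth < depth_n:
            for child in child_links(url):
                if child not in seen:  # the web is a graph; avoid revisiting pages
                    seen.add(child)
                    queue.append((child, depth + 1))
    return counts
```

Raw counts keep the example short; any normalization (for example by page length) is left open here because the text does not specify one.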

A method of viewing search results according to a fourth aspect of the present invention comprises: analyzing a query entered through a user computer to extract at least one index word; using the extracted index words to retrieve, from the index database, the location information of a plurality of web pages matching those index words; sorting the retrieved location information in descending order of weight; and delivering the highest-ranking location information to the web browser of the user computer, so that the web page corresponding to it is displayed on the user computer non-stop.

The fourth aspect of the present invention may further include displaying, on the user computer, a list of the location information sorted in descending order.

In this case, the web page corresponding to the highest-ranking location information and the list of location information are preferably displayed simultaneously on one screen.
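A minimal sketch of the fourth aspect follows. It assumes the matching `(url, weight)` pairs have already been retrieved: they are sorted in descending order of weight, the top URL is returned for immediate display, and a single page shows the ranked list beside the top page. Rendering the two parts with an `<iframe>` is an assumption, since the text only says both appear on one screen. The function names are hypothetical.

```python
import html

def rank_results(postings):
    """Sort (url, weight) pairs by weight, highest first.
    Returns (top_url, ranked_urls); top_url is None when nothing matched."""
    ranked = sorted(postings, key=lambda p: p[1], reverse=True)
    urls = [url for url, _ in ranked]
    return (urls[0] if urls else None), urls

def results_page(postings):
    """Build one HTML screen: the ranked list plus the top page shown immediately."""
    top, urls = rank_results(postings)
    items = "".join(
        f'<li><a href="{html.escape(u)}" target="view">{html.escape(u)}</a></li>'
        for u in urls
    )
    # The top-ranked page is loaded into the frame without any user click.
    frame = f'<iframe name="view" src="{html.escape(top)}"></iframe>' if top else ""
    return f"<html><body><ul>{items}</ul>{frame}</body></html>"
```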

A method of storing search results according to a fifth aspect of the present invention includes: a search engine server retrieving, from the index database, lists of document location information (URLs) related to a query transmitted from a user computer; reordering the retrieved location information lists by a general query processing method; the search engine server transmitting the retrieved document location information list to the user computer; displaying the location information list on the screen of the user computer; selecting at least one desired item of location information from the list displayed on the user computer; transmitting the selected location information to a specific server linked with the user computer; and the specific server storing the transmitted location information in that user's storage area.

A method of storing search results according to a sixth aspect of the present invention includes: a search engine server transmitting the location information of a retrieved document to a user computer; the web browser of the user computer loading the retrieved document based on that location information and displaying it on the screen of the user computer; transmitting, at the user's choice, the location information of the displayed document to a specific server linked with the user computer; and the specific server storing the transmitted location information in that user's storage area.

A method of using stored search results according to a seventh aspect of the present invention applies to a networking environment in which a plurality of user computers and a server computer are connected via the Internet, the server has a plurality of storage areas for the user computers linked with it, and lists of web document location information transmitted from the user computers are stored in those areas per user. The method includes: a user connecting to the server computer from his or her own computer; loading the location information lists from that user's storage area on the server computer and displaying them on the user computer; and accessing a web page by selecting the desired location information from the displayed list.

Therefore, the user can surf the web easily simply by connecting to the server computer.
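The storage and reuse steps of the fifth to seventh aspects can be sketched with a hypothetical in-memory server class that keeps one storage area per user. Persistence, authentication, and networking are omitted, and skipping URLs that are already saved is an added assumption.

```python
from collections import defaultdict

class BookmarkServer:
    """Hypothetical stand-in for the 'specific server' that keeps one
    storage area of saved search-result URLs per user."""

    def __init__(self):
        self._areas = defaultdict(list)  # user id -> saved URLs, in save order

    def save(self, user, urls):
        """Store the URLs the user selected, skipping ones already saved."""
        area = self._areas[user]
        for url in urls:
            if url not in area:
                area.append(url)

    def load(self, user):
        """Return a copy of the user's saved URL list for display."""
        return list(self._areas.get(user, []))
```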

Other objects and advantages of the invention will be set forth in the description below and will become apparent from practice of the invention. The objects and advantages of the invention may also be realized by the means and combinations pointed out in the appended claims.

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate presently preferred embodiments of the invention and, together with the detailed description of the preferred embodiments below, serve to explain the principles of the invention.

FIG. 1 is an overall system configuration diagram for explaining the search system of the present invention.

FIG. 2 is a diagram illustrating a specific web page connected to a plurality of external web pages by hyperlinks.

FIG. 3 is an internal block diagram of the search engine according to Embodiment 1 of the present invention.

FIGS. 4A to 4D are flowcharts illustrating the process of building an index database using the search engine of FIG. 3.

FIG. 5 is a flowchart illustrating how the search engine of FIG. 3 processes a query entered from a user computer.

FIG. 6 shows the screen layout of a web browser displaying search information provided by the search engine of the present invention.

FIG. 7 is an overall block diagram of the search engine according to Embodiment 2 of the present invention.

FIGS. 8A and 8B are detailed internal configuration diagrams of the search engine of FIG. 7.

FIG. 9 illustrates a plurality of web pages on the web connected in a hierarchical tree structure.

FIG. 10 illustrates how the search engine of FIG. 7 assigns a weight to each index word from a web page to be indexed and the subpages of that web page.

FIGS. 11A to 11D are diagrams illustrating the process of building an index database using the search engine of FIG. 7.

FIG. 12 is a configuration diagram of a system in which the methods of storing and using search results according to the present invention are executed.

도 12는 본 발명에 따른 검색결과의 저장방법 및 이용방법이 실행되는 시스템 구성도이다.12 is a system configuration diagram for executing a method of storing and using a search result according to the present invention.

<Reference Numerals for Main Parts of the Drawings>

1000: search engine    1100: distributed database

1200: user interface    1300: index word weighting unit

1400: index word expanding unit    1500: hyperlink/index word extracting unit

1600: hyperlink file    1700: conversion file

1800: index file    1900: query processing unit

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 shows a distributed computer system 100 that includes an index database.

The distributed system 100 includes a client computer 110 connected to a server computer (website) 120 via a network 130. The network 130 may use the Internet Protocol (IP), which allows clients to communicate with servers.

The client computer 110 may be a personal computer, a workstation, or a larger or smaller computer system. Each client typically includes one or more processors, memory, and input/output devices. The server has a configuration similar to that of the client.

In most cases, however, the server site 120 includes many computers connected by a local network. In practice, the network comprises hundreds to thousands of individual computer networks.

Although the client computer is shown separately from the server computer, a single computer can act as both client and server.

In operation of this distributed system, the user of a client computer wants to access an information record 122 stored, for example, by a World Wide Web server. The information record 122 may be in the form of a web page 200. The web page 200 may be a data record such as simple text content or more complex digitally encoded multimedia content such as software programs, graphics, audio signals, or video.

The client includes a web browser program 112, such as Navigator or Explorer, for locating web pages. The browser program 112 lets the user enter the address (location information) of a specific web page to be retrieved. The address (location information) of a web page is specified as a URL (uniform resource locator). Once a page has been obtained, the browser program also lets the user access another page by clicking a hyperlink in the obtained page. Such hyperlinks automate entering the URL of the next page.

A search engine 140 is provided to identify the pages of interest to the user among the millions of web pages available on the web.

The search engine 140 comprises multiple processors (P) 142, memory (M) 144, an index database 146, and a network interface 148, connected to one another by a high-speed communication bus 143, together with the programs that drive them. Of course, the present invention is not limited to a search engine composed of multiple processors connected by a high-speed communication bus.

The configurations of preferred embodiments of the search engine according to the present invention are described below with reference to the accompanying drawings.

Embodiment 1

FIG. 2 illustrates the technical concept of the search engine according to this embodiment.

This embodiment is characterized in that a specific web page is indexed mainly from the link text of at least one hyperlink pointing to that page, rather than mainly from the text document of the page itself.

Referring to FIG. 2 in more detail, the web page 10 to be indexed consists mainly of location information 11 and a text document 12. The web page 10 is also connected to a plurality of external web pages 20 by hyperlinks 21.

Each hyperlink 21 mainly comprises location information 21a (URL) and link text 21b. The location information 21a is defined as URL(0), the URL of the web page 10 to be reached through the link. The link text 21b contains a representative phrase that summarizes, in a few words, the text document 12 of the web page 10 indicated by the location information 21a.
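To make the two parts of a hyperlink concrete, the sketch below pulls (location information, link text) pairs out of an HTML page with Python's standard `html.parser` module. The class name `LinkExtractor` is hypothetical; relative `href` values are resolved against the page's own URL with `urljoin`, and whitespace inside the link text is collapsed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect (absolute URL, link text) pairs from <a href=...> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # URL of the <a> element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # includes text inside nested tags like <b>

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None
```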

Therefore, index words extracted from the link texts 21b (link texts 1, 2, ..., n) of these hyperlinks 21 are words that represent the information on the web page 10 indicated by the hyperlinks' location information 21a.

Conventional search engines index a web page 10 by extracting every possible index word from its text 12. However, although the text 12 of a web page 10 includes words that represent the page's information, it consists mostly of words that do not represent the page (hereinafter defined as stop words).

Therefore, indexing web pages in the conventional way makes the indexing workload enormous and lowers search accuracy.

In contrast, as shown in FIG. 2, this embodiment extracts index words from the link text 21b of one or more hyperlinks 21 pointing to the web page 10 to be indexed, which improves indexing accuracy with less indexing work.

The concept of FIG. 2 is described in detail below with reference to FIGS. 3 to 5.

FIG. 3 shows the internal configuration of the search engine of Embodiment 1.

<Building the Index Database>

As shown in FIG. 3, the search engine of this embodiment includes a URL input and storage unit (40, 41, 42, 43, 44), a communication interface 30, a hyperlink extraction module 50, a hyperlink analysis module 60, an index word extraction module 70, a hyperlink file 72, an index file forming unit (73, 75), a conversion file forming unit (76, 78), a query processing and ranking module 80, and a query module 90.

The URL input and storage unit consists of a user interface 40, a first URL storage file 42, a storage file reset and URL assignment module 43, and a second URL storage file 44.

The user interface 40 is the means by which the administrator of the search engine server enters initial URLs into the server to build the index database. The initial URL is preferably a root URL 41. A root URL is a URL that has no ancestor page above it in the hierarchical tree structure of the web. This tree structure also includes the graph of web pages that reside in the directory holding the page indicated by the root URL (or in its subdirectories) and that can be reached by following hyperlinks from that page.

The administrator of the search engine server enters at least one root URL 41 through the user interface 40.

The first URL storage file 42 is a storage means that holds the initial root URLs 41 entered through the user interface 40 and the root URLs 62 passed on by the hyperlink analysis module 60 described below.

The storage file reset and URL assignment module 43 is a program module that resets the second URL storage file 44, reads one root URL from the first URL storage file 42, and stores it in the second URL storage file 44.

The second URL storage file 44 is a storage means that holds the root URL assigned by the storage file reset and URL assignment module 43 and the subpage URLs 61 passed on by the hyperlink analysis module 60 described below. (The second URL storage file may reside in RAM.)

The communication interface is a means for connecting the search engine 140 with external websites 120 (web servers) over a network 130 such as the Internet.

In FIG. 3, this communication interface is defined as a web page acquisition module 30: a program module that reads one URL 45 from the second URL storage file 44 and retrieves the web page 200 indicated by that URL from the corresponding web server 120 over a network 130 such as the Internet.

The web page acquisition module 30 passes the web page 200 retrieved over the Internet 130 to the hyperlink extraction module 50 described below.

The hyperlink extraction module 50 is a program module that extracts every hyperlink on the web page 200 in sequence and passes each one in turn to the hyperlink analysis module 60 described below. Here, "extracting and passing on a hyperlink" means the following: the module determines whether the web page 200 contains any hyperlinks; if it contains none, it notifies the web page acquisition module 30; if it does, it extracts the hyperlinks one at a time without duplication and passes the "location information and hyperlink text" 51 of each to the hyperlink analysis module 60.
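
The extraction step can be sketched with Python's standard `html.parser` module. This is a minimal illustration rather than the module described here: the names `LinkExtractor` and `extract_links` are hypothetical, only `<a href=...>` elements are treated as hyperlinks, and the link text is taken to be the whitespace-normalized text inside each element.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs for every <a href=...> element, in order."""

    def __init__(self):
        super().__init__()
        self.links = []    # extracted (location information, link text) pairs
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html):
    """Return all hyperlinks on a page; an empty list means the page has none."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

An empty return value corresponds to the case in which the acquisition module is notified that the page has no hyperlinks.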

The hyperlink analysis module 60 is a program module that analyzes the location information of each hyperlink received from the hyperlink extraction module 50. That is, it analyzes the hyperlink's URL to determine whether the URL contains the root URL read from the first URL storage file, and also whether the URL is a default URL.

Here, "root URL" has the meaning defined above, and a subpage URL is defined as a URL indicating a lower-level page that is hierarchically subordinate to the root URL in the URL hierarchy.

A default URL is a URL that still reaches the web page it indicates when the file name is omitted from the URL structure.

In general, a URL has a hierarchical structure of the form "protocol://computer address/directory name/file name". The protocol is the communication protocol used to transfer the resource designated by the URL over the network, such as http (Hypertext Transfer Protocol) or ftp (File Transfer Protocol); the computer address is the address of the remote computer where the resource is stored; and the directory name is a directory on that computer.

For example, in "http://www.microsoft.com/Windows95/help.html", "www.microsoft.com" is the computer address, "Windows95" is a directory name on that computer, and "help.html" is the name of the file in that directory where the resource is located.
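
The decomposition in this example can be reproduced with `urllib.parse.urlsplit` and `posixpath.split` from the standard library. The function name `decompose` is hypothetical; nested directories are returned joined by "/".

```python
import posixpath
from urllib.parse import urlsplit


def decompose(url):
    """Split a URL into (protocol, computer address, directory name, file name)."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return parts.scheme, parts.netloc, directory.strip("/"), filename


print(decompose("http://www.microsoft.com/Windows95/help.html"))
# ('http', 'www.microsoft.com', 'Windows95', 'help.html')
```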

In principle, therefore, a web browser must be given the full address of a web page (including the file name) to access it. If the page's URL is a default URL, however, the browser can access the page even when the file name is omitted from the address.
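
Below is a minimal sketch of the default-URL test. It rests on an assumption the text does not spell out: a URL counts as a default URL when its path is empty or ends with "/", so no file name is present. The server decides which file (for example index.html) to serve for such a URL, and this rule does not recognize a directory URL written without a trailing slash. The name `is_default_url` is hypothetical.

```python
from urllib.parse import urlsplit


def is_default_url(url):
    """Assumed rule: the URL names no file, i.e. its path is empty or ends with '/'."""
    path = urlsplit(url).path
    return path == "" or path.endswith("/")
```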

Through the routine of steps S9 to S24 in FIGS. 3A and 3B described below, the hyperlink analysis module 60 stores root URLs 62 in the first URL storage file 42 and subpage URLs 61 in the second URL storage file 44.

The hyperlink analysis module 60 also determines whether the URL indicated by each extracted hyperlink is a default URL, selects only the hyperlinks that have a default URL, and passes their "location information and link text" 63 to the index word extraction module 70 described below.

The index word extraction module 70 is a program module that extracts index words from the link text of a hyperlink with a default URL and uses them to form a "location information: index word list" 71 (URL: Index = a, b, c).

The "location information: index word list" entries 71 (URL: Index = a, b, c) formed by the index word extraction module 70 are accumulated in the hyperlink storage file 72. "Accumulated" means that several index word lists can be formed and stored for the same location information. For example, a particular default URL(0) may correspond to several index word lists such as (a, b, c), (d, e, f), (a, e, g), and (a, e, g), so the hyperlink storage file can hold several entries: "URL(0): (a, b, c)", "URL(0): (d, e, f)", "URL(0): (a, e, g)", and "URL(0): (a, e, g)".
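
The accumulation step can be modeled as a multimap from each URL to a list of index-word tuples. The tokenizer and stopword list below are hypothetical stand-ins, because the text does not specify how index words are picked out of link text. Duplicate lists are kept on purpose, as in the (a, e, g) example, since the later frequency weighting depends on them.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "to", "click", "here"}  # hypothetical list


def index_words(link_text):
    """Lowercase word tokens of the link text, minus the hypothetical stopwords."""
    tokens = re.findall(r"\w+", link_text.lower())
    return tuple(t for t in tokens if t not in STOPWORDS)


def accumulate(hyperlink_file, url, link_text):
    """Append one "location information: index word list" entry for url."""
    words = index_words(link_text)
    if words:
        hyperlink_file[url].append(words)


hyperlink_file = defaultdict(list)  # URL -> list of index-word tuples
accumulate(hyperlink_file, "http://a.com/", "Python Tutorial")
accumulate(hyperlink_file, "http://a.com/", "the Python home")
```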

Once hyperlinks and index words have been extracted from every root URL in the first URL storage file 42 and from their subpage URLs, the hyperlink storage file 72 holds millions or more "location information: index word list" entries.

The index file forming module 73 is a program module that forms the index file 75 described below by assigning a weight to each index word across all "location information: index word list" entries that share the same location information. That is, the index file forming module 73 collects from the hyperlink storage file 72 every entry with the same location information, counts how often each index word occurs across the collected lists, and assigns a weight according to that frequency. For example, if index word a occurs 100 times in the lists for default URL(0), the weight of a is 100. Repeating this for every index word yields a single weighted index word list per URL, [URL(0): a(w1), b(w2), c(w3), ...]. The resulting "location information: index word (weight) list" 74 (URL(0): a(w1), b(w2), c(w3), ...) is stored in the index file 75 described below.
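
The frequency weighting can be sketched with `collections.Counter`. The input is assumed to be the URL-to-lists mapping held in the hyperlink storage file, and `build_index_file` is a hypothetical name. The example reuses the URL(0) lists given above.

```python
from collections import Counter


def build_index_file(hyperlink_file):
    """Map each URL to {index word: weight}, where weight = occurrences across its lists."""
    index_file = {}
    for url, word_lists in hyperlink_file.items():
        counts = Counter()
        for words in word_lists:
            counts.update(words)
        index_file[url] = dict(counts)
    return index_file


example = {"URL0": [("a", "b", "c"), ("d", "e", "f"), ("a", "e", "g"), ("a", "e", "g")]}
print(build_index_file(example)["URL0"]["a"])  # 3
```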

Once every "location information: index word (weight) list" has been stored in the index file 75, the conversion file forming module 76 described below builds the conversion file.

The conversion file forming module 76 is a program module that converts the "location information: index word (weight) list" entries stored in the index file into "index word: location information (weight) list" entries 77 [a: URL(w1), URL(w2), URL(w3), ...].

Storing these "index word: location information (weight) list" entries 77 [a: URL(w1), URL(w2), URL(w3), ...] in the conversion file 78 creates the index database.
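
The conversion from the index file to the conversion file is an inverted-index construction. Below is a minimal sketch, assuming the index file is a dict that maps each URL to a dict of index-word weights. `build_conversion_file` is a hypothetical name.

```python
from collections import defaultdict


def build_conversion_file(index_file):
    """Invert {URL: {word: weight}} into {word: [(URL, weight), ...]}."""
    conversion = defaultdict(list)
    for url, weights in index_file.items():
        for word, weight in weights.items():
            conversion[word].append((url, weight))
    return dict(conversion)


print(build_conversion_file({"URL1": {"a": 3, "b": 1}, "URL2": {"a": 5}})["a"])
# [('URL1', 3), ('URL2', 5)]
```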

The process of building the index database with the configuration described above is explained in detail below with reference to the flowchart of FIG. 3.

When the search engine server administrator enters at least one web page root URL 41 through the user interface 40 of the server computer 140, the root URL 41 is stored in the first URL storage file 42 (step S1).

The storage file reset and URL assignment module 43 reads one root URL from the first URL storage file 42 (step S2). If no root URL can be read, the process proceeds to step S30; otherwise it proceeds to step S4 (step S3). Failure to read a root URL from the first URL storage file means that no root URLs remain to be processed.

If a root URL was read in step S3, the storage file reset and URL assignment module 43 resets the second URL storage file 44 and stores the root URL just read in it (step S4).

The web page acquisition module 30 reads one URL 45 from the second URL storage file 44 (step S5). If no URL can be read, the process returns to step S2; otherwise it proceeds to step S8 (steps S6 and S7). Failure to read a URL from the second URL storage file means that hyperlink extraction and index word extraction have been completed for every URL in that file, so a new root URL must be assigned from the first URL storage file and the existing URLs reset.

If a URL was read in step S6, the web page acquisition module 30 retrieves the web page 200 indicated by that URL from the web server 120 on the Internet 130 (step S8).

Once the web page has been retrieved, it is passed to the hyperlink extraction module 50, which extracts its hyperlinks one after another (step S9).

If the web page has no hyperlink left to extract, the process returns to step S5; otherwise it proceeds to step S11 (step S10). "No hyperlink left to extract" covers both a page that has no hyperlinks at all and a page whose hyperlinks have all been extracted already.

If a hyperlink could be extracted in step S10, the hyperlink extraction module 50 obtains the link URL and link text from the extracted hyperlink (step S11).

The root URL is extracted from the link URL obtained in step S11 and compared with the root URL read in step S2 (step S14).

If the root URLs match (step S15), the module determines whether the link URL obtained in step S13 already exists in the second storage file (step S16). If it does, the process proceeds to step S24 to determine whether the link URL is a default URL; if not, the process proceeds to step S20, stores the link URL in the second storage file, and then proceeds to step S24.

If the root URLs do not match in step S15, the module determines whether the root URL obtained in step S13 already exists in the first storage file (step S17).

If it does, the process proceeds to step S24 to determine whether the obtained link URL is a default URL; if not, the process proceeds to step S23, stores the root URL obtained in step S13 in the first storage file, and then proceeds to step S24.

Steps S13 to S23 form the routine that extends the URL information in the first and second URL storage files by extracting other root URLs and subpage link URLs, starting from a single root URL. To avoid storing duplicate URLs, the routine checks whether each extracted root URL or subpage link URL already exists in the first or second storage file.

In step S24, the hyperlink analysis module determines whether the link URL obtained in step S11 is a default URL. If it is not, the process returns to step S9 (step S26) to extract the next link URL from the web page; if it is, the process proceeds to step S27.

Only default URLs are selected from the extracted link URLs for the following reasons. The web page indicated by a default URL tends to represent the pages beneath it and therefore carries more important information than those lower-level pages. Selecting only default URLs also prevents near-identical search results from appearing repeatedly. Finally, the user can still move freely from the page indicated by the default URL to its lower-level pages.

In step S27, the index word extraction module extracts every possible index word from the link text of the default URL to generate a "location information: index word list" [URL(0): (a, b, c)]. It accumulates this list in the hyperlink storage file (step S28) and returns to step S9.

As the routine of steps S2 to S29 repeats, step S2 eventually finds no further root URL to read from the first URL storage file. The process then proceeds to step S30 and runs the index file and conversion file generation routine.
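
The crawl of steps S1 to S29 can be sketched as two nested work queues, one for each storage file. The sketch makes several assumptions that are not in the text:

- `fetch_links` is a hypothetical callback that returns absolute link URLs together with their link texts.
- A link's root URL is approximated by its scheme and host.
- A default URL is one whose path is empty or ends with "/".
- Index words are simply the lowercased words of the link text.

Sets are used alongside the queues so that the duplicate checks of steps S16 and S17 are cheap.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def root_of(url):
    # Assumption: a URL's root is approximated by scheme://host/
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def is_default_url(url):
    # Assumption: a default URL names no file (empty path or trailing '/')
    path = urlsplit(url).path
    return path == "" or path.endswith("/")


def crawl(seed_roots, fetch_links):
    """Return the hyperlink storage file: default URL -> list of index-word tuples."""
    first_file = deque(root_of(u) for u in seed_roots)   # S1: first URL storage file
    known_roots = set(first_file)
    hyperlink_file = defaultdict(list)
    while first_file:                                     # S2/S3: next root URL
        current_root = first_file.popleft()
        second_file = deque([current_root])               # S4: reset, store root URL
        known_pages = {current_root}
        while second_file:                                # S5-S7: next URL
            url = second_file.popleft()
            for link_url, text in fetch_links(url):       # S8-S11
                root = root_of(link_url)                  # S14
                if root == current_root:                  # S15: same site
                    if link_url not in known_pages:       # S16
                        known_pages.add(link_url)
                        second_file.append(link_url)      # S20
                elif root not in known_roots:             # S17: new site
                    known_roots.add(root)
                    first_file.append(root)               # S23
                if is_default_url(link_url):              # S24
                    words = tuple(text.lower().split())   # S27 (simplified)
                    if words:
                        hyperlink_file[link_url].append(words)  # S28
    return dict(hyperlink_file)
```

In use, `fetch_links` would combine page retrieval (step S8) with hyperlink extraction (steps S9 to S11). For testing, it can also be backed by an in-memory dict standing in for the web.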

In step S30, the index file forming module reads from the hyperlink storage file all "location information: index word list" entries that share the same URL. It counts how often each index word occurs in them and weights each index word by that frequency, producing a "location information: index word (weight) list" [URL(0): a(w1), b(w2), c(w3), ...].

The resulting "location information: index word (weight) list" entries [URL(0): a(w1), b(w2), c(w3), ...] are stored in the index file (step S31).

In step S32, the conversion file generation module reads the "location information: index word (weight) list" entries [URL(0): a(w1), b(w2), c(w3), ...] from the index file and converts them into "index word: location information (weight) list" entries [a: URL1(w1), URL2(w2), URL3(w3), ...]. Storing these in the conversion file forms the index database (step S23).

<Searching the Index Database>

The index database (that is, the conversion file) is searched much as in conventional methods, as described below with reference to FIGS. 3 and 5.

A user who wants to reach a particular web page enters a query into the search engine 140 of the present invention (step S41).

The query may be a single keyword, several keywords connected by operators, or natural language.

When a query 91 is entered, the query module 90 interprets it and extracts the relevant index words to build a query request 92, which is passed to the query processing and ranking module 80.
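
Below is a hypothetical query parser. It assumes a query is a whitespace-separated list of keywords with optional, case-insensitive AND/OR operators. Natural-language queries, which the text also allows, are not handled. As a simplification, a query that mixes AND and OR falls back to AND.

```python
OPERATORS = {"AND", "OR"}  # hypothetical operator vocabulary


def parse_query(query):
    """Split a keyword query into (index words, operator); AND is the default."""
    tokens = query.split()
    operators = {t.upper() for t in tokens if t.upper() in OPERATORS}
    words = [t.lower() for t in tokens if t.upper() not in OPERATORS]
    operator = "OR" if operators == {"OR"} else "AND"
    return words, operator


print(parse_query("python OR java"))  # (['python', 'java'], 'OR')
```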

The query processing and ranking module 80 searches the conversion file (index database) 78 for the "index word: location information (weight) list" entries whose index words match those in the query request (step S42).

The retrieved "index word: location information (weight) list" entries [index word (a): URL1(w1), URL2(w2), URL3(w3), ...] are passed to the query processing and ranking module 80. The module processes the retrieved location information according to the index word coefficients and operator definitions in the query, sorts the final location information in descending order of weight, and passes the result to the query module 90 (step S43).

For example, the "location information list" passed to the query module is as shown in (1) below.

    URL2(w2), URL1(w1), URL3(w3), ...        (1)

(where w2 > w1 > w3 > ...)
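
Steps S42 and S43 can be sketched as follows. The text does not define how weights are combined across several index words, so the sketch makes these assumptions:

- AND keeps the URLs found under every query word.
- OR keeps the URLs found under any of them.
- A URL's score is the sum of its weights.
- Ties are broken by URL so that the output is deterministic.

The conversion file is assumed to map each index word to a list of (URL, weight) pairs.

```python
def rank(conversion_file, words, operator="AND"):
    """Return [(URL, score), ...] sorted by descending score, as in list (1)."""
    postings = [dict(conversion_file.get(w, [])) for w in words]
    if not postings:
        return []
    if operator == "AND":
        urls = set(postings[0])
        for p in postings[1:]:
            urls &= set(p)
    else:  # "OR"
        urls = set().union(*postings)
    scored = [(url, sum(p.get(url, 0) for p in postings)) for url in urls]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


print(rank({"a": [("URL1", 5), ("URL2", 9), ("URL3", 2)]}, ["a"]))
# [('URL2', 9), ('URL1', 5), ('URL3', 2)]
```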

Using the location information list sorted in descending order, the query module has the search results displayed on the web browser screen of the user's computer, as described below (steps S44 and S45).

<Displaying the Search Results>

A method of viewing search results according to the present invention is described below with reference to FIGS. 5 and 6.

When the location information list 81, sorted in descending order, reaches the query module 90, the query module 90 divides the web browser screen into several frames so that a different document or file can be displayed in each frame.

Specifically, the query module 90 sends the web browser three pieces of frame information:

- first frame information, which lets the browser fetch the web page for the top-ranked (highest-weight) location information (URL) from its web server on the Internet and display it immediately, without an intermediate click ("non-stop");
- second frame information, for displaying the location information list 81 on the browser screen of the user's computer;
- third frame information, for displaying a search box, an advertisement area, and the like on the browser screen.

When the query module 90 of the search engine sends the user's web browser a document file containing this frame information, the browser interprets the file and lays out its screen according to the frame information.

FIG. 6 shows a browser screen laid out by a web browser according to the frame information received from the search engine.

The web browser screen 150 consists mainly of three frames:

- frame 154 displays the content specified by the first frame information;
- frame 153, to the right of frame 154, displays the content specified by the second frame information;
- frame 155, above frame 154, displays the content specified by the third frame information.
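
The frame layout of FIG. 6 can be expressed as an HTML frameset generated by the query module, together with a result-list page for frame 153. This is a sketch under assumptions: the frame names, the 80-pixel top row, and the 75%/25% column split are illustrative. The result-list links use `target` so that clicking one loads the chosen page into the large frame.

```python
from html import escape


def results_frameset(top_url, list_url, search_url):
    """Frames: search/ad bar on top (155), top-ranked page (154), result list (153)."""
    return (
        '<html>\n<frameset rows="80,*">\n'
        f'  <frame name="search" src="{escape(search_url)}">\n'
        '  <frameset cols="75%,25%">\n'
        f'    <frame name="page" src="{escape(top_url)}">\n'
        f'    <frame name="results" src="{escape(list_url)}">\n'
        '  </frameset>\n'
        '</frameset>\n</html>\n'
    )


def results_list(ranked):
    """HTML for frame 153; each link opens in the frame named 'page'."""
    items = "".join(
        f'<li><a href="{escape(url)}" target="page">{escape(url)}</a> ({weight})</li>'
        for url, weight in ranked
    )
    return f"<ul>{items}</ul>"
```

The `<frameset>` and `<frame>` elements are obsolete in HTML5, so a present-day implementation would more likely use `<iframe>` elements or a single-page layout to the same effect.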

The browser screen of the present invention is, of course, not limited to three frames; it may consist of more or fewer.

Nor is the relative position of frame 154 and the other frames 153 and 155 limited to the arrangement of FIG. 6; the frames may be rearranged. For example, frame 155 may be placed below frame 154, or frame 153 may be placed to the left of frame 154.

Using the top-ranked location information received from the search engine, the web browser fetches the corresponding web page from its web server on the Internet and displays it directly in frame 154 of the browser screen (step S45 of FIG. 5). The user therefore receives the web page most closely related to the desired information without any intermediate step and is spared the several unnecessary clicks required in the prior art.

The web browser also receives from the search engine a document containing the location information list sorted in descending order and displays it in frame 153 of the browser screen (step S44 of FIG. 5).

Through the location information list in frame 153, the user can therefore obtain information on other related web pages in addition to the information on the web page shown in frame 154. Because the list is ordered from top to bottom by decreasing weight, the user can see at a glance how relevant each page is to the desired information, which reduces unnecessary clicks.

Frame 155 of the web browser contains a search box 155a, in which the user can enter a search term, and an advertisement area 155b for displaying banner advertisements.

Embodiment 2

The search engine of Embodiment 1 indexes a specific web page (a page whose address is a default URL) as follows. It extracts index words from the text of the hyperlinks on many external web pages that point to the page. It then weights each index word by how often that word occurs across those hyperlink texts.

Embodiment 1 is an advance over conventional search engines, which extract every index word from the text of the page being indexed.

In general, compare a web page that has many incoming external hyperlinks with one that has relatively few. The occurrence frequency (that is, the weight) of a given index word is likely to be higher for the page with more hyperlinks.

A large number of incoming external hyperlinks tends to indicate that the page has been in service on the Internet for a long time. It does not, however, guarantee that the page is a document representative of particular information (that is, information that a particular index word can represent).

A page with few incoming external hyperlinks, by contrast, is likely to have entered service recently and may well contain up-to-date, highly topical information.

Therefore, extracting and weighting index words from hyperlinks as in Embodiment 1 cannot always guarantee accurate retrieval of the web page being indexed. The phrase in a hyperlink's link text is likely to represent the page its URL points to. However, the mere fact that many external hyperlinks point to a page does not mean that the page should take priority over other pages for particular information. It is also difficult to extract every representative index word for a page from link text alone.

Embodiment 2 was devised to remedy these drawbacks of Embodiment 1.

That is, Embodiment 2 retains the advantages of Embodiment 1 as far as possible while compensating for the drawbacks that Embodiment 1 may have.

Embodiment 2 consists broadly of three indexing stages:

1. Index words that can represent a specific web page are selected from the hyperlinks pointing to it.
2. Key index words are selected from the page's own text and added to the first-stage index words, expanding the index word set.
3. The index words are weighted by measuring how often the expanded index words occur in the page and in the subpages located below it in its tree structure.
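
The three stages can be sketched in Python as three small functions. This is a hedged illustration only: the stage function names are hypothetical, index words are assumed to be whitespace tokens, and the weight is assumed to be a raw occurrence count over the page and its subpages.

```python
from collections import Counter


def stage1_link_words(link_texts: list[str]) -> set[str]:
    """Stage 1: candidate index words taken from inbound link texts."""
    return {word for text in link_texts for word in text.lower().split()}


def stage2_expand(index_words: set[str], key_words: list[str]) -> set[str]:
    """Stage 2: add key index words selected from the page's own text."""
    return index_words | {word.lower() for word in key_words}


def stage3_weight(index_words: set[str], page_texts: list[str]) -> dict[str, int]:
    """Stage 3 (assumed formula): weight = occurrences in the page plus its subpages."""
    counts = Counter(word for text in page_texts for word in text.lower().split())
    return {word: counts[word] for word in sorted(index_words)}


words = stage1_link_words(["Nawa search"])      # candidates from link text
words = stage2_expand(words, ["Portal"])        # add a key index word
print(stage3_weight(words, ["nawa search portal", "search tips"]))
# {'nawa': 1, 'portal': 1, 'search': 2}
```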

Accordingly, Embodiment 2 selects index words from the hyperlink texts pointing to the page being indexed and from key index words in the page's own text. This remedies the conventional drawback of stopwords being selected as index words. Moreover, each index word's weight is based on the text of the page and its subpages, which improves the reliability and fairness of the weighting.

The configuration of another embodiment of the search engine of the present invention is described in detail below with reference to FIGS. 7 to 11.

The technical concept shown in FIG. 2 applies to this embodiment just as it does to Embodiment 1. Its description is therefore omitted here; refer to the description of Embodiment 1.

FIG. 7 is a diagram illustrating the configuration of the search engine according to this embodiment.

The search engine 1000 of this embodiment is connected to a user interface 1200 and a distributed database 1100 through a network such as the Internet.

The search engine 1000 of this embodiment comprises the following components: a hyperlink/index word extraction unit 1500, an index word expansion unit 1400, an index word weighting unit 1300, a hyperlink file 1600, a conversion file 1700, an index file 1800, and a query processing unit 1900.

The hyperlink/index word extraction unit 1500 extracts, from the distributed database 1100 on the Internet, the hyperlinks pointing to a specific web page (a web page whose address is a default URL). It selects index words from the link texts of those hyperlinks to form an index word set for that default URL and stores the set in the hyperlink file 1600.

The hyperlink file 1600 is storage means for the "default URL : index word set" entries formed by the hyperlink/index word extraction unit.

The index word expansion unit 1400 reads one "default URL : index word set" entry from the hyperlink file 1600. It then retrieves the web page whose address is that default URL from the distributed database 1100, selects key index words from that page, and adds them to the index word set, thereby expanding it.

The index word weighting unit 1300 receives a "default URL : expanded index word set" entry from the index word expansion unit 1400. From the web page whose address is the default URL, it extracts the location information (URLs) of the subpages connected to that page in the tree structure down to depth N. It then retrieves those subpages from the distributed database 1100 using that location information.

The index word weighting unit 1300 also extracts the index words belonging to the expanded index word set from the web page at the default URL and from its subpages. It checks the occurrence frequency of each index word and weights each one according to that frequency, producing a "default URL : index word (weight) set" that is stored in the index file 1800.

Each "default URL : index word (weight) list" in the index file 1800 is converted into an "index word : location information (weight) set" and stored in the conversion file 1700.

The query processing unit 1900 receives a user query 1220 through the user interface 1200 and analyzes it to extract index words. It reads the location information (weight) sets corresponding to those index words from the conversion file 1700 and processes them using an ordinary search-result post-processing method. Finally, it sorts the location information in descending order of weight.

The query processing unit 1900 then provides the sorted location information list, together with the highest-ranked location information, to the user's computer through the user interface 1200 as the search result.
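
The following is a minimal sketch of the query-side lookup. It assumes the conversion file is held in memory as a dict that maps each index word to a list of (URL, weight) pairs. Summing weights across query words is an assumption that stands in for the unspecified "ordinary search-result post-processing method", and `answer_query` is a hypothetical name.

```python
def answer_query(query: str,
                 conversion_file: dict[str, list[tuple[str, float]]]) -> list[tuple[str, float]]:
    """Rank URLs for a query against an "index word : URL (weight)" table.

    Assumptions: query index words are whitespace tokens, and a URL's score
    is the sum of its weights over all query words.
    """
    scores: dict[str, float] = {}
    for word in query.lower().split():
        for url, weight in conversion_file.get(word, []):
            scores[url] = scores.get(url, 0.0) + weight
    # Descending weight; ties broken by URL so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


table = {"search": [("http://a.example/", 3.0), ("http://b.example/", 1.0)],
         "engine": [("http://b.example/", 4.0)]}
ranked = answer_query("search engine", table)
print(ranked[0])   # highest-ranked location: ('http://b.example/', 5.0)
```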

The detailed configuration of the search engine of the present invention is described below with reference to FIG. 8.

<Building the Index Database>

As shown in FIG. 7, the search engine 1000 of this embodiment comprises the following components: a hyperlink/index word extraction unit 1500, an index word expansion unit 1400, an index word weighting unit 1300, a hyperlink file 1600, a conversion file 1700, an index file 1800, and a query processing unit 1900.

The hyperlink/index word extraction unit 1500 comprises the following modules and files:

- a user interface 40
- a first URL storage file 1520
- a storage file reset and URL allocation module 1530
- a second URL storage file 1540
- a first web page acquisition module 1550
- a hyperlink extraction module 1560
- a hyperlink analysis module 1570
- an index word extraction module 1580
- a first index word expansion module 1590

The index word expansion unit 1400 comprises a link URL acquisition module 1410, a second web page acquisition module 1420, a key index word extraction module 1430, and a second index word expansion module 1450. The index word weighting unit 1300 comprises a subpage URL extraction module 1440, a subpage acquisition module 1460, and a weighting module 1310.

The query processing unit 1900 comprises a query module 1920 and a query processing and ranking module 1910.

The user interface 40 is the means by which the administrator of the search engine server enters initial URLs into the server in order to build the index database. The initial URL is preferably a root URL 1510. A root URL is a URL that has no ancestor page above it in the hierarchical tree structure of the web. This tree structure covers the web pages that reside in the directory containing the page indicated by the root URL, or in its subdirectories, and that can be reached by following hyperlinks from that page. It therefore also includes the graph structure formed among those pages.

The administrator of the search engine server enters at least one root URL 1510 through the user interface 40.

The first URL storage file 1520 is storage means for two kinds of root URL: the initial root URLs 1510 entered through the user interface 40, and the root URLs 1571 passed from the hyperlink analysis module 1570 described below.

The storage file reset and URL allocation module 1530 is a program module that resets the second URL storage file 1540, reads one root URL from the first URL storage file 1520, and stores that root URL in the second URL storage file 1540.

The second URL storage file 1540 is storage means for the root URL allocated by the storage file reset and URL allocation module 1530. It also stores the subpage URLs 1572 passed from the hyperlink analysis module 1570 described below.

The first web page acquisition module 1550 is the means for interfacing the search engine 140 with an external website 120 (web server) through a network 130 such as the Internet.

That is, the first web page acquisition module 1550 is a program module that reads one URL 1541 from the second URL storage file 1540. It then obtains the web page 1542 indicated by that URL from the corresponding web server 120 via the network 130.

The first web page acquisition module 1550 passes the web page 200 obtained through the Internet 130 to the hyperlink extraction module 1560 described below.

The hyperlink extraction module 1560 is a program module that sequentially extracts every hyperlink on the obtained web page 1542 and passes them one by one to the hyperlink analysis module 1570 described below. "Extracting and passing hyperlinks" involves the following. The module first determines whether the web page 200 contains any hyperlinks. If it contains none, the module notifies the web page acquisition module 1550. If it does contain hyperlinks, the module extracts them one at a time without duplication and passes the "location information and hyperlink text" 1561 of each to the hyperlink analysis module 1570.
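
The extraction step can be approximated with Python's standard `html.parser`. This sketch makes three assumptions. A hyperlink is an `<a href=...>` element. Its location information is the `href` value resolved against the page URL with `urljoin`. Its link text is the element's text with whitespace collapsed. `LinkExtractor` and `extract_links` are illustrative names and do not come from the patent.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects (absolute URL, link text) pairs from <a href="..."> elements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Location information (assumption): href resolved to an absolute URL
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def extract_links(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return each distinct (URL, link text) pair once, in page order.
    An empty list corresponds to the "no hyperlink" notification."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))
```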

The hyperlink analysis module 1570 is a program module that analyzes the location information of the hyperlinks passed from the hyperlink extraction module 1560. It analyzes the URL of each hyperlink to make two determinations: whether the URL contains the root URL portion read from the first URL storage file, and whether the URL is a default URL.
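
The two tests performed by the hyperlink analysis module might look as follows. The definitions used here are assumptions, because root URL and default URL are defined in Embodiment 1 rather than in this section. Here the root URL is taken to be the scheme plus host. A default URL is taken to be a URL with no query string whose path names a directory (empty or ending in `/`).

```python
from typing import Tuple
from urllib.parse import urlsplit


def root_url(url: str) -> str:
    """Assumed root URL: scheme plus host, e.g. 'http://www.x.example/'."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def is_default_url(url: str) -> bool:
    """Assumed default URL: no query string and a path that names a
    directory (empty or ending in '/'), so the server serves its default page."""
    parts = urlsplit(url)
    return parts.query == "" and (parts.path == "" or parts.path.endswith("/"))


def analyze_link(link_url: str, current_root: str) -> Tuple[bool, bool]:
    """Return (same_root, is_default) for a link found while crawling current_root."""
    return root_url(link_url) == current_root, is_default_url(link_url)
```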

The terms root URL and default URL are used here with the same meanings as in Embodiment 1.

Through the routine of steps S109 to S124 of FIGS. 11A and 11B described below, the hyperlink analysis module 1570 stores root URLs 1571 in the first URL storage file 1520 and subpage URLs 1572 in the second URL storage file 1540.

The hyperlink analysis module 1570 also determines whether the URL indicated by each extracted hyperlink is a default URL. It selects only the hyperlinks whose URL is a default URL and passes their "location information and link text" 1573 to the index word extraction module 1580 described below.

The index word extraction module 1580 is a program module that selects index words from the link text of a hyperlink whose URL is a default URL. It uses the selected index words to form a "location information : index word set" 1581 [URL : Index = {a, b, c}].

The index word set formed by the index word extraction module 1580 is then expanded in the first index word expansion module 1590.

That is, the first index word expansion module 1590 receives the "location information : index word set" 1581 from the index word extraction module 1580. It then determines whether the hyperlink file 1600 already contains a "location information : index word set" with the same location information.

If no such entry exists, the received "location information : index word set" 1581 is stored in the hyperlink file 1600 as is. If such an entry exists, the index words stored for that location information are read.

The index words read from the hyperlink file 1600 are added to the index word set passed from the index word extraction module. The resulting "location information : expanded index word set" 1592 is stored in the hyperlink file 1600. In this way, each "location information : index word set" entry is continually updated (rewritten).
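
The bookkeeping in the first index word expansion module amounts to a set union keyed by URL. In this sketch, an in-memory dict stands in for the hyperlink file 1600 (an assumption), and `merge_index_words` is a hypothetical name. The same operation also fits the second index word expansion module 1450 when it adds key index words.

```python
def merge_index_words(hyperlink_file: dict[str, set[str]], url: str,
                      new_words: set[str]) -> set[str]:
    """Store or extend the "location information : index word set" for url.

    If url has no entry, new_words is stored as is; otherwise the stored
    words are read back, unioned with new_words, and the entry is rewritten.
    """
    merged = hyperlink_file.get(url, set()) | set(new_words)
    hyperlink_file[url] = merged
    return merged


hyperlink_file: dict[str, set[str]] = {}   # stand-in for hyperlink file 1600
merge_index_words(hyperlink_file, "http://x.example/shop/", {"hanji", "shop"})
merge_index_words(hyperlink_file, "http://x.example/shop/", {"paper"})
print(sorted(hyperlink_file["http://x.example/shop/"]))  # ['hanji', 'paper', 'shop']
```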

The link URL acquisition module 1410 is a program module that obtains one URL (i.e., a default URL) from the hyperlink file 1600 and passes this URL(0) 1411 to the second web page acquisition module 1420.

The second web page acquisition module 1420 is a program module that receives one URL [URL(0)] 1411 from the link URL acquisition module 1410. It obtains the web page 1422 indicated by that URL from the website 120 on the Internet 130.

The key index word extraction module 1430 is a program module that selects key index words from the text of the web page obtained by the second web page acquisition module 1420. A "key index word" is defined as any of the following:

- a word selected from a phrase marked with a title tag or with a tag used for emphasis;
- a word selected from the phrase that follows an indicative clue phrase;
- a word not registered in the electronic vocabulary dictionary.

The "title tag" is the tag that gives the title of an HTML document; its content appears in the web browser's title bar. The title tag holds a title indicating what the web page presents. A phrase marked with the title tag is therefore likely to contain words (core words) that compactly express the characteristics of the page being indexed.

A "tag used for emphasis" is, for example, a tag with an attribute that makes text relatively larger, or a tag used to underline text for emphasis. A phrase marked with such a tag is therefore likely to contain words (core words) that can represent the web page.

A clue phrase is a phrase that introduces a person's name, a designation, a trade name, and so on, for example "Name: ..." or "Company name: ...". The words that follow such a clue phrase (key words) are directly related to the content of the website.

Words not registered in the electronic vocabulary dictionary occur infrequently, but they are relatively important.

In this way, words that represent the page being indexed, or that are directly related to its content, are classified as key index words and selected from the page itself. This compensates for the narrowness of index word selection that can arise when index words are taken from hyperlinks alone.
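
The three kinds of key index word could be gathered as shown below. Every concrete detail here is an assumption made for illustration: the set of emphasis tags, the clue phrases, taking only the single word after a clue phrase, and passing the electronic vocabulary dictionary as a plain Python set.

```python
import re
from html.parser import HTMLParser

EMPHASIS_TAGS = {"title", "b", "strong", "u", "big", "h1", "h2", "h3"}  # assumed list
CLUE_PHRASES = ("name:", "company name:")                               # hypothetical clues


class _TextCollector(HTMLParser):
    """Keeps all text, plus the text found inside title or emphasis tags."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.emphasized: list[str] = []
        self.plain: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:   # pop back to the matching open tag
                pass

    def handle_data(self, data):
        self.plain.append(data)
        if any(t in EMPHASIS_TAGS for t in self.stack):
            self.emphasized.append(data)


def key_index_words(html: str, dictionary: set[str]) -> set[str]:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    keys: set[str] = set()
    for chunk in collector.emphasized:                 # (1) title / emphasis tags
        keys.update(re.findall(r"\w+", chunk.lower()))
    text = " ".join(collector.plain).lower()
    for clue in CLUE_PHRASES:                          # (2) word after a clue phrase
        keys.update(re.findall(re.escape(clue) + r"\s*(\w+)", text))
    keys.update(w for w in re.findall(r"\w+", text)   # (3) unregistered words
                if w not in dictionary)
    return keys
```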

The key index words selected by the key index word extraction module 1430 are passed to the second index word expansion module 1450.

The second index word expansion module 1450 is a program module that reads from the hyperlink file 1600 the index word set corresponding to URL(0), the page from which the key index words were selected. It expands that set by adding the selected key index words.

The subpage URL extraction module 1440 is a program module that works from the web page passed by the second web page acquisition module 1420. It extracts the URLs of the subpages connected to that page in the tree structure down to depth N.

FIG. 9 shows a web page to be indexed and its subpages connected in a tree structure.

As shown in FIG. 9, the first layer beneath the web page to be indexed [URL(0)] contains the first-layer subpages indicated by URL(1,1) and URL(1,2). Continuing in this way, the web page indicated by URL(0) has subpages in a first, second, ..., N-th layer, connected in a hierarchical tree structure.

The subpage URL extraction module 1440 extracts URL(0) of the page to be indexed and the URLs of its subpages (down to depth N), and passes them to the subpage acquisition module 1460.
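
Collecting URL(0) and its subpages down to depth N is a depth-limited breadth-first traversal. This sketch assumes that links are supplied by a `get_links(url)` callable, so that it runs offline. It also assumes that a linked URL counts as a subpage only when it starts with URL(0)'s directory prefix. Both are simplifications of the tree structure described in the patent.

```python
from collections import deque
from typing import Callable, Iterable


def subpage_urls(url0: str, get_links: Callable[[str], Iterable[str]],
                 depth_n: int) -> dict[str, int]:
    """Breadth-first walk from URL(0); returns {url: layer} for layers 0..depth_n.

    Assumption: a link counts as a subpage when its URL starts with the
    directory part of URL(0); other links are ignored.
    """
    prefix = url0[: url0.rfind("/") + 1]
    layer = {url0: 0}
    queue = deque([url0])
    while queue:
        url = queue.popleft()
        if layer[url] == depth_n:
            continue                      # do not follow links below depth N
        for link in get_links(url):
            if link.startswith(prefix) and link not in layer:
                layer[link] = layer[url] + 1
                queue.append(link)
    return layer
```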

The subpage acquisition module 1460 is a program module that receives two kinds of URL from the subpage URL extraction module 1440: the URL of the page to be indexed [URL(0)], and the URLs [URL(1), URL(2), URL(3), ...] of the subpages of FIG. 9 connected to it in the tree structure down to depth N. It obtains all web pages 1462 corresponding to those URLs from the website 120 on the Internet 130.

The web pages 1463 obtained by the subpage acquisition module 1460 are passed to the weighting module 1310.

FIG. 10 shows how the weighting module 1310 extracts index words from the obtained web pages and assigns them weights.

First, the weighting module 1310 receives the expanded index word set 1451 from the second index word expansion module 1450. It extracts from that set the index word list 1312 shown in FIG. 10.

Using the extracted index word list 1312, the weighting module 1310 searches the web pages 1463 passed from the subpage acquisition module 1460. These are the page to be indexed and its subpages connected to it in the tree structure down to depth N. The module checks the occurrence frequency of each index word and computes a weight for each index word from those frequencies. It then generates a "location information : index word (weight) set" [URL(0) : {a(w1), b(w2), c(w3), ...}] 1311 and stores it in the index file 1800.

The weighting module 1310 may also give certain index words a higher weight than index words found in subpages: namely, index words selected from hyperlink link text, and index words found in the tree-root web page of the page tree structure.
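
One way to combine frequency counting with the optional higher weighting is sketched below. Several choices are illustrative assumptions, since the patent does not give a formula: the bonus factor of 2.0, the use of `\w+` tokens, and applying the bonus both to root-page occurrences and to link-text words.

```python
import re
from collections import Counter
from typing import AbstractSet


def weight_index_words(index_words: AbstractSet[str], pages: dict[str, str],
                       root: str, link_words: AbstractSet[str] = frozenset(),
                       bonus: float = 2.0) -> dict[str, float]:
    """Assumed formula: occurrences in the root page count `bonus` times,
    occurrences in subpages count once, and words that came from link text
    have their total multiplied by `bonus` again."""
    counts = {url: Counter(re.findall(r"\w+", text.lower()))
              for url, text in pages.items()}
    weights: dict[str, float] = {}
    for word in index_words:
        total = sum(c[word] * (bonus if url == root else 1.0)
                    for url, c in counts.items())
        if word in link_words:
            total *= bonus
        weights[word] = total
    return weights
```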

Once every "location information : index word (weight) set" has been stored in the index file 1800, the conversion file generation module 1710 described below forms the conversion file.

The conversion file generation module 1710 is a program module that converts each "location information : index word (weight) set" stored in the index file 1800 into an "index word : location information (weight) set" 1711 [a : {URL(w1), URL(w2), URL(w3), ...}].

These "index word : location information (weight) set" entries 1711 [a : {URL(w1), URL(w2), URL(w3), ...}] are stored in the conversion file 1700, which creates the final index database.
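
The conversion from the index file to the conversion file is an index inversion. The following is a minimal sketch. It assumes that both files are in-memory dicts and that posting lists are kept in descending order of weight; this ordering is an assumption that anticipates the query-time sort.

```python
def build_conversion_file(index_file: dict[str, dict[str, float]]
                          ) -> dict[str, list[tuple[str, float]]]:
    """Invert {URL: {index word: weight}} into {index word: [(URL, weight), ...]},
    with each posting list sorted by descending weight, then by URL."""
    conversion: dict[str, list[tuple[str, float]]] = {}
    for url, words in index_file.items():
        for word, weight in words.items():
            conversion.setdefault(word, []).append((url, weight))
    for postings in conversion.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return conversion
```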

The process of building the index database with the configuration described above is now explained in detail with reference to the flowcharts of FIGS. 11A to 11D.

The search engine server administrator enters at least one web page root URL 1510 through the user interface 40 of the server computer 140. The root URL 1510 is then stored in the first URL storage file 1520 (step S101).

The storage file reset and URL allocation module 1530 reads one root URL from the first URL storage file 1520 (step S102). If no root URL can be read from the first URL storage file 1520, the process proceeds to step S131; otherwise, it proceeds to step S104 (step S103). Failure to read a root URL from the first URL storage file means that no root URLs remain to be processed.

If a root URL was read in step S103, the storage file reset and URL allocation module 1530 resets the second URL storage file 1540 and stores the root URL in it (step S104).

The web page acquisition module 1550 reads one URL 1541 from the second URL storage file 1540 (step S105). If no URL 1541 can be read, the process returns to step S102 (step S107); otherwise, it proceeds to step S108 (step S106). Failure to read a URL from the second URL storage file means that hyperlink extraction and index word selection have been completed for every URL in that file. A new root URL must therefore be allocated from the first URL storage file, and the existing URLs must be reset.

If a URL was read in step S106, the web page acquisition module 1550 obtains the web page 200 indicated by that URL from the web server 120 on the Internet 130 (step S108).

Once the web page has been obtained, it is passed to the hyperlink extraction module 1560, which extracts hyperlinks from it one after another (step S109).

If the page has no hyperlink left to extract, the process returns to step S105 (step S111). If a hyperlink can be extracted, the process proceeds to step S112 (step S110). "No hyperlink left to extract" covers two cases: the page contains no hyperlinks at all, or every hyperlink on the page has already been extracted.

If a hyperlink was extracted in step S110, the hyperlink extraction module 1560 obtains the link URL and link text from the extracted hyperlink information (step S112).

A root URL is extracted from the link URL obtained in step S112 (step S113). It is then determined whether this root URL matches the root URL read in step S102 (step S114).

If the root URLs match (step S115), it is determined whether the link URL obtained in step S112 already exists in the second URL storage file (step S116). If it does, the process proceeds to step S124 to determine whether the link URL is a default URL. If it does not, the link URL is stored in the second URL storage file (step S119).

If the root URLs do not match in step S115, it is determined whether the root URL extracted in step S113 exists in the first URL storage file (step S120).

If the root URL exists, the process proceeds to step S124 to determine whether the obtained link URL is a default URL. If it does not exist, the process proceeds to step S123, stores the root URL extracted in step S113 in the first URL storage file, and then proceeds to step S124.

Steps S113 to S123 form a routine that extends the URL information in the first and second URL storage files by extracting other root URLs and subpage link URLs, starting from one root URL. To avoid storing duplicate URLs, the routine checks whether each extracted root URL or subpage link URL already exists in the first or second storage file.
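The duplicate checks in steps S113 to S123 can be sketched in Python as below. This is a minimal illustration, not the patent's implementation. It assumes that a "root URL" means the scheme and host of a link (for example `http://www.example.com/`), and it models the first and second URL storage files as in-memory sets. The function names `root_url` and `route_link` and the returned status strings are hypothetical.

```python
from urllib.parse import urlsplit


def root_url(link_url: str) -> str:
    """Return the assumed 'root URL' of a link: scheme plus host, e.g. http://a.com/."""
    parts = urlsplit(link_url)
    return f"{parts.scheme}://{parts.netloc}/"


def route_link(link_url: str, current_root: str,
               first_store: set, second_store: set) -> str:
    """Sketch of steps S113-S123 with both storage files modeled as sets.

    first_store  -- root URLs of sites (first URL storage file)
    second_store -- link URLs inside the current site (second URL storage file)
    """
    link_root = root_url(link_url)            # S113
    if link_root == current_root:             # S114/S115: link stays on this site
        if link_url in second_store:          # S116: already recorded
            return "duplicate_link"
        second_store.add(link_url)            # S119
        return "stored_link"
    if link_root in first_store:              # S120: site already known
        return "duplicate_root"
    first_store.add(link_root)                # S123: remember the new site
    return "stored_root"
```

In the flow described above, each outcome is followed by the default-URL check of step S124.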

In step S124, the hyperlink analysis module determines whether the link URL obtained in step S112 is a default URL. If it is not, the process returns to step S109 (step S126) to extract another link URL from the web page. If it is, the process proceeds to step S127.
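The default-URL test of step S124 could be implemented as sketched below. The claims describe a default URL as one that reaches a web page even though the file name is omitted from the URL. This sketch assumes that means the path is empty or ends with `/` and that no query string is present. The function name `is_default_url` is hypothetical, and real servers may map other URLs to default pages as well.

```python
from urllib.parse import urlsplit


def is_default_url(link_url: str) -> bool:
    """Assumed test for a 'default URL' (step S124): the path names no file.

    True when the path is empty or ends with '/', and there is no query string.
    """
    parts = urlsplit(link_url)
    names_no_file = parts.path == "" or parts.path.endswith("/")
    return names_no_file and parts.query == ""
```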

In step S127, the index word extraction module selects all possible index words from the link text of the default URL and generates a "location information: index word set" entry, for example [URL(0): {a, b, c}].

Once this entry is generated in step S127, the index word extraction module determines whether the hyperlink file already contains a "location information: index word set" entry with the same location information as the entry [URL(0): {a, b, c}] (step S128).

If no such entry exists, the entry generated in step S127 is stored in the hyperlink file as is, and the process returns to step S109 (step S128b). If one exists, the index words associated with that location information are read from the hyperlink file (step S128c).

The index words read in step S128c (c, d, e) are added to the entry [URL(0): {a, b, c}] generated in step S127, producing an extended entry [URL(0): {a, b, c, d, e}]. This entry is stored in the hyperlink file, rewriting the existing "location information: index word set" data (step S129). The process then returns to step S109 (step S130).
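The store-or-merge logic of steps S128 to S129 amounts to a set union keyed by location information. The sketch below assumes the hyperlink file can be modeled as an in-memory dictionary that maps each URL to its index word set. The name `merge_into_hyperlink_file` is hypothetical. Using the example from the text, merging {a, b, c} into an existing entry {c, d, e} gives {a, b, c, d, e}.

```python
def merge_into_hyperlink_file(hyperlink_file: dict, url: str, index_words: set) -> None:
    """Steps S128-S129: store a new entry or extend the existing one for this URL."""
    existing = hyperlink_file.get(url)
    if existing is None:
        # S128b: no entry with this location information yet; store as is.
        hyperlink_file[url] = set(index_words)
    else:
        # S128c-S129: union with the stored words, rewriting the entry in place.
        existing |= index_words


# Example from the text: existing entry {c, d, e}, new link-text words {a, b, c}.
hyperlink_file = {"URL(0)": {"c", "d", "e"}}
merge_into_hyperlink_file(hyperlink_file, "URL(0)", {"a", "b", "c"})
print(sorted(hyperlink_file["URL(0)"]))  # ['a', 'b', 'c', 'd', 'e']
```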

As the routine of steps S102 to S130 repeats, step S102 may find no more root URLs to read from the first URL storage file. At that point, the process proceeds to step S131 and runs the routine that generates the index file and the conversion file.

The link URL obtaining module 1410 reads one URL from the hyperlink file and passes this URL(0) 1411 to the second web page obtaining module 1420 (step S131).

The second web page obtaining module 1420 retrieves the web page 1422 indicated by URL(0) from the corresponding website 120 over the Internet 130. It then passes this web page 1422 to both the core index word extraction module 1430 and the subpage URL extraction module 1440 (step S132).

The core index word extraction module 1430 selects core index words (a, f, g) from the text of the obtained web page and generates a "location information: core index word list" [URL(0): (a, f, g)] 1431. It passes this list to the second index word expansion module 1450 (step S132a).

The second index word expansion module reads, from the hyperlink file, the "location information: index word set" [URL(0): {a, b, c, d, e, ...}] 1610 that matches the location information URL(0) of the "location information: core index word list" [URL(0): (a, f, g)] 1431 received from the core index word extraction module.

The second index word expansion module 1450 adds the core index word list (a, f, g) to the set 1610 read in this way. This produces an extended "location information: index word set" [URL(0): {a, b, c, d, e, f, g, ...}] 1451, which the module passes to the weighting module 1310 (step S132c).
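The description does not say how core index words are chosen in step S132a. The claims, however, list the HTML title tag as one source, alongside emphasized phrases, phrases that follow clue words, and words missing from the vocabulary dictionary. The sketch below covers only the title-tag case, using Python's standard `html.parser`, and then performs the expansion of step S132c as a set union. Whitespace tokenization and lower-casing are simplifying assumptions; Korean text would need a morphological analyzer instead. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleWordParser(HTMLParser):
    """Collects whitespace-separated, lower-cased words inside <title>...</title>."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.words: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.words.extend(data.lower().split())


def core_index_words(html: str) -> list:
    """Hypothetical step S132a: take core index words from the title tag only."""
    parser = TitleWordParser()
    parser.feed(html)
    parser.close()
    return parser.words


def expand_index_set(index_set: set, core_words: list) -> set:
    """Step S132c: add the core index words to the stored index word set."""
    return index_set | set(core_words)
```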

Meanwhile, as shown in FIG. 9, the subpage URL extraction module 1440 extracts, from the web page obtained in step S132, the URLs [URL(1), URL(2), URL(3), ...] of the subpages linked to that page in a tree structure down to depth N. It then passes URL(0) of the web page, together with these subpage URLs [URL(1), URL(2), URL(3), ...], to the subpage obtaining module 1460 (step S132b).

The subpage obtaining module 1460 uses the location information [URL(0), URL(1), URL(2), URL(3), ...] 1441 received from the subpage URL extraction module 1440 to fetch the web page and its subpages 1463 from the Internet. It passes them to the weighting module 1310 (step S132d).
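Collecting the subpages down to depth N (steps S132b and S132d) is a breadth-first traversal limited to N link hops. The sketch below assumes the link structure is already available as a dictionary from each URL to the URLs it links to. In the described system, this information would instead come from parsing each fetched page. The function name and the in-memory `links` map are hypothetical.

```python
from collections import deque


def subpages_to_depth(root: str, links: dict, n: int) -> list:
    """Return URLs reachable from `root` within n link hops, in breadth-first order.

    `links` maps each URL to the URLs it links to (assumed to be known in advance).
    The root page itself is not included, and each subpage is listed once.
    """
    seen = {root}
    found = []
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == n:          # do not follow links below depth N
            continue
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append((child, depth + 1))
    return found
```

Tracking `seen` keeps back-links, such as a subpage linking to its parent, from being fetched twice.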

As shown in FIG. 10, the weighting module 1310 extracts the extended index word list (a, b, c, d, e, f, g) 1312 from the extended "location information: index word set" [URL(0): {a, b, c, d, e, f, g, ...}] 1451 received from the second index word expansion module 1450. Using these index words, it counts the occurrence frequency 1464 of each index word in the texts 1463 of the web page and its subpages received from the subpage obtaining module 1460. Based on these frequencies, it generates a "location information: index word (weight) set" [URL(0): {a(w1), b(w2), c(w3), ...}] 1311 (step S135) and stores it in the index file 1800 (step S136).
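The frequency count of step S135 can be illustrated as follows. This sketch assumes that the weight of an index word is simply its total number of occurrences across the web page and its subpages. The patent leaves the exact weighting formula open, and the claims add refinements, such as favoring link-text words and root-page occurrences, that are not modeled here. Tokenizing with `\w+` after lower-casing is also an assumption.

```python
import re
from collections import Counter


def weight_index_words(index_words: set, page_texts: list) -> dict:
    """Assumed weighting for step S135: weight = total occurrences across all pages.

    `index_words` must be lower-case; words that never occur get weight 0.
    """
    counts = Counter()
    for text in page_texts:
        tokens = re.findall(r"\w+", text.lower())
        counts.update(tok for tok in tokens if tok in index_words)
    return {word: counts[word] for word in index_words}
```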

The conversion file generation module 1710 converts each "location information: index word (weight) set" [URL(0): {a(w1), b(w2), c(w3), ...}] 1311 stored in the index file into an "index word: location information (weight) set" [Index=a: {URL1(w1), URL2(w2), URL3(w3), ...}] 1711. Storing the result in the conversion file 1700 forms the index database (step S137).
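The conversion in step S137 is an index inversion: a per-page map of word weights becomes a per-word map of page weights. The sketch below is minimal and assumes both files are represented as nested dictionaries. The name `invert` is hypothetical.

```python
from collections import defaultdict


def invert(index_file: dict) -> dict:
    """Turn {url: {word: weight}} into {word: {url: weight}} (step S137)."""
    conversion = defaultdict(dict)
    for url, word_weights in index_file.items():
        for word, weight in word_weights.items():
            conversion[word][url] = weight
    return dict(conversion)
```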

<Searching the index database>

Searching the index database (or the conversion file) works the same way as in Embodiment 1 described above, so the description is omitted here; see Embodiment 1.

<Displaying search results>

The method of displaying, in the browser of the user's computer, information found with the search engine of Embodiment 2 is also substantially the same as in Embodiment 1 (see FIG. 6), so the detailed description is omitted.

<Storing and using search results>

Some of the information retrieved by a search engine is information the user wants to use regularly (or periodically). For this reason, web browsers such as Navigator and Explorer provide a "Favorites" feature. This feature stores the URLs of the web pages a user visits frequently in the memory of the user's computer. With the browser open, the user reads one of these URLs from memory and connects to the corresponding web page.

However, this browser-based Favorites feature is limited by location because it is tied to a single computer. Users now access the Internet from many places, such as home, work, and PC rooms (Internet cafes), so a Favorites feature that depends on the web browser of one particular computer is less useful.

The present inventor therefore devised a new way of using search information, shown in FIG. 12. Information the user has found (the addresses of favorite web pages) is stored on a specific server on the Internet. The user can then browse the web while connected to that server, without depending on the Favorites of a particular web browser.

As shown in FIG. 12, the method of using search information according to the present invention is implemented over a network such as the Internet.

A search engine server 2400 with multiple processors 2430 (though multiple processors are not required) and a memory 2420 is connected to an index database 2410.

The user connects to the search engine server 2400 through a client computer 2200 and enters a query to receive search information, namely a list of the addresses of web pages that match the query. Alternatively, by using a search engine such as that of Embodiment 1 or Embodiment 2, the user can receive the highest-ranked location information together with a list of location information sorted in descending order.
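The description defers the details of query processing to Embodiment 1, so the ranking below is only an assumed illustration. Each query index word is looked up in the conversion file, modeled here as a nested dictionary. A page's score is the sum of its weights over the matching words, and pages are returned in descending order of score, so the first element plays the role of the highest-ranked location information. The function name `rank` is hypothetical.

```python
def rank(conversion: dict, query_words: list) -> list:
    """Return URLs sorted by the summed weight of the query's index words (descending)."""
    scores = {}
    for word in query_words:
        for url, weight in conversion.get(word, {}).items():
            scores[url] = scores.get(url, 0) + weight
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked]


conversion = {"a": {"URL1": 3, "URL2": 5}, "b": {"URL1": 4}}
results = rank(conversion, ["a", "b"])
print(results)     # ['URL1', 'URL2']  (URL1 scores 3 + 4 = 7, URL2 scores 5)
top = results[0]   # would be sent to the browser for immediate display
```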

The user then selects one or more entries from the list of web page locations returned as the search result and clicks the MyLink button 2210 on the client computer 2200 to send the selected location information to the MyLink server 2500.

The location information list sent from the client computer 2200 is stored in the memory 2510 of the MyLink server 2500. If the user is specifically linked to the MyLink server 2500, for example through member registration, the memory 2510 contains a user storage area for that user's location information list.
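The per-user storage areas of the MyLink server can be modeled as shown below. This is a hypothetical in-memory sketch. The class name, the `register`/`store`/`list_for` methods, and the choice to ignore URLs the user has already saved are all assumptions. A real server would persist the lists in a database behind an authenticated web interface.

```python
class MyLinkServer:
    """Hypothetical model of the MyLink server's per-user storage areas."""

    def __init__(self) -> None:
        self._areas: dict = {}   # user id -> ordered list of saved URLs

    def register(self, user_id: str) -> None:
        """Create a storage area for a member (e.g. on sign-up)."""
        self._areas.setdefault(user_id, [])

    def store(self, user_id: str, urls: list) -> None:
        """Append selected location information, skipping URLs already saved."""
        if user_id not in self._areas:
            raise KeyError(f"no storage area for user {user_id!r}")
        area = self._areas[user_id]
        for url in urls:
            if url not in area:
                area.append(url)

    def list_for(self, user_id: str) -> list:
        """Return a copy of the user's saved list (empty if none)."""
        return list(self._areas.get(user_id, []))
```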

Here, the MyLink server includes a website that manages the user's personal information or a web server that hosts the user's homepage. However, the MyLink server of the present invention is not limited to such web servers.

In addition, when the search results are displayed in frames on the browser screen of the user's computer, as in FIG. 6, the URL of the web page corresponding to the highest-ranked location information may be sent to the MyLink server and stored there, either at the user's choice or automatically.

If the location information the user obtained through searching is stored on the MyLink server in this way, the user can connect to the MyLink server and use its services while having Favorites functionality that does not depend on a particular browser.

The present invention is not limited to the embodiments described above. Those of ordinary skill in the art to which the invention pertains can make various modifications and variations within the technical spirit of the invention and the scope of equivalents of the claims set forth below.

The search engine of the present invention keeps indexing work to a minimum while still giving quick and accurate access to information that matches a query.

In addition, the search engine of the present invention displays the web page document most relevant to the query immediately ("non-stop") on the browser screen, which simplifies the process of reaching the desired information.

The present invention also stores the search information provided by the search engine on a specific web server, so that the information can be used not only on the user's own computer but on any computer with Internet access.

Claims

A communication interface for obtaining a World Wide Web page over the Internet from its location information (URL);

Hyperlink extracting means for sequentially extracting, one by one, all hyperlinks on the web page obtained by the communication interface;

Hyperlink selecting means for selecting only hyperlinks that have a default URL, i.e., a URL that can access a web page even though the file name is absent from or omitted in its uniform resource locator (URL) structure;

Collecting means for collecting hyperlinks that have the same default URL from the World Wide Web database;

And index word extracting means for extracting index words from the link text of the hyperlinks collected by the collecting means;

Whereby a World Wide Web page indexing system indexes the web page indicated by the default URL with the extracted index words.

The system of claim 1, wherein the index word extracting means comprises:

An index word extraction module for forming a "location information: index word list" from hyperlink information that points to the web page corresponding to the default URL;

A first storage file for storing the "location information: index word list";

An indexing module that retrieves all index word lists corresponding to a specific piece of location information stored in the first storage file, counts the occurrence frequency of each index word, weights the index words according to those frequencies, and thereby indexes the web page indicated by the location information as a "location information: index word (weight) list";

A second storage file for storing the "location information: index word (weight) list";

A conversion module for converting the "location information: index word (weight) list" in the second storage file into an "index word: location information (weight) list";

And a third storage file for storing the "index word: location information (weight) list".

The system of claim 2,

Wherein, in assigning weights to index words, the system takes into account not only the occurrence frequency but also the length of the service period of the web page corresponding to each index word, so that web pages with shorter service periods receive higher weights than web pages with longer service periods.

A system for indexing web pages using a distributed database on the Internet, comprising:

Hyperlink index word extracting means for obtaining a web page from the distributed database based on location information (URL), selecting index words from the link text of the hyperlinks on the web page, and forming a "location information: index word set";

A hyperlink file for storing the "location information: index word set";

Index word expansion means for reading a "location information: index word set" stored in the hyperlink file, selecting core index words from the text of the web page corresponding to the read location information, and adding them to the index word set whose location information is the URL in the hyperlink, thereby expanding that set;

Index word weighting means for obtaining the subpages linked to the web page in a tree structure down to depth N, searching the obtained subpages and the web page to count the occurrence frequency of each index word in the expanded index word set, and weighting the index words of the expanded set according to those frequencies;

And an index file for storing the "location information: index word (weight) set" weighted by the index word weighting means.

The system of claim 4, wherein the hyperlink index word extracting means comprises:

A web page obtaining module for obtaining the corresponding web page over the Internet from the location information (URL) of a World Wide Web page;

A hyperlink extraction module for sequentially extracting, one by one, all hyperlinks on the web page obtained by the web page obtaining module;

A hyperlink sorting module for selecting only hyperlinks that have a default URL, i.e., a URL that can access the corresponding web page even though the file name is absent from or omitted in its uniform resource locator (URL) structure;

And an index word extraction module for selecting index words from the link text of the hyperlinks selected by the sorting module to form a "location information: index word set".

The system of claim 4, wherein the index word expansion means comprises:

A web page obtaining module for reading one link URL from the hyperlink storage file and retrieving the web page corresponding to that link URL from the distributed database;

A core index word extraction module for selecting core index words from the document text of the web page;

And an index word expansion module for reading the "location information: index word set" whose location information matches the link URL stored in the hyperlink index file, and expanding that index word set by adding the core index words.

The system of claim 6,

Wherein the core index words are words extracted from the title tag of the HTML document of the web page, phrases tagged for emphasis, phrases that appear after a clue word (a suggestive word), or new words or proper nouns that are not registered in the electronic vocabulary dictionary.

The system of claim 7, wherein the index word weighting means comprises:

A subpage URL extraction module for extracting, from the web page obtained by the web page obtaining module, the location information of the subpages linked to that web page in a tree structure down to depth N;

A subpage obtaining module for obtaining the corresponding subpages from the distributed database based on their location information;

And a weighting module for searching all the obtained subpages and the corresponding web page for the index words of the "location information: index word set" generated by the index word expansion module, counting the occurrence frequency of those index words in the document text of the pages, and assigning weights according to those frequencies.

The system of claim 8,

Wherein, in assigning a weight to each index word based on its occurrence frequency, the weighting module gives a higher weight to index words selected from the link text of hyperlinks, and gives index words found in the root page of the web page tree structure a higher weight than index words found in its subpages.

A method of weighting index words when indexing web pages using a distributed database on the Internet, the method comprising:

Forming a "location information: index word set" so that the index words for a specific web page correspond to the location information of that web page;

Extracting the location information of the subpages linked in a tree structure, down to depth N, to the web page identified by the location information of the "location information: index word set";

Obtaining, from the distributed database, the subpages corresponding to the location information extracted in the extracting step;

And counting the occurrence frequency of the index words of the "location information: index word set" in the web page and its subpages, and assigning a weight to each index word according to its occurrence frequency.

The method of claim 10,

Wherein, in assigning a weight to each index word, the weight reflects not only the frequency-based weight of the index word but also its importance under structure-based search.

A method of viewing search results, comprising:

Extracting index words by analyzing a query entered through a user computer;

Retrieving, using the extracted index words, a plurality of pieces of web page location information corresponding to those index words from the index database;

Processing the retrieved location information according to the query;

Sorting the processed location information in descending order of weight;

And transmitting the highest-ranked location information among the sorted location information to the web browser of the user's computer, so that the corresponding web page is displayed on the user's computer non-stop.

The method of claim 12,

Further comprising displaying the list of the plurality of pieces of location information, sorted in descending order, on the user computer.

The method of claim 13,

Wherein the web page corresponding to the highest-ranked location information and the location information list are displayed simultaneously on one screen.

The method of claim 14,

Wherein the display screen of the user computer is divided into frames, the web page and the location information list are displayed simultaneously in different frames, and the remaining frames provide a search box for entering a query and an advertisement display area.

Analyzing, by a search engine server, a query transmitted from a user computer to extract at least one index word, and retrieving from the index database a plurality of pieces of document location information (URLs) related to the extracted index words;

Processing the retrieved location information according to the query;

Transmitting, by the search engine server, the processed location information list to the user computer;

Displaying the location information list on the screen of the user computer;

Selecting at least one desired piece of location information from the location information list displayed on the user computer;

Transmitting the selected location information to a specific server linked with the user computer;

And storing, by the specific server, the transmitted location information in the storage area of the corresponding user.

Transmitting, by the search engine server, the location information of a found document to the user computer;

Displaying the found document on the screen of the user computer by the web browser of the user computer, based on the location information;

Transmitting, at the user's discretion, the location information of the displayed document to a specific server linked with the user computer;

And storing, by the specific server, the transmitted location information in the storage area of the corresponding user.

In a networking environment in which a plurality of user computers and a server computer are connected via the Internet, the server provides a plurality of storage areas for the user computers linked with it, and lists of location information of web documents transmitted from the user computers are stored per user in those storage areas, a method comprising:

Connecting, by a user, to the server computer using the user's own computer;

Retrieving the location information lists from the user's storage area among the storage areas of the server computer and displaying them on the user computer;

And accessing the web page corresponding to desired location information selected from the displayed list;

Whereby the user can easily surf the web simply by connecting to the server computer.