KR100509276B1 - Method for searching web page on popularity of visiting web pages and apparatus thereof

Info

Publication number: KR100509276B1
Application number: KR10-2001-0049944A
Authority: KR
Inventors: 이준호; 최병엽; 안정수
Original assignee: 엔에이치엔(주)
Priority date: 2001-08-20
Filing date: 2001-08-20
Publication date: 2005-08-22
Also published as: JP2003076715A; JP3802813B2; KR20030016037A

Abstract

The present invention relates to a web page search method, and an apparatus therefor, based on per-page visit popularity extracted using the disk cache information of users' computers. The method comprises: (a) receiving URL information for the web pages that users have visited; (b) checking IP addresses in the received URL information to remove duplicate domains, and extracting the URL information visited by each user; (c) rearranging the per-user URL information into visit counts per web page; and (d) converting the visit popularity of the web pages into a predetermined value and storing it. Because the popularity of the websites that Internet users actually visit is reflected, search results can take site popularity into account, and analyzing the behavior of Internet users makes it possible to return more relevant sites in the search results.

Description

Method for searching web page on popularity of visiting web pages and apparatus thereof

The present invention relates to the field of web page search on the Internet, and more particularly to a web page search method and apparatus based on per-page visit popularity extracted using user disk cache information.

Conventional search methods provide search results from web pages on the WWW: any web page containing the words used in the search query is included in the results, regardless of how popular that page is among users or on the server. More specifically, a search robot finds and classifies material on Internet sites and builds a database. When the user enters a word to search for, the database is searched and site information matching the entered word is returned; alternatively, the user is presented with a classification tree built by the search company and descends the tree to find the desired site. Because such results do not reflect users' actual usage, these methods may return results unrelated to the user's intent, and they cannot provide more realistic results that take into account the popularity of websites among actual Internet users.

In conventional web page search, the weight of a document is calculated by Equation 1 below.

Document weight = α × similarity + (1 - α) × link popularity    (Equation 1)

Here, α > 0.5 when the search query is a sentence, and α < 0.5 when the search query is a single word.
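As a minimal illustration of Equation 1, the Python sketch below computes the conventional document weight. The function names, the rule that a multi-word query counts as a sentence query, and the concrete α values (0.7 and 0.3) are assumptions chosen only to satisfy α > 0.5 and α < 0.5; the source gives no specific values.

```python
def conventional_weight(similarity: float, link_popularity: float, alpha: float) -> float:
    """Equation 1: alpha * similarity + (1 - alpha) * link popularity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * similarity + (1.0 - alpha) * link_popularity


def alpha_for_query(query: str) -> float:
    """Pick alpha by query type; 0.7 and 0.3 are illustrative assumptions."""
    is_sentence = len(query.split()) > 1  # assumption: multi-word means sentence query
    return 0.7 if is_sentence else 0.3
```

With these assumed values, a sentence query leans on similarity and a single-word query leans on link popularity.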

Existing web search technology returns results based on information about the content of the search target and on the number of pages that link to it. However, because it does not reflect which sites Internet users actually use, it cannot accurately return results for the most relevant sites.

The technical problem to be solved by the present invention is to provide a web page search method and apparatus based on per-page visit popularity extracted using user disk cache information, in order to solve the above problems.

Another technical problem to be solved by the present invention is to provide a computer-readable recording medium on which a program for executing the method on a computer is recorded.

To achieve the above object, a web page search method based on per-page visit popularity according to the present invention comprises: (a) receiving URL information for web pages visited by users; (b) checking IP addresses in the received URL information to remove duplicate domains, and extracting the URL information visited by each user; (c) rearranging the per-user URL information into visit counts per web page; and (d) converting the per-page visit counts into visit popularity and storing the result.

To achieve the above object, a web page search method based on per-page visit popularity according to the present invention comprises: (a) extracting web pages that contain a search word entered by a user; (b) rearranging the extracted web pages according to predetermined weights; and (c) providing a list of the web pages in the output type selected by the user.

To achieve the above object, a web page search apparatus based on per-page visit popularity according to the present invention comprises: an input unit that receives URL information for web pages visited by users, or a search word entered by a user; a URL extraction unit that checks IP addresses in the received URL information to remove duplicate domains and extracts the URL information visited by each user; a search unit that searches for and extracts the URLs of web pages containing the entered search word; a web page arrangement unit that rearranges the per-user URL information into visit counts per web page; a storage unit that converts the visit popularity of the web pages into a predetermined value and stores it; and an output unit that provides the user with the URLs of the web pages extracted by the search unit.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart of a method for storing the URL information of web pages on the basis of per-page visit popularity extracted using disk cache information according to the present invention. The domain names in the URL information of the web pages a user visited are normalized, and the visit popularity of each web page is stored.

FIG. 2 shows the links between web pages that are used to calculate link popularity in the search method according to the present invention. FIG. 2 is described in detail below, together with the link popularity calculation.

A browser disk cache file is extracted from an Internet user who has given prior consent, or URL information for visited web pages is received from the user (step 110). The part of the received URL information that indicates the communication protocol (for example, http://) is removed (step 120). Duplicate domains are then removed by checking the IP (Internet Protocol) address of each web page's domain name, and the URL information visited by each user is extracted (step 130). The visits are rearranged into visit counts per web page (step 140), and the visit popularity of each web page is calculated and stored (step 150).
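Steps 120 and 130 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the (user, URL) input format, and the `resolve_ip` callable are hypothetical, and the sketch keeps the first domain seen for each IP address as the canonical one.

```python
from collections import defaultdict


def strip_protocol(url: str) -> str:
    """Step 120: drop the protocol part such as 'http://'."""
    return url.split("://", 1)[1] if "://" in url else url


def normalize_visits(visits, resolve_ip):
    """Steps 120-130: return {user: set of normalized URLs}.

    visits: iterable of (user_id, url) pairs (assumed input format).
    resolve_ip: hypothetical callable mapping a domain name to an IP string;
    in practice it could wrap a DNS lookup.
    """
    canonical = {}  # IP address -> first domain seen for that IP
    per_user = defaultdict(set)
    for user, url in visits:
        rest = strip_protocol(url)
        domain, _, path = rest.partition("/")
        ip = resolve_ip(domain)
        domain = canonical.setdefault(ip, domain)  # collapse duplicate domains
        per_user[user].add(f"{domain}/{path}" if path else domain)
    return dict(per_user)
```

Passing the resolver in as a parameter keeps the sketch testable without network access. Note that domains sharing one IP address for unrelated reasons (for example, shared hosting) would also be merged by this rule.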

Visit popularity is derived from statistics on the pages visited by n users: the per-user page visit results for users U1, U2, U3, ..., Un are inverted and rearranged into visit counts per page. The visit popularity of a page is as follows.

Visit popularity of a page = number of actual visitors / total number of users
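The inversion from per-user visits to per-page visitor counts, and the ratio above, can be sketched as follows. The function name and the dictionary-of-sets input format are assumptions made for illustration.

```python
from collections import Counter


def visit_popularity(per_user_urls):
    """per_user_urls: {user: set of URLs}. Returns {url: visitors / total users}."""
    total_users = len(per_user_urls)
    if total_users == 0:
        return {}
    # Count each user at most once per page, however often that user visited it.
    visitors = Counter(url for urls in per_user_urls.values() for url in set(urls))
    return {url: count / total_users for url, count in visitors.items()}
```

For example, if three of four users visited a page, its visit popularity under this sketch is 0.75.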

The URL information of a web page is also stored according to its content similarity, which ranks web pages by the frequency of the words they contain.
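The source does not give a formula for similarity beyond word frequency. One possible reading, offered only as an assumption, is the share of a page's words that match the search word:

```python
import re


def content_similarity(page_text: str, search_word: str) -> float:
    """Share of the page's words that equal the search word (case-insensitive).

    Dividing by the page length is an assumption; the source only says that
    similarity depends on word frequency.
    """
    words = re.findall(r"\w+", page_text.lower())
    if not words:
        return 0.0
    return words.count(search_word.lower()) / len(words)
```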

The link popularity of a web page can be obtained as follows. To illustrate link popularity, FIG. 2 shows other web pages linking to a single web page.

As a precondition, every web page is assumed to have a base value of I(0). The link popularity of web page P is calculated from the web pages that link to P. As shown in FIG. 2, web page A has three outgoing links, one of which points to P; web page B has two outgoing links, one of which points to P; and web page C has one outgoing link, which points to P. The link popularity of web page P is therefore:

Link popularity of web page P = base value of P + link popularity value from page A + link popularity value from page B + link popularity value from page C

= I(0) + I(0)/3 + I(0)/2 + I(0)/1

The value of I(0) can vary depending on the service.
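The FIG. 2 calculation can be reproduced with a short Python sketch. The function name, the link-map format, and the placeholder target pages X, Y, and Z are assumptions; I(0) is set to 1 purely for illustration, since the source leaves its value open.

```python
from fractions import Fraction


def link_popularity(page, out_links, base=Fraction(1)):
    """out_links: {page: list of pages it links to}. base plays the role of I(0)."""
    score = base  # the page's own base value
    for source, targets in out_links.items():
        if source != page and page in targets:
            # The source page's base value is split over its outgoing links.
            score += base / len(targets)
    return score


# FIG. 2: A has 3 outgoing links, B has 2, C has 1; one link from each points to P.
fig2_links = {"A": ["P", "X", "Y"], "B": ["P", "Z"], "C": ["P"]}
```

With I(0) = 1, `link_popularity("P", fig2_links)` evaluates to 1 + 1/3 + 1/2 + 1 = 17/6.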

FIG. 3 shows an embodiment of the web page search method based on per-page visit popularity extracted using user disk cache information according to the present invention. When a search word is received from the user and the matching web pages are provided, they are ordered by a ranking based on at least one of visit popularity, link popularity, and similarity.

A search word is received from the user (step 310), and the web pages containing that search word are extracted (step 320). The extracted web pages are rearranged according to a ranking based on at least one of visit popularity, similarity, and link popularity (step 330).

As an example of searching web pages by visit popularity, similarity, or link popularity, suppose a user enters one search word and n documents are retrieved as a result. As shown in Table 1, the similarity-based results list the documents in descending order of similarity, that is, starting from the document in which the search word occurs most frequently.

Table 1

Document    Similarity
D1          0.9
D2          0.83
...         ...
Dn          0.02

Table 2 shows the results for the documents by link popularity, which indicates how many other web pages link to a web page.

Table 2

Document    Link popularity
D1          0.001
D2          0.002
...         ...
Dn          0.100

Table 3 shows the results by visit popularity, which reflects users' actual visits to the web pages.

Table 3

Document    Visit popularity
D1          0.01
D2          0.02
...         ...
Dn          0.90

The values in the tables above, obtained from content similarity, link popularity, and visit popularity, are re-ranked using the document weight of Equation 2 below.

Document weight = α × similarity + β × link popularity + γ × visit popularity    (Equation 2)

Here, α + β + γ = 1.

That is, a large α yields search results weighted toward similarity, a large β toward link popularity, and a large γ toward visit popularity.
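The effect of the three weights can be seen by applying Equation 2 to the D1, D2, and Dn values from Tables 1 to 3. The sketch below is illustrative only: the function names and the two weight settings (0.8, 0.1, 0.1) and (0.1, 0.1, 0.8) are assumptions, not values given in the source.

```python
def document_weight(similarity, link_pop, visit_pop, alpha, beta, gamma):
    """Equation 2, with alpha + beta + gamma = 1."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    return alpha * similarity + beta * link_pop + gamma * visit_pop


def rank(documents, alpha, beta, gamma):
    """documents: {name: (similarity, link popularity, visit popularity)}."""
    return sorted(
        documents,
        key=lambda name: document_weight(*documents[name], alpha, beta, gamma),
        reverse=True,
    )


# Values for D1, D2 and Dn taken from Tables 1 to 3.
docs = {
    "D1": (0.9, 0.001, 0.01),
    "D2": (0.83, 0.002, 0.02),
    "Dn": (0.02, 0.100, 0.90),
}
```

With the similarity-heavy setting, `rank(docs, 0.8, 0.1, 0.1)` orders the documents D1, D2, Dn; with the visit-heavy setting, `rank(docs, 0.1, 0.1, 0.8)` reverses the order to Dn, D2, D1.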

When the user selects a type that emphasizes at least one of similarity, link popularity, and visit popularity, the results are output according to the selected output type (step 340).

FIG. 4 shows a web page search apparatus based on per-page visit popularity extracted using user disk cache information according to the present invention.

The receiving unit (410) collects the URL information of web pages from the browser disk cache files of user terminals (480-1, 480-2) whose users have given prior consent. The URL extraction unit (420) normalizes that URL information, extracts the URL information of the web pages visited by each visitor, calculates the visit popularity of each web page as the ratio of its actual number of visitors to the total number of visitors, and stores it in the web page database (460). When a user enters a search word, the receiving unit (410) receives it over the Internet, and the search unit (430) searches the index database (470) for web pages containing the search word and extracts them. The web page database (460) stores web page information by visit popularity, similarity, or link popularity; the index database (470) stores the words or sentences found in the web pages and is linked to the web pages. The web page arrangement unit (440) re-ranks the web pages extracted by the search unit (430) according to Equation 2. The output unit provides the re-ranked documents as a web page list in the type selected by the user (for example, ranked by visit popularity, similarity, or link popularity).

The present invention can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, hard disks, floppy disks, flash memory, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over networked computer systems so that the computer-readable code is stored and executed in a distributed manner.

As described above, according to the present invention, the URL information of web pages is collected directly from users' disk cache files and stored according to per-page popularity, link popularity, or content similarity. Web pages containing the search word entered by a user are then extracted by popularity, link popularity, or content similarity and provided to the user, which makes it possible to provide more relevant search results.

FIG. 1 shows a method for storing a database of web pages based on per-page visit popularity extracted using user disk cache information according to the present invention.

FIG. 2 shows the links between web pages that are used to calculate link popularity in the search method according to the present invention.

FIG. 3 shows a web page search method based on per-page visit popularity extracted using user disk cache information according to the present invention.

Claims

A web page search method based on per-page visit popularity extracted using user disk cache information, performed in a search apparatus that provides a web page search result corresponding to a search word input from a user terminal connected through a communication network, the method comprising:

(a) collecting, by the search apparatus, a browser disk cache file existing in the user terminal connected through the communication network;

(b) detecting, by the search apparatus, URL information of web pages visited by a user from the collected browser disk cache file;

(c) classifying, by the search apparatus, the URL information of the detected web pages by user, and calculating the number of visiting users for each web page from the classified URL information;

(d) calculating, by the search apparatus, the visit popularity of each web page as the ratio of the number of visiting users for that web page to the total number of users; and

(e) providing, by the search apparatus, a web page search result corresponding to the search word input from the user terminal, based on the visit popularity.

(Deleted)

The method of claim 1,

wherein step (c) comprises:

(c1) checking, by the search apparatus, an IP address from the URL information and removing duplicate domains; and

(c2) classifying, by the search apparatus, the URL information of the web pages visited by the users by user.

(Deleted)

The method of claim 1, further comprising:

(f) calculating, by the search apparatus, a link popularity based on the number of other web pages that contain links to the web page; and

(g) calculating, by the search apparatus, a similarity based on the frequency of the search word in the web page,

wherein in step (e) the search apparatus provides the web page search result corresponding to the search word input from the user terminal according to the visit popularity, the link popularity, and the similarity.

(Deleted)

A web page search apparatus based on per-page visit popularity extracted using user disk cache information, for providing a web page search result corresponding to a search word input from a user terminal connected through a communication network, the apparatus comprising:

a receiving unit that receives a browser disk cache file existing in the user terminal and receives, from the user terminal, a search word input by the user for searching web pages;

a URL extraction unit that extracts URL information of web pages visited by a user from the received browser disk cache file;

a search unit that searches for the URLs of web pages containing the received search word;

a web page arrangement unit that calculates the number of visiting users for each web page from the URL information visited by each user, calculates the visit popularity of each web page as the ratio of the number of visiting users for that web page to the total number of users, and stores the visit popularity; and

an output unit that provides the URLs of the web pages found by the search unit to the user terminal based on the visit popularity.

The apparatus of claim 8,

wherein the storage unit comprises:

a web page database that stores URL information of the web pages; and

an index database that indexes and stores the words contained in the web pages as index words.

A computer-readable recording medium having recorded thereon a program for executing the web page retrieval method according to any one of claims 1, 3, and 5 on a computer.