KR100645742B1

KR100645742B1 - Method and apparatus for collecting search data by getting various data including web document creation

Info

Publication number: KR100645742B1
Application number: KR1020050082434A
Authority: KR
Inventors: 남세동
Original assignee: (주)첫눈
Priority date: 2005-09-05
Filing date: 2005-09-05
Publication date: 2006-11-14

Abstract

A method and apparatus for collecting search data by acquiring various information, including web document creation, are provided. A specific image is inserted into a web page when the page is created, and requests to load that image are checked in real time as the page is displayed, so that the creation or update of a web page can be detected and new search data collected in real time. An image storage unit (302) receives a transfer request for the specific image and transmits the image to a web browser (330) when a home page referring to the image is read and displayed by the web browser. A URL (Uniform Resource Locator) information storage unit (308) stores home page search history information. A determination unit (304) determines whether the home page is appearing on the Internet for the first time by referring to the URL information of the home page referring to the image and the URL search history information stored in the URL information storage unit. A data collection unit (306) connects to the home page and collects search-related information if the home page is appearing for the first time, and otherwise increments the visit counter of the home page.

Description

Method and apparatus for collecting search data by acquiring various information including web document creation

FIG. 1 is a reference diagram illustrating how information about a web page is exchanged between a web server and a search data collection server by inserting an image file into a web document;

FIG. 2 is a diagram illustrating an example of a web page including a confirmation image;

FIG. 3 is a block diagram of a search data collection apparatus according to an embodiment of the present invention;

FIG. 4 is a flowchart of a search data collection method according to an embodiment of the present invention; and

FIG. 5 is a diagram illustrating an example of a web page source file into which comment fields have been inserted.

The present invention relates to information retrieval, and more particularly to a method and apparatus for collecting new search data by acquiring various information, including web document creation, using a specific image included in a web page.

Recently, countless web pages are being created and removed on the Internet. For information retrieval, an information collection server must therefore visit web pages on the Internet one by one to collect data, but it is very difficult to discover and collect, in real time, the data of web pages that appear and disappear in such numbers.

That is, conventionally, when an information collection server visits web pages on the Internet to collect data, its search depth is limited, so it is difficult to retrieve all information, and because its search cycle is long, it is difficult to collect frequently updated information in real time. One approach to these problems is to install a specific program on the user terminal that reports the URLs the user visits, but this raises problems such as the risk of personal information leakage and security concerns.

Accordingly, a technical object of the present invention is to provide a search data collection method and apparatus in which a specific image is inserted when a web page is created and incoming requests to load that image, made as the web page is displayed, are checked in real time, so that whether a web page has been newly created or updated can be determined and new search data can be collected in real time.

Another technical object of the present invention is to provide a search data collection method and apparatus in which comment fields such as title, author, and creation date are added to the source file of the web page so that summary information about the web page can be determined more accurately.

According to the present invention, the above technical objects are achieved by a search data collection method comprising: receiving, as a home page referring to a specific image is read and displayed by a user browser, a transfer request for the specific image and location information of the home page referring to the image, and transmitting the image to the browser; determining, by referring to the location information, whether the home page is appearing on the Internet for the first time; and, if the home page is appearing for the first time, accessing the home page to collect search-related information, and otherwise incrementing the visit counter of the home page.

Preferably, the location information is the URL (Uniform Resource Locator) of the home page, and the location information is received through the referrer of the web server managing the home page.

In addition, when accessing the home page to collect search-related information, it is preferable to analyze the comment information included in the source file of the home page and extract information including the title, author, and creation date of the home page document.

According to another aspect of the present invention, the above technical objects are also achieved by a search data collection apparatus comprising: an image storage unit that, as a home page referring to a specific image is read and displayed by a user browser, receives a transfer request for the specific image and transmits the image to the browser; a URL information storage unit that stores search history information for at least one home page; a determination unit that determines whether the home page is appearing on the Internet for the first time by referring to the location information of the home page referring to the image and the URL search history stored in the URL information storage unit; and a data collection unit that, according to the determination, accesses the home page to collect search-related information if the home page is appearing for the first time, and otherwise increments the visit counter of the home page.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a reference diagram illustrating how information about a web page is exchanged between a web server and a search data collection server by inserting an image file into a web document.

Web servers #1 to #3 (110a to 110c) operate and manage web pages. For example, if web server #1 (110a) holds a web document written as an html source file (150), the document is displayed as the illustrated web page (160). The html source file (150) contains the location information of a specific image; for example, an image tag containing a URL (Uniform Resource Locator) (152) is inserted. Referring to FIG. 1, a tag such as <img src = "http://lastsnow.1noon.net/lastsnow.gif"> is inserted. That is, a file named lastsnow.gif is stored on the server with the domain name lastsnow.1noon.net, and this file is fetched to build the web page (160).

When the user's web browser (105) connects to web server #1 (110a) and reads the web page having the html source file (150), the web browser (105) interprets the html source file (150) and, at the very end, encounters <img src = "http://lastsnow.1noon.net/lastsnow.gif">. It therefore connects to the server with the domain name lastsnow.1noon.net, requests the "lastsnow.gif" file, and receives it. At the same time, the web browser (105) transmits, through the referrer, the URL that is the location information of the web document containing the image "lastsnow.gif" to the search data collection server (100). The search data collection server (100) then checks its own search database; if the URL is appearing for the first time, it accesses the URL and collects data. If the URL is not appearing for the first time, it increments the visitor counter of that URL by one. Thus, when a web page is created and first loaded into a user's web browser (105), this can be detected in real time so that search information for the web page is added to the search server, and the number of visitors to the web page can also be counted easily.
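The exchange described for FIG. 1 can be sketched as a small HTTP handler on the collection server side. This is only an illustration under stated assumptions: it uses Python's standard http.server module, a 1x1 GIF stands in for lastsnow.gif, and the names PIXEL_GIF, seen_referrers, page_url_from_headers, ImageRequestHandler and make_server are hypothetical and do not appear in the patent. The referrer is assumed to arrive in the HTTP request header that the protocol spells "Referer".

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# A 1x1 GIF standing in for lastsnow.gif (assumption: any image would do).
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# Hypothetical in-memory log of referring page URLs, in arrival order.
seen_referrers = []


def page_url_from_headers(headers):
    """Return the URL of the page that embedded the image, or None."""
    # HTTP spells the header name "Referer".
    value = headers.get("Referer")
    if value and value.strip():
        return value.strip()
    return None


class ImageRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        page_url = page_url_from_headers(self.headers)
        if page_url is not None:
            seen_referrers.append(page_url)
        # Always send the image so the embedding page renders normally.
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL_GIF)))
        self.end_headers()
        self.wfile.write(PIXEL_GIF)


def make_server(port=8000):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), ImageRequestHandler)
```

Calling make_server().serve_forever() would start listening. A browser may omit the Referer header, in which case this sketch serves the image without recording anything.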

FIG. 2 is a diagram illustrating an example of a web page including a confirmation image.

Referring to FIG. 2, a confirmation image (210) is included at the very end of the home page (200). The confirmation image (210) may be a visible image or may be made invisible.

FIG. 3 is a block diagram of a search data collection apparatus according to an embodiment of the present invention.

The search data collection apparatus (300) includes an image storage unit (302), a determination unit (304), a data collection unit (306), and a URL information storage unit (308). When a web page referring to a specific image stored in the search data collection apparatus (300) is read by the user's web browser (330), the browser connects to the search data collection apparatus (300), which stores the image, and receives the image from the image storage unit (302) in order to display it. The determination unit (304) receives the URL of the web server (310) through the referrer, as described above with reference to FIG. 1, and determines whether that URL is appearing for the first time. The URL information storage unit (308) stores the URL information that the search data collection apparatus (300) has visited and collected, together with the visitor counter for each URL. That is, it stores the history of URLs from which the search data collection apparatus (300) has collected data.

If the received URL is appearing for the first time, the determination unit (304) instructs the data collection unit (306) to access the URL and read the search data. The data collection unit (306) passes the collected search data to the search server (320), and a user who connects to the search server (320) enters a search term and receives the results. If the received URL has already been collected, the visit counter of that URL is incremented by one and stored in the URL information storage unit (308).
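The division of work among the units of FIG. 3 might be modeled as follows. This is a rough sketch, not the patented implementation: the class and method names, the in-memory dictionary used for the URL history, the list standing in for the search server (320), and the fetch_page callable are all assumptions made for illustration. Following the paragraph above, a URL appearing for the first time is collected and only later requests increment its counter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class UrlInfoStore:
    """URL information storage unit (308): collected URLs and visit counters."""
    visits: Dict[str, int] = field(default_factory=dict)


@dataclass
class SearchDataCollectionApparatus:
    """Apparatus (300) wiring units 302, 304, 306 and 308 together."""
    image: bytes                      # held by the image storage unit (302)
    fetch_page: Callable[[str], str]  # hypothetical page downloader
    store: UrlInfoStore = field(default_factory=UrlInfoStore)
    # Stands in for the search server (320): (url, collected data) pairs.
    search_server: List[Tuple[str, str]] = field(default_factory=list)

    def is_first_appearance(self, url: str) -> bool:
        """Determination unit (304): has this URL been collected before?"""
        return url not in self.store.visits

    def collect(self, url: str) -> None:
        """Data collection unit (306): fetch the page, pass it to the search server."""
        self.search_server.append((url, self.fetch_page(url)))
        self.store.visits[url] = 0  # record the URL in the collection history

    def on_image_request(self, referring_url: str) -> bytes:
        """Serve the image and update the history for the referring page."""
        if self.is_first_appearance(referring_url):
            self.collect(referring_url)
        else:
            self.store.visits[referring_url] += 1
        return self.image
```

With three requests from the same page, the page is fetched once and the counter ends at 2 under this reading of the description.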

FIG. 4 is a flowchart of a search data collection method according to an embodiment of the present invention.

When the search data collection apparatus (300) receives a request message for the specific image it stores, together with the URL information of the page that requested the image, delivered through the referrer (S410), it first transmits the image (S420). It then determines whether the URL is appearing for the first time (S430); if so, it accesses the URL to collect search data (S440) and increments the visit counter of the URL by one (S450). If the URL is not appearing for the first time, its data has already been collected and need not be collected again, so only the visit counter is incremented by one (S450).
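The flow of FIG. 4 can be traced step by step with a short function. This is a sketch under assumptions: handle_request, history and collect are hypothetical names, the history is a plain dictionary mapping URL to visit count, and the image transfer of S420 is only noted rather than performed. As in the flowchart text above, both branches end at S450.

```python
def handle_request(page_url, history, collect):
    """Trace steps S410 to S450 for one image request.

    page_url: URL reported through the referrer.
    history:  dict mapping URL -> visit counter (assumed storage format).
    collect:  callable that fetches search data for a URL (hypothetical).
    Returns the list of step labels that were executed.
    """
    steps = ["S410"]            # image request and referring URL received
    steps.append("S420")        # image transmitted to the browser
    steps.append("S430")        # first-appearance check
    if page_url not in history:
        collect(page_url)       # access the URL and collect search data
        history[page_url] = 0
        steps.append("S440")
    history[page_url] += 1      # visit counter incremented on both branches
    steps.append("S450")
    return steps
```

The first request for a URL passes through S440, and later requests skip it.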

The html source file (500) contains various tags. For example, the title of a web page can be expressed with a title tag such as <title>첫눈</title>. However, because the contents of such tags are often not filled in correctly when web pages are created, this information is difficult to rely on.

Therefore, referring to FIG. 5, information such as a title (510), an author (520), or a creation date (530) can be inserted into the source file of a web page of the present invention using comment tags, and the information collection server can extract attribute information of the web document by analyzing these comments. For example, the title is expressed by inserting a tag such as <!--/ Title: First Snow Co. /-->, the author by a tag such as <!--/ By: administrator /-->, and the creation date by a tag such as <!--/ Created: 20050719 /-->.
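Extracting such comment fields might look like the sketch below. It assumes the comments take the form <!--/ Name: value /--> (HTML comment delimiters with a slash just inside each end) and that the field names are Title, By and Created; both the exact delimiters and the field names are assumptions, since the source text is garbled at this point. COMMENT_FIELD and extract_comment_fields are hypothetical names.

```python
import re

# Matches comments of the assumed form <!--/ Name: value /-->.
COMMENT_FIELD = re.compile(r"<!--/\s*([^:/]+?)\s*:\s*(.*?)\s*/-->", re.DOTALL)


def extract_comment_fields(html):
    """Return a dict of lower-cased field name -> value found in the page source."""
    return {
        match.group(1).lower(): match.group(2)
        for match in COMMENT_FIELD.finditer(html)
    }


sample = """<html><body>
<!--/ Title: First Snow Co. /-->
<!--/ By: administrator /-->
<!--/ Created: 20050719 /-->
<!-- an ordinary comment is ignored -->
</body></html>"""

fields = extract_comment_fields(sample)
```

Here fields holds the title, author, and creation date under the keys "title", "by", and "created"; ordinary HTML comments without the inner slashes are skipped.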

Meanwhile, the search data collection method described above can be written as a computer program. The codes and code segments constituting the program can be easily derived by computer programmers skilled in the art. The program is stored in computer-readable media and implements the search data collection method by being read and executed by a computer. The information storage media include magnetic recording media, optical recording media, and carrier wave media.

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in a descriptive sense rather than a limiting one. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

As described above, according to the present invention, the appearance of a new web page is detected in real time so that search data can be collected, and the number of visits to the web page is counted and reported, so that search data can be updated in real time and web pages can be managed efficiently. That is, because the referrer is used to report automatically the URL that requested the image each time a web page containing the specific image is viewed, the number of times the document has been viewed is also known; this count can serve as a measure of the web page's reliability or of how much attention it is attracting, and information can be collected and updated in real time regardless of the complexity of the web page's URL path.

In addition, by adding comment fields to a web page, related information such as the title, author, and creation date of the web page can be identified more easily.

Claims

A search data collection method comprising:

receiving, as a home page referring to a specific image is read and displayed by a user browser, a transfer request for the specific image and location information of the home page referring to the image, and transmitting the image to the browser;

determining whether the home page is appearing on the Internet for the first time by referring to the location information; and

if the home page is appearing for the first time, accessing the home page to collect search-related information, and otherwise incrementing the visit counter of the home page.

The method of claim 1,

wherein the location information is a URL (Uniform Resource Locator) of the home page.

The method of claim 1,

wherein the location information is received through a referrer of a web server managing the home page.

The method of claim 1,

wherein, when accessing the home page to collect search-related information, information including the title, author, and creation date of the home page document is extracted by analyzing comment information included in the source file of the home page.

The method of claim 4,

wherein the comment is inserted into the home page source file in a tag format such as <!--/ Comment /-->.

A search data collection apparatus comprising:

an image storage unit that, as a home page referring to a specific image is read and displayed by a user browser, receives a transfer request for the specific image and transmits the image to the browser;

a URL information storage unit that stores search history information for at least one home page;

a determination unit that determines whether the home page is appearing on the Internet for the first time by referring to location information of the home page referring to the image and the URL search history stored in the URL information storage unit; and

a data collection unit that, according to the determination, accesses the home page to collect search-related information if the home page is appearing for the first time, and otherwise increments the visit counter of the home page.

The apparatus of claim 6, wherein the determination unit

receives the location information of the home page through a referrer of the web server managing the home page.

The apparatus of claim 6, wherein the data collection unit,

when accessing the home page to collect search-related information, extracts information including the title, author, and creation date of the home page document by analyzing comment information included in the source file of the home page.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 5 on a computer.