JP2009048380A

JP2009048380A - Search system, search apparatus and search method

Info

Publication number: JP2009048380A
Application number: JP2007213169A
Authority: JP
Inventors: Hideyuki Maekawa (前川英之)
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2007-08-17
Filing date: 2007-08-17
Publication date: 2009-03-05
Anticipated expiration: 2027-08-17
Also published as: JP4868245B2

Abstract

PROBLEM TO BE SOLVED: Existing search services leave even popular, frequently accessed sites out of their search scope when no other page links to them, because crawlers cannot reach them. In addition, crawlers must revisit popular pages frequently, since those pages are likely to be updated often, to keep the search service's index databases current.

SOLUTION: The access histories of many unspecified users are collected through toolbar devices, and aggregated information is built from them. Popular new sites can be extracted from this information and indexed quickly. Reflecting the aggregated information in the crawlers' patrol schedule yields a search service that closely matches how users actually browse.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a technique for efficiently searching for resources such as Web pages in a WWW (World Wide Web) search system.

An Internet user who wants to find resources on the WWW, such as Web pages, documents, or image data, usually turns to a Web search system. To answer searches based on a user's keywords, the search system builds an index of WWW resources in the following way.

It runs a program called a "crawler". The crawler follows hyperlinks between resources to traverse the WWW. For every resource the crawler visits, the system records its URL (Uniform Resource Locator) and information about its content. From this collected information, the system builds and stores index information about WWW resources.

When a user submits a search keyword or similar query, the search system consults this index information and returns the results to the user's terminal.
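The crawl-and-index loop described above can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes a hypothetical in-memory link graph (`WEB`, with made-up URLs) in place of real HTTP fetching, and a plain inverted index in place of a production search engine. It also shows, in passing, that a page nothing links to never enters the index.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing hyperlinks).
# A real crawler would fetch pages over HTTP; this keeps the sketch self-contained.
WEB = {
    "http://a.example/": ("news today", ["http://b.example/"]),
    "http://b.example/": ("weather today", ["http://a.example/"]),
    "http://orphan.example/": ("brand new page", []),  # no page links here
}


def crawl(seeds, web):
    """Breadth-first traversal that only reaches pages linked from the seeds."""
    seen, queue, fetched = set(seeds), deque(seeds), {}
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to fetch
        text, links = web[url]
        fetched[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def build_index(fetched):
    """Inverted index: lower-cased term -> set of URLs whose text contains it."""
    index = {}
    for url, text in fetched.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(url)
    return index


def search(index, keyword):
    """Look up one keyword and return matching URLs in a stable order."""
    return sorted(index.get(keyword.lower(), set()))


if __name__ == "__main__":
    idx = build_index(crawl(["http://a.example/"], WEB))
    print(search(idx, "today"))  # both linked pages
    print(search(idx, "new"))    # the orphan page was never reached
```

Breadth-first order is only one choice. Real crawlers usually drive the frontier from a priority queue, which is where the scheduling ideas discussed later come in.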

Resources on the WWW are routinely updated, and one defining feature of the Internet is that information (resources) becomes available immediately. Users who want the latest information therefore tend to focus their searches on frequently updated Web pages, such as current-affairs news pages.

To meet these needs, the search system should run two processes periodically, at intervals matched to how often each resource is updated, so that its information stays fresh:

- crawler-based collection of resource information;
- generation and storage of index information.

However, simply raising the crawlers' patrol frequency to keep the index fresh would congest traffic on the WWW. Techniques have therefore been proposed that identify the resources whose index entries should be updated frequently and raise the patrol priority of those resources. This keeps the index fresh without adding much load to WWW traffic.

Specifically, as noted above, Internet users tend to search for and access frequently updated Web pages. Put the other way round, a frequently accessed resource is likely to be frequently updated. So, for example, the system can count user accesses to each resource and use the counts to set the crawler's patrol priority for that resource. Patrolling high-priority resources more often then keeps the index fresh without greatly increasing the traffic load on the WWW.

JP 2006-277732 A
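The access-count idea above can be sketched as follows. This sketch rests on several assumptions:

- The access log is a flat list of accessed URLs.
- There is a fixed daily crawl budget.
- The budget is shared out in proportion to access counts.

All of these are illustrative choices; the method of the cited publication may differ. Every logged access counts equally here, which is the weakness this document goes on to raise.

```python
from collections import Counter


def crawl_allocation(access_log, daily_budget):
    """Split a daily crawl budget across URLs in proportion to raw access counts.

    Every accessed URL gets at least one visit. Because of rounding, the
    allocations may not sum exactly to daily_budget.
    """
    counts = Counter(access_log)
    total = sum(counts.values())
    return {
        url: max(1, round(daily_budget * n / total))
        for url, n in counts.items()
    }


if __name__ == "__main__":
    log = ["a"] * 6 + ["b"] * 3 + ["c"]
    print(crawl_allocation(log, daily_budget=20))
```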

However, conventional search systems share the following problems.

First, consider how the index is kept fresh. As described above, a crawler traverses the WWW by following hyperlinks between resources, so a resource can be indexed only if some other resource links to it. A newly published Web page, however, may not yet be linked from any other resource. The crawler cannot reach such a page, so the page is left out of the index. As a result, new pages, which should carry the freshest information of all, may be missing from the index file consulted at search time and so cannot appear in search results.

Second, conventional systems count user accesses to set the crawler's patrol priority, which keeps information fresh without much network load. The counted accesses, however, are not limited to deliberate accesses made by users. They may include, for example:

- accesses caused by clicking a link by mistake;
- accesses to pages visited only as waypoints while following links toward the intended Web page.

The aggregated counts may therefore not reflect effective access, that is, the number of deliberate accesses made because the user actually wanted to view the page. A crawler patrol schedule built from such inaccurate counts may fail to produce an accurate patrol priority.

Furthermore, the display ranking (order) of a search results list may be determined by PageRank, which is based on the number of links between pages. Methods that rank pages clicked by users higher are also known. As above, however, user clicks may not reflect effective access. A results ranking built from such inaccurate counts may therefore not be accurate in practice.

To address mainly the above problems, the present invention provides the following search system.

At a user's terminal or similar device, a toolbar device acquires browsing information from the browser, including the URL of each resource the user viewed, and sends it to a browsing management server device. The browsing management server device manages the browsing information from each terminal. A search device (or similar equipment) with an indexing function for searching WWW resources collects that information and makes use of it.

A key feature of this configuration is that the toolbar device itself acquires the browsing information, including the URLs of the resources the user viewed.

The search system uses the browsing information acquired via the toolbar device in two further ways:

- First, it can fetch for indexing the content of resources that no other page links to, and it can designate those resources as crawler patrol destinations.
- Second, it can calculate a traffic rank and use it in the crawler's patrol schedule. The traffic rank is an effective access index (viewing index) for each resource, computed after excluding erroneous accesses of the kind described above.

Specifically, the search system consists of a toolbar device and a search device:

- The toolbar device has a browsing information acquisition unit, which acquires from a browser browsing information including at least a browsed URL. It also has a browsing information transmission unit, which transmits the acquired browsing information to a predetermined browsing management server device.
- The search device collects the browsing information transmitted from the browsing information transmission unit.

To realize the first or second further feature, the search system of the invention may extend the above configuration as follows.

For the first feature, the search device may include a first browsing management server device with a first indexer unit. Based on the collected browsing information, the first indexer unit extracts new search target URLs, meaning URLs that the search engine will newly use as search targets. It then indexes the content at the extracted URLs.
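The extraction step of the first indexer unit can be sketched as a set difference: browsed URLs minus URLs already in the index. This sketch assumes a light normalization (lower-cased scheme and host, empty path treated as "/", fragment dropped) so that trivially different spellings are not treated as new. The normalization rules and the function names are hypothetical; the text does not specify them.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Hypothetical light normalization: lower-case scheme/host, default path, no fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def extract_new_targets(browsed_urls, indexed_urls):
    """Return browsed URLs that are not yet indexed, de-duplicated, in first-seen order."""
    known = {normalize(u) for u in indexed_urls}
    new_targets, seen = [], set()
    for url in browsed_urls:
        n = normalize(url)
        if n not in known and n not in seen:
            seen.add(n)
            new_targets.append(n)
    return new_targets


if __name__ == "__main__":
    indexed = ["http://a.example/"]
    browsed = ["HTTP://A.example", "http://new.example/page#top", "http://new.example/page"]
    print(extract_new_targets(browsed, indexed))
```

The returned URLs could then be fetched and indexed directly, or added to the crawler's frontier. A real system might also require that several distinct users viewed a URL before accepting it, but that threshold is an added assumption.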

For the second feature, the search device may include a second browsing management server device with two units:

- a browsing information storage unit, which stores the collected browsing information;
- a traffic rank scoring unit, which calculates a traffic rank (a viewing index) for each URL from the stored browsing information.

The search device may additionally include a crawler unit and a schedule determination unit. The schedule determination unit sets the crawler unit's crawling schedule based on the traffic ranks from the traffic rank scoring unit.
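The text does not give a formula for the traffic rank, so the following is only a minimal sketch under assumed rules:

- A view counts only if its browsing time reaches a hypothetical threshold (`MIN_DWELL_SECONDS`). This is meant to drop misclicks and waypoint visits.
- Each user is counted at most once per URL.
- The schedule determination step simply crawls the highest-scoring URLs first.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_DWELL_SECONDS = 10.0  # hypothetical threshold, not taken from the patent


@dataclass
class View:
    user_id: str
    url: str
    dwell_seconds: float  # browsing time for this view


def traffic_rank(views, min_dwell=MIN_DWELL_SECONDS):
    """Score per URL = number of distinct users whose view lasted >= min_dwell seconds."""
    users_per_url = defaultdict(set)
    for v in views:
        if v.dwell_seconds >= min_dwell:
            users_per_url[v.url].add(v.user_id)
    return {url: len(users) for url, users in users_per_url.items()}


def crawl_schedule(scores, slots):
    """Pick the URLs to crawl this cycle: highest score first, ties broken by URL."""
    return sorted(scores, key=lambda url: (-scores[url], url))[:slots]


if __name__ == "__main__":
    views = [
        View("u1", "a", 30), View("u2", "a", 45), View("u1", "a", 20),
        View("u1", "b", 2), View("u3", "b", 1),  # short views, likely misclicks
        View("u2", "c", 60),
    ]
    scores = traffic_rank(views)
    print(scores, crawl_schedule(scores, slots=1))
```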

With the first aspect of the invention configured as above, the browser sends browsing information via the toolbar device, including the URLs of the resources the user viewed. The search device collects this information via the browsing management server device and can use it in many ways.

First, the content of resources that no other page links to can be retrieved and indexed into the search index file. Such resources can also be designated as crawler patrol destinations. Newly published Web pages and similar resources, which are likely to carry very fresh information, can thus be fetched by the crawler and indexed as search targets.

Second, the system can calculate a traffic rank and use it in the crawler's patrol schedule. The traffic rank is an access index (viewing index) for each resource, computed after excluding erroneous accesses and similar noise. Crawler patrols can therefore be scheduled according to users' effective access to each resource.

Embodiments of the present invention are described below with reference to the drawings. The invention is not limited to these embodiments and can be implemented in various forms without departing from its spirit.

The embodiments mainly describe the following claims:

- Embodiment 1: claim 1 and related claims.
- Embodiment 2: claims 2, 7, 12, and related claims.
- Embodiment 3: claims 3, 8, 13, and related claims.
- Embodiment 4: claims 4, 9, 14, and related claims.
- Embodiment 5: claims 5, 10, 15, and related claims.
- Embodiment 6: claims 6, 11, 16, and related claims.

≪Embodiment 1≫

<Overview>

FIG. 1 is a conceptual diagram illustrating an example of how browsing information is collected in the search system of this embodiment. As shown, a browser is running on client terminal 1. A toolbar is built into the browser and is displayed in a predetermined area (0101) of the browser when the browser starts.

In this embodiment, the toolbar program collects the search keys and URLs that the user enters into the browser or toolbar. The toolbar also acquires:

- how long each Web page is displayed in the browser;
- information indicating the page transition path, such as the destination and source pages.

The information acquired in this way at each client terminal 1, 2, 3, ... on the network is sent to the search device (0102), which includes the browsing management server device.

FIG. 2 shows an example of the browsing information that the client terminal's toolbar acquires and the search device collects via the browsing management server device. As shown, the toolbar acquires the following as browsing information:

- the user ID;
- the URL of the Web page accessed in the browser;
- the browsing start time (access time) of that page;
- the browsing time (how long the page was displayed);
- information, not shown in the figure, indicating the Web page transition path in terms of destination and source pages.

The search device uses the browsing information collected from each client terminal via the browsing management server device. For example, it can analyze users' Web browsing behavior and, from that analysis, quickly identify sites that are popular with users. This can be put to various uses. For instance, when a crawler collects search index information, popular sites can be crawled more often so that searches return their latest content.

<Functional configuration>

FIG. 3 shows an example of the functional blocks of the search system of this embodiment. As shown, the "search system" (0300) of this embodiment consists of a "toolbar device" (0310) and a "search device" (0320).

The "toolbar device" (0310) is a device that incorporates an application designed to run alongside the browser on a client terminal. Its purpose is to assist, extend, or substitute for browser functions. Typical toolbar functions include:

- searching Web resources using a keyword entered in a search box;
- collecting and displaying RSS documents;
- a launcher with shortcut icons.

In the search system of this embodiment, the toolbar device has one additional function, provided by the configuration described below. It acquires browsing information, including the URLs viewed in the browser, and sends it to the browsing management server device.

The "search device" (0320) is a server device with two main roles. As described above, it takes the Web resource information collected by the crawler and generates index information through processing by a predetermined search engine. It then searches that index information in response to search keywords and similar queries from users.

In the search system of this embodiment, the search device also collects the browsing information that client terminals' toolbar devices acquire and send to the browsing management server device. It uses this information in search processing, index generation, and similar tasks.

This embodiment is described using the configuration of FIG. 3(a), in which the browsing management server device is included in the search device. The search system of this embodiment is of course not limited to that configuration. For example, as shown in FIG. 3(b), the browsing management server device may be a separate server device on the network, and the search device may acquire the browsing information that it manages.

First, this section describes the functional configuration of the "toolbar device" (0310), which together with the search device makes up the search system of this embodiment. As shown in FIG. 3, the toolbar device has a "browsing information acquisition unit" (0311) and a "browsing information transmission unit" (0312).

Each functional block of the toolbar device and search device described below can be implemented in hardware, software, or both. When a computer is used, the relevant components include:

- a CPU, main memory, and a bus;
- secondary storage, such as a hard disk, non-volatile memory, or storage media such as CDs or DVDs with drives for reading them;
- printers, display devices, and other external peripherals, together with I/O ports for those peripherals;
- driver programs and other application programs for controlling that hardware;
- user interfaces used for entering information.

The CPU executes a program loaded into main memory. Its operations process and store data that arrives via the interface and is held in memory or on the hard disk, and they generate instructions for controlling the hardware and software above.

The present invention can be realized not only as a system but also as a method, and part of the invention can be implemented as software. Software products used to make a computer execute such software, and recording media on which such products are recorded, naturally also fall within the technical scope of the invention (the same applies throughout this specification).

The "browsing information acquisition unit" (0311) acquires from the browser browsing information that includes at least the browsed URL. The "browsed URL" is information giving the Web location of the resource displayed in the browser of the client terminal into which the toolbar device is built, for example "http://.../001.bmp".

The search system of this embodiment is characterized by performing various processes with these browsed URLs. The URLs are acquired by the toolbar of each client terminal on the network and collected by the search device via the browsing management server device. One such process is deciding which resources the crawler should patrol.

Besides the browsed URL, the browsing information may include other information related to browsing, for example:

- a user ID, browser ID, or toolbar ID;
- browsing start time information;
- browsing time information;
- transition information, such as the "source resource" and "destination resource" when two or more Web resources are viewed in succession.
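One possible in-memory shape for a browsing information record, covering the fields listed above, is sketched below. Only the browsed URL is mandatory; every other field is optional. The class name, the field names, the ISO 8601 time format, and the JSON encoding are assumptions for illustration. The text does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BrowsingRecord:
    """Hypothetical browsing-information record; only `url` is required."""
    url: str                                # browsed URL
    user_id: Optional[str] = None
    browser_id: Optional[str] = None
    toolbar_id: Optional[str] = None
    start_time: Optional[str] = None        # browsing start time, e.g. ISO 8601
    browse_seconds: Optional[float] = None  # browsing time
    source_url: Optional[str] = None        # "source resource" of a transition
    destination_url: Optional[str] = None   # "destination resource" of a transition


def to_json(record):
    """Serialize a record, omitting fields that were not collected."""
    present = {k: v for k, v in asdict(record).items() if v is not None}
    return json.dumps(present, sort_keys=True)


if __name__ == "__main__":
    rec = BrowsingRecord(url="http://a.example/", user_id="u1", browse_seconds=12.5)
    print(to_json(rec))
```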

The user ID identifies the user. Examples are the ID used to log in to the client terminal or an ID assigned per service.

Likewise, the browser ID and toolbar ID identify the browser and the toolbar. For example, a product ID recorded in advance in the browser program or toolbar program can be used.

Because these identifiers let the system identify each client terminal, browsing information can be managed and analyzed per client terminal (user). The IDs can be held, for example, by the toolbar device itself or in the browser's cookie storage.

The browsing start time information is one of the following: the time at which the browser accessed a Web resource, or the time at which the browser received or displayed the resource as a result of that access. Concrete examples are the time at which the browser detects that the requested resource has finished loading, or the time at which the Web server responded with an HTTP status code such as "200 OK".

The browsing time information is the length of time a Web resource is displayed in the browser. It can be measured in several ways, for example as:

- the time the browser's UI window was open;
- the time the window was active;
- the time the mouse pointer was inside the window area.

The client terminal's OS may log off automatically, or a screen saver may start, after a period with no activity. In that case, the browsing time may be the time remaining after excluding the logged-off period or the screen-saver period.

The browsing time information may also be obtained as the difference between the browsing start time and the browsing end time.
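The exclusion rule above can be sketched as follows. The display interval minus its overlap with logged-off or screen-saver periods gives the effective browsing time. This sketch assumes plain numeric timestamps in seconds and half-open intervals. It merges the excluded intervals first so that overlapping exclusions are not subtracted twice. The function names are hypothetical.

```python
def _merge(intervals):
    """Merge overlapping (start, end) intervals so overlaps are not double-counted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def effective_browse_seconds(start, end, excluded):
    """Display time [start, end) minus any overlap with excluded idle intervals."""
    total = end - start
    for ex_start, ex_end in _merge(excluded):
        overlap = min(end, ex_end) - max(start, ex_start)
        if overlap > 0:
            total -= overlap
    return total


if __name__ == "__main__":
    # Page shown for 600 s; screen saver from 100-250 s (two overlapping reports)
    # and a log-off that began at 550 s and outlasted the page display.
    print(effective_browse_seconds(0, 600, [(100, 200), (150, 250), (550, 700)]))
```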

The transition information indicates the "source resource" and "destination resource" when two or more Web resources are viewed in succession. Examples include:

- transitions made by clicking a hyperlink;
- transitions between two or more URLs entered in succession in the URL field of the browser or toolbar, treated as "source" and "destination".
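Transition information of this kind can be derived from one user's time-ordered URL history by pairing each page with the next, as sketched below. The pairs can then be counted across users. Skipping consecutive repeats such as reloads is an added assumption, and the function names are hypothetical.

```python
from collections import Counter


def transitions(history):
    """(source, destination) pairs from one user's time-ordered URL history.

    Consecutive repeats (e.g. a reload) are not treated as transitions.
    """
    return [(src, dst) for src, dst in zip(history, history[1:]) if src != dst]


def aggregate_transitions(histories):
    """Count each (source, destination) pair across many users' histories."""
    counts = Counter()
    for history in histories:
        counts.update(transitions(history))
    return counts


if __name__ == "__main__":
    print(transitions(["a", "b", "b", "c"]))
    print(aggregate_transitions([["a", "b", "c"], ["a", "b"]]))
```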

The toolbar device acquires this browsing information from the client terminal's browser using, for example, an API (Application Programming Interface), DDE (Dynamic Data Exchange), or the file system.

The "browsing information transmission unit" (0312) transmits the acquired browsing information to a predetermined browsing management server device. Specifically, it uses the client terminal's communication I/F to send the information over a communication line or similar link. The destination address of the browsing management server device can be specified, for example, by writing it into the toolbar device's program in advance.

The browsing information may be transmitted at any of these times:

- each time the resource displayed in the browser changes;
- in batches, after a fixed interval or at a set time;
- when the toolbar starts or exits.
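The transmission timings above can be combined in a small buffering sender, sketched here under several assumptions:

- `transport` is a hypothetical callable standing in for the actual send to the browsing management server, whose protocol and endpoint the text does not give.
- Setting `max_batch=1` gives send-on-every-page-change behavior.
- `max_age_seconds` approximates interval-based batching.
- `flush()` should be called when the toolbar exits.

The age check runs only when a new record arrives. A real toolbar would probably also use a timer.

```python
import time


class BrowsingInfoSender:
    """Buffers browsing records and sends them in batches via a caller-supplied transport."""

    def __init__(self, transport, max_batch=20, max_age_seconds=300.0, clock=time.monotonic):
        self.transport = transport          # hypothetical: e.g. wraps an HTTP POST
        self.max_batch = max_batch
        self.max_age = max_age_seconds
        self.clock = clock                  # injectable for testing
        self.buffer = []
        self.last_flush = clock()

    def record(self, rec):
        """Queue one record; flush if the batch is full or the buffer is old enough."""
        self.buffer.append(rec)
        too_many = len(self.buffer) >= self.max_batch
        too_old = self.clock() - self.last_flush >= self.max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Send whatever is buffered; also call this on toolbar shutdown."""
        if self.buffer:
            self.transport(list(self.buffer))
            self.buffer.clear()
        self.last_flush = self.clock()


if __name__ == "__main__":
    sent = []
    sender = BrowsingInfoSender(sent.append, max_batch=2)
    sender.record({"url": "http://a.example/"})
    sender.record({"url": "http://b.example/"})
    print(sent)
```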

In this way, the toolbar device acquires browsing information as the user browses on each client terminal and sends it to the browsing management server device. As described later, the search device then collects this information via the browsing management server device. This makes it easy for the search device to track browsing histories, such as which resources each client terminal has viewed.

Next, this section describes the functional configuration of the "search device" (0320), which together with the toolbar device makes up the search system of this embodiment, again with reference to FIG. 3. As shown in FIG. 3, the search device includes a "browsing management server device" (0330).

The "browsing management server device" (0330) receives and manages the browsing information sent from the browsing information transmission unit (0312). Using this information, it can, for example, analyze users' Web browsing behavior and, from that analysis, quickly identify sites that are popular with users. This supports various processes when the search device's crawler collects search index information. For example, popular sites can be crawled more often so that searches return their latest content.

The server can also support two further processes:

- As described in detail in Embodiment 2, it may take the browsed URLs in the browsing information that it does not yet manage, treat them as URLs of resources that no other page links to, and newly designate them as crawler patrol destinations.
- As described in detail in Embodiment 3, it may calculate a traffic rank from information such as the browsing time in the browsing information, and use it in the crawler's patrol schedule. The traffic rank is an effective access index (viewing index) for each resource, computed after excluding erroneous accesses and similar noise.

As noted above, the browsing management server device may also exist on the network as a server device separate from the search device. In that case, as shown in FIG. 3(b), the "search device" may be configured to collect the browsing information managed by the browsing management server device and perform the various processes described above.

<Hardware configuration>

FIG. 4 is a schematic diagram showing an example configuration of the search system when each of the functional components above is implemented as hardware. The operation of each hardware component in the search system of this embodiment will be described with reference to this figure. As shown, the hardware configuration of the search system of this embodiment consists of a "toolbar device" (0410) incorporated in a client terminal and a "search device" (0420) placed on the network as a server device. This example describes a configuration in which a "browsing management server program" implementing the browsing management server device runs on the search device, that is, a configuration in which the search device includes the browsing management server device.

The toolbar device and the search device are connected to each other via a telecommunication line, over which they exchange information. The telecommunication line includes the Internet.

The "toolbar device" includes a "CPU (central processing unit)" (0411), which implements the browsing information acquisition unit and performs various other arithmetic processing, a "RAM" (0412), and a "communication I/F" (0413), which serves as the browsing information transmission unit. It also includes a "UI" (user interface) (0414), i.e., input devices such as a keyboard and mouse, a "VRAM" (0415) for displaying resources processed by the browser program, and a "display device" (0416) such as a display. These components are interconnected by a data communication path such as a "system bus", over which they exchange and process information.

The "RAM" of the toolbar device stores a browser program and a toolbar program. According to these programs, the "CPU" executes the arithmetic processing for acquiring and transmitting browsing information, as well as other processing. The toolbar program also reserves an area at a predetermined address in RAM for storing the browsing URL.

The "search device", in turn, has a "CPU" (0421) that performs various arithmetic processing, a "RAM" (0422), a "communication I/F" (0423), and a "secondary storage device" (0424), such as a hard disk, for storing large amounts of browsing information. These components are interconnected by a data communication path such as a "system bus", over which they exchange and process information.

The "RAM" of the search device stores the browsing management server program, according to which the "CPU" executes the arithmetic processing for managing browsing information and other processing.

The "toolbar device" accepts the user's operation input via the "UI", and the following processing is executed according to the browser program and the toolbar program in "RAM". When an instruction to access a resource on the Web is entered through a browser operation via the "UI", the URL of the resource designated by the instruction is stored at a predetermined address in "RAM". The browser program then sends an HTTP request to the designated URL, and the "communication I/F" receives the response to that request as the content of the accessed resource. Next, the HTML files, image files, and other files contained in the received content are rendered by the "CPU" according to the HTML rendering engine or similar component of the browser program. The rendering result is transferred to the "VRAM", and the Web content is displayed on the "display device". The user then views the displayed resource, such as a Web page on the WWW.

At the same time, in response to the access instruction entered in the browser, the toolbar program transmits the URL stored at the predetermined address in "RAM" as browsing information from the "communication I/F" to the "search device" at each of the predetermined timings described above.

The toolbar program may also acquire a user ID, browsing time information, and the like as follows, and transmit them as browsing information in the same way as the URL. First, when the toolbar program starts, it obtains a user ID from the Internet cache or a similar source and stores it at a predetermined address in "RAM". When the browser program finishes receiving the HTTP response, the toolbar program obtains the current time from an OS timer function or the like and stores it at a predetermined address in "RAM". Alternatively, it monitors how long the browser window remains active for a given URL and stores that duration as the browsing time at a predetermined address in "RAM". The information stored in "RAM" is then transmitted from the "communication I/F" to the "search device" at each of the predetermined timings described above.
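The active-window timing described above can be sketched as follows, assuming the browser notifies the toolbar whenever a window for some URL becomes active or inactive. `DwellTimer`, `activate` and `deactivate` are hypothetical names, and timestamps are passed in explicitly (in practice they might come from an OS timer such as Python's `time.monotonic()`) so the sketch is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DwellTimer:
    """Hypothetical accumulator of active-window time (seconds) per URL."""

    active_url: Optional[str] = None
    started_at: float = 0.0
    totals: Dict[str, float] = field(default_factory=dict)

    def activate(self, url: str, now: float) -> None:
        # A window for `url` became active: close out whatever was active before.
        self.deactivate(now)
        self.active_url = url
        self.started_at = now

    def deactivate(self, now: float) -> None:
        # The active window lost focus (or closed): add its elapsed time to the total.
        if self.active_url is not None:
            elapsed = now - self.started_at
            self.totals[self.active_url] = self.totals.get(self.active_url, 0.0) + elapsed
            self.active_url = None
```

For example, activating `http://b.example/` at t=0, switching to `http://a.example/` at t=2 and deactivating at t=62 yields totals of 2.0 and 60.0 seconds respectively; such totals are what would be sent as browsing time.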

The "search device" receives, through its "communication I/F", the browsing information sent from each toolbar device and stores it at a predetermined address in "RAM". Then, according to the browsing management server program, the "CPU" generates, for example, a browsing history database keyed on parameters such as the URL and user ID contained in the browsing information, and stores it in a "secondary storage device" such as an HDD.

Using this database, the "CPU" then totals, for example, the number of accesses for each browsing URL. It then refers to an "access count–patrol frequency table" or the like and determines the crawler's patrol frequencies in descending order of the totals, so that the most-accessed URLs are crawled most often.
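A minimal sketch of the browsing history database and the access-count-to-frequency lookup, assuming an in-memory SQLite table with only `url` and `user_id` columns. The table name, the thresholds in `FREQUENCY_TABLE` and the "crawls per week" unit are illustrative assumptions, not values from the source.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical "access count -> patrol frequency" table: (minimum accesses, crawls per week),
# checked from the highest threshold down. The values are illustrative only.
FREQUENCY_TABLE: List[Tuple[int, int]] = [(1000, 168), (100, 14), (10, 7), (0, 1)]


def build_history(rows: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Store (url, user_id) browsing records in an in-memory browsing history database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE browsing_history (url TEXT NOT NULL, user_id TEXT)")
    conn.executemany("INSERT INTO browsing_history (url, user_id) VALUES (?, ?)", rows)
    return conn


def crawl_frequencies(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Total accesses per URL and map each total to a patrol frequency, highest total first."""
    totals = conn.execute(
        "SELECT url, COUNT(*) AS n FROM browsing_history GROUP BY url ORDER BY n DESC, url"
    ).fetchall()
    return [
        (url, next(freq for threshold, freq in FREQUENCY_TABLE if n >= threshold))
        for url, n in totals
    ]
```

Because the table ends with a `(0, …)` row, every URL that appears in the history receives at least a baseline frequency.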

<Process flow>

FIG. 5 is a flowchart showing an example of the processing flow in the search system of this embodiment. The steps below may also be processing steps of a program, recorded on a medium, for controlling a computer.

As shown, when the toolbar is launched on the toolbar device, browsing information including a URL is first acquired (step S0501). Next, it is determined whether the browsing URL has been updated (changed) by the user's browser operation (step S0502). If it has, information including the browsing URL is sent as browsing information to the search device, which includes the browsing management server device (step S0503).

After transmitting the browsing information (step S0503), the toolbar device determines whether browsing has ended (step S0504). If browsing has ended, for example because the browser was closed, the process ends. Otherwise, the process returns to acquiring browsing information including the URL (step S0501).
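Steps S0501 to S0504 can be sketched as the loop below. It assumes that polling is simulated by an iterable of currently displayed URLs, with `None` marking the end of browsing, and that a caller-supplied `send` function stands in for transmission to the search device; both are hypothetical. The source does not say what happens when the URL is unchanged, so here the loop simply proceeds to the end-of-browsing check.

```python
from typing import Callable, Iterable, Optional


def toolbar_loop(observations: Iterable[Optional[str]], send: Callable[[dict], None]) -> None:
    """Sketch of steps S0501-S0504 of FIG. 5.

    Each item of `observations` is the URL currently shown in the browser;
    None means browsing has ended (for example, the browser was closed).
    """
    last_url: Optional[str] = None
    for current_url in observations:                             # S0501: acquire browsing info
        if current_url is not None and current_url != last_url:  # S0502: URL changed?
            send({"url": current_url})                           # S0503: transmit
            last_url = current_url
        if current_url is None:                                  # S0504: browsing ended?
            break
```

Sending only on a URL change keeps repeated polls of the same page from generating duplicate browsing information.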

The search device receives the browsing information including the URL sent by the toolbar device in step S0503 (step S0511) and stores it on a recording medium or the like (step S0512). It then builds and accumulates a database from the browsing information obtained from each toolbar device and uses it for various kinds of processing.

Of course, as noted above, the browsing information may instead be sent in step S0503 to a browsing management server device separate from the search device. In that case, a step in which the search device collects the browsing information received by the browsing management server device should be added.

<Brief description of effects>

As described above, the search system of this embodiment can acquire a user's browsing information using the toolbar device of the client terminal and transmit it to the search device via the browsing management server device. The search device can then readily collect browsing information sent from many toolbar devices on the network and build a database or the like from it.

Using the constructed database, the search device can then, for example, total the number of accesses for each browsing URL and determine crawler patrol frequencies in descending order of the totals.

<< Example 2 >>

<Overview>

FIG. 6 is a conceptual diagram explaining an example of processing in the search system of this embodiment. As shown, user A uses a client terminal (0601) equipped with a toolbar device to access Web site A (0602), which a friend has just opened. Because new Web site A has only just been created, no other resource on the Web links to it. It is therefore not a patrol target for the "crawler" and does not yet exist in the search service index file (0604) of the search device (0603).

In the search system of this embodiment, however, the client terminal's access to Web site A causes browsing information including that URL to be sent to the search device. The search device compares the received URL with the URLs contained in the search service index file. If the URL is not in the index file, the search device retrieves the content and adds it to the index file. In addition, it sets the URL as a target that the "crawler" visits directly, rather than by following hyperlinks.

<Functional configuration>

FIG. 7 is a diagram showing an example of functional blocks of the browsing management server device in this embodiment. As shown, the "search system" (0700) of this embodiment is based on the configuration of Example 1 and has a "toolbar device" (0710) and a "search device" (0720). As in Example 1, the "toolbar device" has a "browsing information acquisition unit" (0711) and a "browsing information transmission unit" (0712). The functional configurations of the toolbar device and the search device have already been described in detail in Example 1, so that description is omitted here.

This embodiment is characterized in that the "search device" has a "first browsing management server device" (0730), which in turn has a "first indexer unit" (0731).

The "first indexer unit" (0731) has a function of extracting new search target URLs from the collected browsing information and indexing the content at the extracted URLs.

A "new search target URL" is a URL newly made available as a search target in the search engine. Specifically, the search device builds and holds an index file that compiles information about WWW resources collected by the patrolling "crawler", and the search engine consults this index file when performing a search. URLs not contained in the index file are therefore never returned as search results. Accordingly, the search device takes the difference between the URLs contained in the browsing information and the URLs stored in the index file, and extracts the URLs in that difference (those present in the browsing information but absent from the index file) as new search target URLs.
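The difference operation can be sketched as a set lookup. This assumes URLs have already been normalized to a canonical form (the source does not specify any normalization), and `new_search_target_urls` is a hypothetical name.

```python
from typing import Iterable, List, Set


def new_search_target_urls(browsed_urls: Iterable[str], indexed_urls: Set[str]) -> List[str]:
    """Return browsed URLs that are absent from the index, in first-seen order, without duplicates."""
    seen: Set[str] = set()
    result: List[str] = []
    for url in browsed_urls:
        if url not in indexed_urls and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Keeping the index URLs in a set makes each membership check constant time on average, which matters when browsing information arrives from many toolbar devices.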

When the new search target URLs extracted in this way, together with the content of the resources at those URLs, are indexed (added to the index file) by the search device, the search system of this embodiment can make searchable even WWW resources that were not previously search targets, such as newly opened Web pages. Here, "indexing" specifically means reorganizing data into a data structure or file structure that lets the search engine find target keywords quickly. One example is to use the extracted keywords as keys and newly attach an index built from them as a hash table.
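As a sketch of the hash-table approach, the dictionary below maps each keyword to the set of URLs containing it (an inverted index). The regex tokenizer is an assumption for illustration only; Japanese text would normally require a morphological analyzer to segment words, and `InvertedIndex` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    # Naive tokenizer: lowercase runs of word characters. Not suitable for unsegmented Japanese.
    return set(re.findall(r"\w+", text.lower()))


class InvertedIndex:
    """Hash table from keyword to the set of URLs whose content contains it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, content: str) -> None:
        # Indexing: register the URL under every keyword extracted from its content.
        for keyword in tokenize(content):
            self.postings[keyword].add(url)

    def search(self, keyword: str) -> Set[str]:
        # Lookup is a single hash access; .get avoids creating empty entries.
        return set(self.postings.get(keyword.lower(), set()))
```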

In addition to the indexing described above, the search device of this embodiment may also add the extracted new search target URLs as targets that the "crawler" visits directly, without following hyperlinks, and thereby obtain update information for them.

<Hardware configuration>

FIG. 8 is a schematic diagram showing an example configuration of the search system when each of the functional components above is implemented as hardware. The operation of each hardware component in the search system of this embodiment will be described with reference to this figure. As shown, the hardware configuration of the search system of this embodiment consists of a toolbar device (0810) and a search device (0820), which can connect to each other via a telecommunication line including the Internet.

As in Example 1, the "toolbar device" (0810) has a "CPU" (0811), a "RAM" (0812), a "communication I/F" (0813), a "UI" (0814), a "VRAM" (0815), and a "display device (such as a display)" (0816).

Likewise, as in Example 1, the "search device" (0820) has a "CPU" (0821), a "RAM" (0822), a "communication I/F" (0823), and a "secondary storage device" (0824).

As explained in Example 1, browsing information including browsing URLs is sent to the search device by the toolbar programs running on multiple toolbar devices on the network, and is stored at a predetermined address in "RAM".

The search device executes the following processing according to the first indexer program included in the first browsing management server program. First, the "CPU" searches the search service index file stored in advance in the "secondary storage device", using the browsing URL stored at the predetermined address in "RAM" as the key. If the search reports that the index file does not contain the URL, the URL is judged to be a new search target URL. Then, according to the first indexer program, the "CPU" issues an instruction to access that URL, stored in "RAM", via the communication I/F (0823) and retrieve the resource content. The "CPU" then indexes the retrieved content and records it as new index information in the index file held in the "secondary storage device".

Also according to the first indexer program, the URL stored in "RAM" may be incorporated into the schedule as a crawler patrol target from the next patrol onward. Furthermore, on the assumption that the resource at that URL has high information freshness, a schedule with an increased crawler patrol frequency for it may be generated.

<Process flow>

FIG. 9 is a flowchart showing an example of the processing flow in the search system of this embodiment. The steps below may also be processing steps of a program, recorded on a medium, for controlling a computer.

When the toolbar is launched on the toolbar device, browsing information including a URL is first acquired (step S0901). Next, it is determined whether the browsing URL has been updated (changed) by the user's browser operation (step S0902). If it has, information including the browsing URL is sent as browsing information to the search device, which includes the first browsing management server device (step S0903).

Next, the search device receives and collects browsing information containing n URLs sent from the client terminals on the network (step S0911), and stores it on a recording medium or the like (step S0912). It then enters loop 1 and, for each of the n stored URLs, consults the search device's index file (database) to determine whether an identical URL already exists (step S0914).

For each URL that this check finds absent from the index file, the resource content is retrieved and indexed as a new search target URL (step S0915). For each URL found to already exist in the index file, indexing is skipped. When the check has been completed for all URLs indicated by the browsing information, the process ends.
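Loop 1 of FIG. 9 might look like the following sketch. The `fetch` and `index_content` callbacks are hypothetical stand-ins for HTTP retrieval and index-file updates, and appending to `direct_crawl_targets` illustrates the optional step of visiting the URL directly on later patrols. Adding each processed URL to `indexed_urls` keeps duplicates within one batch of n URLs from being fetched twice.

```python
from typing import Callable, Iterable, List, Set


def process_browsed_urls(
    browsed_urls: Iterable[str],
    indexed_urls: Set[str],
    fetch: Callable[[str], str],
    index_content: Callable[[str, str], None],
    direct_crawl_targets: List[str],
) -> int:
    """Loop 1 of FIG. 9 (S0914-S0915): index each browsed URL not yet in the index.

    Returns the number of URLs newly indexed.
    """
    added = 0
    for url in browsed_urls:
        if url in indexed_urls:            # S0914: identical URL already indexed -> skip
            continue
        content = fetch(url)               # S0915: retrieve the resource content...
        index_content(url, content)        # ...and index it
        indexed_urls.add(url)
        direct_crawl_targets.append(url)   # optional: visit directly on later patrols
        added += 1
    return added
```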

<Brief description of effects>

As described above, the search system of this embodiment can add to the search index file even WWW resources, such as newly opened Web pages, that conventional crawlers never reached because no hyperlinks point to them.

<< Example 3 >>

<Overview>

This embodiment, based on the search system of Example 1, collects browsing information from the toolbar device of each client terminal on the network. It is a search system characterized by using that browsing information to calculate a "traffic rank", an index of how much a WWW resource is viewed that is more accurate than conventional audience indexes.

FIG. 10 is a diagram explaining an example of processing in the search device of this embodiment. As shown, a client terminal equipped with a toolbar device accesses site A by way of site B. The access time for site B is short, because site B was merely passed through via a link. By contrast, the access time for site A, which the user actually meant to browse, is long.

When browsing information including such access (browsing) times is sent to the search device, the search device calculates the "traffic rank" from the access time and other information contained in the browsing information. As described below, this traffic rank excludes link-through and accidental accesses and therefore reflects actual browsing, so it is more accurate than conventional audience indexes for sites (resources on the WWW).

<Functional configuration>

FIG. 11 is a diagram showing an example of functional blocks in the search device of this embodiment. As shown, the "search system" (1100) of this embodiment is based on the configuration of Example 1 and has a "toolbar device" (1110) and a "search device" (1120). As in Example 1, the "toolbar device" (1110) has a "browsing information transmission unit" (1111) and a "browsing information acquisition unit" (1112). These components have already been described in Example 1, so their description is omitted here.

This embodiment is characterized in that the "search device" (1120) has a "second browsing management server device" (1130), which in turn has a "browsing information storage unit" (1131) and a "traffic rank scoring unit" (1132).

The "browsing information storage unit" (1131) has a function of storing the collected browsing information. Specifically, it receives as input the browsing information acquired by the toolbar devices and collected by the search device (for example, via the browsing management server device) and stores it on a storage medium such as an HDD or nonvolatile memory. For the traffic rank calculation performed by the traffic rank scoring unit described below, the stored browsing information preferably includes, for example, the browsing (access) time for each resource, a user ID (client terminal ID or toolbar device ID), and resource transition information.

The "traffic rank scoring unit" (1132) has a function of calculating a traffic rank for each URL from the browsing information stored in the browsing information storage unit (1131). The "traffic rank" is an audience index of a resource on the WWW; one way to calculate it is as follows.

One example of a traffic rank calculation method is to take, as the rank of Web page A, the total display time of Web page A within one day expressed as a percentage of the total display time of all Web pages in that day. Calculating Web page ranks this way gives a low rank to accesses with short display times, such as mis-clicks. The traffic rank of a Web page can therefore serve as an index that excludes mis-clicks and similar accesses, which a simple access count cannot filter out, and that reflects actual usage, for example the real advertising effect.
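The daily percentage-of-display-time rank can be written as a short function. It assumes each browsing record has already been reduced to a `(url, display_seconds)` pair for a single day; the function name `traffic_rank` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def traffic_rank(display_times: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Share (in percent) of one day's total display time accounted for by each page."""
    per_page: Dict[str, float] = defaultdict(float)
    for url, seconds in display_times:
        per_page[url] += seconds
    total = sum(per_page.values())
    if total == 0:
        return {url: 0.0 for url in per_page}
    return {url: 100.0 * t / total for url, t in per_page.items()}
```

With 200 seconds of total display time, a page viewed for 150 seconds in total scores 75.0, while a page that received only a 2-second link-through scores 1.0, which is the mis-click suppression the paragraph describes.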

Three main patterns of Web page rank calculation are conceivable. First, a so-called "national rank" may be calculated for each Web page, for example per day, using the display times contained in the browsing information aggregated from all terminals or from some of them.

Second, a "rank by attribute" of Web pages may be calculated for each of various segments, such as age, gender, place of residence, and occupation. The user's attribute information is identified from the user ID or similar data in the browsing information by referring to a "user ID–attribute information table" or the like.

The "user ID–attribute information table" can be built, for example, as follows. Registration of the user's attribute information is made mandatory when the toolbar device is installed on a client terminal. The search device then manages the registered attribute information linked to a user ID such as a cookie, which lets it identify the user (and that user's attribute information) behind the browsing information sent from the toolbar device of each client terminal.
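A sketch of the attribute-based rank, assuming the "user ID–attribute information table" is a dictionary mapping each user ID to a single segment label (for example an age band) and that records are `(user_id, url, display_seconds)` tuples. Skipping users with no registered attributes is a design choice made here, not something the source specifies; `rank_by_attribute` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rank_by_attribute(
    records: Iterable[Tuple[str, str, float]],  # (user_id, url, display seconds)
    user_attributes: Dict[str, str],            # hypothetical "user ID -> segment" table
) -> Dict[str, Dict[str, float]]:
    """Per-segment display-time share (percent) for each page."""
    per_segment: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for user_id, url, seconds in records:
        segment = user_attributes.get(user_id)
        if segment is None:  # users without registered attributes are skipped
            continue
        per_segment[segment][url] += seconds
    result: Dict[str, Dict[str, float]] = {}
    for segment, pages in per_segment.items():
        total = sum(pages.values())
        result[segment] = {url: (100.0 * t / total if total else 0.0) for url, t in pages.items()}
    return result
```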

Third, browsing information may be classified per user using a user ID or the like, and a so-called "rank by user" calculated, such as "rank of site A for user α" and "rank of site B for user α".

To perform this third calculation, the toolbar device and search device of the search system of this embodiment may be configured as follows. For example, the toolbar device is configured so that the browsing information sent to the search device by its browsing information transmission unit includes user identification information, such as a cookie or other information that identifies the user.

The search device then uses this user identification information to calculate the per-user rank. To this end, the traffic rank scoring unit of the search device further includes "per-user traffic rank calculation means".

The "per-user traffic rank calculation means" calculates a traffic rank for each user. For example, when the traffic rank of a web page is calculated from display time as described above, the user identification information is used to carry out that calculation separately for each user.

When calculating the rank of a web page, repeated accesses by the same user to the same resource, including fraudulent ones, may also be excluded. Specifically, a cookie or the like included in the browsing information is used to determine that the same user accessed the same page. For the display times of such repeated accesses, the traffic rank is then calculated from their average rather than their total.
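A minimal sketch of the averaging rule, assuming records of the form (cookie ID, URL, seconds); the function name and record layout are hypothetical. Repeated views of one URL by one cookie contribute their mean display time once, instead of their sum.

```python
from collections import defaultdict
from statistics import mean

def dedup_display_time(records):
    """Collapse repeated views of one URL by one user into their average.

    records: iterable of (cookie_id, url, seconds)
    Returns {url: effective_seconds}: for each URL, the sum over distinct
    users of that user's average display time on the URL.
    """
    per_pair = defaultdict(list)
    for cookie_id, url, seconds in records:
        per_pair[(cookie_id, url)].append(seconds)
    effective = defaultdict(float)
    for (_cookie_id, url), times in per_pair.items():
        effective[url] += mean(times)  # average, not total, per user
    return dict(effective)
```

Three 10-second views by one cookie plus one 40-second view by another yield 50 seconds for the page, where a plain sum would give 70.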

For the traffic ranks aggregated as described above, the rate of increase or decrease (burst degree) relative to the previous day's number of accesses may also be calculated and reflected in the rank of the web page through the calculation formula. Reflecting this rate lets the rank capture day-to-day surges of interest in a web page.

FIG. 12 shows an example of the traffic rank calculated in this way for each Web page (resource). As shown in the figure, a traffic rank such as "140", "120", "60" or "89" is calculated for each resource by a predetermined function of, for example, the number of views and the burst degree (the rate of change from the previous day's number of accesses). The search device of this embodiment holds each calculated traffic rank in association with the URL of the resource, for example as table data as shown in the figure. The traffic rank can then be used for various processes, such as adjusting the crawler's visit schedule and ranking search results, as described later in the fourth and fifth embodiments.

A single piece of Web content usually consists of several resources, for example the "top page", "page 1" and "page 2" of a Web site. The traffic rank calculated by the traffic rank scoring unit therefore need not be per URL. For example, whether Web pages belong to the same Web site may be determined by comparing their IP addresses or URL domains, and a single traffic rank may then be calculated for those pages even though their URLs differ.
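The domain-based grouping can be sketched with Python's standard `urllib.parse.urlsplit`. The site key used here (host name with a leading "www." removed) is an assumption for illustration; comparing IP addresses, the other option mentioned above, would additionally require DNS resolution and is not shown.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def site_key(url: str) -> str:
    """Hypothetical site key: lower-case host name without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def aggregate_by_site(url_scores):
    """Sum per-URL scores into a single score per site."""
    totals = defaultdict(float)
    for url, score in url_scores.items():
        totals[site_key(url)] += score
    return dict(totals)
```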

<Hardware configuration>

FIG. 13 is a schematic diagram showing an example of the configuration of the search system when the functional components described above are implemented as hardware. The operation of each hardware component of the search system of this embodiment is described below with reference to this figure.

As shown in the figure, the hardware of the search system of this embodiment consists of a toolbar device (1310) and a search device (1320), which can be connected to each other via a telecommunication line including the Internet.

As in the first embodiment, the "toolbar device" (1310) includes a "CPU" (1311), a "RAM" (1312), a "communication I/F" (1313), a "UI" (1314), a "VRAM" (1315) and a "display device (display, etc.)" (1316).

Likewise, as in the first embodiment, the "search device" (1320) includes a "CPU" (1321), a "RAM" (1322), a "communication I/F" (1323) and a "secondary storage device" (1324).

As described in the first embodiment, the toolbar program of the toolbar device stores the browsed URL at a predetermined address in the "RAM" of the toolbar device. In this embodiment, information such as the access time and browsing (access) duration of each browsed URL, obtained as described below, is also stored at a predetermined address in the "RAM".

Specifically, when the browser program, in response to an operation input on the "UI", receives through the "communication I/F" the content of the resource at the URL (here a Web page), the time of receipt is obtained from a built-in clock or the like and stored in the "RAM" as the access time. In addition, the processing of the browser program is monitored. The length of time during which the web page at that URL is kept, for example, in the foreground of the display, or during which the pointing device is kept over that web page's window, is measured with a counter (not shown), the built-in clock or the like and stored in the "RAM" as the browsing time.
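The foreground-time measurement can be sketched as follows, assuming the monitor emits a time-ordered list of (timestamp, URL) events in which URL is None whenever no page is in the foreground. The event format and function name are hypothetical; a real toolbar would obtain these events from the browser, which is not shown.

```python
from collections import defaultdict

def foreground_durations(events, end_time):
    """Accumulate how long each URL stayed in the foreground.

    events:   list of (timestamp, url) sorted by time; url is None while no
              page is in the foreground (e.g. the browser lost focus).
    end_time: timestamp at which measurement stops.
    """
    durations = defaultdict(float)
    # Pair each event with the next one; the last event runs until end_time.
    for (t, url), (t_next, _) in zip(events, events[1:] + [(end_time, None)]):
        if url is not None:
            durations[url] += t_next - t
    return dict(durations)
```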

Following the toolbar program, the toolbar device then adds a user ID, given for example by a cookie, to the browsed URL obtained in this way and to the access time and browsing time of the resource at that URL, and arranges them into a table as shown in FIG. 2. It then sends this information from the "communication I/F" to the search device as browsing information.
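One row of the FIG. 2 style table could be modelled as below. The field names, the ISO 8601 timestamp and the JSON wire format are assumptions made for illustration; the document does not specify how the browsing information is encoded for transmission.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class BrowsingRecord:
    user_id: str         # e.g. a cookie value identifying the user
    url: str             # browsed URL
    access_time: str     # when the content was received (ISO 8601 assumed)
    view_seconds: float  # measured browsing time

def encode_batch(records):
    """Serialize a batch of records as JSON for sending (format assumed)."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False)
```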

The search device receives, through its "communication I/F", the browsing information sent by the toolbar program of the toolbar device and stores it at a predetermined address in the "RAM". The browsing information may also be recorded and accumulated in the "secondary storage device" as needed.

Once the browsing information has been stored in the search device, the following processing is executed according to the traffic rank calculation program. The browsing information stored in the "RAM" or elsewhere is referenced, and the "CPU" calculates the traffic rank, for example by evaluating a function such as the one described below.

For example, the traffic rank R_a of web page a is calculated using a function such as Formula 1, where t_a is the total viewing time of page a summed over all user IDs; T is the total viewing time of all URLs summed over all user IDs; B_a is the burst value given by Formula 2 (described later); N_I is the total number of browsing IDs; n_a is the number of views of page a summed over all user IDs; N is the total number of views of all web pages by all users; and r_x is the traffic rank of a page x related to page a.

Formula 1 thus computes the share t_a/T that the viewing time t_a of page a occupies within the viewing time T of all pages. Because of this term, accesses with a short display time, such as misclicks, yield a low traffic rank. As described above, this makes it possible to exclude misclicks and the like, which are hard to detect from a simple access count, and to calculate a viewing index for the resource that reflects actual use.
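Since Formula 1 itself is not reproduced here, the sketch below computes only two of its ingredients, t_a/T and n_a/N, to show why the time share discounts misclicks; how the formula combines them with B_a, N_I and r_x is not assumed. Records of the form (URL, seconds) are hypothetical.

```python
def shares(records, url):
    """Return (t_a / T, n_a / N) for one URL.

    records: list of (url, seconds) page views from all users.
    t_a / T: share of total viewing time; n_a / N: share of view count.
    """
    T = sum(seconds for _, seconds in records)
    N = len(records)
    t_a = sum(seconds for u, seconds in records if u == url)
    n_a = sum(1 for u, _ in records if u == url)
    return (t_a / T if T else 0.0, n_a / N if N else 0.0)
```

With two 60-second views of page "a" and four 1-second misclicks on page "b", page "b" holds two thirds of the view count but only about 3% of the viewing time.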

The burst degree B_a of the URL of Web page a may be obtained, for example, from the function shown in Formula 2, where r_at is the number of views of Web page a today and r_ay is its number of views yesterday. The burst degree given by Formula 2 is thus the rate of change of today's views relative to yesterday's. It becomes high when views grow sharply compared with the usual level, so the burst degree expresses rising interest in the page and carries it into the traffic rank of the Web page (resource).
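Formula 2 is not reproduced in this text, so the following is only one plausible reading of "rate of change of today's views relative to yesterday's": B_a = (r_at - r_ay) / r_ay. Flooring yesterday's count at 1 to avoid division by zero is likewise an assumption of this sketch.

```python
def burst_degree(views_today: int, views_yesterday: int) -> float:
    """Day-over-day rate of change in page views (assumed form of Formula 2).

    Assumption: B_a = (r_at - r_ay) / r_ay, with yesterday's count floored
    at 1 so that a page with no views yesterday does not divide by zero.
    """
    baseline = max(views_yesterday, 1)
    return (views_today - views_yesterday) / baseline
```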

Using the total number of browsing IDs (N_I) as a variable of the function can also be expected to have the following effect. Suppose a single user accesses a site many times in an attempt to inflate its traffic rank illegitimately. Even then, the variable N_I allows multiple accesses by multiple users to yield a higher traffic rank than multiple accesses by one user.
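Assuming N_I is counted per page as the number of distinct user IDs that viewed it, the following sketch shows why it separates repeated access by one user from access by many users. Records of the form (user ID, URL) and the function name are hypothetical.

```python
from collections import defaultdict

def distinct_viewers(records):
    """Number of distinct user IDs that viewed each URL.

    records: iterable of (user_id, url).
    """
    viewers = defaultdict(set)
    for user_id, url in records:
        viewers[url].add(user_id)
    return {url: len(ids) for url, ids in viewers.items()}
```

Ten accesses from a single ID count as 1 here, while ten accesses from ten IDs count as 10, even though both pages have the same raw access count.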

The traffic ranks (r_x) of pages x1, x2, ... related to the Web page a whose traffic rank is being calculated, such as its link destinations, may also be used as variables of the function.

The related-page traffic rank r_x used here may be, for example, a weighted average or the like of the traffic ranks of only those pages reached by a hyperlink from the URL being evaluated. Alternatively, the URLs to which the probability of transition from a given URL is at or above a certain value may be extracted, and the weighted average of their traffic ranks used as the traffic rank of the related URLs.
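The second option, a weighted average over pages whose transition probability clears a threshold, can be sketched as follows. Using the transition probabilities themselves as weights, the transition-count input format and the default threshold of 0.1 are all assumptions of this sketch.

```python
def related_rank(transitions, ranks, page, threshold=0.1):
    """Weighted-average traffic rank r_x of pages commonly visited after `page`.

    transitions: dict page -> dict next_page -> count (from browsing logs)
    ranks:       dict page -> traffic rank
    threshold:   minimum transition probability (hypothetical default)
    """
    outgoing = transitions.get(page, {})
    total = sum(outgoing.values())
    if total == 0:
        return 0.0
    weighted, weight_sum = 0.0, 0.0
    for nxt, count in outgoing.items():
        p = count / total  # empirical transition probability page -> nxt
        if p >= threshold and nxt in ranks:
            weighted += p * ranks[nxt]
            weight_sum += p
    return weighted / weight_sum if weight_sum else 0.0
```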

Besides the so-called "national rank", the calculated rank may also be an "attribute-specific rank" or an "individual rank" as described above. Specifically, the cookie or the like included in the received browsing information is stored in the "RAM". Using that cookie as a key, the "CPU" identifies, from user registration information held in advance, the segment or individual to which each display time in the browsing information belongs. When summing the display times of, for example, URL a and of all URLs, the sums are then calculated separately for each identified individual or segment, and the "CPU" calculates the share of display time from them.

The traffic rank calculated in this way for each browsed URL is recorded and held in the "secondary storage device" as a traffic rank database. As described later in the fourth and fifth embodiments, this database is consulted when, for example, adjusting the crawler's visit schedule or ranking search results.

<Process flow>

FIG. 14 is a flowchart showing an example of the processing flow in the search system of this embodiment. The steps below may also be processing steps constituting a program, recorded on a medium, for controlling a computer.

As shown in the figure, when the toolbar is started on the toolbar device, it first acquires the browsed URL and related information (step S1401). It then determines whether the browsed URL has been updated (changed) by the user's browser operation (step S1402). If so, it acquires the access time of that URL (step S1403) and sends information including the acquired browsed URL and access time, as browsing information, to the search device, which includes the browsing management server device (step S1404).

Next, as shown in the figure, the search device receives from the toolbar device browsing information comprising n URLs together with associated information such as user IDs and browsing times (step S1411), and stores it in main memory or the like (step S1412). It then executes the following processing, starting with loop 1 (step S1413). First, the accumulated n items of browsing information are used as variables of the function to calculate the traffic rank (step S1414). The traffic rank calculated for each of the n URLs is then recorded as a database, for example on an HDD. This database is consulted, for example, to adjust the crawler's visit schedule or to rank search results.

<Brief description of effect>

As described above, the search system of this embodiment can calculate a traffic rank, an effective access index (viewing index) for resources that is aggregated with misdirected accesses and the like excluded. Moreover, with a function such as the one above, the traffic rank evaluates a Web page not merely by its number of accesses per URL or its number of unique users, but by a comprehensive index that also captures rising interest in the page.

≪Embodiment 4≫

<Overview>

This embodiment is a search system, based on the third embodiment, in which the calculated traffic rank is used as an index of a resource's popularity to determine the crawler's visit priority.

FIG. 15 is a conceptual diagram illustrating an example of the processing in the search system of this embodiment. As shown in the figure, the search device calculates traffic ranks with the configuration and processing described in the third embodiment and holds them as a database. The search device of this embodiment further calculates a visit priority according to the traffic rank: for example, "visit priority: 12" for site C with "TR (traffic rank): 60", and "visit priority: 4" for site D with "TR: 20".

Using this rank-based priority, the search device of this embodiment schedules the crawler's visits, for example "3 times/week" for site C and "once/week" for site D, whose priority (traffic rank) is lower.

In other words, as noted earlier, Internet users tend to search for and visit mainly Web pages that are updated frequently. Conversely, frequently accessed resources are likely to be updated frequently. As explained in the third embodiment, the traffic rank is an index of users' effective number of accesses, so it can be used to set the crawler's visit priority according to that effective number of accesses.

<Functional configuration>

FIG. 16 shows an example of the functional blocks of the search system of this embodiment. As shown in the figure, the "search system" (1600) of this embodiment is based on the third embodiment and has a "toolbar device" (1610) and a "search device" (1620). As in the third embodiment, the "toolbar device" (1610) has a "browsing information transmission unit" (1611) and a "browsing information acquisition unit" (1612).

The "search device" (1620), also based on the third embodiment, has a "second browsing management server device" (1630), which in turn has a "browsing information storage unit" (1631) and a "traffic rank scoring unit" (1632). These components of the toolbar device and search device were already described in the first and third embodiments, so their description is omitted.

The distinguishing feature of this embodiment is that the search device further has a "crawler unit" (1621) and a "schedule determination unit" (1622).

The "crawler unit" (1621) can be implemented by a so-called crawler program, which follows the hyperlinks between resources on the Internet to traverse the WWW, accessing each resource and collecting its content.

Such a crawler program normally collects content according to a predetermined schedule that specifies, for example, the URLs of the resources to visit and the time each visit starts. In this embodiment, the configuration below sets the visit frequency in that schedule according to the traffic rank.

The "schedule determination unit" (1622) determines the crawling schedule of the crawler unit (1621) based on the traffic rank calculated by the traffic rank scoring unit (1632).

Specifically, the CPU calculates the visit priority y for running the crawler program on a URL, for example with a proportional function y = f(TR) of the traffic rank TR. Based on the calculated priority, it then determines a schedule that includes the crawler program's visit frequency, for example "3 times/week (visit priority 12)" or "once/week (visit priority 4)".
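The two examples above (TR 60 gives priority 12, TR 20 gives priority 4) are consistent with y = TR / 5, so the sketch below uses that divisor. The priority bands that map onto visits per week are hypothetical stand-ins for the "visit priority-visit frequency table", chosen only so that the two documented examples come out as stated.

```python
import bisect

# Hypothetical "visit priority -> visit frequency" table: lower bound of
# each priority band and the number of crawls per week assigned to it.
PRIORITY_BOUNDS = [0, 8, 12]
VISITS_PER_WEEK = [1, 2, 3]

def crawl_priority(traffic_rank: float, divisor: float = 5.0) -> float:
    """Proportional function y = f(TR) = TR / divisor (divisor assumed)."""
    return traffic_rank / divisor

def visits_per_week(priority: float) -> int:
    """Look up the weekly crawl frequency for a priority value."""
    band = bisect.bisect_right(PRIORITY_BOUNDS, priority) - 1
    return VISITS_PER_WEEK[max(band, 0)]
```

A banded lookup rather than a direct formula keeps the frequency table editable independently of f, which matches the separate table described for the hardware implementation.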

In this way, the crawler's visit priority is determined by the traffic rank, which indicates users' effective number of accesses, so the search device can refresh, at suitable times, its information on WWW resources that are likely to be updated often and to carry fresh information.

<Hardware configuration>

FIG. 17 is a schematic diagram showing an example of the configuration of the search system when the functional components described above are implemented as hardware. The operation of each hardware component of the search system of this embodiment is described below with reference to this figure.

As shown in the figure, the hardware of the search system of this embodiment consists of a toolbar device (1710) and a search device (1720), which can be connected to each other via a telecommunication line including the Internet.

The "toolbar device" (1710) has the same configuration and performs the same processing as the toolbar device of the third embodiment, so its description is omitted here. As in the embodiments above, the "search device" (1720) has a "CPU" (1721), a "RAM" (1722), a "communication I/F" (1723) and a "secondary storage device" (1724), and executes the following processing.

The search device receives, through the "communication I/F", the browsing information sent from the toolbar device by the processing described in the third embodiment, which includes the browsed URLs and information such as the access time and browsing (access) duration of each, and stores it at a predetermined address in the "RAM".

It then executes the processing described in the third embodiment, calculates the traffic rank of each URL from the browsing information stored in the "RAM", and records and holds the results in the "secondary storage device" as a traffic rank database.

Next, the following processing is executed according to the schedule determination program. Using as a key the URL for which the crawler's visit schedule is to be set, the traffic rank database held in the "secondary storage device" is consulted to obtain that URL's traffic rank value TR. The "CPU" then evaluates a function such as y = f(TR) to calculate the crawler program's visit priority y for that URL. Finally, a "visit priority-visit frequency table" held in the "secondary storage device" is consulted to determine the visit frequency, for example "3 times/week", associated with the calculated priority y.

The execution dates and times of the crawler program are then reserved with a task scheduler or the like according to the determined frequency, such as "3 times/week", and the crawler program visits the resource and acquires its content at that frequency.
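How the task scheduler spreads the visits is not specified; as one assumed policy, the sketch below spaces the crawls of one week evenly from a given start time. The function name and the even-spacing rule are illustrative only.

```python
from datetime import datetime, timedelta

def crawl_times(start: datetime, visits_per_week: int):
    """Evenly spaced crawl start times within one week (assumed policy)."""
    if visits_per_week <= 0:
        return []
    interval = timedelta(weeks=1) / visits_per_week
    return [start + i * interval for i in range(visits_per_week)]
```

For "3 times/week" starting at 00:00 on 20 August 2007, this yields crawls 56 hours apart.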

<Process flow>

FIG. 18 is a flowchart showing an example of the processing flow in the search system of this embodiment. The steps below may also be processing steps constituting a program, recorded on a medium, for controlling a computer.

The processing flow in the toolbar device is the same as the processing described in the third embodiment with reference to FIG. 14 (S1401 to S1404), so its description is omitted.

Next, as shown in the figure, the search device receives from the toolbar device browsing information comprising n URLs together with associated information such as user IDs and browsing times (step S1811), and stores it in main memory or the like (step S1812). It then executes the following processing, starting with loop 1 (step S1813). First, the accumulated n items of browsing information are used, for example, as variables of the function described above to calculate the traffic rank (step S1814). The crawling schedule of the crawler unit for that URL is then determined based on the calculated traffic rank (step S1815). This processing is repeated for the n URLs (step S1816).

<Brief description of effect>

As described above, the search system of this embodiment can determine the crawler's visit priority from the traffic rank, which indicates users' effective number of accesses. The search device can therefore refresh, at suitable times, its information on WWW resources that are likely to be updated often and to carry fresh information.

≪Embodiment 5≫

<Overview>

This embodiment is a search system, based on the third and fourth embodiments, that uses the calculated traffic rank to sort the results of search requests from the toolbar device.

FIG. 19 is a conceptual diagram illustrating an example of search processing in the search system of this embodiment. As shown in the figure, a client terminal accesses the search Web page of the search system and enters and submits a search query. As in conventional search systems, the search device extracts the resources containing the search query from the search index file, lists the extracted resources, and generates a search results screen.

Unlike conventional systems, the search device of this embodiment additionally reorders the search results according to the traffic rank values of the URLs in the result list, for example placing site B at the top of the list and site A second.
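The reordering step amounts to a descending sort keyed on the traffic rank table. In this sketch, treating URLs missing from the table as rank 0 is an assumption; Python's `sorted` is stable even with `reverse=True`, so equally ranked results keep their original relative order.

```python
def sort_by_traffic_rank(result_urls, traffic_ranks):
    """Reorder search hits by descending traffic rank.

    URLs missing from the rank table are treated as rank 0 (assumption);
    ties keep their original order because Python's sort is stable.
    """
    return sorted(result_urls, key=lambda u: traffic_ranks.get(u, 0), reverse=True)
```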

In this way, the search system of this embodiment can return to the client terminal search results ordered by the traffic rank, which indicates users' effective number of accesses.

<Functional configuration>

FIG. 20 shows an example of the functional blocks of the search system of this embodiment. As shown in the figure, the "search system" (2000) of this embodiment is based on the third embodiment and has a "toolbar device" (2010) and a "search device" (2020). As in the third embodiment, the "toolbar device" (2010) has a "browsing information transmission unit" (2011) and a "browsing information acquisition unit" (2012).

The "search device" (2020), also based on the third embodiment, has a "second browsing management server device" (2030), which in turn has a "browsing information storage unit" (2031) and a "traffic rank scoring unit" (2032). Based on the fourth embodiment, the search device may also have a "crawler unit" and a "schedule determination unit" (not shown). These components of the toolbar device and search device were already described in the embodiments above, so their description is omitted.

The distinguishing feature of this embodiment is that the search device further includes a “rank sort output unit” (2021).

The “rank sort output unit” (2021) sorts search results based on the traffic rank calculated by the traffic rank scoring unit (2032) and outputs the sorted results to the client.

Specifically, before the rank sort output unit performs this sort, the search system of this embodiment executes the same search process as a conventional search system: using the search query sent from the client terminal as a key, it searches the search index file and extracts from it the resources that contain the search query.

The extracted resources are then reordered and returned to the client as search results, for example in itemized (list) form. Conventional search systems order search results by criteria such as creation or acquisition date (oldest or newest first), number of accesses, alphabetical (Japanese syllabary) order, or a recursive rating of resources derived from links. The search system of this embodiment is instead characterized by using the traffic rank calculated in the third embodiment, which indicates the actual number of user accesses. That is, the traffic rank of each resource extracted by the search is obtained from, for example, the traffic rank database described in the third embodiment; the CPU then compares the traffic ranks, and the resources are sorted, for example in descending order of traffic rank, to produce the search results.
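The reordering step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the traffic ranks are already available as a mapping from URL to a numeric score (standing in for the traffic rank database of the third embodiment), that a larger value means more traffic, and that a URL with no recorded rank scores 0. The function name and the example URLs and values are hypothetical.

```python
from typing import Dict, List


def sort_by_traffic_rank(hits: List[str], traffic_rank: Dict[str, float]) -> List[str]:
    """Return search hits in descending order of traffic rank (hypothetical helper).

    URLs without a recorded rank are treated as 0.0. sorted() stays stable even
    with reverse=True, so hits with equal rank keep their original retrieval order.
    """
    return sorted(hits, key=lambda url: traffic_rank.get(url, 0.0), reverse=True)


if __name__ == "__main__":
    hits = ["http://site-a.example/", "http://site-b.example/"]
    ranks = {"http://site-a.example/": 120.0, "http://site-b.example/": 340.0}
    print(sort_by_traffic_rank(hits, ranks))
```

With these made-up values, site B moves above site A, mirroring the example given earlier in this embodiment. Because equal ranks keep their incoming order, the relevance order from the index lookup can serve as the tiebreaker.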

In this way, the client terminal receives search results ordered by traffic rank, which reflects the effective number of user accesses, that is, the actual access popularity of each resource.

<Hardware configuration>

FIG. 21 is a schematic diagram showing an example of the configuration of the search system when the functional components described above are implemented as hardware. The operation of each hardware component of the search system of this embodiment is described with reference to this figure.

As shown, the hardware of the search system of this embodiment consists of a toolbar device (2110) and a search device (2120), which can communicate with each other over a telecommunication line including the Internet.

The “toolbar device” (2110) has the same configuration and performs the same processing as the toolbar devices of the third and fourth embodiments, so its description is omitted here. As in the preceding embodiments, the “search device” (2120) has a “CPU” (2121), “RAM” (2122), a “communication I/F” (2123), and a “secondary storage device” (2124), and executes the following processing.

First, the search device calculates traffic ranks from the browsing information sent by the toolbar devices, which includes each browsed URL together with information such as the access time and the browsing (access) duration for that URL, and records and retains them in the “secondary storage device”.
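This section does not restate how the traffic rank is computed from the browsing information. Purely as a stand-in, the sketch below scores each URL by the number of distinct users who kept it open for at least a minimum duration. The record fields, the `min_duration_sec` parameter, and the scoring rule itself are assumptions, not the formula of the third embodiment.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class BrowsingRecord:
    """One page view reported by a toolbar device (field names are assumptions)."""
    user_id: str
    url: str
    access_time: float   # Unix timestamp of the view; carried but unused by this stand-in
    duration_sec: float  # how long the page was viewed


def traffic_ranks(records: Iterable[BrowsingRecord],
                  min_duration_sec: float = 5.0) -> Dict[str, int]:
    """Hypothetical score: distinct users per URL, ignoring views shorter than min_duration_sec."""
    viewers: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        if rec.duration_sec >= min_duration_sec:
            viewers[rec.url].add(rec.user_id)
    return {url: len(users) for url, users in viewers.items()}
```

Counting distinct users rather than raw page views keeps one user's repeated reloads from inflating a URL's score, and the duration filter discards views that were closed almost immediately.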

Later, when a toolbar device on the network accesses the search Web page and enters and submits a search query, the search device receives the HTTP request containing the query through the “communication I/F” and stores it at a predetermined address in “RAM”. The search device then executes the following processing according to the search server program: using the search query stored in “RAM” as a key, it consults the search index file held in the “secondary storage device” and extracts the resources that contain the query.

Next, it consults the traffic rank database, also held in the “secondary storage device”, and obtains the traffic rank of each extracted resource. The “CPU” compares the traffic ranks and determines the order of the search results, for example in descending order of traffic rank (highest rank first). The “CPU” then generates an itemized search result listing the extracted resources in that order, and the search device returns it through the “communication I/F” to the toolbar device that sent the query.

<Process flow>

FIG. 22 is a flowchart showing an example of the processing flow in the search system of this embodiment. The steps below may also be processing steps of a program, recorded on a medium, for controlling a computer.

The processing flow on a client terminal in which the toolbar device is incorporated is the same as the processing described in the third embodiment with reference to FIG. 14 (S1401 to S1404), so its description is omitted. Likewise, the processing in the search device up to the calculation of the traffic rank (step S2211) is the same as the processing described in the third embodiment (S1411 to S1414), so its description is also omitted.

As shown in the figure, when the search device receives a search query sent from a client terminal on the network (which need not incorporate a toolbar device) (step S2212), it searches the search index file using the query as a key (step S2213). It then extracts the resources whose content includes the search query and obtains them as the search results (step S2214).

Next, it obtains the traffic rank of each extracted resource (step S2215) and reorders the resources in the search results, for example in order of traffic rank (step S2216). It then returns the reordered search results to the client terminal that sent the query (step S2217).
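Steps S2212 to S2217 can be strung together in a short sketch. It assumes an in-memory inverted index built by whitespace tokenization (a real system handling Japanese text would need morphological analysis instead), AND semantics across query terms, and a URL-to-score mapping for the traffic ranks. `build_inverted_index`, `handle_query`, and any sample data are hypothetical.

```python
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased, whitespace-separated term to the URLs whose content contains it."""
    index: Dict[str, Set[str]] = {}
    for url, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(url)
    return index


def handle_query(query: str,
                 index: Dict[str, Set[str]],
                 traffic_rank: Dict[str, float]) -> List[str]:
    """Steps S2213-S2216 for a query already received (S2212).

    The caller sends the returned list back to the client (S2217).
    """
    terms = query.lower().split()
    if not terms:
        return []
    # S2213-S2214: resources whose content contains every query term.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    # S2215-S2216: look up traffic ranks and order by descending rank (URL breaks ties).
    return sorted(hits, key=lambda url: (-traffic_rank.get(url, 0.0), url))
```

The retrieval step only decides which resources qualify; the order in which they reach the user comes entirely from the traffic ranks, which is the point of difference this embodiment claims over ordering by date, access count, or link-based rating.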

<Brief description of effect>

As described above, the search system of this embodiment can return to the client terminal search results ordered by traffic rank, which indicates the effective number of user accesses, that is, the actual access popularity.

≪Example 6≫

<Overview>

Like the second embodiment, this embodiment extracts new search targets and uses the extracted new search target URLs to add to and update the search index file. It differs from the second embodiment in that it uses the traffic rank described in the third and later embodiments when extracting the new search target URLs.

Specifically, browsing information is acquired by the toolbar devices of the client terminals, and the search device calculates a traffic rank for each URL as described in the preceding embodiments. As in the second embodiment, the search device then determines whether each URL is included in the search index file. If a URL is not included in the index file and its traffic rank is, for example, equal to or greater than a predetermined value, the corresponding resource is deemed suitable for addition to the index file: it becomes a new crawl target, and indexing is executed as in the second embodiment.
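The selection rule just described (not yet in the index, and a traffic rank at or above a threshold) reduces to a filter over the ranked URLs. A minimal sketch follows; the function name, the representation of the index as a set of URLs, and ordering candidates by descending rank so the most popular are crawled first are all assumptions.

```python
from typing import Dict, List, Set


def select_new_targets(traffic_rank: Dict[str, float],
                       indexed_urls: Set[str],
                       threshold: float) -> List[str]:
    """Hypothetical helper: URLs absent from the index whose rank is >= threshold."""
    candidates = [url for url, rank in traffic_rank.items()
                  if url not in indexed_urls and rank >= threshold]
    # Highest rank first; ties broken by URL so the result is deterministic.
    return sorted(candidates, key=lambda url: (-traffic_rank[url], url))
```

A URL whose rank exactly equals the threshold is selected, matching the "equal to or greater than" condition; a heavily visited page that is already indexed is skipped because it is not a new target.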

<Functional configuration>

FIG. 23 shows an example of the functional blocks of the search system of this embodiment. As shown, the “search system” (2300) of this embodiment is based on the third embodiment and has a “toolbar device” (2310) and a “search device” (2320). As in the third embodiment, the “toolbar device” (2310) has a “browsing information transmission unit” (2311) and a “browsing information acquisition unit” (2312).

The “search device” (2320) is likewise based on the third embodiment and has a “second browsing management server device” (2330), which includes a “browsing information storage unit” (2331) and a “traffic rank scoring unit” (2332). Based on the fourth and fifth embodiments, the search device may also include a “crawler unit”, a “schedule determination unit”, and a “rank sort output unit” (not shown). The components of the toolbar device and the search device have already been described in the preceding embodiments, so their description is omitted here.

The distinguishing feature of this embodiment is that the search device further includes a “second indexer unit” (2321).

The “second indexer unit” (2321) extracts, based on the traffic rank calculated by the traffic rank scoring unit, new search target URLs, that is, URLs to be newly used as search targets by the search engine, and indexes the content of the extracted URLs.

The processing for making a URL with a calculated traffic rank a new crawl target is the same as that described in the second embodiment, so its description is omitted. However, because this embodiment uses not only the difference between the URLs in the index file and the URLs that have traffic ranks but also the traffic rank itself, which is an audience rating index for the resource at each URL, the following processing may be executed in addition to that described in the second embodiment.

That is, the search device determines whether the traffic rank is equal to or greater than a predetermined value. If it is, the resource is judged suitable as a new crawl target. If the traffic rank is below the predetermined value, the URL is not added to the index file as a new crawl target.

<Hardware configuration>

FIG. 24 is a schematic diagram showing an example of the configuration of the search system when the functional components described above are implemented as hardware. The operation of each hardware component of the search system of this embodiment is described with reference to this figure.

As shown, the hardware of the search system of this embodiment consists of a toolbar device (2410) and a search device (2420), which can communicate with each other over a telecommunication line including the Internet.

The “toolbar device” (2410) has the same configuration and performs the same processing as the toolbar devices of the third, fourth, and fifth embodiments, so its description is omitted here. As in the preceding embodiments, the “search device” (2420) has a “CPU” (2421), “RAM” (2422), a “communication I/F” (2423), and a “secondary storage device” (2424), and executes the following processing.

First, through the same processing as in the preceding embodiments, the search device calculates traffic ranks from the browsing information sent by the toolbar devices, which includes each browsed URL together with information such as the access time and the browsing (access) duration for that URL, and records and retains them in the “secondary storage device”.

Next, the search device consults the search service index file stored in advance in the “secondary storage device” and runs a search keyed on each URL whose traffic rank is held in the “secondary storage device”. If the search determines that the index file does not contain the URL, that URL is judged to be a candidate for a new search target.

Then, according to the second indexer program, the “CPU” stores the traffic rank of the candidate URL in “RAM” and compares it with a predetermined value held in advance in the “secondary storage device”. If the comparison finds that the traffic rank is equal to or greater than the predetermined value, indexing (an additional update to the index file held in the secondary storage device) is executed for the URL as a new search target URL.

<Process flow>

FIG. 25 is a flowchart showing an example of the processing flow in the search system of this embodiment. The steps below may also be processing steps of a program, recorded on a medium, for controlling a computer.

As shown in the figure, the search device receives from a toolbar device browsing information that includes n URLs and associated information such as user IDs and browsing durations (step S2511). It stores the received browsing information in main memory or the like (step S2512). It then executes the following processing, beginning with loop 1 (step S2513). First, it calculates the traffic rank using the n accumulated items of browsing information as variables of a function (step S2514).

Next, the search device performs a first determination of whether the URL exists in its index file (step S2515). If the first determination finds that the URL is not in the index file, it performs a second determination of whether the traffic rank of the URL is equal to or greater than the predetermined value (step S2516).

If the second determination finds that the traffic rank is equal to or greater than the predetermined value, the URL is extracted as a resource suitable as a new crawl target, and indexing is executed with it as a new search target URL for the crawler (step S2517).
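The loop in FIG. 25 can be sketched as a small stateful class that accumulates browsing information across batches, so a URL that gains users gradually is queued once it crosses the threshold and is not queued again afterwards. Everything here is a hypothetical stand-in: the class and method names, the `(user_id, url)` batch format, scoring by distinct users, and adding the URL to an in-memory set in place of actually crawling and indexing it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class NewUrlIndexer:
    """Hypothetical sketch of steps S2511-S2517 in FIG. 25."""

    def __init__(self, indexed_urls: Set[str], threshold: int) -> None:
        self.indexed_urls: Set[str] = set(indexed_urls)  # URLs already in the index file
        self.threshold = threshold  # the "predetermined value"
        # Accumulated browsing information: URL -> distinct user IDs (S2512).
        self.viewers: Dict[str, Set[str]] = defaultdict(set)

    def receive(self, batch: Iterable[Tuple[str, str]]) -> List[str]:
        """Ingest (user_id, url) pairs and return the URLs newly sent for indexing."""
        for user_id, url in batch:  # S2511-S2512: receive and store
            self.viewers[url].add(user_id)
        newly_indexed: List[str] = []
        for url in sorted(self.viewers):  # loop 1 (S2513), sorted for deterministic output
            rank = len(self.viewers[url])  # S2514: stand-in traffic rank
            if url in self.indexed_urls:  # S2515: already indexed, skip
                continue
            if rank >= self.threshold:  # S2516: second determination
                self.indexed_urls.add(url)  # S2517: stand-in for crawling and indexing
                newly_indexed.append(url)
        return newly_indexed
```

Adding each selected URL to `indexed_urls` is what makes the first determination (S2515) reject it on later batches, so a popular page is handed to the indexer only once.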

<Brief description of effect>

In this way, the search system of this embodiment first determines whether a resource is suitable for addition to the index file, and can then index resources on the WWW that conventional crawlers could not reach, such as newly opened Web pages to which no hyperlinks point.

FIG. 1: Conceptual diagram illustrating an example of browsing information collection in the search system of Example 1
FIG. 2: Diagram showing an example of the browsing information collected by the search device of the search system of Example 1
FIG. 3: Diagram showing an example of the functional blocks of the search system of Example 1
FIG. 4: Diagram showing an example of the hardware configuration of the search system of Example 1
FIG. 5: Flowchart showing an example of the processing flow in the search system of Example 1
FIG. 6: Conceptual diagram illustrating an example of processing in the search system of Example 2
FIG. 7: Diagram showing an example of the functional blocks of the search system of Example 2
FIG. 8: Diagram showing an example of the hardware configuration of the search system of Example 2
FIG. 9: Flowchart showing an example of the processing flow in the search system of Example 2
FIG. 10: Conceptual diagram illustrating an example of processing in the search system of Example 3
FIG. 11: Diagram showing an example of the functional blocks of the search system of Example 3
FIG. 12: Diagram showing an example of the traffic rank of each Web page calculated by the traffic rank scoring unit of the search system of Example 3
FIG. 13: Diagram showing an example of the hardware configuration of the search system of Example 3
FIG. 14: Flowchart showing an example of the processing flow in the search system of Example 3
FIG. 15: Conceptual diagram illustrating an example of processing in the search system of Example 4
FIG. 16: Diagram showing an example of the functional blocks of the search system of Example 4
FIG. 17: Diagram showing an example of the hardware configuration of the search system of Example 4
FIG. 18: Flowchart showing an example of the processing flow in the search system of Example 4
FIG. 19: Conceptual diagram illustrating an example of processing in the search system of Example 5
FIG. 20: Diagram showing an example of the functional blocks of the search system of Example 5
FIG. 21: Diagram showing an example of the hardware configuration of the search system of Example 5
FIG. 22: Flowchart showing an example of the processing flow in the search system of Example 5
FIG. 23: Diagram showing an example of the functional blocks of the search system of Example 6
FIG. 24: Diagram showing an example of the hardware configuration of the search system of Example 6
FIG. 25: Flowchart showing an example of the processing flow in the search system of Example 6

Explanation of symbols

0300 Search system
0310 Toolbar device
0311 Browsing information acquisition unit
0312 Browsing information transmission unit
0320 Search device
0730 First browsing management server device
0731 First indexer unit
1130 Second browsing management server device
1131 Browsing information storage unit
1132 Traffic rank scoring unit

Claims

A search system comprising:
a toolbar device having a browsing information acquisition unit that acquires, from a browser, browsing information including at least a browsed URL, and a browsing information transmission unit that transmits the acquired browsing information to a predetermined browsing management server device; and
a search device that collects the browsing information transmitted from the browsing information transmission unit.

The search system according to claim 1, wherein the search device includes:
a first browsing management server device having a first indexer unit that extracts, based on the collected browsing information, a new search target URL, which is a URL newly used as a search target by a search engine, and indexes the content of the extracted URL.

The search system according to claim 1, wherein the search device further includes a second browsing management server device having:
a browsing information storage unit that stores the collected browsing information; and
a traffic rank scoring unit that calculates a traffic rank, which is an audience rating index for each URL, based on the browsing information stored in the browsing information storage unit.

The search system according to claim 3, wherein the search device includes:
a crawler unit; and
a schedule determination unit that determines a crawling schedule of the crawler unit based on the traffic rank calculated by the traffic rank scoring unit.

The search system according to claim 3 or 4, wherein the search device includes a rank sort output unit that sorts search results based on the traffic rank calculated by the traffic rank scoring unit and outputs the sorted results to the client.

The search system according to any one of claims 3 to 5, wherein the search device includes:
a second indexer unit that extracts, based on the traffic rank calculated by the traffic rank scoring unit, a new search target URL, which is a URL to be newly used as a search target by the search engine, and indexes the content of the extracted URL.

A search device that collects browsing information from a plurality of the toolbar devices according to claim 1, the search device comprising:
a first browsing management server device having a first indexer unit that extracts, based on the collected browsing information, a new search target URL, which is a URL newly used as a search target by a search engine, and indexes the content of the extracted URL.

A search device that collects browsing information from a plurality of the toolbar devices according to claim 1, the search device comprising a second browsing management server device having:
a browsing information storage unit that stores the collected browsing information; and
a traffic rank scoring unit that calculates a traffic rank, which is an audience rating index for each URL, based on the browsing information stored in the browsing information storage unit.

The search device according to claim 8, further comprising a rank sort output unit that sorts search results based on the traffic rank calculated by the traffic rank scoring unit and outputs the sorted results to the client.

The search device according to claim 8 or 9, comprising:
a crawler unit; and
a schedule determination unit that determines a crawling schedule of the crawler unit based on the traffic rank calculated by the traffic rank scoring unit.

The search device according to any one of claims 8 to 10, comprising a second indexer unit that extracts, based on the traffic rank calculated by the traffic rank scoring unit, a new search target URL, which is a URL to be newly used as a search target by the search engine, and indexes the content of the extracted URL.

A search method that causes a computer to execute:
a browsing information collection step of collecting browsing information including a browsed URL;
a step of extracting, based on the collected browsing information, a new search target URL, which is a URL to be newly used as a search target by a search engine; and
a first indexing step of indexing the content of the extracted URL.

A search method that causes a computer to execute:
a browsing information collection step of collecting browsing information including a browsed URL;
a browsing information storage step of storing the collected browsing information; and
a traffic rank scoring step of calculating, based on the browsing information, a traffic rank that is an audience rating index for each URL.

The search method according to claim 13, further causing the computer to execute a rank sort output step of sorting search results based on the traffic rank calculated in the traffic rank scoring step and outputting the sorted results to the client.

The search method according to claim 13 or 14, which further causes the computer to execute:
a crawler step; and
a schedule determination step of determining a crawling schedule for the crawler step based on the traffic rank calculated in the traffic rank scoring step.

The search method according to any one of claims 13 to 15, which further causes the computer to execute:
a step of extracting, based on the traffic rank calculated in the traffic rank scoring step, a new search target URL, which is a URL to be newly used as a search target by the search engine; and
a second indexing step of indexing the content of the extracted URL.