JP2000339316A

JP2000339316A - Method and device for collecting retrieval link type information and recording medium with its method stored therein

Info

Publication number: JP2000339316A
Application number: JP11144833A
Authority: JP
Inventors: Kazuhiro Hayakawa; 和宏早川; Hiroto Inagaki; 博人稲垣; Kazuo Tanaka; 一男田中
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-05-25
Filing date: 1999-05-25
Publication date: 2000-12-08

Abstract

PROBLEM TO BE SOLVED: To provide a search-linked information collection method and apparatus, and a recording medium storing a computer program, in which the priority order for collecting data matches the requests of searchers. SOLUTION: Search terms used to search a database are accumulated, and the search terms with a high frequency of occurrence are extracted from them (S32). According to the extracted search terms, the single link with the highest priority is selected from a list of links (S34), and the information indicated by the selected link is acquired and output. All links contained in the acquired information are extracted, the priority of each extracted link is calculated according to the extracted high-frequency search terms, and the priorities and links are added to the list (S40). The steps from link selection onward are repeated for all links in the list.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a search-linked information collection method and apparatus, and a recording medium storing the program, for comprehensively acquiring documents from a distributed hypertext system in which documents containing links to other documents are distributed across multiple servers and the documents held by each server are added, deleted, and updated independently.

[0002]

[Description of the Related Art] On the World Wide Web (hereinafter, WWW), a distributed hypertext system on the Internet, a large number of servers add, delete, and update their content independently. Therefore, to search the entire WWW, the usual approach is to first gather copies of the documents held on each server into one place and then search that collected data.

[0003] Such a search system includes a process (hereinafter, the crawling process) that collects copies of the documents on each server. The crawling process gathers the distributed data by fetching a document and then repeatedly following the links that the document references. At the same time, it re-collects the same documents at regular intervals so that the latest versions are always held. Because this kind of collection usually takes a long time, a priority order for collecting data must be determined in order to accumulate data efficiently.

[0004] A known prioritization technique is the method of Lycos, Inc. (USA), which gives priority to documents that have many inbound links. This method favors information that is frequently referenced from other documents, that is, commonplace information that can be found easily. However, what searchers actually want is information they do not already know or could not find easily, and this prioritization method has the problem that such information does not receive a high priority.

[0005]

[Problems to be Solved by the Invention] An object of the present invention is to provide a search-linked information collection method and apparatus, and a recording medium storing the program, in which the priority order for collecting data matches the needs of searchers.

[0006]

[Means for Solving the Problems] The search-linked information collection method of the present invention accumulates search terms used to search a database; extracts, from the accumulated search terms, those with a high frequency of occurrence; selects from a list of links the single link with the highest priority according to the extracted search terms; acquires and outputs the information indicated by the selected link; extracts all links contained in the acquired information; calculates the priority of each extracted link according to the extracted high-frequency search terms; adds the priorities and links to the list; and repeats the steps from link selection onward for all links in the list.

[0007] The search-linked information collection apparatus of the present invention comprises means for accumulating search terms used to search a database, means for extracting high-frequency search terms from the accumulated search terms, and means for collecting information linked from data containing the high-frequency search terms and adding it to the database. According to the present invention, richer and more up-to-date information can be collected for frequently used search terms.

[0008] The recording medium of the present invention stores a search-linked information collection program that accumulates search terms used to search a database, extracts high-frequency search terms from the accumulated search terms, and collects information linked from data containing the high-frequency search terms and adds it to the database.

[0009] The principal feature of this program is that it collects more information about topics for which search requests are frequent. According to the present invention, richer and more up-to-date information can be collected for frequently used search terms.

[0010]

[Embodiments of the Invention] FIG. 1 shows the overall configuration of an information collection apparatus according to an embodiment of the present invention. The input device 101 is used to enter search terms. The storage device 104 holds a search program, an information collection program, a database that stores the collected data, and search history data recording the search terms that have been used.

[0011] A search term is entered from the input device 101. The control device 103 adds the search term to the search history data in the storage device 104. It also starts the search program, searches the database using the search term, and sends the results to the output device 102.

[0012] The collection program, on the other hand, is started periodically and independently of whether any search requests exist. Starting from a given starting point, it collects linked documents one after another by following the link information in each document. Because a single document generally contains multiple links, the uncollected links must be prioritized and processed in order.

[0013] For this purpose, the collection program refers to the search history data to identify frequently used search terms. If a frequently used search term is highly related to a link in a document, the priority of that link is raised.

[0014] The collected documents are added to the database in the storage device 104 and made available for searching.

[0015] Although a single storage device 104 is shown for simplicity, the database and programs may generally be distributed across multiple storage devices 104 to reduce input/output load. The input device 101 and the output device 102 may also be separate computers connected over a network.

[0016] FIG. 2 shows the hardware configuration when the apparatus of FIG. 1 is implemented with a CPU 201. In FIG. 2, the CPU 201 is connected to a memory 202, a display 204 serving as the display device, a keyboard 203 serving as the input device, and a hard disk 205 serving as the storage device. The hard disk 205 stores a search program 206, a collection program 207, a database 208, and search history data 209.

[0017] FIG. 3 is a flowchart of the information collection program of FIG. 2. The program starts (31) with a queue holding the list of starting links from which information is to be acquired and with the parameters for high-frequency search term extraction, and it extracts the high-frequency search terms (32). Extraction of high-frequency search terms is described with reference to FIGS. 4, 5, and 6. If the link queue is empty (33), processing ends (41). Otherwise, the link with the highest priority is selected from the queue (34) and removed from the list (35). The information indicated by that link is then acquired (36) and output (37). Next, all links contained in the acquired information are extracted (38), and the priority of each extracted link is calculated (39); the method of calculating priority is described later (FIG. 7). The priorities and links are then added to the list (40). The same processing is repeated for the next link until the queue becomes empty (33).
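The loop of FIG. 3 can be sketched in Python as a best-first crawl over a priority queue. This is a minimal sketch under stated assumptions, not the patent's implementation: `crawl`, `fetch`, `extract_links`, `priority`, and `output` are hypothetical names, the `priority` callback is assumed to already incorporate the high-frequency search terms extracted in step 32, and the `seen` set that keeps a link from being queued twice is an addition the flowchart does not mention. The optional `time_limit` reflects the cutoff described in [0018].

```python
import heapq
import time
from typing import Callable, Iterable, List, Optional


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], str],
    extract_links: Callable[[str], List[str]],
    priority: Callable[[str], float],
    output: Callable[[str, str], None],
    time_limit: Optional[float] = None,
) -> None:
    """Best-first collection loop following the steps of FIG. 3."""
    seeds = list(seeds)
    # heapq is a min-heap, so priorities are stored negated: the link with the
    # highest priority is popped first. Seeds start at priority 0.
    queue = [(0.0, url) for url in seeds]
    heapq.heapify(queue)
    seen = set(seeds)  # assumption: never queue the same link twice
    deadline = None if time_limit is None else time.monotonic() + time_limit

    while queue:  # step 33: stop when no links remain
        if deadline is not None and time.monotonic() > deadline:
            break  # stop after a fixed time even if the queue is not empty
        _, url = heapq.heappop(queue)  # steps 34-35: take and remove the best link
        doc = fetch(url)  # step 36: acquire the information
        output(url, doc)  # step 37: output it (e.g. add it to the database)
        for link in extract_links(doc):  # step 38: every link in the document
            if link not in seen:
                seen.add(link)
                # steps 39-40: score the link and add it to the list
                heapq.heappush(queue, (-priority(link), link))


# Toy in-memory link graph (hypothetical data): each "document" is its own URL.
pages = {
    "seed": ["Linux page", "MIDI page"],
    "Linux page": [],
    "MIDI page": ["chat page"],
    "chat page": [],
}
freq = {"Linux": 45000, "MIDI": 37000, "chat": 100}
collected = []
crawl(
    ["seed"],
    fetch=lambda url: url,
    extract_links=lambda doc: pages[doc],
    priority=lambda link: sum(freq.get(w, 0) for w in link.split()),
    output=lambda url, doc: collected.append(url),
)
print(collected)  # ['seed', 'Linux page', 'MIDI page', 'chat page']
```

A heap keeps each selection at O(log n), which matters because the link list grows with every fetched document.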

[0018] In actual operation, to prevent an infinite loop or an excessively long run, the program must be made to terminate after a fixed period of time even if the queue is not empty.

[0019] The priority of a link is calculated from pairs of keywords and their frequencies of occurrence in the given search history data. For each keyword, the product (relevance between the link and the keyword) × (frequency of the keyword) is computed, and the larger the sum of these products over all keywords, the higher the priority given to the link.
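The rule in [0019] is a weighted sum: priority(link) = sum over keywords k of relevance(link, k) × frequency(k). The sketch below assumes keyword frequencies arrive as a dict and leaves the relevance measure pluggable, since [0020] and [0021] describe alternatives. `link_priority` and `contains_word` are hypothetical names; `contains_word` is a stand-in relevance that scores 1 when the keyword is a whitespace-separated word of the link text and 0 otherwise.

```python
from typing import Callable, Mapping


def link_priority(
    link_text: str,
    keyword_freq: Mapping[str, int],
    relevance: Callable[[str, str], float],
) -> float:
    """Sum of relevance(link, keyword) * frequency(keyword) over all keywords."""
    return sum(relevance(link_text, kw) * freq for kw, freq in keyword_freq.items())


def contains_word(link_text: str, keyword: str) -> float:
    """Hypothetical relevance: 1.0 if the keyword is a word of the link text."""
    return 1.0 if keyword in link_text.split() else 0.0


# Frequencies taken from the FIG. 4 example in [0023].
freq = {"Linux": 45000, "MIDI": 37000}
print(link_priority("Linux page", freq, contains_word))  # 45000.0
```

With this indicator-style relevance the weighted sum reduces to adding up the frequencies of the keywords that occur in the link text, which is the behavior shown in the FIG. 4 example.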

[0020] One way to calculate the relevance between a link and a keyword is to reuse the relevance between the keyword and the entire document containing the link. TF*IDF is a well-known measure of relevance between a document and a keyword. In this case, all links contained in the same document are given the same priority. Here, TF (Term Frequency) is the frequency of occurrence of the word, IDF (Inverse Document Frequency) is the reciprocal of the fraction of documents in the collection in which the word appears, and * denotes multiplication.
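A minimal sketch of TF*IDF as [0020] defines it: TF is the raw count of the word in the document, and IDF is the reciprocal of the fraction of documents that contain the word. Many retrieval systems use a logarithmic IDF instead; the plain reciprocal is kept here to match the text. Documents are assumed to be pre-tokenized word lists, the function names are hypothetical, and returning 0 for a word that appears in no document is an added convention.

```python
from collections import Counter
from typing import Sequence


def tf(word: str, doc: Sequence[str]) -> int:
    """Term frequency: number of times `word` occurs in the tokenized document."""
    return Counter(doc)[word]


def idf(word: str, docs: Sequence[Sequence[str]]) -> float:
    """Reciprocal of the fraction of documents containing `word` (no logarithm)."""
    df = sum(1 for d in docs if word in d)
    return len(docs) / df if df else 0.0


def tf_idf(word: str, doc: Sequence[str], docs: Sequence[Sequence[str]]) -> float:
    return tf(word, doc) * idf(word, docs)


docs = [["linux", "page"], ["midi", "page"], ["linux", "linux", "chat"]]
print(tf_idf("linux", docs[2], docs))  # 3.0
```

Here "linux" occurs twice in the third document and in 2 of the 3 documents, so the score is 2 × (3 / 2) = 3.0; every link inside that document would inherit this same relevance.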

[0021] Alternatively, because in HTML link information is usually attached to a word or phrase, the frequency of that word in the search history data may be used as the relevance.

[0022] FIG. 4 shows an example of link priority calculation. Here, the frequency in the search history data of the word to which the link is attached is used as the relevance.

[0023] Suppose first that the words "Linux", "chat", "MIDI", "download", and "Windows" are obtained from the search history data as high-frequency words, with the occurrence counts shown in FIG. 4(A). Suppose also that the document whose link priorities are to be calculated contains two links, "Linux page" and "MIDI page". Because the link "Linux page" contains the word "Linux", its relevance to the search history is 45000. Similarly, the relevance of "MIDI page" is 37000. If larger values are defined to mean higher priority, these relevance values can be used directly as the priorities.

[0024] FIG. 5 shows an example of the step of extracting high-frequency search terms from the accumulated search terms in the present invention. (A) is an example of the accumulated search terms, (C) is an example of the extracted high-frequency search terms, and (B) shows the extraction processing.

[0025] As shown in FIG. 5(A), the search system records in a search history file when each search was performed and what was searched for.

[0026] FIG. 5(B) is a flowchart of the high-frequency search term extraction process. It starts (51) with the number of days D and the number of output terms N as input. The search log from D days before (for example, three days before) through the previous day is read (52), and the number of occurrences of each search term in the log is counted (53). The search terms are then sorted in descending order of occurrence count (54), the N most frequent search terms (for example, the top 20%) are output together with their counts (55), and the process ends (56). The output is a table of N rows, each giving the rank, the search term, and its number of occurrences from D days before through the previous day.
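The FIG. 5(B) procedure can be sketched with `collections.Counter`, assuming the search log is available as `(date, term)` pairs (the exact record format of FIG. 5(A) is not specified). `extract_high_frequency` and its `cutoff` parameter are hypothetical; `cutoff` implements the optional pruning of [0027] by discarding terms whose count is at or below it before sorting. To output "the top 20%" rather than a fixed N, a caller could pass a value derived from the number of distinct terms.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def extract_high_frequency(
    log: Iterable[Tuple[date, str]],
    today: date,
    days: int,
    n: int,
    cutoff: int = 0,
) -> List[Tuple[int, str, int]]:
    """Return (rank, term, count) rows for the n most frequent terms (FIG. 5(B))."""
    start, end = today - timedelta(days=days), today - timedelta(days=1)
    # Steps 52-53: read the log from D days ago through yesterday and count terms.
    counts = Counter(term for day, term in log if start <= day <= end)
    # Optional pruning: drop terms at or below the cutoff frequency.
    kept = [(t, c) for t, c in counts.items() if c > cutoff]
    # Step 54: sort by descending count; step 55: output the top n.
    kept.sort(key=lambda tc: tc[1], reverse=True)
    return [(rank, t, c) for rank, (t, c) in enumerate(kept[:n], start=1)]


log = [
    (date(2000, 1, 1), "Linux"),
    (date(2000, 1, 2), "Linux"),
    (date(2000, 1, 3), "MIDI"),
    (date(2000, 1, 4), "chat"),  # today: excluded
    (date(1999, 12, 28), "old"),  # older than D days: excluded
]
print(extract_high_frequency(log, today=date(2000, 1, 4), days=3, n=2))
# [(1, 'Linux', 2), (2, 'MIDI', 1)]
```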

[0027] The number of past days of history used, and the percentage of top search terms output, may be adjusted according to how heavily the search system is used and how varied the search terms are. When there are very many search terms, instead of sorting all of them by occurrence count in (B), search terms at or below a certain frequency may be discarded and only the remaining terms sorted.

[0028] FIG. 6 shows an example in which the growth rate of the number of times a search term has been used, rather than the number of uses itself, serves as the criterion for extracting search terms. First, the past results of the rankings and occurrence counts of search terms obtained as in FIG. 5 are recorded.

[0029] Next, for each search term, the difference between its rank on the previous day and its current rank is computed from the ranking newly calculated as in FIG. 5 and the previous day's ranking. For example, in FIG. 6(A), "interest rate" has risen from 3rd to 2nd place, giving +1, while "stocks" has fallen from 2nd to 3rd place, giving -1.

[0030] This change is subtracted from the latest rank, giving "interest rate" a rank of 1 and "stocks" a rank of 4.

[0031] Using rank changes in this way, the future rank of each search term can be estimated and used in place of its actual rank. This example takes the simplest approach, using the difference from the previous day's rank as the slope of the rank change; how far back past ranks are taken, and what order of extrapolation is used, may be changed according to the nature of the search system.
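Paragraphs [0029]-[0031] amount to a first-order linear extrapolation: with yesterday's rank r_prev and today's rank r_cur, the change is r_prev - r_cur and the estimated rank is r_cur - (r_prev - r_cur) = 2·r_cur - r_prev. A sketch with hypothetical function names; treating a term that was absent from yesterday's ranking as unchanged is an assumption the text does not address.

```python
from typing import Dict, List


def predicted_ranks(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Extrapolate each term's next rank from its one-day rank change (FIG. 6)."""
    result = {}
    for term, rank in current.items():
        # Positive change means the term moved up (e.g. 3rd -> 2nd gives +1).
        # Assumption: a term absent from yesterday's ranking is treated as unchanged.
        change = previous.get(term, rank) - rank
        result[term] = rank - change
    return result


def reorder(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Terms ordered by predicted rank, best first."""
    pred = predicted_ranks(previous, current)
    return sorted(pred, key=pred.get)


prev = {"interest rate": 3, "stocks": 2}
cur = {"interest rate": 2, "stocks": 3}
print(predicted_ranks(prev, cur))  # {'interest rate': 1, 'stocks': 4}
```

A higher-order estimate, as [0031] allows, would keep more than one day of past ranks and fit a curve instead of a single difference.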

[0032] FIG. 7 is a flowchart of the process that determines link priorities based on the rankings of search terms obtained as in FIG. 5 or FIG. 6. The process starts (71) with, as input, the list of links whose priorities are to be calculated and the output of the high-frequency search term extraction. If the priorities of all links have been calculated (72), processing ends (78). Otherwise, one link whose priority has not yet been calculated is selected (73), and the title of the link is divided into words (74). The division can treat consecutive runs of kanji or of katakana as words, or can use dictionary-based morphological analysis. Next, the occurrence count of each word of the title is looked up in the previously computed high-frequency search term output (75), the counts are summed (76), and the sum is output as the priority of the selected link (77). These operations are repeated for all links (72).
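A sketch of FIG. 7 using the simpler of the two segmentation options, where runs of kanji or of katakana count as words. As assumptions beyond the text, runs of ASCII letters and digits are also treated as words (so "Linux" is found), titles are NFKC-normalized so that full-width letters such as "ＭＩＤＩ" match, and hiragana (mostly particles such as "の") is skipped. `split_title` and `link_priorities` are hypothetical names. This naive segmentation cannot split a compound such as "金利情報"; dictionary-based morphological analysis would be needed for that.

```python
import re
import unicodedata
from typing import List, Mapping, Tuple

# Runs of kanji, runs of katakana (including the long-vowel mark U+30FC), and
# runs of ASCII letters/digits are treated as words; everything else separates.
WORD_RE = re.compile(r"[\u4e00-\u9fff]+|[\u30a1-\u30fa\u30fc]+|[A-Za-z0-9]+")


def split_title(title: str) -> List[str]:
    """Step 74: naive script-run segmentation of a link title."""
    return WORD_RE.findall(unicodedata.normalize("NFKC", title))


def link_priorities(
    titles: List[str], term_counts: Mapping[str, int]
) -> List[Tuple[str, int]]:
    """Steps 72-77: priority = sum of the search-log counts of the title's words."""
    return [
        (title, sum(term_counts.get(word, 0) for word in split_title(title)))
        for title in titles
    ]


freq = {"Linux": 45000, "MIDI": 37000}  # counts from the FIG. 4 example
print(link_priorities(["Linuxのページ", "ＭＩＤＩのページ"], freq))
# [('Linuxのページ', 45000), ('ＭＩＤＩのページ', 37000)]
```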

[0033] A link's title often consists only of a demonstrative such as "here" or "this". For example, instead of creating a link labeled "XX Newspaper page", an author may attach the link to the word "here" in the sentence "XX Newspaper is here". Such cases can be handled by using, in FIG. 7, the words of the entire sentence containing the link instead of the link title alone.
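A minimal sketch of the [0033] fallback, assuming the page text has already been extracted and the anchor's character offsets are known. The sentence-boundary characters, the `DEMONSTRATIVES` set beyond "ここ" and "これ", and the choice to substitute the sentence only when the anchor is a bare demonstrative (the text simply says to use the whole sentence) are all assumptions; the function names are hypothetical.

```python
# Hypothetical demonstratives that carry no topical meaning as anchor text.
DEMONSTRATIVES = {"ここ", "これ", "こちら", "here", "this"}
SENTENCE_ENDS = "。.!?"  # assumption: Japanese full stop plus ASCII punctuation


def enclosing_sentence(paragraph: str, start: int, end: int) -> str:
    """Return the sentence of `paragraph` that contains the span [start, end)."""
    # rfind returns -1 when absent, so +1 falls back to the paragraph start.
    left = max(paragraph.rfind(p, 0, start) for p in SENTENCE_ENDS) + 1
    rights = [i for i in (paragraph.find(p, end) for p in SENTENCE_ENDS) if i != -1]
    right = min(rights) + 1 if rights else len(paragraph)
    return paragraph[left:right].strip()


def text_for_priority(anchor_text: str, sentence: str) -> str:
    """Choose the text whose words are scored in FIG. 7, step 74."""
    if anchor_text.strip().lower() in DEMONSTRATIVES:
        return sentence  # the anchor itself says nothing; use its sentence
    return anchor_text


paragraph = "今日の記事です。○○新聞はここ。ほかにもあります。"
start = paragraph.find("ここ")
sentence = enclosing_sentence(paragraph, start, start + len("ここ"))
print(text_for_priority("ここ", sentence))  # ○○新聞はここ。
```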

[0034]

[Effects of the Invention] As described above, the search-linked information collection method according to claim 1 of the present invention makes it possible to collect richer and more up-to-date information about frequently used search terms.

[0035] Likewise, the search-linked information collection apparatus according to claim 2 of the present invention makes it possible to collect richer and more up-to-date information about frequently used search terms.

[0036] Likewise, the recording medium storing the search-linked information collection program according to claim 3 of the present invention makes it possible to collect richer and more up-to-date information about frequently used search terms.

[Brief description of the drawings]

FIG. 1 is a diagram showing the overall hardware configuration required for an embodiment of the present invention.

FIG. 2 is a diagram showing the hardware configuration when the apparatus of FIG. 1 is implemented with a CPU.

FIG. 3 is a flowchart of the information collection program of FIG. 2.

FIG. 4 is a diagram showing an example of link priority calculation.

FIG. 5 is a diagram showing an example of extracting high-frequency search terms from the search terms accumulated in the present invention.

FIG. 6 is a diagram showing an example of extracting search terms when the growth rate of the number of times a search term has been used, rather than the number of uses itself, serves as the extraction criterion.

FIG. 7 is an operation flowchart of the process that determines link priorities.

[Explanation of symbols]

Reference Signs List
101 input device
102 output device
103 control device
104 storage device
201 CPU
202 memory
203 keyboard
204 display
205 hard disk
206 search program
207 collection program
208 database
209 search history data

Continued from the front page: (72) Inventor: Kazuo Tanaka, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. F-terms (reference): 5B075 ND03 ND36 NK31 PR04

Claims

[Claims]

1. A search-linked information collection method for collecting information, comprising: accumulating search terms used to search a database; extracting, from the accumulated search terms, search terms having a high frequency of occurrence; selecting, from a list of links, the one link having the highest priority according to the extracted search terms; acquiring and outputting the information indicated by the selected link; extracting all links contained in the acquired information; calculating the priority of each extracted link according to the extracted high-frequency search terms; adding the priorities and links to the list; and repeating the operations from the selection of a link onward for all links in the list.

2. A search-linked information collection apparatus for collecting information, comprising: means for accumulating search terms used to search a database; means for extracting, from the accumulated search terms, search terms having a high frequency of occurrence; means for selecting, from a list of links, the one link having the highest priority according to the extracted search terms; means for acquiring and outputting the information indicated by the selected link; means for extracting all links contained in the acquired information, calculating the priority of each extracted link according to the extracted high-frequency search terms, and adding the priorities and links to the list; and means for repeating the operations from the selection of a link onward for all links in the list.

3. A storage medium storing a search-linked information collection program, the program being a computer program for collecting information that: accumulates search terms used to search a database; extracts, from the accumulated search terms, search terms having a high frequency of occurrence; selects, from a list of links, the one link having the highest priority according to the extracted search terms; acquires and outputs the information indicated by the selected link; extracts all links contained in the acquired information; calculates the priority of each extracted link according to the extracted high-frequency search terms; adds the priorities and links to the list; and repeats the operations from the selection of a link onward for all links in the list.