JPH09218876A - Node link searching device

Info

Publication number: JPH09218876A
Application number: JP8022343A
Authority: JP
Inventors: Shinya Kubo; 信也久保
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-02-08
Filing date: 1996-02-08
Publication date: 1997-08-19
Anticipated expiration: 2016-02-08
Also published as: JP2940459B2

Abstract

PROBLEM TO BE SOLVED: To provide a node link searching device capable of limiting the search of unnecessary nodes that have no semantic connection. SOLUTION: The number of nodes from the origin node to a search destination node is defined as the hierarchy number, and the search range is limited by setting a maximum value for the hierarchy number. A file collecting part 21 follows links from the origin node, successively reads nodes from a server machine 13, and, for each node read, obtains and registers the title and storage position of each link destination node together with the correspondence between link origin and link destination. A hierarchy calculation part 27 obtains the hierarchy number of the link destination nodes every time a node is read from the server machine 13. When the hierarchy number exceeds the set maximum value, following further links and reading further nodes are stopped. Since the search range is limited by the hierarchy number from the origin, only nodes with a semantic connection to the origin node are searched, within the required range.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a node/link search device that finds the range of nodes to be searched, under specified conditions, in a database composed of nodes and links. More particularly, it relates to a node/link search device used with a database whose nodes are distributed and stored on a plurality of servers on a network, as in a distributed hypermedia system.

[0002]

[Prior Art] One type of database is the hypertext system, in which a plurality of nodes storing various kinds of information, such as document files, are searched by following links that represent the dependency relationships among them. Searching in such hypertext proceeds smoothly when an index of the information stored in the nodes and the link relationships between the nodes are held in advance.

[0003] Japanese Unexamined Patent Publication No. 4-321144 discloses a hypertext system that can display the relationships between nodes so that they are easily grasped. Because each node can link to any other node, links may form a loop among a plurality of nodes. This system creates its display screen from a table in which the link relationships between nodes are registered in advance. It cuts looped links so that, when a given node is taken as the starting point, the link relationships between nodes are displayed as a tree structure.

[0004] Hypertext systems were conventionally built on local workstations and the like, but with recent advances in communication technology, systems have appeared that distribute and store nodes on a plurality of servers connected via a network. Such a system is called a distributed hypermedia system. For example, the World Wide Web (hereinafter WWW) is attracting attention as a means of publishing information on the Internet. Hypermedia is hypertext that can handle not only text data such as characters and tables but also multimedia data such as video and audio.

[0005] In a distributed hypermedia system, a table representing the link relationships between nodes is created by acquiring, via the network, information on the nodes stored on each server. The information representing the link relationships between nodes is hereinafter called a node/link information database, and a device that creates the node/link information database is called a node/link search device.

[0006] In the World Wide Web (WWW), nodes are distributed across a plurality of server machines on the Internet, and the node data stored on each server is transferred according to a procedure called the HyperText Transfer Protocol (hereinafter HTTP). The multimedia documents that constitute the nodes are written in a hypermedia description language format called the HyperText Markup Language (hereinafter HTML).

[0007] The node/link search device analyzes the content of an HTML node acquired via the network and extracts the hyperlinks that indicate where the next link destination nodes are stored. A hyperlink is extracted as the character string lying between a predetermined character string defined by HTML that marks its start position (called a tag) and the tag that marks its end position.

[0008] A hyperlink representing a link destination node is written in a notation called Uniform Resource Locators (hereinafter URL). Nodes and hyperlinks can be searched by finding, in the text of an acquired node, the portions enclosed by the tags that indicate a hyperlink and extracting from them the character strings written as URLs. The node/link search device builds the node/link information database by repeating this search. Known node/link search devices include those called "WWW Wanderer", "WWW Robot", and "WWW Spider".

[0009] For a hyperlink representing a link destination node, the URL defines the following structure:

<Scheme>:<Scheme-Specific-Part>

For the HyperText Transfer Protocol (HTTP) scheme, which the World Wide Web (WWW) uses as the transfer protocol for node document files, a hyperlink is written as follows:

http://<host>:<port>/<path>?<searchpart>

[0010] Here, "<host>" is the host name of the server machine on which the document file resides, and "<port>" is the port number used for communication. The port number may be omitted, in which case the default value "80" is assumed. "<path>" is the location of the document file on the server machine, and "<searchpart>" is a field used to pass data to the server machine.
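The decomposition of an HTTP URL described in [0009] and [0010] can be sketched as follows. This is only an illustration using Python's standard `urllib.parse`; the function name `split_http_url` and the `example.com` URLs are assumptions made for the example and do not appear in the publication.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # default assumed when <port> is omitted, per [0010]


def split_http_url(url):
    """Split an HTTP URL into the parts named in [0009]-[0010]."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # note: urlsplit lower-cases the host name
        "port": parts.port if parts.port is not None else DEFAULT_HTTP_PORT,
        "path": parts.path,
        "searchpart": parts.query,
    }


# Hypothetical URLs, used only to exercise the function.
without_port = split_http_url("http://example.com/tale/TOC.html")
with_port = split_http_url("http://example.com:8080/search?q=tale")
print(without_port["port"], with_port["port"], with_port["searchpart"])
```

Filling in the default port at parse time means that later comparisons between URLs do not need to treat an omitted port as a special case.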

[0011] FIG. 14 shows an example of nodes and the hyperlinks connecting them. The document files contained in node 1001, node 1002, and node 1003 are written in HTML. The content of a document file is classified and made identifiable by various tags. Among these, the tag indicating a hyperlink has the form <A HREF="URL">…</A>, where the "URL" portion indicates the link destination node. In the document file of node 1001, the regions 1004 and 1005 enclosed by dotted lines each represent a hyperlink.

[0012] Assume that node 1001 is written as the URL "http://hostA/tale/TOC.html", node 1002 as "http://hostB:80/tale/Introduction.html", and node 1003 as "http://hostC:8080/tale/Birth.html". Hyperlink 1004 in node 1001 points to node 1002, and hyperlink 1005 in node 1001 points to node 1003. Link destination nodes are thus represented by the hyperlinks registered in a node's document file.
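As a rough illustration of extracting hyperlinks from a node such as node 1001, the sketch below collects the HREF value of every <A> tag. It uses Python's standard `html.parser` rather than the start-tag/end-tag string search the publication describes, and the HTML text assigned to `toc` is a hypothetical stand-in for the document file of node 1001, whose actual content is not given.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collects the HREF value of every <A> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the link destination URLs found in html_text, in order."""
    parser = AnchorExtractor()
    parser.feed(html_text)
    return parser.links


# Hypothetical content for node 1001 (assumed, not from the publication).
toc = (
    '<A HREF="http://hostB:80/tale/Introduction.html">Introduction</A>\n'
    '<A HREF="http://hostC:8080/tale/Birth.html">Birth</A>\n'
)
print(extract_links(toc))
```

The two URLs returned correspond to hyperlinks 1004 and 1005, that is, to nodes 1002 and 1003.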

[0013] FIG. 15 shows an example of the relationships between node documents. The regions enclosed by dash-dot lines represent the servers on which the nodes are stored: region 1011 represents the server machine with host name "A", region 1012 the server machine with host name "B", and region 1013 the server machine with host name "C".

[0014] The contents 1014 to 1016, obtained by removing the HTML-defined tags and the like from the document files contained in nodes 1001 to 1003 shown in FIG. 14, are drawn inside the regions 1011 to 1013 representing the servers on which those nodes are stored. Hyperlinks 1017 and 1018 indicate that the link destination nodes of node 1001 are node 1002 and node 1003.

[0015] Next, the configuration of a node/link search device that builds the node/link information database from the hyperlinks registered in each node is described.

[0016] FIG. 16 outlines the configuration of a conventionally used node/link search device. A plurality of server machines 1023 are connected to the node/link search device 1021 via a network 1022 such as the Internet. Each server machine 1023 holds nodes that store document files. The node/link search device 1021 includes a file collection unit 1031 that collects node files from the server machines 1023 and a hyperlink extraction unit 1032 that extracts hyperlinks from the node files acquired from the server machines 1023.

[0017] The device also has a node/link information database 1033 that accumulates and manages, on the basis of the extracted hyperlinks, various attribute information such as node storage locations, together with link information between nodes. The search range control unit 1034 limits the node search range to the range that satisfies the search range conditions supplied from an input terminal (not shown) or an external settings file. The system control unit 1035 is a circuit section that performs overall control of the flow of operation of each unit in the node/link search device 1021.

[0018] The hyperlink extraction unit 1032 includes a hypermedia description language analysis unit 1036 that analyzes the content of a node's multimedia document file written in a hypermedia description language. The hypermedia description language analysis unit 1036 extracts hyperlinks and link destination nodes by detecting the various tags. The search range is specified by host name, which serves as the identifier of a server machine 1023. The search range control unit 1034 includes a host name comparison unit 1037 that compares the host name of the server machine on which a node is stored with the host names specified as the search range. By comparing the host name, which forms part of the information representing the node's storage location, the unit determines whether the node is within the search range.

[0019] The file collection unit 1031 of the node/link search device 1021 reads a node's multimedia document file, through the network 1022, from a server machine 1023 whose host name matches one specified as the search range. The hyperlink extraction unit 1032 then extracts hyperlink and link destination node information from the file that was read. If the host name of the server machine storing an extracted link destination node is within the search range, the search range control unit 1034 continues the search to that link destination node. If the server machine storing the link destination node is outside the search range, the search of nodes beyond it is stopped. Various attributes of the nodes and hyperlinks within the search range are registered in the node/link information database 1033.

[0020] Besides devices that specify host names as the search range in this way, there are also node/link search devices for which no search range can be specified at all.

[0021]

[Problems to be Solved by the Invention] A node/link search device for which no search range can be specified takes the entire network, such as the Internet, as its search range and searches every node on every server machine on the network. As a result, the node/link information database also accumulates unnecessary information about nodes outside the range that should have been searched, which not only wastes the database's resources but also lowers efficiency when the database is queried. In addition, the server machines holding the nodes and the network are accessed for longer, which restricts other users' use of network resources.

[0022] Even a device that allows a search range to be specified has conventionally been able to limit the range only in units of host names. This amounts to limiting the search range by the physical location of the nodes. Hypertext, however, is searched by following nodes in sequence according to the semantic relationships between them, so specifying the search range in units of server machines cannot express an appropriate range.

[0023] In the example shown in FIG. 15, a node 1014 holding a table of contents exists on server "A" 1011, and nodes 1015 and 1016 holding the text corresponding to that table of contents are registered on server "B" 1012 and server "C" 1013, respectively. To search not only the table of contents but also the body of the document, servers "A", "B", and "C" must all be specified as host names of the search range. As a result, the many other nodes on server "B" and server "C" that have no semantic connection with the table of contents also fall within the search range, and unnecessary nodes are searched.

[0024] An object of the present invention is therefore to provide a node/link search device capable of limiting the search of unnecessary nodes that have no semantic connection.

[0025]

[Means for Solving the Problems] In the invention according to claim 1, a node/link search device comprises: search condition setting means for setting the limiting condition of the search range, applied when searching sequentially from an arbitrary node to its link destination nodes on the basis of link information that is contained in each hypertext node and represents the name and storage position of a link destination node, as the maximum value of the number of layers, the number of layers being the number of nodes existing between the node serving as the starting point of the search and the search destination node; file collection means for repeatedly reading, in order from the node serving as the starting point of the search, the content of the link destination node indicated by the link information from the server that stores it; node/link information storage means for storing, each time the file collection means reads the content of one node, the link information contained in that node and information representing the correspondence between the link destination nodes it indicates and the node just read; layer number calculation means for obtaining, each time the file collection means reads the content of one node, the number of layers of the link destination nodes indicated by the link information contained in that node; and search range limiting means for stopping the file collection means from reading the content of nodes linked beyond the node just read when the number of layers obtained by the layer number calculation means is greater than the maximum value set by the search condition setting means.

[0026] That is, in the invention according to claim 1, the number of nodes existing between the node serving as the starting point of the search and a search destination node is taken as the number of layers, and the condition limiting the search range is specified by the maximum number of layers. Node contents are read from the servers in sequence by following links from the starting node, and for each link destination node of a node that has been read, its name, its storage position, and the correspondence between link source node and link destination node are registered. Each time a node is read from a server, the number of layers of that node's link destination nodes is obtained. When this number of layers reaches or exceeds the set maximum value, following links further and reading further nodes is stopped.

[0027] The search range is thereby limited by the number of layers from the starting node. Because the search range is limited in this way, the nodes searched are only those with a strong semantic connection to the starting node. And because the range can be limited by the number of layers, only the information within the required range is collected.
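The layer-limited collection described above might be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions, not the claimed implementation: the starting node is taken as layer 0, each link destination is taken to be one layer deeper than its link source, and the dictionary `graph` stands in for reading nodes from servers and extracting their hyperlinks. The function name `collect` and the node names "Appendix" and "Unrelated" are hypothetical.

```python
from collections import deque


def collect(links_by_node, origin, max_layer):
    """Collect nodes reachable from origin within max_layer layers.

    links_by_node maps a node name to the names of the nodes it links to.
    Returns (layer, link_table): the layer number of each collected node
    and the registered (link source, link destination) pairs.
    """
    layer = {origin: 0}      # assumption: the starting node is layer 0
    link_table = []
    queue = deque([origin])
    while queue:
        node = queue.popleft()            # "read" the next node
        for dest in links_by_node.get(node, []):
            dest_layer = layer[node] + 1  # one layer deeper than its source
            if dest_layer > max_layer:
                continue                  # beyond the limit: not followed
            link_table.append((node, dest))
            if dest not in layer:         # first time this node is seen
                layer[dest] = dest_layer
                queue.append(dest)
    return layer, link_table


# Hypothetical link structure; "TOC" plays the role of the starting node.
graph = {
    "TOC": ["Introduction", "Birth"],
    "Introduction": ["Appendix"],
    "Birth": ["TOC"],
    "Appendix": ["Unrelated"],
}
print(collect(graph, "TOC", 1))
```

With a maximum of 1, only the two chapters linked directly from the table of contents are collected; raising the maximum to 2 also brings in "Appendix" but still leaves "Unrelated" out.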

[0028] In the invention according to claim 2, a node/link search device comprises: search condition setting means for setting the limiting condition of the search range, applied when searching sequentially from an arbitrary node to its link destination nodes on the basis of link information that is contained in each hypertext node and represents the name and storage position of a link destination node, as the maximum value of the number of layers, the number of layers being the number of links existing between the node serving as the starting point of the search and the search destination node; file collection means for repeatedly reading, in order from the node serving as the starting point of the search, the content of the link destination node indicated by the link information from the server that stores it; node/link information storage means for storing, each time the file collection means reads the content of one node, the link information contained in that node and information representing the correspondence between the link destination nodes it indicates and the node just read; layer number calculation means for obtaining, each time the file collection means reads the content of one node, the number of layers of the links to the link destination nodes represented by the link information contained in that node; and search range limiting means for stopping the file collection means from reading the content of nodes linked beyond the node just read when the number of layers obtained by the layer number calculation means is greater than the maximum value set by the search condition setting means.

[0029] That is, in the invention according to claim 2, the number of links existing between the node serving as the starting point of the search and a search destination node is taken as the number of layers, and the search range is limited by the maximum number of layers. This makes it possible to collect only information about nodes with a strong semantic connection to the starting node. The names, storage positions, and mutual link relationships are collected only for nodes within the search range specified by the number of layers.

[0030] In the invention according to claim 3, the nodes to be searched are distributed and stored on a plurality of servers connected to a network.

[0031] That is, in the invention according to claim 3, the nodes to be searched are distributed and stored on a plurality of servers connected via a network. Because the search range is limited by the number of layers, only the information on the necessary nodes is collected from the many servers even though the nodes are distributed among them. If the search range could be specified only in units of servers, the collection might include not only nodes beyond the required number of layers from the starting node but also unrelated nodes that have no link relationship with the starting node at all. Limiting the search range by the number of layers makes it possible to collect only information about the nodes in the required range that are semantically connected to the starting node, even when those nodes are distributed over a plurality of servers.

[0032] In the invention according to claim 4, the file collection means comprises same-node determination means for determining whether a link destination node indicated by the link information contained in the node just read is the same as a node that has already been read, and multiple-read stopping means for stopping that node from being read again when the same-node determination means determines that it is the same as a node already read.

[0033] That is, in the invention according to claim 4, a node that has been read once is prevented from being read again. This avoids repeatedly searching a looped range.
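One possible way to decide whether a link destination is the same as a node already read is to compare normalized URLs. The sketch below is an assumption made for illustration: the publication does not say how sameness is determined, and the names `node_key` and `ReadLog` are hypothetical. It treats an omitted port and an explicit port 80 as the same node, following the default stated in [0010].

```python
from urllib.parse import urlsplit


def node_key(url):
    """Normalize a URL so that equivalent spellings compare equal."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else 80  # default per [0010]
    return (parts.scheme, parts.hostname, port, parts.path, parts.query)


class ReadLog:
    """Remembers which nodes have been read, to stop repeated reads."""

    def __init__(self):
        self._seen = set()

    def should_read(self, url):
        key = node_key(url)
        if key in self._seen:
            return False  # same node as one already read: skip it
        self._seen.add(key)
        return True


log = ReadLog()
first = log.should_read("http://hostB/tale/Introduction.html")
again = log.should_read("http://hostB:80/tale/Introduction.html")
other = log.should_read("http://hostC:8080/tale/Birth.html")
print(first, again, other)
```

Because every URL is checked before it is read, a loop such as a chapter linking back to its table of contents costs one set lookup instead of another network read.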

[0034]

[Best Mode for Carrying Out the Invention]

[0035]

[Embodiment] FIG. 1 outlines the configuration of a node/link search device according to an embodiment of the present invention. A search over a hypermedia structure can be viewed as a tree whose root is the node at which the search started. The depth in this tree at which each node lies, measured from the root, is therefore taken as its number of layers, and the search range is limited by the number of layers from the starting point.

[0036] A plurality of server machines 13 holding nodes are connected to the node/link search device 11 via a network 12 such as the Internet. The node/link search device 11 includes a file collection unit 21 that collects node files from the server machines 13 and a hyperlink extraction unit 22 that extracts hyperlinks and link destination nodes from the node files acquired from the server machines 13.

[0037] The node/link search device 11 also has a node/link information database 23 that accumulates and manages various attribute information on the extracted hyperlinks and link destination nodes. The search range control unit 24 limits the node search range so that it satisfies the conditions specifying the search range, which are entered from an input terminal (not shown) or a settings file. The system control unit 25 is a circuit section that performs overall control of the flow of operation of each unit in the node/link search device 11.

[0038] The hyperlink extraction unit 22 includes a hypermedia description language analysis unit 26 that analyzes the content of a node's multimedia document file written in a hypermedia description language. The hypermedia description language analysis unit 26 extracts hyperlinks and link destination nodes by detecting the various tags. The hyperlink extraction unit 22 also includes a layer number calculation unit 27 that calculates the number of layers of each extracted link destination node. The layer number calculation unit 27 obtains the number of layers of a link destination node by adding "1" to the number of layers of its link source node.

[0039] The search range is specified by host name, which identifies a server machine 13, and by the maximum number of layers. The search range control unit 24 includes a host name comparison unit 28 that compares the host name of the server storing a node with the host names specified as the search range, and a layer number comparison unit 29 that compares the node's number of layers with the maximum number of layers serving as the limit of the search range. The search range control unit 24 limits the search range by the node's number of layers and the host name of its storage destination. To prevent the same node from being read multiple times, it also limits the search range by the time elapsed since the previous search.

【００４０】The file collection unit 21 reads the multimedia document file of a node from the server machine 13 through the network 12. The hyperlink extraction unit 22 extracts hyperlinks and link destination nodes from the multimedia document file based on the analysis result of the hypermedia description language analysis unit 26. The layer number calculation unit 27 then calculates the number of layers of the link destination node, or of the hyperlink, by adding 1 to the number of layers of the link source node.

【００４１】The search range control unit 24 limits the search range based on whether the number of layers of a node or hyperlink is less than or equal to the specified maximum number of layers, or on whether the host name partially matches the specified one. Attribute information on the nodes and hyperlinks within the search range is accumulated in the node/link information database 23. Within the node/link information database 23, the table that registers node attributes is called the node table, and the table that registers link attributes is called the link table. Limiting the search range by the number of layers in this way prevents attribute information on unnecessary nodes from being collected and registered in the database.

【００４２】FIG. 2 shows an example of the registered contents of the node table. From left to right in the figure, the node table 31 registers a node identifier 32, a scheme 33, a host name 34, a port number 35, a scheme-specific part 36, a number of layers 37, a registration date and time 38, a search date and time 39, and a last update date and time 41. The node identifier 32 is generated automatically by a database management system (DBMS), not shown in FIG. 1, when the attribute information of a node is registered in the node table. The scheme 33, host name 34, port number 35, and scheme-specific part 36 correspond to the parts of the URL character string that indicates where the node is stored. The number of layers 37 is the number of layers of the node counted from the search start position.

【００４３】The registration date and time 38 indicates when the node was first registered in the database. The search date and time 39 indicates the most recent time the node was searched. The last update date and time 41 indicates when the node's multimedia document file was last updated. In the figure, the search date and time 39 and the last update date and time 41, enclosed by the dotted line 42, are attributes updated when the node is registered as a link source node. The items enclosed by the dotted line 43 are attributes updated when the node is registered as a link destination node pointed to by a hyperlink of a link source node.

【００４４】FIG. 3 shows an example of the registered contents of the link table. From left to right in the figure, the link table 51 registers a link identifier 52, a link source node identifier 53, and a link destination node identifier 54. The node identifiers stored in these items are the identification numbers assigned automatically by the DBMS. For the hyperlink indicated by the link identifier 52, the node identifier of the node from which the hyperlink originates is registered as the link source node identifier 53, and the node identifier of the node to which the hyperlink points is registered as the link destination node identifier 54.
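The two tables can be pictured as simple record types. The sketch below is illustrative only. The field names are hypothetical, and storing the URL path in the scheme-specific column is an assumption, since the text does not say exactly which part of the URL that column holds.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class NodeRow:
    node_id: int                                # item 32
    scheme: str                                 # item 33
    host: str                                   # item 34
    port: Optional[int]                         # item 35
    scheme_specific: str                        # item 36 (assumed: URL path)
    layers: int                                 # item 37
    registered_at: datetime                     # item 38
    searched_at: Optional[datetime] = None      # item 39, undefined until read
    last_modified: Optional[datetime] = None    # item 41


@dataclass
class LinkRow:
    link_id: int            # item 52
    source_node_id: int     # item 53
    target_node_id: int     # item 54


def node_row_from_url(node_id, url, layers, now):
    """Split a URL into the node table columns."""
    parts = urlsplit(url)
    return NodeRow(node_id, parts.scheme, parts.hostname or "",
                   parts.port, parts.path, layers, now)
```

A freshly registered link destination node has `searched_at` and `last_modified` left as `None`, matching the "unregistered" state described in the text.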

【００４５】At least one node that serves as a search starting point is registered in advance in the node table 31 of the node/link information database. The items to be registered at this time are the node identifier 32, scheme 33, host name 34, port number 35, scheme-specific part 36, number of layers 37, and registration date and time 38. The number of layers of a node that serves as a search starting point is set to "0".

【００４６】FIG. 4 shows the flow of operations performed by the node/link search device. The steps enclosed by the dash-dot line 61 are performed by the system control unit 25 of the node/link search device. The steps enclosed by the dash-dot line 62 are performed by the search range control unit 24, and the steps enclosed by the dash-dot line 63 are performed by the file collection unit 21 and the hyperlink extraction unit 22. Before the search, the system control unit initializes the system, which includes accepting the search conditions (step S101). The search conditions are entered from an input terminal or an external file, neither of which is shown in FIG. 1. The search conditions set here are the maximum number of layers of the nodes to be included in the search range, the elapsed time after which a previous search result is judged to be stale, and a character string that identifies the server machines to be searched (part or all of a host name).

【００４７】The system control unit 25 searches the node table 31 to check whether any unsearched node exists (step S102). A node is judged to be unsearched if its search date and time 39 in the node table 31 is undefined (unregistered), or if its search date and time 39 is earlier than the current time by at least the elapsed time set as a search condition. If an unsearched node exists (step S102; Y), the search range control unit 24 retrieves the attribute information of one unsearched node from the node table 31, giving priority to nodes with a smaller number of layers (step S103). Consequently, the attribute information of the search starting node, whose number of layers is "0", is retrieved first.
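Steps S102 and S103 amount to a staleness test followed by a minimum-layer selection. A minimal sketch, assuming node rows are plain dictionaries with hypothetical keys `layers` and `searched_at`:

```python
from datetime import datetime, timedelta


def is_unsearched(searched_at, now, max_age):
    """S102: undefined, or at least max_age in the past."""
    return searched_at is None or now - searched_at >= max_age


def pick_next(nodes, now, max_age):
    """S103: among unsearched nodes, prefer the smallest number of layers."""
    candidates = [n for n in nodes if is_unsearched(n["searched_at"], now, max_age)]
    return min(candidates, key=lambda n: n["layers"], default=None)
```

`pick_next` returns `None` when no unsearched node remains, which corresponds to the "N" branch of step S102.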

【００４８】Next, it is determined whether the selected node lies within the specified search range (step S104). This determination is described in detail later. If the node lies within the search range (step S104; Y), table information update processing is performed (step S105): the node's document file is read through the network, and the link destination nodes described in it are additionally registered in the node table, among other updates. The table update processing is also described in detail later. After the update processing for one node (step S105) is finished, the flow returns to step S102 and the search processing is repeated for the remaining unsearched nodes.

【００４９】If the selected node lies outside the specified search range (step S104; N), the search date and time 39 of the selected node in the node table 31 is changed to the current time (step S106), and the flow returns to step S102. When no unsearched node remains in the node table (step S102; N), end processing for terminating the search is performed (step S107), and the processing ends (END).

【００５０】FIG. 5 shows the flow of processing for determining whether a node lies within the search range. Here, the determination is based on the number of layers alone. First, the search range control unit 24 compares the number of layers of the selected node with the maximum number of layers set as a search condition (step S201). If the node's number of layers is greater than the maximum number of layers (step S201; N), the node is determined to lie outside the search range (step S202). If the node's number of layers is less than or equal to the maximum number of layers (step S201; Y), the node is determined to lie within the search range (step S203).

【００５１】FIG. 6 shows the flow of another example of the processing for determining whether a node lies within the search range. Here, a determination based on the host name is added to the determination based on the number of layers. The search range control unit 24 compares the number of layers of the selected node with the maximum number of layers (step S301). If the node's number of layers is greater than the maximum number of layers (step S301; N), the node is determined to lie outside the search range (step S302). If it is less than or equal to the maximum number of layers (step S301; Y), the unit checks whether the character string specified as the host name of the search range partially matches the host name of the node (step S303).

【００５２】For example, when "AB" is specified as the search character string, every host name that begins with the specified string "AB", such as "ABC" or "ABD", is judged to be a partial match. If there is no partial match, as with "CAB" (step S303; N), the node is determined to lie outside the search range (step S302). If there is a partial match (step S303; Y), the node is determined to lie within the search range (step S304). Here the number of layers is checked before the host name, but the two checks may be performed in the reverse order.
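The range checks of FIG. 5 and FIG. 6 can be combined into one small predicate. This is a sketch under the reading given in the example above, where "partial match" means the host name starts with the specified string. The function and parameter names are hypothetical.

```python
def in_search_range(layers, host, max_layers, host_prefix=None):
    """Return True if a node passes the layer check and, if given, the host check."""
    if layers > max_layers:                     # S201 / S301
        return False
    if host_prefix is not None and not host.startswith(host_prefix):  # S303
        return False
    return True
```

Passing `host_prefix=None` reproduces the layer-only check of FIG. 5. Because both conditions must hold, the order of the two checks does not change the result.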

【００５３】FIG. 7 shows the flow of the table update processing shown in FIG. 4. This processing is performed by the file collection unit 21 and the hyperlink extraction unit 22. First, the content of the node determined in step S104 of FIG. 4 to lie within the search range is read through the network from the server machine that stores it. To do so, the file collection unit 21 reads the attribute information of the node from the node table 31 of the node/link information database 23 (step S401). Let the number of layers of the node read here be n. Next, the file collection unit 21 initializes a communication buffer used for communication with the server machine and a temporary buffer (neither shown), then connects to the server machine 13 and prepares to transfer (read) the node's document file (step S402).

【００５４】The file collection unit 21 then starts transferring the document file in accordance with the transfer protocol of the distributed hypermedia system. The file collection unit 21 reads the header of the transfer protocol message (step S403) and checks the format of the document file, such as its description language (step S404). If the file is not in hypermedia description language format (step S404; N), the document file is read through to its end (step S405). Post-processing is then performed, such as releasing the communication buffer and the temporary buffer and disconnecting from the server machine (step S406), and the processing ends (END).

【００５５】If the file is in hypermedia description language format (step S404; Y), the file collection unit 21 reads the document file in the body of the transfer protocol message into the communication buffer (step S407). Next, the hypermedia description language analysis unit 26 moves the text from the start character to the end character of a tag representing a hyperlink into the temporary buffer (step S408). The part describing the link destination node is extracted from the hyperlink stored in the temporary buffer (step S409). The layer number calculation unit obtains the number of layers of the link destination node as n + 1, that is, the number of layers n of the link source node plus 1 (step S410).

【００５６】The attribute information of the node and hyperlink obtained in this way is written to the node table 31 and the link table 51 in the node/link information database 23 (step S411). At this time, some items of the attribute information of the link source node and of the link destination node are written, together with all items of the hyperlink's attribute information (link table). As a result, the attribute information of a link destination node one layer deeper than the node just read, and the hyperlink from the node just read to that link destination node, are each added to the corresponding table. If the same node is already registered, however, no new entry is added to the table and only items such as the search date and time are updated. This avoids duplicate registration of the same node or hyperlink.
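The duplicate-avoiding registration of step S411 can be sketched with two in-memory dictionaries standing in for the node table and link table. This is an assumption-laden illustration: the class `Tables`, its method names, and the use of the URL as the lookup key are hypothetical, and the counters merely imitate the identifiers that the text says the DBMS generates.

```python
from datetime import datetime
from itertools import count


class Tables:
    def __init__(self):
        self.nodes = {}   # url -> node row (dict)
        self.links = {}   # (source_id, target_id) -> link_id
        self._node_ids = count(1)
        self._link_ids = count(1)

    def register_node(self, url, layers, now):
        """Add a node once; an already registered node keeps its row and layers."""
        if url not in self.nodes:
            self.nodes[url] = {"id": next(self._node_ids), "layers": layers,
                               "registered_at": now, "searched_at": None}
        return self.nodes[url]

    def register_link(self, source_url, target_url, now):
        """Register the destination one layer below the source, plus the link itself."""
        source = self.nodes[source_url]
        target = self.register_node(target_url, source["layers"] + 1, now)
        key = (source["id"], target["id"])
        if key not in self.links:
            self.links[key] = next(self._link_ids)
        source["searched_at"] = now   # only the source's search time is refreshed
        return self.links[key]
```

Registering the same link twice returns the existing link identifier instead of creating a second row.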

【００５７】Next, it is checked whether processing has reached the end of the document file of the node just read (step S412). If the end has not been reached (step S412; N), the flow returns to step S407. Because a node may have multiple link destination nodes, repeating this processing up to the end of the file registers all link destination nodes of the node just read in the node table and the link table. When the file has been processed to its end (step S412; Y), the table update processing for this node ends (END).

【００５８】The attribute information registered in the node table and the link table in step S411 of FIG. 7 is now described. The node read from the server machine is called the link source node, and a node described in a hyperlink contained in that node is called a link destination node. For the link source node, the search date and time 39 and the last update date and time 41 are registered. This keeps a record of the most recent time the node was searched. For example, suppose that in the node table 31 shown in FIG. 2 the node with node identifier "1" is read from the server machine and that one of its hyperlinks points to the node with node identifier "2". In this case, the node with node identifier "1" is the link source node and the node with node identifier "2" is the link destination node.

【００５９】The search date and time and the last update date and time enclosed by the dotted line 42 in FIG. 2 are updated as the attribute information of the link source node. The items registered as the attribute information of the link destination node are those enclosed by the dotted line 44 in FIG. 2: the node identifier; the scheme, host name, port number, and scheme-specific part, which correspond to the URL character string indicating where the node is stored; the number of layers obtained by the layer number calculation unit; and the registration date and time, which is when the node was registered in the database. The node identifier registered is the one generated automatically by the DBMS. Because the link destination node has not yet actually been read from the server machine 13 and the hyperlinks it contains have not been examined, its search date and time and last update date and time remain unregistered.

【００６０】The link table 51 registers the link identifier 52 generated automatically by the DBMS, the link source node identifier 53, and the link destination node identifier 54. FIG. 3 is used as an example. Suppose the node with node identifier "1" is read and that one of its hyperlinks makes the node with node identifier "2" a link destination node. Because the node from which the hyperlink with link identifier "1" (55) originates has node identifier "1", this value is registered as the link source node identifier (56). Because the node to which the hyperlink points has node identifier "2", this value is registered as the link destination node identifier (57). If the node that was read contains other hyperlinks, a link identifier is assigned to each of them as well, and their link source node identifiers and link destination node identifiers are registered. When two hyperlinks exist, the information in the range indicated by the dotted line 58 in FIG. 3 is registered in the link table 51.

【００６１】The node table 31 and the link table 51 are updated each time one node is read from the server machine 13 and a new link destination is found. Therefore, in the flowchart of FIG. 4, when the flow returns to step S102 after the table update processing for one node (step S105), the nodes newly registered during that processing also become search targets. In this way the search proceeds from one link destination to the next. The search range is nevertheless limited by conditions such as the number of layers.

【００６２】A node is judged to be unsearched when its search date and time is undefined or is at least the elapsed time in the past, and a time close to the current time is registered as the search date and time of a node once it has been searched. Consequently, the link destinations of the same node are not examined repeatedly. For example, even if the links form a loop, with node "A" linking to node "B" and node "B" linking back to node "A", node "A" is not examined again and the loop is not followed repeatedly up to the maximum number of layers.
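The overall loop of FIG. 4, including its behavior on cyclic links, can be sketched in memory as follows. This is not the device's implementation: `crawl` and its parameters are hypothetical, `fetch_links(url)` stands in for reading a document and extracting its link targets, and the host name check is omitted for brevity.

```python
from datetime import datetime, timedelta


def crawl(start_url, fetch_links, max_layers,
          max_age=timedelta(days=1), now=datetime.now):
    """In-memory sketch of steps S102 to S107."""
    nodes = {start_url: {"layers": 0, "searched_at": None}}
    links = set()
    while True:
        current = now()
        # S102: unsearched means never searched, or searched at least max_age ago.
        pending = [u for u, n in nodes.items()
                   if n["searched_at"] is None
                   or current - n["searched_at"] >= max_age]
        if not pending:
            break                                              # S107
        url = min(pending, key=lambda u: nodes[u]["layers"])   # S103
        if nodes[url]["layers"] <= max_layers:                 # S104
            for target in fetch_links(url):                    # S105
                # An already registered node keeps its existing row.
                nodes.setdefault(target, {"layers": nodes[url]["layers"] + 1,
                                          "searched_at": None})
                links.add((url, target))
        nodes[url]["searched_at"] = current                    # S105 / S106
    return nodes, links
```

With the graph A → B, B → A, B → C, C → D and `max_layers=1`, node A is read once even though B links back to it. Node C is registered at layer 2 but never read, so D is never discovered.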

【００６３】Because the search range is limited by the number of layers of a node in this way, only nodes with a strong semantic connection to the starting node are searched. Unnecessary searching is also reduced compared with the case where the search range can be limited only on a per-server basis.

【００６４】Modification

【００６５】In the embodiment described so far, the number of layers is held as attribute information of a node. In the modification, whether a target lies within the search range is determined from the number of layers of a hyperlink. The device configuration is the same as that shown in FIG. 1, and its description is omitted.

【００６６】FIG. 8 shows an example of the registered contents of the node table used in the node/link search device of the modification. Items identical to those in FIG. 2 are given the same reference numerals, and their description is omitted as appropriate. The node table 71 differs from the node table 31 shown in FIG. 2 in that the items for registering the number of layers and the search date and time have been removed.

【００６７】FIG. 9 shows an example of the registered contents of the link table used in the node/link search device of the modification. Items identical to those in FIG. 3 are given the same reference numerals, and their description is omitted as appropriate. In addition to the items of the link table 51 shown in FIG. 3, the link table 81 has a number-of-layers item 82 and a search date and time item 83. In the embodiment, these were registered in the node table. In the modification, at least one hyperlink that serves as a search starting point must be registered in the link table 81 before the search. The starting hyperlink registered here is a virtual one whose link source node does not exist (is unknown). The attribute information of the link destination node indicated by the starting hyperlink must also be registered in the node table 71 in advance.

【００６８】The contents registered in the node table 71 before the search begin with the node identifier of the link destination node indicated by the starting hyperlink. This value is assigned automatically by the DBMS. In addition, the scheme, host name, port number, and scheme-specific part are initially registered as the items corresponding to the URL character string indicating where the node is stored. The registration date and time, which indicates when the node was registered in the database, is also initially registered. In the example of FIG. 8, the items in the range enclosed by the dotted line 72 are initially registered.

【００６９】The contents initially registered in the link table 81 include the link identifier, which identifies the starting hyperlink. This value is generated automatically by the DBMS. Because the link source of the starting hyperlink is unknown, the initial value of the link source node identifier is set to "0", which denotes undefined. The link destination node identifier is set to the same value as the node identifier that the DBMS assigned to the link destination node. The number of layers is initially registered as "0" because the hyperlink is the search starting point. The search date and time records when the hyperlink was searched and is left undefined as its initial value at the start of the search. In the example of FIG. 9, the items enclosed by the dotted line 84 are initially registered.
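The initial registration for the modification can be sketched as one node row plus one virtual link row. This is illustrative only: the dictionary keys and the function `seed_tables` are hypothetical, and the identifiers are hard-coded to 1 where the text says the DBMS would generate them.

```python
from datetime import datetime

UNDEFINED_SOURCE = 0  # link source node identifier meaning "undefined", per the text


def seed_tables(url, now):
    """Return (node_rows, link_rows) holding the search starting point."""
    # The node table of the modification has no layers or search time columns.
    node = {"node_id": 1, "url": url, "registered_at": now}
    link = {
        "link_id": 1,
        "source_node_id": UNDEFINED_SOURCE,   # virtual link: no source node
        "target_node_id": node["node_id"],
        "layers": 0,                          # search starting point
        "searched_at": None,                  # undefined until the link is searched
    }
    return [node], [link]
```

Because the virtual link has an undefined search date and time, it is the first unsearched hyperlink picked up when the search begins.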

【００７０】FIG. 10 shows the flow of processing performed by the node/link search device in the modification. The steps enclosed by the dash-dot line 101 are performed by the system control unit 25. The steps enclosed by the dash-dot line 102 are performed by the search range control unit 24, and the steps enclosed by the dash-dot line 103 are performed by the file collection unit 21 and the hyperlink extraction unit 22.

【００７１】First, the system control unit 25 initializes the system (step S501). At this time, it sets the maximum number of layers, which is the largest number of layers of hyperlinks to be included in the search range, and the elapsed time after which a previous search result is treated as stale. If the range of servers is also to be restricted by host name, a character string for restricting the host name is entered. These values are taken from an input terminal (not shown) or an external file. Next, it is checked whether the link table 81 contains any hyperlink that has not been searched (step S502). A hyperlink that has not been searched is one whose search date and time is undefined (unregistered) or is earlier than the current time by at least the elapsed time.

【００７２】If an unsearched hyperlink exists in the link table 81 (step S502; Y), one of the unsearched hyperlinks is selected and its attribute information is retrieved (step S503). Next, the search range control unit 24 determines whether the selected hyperlink lies within the search range (step S504). The determination is based on the number of layers alone or on both the number of layers and the host name, and its details are described later.

【００７３】If the hyperlink lies within the search range (step S504; Y), table update processing is performed (step S505). In this processing, the attribute information of the link destination node indicated by the hyperlink is retrieved from the node table 71, the document file of that node is read from the server machine, and the attribute information of the link destination nodes described in the file is additionally registered, among other updates. The detailed flow of the processing is described later. After the table update processing for the link destination node indicated by the hyperlink (step S505) is finished, the flow returns to step S502 and the search processing is repeated for the remaining unsearched hyperlinks.

【００７４】If the selected hyperlink is outside the search range (step S504; N), the search date and time 39 of the selected hyperlink in the link table 81 is changed to the current time (step S506), and the process returns to step S502. When no unsearched hyperlink remains in the link table 81 (step S502; N), an end process for ending the search is performed (step S507), and the process ends (END).
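The loop of FIG. 10 described above can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the link table is modeled as a plain list of dictionaries, and the names `is_unsearched`, `run_search`, `in_range`, `update_tables`, and the field `searched_at` are all assumptions introduced here.

```python
from datetime import datetime, timedelta


def is_unsearched(link, now, max_age):
    """Unsearched: no search time recorded, or recorded at least max_age ago."""
    searched_at = link["searched_at"]
    return searched_at is None or now - searched_at >= max_age


def run_search(link_table, in_range, update_tables, max_age):
    """Process unsearched hyperlinks until none remain (steps S502 to S507)."""
    while True:
        now = datetime.now()
        pending = [l for l in link_table if is_unsearched(l, now, max_age)]
        if not pending:                       # S502: N, so finish (S507)
            return
        link = pending[0]                     # S503: select one unsearched hyperlink
        if in_range(link):                    # S504: Y
            update_tables(link, link_table)   # S505: may append new hyperlinks
        # S506 for out-of-range links. For followed links the source records the
        # search time during the table update; this sketch folds that in here.
        link["searched_at"] = now
```

Because `update_tables` may append rows to `link_table`, hyperlinks registered during one pass are picked up by the next check for unsearched entries. A positive `max_age` is assumed; with a zero threshold every entry would immediately count as unsearched again.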

【００７５】FIG. 11 shows the flow of processing for determining whether a hyperlink is within the search range. Here, the determination is made only from the number of layers of the hyperlink. First, the search range control unit 24 compares the number of layers of the selected hyperlink with the maximum number of layers set as the search condition (step S601). If the number of layers of the hyperlink is larger than the maximum number of layers (step S601; N), the hyperlink is determined to be outside the search range (step S602). If the number of layers of the hyperlink is smaller than or equal to the maximum number of layers (step S601; Y), the hyperlink is determined to be within the search range (step S603).

【００７６】FIG. 12 shows the flow of another example of the processing for determining whether a hyperlink is within the search range. Here, a determination based on the host name is added to the determination based on the number of layers of the hyperlink. The search range control unit 24 compares the number of layers of the selected hyperlink with the maximum number of layers (step S701). If the number of layers of the hyperlink is larger than the maximum number of layers (step S701; N), the hyperlink is determined to be outside the search range (step S702). If the number of layers of the hyperlink is smaller than or equal to the maximum number of layers (step S701; Y), the attribute information of the link destination node indicated by the hyperlink is taken out from the node table 71 (step S703).

【００７７】It is then checked whether the host name included in the extracted attribute information of the link destination node partially matches the character string specified as the host name of the search range (step S704). If there is a partial match (step S704; Y), the hyperlink is determined to be within the search range (step S705). If there is no partial match (step S704; N), the hyperlink is determined to be outside the search range (step S702). Here the number of layers is checked first and the partial match of the host name second, but the two checks may be performed in the opposite order.
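The two range checks of FIG. 11 and FIG. 12 can be illustrated with a single small function. This is a sketch under assumptions: the function name `in_search_range` and its parameters are hypothetical, and "partial match" is interpreted here as a plain substring test, which the source does not spell out.

```python
def in_search_range(link_layers, dest_host, max_layers, host_pattern=None):
    """Return True if a hyperlink is within the search range.

    With host_pattern=None only the number of layers is checked (as in FIG. 11);
    otherwise the destination host name must also contain host_pattern (as in FIG. 12).
    """
    if link_layers > max_layers:       # more layers than allowed: out of range
        return False
    if host_pattern is None:           # layer-only variant
        return True
    return host_pattern in dest_host   # assumed meaning of "partial match"
```

For example, `in_search_range(2, "www.example.co.jp", 3, "example")` accepts the link, while the same link with 4 layers, or with a host name that does not contain `"example"`, is rejected.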

【００７８】FIG. 13 shows the flow of the table updating process shown in FIG. 10. This process is performed by the file collection unit 21 and the hyperlink extraction unit 22. First, the attribute information of the hyperlink determined in step S504 of FIG. 10 to be within the search range is read from the link table 81 (step S801). Let n denote the number of layers of the hyperlink that was read. Next, the attribute information of the node corresponding to the link destination node identifier of this hyperlink is read from the node table 71 (step S802).

【００７９】The file collection unit 21 initializes a communication buffer and a temporary buffer (neither shown) used for communication with the server machine, then connects to the server machine and prepares to read the document file of the link destination node indicated by the hyperlink (step S803). The file collection unit 21 then starts the transfer of the document file in accordance with the transfer protocol of the distributed hypermedia system.

【００８０】The file collection unit 21 reads the header portion of the transfer protocol message (step S804). If the description language of the document file is not a hypermedia description language (step S805; N), the unit reads the document file through to its end (step S806). It then performs post-processing such as releasing the communication buffer and the temporary buffer and disconnecting from the server machine (step S807), and the process ends (END).

【００８１】If the file is in hypermedia description language format (step S805; Y), the file collection unit 21 reads the document file contained in the body of the transfer protocol message into the communication buffer (step S808). Next, the text from the start character to the end character of a tag representing a hyperlink is moved to the temporary buffer (step S809), and the part describing the link destination node is extracted from the hyperlink stored in the temporary buffer (step S810). The layer number calculation unit 27 sets "n + 1" as the number of layers of the extracted hyperlink (step S811).

【００８２】The attribute information of the node and the hyperlink obtained in this way is written into the node table 71 and the link table 81 in the node / link information database 23 (step S812). At this time, some of the items of the attribute information of the link source node and of the link destination node, and all items of the attribute information of the hyperlink, are written. As a result, the attribute information of a link destination node of the node just read, and the hyperlink from the node just read to that link destination node, are registered. However, if a hyperlink with the same link source and link destination is already registered, no new entry is added to the table and only fields such as the search date and time are updated. This prevents the same node or hyperlink from being registered more than once.

【００８３】Next, it is checked whether the document file of the node just read has been processed through to its end (step S813). If the end has not been reached (step S813; N), the process returns to step S808. Since a node may have a plurality of link destination nodes, repeating this processing until the end of the file registers all link destination nodes of the node just read in the node table and the link table. When the file has been processed to its end (step S813; Y), the process ends (END).
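The extraction and registration steps of FIG. 13 can be sketched as below. The sketch assumes that the hypermedia description language is HTML and that hyperlinks are `<a href="...">` tags, which the source does not state; it also skips the network transfer and takes the document text as an argument. The class `AnchorExtractor`, the function `register_links`, and the dictionary-based tables are hypothetical stand-ins for the units and tables in the text.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def register_links(source_url, document, n, node_table, link_table):
    """Register all link destinations found in a document reached via a layer-n hyperlink."""
    parser = AnchorExtractor()
    parser.feed(document)                             # scans the file to its end
    for href in parser.hrefs:
        node_table.setdefault(href, {"url": href})    # destination node, registered once
        key = (source_url, href)
        if key not in link_table:                     # same source and destination: no new row
            # (the source refreshes the search date and time of an existing row;
            # this sketch simply leaves the existing row untouched)
            link_table[key] = {"layers": n + 1, "searched_at": None}
```

Keying the link table on the (link source, link destination) pair is one simple way to obtain the no-duplicate behavior described above; a DBMS-backed implementation would more likely use a uniqueness constraint.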

【００８４】Here, the contents of the attributes registered in the node table 71 and the link table 81 in step S812 of FIG. 13 are described. The node read from the server machine is called the link source node, and a node described in a hyperlink contained in that node is called a link destination node. For the link source node, the last update date and time 41 is registered. For example, suppose that the node whose node identifier is "1" in the node table 71 shown in FIG. 8 is read from the server machine, and that it contains a hyperlink pointing to the node whose node identifier is "2". In this case, the node with identifier "1" is the link source node and the node with identifier "2" is the link destination node.

【００８５】In FIG. 8, the last update date and time enclosed by the dotted line 73 is updated as attribute information of the link source node. The portion enclosed by the dotted line 74 in FIG. 8 is what is registered as attribute information of the link destination node: the node identifier; the scheme, host name, port number, and scheme-specific part corresponding to the URL character string that indicates the location of the node; and the registration date and time, that is, the date and time of registration in the database. The node identifier that is registered is one generated automatically by the DBMS.

【００８６】The link table 81 stores the link identifier automatically generated by the DBMS, the link source node identifier, the link destination node identifier, the number of layers, and the search date and time. FIG. 9 is used as an example. Suppose that the node with node identifier "1", which is the link destination of the hyperlink with link identifier "1", has been read by following that hyperlink, and that the hyperlinks described in the node that was read link to the nodes with node identifiers "2" and "3".

【００８７】At this time, since the search was performed by referring to the hyperlink 84, that time is registered as the search date and time 85. Next, a link identifier for each newly obtained hyperlink is acquired from the DBMS and registered in the link identifier column of the entry being added. For example, "2" is registered as the link identifier (87) of the hyperlink (86) whose link identifier is "2". This value is an ID number assigned by the DBMS as appropriate, for example based on the order of registration in the table.

【００８８】Since the node identifier of the link source node of the hyperlink 86 is "1", "1" is registered as the link source node identifier 88. As the link destination node identifier 89, the node identifier assigned by the DBMS when the attributes of the link destination node were registered in the node table 71, namely "2", is registered. As the number of layers 91, the value "1" is registered, obtained by adding "1" to "0", because this hyperlink was obtained from a node reached by following a hyperlink whose number of layers is "0". The hyperlink 92 to the node whose node identifier is "3" is registered by the same procedure. For each additionally registered hyperlink, the information in the range indicated by the dotted line 93 in FIG. 9 is registered.

【００８９】The node table 71 and the link table 81 are updated each time a link destination node is read from the server machine by following one hyperlink and new link destinations are found. Therefore, in the flowchart of FIG. 10, when the process returns to step S502 after the table updating process (step S505) has been completed for one node, the hyperlinks newly registered in that table updating process also become search targets. In this way, the search proceeds from one link destination to the next. The search range, however, is limited by conditions such as the number of layers.

【００９０】In addition, since a hyperlink is treated as unsearched only when its search date and time is undefined or lies in the past by the elapsed time or more, a hyperlink that has been searched once is not examined again until the elapsed time has passed. As a result, even when links form a loop, for example when the link destination of node "A" is node "B" and the link destination of node "B" is node "A", the hyperlink from node "A" to node "B" is not examined again, and the loop is not followed repeatedly up to the maximum number of layers.
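The loop-avoidance behavior described above can be checked with a small in-memory model. This is a simplified sketch: the graph is a dictionary rather than documents on servers, a boolean `searched` flag stands in for the search date and time with its elapsed-time threshold, and the function name `crawl` is hypothetical.

```python
def crawl(graph, start, max_layers):
    """Follow hyperlinks from `start`, examining each (source, destination) link once."""
    # Link table keyed by (source, destination); the starting link has 0 layers.
    links = {(None, start): {"layers": 0, "searched": False}}
    fetched = []                                   # nodes read, in order
    while True:
        pending = [k for k, v in links.items() if not v["searched"]]
        if not pending:
            return fetched, links
        key = pending[0]
        links[key]["searched"] = True              # never examined again
        layers = links[key]["layers"]
        if layers > max_layers:                    # outside the search range
            continue
        node = key[1]
        fetched.append(node)                       # "read" the destination node
        for dest in graph.get(node, []):
            # An already registered link keeps its row, so a loop adds nothing new.
            links.setdefault((node, dest), {"layers": layers + 1, "searched": False})
```

With `graph = {"A": ["B"], "B": ["A"]}` and a maximum of 10 layers, the crawl stops after three links have been examined, because the link from "A" to "B" is already registered when "A" is reached a second time. Note that in this link-based model node "A" is still read twice; it is the repeated traversal of the loop that is avoided.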

【００９１】Even when the search range is limited based on the number of layers of hyperlinks in this way, the same effect is obtained as when the search range is limited based on the number of layers of nodes. However, when the number of layers is assigned to hyperlinks, the node table must also be consulted in order to examine the link destination node of a hyperlink obtained from the link table, so the number of table lookups increases compared with the case where the number of layers is assigned to nodes, as in the embodiment.

【００９２】[0092]

[Effects of the Invention] As described in detail above, according to the inventions of claims 1 to 3, the search range is limited by the number of layers from the starting node, so that only nodes having a strong semantic connection with the starting node are searched. Because the search range can be limited by the number of layers, only the information within the required range is collected. Furthermore, since unnecessary nodes are not searched, the load on the servers and on the communication lines to the servers is reduced, and the time required for the search is shortened.

【００９３】According to the invention of claim 4, a node that has already been read once is prevented from being read again, so that repeated searching of a looped range is avoided. This allows the search to be performed efficiently.

[Brief description of drawings]

FIG. 1 is a block diagram showing an outline of the configuration of a node / link search device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing an example of the registered contents of a node table.

FIG. 3 is an explanatory diagram showing an example of the registered contents of a link table.

FIG. 4 is a flowchart showing the flow of operations performed by the node / link search device.

FIG. 5 is a flowchart showing the flow of processing for determining whether a node is within the search range.

FIG. 6 is a flowchart showing the flow of another example of the processing for determining whether a node is within the search range.

FIG. 7 is a flowchart showing the flow of the table updating process shown in FIG. 4.

FIG. 8 is an explanatory diagram showing an example of the registered contents of a node table used in a node / link search device of a modified example.

FIG. 9 is an explanatory diagram showing an example of the registered contents of a link table used in the node / link search device of the modified example.

FIG. 10 is a flowchart showing the flow of operations performed by the node / link search device in the modified example.

FIG. 11 is a flowchart showing the flow of processing for determining whether a hyperlink is within the search range.

FIG. 12 is a flowchart showing the flow of another example of the processing for determining whether a hyperlink is within the search range.

FIG. 13 is a flowchart showing the flow of the table updating process shown in FIG. 10.

FIG. 14 is an explanatory diagram showing an example of nodes and the hyperlinks connecting them.

FIG. 15 is an explanatory diagram showing an example of the relationship between the documents of nodes.

FIG. 16 is a block diagram showing an outline of the configuration of a conventional node / link search device.

[Explanation of symbols]

11 Node / link search device
12 Network
13 Server machine
21 File collection unit
22 Hyperlink extraction unit
23 Node / link information database
24 Search range control unit
25 System control unit
26 Hypermedia description language analysis unit
27 Layer number calculation unit
28 Host name comparison unit
29 Layer number comparison unit
31, 71 Node table
51, 81 Link table

Claims

[Claims]

1. A node / link search device comprising: search condition setting means for setting a limiting condition of the search range, applied when a search from an arbitrary node to link destination nodes is performed sequentially based on link information that is included in each node of a hypertext and indicates the name and storage location of a link destination node, as a maximum value of the number of layers, the number of layers being the number of nodes existing between the node serving as the starting point of the search and the search destination node; file collection means for repeatedly reading, based on the link information and starting from the node serving as the starting point of the search, the contents of linked nodes from the server that stores them; node / link information storage means for storing, each time the file collection means reads the contents of one node, the link information included in that node and information indicating the correspondence between the link destination node indicated by that link information and the node just read; layer number calculation means for obtaining, each time the file collection means reads the contents of one node, the number of layers of the link destination node indicated by the link information included in that node; and search range limiting means for stopping the file collection means from reading the contents of nodes linked beyond the node just read when the number of layers obtained by the layer number calculation means is larger than the maximum value of the number of layers set by the search condition setting means.

2. A node / link search device comprising: search condition setting means for setting a limiting condition of the search range, applied when a search from an arbitrary node to link destination nodes is performed sequentially based on link information that is included in each node of a hypertext and indicates the name and storage location of a link destination node, as a maximum value of the number of layers, the number of layers being the number of links existing from the node serving as the starting point of the search to the search destination node; file collection means for repeatedly reading, based on the link information and starting from the node serving as the starting point of the search, the contents of linked nodes from the server that stores them; node / link information storage means for storing, each time the file collection means reads the contents of one node, the link information included in that node and information indicating the correspondence between the link destination node indicated by that link information and the node just read; layer number calculation means for obtaining, each time the file collection means reads the contents of one node, the number of layers of the link to the link destination node indicated by the link information included in that node; and search range limiting means for stopping the file collection means from reading the contents of nodes linked beyond the node just read when the number of layers obtained by the layer number calculation means is larger than the maximum value of the number of layers set by the search condition setting means.

3. The node / link search device according to claim 1, wherein the nodes to be searched are distributed and stored in a plurality of servers connected to a network.

4. The node / link search device according to claim 1 or 2, wherein the file collection means comprises: same-node determination means for determining whether the link destination node indicated by the link information included in the node just read is identical to a node that has already been read; and multiple-read stopping means for stopping the re-reading of that node when the same-node determination means determines that it is identical to a node that has already been read.