JP3597940B2 - HTML document book type shaping method and apparatus

Info

Publication number: JP3597940B2
Application number: JP8698996A
Authority: JP
Inventors: 鈴木健也; 吉宗俊哉; 小澤英昭; 浜田洋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-04-10
Filing date: 1996-04-10
Publication date: 2004-12-08
Anticipated expiration: 2016-04-10
Also published as: JPH09282218A

Description

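[0001]
[Technical Field of the Invention]
The present invention relates to a method and an apparatus for formatting HTML (HyperText Markup Language) documents, such as those of the WWW (World Wide Web) stored on the Internet, into a book form that is easy for users to browse. Attributes for describing a book-type logical structure are added to HTML links, and the documents are formatted into a book using the links that carry those attributes.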
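[0002]
[Prior Art]
In conventional apparatuses for formatting and displaying HTML documents, in particular those called WWW clients, one HTML document is displayed per display unit. The relationship between that HTML document and other HTML documents is expressed with links, and even when the document is so closely related to others that together they could be represented as a single book, each document is managed independently.
[0003]
For users to recognize, by means of such links, a logical structure among HTML documents such as hierarchy and ordering relationships (for example, the table of contents, chapters, and sections of a book), links such as "next page" and "previous page" must be set up so that the user follows those transitions.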
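[0004]
[Problems to Be Solved by the Invention]
With the conventional technology, giving HTML documents a book-like logical structure of a table of contents, chapters, sections, and so on required links such as "next page" and "previous page" that rely on the user's recognition. Moreover, even when such links were set up, they were not distinguished in any way from other links, so the information about which links to use for ordering when formatting the documents into a book was lacking, and it was difficult to process them by computer.
[0005]
An object of the present invention is to provide a method and an apparatus that add, to links between HTML documents, attributes capable of describing a book-like logical structure such as a table of contents, chapters, and sections, thereby describing the logical structure among the HTML documents, and that use this information to rearrange the HTML documents so that they can be efficiently formatted into and displayed as a book.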
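[0006]
[Means for Solving the Problems]
To achieve the above object, the invention according to claim 1 is a method for formatting HTML documents written in HTML, a structure description language for describing information such as hypertext information on the Internet on a tag basis. The method comprises: a first step of interpreting attributes that are given to identifiers in HTML documents called links, which serve to move from one piece of information to another, and that describe a book-type logical structure, such as hierarchy and ordering relationships, among a plurality of HTML documents; a second step of converting the logical structure into a tree structure using the attributes; a third step of rearranging the tree structure so that it does not contradict the ordering relationships among the HTML documents expressed by the attributes; and a fourth step of arranging the HTML documents linearly on the basis of the rearranged tree structure. Its principal feature is that HTML documents can be formatted into a book form.
[0007]
With the invention according to claim 1, a book-like logical structure such as a table of contents, chapters, and sections can be described in the links between HTML documents, so the logical structure among HTML documents, which could not be described with the conventional technology, can now be described. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.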
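[0008]
The invention according to claim 2 is the invention according to claim 1 in which the second step prepares a table-of-contents document describing the book-type logical structure among a plurality of HTML documents and converts the logical structure among the HTML documents into a tree structure using the description in that table-of-contents document. Its principal feature is that, given only the table-of-contents document, HTML documents can be formatted into a book form without the book-type logical structure being described in the HTML documents themselves.
[0009]
With the invention according to claim 2, even if the HTML documents themselves are written with the conventional technology, merely supplying a table-of-contents document that describes the book-type logical structure among them makes it possible to describe the logical structure among HTML documents, which the conventional technology could not do. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.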
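[0010]
Further, the invention according to claim 3 is the invention according to claim 1 or 2 which, when the logical structure among the plurality of HTML documents is described with the REL attribute, which expresses the forward relationship of a link, and the REV attribute, which expresses the reverse relationship, additionally has, before the first step, a step of converting this logical structure into a description of the book-type logical structure. Its principal feature is that HTML documents can be formatted into a book form by describing the logical structure among them with the REL and REV attributes that already exist for HTML links and converting that description into a description of the book-type logical structure.
[0011]
With the invention according to claim 3, book-type logical relationships such as hierarchy and ordering can be described with the existing REL and REV attributes, without extending the attributes of HTML links, so the logical structure among HTML documents can be described. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.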
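[0012]
Next, to achieve the above object, the invention according to claim 4 is an apparatus for formatting HTML documents written in HTML, a structure description language for describing information such as hypertext information on the Internet on a tag basis, the apparatus having means for formatting HTML documents into a book-type structure and means for displaying the book-type structure on a screen in the form of a book. The apparatus comprises: means for interpreting attributes that describe a book-type logical structure, such as hierarchy and ordering relationships, among HTML documents; means for converting the logical structure into a tree structure using the attributes; means for rearranging the tree structure so that it does not contradict the ordering relationships among the HTML documents expressed by the attributes; and means for arranging the HTML documents linearly on the basis of the rearranged tree structure. Its principal feature is that HTML documents can be formatted into a book form and displayed.
[0013]
With the invention according to claim 4, a book-like logical structure such as a table of contents, chapters, and sections can be described in the links between HTML documents, so the logical structure among HTML documents, which could not be described with the conventional technology, can now be described. Rearranging the HTML documents using this information allows them to be formatted into and displayed as a book efficiently.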
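[0014]
The invention according to claim 5 is the invention according to claim 4 in which the means for converting the logical structure into a tree structure prepares a table-of-contents document describing the book-type logical structure among a plurality of HTML documents and converts the logical structure among the HTML documents into a tree structure on that basis. Its principal feature is that, given only the table-of-contents document, HTML documents can be formatted into a book form without the book-type logical structure being described in the HTML documents themselves.
[0015]
With the invention according to claim 5, even if the HTML documents themselves are written with the conventional technology, merely supplying a table-of-contents document that describes the book-type logical structure among them makes it possible to describe the logical structure among HTML documents, which the conventional technology could not do. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.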
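[0016]
Further, the invention according to claim 6 is the invention according to claim 4 or 5 which, when the logical structure among the plurality of HTML documents is described with the REL attribute, which expresses the forward relationship of a link, and the REV attribute, which expresses the reverse relationship, additionally comprises means for converting this logical structure into a description of the book-type logical structure. Its principal feature is that HTML documents can thereby be formatted into a book form.
[0017]
With the invention according to claim 6, book-type logical relationships such as hierarchy and ordering can be described with the existing REL and REV attributes, without extending the attributes of HTML links, so the logical structure among HTML documents can be described. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.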
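[0018]
The operation of the present invention is described below.
[0019]
In the invention according to claim 1, the step of interpreting the attributes given to links as a description of a book-type logical structure, such as hierarchy and ordering relationships among HTML documents, makes it possible to interpret a book-type logical structure that the conventional technology could not interpret. The step of converting the logical structure expressed with the attributes into a tree structure represents the descriptions of logical structure scattered across the individual HTML documents as a single tree, so that they can be handled centrally. The step of rearranging the tree structure so that it contradicts as little as possible the ordering relationships expressed by the attributes sorts the documents in the order described in the links between them, using ordering relationships described at higher levels of the hierarchy as an aid, so that the rearrangement is carried out properly even when no ordering relationship is described among the HTML documents or when the descriptions contradict one another. The step of arranging the HTML documents linearly on the basis of the rearranged tree structure traverses, depth first, the tree that has been fitted as closely as possible to the book-type logical structure described in the HTML documents. It thus becomes possible to describe the logical structure among HTML documents and to rearrange them using that information, so that HTML documents can be efficiently formatted into a book form, which is the object of the present invention.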
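[0020]
In the invention according to claim 2, the step of converting the logical structure expressed by the table-of-contents document into a tree structure expands the description in the given table-of-contents document in order from the beginning of the document to obtain the hierarchy and ordering relationships among the HTML documents, and uses that information to convert the logical structure among them into a tree structure. Therefore, even if no book-type logical structure is described in the HTML documents themselves, the logical structure can be converted into a tree structure from the table-of-contents document, so that HTML documents can be efficiently formatted into a book form, which is the object of the present invention.
[0021]
In the invention according to claim 3, the step of converting the logical structure expressed with the REL and REV attributes that already exist for HTML links into a description of the book-type logical structure converts the parent-child and ordering relationships expressed by the REL and REV attributes of the links into book-type hierarchical and ordering relationships. It thus becomes possible to describe the logical structure among HTML documents and to rearrange them using that information, so that HTML documents can be efficiently formatted into a book form, which is the object of the present invention.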
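[0022]
In the invention according to claim 4, the means for interpreting the attributes given to links as a description of a book-type logical structure, such as hierarchy and ordering relationships among HTML documents, makes it possible to interpret a book-type logical structure that the conventional technology could not interpret. The means for converting the logical structure expressed with the attributes into a tree structure represents the descriptions of logical structure scattered across the individual HTML documents as a single tree, so that they can be handled centrally. The means for rearranging the tree structure so that it contradicts as little as possible the ordering relationships expressed by the attributes sorts the documents in the order described in the links between them, using ordering relationships described at higher levels of the hierarchy as an aid, so that the rearrangement is carried out properly even when no ordering relationship is described among the HTML documents or when the descriptions contradict one another. Finally, the means for arranging the HTML documents linearly on the basis of the rearranged tree structure traverses, depth first, the tree that has been fitted as closely as possible to the book-type logical structure described in the HTML documents. It thus becomes possible to describe the logical structure among HTML documents and to rearrange them using that information, so that an apparatus capable of efficiently formatting HTML documents into a book form and displaying them, which is the object of the present invention, can be provided.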
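[0023]
In the invention according to claim 5, the means for converting the logical structure expressed by the table-of-contents document into a tree structure expands the description in the given table-of-contents document in order from the beginning of the document to obtain the hierarchy and ordering relationships among the HTML documents, and uses that information to convert the logical structure among them into a tree structure. Therefore, even if no book-type logical structure is described in the HTML documents themselves, the logical structure can be converted into a tree structure from the table-of-contents document, so that HTML documents can be efficiently formatted into a book form, which is the object of the present invention.
[0024]
In the invention according to claim 6, the means for converting the logical structure described with the REL and REV attributes that already exist for HTML links into a description of the book-type logical structure converts the parent-child and ordering relationships expressed by the REL and REV attributes of the links into book-type hierarchical and ordering relationships. It thus becomes possible to describe the logical structure among HTML documents and to rearrange them using that information, so that HTML documents can be efficiently formatted into a book form, which is the object of the present invention.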
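[0025]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
[0026]
[Embodiment 1]
FIG. 1 models the book-type logical structure that results from formatting according to the first embodiment of the present invention. Book 1 in the figure represents the whole book as a logical structure. Such a book normally consists of a preface 2, a table of contents 3, a main text 4, a bibliography 5, an index 6, and other content 7. The main text 4 consists of repeated chapters 8, a chapter 8 consists of repeated sections 9, a section 9 consists of repeated pages 10, and a page 10 consists of repeated words 11. Like the main text 4, the preface 2 and the other content 7 also consist of repeated sections 9 and chapters 8. The other content 7 corresponds to what are called appendices and supplements. The table of contents 3 consists mainly of references into the book, such as to chapters 8 and sections 9. The index 6 likewise consists of references into the book, such as to pages 10. The bibliography 5, by contrast, consists of references to information outside the book, such as other books, and is referred to from words 11 and other elements inside the book. The present invention describes the logical structure other than that of pages 10 and words 11; the repeated structure of pages 10 is created automatically.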
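[0027]
FIG. 2 shows the attributes needed to describe the above logical structure in links in the first embodiment. The attribute book 12 corresponds to book 1 in FIG. 1 and indicates a link from an HTML document that describes the whole book. The attribute section 13 corresponds to the preface 2, the other content 7, and the chapters 8 and sections 9 of the main text 4 in FIG. 1, and indicates a link from an HTML document with a structure that groups pages, such as a chapter or a section. The attribute index 14 corresponds to the table of contents 3 and the index 6 in FIG. 1 and indicates a link from an HTML document that collects references into the book. The attribute bibliography 15 corresponds to the bibliography 5 in FIG. 1 and indicates a link from an HTML document that collects references to outside the book.
[0028]
The value of the attribute book 12 is "section" for a link to an HTML document represented by the attribute section 13, such as the preface 2, the main text 4, or the other content 7; "index" for a link to an HTML document represented by the attribute index 14, such as the table of contents 3 or the index 6; and "bibliography" for a link to an HTML document represented by the attribute bibliography 15, such as the bibliography 5. To describe a book that forms one unit out of several volumes, such as a serialized novel, "previous" (logically before) and "next" (logically after) can also be given. The value "made", which denotes the author, can be given as well. As the value of the attribute section 13, "section" is given to a link to an HTML document that represents a section if the document containing the link is a chapter, or a subsection if it is a section. To describe the logical ordering of chapters, sections, and so on, "previous" and "next" can also be given. As the value of the attribute index 14, "refer" is given to a link that refers to information inside the book. When a table of contents or an index is written across several HTML documents, "previous" and "next" can also be given. As the value of the attribute bibliography 15, "refer" is given to a link that refers to information outside the book. When a bibliography is written across several HTML documents, "previous" and "next" can also be given.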
[0029]
An example in which the above attributes representing the logical structure in the first embodiment are written in links is shown below.
[0030]
book.html (16)
<head>
<link book="made" href="mailto:kenya@ntt.jp">
</head>
<body>
<A book="index" href="mokuji.html">Table of Contents</A><p>
<A book="section" href="chap1.html">Chapter 1</A><p>
<A book="bibliography" href="bib.html">Bibliography</A><p>
</body>
mokuji.html (17)
<body>
<h1>Table of Contents</h1><p>
<A index="refer" href="chap1.html">Chapter 1</A><BR>
<A index="refer" href="sec1.html">Section 1</A><BR>
<A index="refer" href="sec2.html">Section 2</A>
<p>
<A index="refer" href="chap2.html">Chapter 2</A>
</body>
chap1.html (18)
<body>
<h1>Chapter 1</h1><p>
<A section="section" href="sec1.html">Section 1</A><p>
<A section="section" href="sec2.html">Section 2</A><p>
</body>
sec1.html (19)
<body>
<h2>Section 1</h2><p>
This is a test page for the book-type display.<p>
<A section="next" href="sec2.html">Next section</A><p>
</body>
sec2.html (20)
<body>
<h2>Section 2</h2><p>
This is Section 2 of the test page for the book-type display.<p>
<A section="previous" href="sec1.html">Previous section</A><p>
</body>
bib.html (21)
<body>
<h1>Bibliography</h1><p>
[1]
<A bibliography="refer" href="http://www.ntt.jp/">NTT Home Page</A><BR>
[2]
<A bibliography="refer" href="http://hil.ntt.jp/">NTT Human Interface Lab</A><BR>
</body>
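The book.html 16 shown above is the HTML document that corresponds to book 1 in FIG. 1 and represents the whole book. In the header enclosed by the <head> and </head> tags, a <link> tag with the attribute book="made" describes a link that indicates the author of the book. Here, href="string" indicates that the string is the identifier of the link; this string is called a Uniform Resource Identifier, or URI for short. The <a> tags describe links to the table of contents, Chapter 1, and the bibliography. mokuji.html 17 is the HTML document that corresponds to the table of contents 3 in FIG. 1 and is referenced from book.html 16 by a link with the attribute book="index". This document 17 is a table of contents showing the organization of the whole book, in which <a> tags with the attribute index="refer" describe links representing in-book references to Chapter 1, Chapter 1 Section 1, Chapter 1 Section 2, and Chapter 2. mokuji.html 17 can be generated automatically from the contents of the other HTML documents. chap1.html 18 is the HTML document that corresponds to a chapter 8 in FIG. 1 and is referenced from book.html 16 by a link with the attribute book="section". In this document 18, <a> tags with the attribute section="section" describe links to Section 1 and Section 2, which make up Chapter 1. sec1.html 19 and sec2.html 20 are HTML documents that correspond to sections 9 in FIG. 1 and are referenced from chap1.html 18 by links with the attribute section="section". These documents 19 and 20 contain the content of Section 1 and Section 2 of Chapter 1, together with links that show their logical ordering, written as <a> tags with the attribute section="next" or section="previous". bib.html 21 is the HTML document that corresponds to the bibliography 5 in FIG. 1 and is referenced from book.html 16 by a link with the attribute book="bibliography". In this document 21, the list of references for the book is written with <a> tags that have the attribute bibliography="refer".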
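The attribute-carrying links in the example documents can be picked out mechanically. The following is a minimal sketch, not the patent's implementation: it uses Python's standard `html.parser`, and the names `ALLOWED`, `BookLinkParser`, and `extract_book_links` are hypothetical. The allowed values per attribute follow paragraph [0028].

```python
from html.parser import HTMLParser

# Attribute names and the values described for them in the first embodiment.
ALLOWED = {
    "book": {"section", "index", "bibliography", "previous", "next", "made"},
    "section": {"section", "previous", "next"},
    "index": {"refer", "previous", "next"},
    "bibliography": {"refer", "previous", "next"},
}

class BookLinkParser(HTMLParser):
    """Collects (attribute, value, href) triples from <a> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A> arrives as "a".
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        for name, allowed in ALLOWED.items():
            value = attrs.get(name)
            # Links with a value that does not suit the attribute are skipped.
            if href and value in allowed:
                self.links.append((name, value, href))

def extract_book_links(html):
    parser = BookLinkParser()
    parser.feed(html)
    return parser.links

BOOK_HTML = """
<head><link book="made" href="mailto:kenya@ntt.jp"></head>
<body>
<A book="index" href="mokuji.html">Table of Contents</A><p>
<A book="section" href="chap1.html">Chapter 1</A><p>
<A book="bibliography" href="bib.html">Bibliography</A><p>
</body>
"""
links = extract_book_links(BOOK_HTML)
```

Applied to book.html, this yields one triple per link in document order, which is the form of data the later steps are assumed to consume.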
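[0031]
FIG. 3 illustrates the relationships among the HTML documents shown above. Rounded rectangles indicate the parts of an HTML document where links are written, and arrows indicate the link targets. Shaded rounded rectangles represent references to outside the book, and the text attached to each arrow gives the value of the link's book, section, or index attribute.
[0032]
FIG. 4 shows an example in which the HTML documents shown above are formatted into a book form by the first embodiment. The HTML documents are rearranged according to the described book-type logical structure, in the order of book 1, table of contents 3, main text 4, and bibliography 5 of the model shown in FIG. 1. In this embodiment, an HTML document that does not fit on one page is split over two or more pages. Each rectangle in FIG. 4 represents one page after formatting, and the number enclosed in hyphens (-) at the bottom is the page number of that page.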
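[0033]
FIG. 5 is a block diagram showing the configuration of the HTML document book-type formatting apparatus according to the first embodiment. The HTML document acquisition unit 22 acquires HTML documents from a database that stores HTML documents, such as the WWW, and passes them to the HTML syntax analysis unit 23. The HTML syntax analysis unit 23 parses each HTML document received from the HTML document acquisition unit 22, passes the URI of the document being processed and the links that have the attributes defined in this embodiment to the book structure analysis unit 24, and stores the document being processed in the component storage unit 25. The book structure analysis unit 24 is the most essential part of the invention; it analyzes the book-type logical structure and rearranges the HTML documents. During its processing, it registers the URI of the document being processed in the structure storage unit 26. When a link passed from the HTML syntax analysis unit 23 contains a URI that is not present in the structure storage unit 26, that URI is passed to the HTML document acquisition unit 22 and the HTML document is acquired; this is done recursively. When no unacquired HTML documents remain, the documents are rearranged according to the attributes defined in this embodiment, the resulting order is registered in the structure storage unit 26, and processing by the book formatting unit 27 starts. The book formatting unit 27 splits the HTML documents stored in the component storage unit 25 so that they fit on pages, in the order registered in the structure storage unit 26. While doing so, it creates the URI⇔page-number correspondence table 28, which records the correspondence between the URI of each HTML document and its page number. When the book formatting unit 27 finishes, it passes the result to the display data generation unit 29. The display data generation unit 29 converts the HTML documents split by the book formatting unit 27, page by page, into a form that the information display unit 30 can display. In doing so, URIs that are not present in the URI⇔page-number correspondence table 28 are left as they are, as references to outside the book, and URIs that are present in the table are converted into page numbers, as references into the book. The information display unit 30 displays the HTML documents converted by the display data generation unit 29 in the form of a book.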
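The roles of the book formatting unit 27 and the URI⇔page-number correspondence table 28 can be illustrated with a small sketch. This is an illustration under simplifying assumptions, not the apparatus itself: a document is modeled as a list of text lines, a page holds a fixed number of lines, and the function names `paginate` and `resolve_link` are hypothetical.

```python
def paginate(ordered_docs, lines_per_page):
    """Split documents, given in book order, into pages.

    ordered_docs: list of (uri, list_of_lines).
    Returns (pages, uri_to_page), where uri_to_page maps each URI to the
    1-based number of the page on which that document starts.
    """
    pages, uri_to_page = [], {}
    for uri, lines in ordered_docs:
        uri_to_page[uri] = len(pages) + 1
        # A document that does not fit on one page is split over several.
        for start in range(0, max(len(lines), 1), lines_per_page):
            pages.append(lines[start:start + lines_per_page])
    return pages, uri_to_page

def resolve_link(href, uri_to_page):
    """URIs found in the table become page references; others stay external."""
    if href in uri_to_page:
        return ("page", uri_to_page[href])
    return ("external", href)

docs = [
    ("book.html", ["title"]),
    ("chap1.html", ["line 1", "line 2", "line 3"]),
    ("bib.html", ["[1] ...", "[2] ..."]),
]
pages, table = paginate(docs, lines_per_page=2)
```

With two lines per page, chap1.html occupies pages 2 and 3, so bib.html starts on page 4; a link to chap1.html resolves to page 2, while a link to http://www.ntt.jp/ is left as an external reference.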
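[0034]
In this way, the logical structure among HTML documents can be described in the documents and the documents can be rearranged using that information, so that HTML documents can be efficiently formatted into and displayed as a book.
[0035]
Next, the operation by which the book structure analysis unit 24 of the above embodiment rearranges HTML documents on the basis of the book-type logical structure is described in detail with reference to the flowchart of FIG. 6. The HTML document that serves as the starting point for formatting into a book is called the root document. The root document corresponds to book 1 in FIG. 1, and its links that describe logical structure carry the attribute book defined in this embodiment. In this embodiment, processing starts when the root document is first input. In step S1, the book structure analysis unit 24 stores the URI of the root document passed from the HTML syntax analysis unit 23 in the structure storage unit 26, arranges the URIs held in the links with the book attribute in the root document, which are passed at the same time, in the order in which they appear in the root document, and creates a tree such as that shown in FIG. 7. The ellipses in FIG. 7 represent URIs and are called nodes. The arrows in the figure represent links with the book attribute and their attribute values, such as "index" and "section". When a tree like that of FIG. 7 is created, links whose value is not suitable for the book attribute are not included in the tree. Next, the URIs that are currently leaves are passed to the HTML document acquisition unit 22, and processing proceeds to step S2.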
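[0036]
In step S2, among the links with the attributes defined in this embodiment that are passed from the HTML syntax analysis unit 23, those whose attribute value is "section", "next", or "previous" are first added to the tree in the order in which they appear in the HTML document being processed. They are added to the node that has the same URI as the URI passed from the HTML syntax analysis unit 23. A link whose book or section attribute has the value "section" is called a section link; when an added link is a section link, the URI of its target is passed to the HTML document acquisition unit 22. When no unprocessed section links remain in the tree, processing proceeds to step S3. The result of step S2 is as shown in FIG. 8. The level shown in the figure is defined as the number of section-link references needed to reach a document from the root document, and a smaller number is a higher level. Level 1 corresponds to a chapter 8 in FIG. 1, level 2 to a section 9, and so on. When links are added to the tree, links whose value is not suitable for their attribute and section links that point to the same or a higher level are ignored.
[0037]
In step S3, the target URIs of next links and previous links in the tree that do not yet exist in the tree are passed to the HTML document acquisition unit 22, and among the links with the attributes defined in this embodiment that are passed from the HTML syntax analysis unit 23, those whose attribute value is "section", "next", or "previous" are added to the tree in the order in which they appear in the HTML document being processed. Here, a next link is a link for which the value of an attribute defined in this embodiment is "next", and a previous link is a link for which the value is "previous". They are added to the node that has the same URI as the URI passed from the HTML syntax analysis unit 23. When no unprocessed next links or previous links remain in the tree, processing proceeds to step S4.
[0038]
Step S4 determines whether the tree contains unresolved links. An unresolved link is a section link, next link, or previous link whose target URI has not been passed to the HTML document acquisition unit 22. If unresolved links exist, processing returns to step S2; if none exist, processing proceeds to step S5.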
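The tree construction of steps S1 to S4 can be sketched as follows. This is a simplified illustration, not the patent's implementation: fetching and parsing are replaced by a hypothetical `fetch_links(uri)` callback that returns `(attribute, value, href)` triples in document order, index and bibliography targets are attached as unexpanded children of the document that links to them, and the step S3 fetching of next/previous targets that are not yet in the tree is omitted.

```python
from collections import deque

def build_tree(root, fetch_links):
    """Build the section tree reachable from the root document.

    Returns (children, level, order_links):
      children    maps a URI to its child URIs in order of appearance,
      level       maps a URI to its depth (root = 0, chapters = 1, ...),
      order_links lists (source, "next" | "previous", target) triples.
    """
    level = {root: 0}
    children = {root: []}
    order_links = []
    queue = deque([root])
    while queue:
        uri = queue.popleft()
        for attr, value, href in fetch_links(uri):
            if attr not in ("book", "section"):
                continue
            if value == "section":
                # Ignore section links to the same or a higher level.
                if href in level and level[href] <= level[uri]:
                    continue
                children[uri].append(href)
                if href not in level:
                    level[href] = level[uri] + 1
                    children[href] = []
                    queue.append(href)      # expand this document later
            elif value in ("index", "bibliography"):
                children[uri].append(href)  # kept as a leaf in this sketch
                level.setdefault(href, level[uri] + 1)
                children.setdefault(href, [])
            elif value in ("next", "previous"):
                order_links.append((uri, value, href))
    return children, level, order_links

DOCS = {
    "book.html": [("book", "made", "mailto:kenya@ntt.jp"),
                  ("book", "index", "mokuji.html"),
                  ("book", "section", "chap1.html"),
                  ("book", "bibliography", "bib.html")],
    "chap1.html": [("section", "section", "sec1.html"),
                   ("section", "section", "sec2.html")],
    "sec1.html": [("section", "next", "sec2.html")],
    "sec2.html": [("section", "previous", "sec1.html")],
}
children, level, order_links = build_tree("book.html", lambda u: DOCS.get(u, []))
```

For the example documents, chap1.html ends up at level 1 and the two sections at level 2, matching the level definition in step S2.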
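[0039]
Step S5 rearranges the nodes on the same level, giving priority to next links. The nodes on the same level are first rearranged so as to contradict the previous links as little as possible, and are then rearranged so as to contradict the next links as little as possible. To rearrange them without contradiction, the relationships given by previous links and next links can be regarded as greater-than or less-than relationships between values, and a sort can be executed.
[0040]
An example of the operation of step S5 is described with reference to FIG. 9. Assume that in the initial state of step S5, as shown in FIG. 9(a), the nodes are in the order 1, 2, 3, 4, 5, with a next link from node 1 to node 2, a previous link from node 2 to node 3, a next link from node 3 to node 4, and both a next link and a previous link from node 4 to node 5. To rearrange the nodes so as to contradict the previous links as little as possible, nodes 2 and 3 are swapped and nodes 4 and 5 are also swapped. The nodes are then in the order 1, 3, 2, 5, 4, as shown in FIG. 9(b). Next, to rearrange the nodes so as to contradict the next links as little as possible, nodes 4 and 5 are swapped. As a result, the nodes are in the order 1, 3, 2, 4, 5, as shown in FIG. 9(c). A contradiction with a previous link arises at this point, but it is ignored because next links take priority. This rearrangement is applied to all nodes in the tree, and processing proceeds to step S6.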
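The two-pass rearrangement of step S5 can be reproduced with a short sketch. The patent only says that the link relations are treated as an ordering and sorted; the specific procedure below (moving a node directly in front of the node it must precede, with a bounded number of passes so that contradictory links cannot loop forever) is an assumption chosen for illustration, and the function names are hypothetical.

```python
def enforce(order, before_pairs):
    """Reorder so that, for each (a, b), a comes before b where possible."""
    order = list(order)
    for _ in range(len(order)):          # bounded passes: cycles cannot hang
        changed = False
        for a, b in before_pairs:
            ia, ib = order.index(a), order.index(b)
            if ia > ib:
                # Move a to the position directly in front of b.
                order.insert(ib, order.pop(ia))
                changed = True
        if not changed:
            break
    return order

def sort_siblings(order, next_links, previous_links):
    """Apply previous links first, then next links, so next links win."""
    # A previous link (x, y) says y is the predecessor of x: y before x.
    prev_pairs = [(y, x) for x, y in previous_links]
    # A next link (x, y) says y is the successor of x: x before y.
    order = enforce(order, prev_pairs)
    return enforce(order, list(next_links))

# The situation of FIG. 9(a): links are written as (source, target).
next_links = [(1, 2), (3, 4), (4, 5)]
previous_links = [(2, 3), (4, 5)]
after_previous = enforce([1, 2, 3, 4, 5], [(y, x) for x, y in previous_links])
final = sort_siblings([1, 2, 3, 4, 5], next_links, previous_links)
```

On this input the previous-link pass gives the order 1, 3, 2, 5, 4 and the next-link pass then gives 1, 3, 2, 4, 5, matching FIG. 9(b) and FIG. 9(c) as described in the text.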
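[0041]
Step S6 orders the URIs that correspond to the nodes according to the tree that has been created. FIG. 10 shows an example of the result of step S6. Linearizing, depth first, the tree created up to step S5, shown in FIG. 10(a), yields the book-type ordering shown in FIG. 10(b). As also shown in the figure, when several nodes in the tree have the same URI, the nodes that would come later in the formatted book are deleted.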
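Step S6 amounts to a depth-first, pre-order walk that keeps only the first occurrence of each URI. A minimal sketch follows, assuming the tree is held as a dictionary from a URI to its ordered child URIs (the function name `linearize` and this representation are assumptions for illustration).

```python
def linearize(root, children):
    """Return the URIs in book order, dropping later duplicates."""
    seen, result = set(), []

    def visit(uri):
        if uri in seen:
            return              # a later node with the same URI is deleted
        seen.add(uri)
        result.append(uri)
        for child in children.get(uri, []):
            visit(child)

    visit(root)
    return result

tree = {
    "book.html": ["mokuji.html", "chap1.html", "sec2.html", "bib.html"],
    "chap1.html": ["sec1.html", "sec2.html"],
}
book_order = linearize("book.html", tree)
```

Here sec2.html appears twice in the tree; only its first position in the depth-first order, under chap1.html, survives.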
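[0042]
This completes the operation by which the book structure analysis unit 24 of the first embodiment rearranges HTML documents on the basis of the book-type logical structure.
[0043]
[Embodiment 2]
Next, the second embodiment of the present invention is described in detail with reference to the drawings. In this embodiment, the book structure analysis unit 24 in the block diagram of FIG. 5, which shows the configuration of the HTML document book-type formatting apparatus according to the first embodiment, is modified so that, instead of extracting the description of the book-type logical structure from all the HTML documents as in the first embodiment, it takes a table-of-contents document that describes the book-type logical structure and converts the logical structure among the HTML documents expressed by that description into a tree structure.
[0044]
FIG. 11 shows the relationships among the HTML documents in the second embodiment. Rounded rectangles indicate the parts of an HTML document where links are written, and arrows indicate the link targets. Shaded rounded rectangles represent references to outside the book, and the text attached to each arrow gives the value of the link's book, section, or index attribute. The mokuji.html 31 shown in the figure is referenced from book.html 32 by a link with the attribute book="index", and in this embodiment the logical structure, such as the order of the HTML documents, is described there. One way to describe it is to write links with the attribute index="refer" in the order that the documents should take when formatted into a book. By combining these links with <Hn> tags, which are tags for describing structure within an HTML document, hierarchical relationships among HTML documents can also be expressed; for example, a link enclosed in <H1> and </H1> represents a chapter, and a link enclosed in <H2> and </H2> represents a section.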
[0045]
An example of the table-of-contents document mokuji.html 31 described above is shown below.
[0046]
mokuji.html (33)
<body>
<h1>Table of Contents</h1><p>
<h1><A index="refer" href="chap1.html">Chapter 1</A></h1>
<h2><A index="refer" href="sec1.html">Section 1</A></h2>
<h2><A index="refer" href="sec2.html">Section 2</A></h2>
<p>
<A index="refer" href="chap2.html">Chapter 2</A>
</body>
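FIG. 12 shows an example of formatting into a book form by the second embodiment, using the relationships among the HTML documents shown in FIG. 11. The HTML documents are arranged according to the book-type logical structure described in mokuji.html 33, in the order of book 1, table of contents 3, and main text 4 of the model shown in FIG. 1. In this embodiment too, an HTML document that does not fit on one page is split over two or more pages. Each rectangle in FIG. 12 represents one page after formatting, and the number enclosed in hyphens (-) at the bottom is the page number of that page. The arrows in the figure represent the links with the attribute index="refer" shown above, and the dotted lines represent groups tied together by the next links and previous links written in the HTML documents.
[0047]
The block diagram of the HTML document book-type formatting apparatus according to the second embodiment is the same as that shown in FIG. 5; only the data passed from the HTML syntax analysis unit 23 to the book structure analysis unit 24 and the operation of the book structure analysis unit 24 differ. The HTML syntax analysis unit 23 is changed so that, in the data it passes to the book structure analysis unit 24, the current value of the <Hn> tag, which indicates the heading size, is attached to each link that has the attributes defined in this embodiment.
[0048]
The operation of the book structure analysis unit 24 is the most essential part of the second embodiment and is described in detail with reference to the flowchart of FIG. 13. As in the first embodiment, processing starts when the root document is first input. In step S7, the book structure analysis unit 24 of this embodiment stores the URI of the root document passed from the HTML syntax analysis unit 23 in the structure storage unit 26, holds the URI contained in the link with the attribute book="index" in the root document, which is passed at the same time, while passing it to the HTML document acquisition unit 22, and waits for the same URI as the held URI to be passed from the HTML syntax analysis unit 23. When that URI is passed from the HTML syntax analysis unit 23, those of the links passed at the same time that have the attribute index="refer" are added to the tree in the order in which they appear in the HTML document being processed. The node to which each link is added is determined by the attached <Hn> value.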
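[0049]
FIG. 14 shows how the book-type logical structure is added to the tree from the table-of-contents document in the second embodiment. The table-of-contents document 33 contains links with the attribute index="refer" to Chapter 1 34, Chapter 1 Section 1 35, Chapter 1 Section 2 36, and Chapter 2 37, and these are added to the tree in the order in which the links appear. First, the link to Chapter 1 34 is enclosed in <H1> and </H1> tags, so its target is a level-1 node. The link to Chapter 1 34 is therefore added as a link from the root document. Next, the link to Chapter 1 Section 1 35 is enclosed in <H2> and </H2> tags, so its target is a level-2 node. Since the most recently added level-1 node is Chapter 1 34, the link to Chapter 1 Section 1 35 is added as a link from Chapter 1 34. Similarly, the link to Chapter 1 Section 2 36 is added as a link from Chapter 1 34. Finally, the link to Chapter 2 37 carries no <Hn> tag information, so it is added as a link from the root document. Once the tree representing the logical structure of the book has been created from the table-of-contents document 33 in this way, the target URI of each unresolved link in the tree is passed to the HTML document acquisition unit 22, and among the links with the attributes defined in this embodiment that are passed from the HTML syntax analysis unit 23, those whose attribute value is "next" or "previous" are added to the node that has the same URI as the URI passed at the same time. When no unresolved links remain in the tree, processing proceeds to step S8.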
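The way links from the table-of-contents document are attached to the tree can be sketched as follows. This is an illustration, not the patent's implementation: each entry is given as a hypothetical `(heading_level, uri)` pair, where the heading level is the n of the enclosing <Hn> tag or `None` when the link has no <Hn> information. Treating an entry without <Hn> as a level-1 node for the entries that follow it is an assumption; the text only states that such a link is added from the root document.

```python
def tree_from_toc(root, entries):
    """Build {uri: [child uris]} from TOC entries in order of appearance."""
    children = {root: []}
    last_at_level = {0: root}     # most recently added node per level
    for heading, uri in entries:
        # No <Hn> information: attach to the root document (level 1).
        lvl = heading if heading is not None else 1
        parent = last_at_level.get(lvl - 1, root)
        children.setdefault(parent, []).append(uri)
        children.setdefault(uri, [])
        last_at_level[lvl] = uri
        # Nodes deeper than this level can no longer receive children.
        for deeper in [k for k in last_at_level if k > lvl]:
            del last_at_level[deeper]
    return children

toc = [(1, "chap1.html"), (2, "sec1.html"), (2, "sec2.html"), (None, "chap2.html")]
toc_tree = tree_from_toc("book.html", toc)
```

For the mokuji.html example, both sections are attached under chap1.html, and chap2.html, which has no <Hn> tag, is attached under the root document.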
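[0050]
Step S8 performs almost the same operation as step S3. The target URIs of next links and previous links in the tree that do not yet exist in the tree are passed to the HTML document acquisition unit 22, and among the links with the attributes defined in this embodiment that are passed from the HTML syntax analysis unit 23, those whose attribute value is "next" or "previous" are added to the tree in the order in which they appear in the HTML document being processed. They are added to the node that has the same URI as the URI passed at the same time. When no unprocessed next links or previous links remain in the tree, processing proceeds to step S9.
[0051]
Step S9 performs exactly the same operation as step S5 and rearranges the nodes on the same level, giving priority to next links. When the rearrangement has been applied to all nodes in the tree, processing proceeds to step S10.
[0052]
Step S10 performs exactly the same operation as step S6. It orders the URIs that correspond to the nodes according to the tree that has been created, and when several nodes in the tree have the same URI, the nodes that would come later in the formatted book are deleted.
[0053]
This completes the operation by which the book structure analysis unit 24 of the second embodiment rearranges HTML documents on the basis of the book-type logical structure.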
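[0054]
[Embodiment 3]
Finally, the third embodiment of the present invention is described in detail. In this embodiment, only the operation of the HTML syntax analysis unit 23 in the block diagram of FIG. 5, which shows the configuration of the HTML document book-type formatting apparatus according to the first embodiment, differs. The HTML syntax analysis unit 23 parses each HTML document received from the HTML document acquisition unit 22, passes the URI of the document being processed and the links that have the attributes defined in this embodiment to the book structure analysis unit 24, and stores the document being processed in the component storage unit 25. In this embodiment, before passing the links to the book structure analysis unit 24, it can convert the REL and REV attributes that already exist for HTML links into the attributes defined in this embodiment. This conversion is described in detail below.
[0055]
The REL attribute that already exists for HTML links is short for RELation and describes the forward relationship from the link source to the link target. The REV attribute is short for REVerse and describes the reverse relationship from the link target to the link source. Values such as "made", "parent", "next", and "previous" can be written for the REL and REV attributes. The HTML syntax analysis unit 23 of this embodiment therefore converts the attribute REV="parent" into the attribute book="section" in the root document and into the attribute section="section" in other HTML documents. It converts REL attributes whose value is "next" or "previous" into the book attribute in the root document and into the section attribute in other HTML documents. Similarly, the attribute REV="made" in the root document is converted into the attribute book="made". This conversion makes it possible to describe the book-type logical structure among HTML documents without extending the attributes of HTML links.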
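The conversion rules stated above can be written as a small lookup function. This is a sketch of the described mapping only; the function name `convert_rel_rev`, its signature, and the choice to return `None` for combinations the text does not mention are assumptions.

```python
def convert_rel_rev(attr, value, is_root):
    """Map a REL/REV attribute to the book-type attribute described above.

    Returns (attribute, value), or None when the text gives no rule.
    """
    attr, value = attr.lower(), value.lower()
    # The root document uses the book attribute, other documents use section.
    target = "book" if is_root else "section"
    if attr == "rev" and value == "parent":
        return (target, "section")
    if attr == "rel" and value in ("next", "previous"):
        return (target, value)
    if attr == "rev" and value == "made" and is_root:
        return ("book", "made")
    return None
```

For example, REV="parent" in chap1.html becomes section="section", while the same attribute in the root document becomes book="section".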
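[0056]
This completes the operation by which the HTML syntax analysis unit 23 of the third embodiment converts the REL and REV attributes that already exist for HTML links into the attributes defined in this embodiment.
[0057]
[Effects of the Invention]
As described above, according to the present invention, when HTML documents are to be given a book-like logical structure such as a table of contents, chapters, and sections, assigning attributes that correspond to that logical structure makes it possible to set up links that do not rely on the user's recognition. Links described in this way are distinguished from other links by their attributes, which makes it easy for a computer to extract the information about which links to use for ordering when the documents are formatted into a book.
[0058]
Furthermore, according to the present invention, HTML documents can be formatted into and displayed as a book without losing descriptions of logical structure among them that is not of the book type. In place of the apparatus called a WWW client, groups of closely related HTML documents can therefore be managed in book form, which offers users a way of using them that is easier to understand.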
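[Brief Description of the Drawings]
[FIG. 1] A diagram modeling the book-type logical structure that results from formatting according to the first embodiment of the present invention.
[FIG. 2] A diagram showing the attributes needed to describe the logical structure in links in the first embodiment of the present invention.
[FIG. 3] A diagram of the logical structure among the HTML documents in the first embodiment of the present invention.
[FIG. 4] A diagram showing an example in which HTML documents are formatted into a book form by the first embodiment of the present invention.
[FIG. 5] A block diagram showing the configuration of the HTML document book-type formatting apparatus according to the first embodiment of the present invention.
[FIG. 6] A flowchart showing the operation by which the book structure analysis unit of the first embodiment rearranges HTML documents on the basis of the book-type logical structure.
[FIG. 7] A diagram showing an example of the tree created as a result of executing step S1 shown in FIG. 6.
[FIG. 8] A diagram showing an example of the tree created from the tree of FIG. 7 as a result of executing step S2 shown in FIG. 6.
[FIG. 9] A diagram showing an example of the operation of step S5 shown in FIG. 6.
[FIG. 10] A diagram showing an example of the result of executing step S6 shown in FIG. 6.
[FIG. 11] A diagram showing the relationships among the HTML documents in the second embodiment of the present invention.
[FIG. 12] A diagram showing an example of formatting into a book form by the second embodiment of the present invention, using the relationships among the HTML documents shown in FIG. 11.
[FIG. 13] A flowchart showing the operation by which the book structure analysis unit of the second embodiment rearranges HTML documents on the basis of the book-type logical structure.
[FIG. 14] A diagram showing how the book-type logical structure is added to the tree from the table-of-contents document in the second embodiment of the present invention.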
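[Description of Reference Numerals]
1 ... Book in the book-type logical structure model
2 ... Preface in the book-type logical structure model
3 ... Table of contents in the book-type logical structure model
4 ... Main text in the book-type logical structure model
5 ... Bibliography in the book-type logical structure model
6 ... Index in the book-type logical structure model
7 ... Other content in the book-type logical structure model
8 ... Chapter within the main text in the book-type logical structure model
9 ... Section within a chapter in the book-type logical structure model
10 ... Page within a section in the book-type logical structure model
11 ... Word within a page in the book-type logical structure model
22 ... HTML document acquisition unit
23 ... HTML syntax analysis unit
24 ... Book structure analysis unit
25 ... Component storage unit
26 ... Structure storage unit
27 ... Book formatting unit
28 ... URI⇔page-number correspondence table
29 ... Display data generation unit
30 ... Information display unit
31 ... mokuji.html
32 ... book.html
33 ... Table-of-contents document mokuji.html
34 ... Chapter 1
35 ... Chapter 1 Section 1
36 ... Chapter 1 Section 2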
37 ... Chapter 2

[0001]
[Technical Field of the Invention]
The present invention relates to a method and an apparatus for formatting HTML (HyperText Markup Language) documents, such as those of the WWW (World Wide Web) stored on the Internet, into a book form that is easy for users to browse. Attributes for describing a book-type logical structure are added to HTML links, and the documents are formatted into a book using the links that carry those attributes.
[0002]
[Prior art]
In conventional apparatuses for formatting and displaying HTML documents, in particular those called WWW clients, one HTML document is displayed per display unit. The relationship between that HTML document and other HTML documents is expressed with links, and even when the document is so closely related to others that together they could be represented as a single book, each document is managed independently.
[0003]
For users to recognize, by means of such links, a logical structure among HTML documents such as hierarchy and ordering relationships (for example, the table of contents, chapters, and sections of a book), links such as "next page" and "previous page" must be set up so that the user follows those transitions.
[0004]
[Problems to be solved by the invention]
With the conventional technology, giving HTML documents a book-like logical structure of a table of contents, chapters, sections, and so on required links such as "next page" and "previous page" that rely on the user's recognition. Moreover, even when such links were set up, they were not distinguished in any way from other links, so the information about which links to use for ordering when formatting the documents into a book was lacking, and it was difficult to process them by computer.
[0005]
An object of the present invention is to provide a method and an apparatus that add, to links between HTML documents, attributes capable of describing a book-like logical structure such as a table of contents, chapters, and sections, thereby describing the logical structure among the HTML documents, and that use this information to rearrange the HTML documents so that they can be efficiently formatted into and displayed as a book.
[0006]
[Means for Solving the Problems]
To achieve the above object, the invention according to claim 1 is a method for formatting HTML documents written in HTML, a structure description language for describing information such as hypertext information on the Internet on a tag basis. The method comprises: a first step of interpreting attributes that are given to identifiers in HTML documents called links, which serve to move from one piece of information to another, and that describe a book-type logical structure, such as hierarchy and ordering relationships, among a plurality of HTML documents; a second step of converting the logical structure into a tree structure using the attributes; a third step of rearranging the tree structure so that it does not contradict the ordering relationships among the HTML documents expressed by the attributes; and a fourth step of arranging the HTML documents linearly on the basis of the rearranged tree structure. Its principal feature is that HTML documents can be formatted into a book form.
[0007]
With the invention according to claim 1, a book-like logical structure such as a table of contents, chapters, and sections can be described in the links between HTML documents, so the logical structure among HTML documents, which could not be described with the conventional technology, can now be described. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.
[0008]
The invention according to claim 2 is the invention according to claim 1 in which the second step prepares a table-of-contents document describing the book-type logical structure among a plurality of HTML documents and converts the logical structure among the HTML documents into a tree structure using the description in that table-of-contents document. Its principal feature is that, given only the table-of-contents document, HTML documents can be formatted into a book form without the book-type logical structure being described in the HTML documents themselves.
[0009]
With the invention according to claim 2, even if the HTML documents themselves are written with the conventional technology, merely supplying a table-of-contents document that describes the book-type logical structure among them makes it possible to describe the logical structure among HTML documents, which the conventional technology could not do. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.
[0010]
Further, the invention according to claim 3 is the invention according to claim 1 or 2 which, when the logical structure among the plurality of HTML documents is described with the REL attribute, which expresses the forward relationship of a link, and the REV attribute, which expresses the reverse relationship, additionally has, before the first step, a step of converting this logical structure into a description of the book-type logical structure. Its principal feature is that HTML documents can be formatted into a book form by describing the logical structure among them with the REL and REV attributes that already exist for HTML links and converting that description into a description of the book-type logical structure.
[0011]
With the invention according to claim 3, book-type logical relationships such as hierarchy and ordering can be described with the existing REL and REV attributes, without extending the attributes of HTML links, so the logical structure among HTML documents can be described. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.
[0012]
Next, to achieve the above object, the invention according to claim 4 is an apparatus for formatting HTML documents written in HTML, a structure description language for describing information such as hypertext information on the Internet on a tag basis, the apparatus having means for formatting HTML documents into a book-type structure and means for displaying the book-type structure on a screen in the form of a book. The apparatus comprises: means for interpreting attributes that describe a book-type logical structure, such as hierarchy and ordering relationships, among HTML documents; means for converting the logical structure into a tree structure using the attributes; means for rearranging the tree structure so that it does not contradict the ordering relationships among the HTML documents expressed by the attributes; and means for arranging the HTML documents linearly on the basis of the rearranged tree structure. Its principal feature is that HTML documents can be formatted into a book form and displayed.
[0013]
With the invention according to claim 4, a book-like logical structure such as a table of contents, chapters, and sections can be described in the links between HTML documents, so the logical structure among HTML documents, which could not be described with the conventional technology, can now be described. Rearranging the HTML documents using this information allows them to be formatted into and displayed as a book efficiently.
[0014]
The invention according to claim 5 is the invention according to claim 4 in which the means for converting the logical structure into a tree structure prepares a table-of-contents document describing the book-type logical structure among a plurality of HTML documents and converts the logical structure among the HTML documents into a tree structure on that basis. Its principal feature is that, given only the table-of-contents document, HTML documents can be formatted into a book form without the book-type logical structure being described in the HTML documents themselves.
[0015]
With the invention according to claim 5, even if the HTML documents themselves are written with the conventional technology, merely supplying a table-of-contents document that describes the book-type logical structure among them makes it possible to describe the logical structure among HTML documents, which the conventional technology could not do. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.
[0016]
Further, the invention according to claim 6 is the invention according to claim 4 or 5 which, when the logical structure among the plurality of HTML documents is described with the REL attribute, which expresses the forward relationship of a link, and the REV attribute, which expresses the reverse relationship, additionally comprises means for converting this logical structure into a description of the book-type logical structure. Its principal feature is that HTML documents can thereby be formatted into a book form.
[0017]
With the invention according to claim 6, book-type logical relationships such as hierarchy and ordering can be described with the existing REL and REV attributes, without extending the attributes of HTML links, so the logical structure among HTML documents can be described. Rearranging the HTML documents using this information allows them to be formatted into a book efficiently.
[0018]
Hereinafter, the operation of the present invention will be described.
[0019]
In the invention according to claim 1, the step of interpreting the attributes given to links as a description of a book-type logical structure, such as hierarchy and ordering relationships among HTML documents, makes it possible to interpret a book-type logical structure that the conventional technology could not interpret. The step of converting the logical structure expressed with the attributes into a tree structure represents the descriptions of logical structure scattered across the individual HTML documents as a single tree, so that they can be handled centrally. The step of rearranging the tree structure so that it contradicts as little as possible the ordering relationships expressed by the attributes sorts the documents in the order described in the links between them, using ordering relationships described at higher levels of the hierarchy as an aid, so that the rearrangement is carried out properly even when no ordering relationship is described among the HTML documents or when the descriptions contradict one another. The step of arranging the HTML documents linearly on the basis of the rearranged tree structure traverses, depth first, the tree that has been fitted as closely as possible to the book-type logical structure described in the HTML documents. It thus becomes possible to describe the logical structure among HTML documents and to rearrange them using that information, so that HTML documents can be efficiently formatted into a book form, which is the object of the present invention.
[0020]
In the invention according to claim 2, the step of converting the logical structure expressed by the table-of-contents document into a tree structure expands the description in the given table-of-contents document in order from the beginning of the document to obtain the hierarchy and ordering relationships among the HTML documents, and uses that information to convert the logical structure among them into a tree structure. Therefore, even if no book-type logical structure is described in the HTML documents themselves, the logical structure can be converted into a tree structure from the table-of-contents document, so that HTML documents can be efficiently formatted into a book form, which is the object of the present invention.
[0021]
In the present invention according to claim 3, the process of converting the logical structure between HTML documents expressed by the REL and REV attributes that already exist in HTML links into a description of book-type logical structure converts the parent-child relationships and sequential order expressed by the REL and REV attributes of links into book-type hierarchical relationships and sequential order. It thus becomes possible to describe the logical structure between HTML documents and to rearrange the HTML documents using that information, so that HTML documents can be efficiently shaped into the form of a book, which is the object of the present invention.
[0022]
In the present invention according to claim 4, the means for interpreting the attributes given to links as descriptions of book-type logical structure, such as the hierarchy and sequential order between HTML documents, makes it possible to interpret a book-type logical structure that conventional techniques could not interpret. The means for converting the logical structure expressed by those attributes into a tree structure expresses the descriptions of logical structure scattered across the individual HTML documents as a single tree, so that they can be handled centrally. The means for rearranging the tree structure so that it is as consistent as possible with the sequential order expressed by the attributes rearranges the HTML documents according to the order described in the links between them, supplemented by the order described at higher levels of the hierarchy, so that the rearrangement completes normally even when no order is described between some HTML documents or when the descriptions contradict one another. Lastly, the means for arranging the HTML documents linearly on the basis of the rearranged tree structure traverses that tree, which matches the book-type logical structure described in the HTML documents as closely as possible, in depth-first order and thereby arranges the HTML documents in a line. It is thus possible to provide an apparatus that can describe the logical structure between HTML documents, rearrange the HTML documents using that information, and efficiently shape and display them in the form of a book, which is the object of the present invention.
[0023]
In the present invention according to claim 5, the means for converting the logical structure between the HTML documents into a tree structure using the description of the table of contents document reads the description of the given table of contents document in order from the top, thereby obtains the hierarchy and sequential order between the HTML documents, and converts the logical structure between the HTML documents into a tree structure using that information. Therefore, even if the book-type logical structure is not described in the HTML documents themselves, the logical structure between the HTML documents can be converted into a tree structure from the description of the table of contents document alone, and the HTML documents can be efficiently shaped into the form of a book.
[0024]
In the present invention according to claim 6, the means for converting the logical structure between HTML documents, conventionally described in HTML links with the REL and REV attributes, into a description of book-type logical structure converts the parent-child relationships and sequential order expressed by the REL and REV attributes into book-type hierarchical relationships and sequential order. It thus becomes possible to describe the logical structure between HTML documents and to rearrange the HTML documents using that information, so that HTML documents can be efficiently shaped into the form of a book, which is the object of the present invention.
[0025]
[Best Mode for Carrying Out the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0026]
[First Embodiment]
FIG. 1 is a diagram modeling the book-type logical structure that results from shaping in the first embodiment of the present invention. Book 1 in the figure represents the whole book as a logical structure. Such a book usually comprises a preface 2, a table of contents 3, a main text 4, a bibliography 5, an index 6, and others 7. The main text 4 consists of repeated chapters 8, a chapter 8 of repeated sections 9, a section 9 of repeated pages 10, and a page 10 of repeated words 11. The preface 2 and the others 7 likewise consist of repeated chapters 8 and sections 9, in the same manner as the main text 4. The others 7 corresponds to what is called an appendix or supplement. The table of contents 3 consists mainly of references to places inside the book, such as a section 9 of a chapter 8. The index 6 likewise consists of references to places inside the book, such as a page 10. The bibliography 5, on the other hand, consists of references to information outside the book, such as other books, and is referred to from words 11 inside the book. In the present invention, the logical structure other than pages 10 and words 11 is described explicitly, and the repetition of pages 10 is generated automatically.
[0027]
FIG. 2 is a diagram showing the attributes needed to describe the above logical structure in links in the first embodiment of the present invention. The attribute book 12 shown in the figure corresponds to book 1 in FIG. 1 and indicates a link from the HTML document that describes the entire book. The attribute section 13 corresponds to the preface 2 and others 7 in FIG. 1 and to the chapters 8 and sections 9 of the main text 4, and indicates a link from an HTML document that, like a chapter or section, groups pages together. The attribute index 14 corresponds to the table of contents 3 and the index 6 in FIG. 1 and indicates a link from an HTML document that collects references to the inside of the book. The attribute bibliography 15 corresponds to the bibliography 5 in FIG. 1 and indicates a link from an HTML document that collects references to the outside of the book.
[0028]
The value of the attribute book 12 is "section" for a link to an HTML document represented by the attribute section 13, such as the preface 2, the main text 4, or the others 7; "index" for a link to an HTML document represented by the attribute index 14, such as the table of contents 3 or the index 6; and "bibliography" for a link to an HTML document represented by the attribute bibliography 15, such as the bibliography 5. In addition, so that a single work can be described as several books, as with a serialized novel, the values "previous", meaning logically preceding, and "next", meaning logically following, can be given. The value "made", representing the author, can also be given. The value of the attribute section 13 is "section" for a link to an HTML document one level down: a section if the HTML document containing the link is a chapter, or a subdivision of the section if that document is itself a section. To describe the logical order of chapters and sections, "previous" and "next" can also be given. The value of the attribute index 14 is "refer" for a reference link to information inside the book. When the table of contents, the index, or the like is spread over several HTML documents, "previous" and "next" can be given. The value of the attribute bibliography 15 is "refer" for a reference link to information outside the book. When the bibliography is spread over several HTML documents, "previous" and "next" can likewise be given.
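The attribute values listed above can be summarized as a small lookup table. The following Python sketch is only an illustration: the names ALLOWED_VALUES and is_valid_link are hypothetical, and it assumes that the reference value is spelled "refer" for both the index and bibliography attributes.

```python
# Permitted values per link attribute, as listed in the paragraph above.
# Assumption: the reference value is spelled "refer" for both
# the index attribute and the bibliography attribute.
ALLOWED_VALUES = {
    "book": {"section", "index", "bibliography", "previous", "next", "made"},
    "section": {"section", "previous", "next"},
    "index": {"refer", "previous", "next"},
    "bibliography": {"refer", "previous", "next"},
}


def is_valid_link(attribute: str, value: str) -> bool:
    """Return True if `value` is a permitted value of `attribute`."""
    return value in ALLOWED_VALUES.get(attribute, set())
```

A check of this kind corresponds to the later rule that links whose value is not valid for their attribute are left out of the tree.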
[0029]
An example of links carrying the above attributes, which represent the logical structure in the first embodiment of the present invention, is shown below.
[0030]
book.html 16
<Head>
<Link book="made" href="mailto:kenya@ntt.jp">
</Head>
<Body>
<A book="index" href="mokuji.html">Table of contents</A><p>
<A book="section" href="chap1.html">Chapter 1</A><p>
<A book="bibliography" href="bib.html">References</A><p>
</Body>
mokuji.html 17
<Body>
<H1>Table of Contents</H1><p>
<A index="refer" href="chap1.html">Chapter 1</A><BR>
<A index="refer" href="sec1.html">First section</A><BR>
<A index="refer" href="sec2.html">Second section</A>
<P>
<A index="refer" href="chap2.html">Chapter 2</A>
</Body>
chap1.html 18
<Body>
<H1>Chapter 1</H1><p>
<A section="section" href="sec1.html">First section</A><p>
<A section="section" href="sec2.html">Second section</A><p>
</Body>
sec1.html 19
<Body>
<H2>First section</H2><p>
This is a test page for book type display.<P>
<A section="next" href="sec2.html">Next section</A><p>
</Body>
sec2.html 20
<Body>
<H2>Second section</H2><p>
This is the second section of the book type test page.<P>
<A section="previous" href="sec1.html">Previous section</A><p>
</Body>
bib.html 21
<Body>
<H1>Reference bibliography</H1><p>
[1]
<A bibliography="refer" href="http://www.ntt.jp/">NTT Home Page</A><BR>
[2]
<A bibliography="refer" href="http://hil.ntt.jp/">NTT Human Interface Lab</A><BR>
</Body>
book.html 16 is the HTML document corresponding to book 1 in FIG. 1 and represents the entire book. In the header enclosed by the <head> and </head> tags, a link indicating the author of the book is described by a <link> tag having the attribute book="made". Here, href="character string" indicates that the character string is used as the identifier of the link destination; this character string is called a Uniform Resource Identifier, abbreviated URI. The <a> tags describe links to the table of contents, Chapter 1, and the bibliography.
mokuji.html 17 is the HTML document corresponding to the table of contents 3 in FIG. 1 and is referenced from book.html 16 by a link having the attribute book="index". In this document 17, the table of contents representing the overall structure of the book is described by <a> tags having the attribute index="refer", which are internal references to Chapter 1, Chapter 1 Section 1, Chapter 1 Section 2, and Chapter 2. mokuji.html 17 can also be generated automatically from the contents of the other HTML documents.
chap1.html 18 is the HTML document corresponding to chapter 8 in FIG. 1 and is referenced from book.html 16 by a link having the attribute book="section". In this document 18, links to the first and second sections that make up Chapter 1 are described by <a> tags having the attribute section="section".
sec1.html 19 and sec2.html 20 are the HTML documents corresponding to section 9 in FIG. 1 and are referenced from chap1.html 18 by links having the attribute section="section". These documents 19 and 20 contain the text of the first and second sections of Chapter 1, together with links indicating the logical order of the two sections, described by <a> tags having the attribute section="next" and the attribute section="previous".
bib.html 21 is the HTML document corresponding to the bibliography 5 in FIG. 1 and is referenced from book.html 16 by a link having the attribute book="bibliography". In this document 21, the list of references for the book is described by <a> tags having the attribute bibliography="refer".
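To make the role of these attributes concrete, the following Python sketch extracts the attributed links from a document in order of appearance, using the standard library's html.parser module. It is an illustration under stated assumptions, not the apparatus's actual parser: the class BookLinkParser, the function extract_book_links, and the sample string CHAP1 are hypothetical names introduced here.

```python
from html.parser import HTMLParser

# The four link attributes defined in this embodiment.
BOOK_ATTRIBUTES = ("book", "section", "index", "bibliography")


class BookLinkParser(HTMLParser):
    """Collects (attribute, value, href) triples in order of appearance."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        for name in BOOK_ATTRIBUTES:
            if name in attr_map:
                self.links.append((name, attr_map[name], href))


def extract_book_links(html_text):
    parser = BookLinkParser()
    parser.feed(html_text)
    return parser.links


# Hypothetical sample modeled on chap1.html above.
CHAP1 = """<Body>
<H1>Chapter 1</H1><p>
<A section="section" href="sec1.html">First section</A><p>
<A section="section" href="sec2.html">Second section</A><p>
</Body>"""

links = extract_book_links(CHAP1)
```

Here links holds the two section links of Chapter 1 in document order, which is the information a structure analysis step would need from this document.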
[0031]
FIG. 3 illustrates the relationship between the HTML documents described above. A rounded rectangle indicates a place where a link is described in an HTML document, and an arrow indicates the destination of the link. The shaded rounded rectangles represent references to the outside of the book, and the characters attached to the arrows are the values of the book, section, and index attributes of the links.
[0032]
FIG. 4 is a diagram showing an example in which the above HTML documents are shaped into a book according to the first embodiment of the present invention. In accordance with the book-type logical structure described in them, the HTML documents are arranged in the order book 1, table of contents 3, main text 4, and bibliography 5 of the model shown in FIG. 1. In the present embodiment, an HTML document that does not fit on one page is divided into two or more pages. Each rectangle in FIG. 4 represents one page after book-type shaping, and the number enclosed in hyphens (-) at the bottom is the page number of that page.
[0033]
FIG. 5 is a block diagram illustrating the configuration of the HTML document book-type shaping apparatus according to the first embodiment of the present invention. The HTML document acquisition unit 22 shown in the figure acquires an HTML document from a database in which HTML documents are stored, such as the WWW, and passes it to the HTML syntax analysis unit 23. The HTML syntax analysis unit 23 analyzes the syntax of the HTML document passed from the HTML document acquisition unit 22, passes the URI of the HTML document being processed and the links having the attributes defined in the present embodiment to the book structure analysis unit 24, and stores the HTML document being processed in the component storage unit 25. The book structure analysis unit 24 is the most important part of the present invention; it analyzes the book-type logical structure and rearranges the HTML documents. During its processing, the book structure analysis unit 24 registers the URI of the HTML document being processed in the structure storage unit 26. If a link passed by the HTML syntax analysis unit 23 describes a URI that does not exist in the structure storage unit 26, that URI is passed to the HTML document acquisition unit 22 and the corresponding HTML document is acquired recursively. When no unacquired HTML documents remain, the HTML documents are rearranged in accordance with the attributes defined in the present embodiment, the resulting order is registered in the structure storage unit 26, and the processing of the book-type shaping unit 27 starts. The book-type shaping unit 27 takes the HTML documents stored in the component storage unit 25 in the order registered in the structure storage unit 26 and divides them so that they fit on pages.
During its processing, the book-type shaping unit 27 creates a URI/page number correspondence table 28, which records the correspondence between the URI of each HTML document and its page number. When the processing of the book-type shaping unit 27 is completed, the result is passed to the display data generation unit 29. The display data generation unit 29 converts the HTML documents divided by the book-type shaping unit 27, page by page, into a format that can be displayed on the information display unit 30. In doing so, it leaves URIs that do not exist in the URI/page number correspondence table 28 as they are, as references to the outside, and converts URIs that do exist in the table into page numbers, as references to the inside of the book. The information display unit 30 displays the HTML documents converted by the display data generation unit 29 in the form of a book.
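The role of the URI/page number correspondence table 28 can be sketched as follows. This is a minimal illustration, assuming that page numbers start at 1 and that the number of pages each document occupies is already known; build_page_table and resolve_reference are hypothetical names, and the pagination itself is outside the sketch.

```python
def build_page_table(ordered_uris, pages_per_document):
    """Map each URI to the number of its first page, in book order."""
    table = {}
    next_page = 1  # assumption: numbering starts at page 1
    for uri in ordered_uris:
        table[uri] = next_page
        next_page += pages_per_document[uri]
    return table


def resolve_reference(href, page_table):
    """URIs in the table become page numbers; others stay external."""
    if href in page_table:
        return ("page", page_table[href])
    return ("external", href)


table = build_page_table(
    ["book.html", "mokuji.html", "chap1.html"],
    {"book.html": 1, "mokuji.html": 1, "chap1.html": 2},
)
```

With this table, resolve_reference("chap1.html", table) yields a page reference, while a URI such as "http://www.ntt.jp/" is left as an external reference.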
[0034]
As described above, the logical structure between HTML documents is described in the HTML documents themselves, the HTML documents are rearranged using that information, and they can thus be efficiently shaped into the form of a book and displayed.
[0035]
Next, with reference to the flowchart of FIG. 6, the operation by which the book structure analysis unit 24 of the above embodiment rearranges the HTML documents according to the book-type logical structure will be described in detail. The HTML document that serves as the starting point for shaping into book form is called the root document. The root document corresponds to book 1 shown in FIG. 1, and the links in it that describe logical structure have the book attribute defined in this embodiment. In the present embodiment, processing starts when the root document is input. In step S1, the book structure analysis unit 24 stores the URI of the root document passed from the HTML syntax analysis unit 23 in the structure storage unit 26, arranges the URIs described in the links having a book attribute in the root document, which are passed at the same time, in the order in which they appear in the root document, and creates a tree such as that shown in FIG. 7. Each ellipse in FIG. 7 represents a URI and is called a node. Each arrow represents a link having a book attribute and is labeled with its attribute value, such as "index" or "section". Links whose value is not valid for the book attribute are not included in the tree. Next, the URIs that are currently leaves are passed to the HTML document acquisition unit 22, and the process proceeds to step S2.
[0036]
In step S2, of the links having the attributes defined in the present embodiment that are passed from the HTML syntax analysis unit 23, those whose attribute value is "section", "next", or "previous" are added to the tree in the order in which they appear in the HTML document being processed. The node to which they are added is the node whose URI is the same as the URI passed from the HTML syntax analysis unit 23 at the same time. A link whose book attribute or section attribute has the value "section" is called a section link. If the added link is a section link, its destination URI is passed to the HTML document acquisition unit 22. When no unprocessed section links remain in the tree, the process proceeds to step S3. The result of step S2 is shown in the corresponding figure. The level shown in that figure is defined as the number of section links followed from the root document to reach a node, and a smaller number means a higher level. Level 1 corresponds to, for example, chapter 8 in FIG. 1, and level 2 to section 9. When links are added to the tree, links whose value is not valid for their attribute, and section links pointing to a node at the same or a higher level, are ignored.
[0037]
In step S3, for the next links and previous links in the tree whose destination URI does not yet exist in the tree, the destination URI is passed to the HTML document acquisition unit 22. Of the links having the attributes defined in the present embodiment that are then passed from the HTML syntax analysis unit 23, those whose attribute value is "section", "next", or "previous" are added to the tree in the order in which they appear in the HTML document being processed. Here, a next link is a link for which the value of an attribute defined in the present embodiment is "next", and a previous link is one for which that value is "previous". The node to which the links are added is the node whose URI is the same as the URI passed from the HTML syntax analysis unit 23 at the same time. When no unprocessed next links or previous links remain in the tree, the process proceeds to step S4.
[0038]
In step S4, it is determined whether the tree contains an unresolved link. An unresolved link is a section link, next link, or previous link whose destination URI has not yet been passed to the HTML document acquisition unit 22. If there is an unresolved link, the process returns to step S2; if there is none, the process proceeds to step S5.
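Steps S1 to S4 amount to a worklist loop that keeps acquiring documents until no unresolved link remains. The Python sketch below illustrates that loop under several simplifying assumptions that are not in the source: documents are supplied as an in-memory dictionary (the hypothetical DOCS, standing in for units 22 and 23), tree depth stands in for the level, a document reached only through a next or previous link is attached as a sibling of the linking node, and duplicate nodes are never added.

```python
from collections import deque

# Hypothetical stand-in for the acquisition and syntax analysis units:
# URI -> list of (attribute, value, href) in order of appearance.
DOCS = {
    "book.html": [("book", "index", "mokuji.html"),
                  ("book", "section", "chap1.html"),
                  ("book", "bibliography", "bib.html")],
    "chap1.html": [("section", "section", "sec1.html")],
    "sec1.html": [("section", "next", "sec2.html")],
    "sec2.html": [("section", "previous", "sec1.html")],
}


def build_tree(root, docs):
    children = {root: []}   # parent URI -> child URIs, in order added
    parent = {root: None}
    level = {root: 0}       # assumption: depth stands in for the level
    order_links = []        # (source, "next" or "previous", target)
    queue = deque()

    def add_node(uri, parent_uri):
        children[parent_uri].append(uri)
        children[uri] = []
        parent[uri] = parent_uri
        level[uri] = level[parent_uri] + 1
        queue.append(uri)   # the new leaf still has to be acquired

    # Step S1: children of the root, from links with a book attribute.
    for attr, value, href in docs.get(root, []):
        valid = value in ("section", "index", "bibliography")
        if attr == "book" and valid and href not in parent:
            add_node(href, root)

    # Steps S2 to S4: repeat until no unresolved link remains.
    while queue:
        uri = queue.popleft()
        for attr, value, href in docs.get(uri, []):
            if attr != "section":
                continue
            if value == "section":
                # Section links to the same or a higher level are ignored.
                if href in level and level[href] <= level[uri]:
                    continue
                if href not in parent:
                    add_node(href, uri)
            elif value in ("next", "previous"):
                order_links.append((uri, value, href))
                if href not in parent:  # step S3: not yet in the tree
                    add_node(href, parent[uri])
    return children, level, order_links


children, level, order_links = build_tree("book.html", DOCS)
```

In this example sec2.html is discovered only through the next link in sec1.html, yet it ends up beside sec1.html under chap1.html, which is the situation step S3 exists to handle.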
[0039]
In step S5, the nodes at each level are rearranged, giving priority to next links. The nodes at the same level are first rearranged so that the previous links are contradicted as little as possible, and then rearranged so that the next links are contradicted as little as possible. To rearrange the nodes in this way, it suffices to run a sort that treats the relationship given by a previous link or a next link as an ordering relation between values.
[0040]
An operation example of step S5 will be described with reference to FIG. 9. In the initial state of step S5, as shown in FIG. 9A, the nodes are arranged in the order No. 1, No. 2, No. 3, No. 4, No. 5. It is assumed that a next link is set from No. 1 to No. 2, a previous link from No. 2 to No. 3, a next link from No. 3 to No. 4, and both a next link and a previous link from No. 4 to No. 5. To rearrange the nodes so that the previous links are contradicted as little as possible, No. 2 and No. 3 are exchanged, and No. 4 and No. 5 are exchanged. The nodes are then arranged in the order No. 1, No. 3, No. 2, No. 5, No. 4, as shown in FIG. 9B. Next, to rearrange the nodes so that the next links are contradicted as little as possible, No. 5 and No. 4 are exchanged. As a result, as shown in FIG. 9C, the nodes are arranged in the order No. 1, No. 3, No. 2, No. 4, No. 5. At this point the previous link from No. 4 to No. 5 is contradicted, but this is ignored because next links have priority. This rearrangement is performed for all nodes in the tree, and the process proceeds to step S6.
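The two-pass rearrangement of step S5 can be tried out with a small adjacent-swap sort. The sketch below is one possible reading, not the patented procedure itself: reorder and the two constraint sets are hypothetical names, only neighbouring nodes are compared, and the links are taken as next links 1 to 2, 3 to 4, and 4 to 5, and previous links 2 to 3 and 4 to 5. A pair (a, b) in a set means that a should come before b.

```python
def reorder(nodes, before):
    """Swap neighbours (a, b) whenever `before` says b precedes a."""
    nodes = list(nodes)
    for _ in range(len(nodes)):          # bounded, so cycles cannot loop
        swapped = False
        for i in range(len(nodes) - 1):
            a, b = nodes[i], nodes[i + 1]
            if (b, a) in before and (a, b) not in before:
                nodes[i], nodes[i + 1] = b, a
                swapped = True
        if not swapped:
            break
    return nodes


# A next link from a to b means a comes before b.
next_before = {(1, 2), (3, 4), (4, 5)}
# A previous link from a to b means b comes before a.
previous_before = {(3, 2), (5, 4)}

after_previous = reorder([1, 2, 3, 4, 5], previous_before)  # [1, 3, 2, 5, 4]
after_next = reorder(after_previous, next_before)           # [1, 3, 2, 4, 5]
```

Running the previous-link pass first and the next-link pass second is what gives next links the final say: the last pass undoes the swap of No. 4 and No. 5, leaving only a previous link contradicted.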
[0041]
In step S6, the URIs corresponding to the nodes are put in order according to the tree that has been created. FIG. 10 is a diagram illustrating an example of the result of step S6. The tree produced by step S5, shown in FIG. 10A, is flattened into one dimension in depth-first order, which yields the book-type ordering shown in FIG. 10B. As the figure also shows, if the tree contains several nodes with the same URI, the node that would come later in the shaped book is deleted.
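The depth-first flattening of step S6, including the removal of later duplicates, can be sketched as follows. The function name linearize and the sample tree are hypothetical; the tree is assumed to be a dictionary from a URI to its ordered child URIs.

```python
def linearize(tree, root):
    """Flatten the tree depth-first, keeping the first node per URI."""
    ordered, seen = [], set()

    def visit(uri):
        if uri in seen:
            return              # a later node with the same URI is dropped
        seen.add(uri)
        ordered.append(uri)
        for child in tree.get(uri, []):
            visit(child)

    visit(root)
    return ordered


# Hypothetical tree in which sec1.html appears twice under chap1.html.
tree = {
    "book.html": ["mokuji.html", "chap1.html", "bib.html"],
    "chap1.html": ["sec1.html", "sec2.html", "sec1.html"],
}
book_order = linearize(tree, "book.html")
```

The resulting book_order lists book.html, mokuji.html, chap1.html, sec1.html, sec2.html, and bib.html, with the second occurrence of sec1.html removed.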
[0042]
Thus, the operation of rearranging the HTML documents based on the logical structure of the book in the book structure analysis unit 24 in the first embodiment of the present invention is completed.
[0043]
[Second Embodiment]
Next, a second embodiment of the present invention will be described in detail with reference to the drawings. In the present embodiment, the book structure analysis unit 24 in the block diagram of the HTML document book-type shaping apparatus of the first embodiment (FIG. 5) is changed so that, instead of extracting the descriptions of book-type logical structure from all the HTML documents as described above, it converts the logical structure between the HTML documents into a tree structure using the description of a table of contents document in which the book-type logical structure is described.
[0044]
FIG. 11 is a diagram illustrating the relationship between HTML documents in the second embodiment. A rounded rectangle indicates a place where a link is described in an HTML document, and an arrow indicates the destination of the link. The shaded rounded rectangles represent references to the outside of the book, and the characters attached to the arrows are the values of the book, section, and index attributes of the links. mokuji.html 31 is referenced from book.html 32 by a link having the attribute book="index", and the logical structure, such as the order in which the HTML documents are arranged, is described in it. The structure is described by writing links having the attribute index="refer" in the order in which the documents are to appear in the book. By combining these links with the <Hn> tags, which describe structure within an HTML document, the hierarchical relationship between HTML documents can also be expressed: for example, a link enclosed by <H1> and </H1> represents a chapter, and a link enclosed by <H2> and </H2> represents a section.
[0045]
The table of contents document mokuji.html 31 is shown below.
[0046]
mokuji.html 33
<Body>
<H1>Table of Contents</H1><p>
<H1><A index="refer" href="chap1.html">Chapter 1</A></H1>
<H2><A index="refer" href="sec1.html">First section</A></H2>
<H2><A index="refer" href="sec2.html">Second section</A></H2>
<P>
<A index="refer" href="chap2.html">Chapter 2</A>
</Body>
FIG. 12 is a diagram showing an example in which the HTML documents related as shown in FIG. 11 are shaped into a book according to the second embodiment of the present invention. In accordance with the book-type logical structure described in mokuji.html 33, the HTML documents are arranged in the order book 1, table of contents 3, and main text 4 of the model shown in FIG. 1. In the present embodiment too, an HTML document that does not fit on one page is divided into two or more pages. Each rectangle in FIG. 12 represents one page after book-type shaping, and the number enclosed in hyphens (-) at the bottom is the page number of that page. The arrows in the figure represent links having the attribute index="refer", and the dotted line encloses a group of documents connected by the next links and previous links described in the HTML documents.
[0047]
The block diagram showing the configuration of the HTML document book-type shaping apparatus of the second embodiment is the same as that of the first embodiment; only the data that the HTML syntax analysis unit 23 passes to the book structure analysis unit 24 and the operation of the book structure analysis unit 24 differ. The HTML syntax analysis unit 23 is changed so that, in the data passed to the book structure analysis unit 24, the current value of the <Hn> tag, which represents the size of a heading, is attached to each link having an attribute defined in the present embodiment.
[0048]
The operation of the book structure analysis unit 24, which is the most important part of the second embodiment, will be described in detail with reference to the flowchart. In this embodiment, as in the first embodiment, processing starts when the root document is input. In step S7, the book structure analysis unit 24 stores the URI of the root document passed from the HTML syntax analysis unit 23 in the structure storage unit 26, holds the URI described in the link having the attribute book="index" in the root document passed at the same time, passes that URI to the HTML document acquisition unit 22, and waits until the HTML syntax analysis unit 23 passes back the same URI as the one it is holding. When that URI is passed from the HTML syntax analysis unit 23, the links having the attribute index="refer", among the links passed at the same time, are added to the tree in the order in which they appear in the HTML document being processed. The node to which each link is added is determined by the value of the <Hn> tag attached to the link.
[0049]
FIG. 16 is a diagram illustrating how the book-type logical structure is added to the tree from the table of contents document in the second embodiment. From the table of contents document 33, the links having the attribute index="refer" are added to the tree in the order in which they appear: Chapter 1 34, Chapter 1 Section 1 35, Chapter 1 Section 2 36, and Chapter 2 37. First, because the link to Chapter 1 34 is enclosed by the <H1> and </H1> tags, its destination is a level 1 node, so the link to Chapter 1 34 is added as a link from the root document. Next, because the link to Chapter 1 Section 1 35 is enclosed by the <H2> and </H2> tags, its destination is a level 2 node. The most recently added level 1 node is Chapter 1 34, so the link to Chapter 1 Section 1 35 is added as a link from Chapter 1 34. The link to Chapter 1 Section 2 36 is likewise added as a link from Chapter 1 34. Finally, because the link to Chapter 2 37 has no <Hn> tag information, it is added as a link from the root document. Once the tree representing the logical structure of the book has been created from the table of contents document 33 in this way, the URIs of the unresolved links in the tree are passed to the HTML document acquisition unit 22, and of the links having the attributes defined in the present embodiment that are passed from the HTML syntax analysis unit 23, those whose attribute value is "next" or "previous" are added to the node whose URI is the same as the URI passed at the same time. When no unresolved links remain in the tree, the process proceeds to step S8.
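The rule for attaching table of contents links by heading level can be sketched in Python as follows. The sketch assumes that each link arrives as a pair of its <Hn> level (or None when there is no heading information) and its href; toc_to_tree is a hypothetical name, and attaching a link to the root when the level above it has not been seen is an assumption made here, not something the source specifies.

```python
def toc_to_tree(root, entries):
    """Build parent -> children lists from (heading_level, href) pairs."""
    tree = {root: []}
    last_at_level = {0: root}   # most recently added node per level
    for heading_level, href in entries:
        # A link without <Hn> information hangs off the root (level 1).
        level = heading_level if heading_level is not None else 1
        # Assumption: fall back to the root if the level above is missing.
        parent = last_at_level.get(level - 1, root)
        tree[parent].append(href)
        tree.setdefault(href, [])
        last_at_level[level] = href
        # Nodes remembered at deeper levels belong to the previous branch.
        for deeper in [k for k in last_at_level if k > level]:
            del last_at_level[deeper]
    return tree


# The four links of the table of contents document, in order of appearance.
entries = [(1, "chap1.html"), (2, "sec1.html"),
           (2, "sec2.html"), (None, "chap2.html")]
toc_tree = toc_to_tree("book.html", entries)
```

For these entries, chap1.html and chap2.html become children of book.html, and sec1.html and sec2.html become children of chap1.html, matching the walk-through above.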
[0050]
In step S8, almost the same operation as in step S3 is performed. For the next links and previous links in the tree whose destination URI does not yet exist in the tree, the destination URI is passed to the HTML document acquisition unit 22, and of the links having the attributes defined in the present embodiment that are passed from the HTML syntax analysis unit 23, those whose attribute value is "next" or "previous" are added to the tree in the order in which they appear in the HTML document being processed. The node to which they are added is the node whose URI is the same as the URI passed at the same time. When no unprocessed next links or previous links remain in the tree, the process proceeds to step S9.
[0051]
In step S9, exactly the same operation as in step S5 is performed: the nodes at each level are rearranged, giving priority to next links. When the rearrangement has been performed for all nodes in the tree, the process proceeds to step S10.
[0052]
In step S10, exactly the same operation as in step S6 is performed: the URIs corresponding to the nodes are put in order according to the tree that has been created, and if the tree contains several nodes with the same URI, the node that would come later in the shaped book is deleted.
[0053]
Thus, the operation of rearranging the HTML documents based on the logical structure of the book in the book structure analysis unit 24 in the second embodiment of the present invention is completed.
[0054]
[Third Embodiment]
Finally, a third embodiment of the present invention will be described in detail. In the present embodiment, only the operation of the HTML syntax analysis unit 23 in the block diagram of the HTML document book-type shaping apparatus of the first embodiment (FIG. 5) differs. The HTML syntax analysis unit 23 analyzes the syntax of the HTML document passed from the HTML document acquisition unit 22, passes the URI of the HTML document being processed and the links having the attributes defined in the embodiments to the book structure analysis unit 24, and stores the HTML document being processed in the component storage unit 25. In this embodiment, before the links are passed to the book structure analysis unit 24, the REL and REV attributes that already exist in HTML links are converted into the attributes defined in the embodiments and then passed. This conversion is described in detail below.
[0055]
The REL attribute, which already exists in HTML links, is an abbreviation of RELation and describes the forward relationship from the link source to the link destination. The REV attribute is an abbreviation of REVerse and describes the reverse relationship from the link destination to the link source. Values such as "made", "parent", "next", and "previous" can be given to the REL and REV attributes. The HTML syntax analysis unit 23 of the present embodiment therefore converts the attribute REV="parent" into the attribute book="section" in the root document and into the attribute section="section" in other HTML documents. A REL attribute whose value is "next" or "previous" is converted into a book attribute with the same value in the root document and into a section attribute with the same value in other HTML documents. Similarly, the attribute REV="made" in the root document is converted into the attribute book="made". With this conversion, a book-type logical structure between HTML documents can be described without extending the attributes of HTML links.
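The conversion rules above are simple enough to state as a single function. The following Python sketch is illustrative only: convert_rel_rev is a hypothetical name, attribute names and values are assumed to be lower-case already, and links that match none of the rules are assumed to be left unconverted (returned as None).

```python
def convert_rel_rev(attrs, is_root):
    """Map REL/REV on a link to a (book or section attribute, value) pair."""
    target = "book" if is_root else "section"
    rel = attrs.get("rel")
    rev = attrs.get("rev")
    if rev == "parent":
        # The link source is the parent of the destination: a section link.
        return (target, "section")
    if rel in ("next", "previous"):
        return (target, rel)
    if is_root and rev == "made":
        return ("book", "made")
    return None  # assumption: other links are not converted
```

For example, convert_rel_rev({"rev": "parent"}, is_root=True) gives ("book", "section"), and convert_rel_rev({"rel": "next"}, is_root=False) gives ("section", "next").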
[0056]
This concludes the description of the operation by which the HTML syntax analysis unit 23 in the third embodiment of the present invention converts the REL and REV attributes that conventionally exist on HTML links into the attributes defined in the present embodiment.
[0057]
[Effect of the invention]
As described above, according to the present invention, when a logical structure such as a table of contents, chapters, or sections is to be given to a set of HTML documents, attributes corresponding to that logical structure are provided, so links can be set that do not rely on the user's recognition. In addition, links described in this way are distinguished from other links by their attributes, which simplifies the processing by which a computer extracts information on which links to use for ordering when shaping the documents into book form.
[0058]
Further, according to the present invention, a description of a non-book-type logical structure between HTML documents can be shaped and displayed in the form of a book without losing any information. Therefore, in place of a device called a WWW client, a group of closely related HTML documents can be managed in the form of a book, which has the advantage of making them easier for the user to use.
[Brief description of the drawings]
FIG. 1 is a diagram modeling a book-like logical structure as a result of shaping according to a first embodiment of the present invention.
FIG. 2 is a diagram showing attributes necessary for describing a logical structure in a link according to the first embodiment of the present invention.
FIG. 3 is a diagram illustrating a logical structure between HTML documents according to the first embodiment of the present invention.
FIG. 4 is a diagram showing an example in which an HTML document is shaped into a book according to the first embodiment of the present invention.
FIG. 5 is a block diagram illustrating a configuration of an HTML document book type shaping apparatus according to the first embodiment of the present invention.
FIG. 6 is a flowchart showing an operation of reordering an HTML document based on a logical structure of a book in a book structure analyzer in the first embodiment of the present invention.
FIG. 7 is a diagram showing an example of a tree created as a result of executing step S1 shown in FIG. 6.
FIG. 8 is a diagram showing an example of a tree created from the tree of FIG. 7 as a result of executing step S2 shown in FIG. 6.
FIG. 9 is a diagram illustrating an operation example of step S5 shown in FIG. 6.
FIG. 10 is a diagram illustrating an example of an execution result of step S6 shown in FIG. 6.
FIG. 11 is a diagram illustrating a relationship between HTML documents according to the second embodiment of the present invention.
FIG. 12 is a diagram showing an example in which the relationship between the HTML documents shown in FIG. 11 is used to shape the documents into a book according to the second embodiment of the present invention.
FIG. 13 is a flowchart illustrating an operation of rearranging an HTML document based on a logical structure of a book in a book structure analyzer according to the second embodiment of the present invention.
FIG. 14 is a diagram showing a state in which a book-type logical structure is added to a tree from a table of contents document in the second embodiment of the present invention.
[Explanation of symbols]
1 ... Book in the book-type logical structure model
2 ... Introduction in the book-type logical structure model
3 ... Table of contents in the book-type logical structure model
4 ... Main text in the book-type logical structure model
5 ... Bibliography in the book-type logical structure model
6 ... Index in the book-type logical structure model
7 ... Other contents in the book-type logical structure model
8 ... Chapters in the main text of the book-type logical structure model
9 ... Sections in a chapter of the book-type logical structure model
10 ... Pages in a section of the book-type logical structure model
11 ... Words in a page of the book-type logical structure model
22 ... HTML document acquisition unit
23 ... HTML syntax analysis unit
24 ... Book structure analysis unit
25 ... Component storage unit
26 ... Structure storage unit
27 ... Book-type shaping unit
28 ... URI/page number correspondence table
29 ... Display data generation unit
30 ... Information display unit
31 ... mokuji.html
32 ... book.html
33 ... Table of contents document mokuji.html
34 ... Chapter 1
35 ... Chapter 1, Section 1
36 ... Chapter 1, Section 2
37 ... Chapter 2

Claims

A method of shaping an HTML document described using HTML, which is a structure description language for describing information such as hypertext information on the Internet on a tag basis, the method comprising:
a first step of interpreting an attribute that is given to an identifier in an HTML document, called a link, for transitioning from arbitrary information to other information, the attribute being a description of a logical structure, such as a book-type hierarchy and ordering relationships, between a plurality of HTML documents;
a second step of converting the logical structure into a tree structure using the attribute;
a third step of rearranging the tree structure so as not to contradict the ordering relationships between the plurality of HTML documents expressed by the attribute; and
a fourth step of linearly arranging the HTML documents based on the rearranged tree structure.

The HTML document book-type shaping method according to claim 1, wherein the second step is a step of preparing a table of contents document describing a book-type logical structure between a plurality of HTML documents, and converting the logical structure between the HTML documents into a tree structure using the description in the table of contents document.

The HTML document book-type shaping method according to claim 1 or 2, further comprising, before the first step, a step of converting the logical structure between a plurality of HTML documents into a description of the book-type logical structure when that logical structure is described by the REL attribute, which exists on links and expresses a forward relationship, and the REV attribute, which expresses a reverse relationship.

An apparatus for shaping an HTML document described using HTML, which is a structure description language for describing information such as hypertext information on the Internet on a tag basis, the apparatus having means for shaping an HTML document described using HTML into a book-type structure and means for displaying the book-type structure in book form on a screen, and comprising:
means for interpreting an attribute that is a description of a logical structure, such as a book-type hierarchy and ordering relationships, between HTML documents;
means for converting the logical structure into a tree structure using the attribute;
means for rearranging the tree structure so as not to contradict the ordering relationships between a plurality of HTML documents expressed by the attribute; and
means for linearly arranging the HTML documents based on the rearranged tree structure.

The HTML document book-type shaping apparatus according to claim 4, wherein the means for converting the logical structure into a tree structure is means for preparing a table of contents document describing a book-type logical structure between a plurality of HTML documents, and converting the logical structure between the HTML documents into a tree structure based on that table of contents document.

The HTML document book-type shaping apparatus according to claim 4 or claim 5, further comprising means for converting the logical structure between a plurality of HTML documents into a description of the book-type logical structure when that logical structure is described by the REL attribute, which exists on links and expresses a forward relationship, and the REV attribute, which expresses a reverse relationship.