JPH11203291A

JPH11203291A - Sort information generating device

Info

Publication number: JPH11203291A
Application number: JP10015158A
Authority: JP
Inventors: Atsushi Ito; 篤伊藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1998-01-10
Filing date: 1998-01-10
Publication date: 1999-07-30
Also published as: DE19900155A1

Abstract

PROBLEM TO BE SOLVED: To provide a sort information generating device that allows necessary information to be obtained efficiently by merging dispersed sort information into a single body of sort information. SOLUTION: The sort information generating device is provided with a document updating detection part 2, a document analysis part 3, a sort information merging part 5, and a sort information changed position presenting part 8. The document analysis part 3 analyzes a document 1 in which sort information is described and extracts sort information 4. The merging part 5 merges a plurality of sort information components 4 into one sort information 7 based on sorting system definition information 6. When a document 1 is corrected, the detection part 2 detects the correction, new sort information 4 is regenerated, and the presenting part 8 informs the user of the changed positions in the merged sort information 7.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a classification information generating device applied to personal information filtering devices (meta-directory engines) for the WWW (World Wide Web) and to WWW organizing tools.

[0002]

[Description of the Related Art] With the growth of the WWW, services that collect document classification information, known as directory services (Yahoo, for example), have increased. Individuals also often collect classification information on their own home pages (WWW pages) in the form of link collections. This classification information is dispersed, however, and at present users cannot obtain it in the consolidated form they want. Some directory services offer ways to customize them for personal use, but they cannot match the volume and speed of the distributed classification information described above, which is created by many people.

[0003] The technique disclosed in Japanese Patent Application Laid-Open No. 7-98708 provides a document processing apparatus that reprocesses and displays a document according to the reader's intention, level, and situation. When a document is sent over a network from the writer's terminal to the reader's terminal, the apparatus uses an intention clarification unit that clarifies the intention, an intention interpretation unit that interprets it, a basic document processing unit, a knowledge base, information files, and the like to clarify the reader's intention and situation, and then restructures the already-created document to suit the reader according to that intention.

[0004]

[Problems to Be Solved by the Invention] The above prior art reconstructs a document to suit the reader, but it does not make effective use of classification information prepared in advance, such as link collections, and therefore cannot cope with the vast amount of distributed information. Accordingly, an object of the present invention is to provide a classification information generating device that allows necessary information to be obtained efficiently by consolidating distributed classification information into one.

[0005]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 comprises at least a document analysis unit that analyzes a document describing classification information and extracts the classification information, and a classification information merging unit that combines a plurality of extracted sets of classification information into a single set of classification information.

[0006] To achieve the above object, the invention according to claim 2 is the invention according to claim 1, wherein the document describing classification information is a structured document such as an HTML document or an SGML document. To achieve the above object, the invention according to claim 3 is the invention according to claim 1, wherein the classification information merging unit combines the information based on classification system definition information. To achieve the above object, the invention according to claim 4 is the invention according to claim 1, further comprising a document update detection unit and a classification information change point presentation unit, wherein, when a document describing classification information is modified, the document update detection unit detects the modification, the classification information is newly regenerated, and the classification information change point presentation unit notifies the user of the changed points in the merged classification information.

[0007] The present invention comprises a document update detection unit, a document analysis unit, a classification information merging unit, and a classification information change point presentation unit. The document analysis unit analyzes a document describing classification information and extracts the classification information. The classification information merging unit combines a plurality of sets of classification information into a single set based on classification system definition information. When a document is modified, the document update detection unit detects the modification, the classification information is newly regenerated, and the classification information change point presentation unit notifies the user of the changed points in the merged classification information. Here, a document is a structured document such as an HTML (Hyper Text Markup Language) document or an SGML (Standard Generalized Markup Language) document.

[0008]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the accompanying drawings. FIG. 1 is a block diagram of a classification information generating device according to an embodiment of the present invention. The classification information generating device comprises a plurality of HTML documents 1, a document update detection unit 2, a document analysis unit 3, a plurality of extracted sets of classification information 4, a classification information merging unit 5, classification system definition information 6, merged classification information 7, and a classification information change point presentation unit 8. Strictly speaking, the HTML documents 1, the extracted classification information 4, the classification system definition information 6, and the merged classification information 7 are not components of the device, but they are shown as blocks because they are indispensable for explaining how the plurality of sets of classification information 4 are combined into the single set of classification information 7.

[0009] FIG. 2 is a diagram showing the contents of HTML document A, FIG. 3 is a diagram showing the contents of HTML document B, FIG. 4 is a table showing the classification information extracted from HTML document A shown in FIG. 2, and FIG. 5 is a table showing the classification information extracted from HTML document B shown in FIG. 3.

[0010] From the HTML documents 1 shown in FIGS. 2 and 3 (HTML documents A and B), the link names, descriptions, and keywords are extracted, exploiting the fact that the text following an anchor is often its description. This yields the two sets of classification information 4 shown in FIGS. 4 and 5. As those figures show, the extracted classification information 4 is organized into the fields "URL", "name", "description", "keyword", and "change flag". Keywords are extracted from the document title, the link names, and the descriptions. In this example, they are extracted from the link names and descriptions.
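The extraction heuristic described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the `Row` fields mirror the five fields named above, while `LinkCollector`, `extract_rows`, the stop-word list, and the word-splitting rule used to derive keywords are hypothetical. The sketch uses Python's standard `html.parser` module and treats any text between the end of one anchor and the start of the next as the description of the preceding link.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Row:
    """One row of extracted classification information."""
    url: str
    name: str = ""
    description: str = ""
    keywords: set = field(default_factory=set)
    changed: bool = False  # change flag, left unset by the extractor


class LinkCollector(HTMLParser):
    """Collects one Row per <a href=...>; text after the anchor becomes its description."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.rows.append(Row(url=href))
                self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if not self.rows:
            return  # text before the first link is ignored
        row = self.rows[-1]
        if self._in_anchor:
            row.name += data
        else:
            row.description += data


# Assumption: a tiny stop-word list; the source does not say how keywords are chosen.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on"}


def extract_rows(html):
    """Return the Rows found in an HTML link collection."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        row.name = " ".join(row.name.split())
        row.description = " ".join(row.description.split())
        words = re.findall(r"\w+", f"{row.name} {row.description}".lower())
        row.keywords = {w for w in words if w not in STOP_WORDS}
    return parser.rows
```

Calling `extract_rows` on the text of a link collection returns one `Row` per link; a real extractor would need a language-appropriate tokenizer in place of the regular expression, particularly for Japanese text.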

[0011] FIG. 6 is a diagram showing the classification system definition information, and FIG. 7 is a table showing the merged classification information. The classification system definition information 6 shown in FIG. 6 is assumed to contain keywords that interest the user, as illustrated. The extracted classification information 4 is then merged in the following two steps.

1. Rows of classification information that have a keyword matching a node of the classification system are gathered under that node. A variant-notation dictionary may be used for the match so that synonyms are treated as the same keyword (in the example, "製品" (product) and "商品" (goods) are treated as the same).
2. When the same row has been assigned to nodes in a parent-child relationship, the copy under the parent node is removed.

This yields merged classification information like that shown in FIG. 7, so that the information of interest can be obtained efficiently in a systematic form.
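The two merge steps can be sketched as below. This is a hedged illustration only: the source does not specify data structures, so the taxonomy representation (a mapping from each node keyword to its parent), the `SYNONYMS` table standing in for the variant-notation dictionary, the dictionary-based rows, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical classification system: node keyword -> parent keyword (None for a root).
TAXONOMY = {"computer": None, "printer": "computer", "product": "printer"}

# Hypothetical variant-notation dictionary: synonym -> canonical keyword.
SYNONYMS = {"goods": "product", "products": "product", "printers": "printer"}


def canonical(keyword):
    """Map a keyword to its canonical form so synonyms match the same node."""
    keyword = keyword.lower()
    return SYNONYMS.get(keyword, keyword)


def merge(row_sets, taxonomy):
    """Merge several extracted row lists into one {node: [rows]} mapping."""
    merged = defaultdict(dict)  # node -> {url: row}

    # Step 1: gather each row under every taxonomy node that one of its keywords matches.
    for rows in row_sets:
        for row in rows:
            for keyword in {canonical(k) for k in row["keywords"]}:
                if keyword in taxonomy:
                    merged[keyword][row["url"]] = row

    # Step 2: if a row sits under both a parent and its child, drop the parent's copy.
    # Removals are computed from the step-1 result first, so node order does not matter.
    to_drop = [
        (parent, url)
        for node, parent in taxonomy.items()
        if parent in merged and node in merged
        for url in merged[node]
    ]
    for parent, url in to_drop:
        merged[parent].pop(url, None)

    return {node: list(rows.values()) for node, rows in merged.items() if rows}
```

Rows are keyed by URL here, so the same link appearing in two source documents collapses to one entry per node. Only direct parent-child pairs are deduplicated, following the wording of step 2; extending the rule to more distant ancestors would be a design choice beyond what the text states.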

[0012] FIG. 8 is a diagram showing HTML document A after an update, FIG. 9 is a table showing the classification information extracted from HTML document A shown in FIG. 8, FIG. 10 is a table showing the newly merged classification information, and FIG. 11 is a table showing a display example of the classification information. Suppose that HTML document A has been updated as shown in FIG. 8. The document update detection unit 2 detects this. Changes can be detected easily by storing the previous classification information. The document analysis unit 3 extracts the classification information 4 from the document in a form that identifies the changed points, as shown in FIG. 9 (changed points are marked "changed"). Merging this in the classification information merging unit 5 gives the result shown in FIG. 10. Rows whose change flag is on carry that flag into the merged result. In the "name" and "comment" fields, the changed values take priority, on the reasoning that changed information is newer than unchanged information and should therefore be preferred. The classification information change point presentation unit 8 presents the information so that the user can spot the changed points at once, for example by showing them in bold, in reverse video, or in a different color. In the example of FIG. 11, the changed points are shown in bold.
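A minimal sketch of the update handling described in this paragraph follows. It assumes rows are plain dictionaries keyed by URL with `name` and `description` fields (the paragraph's "comment" field is taken here to correspond to the description), that a URL absent from the stored previous extraction counts as changed, and that bold HTML is the chosen presentation. The function names are hypothetical and none of these details are confirmed by the source.

```python
from html import escape


def flag_changes(previous, current):
    """Return copies of `current` rows with a boolean 'changed' flag,
    set by comparing each row with the stored previous extraction."""
    old = {row["url"]: row for row in previous}
    flagged = []
    for row in current:
        before = old.get(row["url"])
        # Assumption: a URL not seen before counts as a change.
        changed = before is None or any(
            before.get(key) != row.get(key) for key in ("name", "description")
        )
        flagged.append({**row, "changed": changed})
    return flagged


def merge_rows(a, b):
    """Combine two rows for the same URL; fields of the changed row take priority."""
    newer, older = (b, a) if b["changed"] and not a["changed"] else (a, b)
    return {**older, **newer, "changed": a["changed"] or b["changed"]}


def render(rows):
    """Render rows as HTML list items, showing changed rows in bold."""
    lines = []
    for row in rows:
        text = f'{escape(row["name"])}: {escape(row["description"])}'
        if row["changed"]:
            text = f"<b>{text}</b>"
        lines.append(f"<li>{text}</li>")
    return "\n".join(lines)
```

Comparing against the stored rows rather than the raw HTML means cosmetic markup edits do not trigger a notification; whether that matches the behavior of the document update detection unit is not stated in the text.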

[0013] The classification information generating device of this embodiment comprises the document update detection unit 2, the document analysis unit 3, the classification information merging unit 5, and the classification information change point presentation unit 8. The document analysis unit 3 analyzes a document 1 describing classification information and extracts the classification information 4. The classification information merging unit 5 combines the plurality of sets of classification information 4 into the single set of classification information 7 based on the classification system definition information 6. When a document 1 is modified, the document update detection unit 2 detects the modification, the classification information 4 is newly regenerated, and the classification information change point presentation unit 8 notifies the user of the changed points in the merged classification information 7.

[0014]

[Effects of the Invention] According to the invention of claim 1, dispersed classification information can be obtained in a single consolidated form, so classification information can be obtained efficiently.

[0015] According to the invention of claim 2, the device targets structured documents on the WWW, where information is fresh and plentiful, so the necessary classification information can be obtained more accurately and efficiently.

[0016] According to the invention of claim 3, classification information can be merged in line with the definition of the classification system, so classification information that meets the user's needs can be obtained efficiently.

[0017] According to the invention of claim 4, the changed points are presented to the user in a recognizable form, so new information can be obtained more efficiently.

[Brief description of the drawings]

FIG. 1 is a block diagram of a classification information generating device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the contents of HTML document A.

FIG. 3 is a diagram showing the contents of HTML document B.

FIG. 4 is a table showing the classification information extracted from HTML document A shown in FIG. 2.

FIG. 5 is a table showing the classification information extracted from HTML document B shown in FIG. 3.

FIG. 6 is a diagram showing the classification system definition information.

FIG. 7 is a table showing the merged classification information.

FIG. 8 is a diagram showing HTML document A after an update.

FIG. 9 is a table showing the classification information extracted from HTML document A shown in FIG. 8.

FIG. 10 is a table showing the newly merged classification information.

FIG. 11 is a table showing a display example of the classification information.

[Explanation of symbols]

1 HTML document
2 Document update detection unit
3 Document analysis unit
4 Extracted classification information
5 Classification information merging unit
6 Classification system definition information
7 Merged classification information
8 Classification information change point presentation unit

Claims

[Claims]

1. A classification information generating device comprising at least: a document analysis unit that analyzes a document describing classification information and extracts the classification information; and a classification information merging unit that combines a plurality of extracted sets of classification information into a single set of classification information.

2. The classification information generating device according to claim 1, wherein the document describing classification information is a structured document such as an HTML document or an SGML document.

3. The classification information generating device according to claim 1, wherein the classification information merging unit combines the information based on classification system definition information.

4. The classification information generating device according to claim 1, further comprising a document update detection unit and a classification information change point presentation unit, wherein, when a document describing classification information is modified, the document update detection unit detects the modification, the classification information is newly regenerated, and the classification information change point presentation unit notifies the user of the changed points in the merged classification information.