JP3673111B2 - Document management method, document management apparatus, and storage medium

Info

Publication number: JP3673111B2
Application number: JP09389399A
Authority: JP
Inventors: 保長谷川; 博史杉山; 達上林; 善啓大盛
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-03-31
Filing date: 1999-03-31
Publication date: 2005-07-20
Anticipated expiration: 2019-03-31
Also published as: JP2000285134A

Description

【０００１】
【発明の属する技術分野】
本発明は、例えば、ＷＷＷブラウザで閲覧されるＷＷＷページや、ワードプロセッサ、エディタ、表計算ソフトその他のアプリケーションで作成される文書などを記憶・管理する文書管理装置に関する。
【０００２】
【従来の技術】
近年のＰＣやワードプロセッサの普及によりオフィス内での文書の電子化が進んでいる。加えてインターネットやイントラネットの発達で電子メールの利用が急増し上記電子化をいっそう加速している。この結果多くのオフィスが大量の電子化されたデータで溢れかえるようになり、これらを効率良く整理し管理することの重要性が高まってきている。
【０００３】
このためこれら電子化された文書を効率良く管理する様々な文書管理装置が開発されてきた。ある装置では文書の登録方法に特徴があり、デスクトップ上のアイコンに対してファイルをドラッグ＆ドロップするだけでデータベースに登録することができる。またある装置では文書の検索方法に特徴があり、例えば「昨日の会議で配布された資料」の様に日常使っている自然言語で文書を検索することができる。また電子メイルに特化したある装置では、受け取った電子メールを自動的に取り込み、差出人ごとのフォルダに仕分けてくれる他、後から全文検索などで検索を行うことができる。また最近はこれらの特徴を兼ね備えた装置も出てきている。
【０００４】
このように従来、文書管理装置として様々な装置が開発されてきたが、その登録機能はユーザが直接登録操作をしなければならないか、電子メールのように特定の文書だけを対象として自動登録できるシステムであった。
【０００５】
また、インターネットとＷＷＷ（World Wide Web）の発達により、ＷＷＷブラウザを通して情報を入手する割合が増大しており、ＷＷＷページを管理することの重要性が高まっている。予め登録しておいたＵＲＬに従ってＷＥＢサイトを自動巡回し、ＷＷＷページのコピーを収集するソフトが開発されている。しかし自分が見たページを対象として登録するものではない。
【０００６】
また、ＷＷＷブラウザがディスク上に残しているＷＷＷページのキャッシュを登録する機能を備えた装置がある。この機能を用いれば自分が見たページだけをデータベースに登録できるが、キャッシュがクリアされてしまうと登録できない問題がある。またキャッシュは短時間のうちに膨大な数になるため、登録前に適切なフィルタリングを行う必要があるが、今までにこのようなフィルタリング機能を備えた装置は登場していない。
【０００７】
【発明が解決しようとする課題】
このように従来の文書管理装置の登録機能は、ユーザ自身が作成、編集、閲覧したもの、即ち自分が直接操作したもの全てを対象とした自動登録機能を備えたものではなく、ユーザ自身が直接登録操作をしなければならないか、あるいは前記メールのように特定の文書だけを対象としていた。
【０００８】
またＷＷＷブラウザを通して得られる膨大な情報に対し、実際に自分が見たページの中からフィルタリングで絞り込んだ物を自動的に登録する機能を備えた装置は存在しなかった。
【０００９】
そこで、本発明は、上記問題点に鑑み、ユーザ自身が実際に作成、編集、あるいは閲覧した文書を後に検索可能にするインデックス情報を自動的に作成することにより、文書管理が容易に行える文書管理方法およびそれを用いた文書管理装置を提供することを目的とする。
【００１０】
また本発明は、ＷＷＷブラウザを通して得られる膨大な文書の中から、予め設定したプロファイル情報にヒットするものだけを閲覧時に自動的にローカルマシン上にコピーし、さらにこのコピーしたデータに対するインデックス情報を自動的に作成することにより、文書管理が容易に行える文書管理方法およびそれを用いた文書管理装置を提供することを目的とする。
【００１１】
【課題を解決するための手段】
本発明の文書管理方法は、記憶されている文書を検索するためのインデックス情報を作成するタイミングを設定して、その設定されたタイミングに従って該文書に関する属性情報を抽出してインデックス情報を作成して該文書に対応付けて記憶することにより、ユーザ自身が実際に作成、編集、あるいは閲覧した文書に対するインデックス情報を自動的に作成し、後に、このインデックス情報を用いて文書を容易に検索できる。
【００１２】
また、本発明の文書管理方法は、閲覧された文書から抽出した属性情報と、選択すべき文書の属性を定めたプロファイル情報とに基づき、閲覧された文書のうち該文書を検索するためのインデックス情報を作成する文書を選択し、この選択された文書から抽出された属性情報から前記インデックス情報を作成して、前記選択された文書と前記インデックス情報とを対応付けて記憶することにより、ＷＷＷブラウザを通して得られる膨大な文書（ＷＷＷページ）の中から、予め設定したプロファイル情報にヒットするものだけを閲覧時に自動的に記憶し、さらにこの記憶したデータに対するインデックス情報を自動的に作成し、後に、このインデックス情報を用いて記憶したＷＷＷページを容易に検索できる。
【００１３】
本発明の文書管理装置は、記憶された文書を検索するためのインデックス情報を作成するタイミングを設定するタイミング設定手段と、前記設定されたタイミングで前記文書に関する属性情報を抽出してインデックス情報を作成する作成手段と、前記インデックス情報を該文書に対応付けて記憶する記憶手段とを具備したことにより、ユーザ自身が実際に作成、編集、あるいは閲覧した文書に対するインデックス情報を自動的に作成し、後に、このインデックス情報を用いて文書を容易に検索できる。
【００１４】
好ましくは、前記インデックス情報に基づき文書を検索する検索手段と、この検索手段での検索結果を少なくとも前記インデックス情報とともに呈示する呈示手段とを具備する。
【００１５】
また、本発明の文書管理装置は、閲覧された文書から属性情報を抽出する抽出手段と、前記抽出された属性情報と、選択すべき文書の属性を定めたプロファイル情報とに基づき、閲覧された文書のうち該文書を検索するためのインデックス情報を作成する文書を選択する選択手段と、前記選択された文書から抽出された属性情報から前記インデックス情報を作成する作成手段と、前記選択された文書と前記インデックス情報とを対応付けて記憶する記憶手段とを具備したことにより、ＷＷＷブラウザを通して得られる膨大な文書（ＷＷＷページ）の中から、予め設定したプロファイル情報にヒットするものだけを閲覧時に自動的に記憶し、さらにこの記憶したデータに対するインデックス情報を自動的に作成し、後に、このインデックス情報を用いて記憶したＷＷＷページを容易に検索できる。また、ＷＷＷページのように日々更新される情報に対しても有効なインデックス情報を生成することが可能になる。
【００１６】
好ましくは、前記インデックス情報に基づき文書を検索する検索手段と、この検索手段での検索結果を少なくとも前記インデックス情報とともに呈示する呈示手段とを具備する。例えば、ツリー形式と表形式とで呈示して、双方が互いに連携し合って表示することにより、検索結果から所望の文書を見つける場合など、ユーザにとって使い勝手がよくなる。
【００１７】
好ましくは、前記プロファイル情報に基づき指定された属性情報を有する文書を検索する検索手段をさらに具備する。また、前記記憶手段で記憶された文書およびインデックス情報のうち、前記プロファイル情報に基づき指定された属性情報を有する文書およびそのインデックス情報を削除する削除手段をさらに具備する。これにより、文書の検索、削除がより合理的に行え、文書管理の上で利便性が向上する。
【００１８】
なお、上記各手段は、コンピュータに実行させるプログラムとして、フロッピーディスク、ＣＤ−ＲＯＭ等の記憶媒体に記録して頒布することができる。例えば、図１、図１１のユーザインタフェース部１５、ファイル記憶部８、インデックスデータベース９はコンピュータの持つハードウエア資源を利用して構成し、その他の構成部はコンピュータに実行させるプログラムで実現可能である。
【００１９】
【発明の実施の形態】
以下、本発明の実施形態について図面を参照して説明する。
【００２０】
（第１の実施形態）
図１は本発明の第１の実施形態にかかる文書管理装置の構成例を示したものである。
【００２１】
閲覧部１は、ファイル記憶部８に保存されている例えば、テキスト文書、ＨＴＭＬ文書等の電子化された文書を閲覧あるいは作成、編集するものである。閲覧部１は、この文書管理装置専用に作られたものの他、エディタやワープロなどの既存のアプリケーションなどによって構成される。
【００２２】
登録動作設定部２は、閲覧部１で作成、編集、閲覧された文書に対するインデックス情報の作成を開始するタイミングとなる閲覧部１の動作を設定するものである。
【００２３】
登録動作定義テーブル３は登録動作設定部２での設定内容を記録保管するためのもので、登録制御部４は、この登録動作定義テーブル３を参照しながら登録制御を行うようになっている。登録動作定義テーブル３は例えば、ＭＳ−Ｗｉｎｄｏｗｓの場合、レジストリやＩＮＩファイルなどを利用する。他のシステム、例えばＵＮＩＸやＭａｃＯＳなどにおいても同様のものを用いる。
【００２４】
登録制御部４は登録動作定義テーブル３の設定値を元に閲覧部１の動作を検知したら、キーワード抽出部５、属性取得部６、インデックス情報作成部７を制御してインデックス情報の作成、登録を実行する。
【００２５】
キーワード抽出部５は閲覧部１で閲覧、作成、編集されている文書からキーワードを抽出するものである。キーワードの抽出方法、既存のものでよく、例えば、文書中のテキスト部分を形態素解析して、名詞等の単語を抽出し、その名詞の出現頻度を求めて、出現頻度の高いいくつかの名詞をキーワードとしてもよい。また、予め各分野毎にキーワードとなり得る単語が登録されている辞書を用いて、文書とこの辞書とのマッチングを行って、一致した単語を当該文書のキーワードとしてもよい。
【００２６】
属性取得部６は、閲覧部１で閲覧、作成、編集されている文書のキーワード以外の属性を取得するものである。
【００２７】
インデックス情報作成部７は、キーワード抽出部５および属性取得部６で得られたキーワードと各種属性値を基に、閲覧部１で閲覧あるいは作成編集された文書に対するインデックス情報を生成する。
【００２８】
ファイル記憶部８は、文書の各種ファイルを保存するもので、ハードディスクや各種リムーバブルメディアで構成される。
【００２９】
インデックスデータベース９はインデックス情報作成部７で作られたインデックス情報を保存管理するものであり、検索部１０はこのインデックス情報により検索を行う。
【００３０】
検索部１０はユーザからの検索要求を基にインデックスデータベース９に対して検索を行い、その結果を検索結果呈示部１４に伝える。
【００３１】
検索結果呈示部１４は、表形式呈示部１１、ツリー形式呈示部１２、連携制御部１３の３つで構成され、検索部１０での検索結果をユーザに呈示するものである。
【００３２】
表形式呈示部１１は、検索結果を１件につき１行の表にして呈示する。類似度やキーワードなどで結果を一覧しやすいように呈示する。
【００３３】
ツリー形式呈示部１２は、検索された文書ファイルがファイル記憶部８のどこに保存されているかをわかりやすく呈示するためにディレクトリ構造をツリー状に呈示する。
【００３４】
連携制御部１３は、表形式呈示部１１とツリー形式呈示部１２に対して行うユーザの文書選択操作に応じて両方の表示を連携させる制御を行う。
【００３５】
ユーザインタフェース部１５は、ディスプレイ装置等の出力装置、キーボード、マウス等の各種入力装置から構成され、ユーザが各種指示入力を行ったり、検索式を入力したり、また、文書の表示や、文書検索結果の表示等を行うようになっている。
【００３６】
次に、図１の文書管理装置による登録タイミングとその設定について説明する。なお、ここで登録とは、ファイル記憶部８に既に記憶されている文書ファイルについてインデックス情報を作成して、そのインデックス情報をインデックスデータベースに登録することで、このことを「文書の登録」と呼ぶこともある。
【００３７】
登録タイミングとして設定される閲覧部１の動作としては、例えば、
▲１▼閲覧部１の終了あるいは閲覧ウインドウのクローズ
▲２▼ユーザ操作によるファイルのセーブ
の２つがあり、ユーザは登録動作設定部２を通して、これら２つのいづれか１つ、あるいは両方を選択すればよい。例えば、▲１▼だけを設定すると閲覧、編集等された文書の全てが登録される。▲２▼だけを設定すると編集した文書だけが登録される。なお、閲覧部１が自動セーブの機能を有する場合、自動セーブ時の登録を▲２▼のオプションとして追加することができる。▲１▼と▲２▼の両方を設定すると▲１▼だけの設定と同じく閲覧された文書、編集された文書の全てが登録されるが、編集時の登録のタイミングが▲１▼だけの設定とは異なる。セーブ時の登録が基本になり、セーブ後に編集を行わずに終了した場合、終了時の登録は行われない。
【００３８】
設定内容は登録動作定義テーブル３に書き込まれ、登録制御部４から参照される。
【００３９】
以上が登録タイミングの設定手順である。
【００４０】
図２は、文書管理装置で用いられるインデックス情報の構造を示した図で、文書ファイルに関する属性情報（例えば、ＩＤ番号、ファイル名、パス名、ファイルの種類、メディアの種類、作成日時、タイトル、作者等）と、該文書から抽出されたキーワードをから構成されていて、文書１つに対しこのインデックス情報が１つ作成される。
【００４１】
ＩＤ番号は、インデックス情報の作成順に「１」、「２」、「３」…と順番に発番され、インデックスデータベース９に格納されている全てのインデックス情報においてユニークな整数値データである。
【００４２】
ファイル名は、当該文書ファイルのファイル記憶部８上でのファイル名を表す文字列データである。
【００４３】
パス名は、同じく当該文書ファイルのファイル記憶部８上でのパス名を表す文字列データである。
【００４４】
ファイル名とパス名の表現方法は、この文書管理装置が動作するＯＳに依存し、例えばＭＳ−Ｗｉｎｄｏｗｓの場合、それぞれ「Ｄｏｃｕｍｅｎｔ．ｔｘｔ」、「ｃ：￥ｈｏｍｅ￥ＭｙＤｏｃｕｍｅｎｔ￥ｄｏｃｕｍｅｎｔ．ｔｘｔ」などの文字列データとなる。
【００４５】
ファイルの種類は、図３に示すように、ファイルの種類を表す予め定義された数値データであり、例えば、この文書管理装置が動作するＯＳと利用するアプリケーションと対象とするファイルの種類とに依存する。例えばＭＳ−Ｗｉｎｄｏｗｓ上でＭＳ−Ｏｆｆｉｃｅの各アプリケーションのファイルとリッチテキストフォーマット（ＲＴＦ）ファイル、標準テキストファイルを対象とする場合、図３に示す定義値を利用する。
【００４６】
メディアの種類は、図４に示すように、ファイルの保存場所の種類を表す予め定義された数値データであり、例えば、この文書管理装置が動作するＯＳに依存する。例えば、ＭＳ−Ｗｉｎｄｏｗｓの場合は図４に示すような定義値を利用する。
【００４７】
作成日時は、当該文書ファイルが作成された日時、あるいは最終更新日時を表す日付型のデータである。
【００４８】
タイトルは、当該文書ファイルのタイトルを表す文字列データである。
【００４９】
作者は、当該文書ファイルを作成した作者を表す文字列データである。
【００５０】
キーワードは、キーワード抽出部５で当該文書のテキスト部分から抽出したキーワードを値とするリスト型のデータである。
【００５１】
次に、図５に示すフローチャートを参照して、登録制御部４によるインデックス情報の作成および登録の処理動作について説明する。
【００５２】
登録制御部４は、閲覧部１で登録動作定義テーブル３に設定された動作、即ちファイルのセーブもしくは閲覧部１の終了動作の実行を監視する（ステップＳ１）。動作の実行を監視する方法には、システムに常駐するアプリケーションで標準的に用いられている方法を利用する。例えば、ＭＳ−Ｗｉｎｄｏｗｓの場合では、当該動作が行われる際に発生するＷｉｎｄｏｗｓメッセージを監視し、該メッセージに対してフックをかけることで当該動作の監視が行える。また、他の方法として、閲覧部１に既存のアプリケーションが利用された場合、該アプリケーション自身の機能として該設定された動作を検知し、さらにその動作に応じた処理を実行することも可能である。例えばＭＳ−ＷｉｎｄｏｗｓのアプリケーションであるＭＳ−ＷｏｒｄやＭＳ−Ｅｘｃｅｌの場合、マクロ言語ＶＢＡ（ＶｉｓｕａｌＢａｓｉｃｆｏｒＡｐｐｌｉｃａｔｉｏｎ）を用いてアプリケーションのコマンド自体を書き換えることによりアプリケーションの動作をカスタマイズすることが可能で、セーブコマンドに手を加えることでセーブ時の一連の登録処理を実行させることができる。また当該動作時に自動的に実行するマクロを作ることもできる。例えば、Ａｕｔｏ＿Ｃｌｏｓｅ（）マクロで任意のファイルをクローズした際の処理を、Ａｕｔｏ＿Ｅｘｉｔ（）マクロでＷｏｒｄ自体を終了する際の処理をそれぞれ記述することができるため、これらを用いて終了時の一連の登録処理を実行させることができる。
【００５３】
予め設定したセーブあるいは終了の動作が検知された場合（ステップＳ２）、登録制御部４は、キーワード抽出部５を起動し、キーワード抽出部５は、閲覧部１で閲覧、編集、作成等されていた文書からキーワードを抽出する（ステップＳ３）。
【００５４】
続いて、属性取得部６を起動し、閲覧部１よりファイル名、パス名、ファイルの種類、メディア種別、ファイル作成日時、タイトル、作者の各属性値を取得する（ステップＳ４）。
【００５５】
続いてインデックス情報作成部７を起動し、キーワード抽出部５で抽出されたキーワードと属性取得部６で取得された各属性値を元に図２に示したインデックス情報を作成する（ステップＳ５）。
【００５６】
続いてインデックス情報作成部７は、作成したインデックス情報をインデックスデータベース９に登録する（ステップＳ６）。
【００５７】
属性取得部６での各属性値の取得方法、インデックス情報作成部７でのインデックス情報の作成方法、及びインデックス情報作成部でのインデックス情報のインデックスデータベース９への登録方法については後で改めて説明する。
【００５８】
続いて、閲覧部１が引き続き動作しているかを調べる（ステップＳ７）。セーブ時の登録や、ウインドウを閉じての登録などで閲覧部１が引き続き動作中の場合は、再び（ステップＳ１）の監視状態に戻り、以上の動作が継続して行われる。
【００５９】
以上の様に登録制御部４においてインデックス情報の作成および登録が行われる。
【００６０】
次に、属性取得部６における各属性値の取得方法について説明する。
【００６１】
ファイル名、パス名は閲覧部１から直接取得する。閲覧部１に既存のアプリケーションを利用する場合は、ＯＳ及びアプリケーションの機能から属性を取得する。例えばＭＳ−Ｗｉｎｄｏｗｓ上のアプリケーションの場合、ＯＬＥオートメーションなどによって属性を取得することができる。
【００６２】
ファイルの種類はファイル名から取得する。例えばＭＳ−Ｗｉｎｄｏｗｓの場合、ファイル名の拡張子部分からファイルの種類を取得することができる。
【００６３】
メディア種別はパス名からドライブ名を取り出し、ＯＳの機能からドライブの種類を調べることによって取得する。
【００６４】
ファイル作成日時はＯＳの機能を用いて取得する。例えばＭＳ−Ｗｉｎｄｏｗｓの場合ＷｉｎｄｏｗｓＡＰＩの各関数を用いて実装する。
【００６５】
タイトルは、当該文書ファイルが例えばＭＳ−Ｗｏｒｄの文書ファイルのようにファイル自身の属性としてタイトルを持っている場合には、このタイトルをそのまま利用する。当該文書ファイルが属性としてタイトル持っていない場合、あるいは属性の取得方法がわからない文書ファイルである場合には、例えば、当該文書ファイルのテキスト部分の最初の１文をタイトルとして利用する。この１文が文字列フィールドの大きさを越える場合には、その大きさまでをタイトルとして利用する。
【００６６】
作者は、当該文書ファイルが自分の属性として作者の情報を持っている場合には、その値をそのまま利用する。当該文書ファイルが属性として作者の情報を持っていない場合、あるいは属性の取得方法がわからない文書ファイルである場合には、この文書管理装置が動作するＯＳが文書ファイルの作者を取り出すＡＰＩを備えている場合、このＡＰＩ関数によって作者を取得する。ＡＰＩが無い場合、あるいは値の取得に失敗した場合には、空の文字列を値とする。
【００６７】
なお、閲覧部１に既存のアプリケーションを適用する場合、アプリケーションの機能として上記したような属性値を取得することが可能な場合にはこの機能によってその属性値を取得する。例えばＭＳ−Ｗｉｎｄｏｗｓのアプリケーションの場合、ＯＬＥオートメーションの機能で属性値の多くをアプリケーションから取得することが可能である。
【００６８】
以上のように属性取得部６において各属性値を取得する。
【００６９】
次に、インデックス情報作成部７におけるインデックス情報の作成方法について説明する。
【００７０】
まず、新しいインデックスを１つ生成し、
（ＩＤ番号）＝（直前に作られたインデックスのＩＤ番号）＋１
で定まるＩＤ番号を設定する。ＩＤ番号としては各インデックスにユニークな値が設定できれば上記方法でなくてもかまわない。例えばインデックス先のデータが無くなったような不要なインデックスの削除で欠番となったＩＤ番号を小さい物から優先的に割り当てるような方法でも良い。
【００７１】
続いて、キーワード抽出部５で抽出された各キーワードを設定する。キーワードが抽出されなかった場合、本フィールドは空のリスト型となる。またキーワードの数が予めリスト型としてサポートしている最大要素数の上限を越える場合には、キーワードとして検出された順番に最大要素数分のリスト型を形成するものとする。
【００７２】
最後に、属性取得部６で取得されたファイル名、パス名、ファイルの種類、メディア種別、ファイル作成日時、タイトル、作者の各属性値を設定する。
【００７３】
以上のようにインデックス情報作成部７においてインデックス情報を作成する。
【００７４】
次に、図６を参照して、インデックス情報作成部７がインデックスデータベース９にインデックス情報を登録する動作についてを説明する。
【００７５】
インデックス情報作成部７は、その作成したインデックス情報と同じ文書のインデックス情報がインデックスデータベース９に既にあるかどうかを調べるために、当該インデックス情報にあるパス名と同じパス名を持ったインデックス情報がインデックスデータベース９にあるかを調べる（ステップＳ１１）。
【００７６】
同じパス名のインデックス情報が無い場合、当該作成したインデックス情報をインデックスデータベース９に新規登録する（ステップＳ１２）。
【００７７】
同じパス名のインデックス情報が既に存在する場合、この既に登録されたインデックス情報と作成したインデックス情報の各属性値を比較し、異なる属性値が１つ以上存在する場合には（ステップＳ１３）、既にあるインデックス情報を今回作成したインデックス情報で更新する（ステップＳ１４）。
【００７８】
異なる属性値が１つも存在しない場合には、変更が無いので今回作成したインデックス情報を破棄し（ステップＳ１５）、登録をしないで終了する。
【００７９】
以上のようにして、インデックス情報作成部７においてインデックス情報がインデックスデータベース９へ登録される。
【００８０】
次に、図７を参照して、検索部１０における文書検索処理動作について説明する。
【００８１】
ユーザインタフェース部１５を介して、検索部１０にインデックスデータベース９に対する検索式を入力する（ステップＳ２１）。この検索式の中身はインデックスデータベース９に依存する。例えばインデックスデータベース９がＳＱＬ（ＳｔｒｕｃｔｕｒｅｄＱｕｅｒｙＬａｎｇｕａｇｅ）ベースのデータベースの場合にはＳＱＬ文での検索式が用いられる。
【００８２】
続いて、検索部１０では、この検索式を用いてインデックスデータベース９からインデックス情報を検索し（ステップＳ２２）、その検索結果を受け取って検索結果呈示部１４に送る（ステップＳ２３）。検索結果呈示部１４では、検索されたインデックス情報をユーザインタフェース部１５に呈示する。
【００８３】
検索結果呈示部１４に呈示された検索結果に所望の文書がない場合、ユーザは引き続き現在の検索結果を破棄してステップＳ２１から検索をやり直すことができる。検索結果が多く所望の文書を見つけられない場合、現在の検索結果に対しての追加の検索を実行することができる（ステップＳ２４）。追加の検索を行う場合には、現在の検索結果即ち検索された文書のすべてのインデックス情報のＩＤ番号をメモリ上にデータとして蓄えておき（ステップＳ２５）、追加の検索によってインデックスデータベース９からインデックス情報を取得する際に、当該メモリ上のＩＤ番号と一致したものだけを取り出すようにすれば良い。また、それぞれの検索での検索結果を保存しておくことで、以前の検索結果に立ち戻ったり、追加の検索を複数平行して行うこともできる。
【００８４】
以上のように検索部１０において文書検索が実行される。
【００８５】
次に、図８を参照して、検索結果呈示部１４での検索結果の取得の動作を説明する。
【００８６】
まず、ユーザインタフェース部１５に呈示されたインデックス情報の中から、ユーザが所望の文書を選択する（ステップＳ３１）。次に、選択された文書のインデックス情報からファイル名とパス名を抽出し（ステップＳ３２）、ファイル名とパス名とが閲覧部１に伝えられる（ステップＳ３３）。閲覧部１は当該ファイル名とパス名とからファイル記憶部８から該当する文書ファイルを読み込んで呈示する（ステップＳ３４）。
【００８７】
以上のように検索結果呈示部１４における検索結果の取得が実行される。
【００８８】
次に、検索結果呈示部１４での検索結果の呈示方法について説明する。検索結果呈示部１４では、図９に示すように、検索結果をディスプレイ装置の表示画面に、表形式呈示ウインドウとツリー形式呈示ウインドウとにそれぞれ呈示する。表形式呈示ウインドウには、各文書のインデックス情報の内容をその検索スコアの上位から順に表形式の呈示を行う。ツリー形式呈示ウインドウには、検索されたインデックス情報に含まれているパス情報を基にツリー状に呈示する。
【００８９】
これら２つの呈示ウインドウは、連携制御部１３によって互いに連携して動作し合う。例えば表形式呈示ウインドウに呈示された任意の文書をマウスなどでクリックして選択すると、ツリー形式呈示ウインドウ上の当該文書の部分が選択されハイライト表示などによって一目でわかるようになる。同様にツリー形式呈示ウインドウに呈示された任意の文書を選択すると、表形式呈示ウインドウ上の当該文書が選択されるようになる。
【００９０】
このために連携制御部１３では、一方の呈示ウインドウで選択された文書に対するインデックス情報のＩＤ番号を他方の呈示ウインドウに伝える働きをする。
【００９１】
また、ツリー形式呈示ウインドウには、各フォルダ内の検索されなかった文書ファイルも同時に呈示するようにしてもよい（図１０参照）。この場合は、検索されたファイルはハイライト表示を行うなどして検索されなかったファイルと一目で区別ができるようにする。
【００９２】
以上のように検索結果呈示部１４で検索結果の呈示を行う。
【００９３】
（第２の実施形態）
本発明の第２の実施形態にかかる文書管理装置は、図１の閲覧部１に、ＷＷＷブラウザアプリケーションを適用した例で、管理対象の文書としての、ユーザ自身が閲覧したＷＷＷページ（ＷＷＷブラウザ上に表示されているページ）に対し、後に当該ＷＷＷページを検索するために必要なインデックス情報を自動的に作成し、そのインデックス情報を用いて所望のＷＷＷページを検索するためのものである。
【００９４】
ところで、ＷＷＷページの場合、閲覧している実体は通常ローカルマシンの外にある。また、その実体は日々更新されることが多く、実体に対してのインデックス情報をファイリングしても後日の役に立たないケースが多い。このため、第２の実施形態では上記ＷＷＷページのコピーをローカルマシン上のファイルシステムに作成する機能を追加し、そのコピーに対するインデックス情報を作成してインデックスデータベースに登録する方式をとっている。
【００９５】
また、日常ＷＷＷブラウザを通してＷＷＷページを閲覧していると、知らず知らずのうちに膨大な数のページを閲覧していることに気がつく。第１の実施形態のように閲覧した文書（のインデックス情報）全てを登録する方式の場合、登録数が膨大になってしまうという問題が生じる。そこで第２の実施形態では、予めプロファイル情報を登録し、閲覧したＷＷＷページに対して、このプロファイル情報によるフィルタリング処理を行う機能が追加されている。なお、第２の実施形態に係る文書管理装置の閲覧部は、図１１に示すように、ＷＷＷブラウザ１０３に限定するものではなく、エディタやワープロなどの既存のアプリケーションを適用してもかまわない。前者のローカルマシン外にある実体をコピーしてインデックスを生成する機能は、エディタやワープロなどのアプリケーションでフロッピーディスクや光磁気ディスクなどの各種リムーバブルメディア上にある文書ファイルに対する閲覧や編集を行う際に適用することができる。また後者のプロファイル情報によるフィルタリング処理は、そのまま文書ファイルに対して適用することが可能である。
【００９６】
図１１は、第２の実施形態にかかる文書管理装置の構成例を示したものである。なお、図１１において、図１と同一部分には同一符号を付し、異なる部分について説明する。すなわち、図１１では、図１の閲覧部１をＷＷＷブラウザ１０３に置き換え、図１の登録動作設定部２、登録動作定義テーブル３が、プロファイル設定部１０１、プロファイル登録テーブル１０２、フィルタリング部１０６、ＷＷＷページ取得・保存部１１７に置き換わり、さらに、プロファイル情報呈示部１１７が追加されてた構成となっている。
【００９７】
ＷＷＷブラウザ１０３は、既存のブラウザアプリケーションで構成されるが、専用に作り込んだ物であっても構わない。
【００９８】
プロファイル設定部１０１は、フィルタリング部１０６で行われる文書のフィルタリング処理を行う際に用いるプロファイル情報（図１２参照）を設定するためのもので、図１２に示すように、例えば、複数のキーワードと複数のＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ）をそれぞれ設定することができる。
【００９９】
プロファイル登録テーブル１０２は、プロファイル設定部１０１で設定された図１２に示したようなプロファイル情報を、設定された時間とともに、図１３に示すテーブル形式で保存するものである。
【０１００】
フィルタリング部１０６は、キーワード抽出部５で抽出されたキーワードと属性取得部６で取得された属性値とプロファイル登録テーブル１０２に設定された最新のプロファイル情報とを比較し、登録を行うか否かを登録制御部４に伝える。閲覧中のＷＷＷページから抽出されたキーワードおよび属性値がプロファイル情報と一致しているときは、登録制御部４は、当該ＷＷＷページの登録を行う。
【０１０１】
ＷＷＷページ取得保存部１０７は、ＷＷＷブラウザ１０３で閲覧されたページを構成している各オブジェクトを１つのフォルダにまとめてファイル記憶部８上にコピーする。
【０１０２】
プロファイル情報呈示部１１７は、プロファイル登録テーブル１０２の内容を時系列でグラフィカルに呈示するためのもので、呈示したプロファイル情報を基にＷＷＷページの検索と削除を行うこともできる（後述）。
【０１０３】
図１４は、インデックス情報の構造を示したもので、ＩＤ番号、ＵＲＬ、フォルダ名、先頭ファイル名、タイトル、作成日時、キーワード、フィルタリング種別、ヒットしたキーワードの各項目によって構成され、登録されるＷＷＷページごとに１つ作成される。
【０１０４】
ＩＤ番号はインデックス情報の作成順に「１」、「２」、「３」…と順番に発番され、インデックスデータベース９の全てのインデックス情報においてユニークな整数値データである。
【０１０５】
ＵＲＬは、登録するＷＷＷページのＵＲＬを表す文字列データである。
【０１０６】
フォルダ名はファイル記憶部８上にコピーしたＷＷＷページを保存しているフォルダのパス名を表す文字列データである。
【０１０７】
先頭ファイル名は、フォルダ内の各ファイルの中で、先頭ページのＨＴＭＬ文書をコピーしたファイルのファイル名を表す文字列データである。
【０１０８】
タイトルは登録するＷＷＷページに付けられたタイトルを表す文字列データである。
【０１０９】
作成日時は、コピーしたファイルが作成された日時、あるいは最終更新日時を表す日付型のデータである。
【０１１０】
キーワードは、キーワード抽出部５においてＷＷＷページのテキスト部分に対して抽出したキーワードを値とするリスト型のデータである。キーワードが抽出されなかった場合、本フィールドは空のリスト型となる。またキーワードの数がリスト型として予め定められた最大要素数の上限を越える場合には、キーワードとして検出された順番に最大要素数分のリスト型を形成するものとする。
【０１１１】
フィルタリング種別は、登録するＷＷＷページに対するフィルタリング処理がプロファイルに定義されたＵＲＬで行われたのか、キーワードで行われたのか、その両方で行われたのか、あるいはフィルタリングが行われなかったのかを表す予め定義された数値のデータで、図１５に示す各定義値のいずれかの値を取る。
ヒットしたキーワードは、登録するＷＷＷページに対するフィルタリング処理がキーワードで行われた際に、そのＷＷＷページから抽出されたキーワードのうち、プロファイル情報のキーワードと一致したキーワードを値とするリスト型のデータである。キーワードによるフィルタリングが行われなかった場合、本フィールドは空のリスト型となる。
【０１１２】
次に、図１６を参照して、登録制御部４におけるインデックス情報の作成および登録の処理動作について説明する。
【０１１３】
ＷＷＷブラウザ１０３で新たなＷＷＷページが表示されたかを監視する（ステップＳ４１）。この監視方法としては、システムに常駐するアプリケーションで標準的に用いられている方法を利用する。例えば、ＭＳ−Ｗｉｎｄｏｗｓの場合では、新たなＷＷＷページが表示された際に発生するＷｉｎｄｏｗｓメッセージを監視し、そのメッセージに対してフックをかけることで新たなＷＷＷページの表示されたこと検知する。
【０１１４】
新たなＷＷＷページの表示が検知された場合（ステップＳ４２）、プロファイル登録テーブル１０２を参照し、プロファイルとしてＵＲＬが登録されているかを調べる（ステップＳ４３）。
【０１１５】
プロファイルにＵＲＬが登録されていない場合、ステップＳ４６へ進む。一方、プロファイルにＵＲＬが登録されている場合、ステップＳ４４へ進み、属性取得部６を起動し、属性取得部６がＷＷＷページのＵＲＬをＷＷＷブラウザ１０３から取得する（ステップＳ４４）。
【０１１６】
次に、フィルタリング部１０６を起動し、プロファイルとして登録されているＵＲＬと、現在のＷＷＷページのＵＲＬとを比較し、同じＵＲＬがあるかを調べる（ステップＳ４５）。同じＵＲＬがある場合は、以下の処理を実行し、同じＵＲＬがない場合は登録処理を中断し、ステップＳ４１へ戻り、ＷＷＷブラウザの監視を行う。
【０１１７】
次に、プロファイル登録テーブル１０２を参照し、プロファイルとしてキーワードが登録されているかを調べる（ステップＳ４６）。プロファイルにキーワードが登録されていない場合、ステップＳ４９へ進む。一方、プロファイルにキーワードが登録されている場合、すて４７へ進み、キーワード抽出部５を起動し、ＷＷＷページ内のテキスト情報からキーワードを抽出する（ステップＳ４７）。なお、キーワードの抽出は前述したように、通常のファイリングシステムや検索システムにおいて通常使われている方法によって行われるものとする。
【０１１８】
続いて、フィルタリング部１０６を起動し、当該ＷＷＷページから抽出されたキーワードとプロファイル登録テーブル１０２にプロファイルとして登録されたキーワードとを比較し照合を行う（ステップＳ４８）。一致するキーワード存在しない場合、すなわち、プロファイルとして登録されたキーワードと同じキーワードが当該ＷＷＷページに存在しない場合、そのＷＷＷページの登録を中断し、ステップＳ４１へ戻り、再びＷＷＷブラウザの監視を行う。
【０１１９】
続いて、インデックスデータベース９に当該ＷＷＷページと同じＵＲＬを持ったインデックス情報があるか否かを調べる（ステップＳ４９）。同じＵＲＬのインデックス情報が無い場合は、ステップＳ５１へ進み、以降の登録処理を実行する。インデックスデータベース９に当該ＷＷＷページと同じＵＲＬを持ったインデックス情報が存在する場合、そのインデックス情報が指しているフォルダにあるＷＷＷページのコピーと閲覧されている当該ＷＷＷページを構成している各ファイルとの間でファイル名及びファイルの中身を比較し、異なるファイルが１つも存在しないときは（ステップＳ５０）、その閲覧されているＷＷＷページと同じものが既に登録されていると判断できるので、現在の登録処理を終了して、ステップＳ４１へ戻り、再び、ＷＷＷブラウザの監視を行う。一方、異なるファイルが１つでも存在する場合、ステップＳ５１へ進み、以降の登録処理を実行する。
【０１２０】
ＷＷＷページの登録処理では、まず、ＷＷＷページ取得・保存部１０７を起動し、当該ＷＷＷページのコピーをファイル記憶部８に作成する（ステップＳ５１）。コピーの作成は、ＷＷＷページの自動巡回機能を持ったアプリケーションなどで一般的に行われている方法を用いて行う。
【０１２１】
次に、属性取得部６を起動して、当閲覧中のＷＷＷページとそのＷＷＷページのコピーから、フォルダ名、先頭ファイル名、タイトル、作成時間の各属性値を取得する（ステップＳ５２）。このうちタイトルはＷＷＷブラウザ１０３から直接取得する。フォルダ名と先頭ファイル名はコピーの作成時に情報を残しておき、これを読み出すことで取得する。作成時間はコピー先となるフォルダの作成時間を第１の実施形態の場合と同じ方法で取得する。あるいはＯＳの時計機能で現在の時間を求めてこれを作成時間としても良い。また、ＵＲＬは先に処理（ステップＳ４４）で取得してあるものを利用する。
【０１２２】
続いて、インデックス情報作成部７を起動し、キーワード抽出部５で抽出されたキーワードと、属性取得部６で取得された各属性値を基にインデックス情報を作成する（ステップＳ５３）。インデックス情報の作成方法は第１の実施形態と同様である。作成したインデックス情報は、インデックスデータベース９に新規登録する（ステップＳ５４）。その後、ステップＳ４１へ戻り、ＷＷＷブラウザ１０３の監視を行い、以下ブラウザが終了するまで上記処理を継続して行う。
【０１２３】
次に、登録されたＷＷＷページの検索処理動作について説明する。図１１の文書管理装置には、検索部１０に対して入力された検索式によって行う第１の実施形態と同様の検索機能と、プロファイル情報呈示部１１７からの検索機能の２つの検索機能がある。
【０１２４】
まず、検索部１０に対して入力された検索式によって行う検索処理動作について説明する。ここでは、検索結果呈示部１４の動作が第１の実施形態の場合と異なる。検索結果呈示部１４は、図１７に示すように検索されたＷＷＷページを表形式呈示ウインドウとツリー形式呈示ウインドウとにそれぞれ呈示する。表形式呈示ウインドウでは、各ＷＷＷページのインデックス情報内容をその検索スコアの上位から順に表形式で呈示を行う。ツリー形式呈示ウインドウでは、各ＷＷＷページの中でＵＲＬが同じものを１つの階層にまとめ、図１７に示すように、全体で２階層のツリー構造にした呈示を行う。これら２つの呈示ウインドウは連携制御部１３によって第１の実施形態と同様に互いに連携して動作し合う。
【０１２５】
なお、ツリー形式呈示ウインドウでは、ＵＲＬのドメイン表示部分を右側の要素、すなわち第１ドメインから順番に階層的にまとめた図１８に示すようなツリー表示を行うこともできる。
【０１２６】
次に、プロファイル情報呈示部１１７からの検索処理動作について、図１９と図２０を参照して説明する。
【０１２７】
図１９にプロファイル情報呈示部１１でディスプレイ装置に表示された表示画面の一例を示す。プロファイル情報呈示部１１は、プロファイル登録テーブル１０２の内容を時間軸２０１に従ってグラフィック表示したもので、プロファイルとして設定された全てのＵＲＬとキーワードを、それぞれの期間（プロファイル情報として設定されている、使われる期間）を表した線分で、ＵＲＬの表示領域２０４、キーワードの表示領域２０５の各領域に表示する。例えば、ＵＲＬ「ｗｗｗ．ａａａ．ｄｄｄ．ｅｄｕ」は１９９９年１１月中旬から２０００年１月末まで、キーワード「ネットワーク」は２０００年１月末までそれぞれ有効なプロファイル情報であることが図１９の表示から確認できる。また、表示領域２０４、２０５は時間軸２０１の左右に付けられたスクロールボタン２０２、２０３をマウスなどのポインティングデバイスでクリックすることで左右に（時間軸で）スクロールさせることができる。
【０１２８】
以下、図２０に示すフローチャートを参照して、プロファイル情報呈示部１１７からの検索処理動作について説明する。
【０１２９】
まず、ユーザは、プロファイル情報呈示部１１７より表示された図１９に示した表示画面から任意のＵＲＬ、キーワードを選択する（ステップＳ６１）。以下、ＵＲＬとキーワードをともに選択する場合を例にとり説明するが、いずれか一方のみを選択する場合も同様である。選択方法としては、ポインティングデバイスでＵＲＬ、キーワードを表す線分をそれぞれクリックして選択する方法と、図１９に示すように、縦の点線で示す時間を指定する線分２０６をポインティングデバイスによって左右に移動させ、所望のＵＲＬ、キーワードの線分に重ねることで選択する方法の２種類の選択方法がある。なお後者では線分２０６を２本にして、両方の線分で囲んだ矩形領域でＵＲＬ、キーワードを選択することも可能である。
【０１３０】
次に、検索ボタン２０７を押下する（ステップＳ６２）。続いて、選択されたＵＲＬ、キーワードが検索部１０に伝えられると（ステップＳ６３）、ＵＲＬとキーワードとで行われたフィルタリング処理によって、既に登録されているＷＷＷページのインデックス情報が、フィルタリング種別、ＵＲＬ、ヒットしたキーワードの各フィールドに対するフィールド検索によって検索される（ステップＳ６４）。
【０１３１】
続いて、検索結果が検索結果呈示部１４に伝えられ、図１７、図１８に示したような、検索結果を検索式での検索結果と同様に呈示する（ステップＳ６５）。
【０１３２】
以下、第１の実施形態の検索と同様に、検索結果からＷＷＷページを選択し（ステップＳ６６）、対応するインデックス情報からファイル名とパス名を取得する（ステップＳ６７）。続いてファイル名とパス名をＷＷＷブラウザに１０３通知し（ステップＳ６８）、ＷＷＷブラウザ１０３がファイル記憶部８から対応するＷＷＷページを読み込んで呈示する（ステップＳ６９）。
【０１３３】
次に、プロファイル情報呈示部１１７における、インデックス情報、ＷＷＷページのコピーファイル、プロファイル情報を削除する処理について説明する。
【０１３４】
プロファイル情報呈示部１１７は、先に説明したプロファイル情報からのＷＷＷページの検索機能に加え、検索されたインデックス情報とＷＷＷページのコピーファイルとをそれぞれインデックスデータベース９とファイル記憶部８から、さらにプロファイル情報をプロファイル登録テーブル１０２からそれぞれ削除する機能を有する。この機能により、既に不要となった過去に設定したプロファイル情報とこれに対応したＷＷＷページのコピーファイルとインデックス情報とを効果的に削除することができる。以下、図２１に示すフローチャートを参照して削除処理について説明する。
【０１３５】
まず、ユーザは、プロファイル情報呈示部１１７に表示された任意のＵＲＬ、キーワードを選択する（ステップＳ７１）。以下、ＵＲＬとキーワードをともに選択する場合を例にとり説明すが、いずれか一方のみを選択する場合も同様である。選択方法としては、ポインティングデバイスでＵＲＬ、キーワードを表す線分をクリックして選択する方法と、図１９に示すように縦の点線で示す時間を指定する線分２０６をポインティングデバイスによって左右に移動させ、所望のＵＲＬ、キーワードを表す線分に重ねることで選択する方法の２種類の選択方法がある。なお、後者においては、指定時間より以前に有効であった、即ち時間を示指定する線分２０６よりも左側の領域にのみ存在するＵＲＬ、キーワードを表す線分を選択することや、あるいは、時間を指定する線分２０６を２本にして、両方の線分で囲んだ矩形領域でＵＲＬ、キーワードを選択することもできる。
【０１３６】
次に、削除ボタン２０８を押下する（ステップＳ７２）。このとき、選択されたＵＲＬ、キーワードと同じものが、選択されたもの以外に存在しないかをチェックする（ステップＳ７３）。存在する場合、これらのＵＲＬ、キーワードは選択から外される（ステップＳ７４）。そして、選択されたＵＲＬ、キーワードが残っているかを調べる（ステップＳ７５）。残っている場合には、ステップＳ７６に進み、残っていない場合には削除を行わずに処理を終了する。
【０１３７】
続いて、ＵＲＬとキーワードが検索部１０に伝えられ（ステップＳ７６）、ＵＲＬとキーワードで行われたフィルタリング処理によって、登録されたＷＷＷページのインデックス情報が、フィルタリング種別、ＵＲＬ、ヒットしたキーワードの各フィールドに対するフィールド検索によって検索される（ステップＳ７７）。
【０１３８】
次に、プロファイル情報呈示部１１７では、この検索された全てのインデックス情報のそれぞれについて、フィルタリング種別、ＵＲＬ、ヒットしたキーワードの各フィールド値から、そのインデックス情報が選択した以外のＵＲＬ、キーワードでフィルタリングされているか調べる（ステップＳ７８）。他のＵＲＬ、キーワードでフィルタリングされている場合、そのインデックス情報を削除対象から外す（ステップＳ７９）。以上の処理の結果、削除対処となるインデックス情報が残っている場合は（ステップＳ８０）、ステップＳ８１へ進み、残っていない場合にはステップＳ８４へ進む。
【０１３９】
プロファイル情報呈示部１１７は、削除対象のインデックス情報からファイル名とパス名を取得する（ステップＳ８１）。プロファイル情報呈示部１１７では、ファイル記憶部８から対応するＷＷＷページのコピーファイルを削除し（ステップＳ８２）、続いて、インデックスデータベース９から対応するインデックス情報を削除し（ステップＳ８３）、最後にプロファイル登録テーブル１０２から削除されたＷＷＷページのファイルのプロファイル情報であるＵＲＬ、キーワードを削除する（ステップＳ８４）。
【０１４０】
なお、上記実施形態のみ限定されず、要旨を変更しない範囲で、例えば、第１の実施形態と第２の実施形態とを組み合わせる等して、適宜変形して実施できる。
【０１４１】
以上説明したように、上記実施形態によれば、閲覧部１での動作を検知して、予め設定した動作時に自動的にインデックス情報を作成することで、ユーザ自身が実際に作成、編集、あるいは閲覧した文書に対するインデックス情報を自動的に作成し、このインデックス情報を用いて後に簡単に文書を検索して呼び出すことが可能になる。
【０１４２】
また、予め設定したプロファイルを基に閲覧した文書（例えば、ＷＷＷページ）をフィルタリングし、閲覧した文書ファイルをコピーすることで、例えば、ＷＷＷブラウザを通して得られる膨大な文書のうち、予め設定したプロファイル情報にマッチするものだけを閲覧時に自動的にファイル記憶部８上にコピーし、さらにこのコピーしたデータに対するインデックス情報を自動的に作成し、後に、このインデックス情報を用いて簡単に文書を検索して呼び出すことが可能になる。
【０１４３】
【発明の効果】
以上説明したように、本発明によれば、ユーザ自身が実際に作成、編集、あるいは閲覧した文書を後に容易に検索可能にして、文書管理が容易に行える。
【図面の簡単な説明】
【図１】本発明の第１の実施形態に係る文書管理装置の構成例を示した図。
【図２】インデックス情報の構造を示した図。
【図３】ファイルの種類を示す値の具体例を示した図。
【図４】メディアの種類を示す値の具体例を示した図。
【図５】インデックス情報作成処理動作を説明するためのフローチャート。
【図６】インデックス情報のインデックスデータベースへの登録処理動作を説明するためのフローチャート。
【図７】検索処理動作を説明するためのフローチャート。
【図８】検索結果呈示部の検索結果取得処理動作を説明するためのフローチャート。
【図９】検索結果の呈示例を示した図。
【図１０】検索結果の他の呈示例を示した図。
【図１１】本発明の第２の実施形態にかかる文書管理装置の構成例を示した図。
【図１２】プロファイル情報の構造を示した図。
【図１３】プロファイル登録テーブルの構造を示した図。
【図１４】インデックス情報の構造を示した図。
【図１５】フィルタリング種別を表す値の具体例を示した図。
【図１６】インデックス情報の作成処理動作を説明するためのフローチャート。
【図１７】検索結果の呈示例を示した図。
【図１８】検索結果の他の呈示例を示した図。
【図１９】プロファイル情報呈示部の処理動作を説明するためのもので、プロファイル登録テーブルの内容を時間軸に従ってグラフィック表示したもので、プロファイルとして設定された全てのＵＲＬとキーワードを、それぞれの有効期間を表した線分で表示する様子を示した図。
【図２０】図１９に示した表示内容から文書の検索を行う場合の処理動作を説明するためのフローチャート。
【図２１】図１９に示した表示内容から文書、インデックス情報の削除を行う場合の処理動作を説明するためのフローチャート。
【符号の説明】
１…閲覧部
２…登録動作設定部
３…登録動作定義テーブル
４…登録制御部
５…キーワード抽出部
６…属性取得部
７…インデックス情報作成部
８…ファイル記憶部
９…インデックスデータベース
１０…検索部
１１…表形式呈示部
１２…ツリー形式呈示部
１３…連携制御部
１４…検索結果呈示部
１５…ユーザインタフェース部
１０１…プロファイル設定部
１０２…プロファイル登録テーブル
１０３…ＷＷＷブラウザ
１０６…フィルタリング部
１０７…ＷＷＷページ取得・保存部
１１７…プロファイル情報呈示部
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a document management apparatus that stores and manages, for example, WWW pages viewed in a WWW browser and documents created with word processors, editors, spreadsheet software, and other applications.
[0002]
[Prior art]
With the spread of PCs and word processors in recent years, the digitization of office documents has advanced. In addition, the growth of the Internet and intranets has sharply increased the use of e-mail, accelerating this digitization further. As a result, many offices are overflowing with large volumes of electronic data, and organizing and managing that data efficiently is becoming increasingly important.
[0003]
For this reason, various document management apparatuses for efficiently managing these digitized documents have been developed. One apparatus is distinguished by its registration method: a file can be registered in the database simply by dragging and dropping it onto an icon on the desktop. Another is distinguished by its search method: documents can be searched for in everyday natural language, for example “the material distributed at yesterday's meeting”. Another, specialized for e-mail, automatically imports received messages and sorts them into per-sender folders, and the messages can later be found by full-text search or similar means. Recently, apparatuses that combine these features have also appeared.
[0004]
As described above, various document management apparatuses have been developed. Their registration functions, however, either require the user to perform the registration operation directly or can register automatically only specific kinds of documents, such as e-mail.
[0005]
In addition, with the development of the Internet and the WWW (World Wide Web), the proportion of information obtained through a WWW browser is increasing, and managing WWW pages is becoming more important. Software has been developed that automatically crawls web sites according to URLs registered in advance and collects copies of WWW pages. Such software, however, does not register the pages that the user has actually viewed.
[0006]
There is also an apparatus with a function for registering the cache of WWW pages that a WWW browser leaves on disk. With this function, only the pages the user has viewed can be registered in the database, but registration becomes impossible once the cache is cleared. Moreover, because the number of cached pages grows enormous in a short time, appropriate filtering is needed before registration, yet no apparatus with such a filtering function has appeared so far.
[0007]
[Problems to be solved by the invention]
As described above, the registration function of conventional document management apparatuses does not provide automatic registration covering everything the user has created, edited, or viewed, that is, everything the user has directly operated on. Either the user must perform the registration operation directly, or only specific documents, such as the e-mail mentioned above, are covered.
[0008]
Furthermore, for the vast amount of information obtained through a WWW browser, no apparatus has had a function that automatically registers those pages, among the ones the user actually viewed, that have been narrowed down by filtering.
[0009]
In view of the above problems, an object of the present invention is to provide a document management method, and a document management apparatus using it, that make document management easy by automatically creating index information that allows documents the user actually created, edited, or viewed to be searched for later.
[0010]
Another object of the present invention is to provide a document management method, and a document management apparatus using it, that make document management easy by automatically copying to the local machine, at browsing time, only those documents that hit preset profile information from among the vast number of documents obtained through a WWW browser, and by automatically creating index information for the copied data.
[0011]
[Means for Solving the Problems]
The document management method of the present invention sets the timing at which index information for searching stored documents is created, extracts attribute information about a document at the set timing, creates index information from it, and stores the index information in association with the document. Index information for documents the user actually created, edited, or viewed is thereby created automatically, and the documents can later be searched for easily using this index information.
[0012]
Further, the document management method of the present invention selects, from among the browsed documents, those for which index information for searching is to be created, based on attribute information extracted from each browsed document and on profile information that defines the attributes of documents to be selected. It then creates the index information from the attribute information extracted from each selected document and stores the selected document and the index information in association with each other. As a result, from among the vast number of documents (WWW pages) obtained through a WWW browser, only those that hit the preset profile information are stored automatically at browsing time, index information for the stored data is created automatically, and the stored WWW pages can later be searched for easily using this index information.
[0013]
The document management apparatus of the present invention comprises timing setting means for setting the timing at which index information for searching stored documents is created, creation means for extracting attribute information about a document at the set timing and creating index information, and storage means for storing the index information in association with the document. Index information for documents the user actually created, edited, or viewed is thereby created automatically, and the documents can later be searched for easily using this index information.
[0014]
Preferably, a search means for searching for a document based on the index information and a presentation means for presenting a search result by the search means together with at least the index information are provided.
[0015]
Further, the document management apparatus of the present invention comprises extraction means for extracting attribute information from browsed documents; selection means for selecting, from among the browsed documents, those for which index information for searching is to be created, based on the extracted attribute information and on profile information that defines the attributes of documents to be selected; creation means for creating the index information from the attribute information extracted from each selected document; and storage means for storing the selected document and the index information in association with each other. As a result, from among the vast number of documents (WWW pages) obtained through a WWW browser, only those that hit the preset profile information are stored automatically at browsing time, index information for the stored data is created automatically, and the stored WWW pages can later be searched for easily using this index information. Effective index information can also be generated for information that is updated daily, such as WWW pages.
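The selection step can be illustrated with a minimal Python sketch. It follows the checks the description gives for the second embodiment (paragraphs [0111] and [0114] to [0118]): if the profile lists URLs, the page URL must equal one of them, and if the profile lists keywords, at least one extracted keyword must match. The names `Profile`, `FilterType`, and `select_page` are hypothetical, and the enum values stand in for the numeric filtering-type codes of FIG. 15, which are not reproduced in the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set, Tuple


class FilterType(Enum):
    """Stand-in for the filtering-type codes (actual values are in FIG. 15)."""
    NONE = "none"
    URL = "url"
    KEYWORD = "keyword"
    BOTH = "both"


@dataclass
class Profile:
    """Profile information: URLs and keywords that a page may hit."""
    urls: Set[str] = field(default_factory=set)
    keywords: Set[str] = field(default_factory=set)


def select_page(
    profile: Profile, page_url: str, page_keywords: List[str]
) -> Optional[Tuple[FilterType, List[str]]]:
    """Return (filter type, hit keywords) if the page should be stored, else None."""
    # URL check applies only when the profile contains URLs.
    if profile.urls and page_url not in profile.urls:
        return None
    # Keyword check applies only when the profile contains keywords.
    hits = [k for k in page_keywords if k in profile.keywords]
    if profile.keywords and not hits:
        return None
    if profile.urls and profile.keywords:
        kind = FilterType.BOTH
    elif profile.urls:
        kind = FilterType.URL
    elif profile.keywords:
        kind = FilterType.KEYWORD
    else:
        kind = FilterType.NONE
    return kind, hits
```

A page that passes would then be copied to local storage and indexed, with the returned filter type and hit keywords recorded in its index information.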
[0016]
Preferably, a search means for searching for a document based on the index information and a presentation means for presenting a search result by the search means together with at least the index information are provided. For example, presenting the results in both a tree format and a table format, with the two displays linked to each other, makes the apparatus easier to use, for instance when the user looks for a desired document among the search results.
[0017]
Preferably, the apparatus further comprises search means for searching for documents having attribute information designated based on the profile information. It also further comprises deletion means for deleting, from among the documents and index information stored by the storage means, the documents having attribute information designated based on the profile information together with their index information. This allows documents to be searched for and deleted more rationally, improving convenience in document management.
[0018]
Each of the above means can be recorded, as a program to be executed by a computer, on a storage medium such as a floppy disk or CD-ROM and distributed. For example, the user interface unit 15, the file storage unit 8, and the index database 9 in FIGS. 1 and 11 are configured using the hardware resources of the computer, and the other components can be implemented as programs executed by the computer.
[0019]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0020]
(First embodiment)
FIG. 1 shows an example of the configuration of a document management apparatus according to the first embodiment of the present invention.
[0021]
The browsing unit 1 is used to view, create, or edit digitized documents, such as text documents and HTML documents, stored in the file storage unit 8. The browsing unit 1 may be one built specifically for this document management apparatus or an existing application such as an editor or word processor.
[0022]
The registration operation setting unit 2 sets which operation of the browsing unit 1 triggers the start of index information creation for a document created, edited, or viewed in the browsing unit 1.
[0023]
The registration operation definition table 3 records and stores the settings made in the registration operation setting unit 2, and the registration control unit 4 performs registration control with reference to this table. For example, in the case of MS-Windows, the registry or an INI file is used as the registration operation definition table 3. On other systems such as UNIX and MacOS, an equivalent mechanism is used.
[0024]
When the registration control unit 4 detects an operation of the browsing unit 1 that matches the setting in the registration operation definition table 3, it controls the keyword extraction unit 5, the attribute acquisition unit 6, and the index information creation unit 7 to create and register index information.
[0025]
The keyword extraction unit 5 extracts keywords from a document that has been browsed, created, or edited with the browsing unit 1. Any keyword extraction method may be used. For example, the text part of the document may be morphologically analyzed to extract words such as nouns, the appearance frequency of each noun obtained, and several nouns with high appearance frequency taken as keywords. Alternatively, using a dictionary in which words that can serve as keywords are registered in advance for each field, the document may be matched against the dictionary and the matched words used as the keywords of the document.
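The two extraction approaches described for the keyword extraction unit 5 can be sketched as follows. This is only an illustration: real morphological analysis is replaced here by a simple regular-expression tokenizer and a hypothetical stop-word list, and the function names and the `max_keywords` parameter are assumptions, not part of the described apparatus.

```python
import re
from collections import Counter

# Hypothetical stop-word list standing in for part-of-speech (noun) filtering.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens (stand-in for morphological analysis)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords_by_frequency(text, max_keywords=5):
    """Return the most frequent tokens that are not stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return [word for word, _count in Counter(tokens).most_common(max_keywords)]


def extract_keywords_by_dictionary(text, dictionary):
    """Return dictionary words found in the text, in order of first appearance."""
    found = []
    for token in tokenize(text):
        if token in dictionary and token not in found:
            found.append(token)
    return found
```

The frequency-based variant needs no prior knowledge of the field, while the dictionary-based variant restricts keywords to a vocabulary prepared in advance.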
[0026]
The attribute acquisition unit 6 acquires attributes other than keywords of the document being browsed, created, and edited by the browsing unit 1.
[0027]
The index information creation unit 7 generates index information for a document viewed or created and edited by the browsing unit 1 based on the keywords and various attribute values obtained by the keyword extraction unit 5 and the attribute acquisition unit 6.
[0028]
The file storage unit 8 stores various files of a document, and includes a hard disk and various removable media.
[0029]
The index database 9 stores and manages the index information created by the index information creation unit 7, and the search unit 10 performs a search using this index information.
[0030]
The search unit 10 searches the index database 9 based on a search request from the user, and informs the search result presentation unit 14 of the search result.
[0031]
The search result presenting unit 14 is composed of a table format presenting unit 11, a tree format presenting unit 12, and a cooperation control unit 13, and presents the search results in the search unit 10 to the user.
[0032]
The table format presentation unit 11 presents the search results as a table with one row per result, so that the results can easily be listed by similarity or by keyword.
[0033]
The tree form presenting unit 12 presents the directory structure in a tree shape in order to easily show where the retrieved document file is stored in the file storage unit 8.
[0034]
The cooperation control unit 13 performs control for linking both displays in accordance with a user's document selection operation performed on the table format presentation unit 11 and the tree format presentation unit 12.
[0035]
The user interface unit 15 includes an output device such as a display device and various input devices such as a keyboard and a mouse. It is used to input various user instructions and search expressions, and to display documents and document search results.
[0036]
Next, the registration timing in the document management apparatus of FIG. 1 and its setting will be described. Here, registration refers to creating index information for a document file already stored in the file storage unit 8 and registering the index information in the index database 9; this is sometimes called “document registration”.
[0037]
The operations of the browsing unit 1 that can be set as the registration timing are, for example:
(1) Termination of the browsing unit 1 or closing of the browsing window
(2) File saving by user operation
The user only has to select one or both of these through the registration operation setting unit 2. If only (1) is set, all documents that have been viewed or edited are registered. If only (2) is set, only edited documents are registered. When the browsing unit 1 has an automatic saving function, registration at the time of automatic saving can be added as an option of (2). If both (1) and (2) are set, all viewed and edited documents are registered, as when only (1) is set, but the registration timing during editing differs from the case where only (1) is set: registration is basically performed at the time of saving, and if the user quits without further editing after saving, registration at termination is not performed.
[0038]
The setting contents are written in the registration operation definition table 3 and referred to by the registration control unit 4.
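As a rough illustration of keeping these settings in an INI file, the sketch below writes and reads the two timing options with Python's standard `configparser` module. The section name `RegistrationTiming` and the keys `on_close` and `on_save` are hypothetical names chosen for this example; the text does not specify the file layout.

```python
import configparser
import io


def write_registration_settings(on_close, on_save):
    """Serialize the two registration timing options to INI text."""
    config = configparser.ConfigParser()
    config["RegistrationTiming"] = {
        "on_close": str(on_close),  # option (1): end of browsing or window close
        "on_save": str(on_save),    # option (2): file save by user operation
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def read_registration_settings(ini_text):
    """Parse INI text back into a dict of booleans."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    section = config["RegistrationTiming"]
    return {
        "on_close": section.getboolean("on_close"),
        "on_save": section.getboolean("on_save"),
    }
```

A registration controller would read these values once at start-up and decide which browsing operations to monitor.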
[0039]
The above is the registration timing setting procedure.
[0040]
FIG. 2 shows the structure of the index information used in the document management apparatus. The index information consists of attribute information about the document file (for example, ID number, file name, path name, file type, media type, creation date and time, title, and author) and keywords extracted from the document; one piece of index information is created for each document.
[0041]
The ID number is integer data assigned in the order in which index information is created (“1”, “2”, “3”, ...) and is unique among all index information stored in the index database 9.
[0042]
The file name is character string data representing the file name of the document file on the file storage unit 8.
[0043]
Similarly, the path name is character string data representing the path name of the document file on the file storage unit 8.
[0044]
The method of expressing the file name and path name depends on the OS on which the document management apparatus operates. For example, in the case of MS-Windows, they are character string data such as “Document.txt” and “c:\home\My Document\document.txt”.
[0045]
As shown in FIG. 3, the file type is predefined numerical data representing the type of the file. The file type depends, for example, on the OS on which the document management apparatus operates, the applications used, and the kinds of files targeted. For example, when files of the MS-Office applications, rich text format (RTF) files, and standard text files are targeted on MS-Windows, the definition values shown in FIG. 3 are used.
[0046]
As shown in FIG. 4, the media type is predefined numerical data representing the type of file storage location, and depends on, for example, the OS on which the document management apparatus operates. For example, in the case of MS-Windows, a definition value as shown in FIG. 4 is used.
[0047]
The creation date / time is date-type data representing the date / time when the document file was created or the last update date / time.
[0048]
The title is character string data representing the title of the document file.
[0049]
The author is character string data representing the author who created the document file.
[0050]
The keyword is list-type data whose value is the keyword extracted from the text portion of the document by the keyword extraction unit 5.
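The index information fields described above map naturally onto a record type. The following is a minimal sketch using a Python dataclass; the field names are illustrative, and the numeric values used for the file type and media type in any instance are placeholders, since the actual definition values are those of FIG. 3 and FIG. 4.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class IndexInfo:
    """One piece of index information, created per document."""
    id_number: int          # unique integer across the index database
    file_name: str          # e.g. "Document.txt"
    path_name: str          # full path of the document file
    file_type: int          # predefined numeric code (values per FIG. 3)
    media_type: int         # predefined numeric code (values per FIG. 4)
    created_at: datetime    # creation or last update date and time
    title: str
    author: str
    keywords: List[str] = field(default_factory=list)  # empty list if none extracted
```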
[0051]
Next, processing operations for creating and registering index information by the registration control unit 4 will be described with reference to the flowchart shown in FIG.
[0052]
The registration control unit 4 monitors the browsing unit 1 for execution of the operation set in the registration operation definition table 3, that is, a file save or the termination of the browsing unit 1 (step S1). The operation is monitored by a method commonly used by applications resident in the system. For example, in the case of MS-Windows, the Windows message generated when the operation is performed can be monitored by hooking that message. Alternatively, when an existing application is used as the browsing unit 1, the set operation can be detected by a function of the application itself, and the corresponding processing executed. For example, in MS-Word and MS-Excel, which are MS-Windows applications, the behavior of the application can be customized by rewriting the application's own commands in the macro language VBA (Visual Basic for Applications), so the series of registration processing at the time of saving can be executed by modifying the save command. It is also possible to create macros that run automatically when the operation occurs. For example, the processing for when a file is closed can be written in the Auto_Close() macro and the processing for when Word itself is terminated in the Auto_Exit() macro, so that the registration processing is executed at those times.
[0053]
When a preset save or end operation is detected (step S2), the registration control unit 4 activates the keyword extraction unit 5, and the keyword extraction unit 5 extracts keywords from the document that was browsed, edited, or created (step S3).
[0054]
Subsequently, the attribute acquisition unit 6 is activated, and each attribute value of the file name, path name, file type, media type, file creation date, title, and author is acquired from the browsing unit 1 (step S4).
[0055]
Subsequently, the index information creating unit 7 is activated, and the index information shown in FIG. 2 is created based on the keyword extracted by the keyword extracting unit 5 and each attribute value acquired by the attribute acquiring unit 6 (step S5).
[0056]
Subsequently, the index information creation unit 7 registers the created index information in the index database 9 (step S6).
[0057]
The method for acquiring each attribute value in the attribute acquisition unit 6, the method for creating index information in the index information creation unit 7, and the method by which the index information creation unit 7 registers index information in the index database 9 will be described later.
[0058]
Subsequently, it is checked whether or not the browsing unit 1 continues to operate (step S7). If the browsing unit 1 is still in operation because the registration was triggered by a save or by the closing of a window, the process returns to the monitoring state (step S1) and the above operations are repeated.
[0059]
As described above, the registration control unit 4 creates and registers index information.
[0060]
Next, a method for acquiring each attribute value in the attribute acquisition unit 6 will be described.
[0061]
The file name and path name are obtained directly from the browsing unit 1. When an existing application is used in the browsing unit 1, attributes are acquired from the OS and application functions. For example, in the case of an application on MS-Windows, the attribute can be acquired by OLE automation or the like.
[0062]
The file type is obtained from the file name. For example, in the case of MS-Windows, the file type can be acquired from the extension portion of the file name.
[0063]
The media type is obtained by extracting the drive name from the path name and examining the drive type from the OS function.
[0064]
The file creation date and time is acquired using an OS function. For example, in the case of MS-Windows, this is implemented using Windows API functions.
[0065]
If the document file has a title as an attribute of the file itself, such as an MS-Word document file, the title is used as it is. If the document file does not have a title as an attribute, or if it is a document file whose attribute acquisition method is unknown, for example, the first sentence of the text portion of the document file is used as the title. When this one sentence exceeds the size of the character string field, up to that size is used as a title.
[0066]
For the author, when the document file has author information as its own attribute, that value is used as is. If the document file does not have author information as an attribute, or if its attribute acquisition method is unknown, and the OS on which the document management apparatus operates has an API for retrieving the author of a document file, the author is acquired with this API. When there is no such API, or when acquisition of the value fails, an empty character string is set as the value.
[0067]
When an existing application is applied to the browsing unit 1, if an attribute value as described above can be acquired as a function of the application, the attribute value is acquired by this function. For example, in the case of an MS-Windows application, it is possible to acquire many attribute values from the application using the OLE automation function.
[0068]
As described above, the attribute acquisition unit 6 acquires each attribute value.
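Some of the attribute acquisition rules above can be sketched with standard library calls. The extension-to-code mapping `FILE_TYPES` and the field size `TITLE_FIELD_SIZE` are hypothetical values chosen for illustration (the real file type codes are those of FIG. 3), and the sentence splitter is a simple approximation.

```python
import os
import re
from datetime import datetime

TITLE_FIELD_SIZE = 64  # hypothetical size of the title string field

# Hypothetical extension-to-code mapping; 0 means "unknown type".
FILE_TYPES = {".txt": 1, ".rtf": 2, ".doc": 3}


def file_type_from_name(file_name):
    """Derive the file type code from the extension of the file name."""
    extension = os.path.splitext(file_name)[1].lower()
    return FILE_TYPES.get(extension, 0)


def last_modified(path):
    """Return the last update date and time reported by the OS."""
    return datetime.fromtimestamp(os.stat(path).st_mtime)


def fallback_title(text, max_len=TITLE_FIELD_SIZE):
    """Use the first sentence of the text as the title, cut to the field size."""
    first_sentence = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]
    return first_sentence[:max_len]
```

`fallback_title` corresponds to the case where the document has no title attribute; when the document does carry a title or author attribute, that value would be used directly instead.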
[0069]
Next, a method for creating index information in the index information creating unit 7 will be described.
[0070]
First, a new index is created and its ID number is set to the value determined by
(ID number) = (ID number of the index created immediately before) + 1
The ID number need not be determined by this method as long as a unique value is set for each index. For example, ID numbers that have become vacant through deletion of unnecessary indexes whose target data has been lost may be reassigned preferentially, smallest number first.
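Both ID numbering strategies mentioned here are simple to express. The sketch below assumes IDs start at 1, as in the “1”, “2”, “3” example; the function names are illustrative.

```python
def next_id_sequential(last_id):
    """ID of the previously created index plus one."""
    return last_id + 1


def next_id_reusing_gaps(used_ids):
    """Smallest positive ID not currently in use, so deleted IDs are reused first."""
    used = set(used_ids)
    candidate = 1
    while candidate in used:
        candidate += 1
    return candidate
```

With IDs 1, 2, and 4 in use, the gap-reusing variant returns 3, whereas the sequential variant continues from the last created ID.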
[0071]
Subsequently, the keywords extracted by the keyword extraction unit 5 are set. If no keyword was extracted, this field is an empty list. When the number of keywords exceeds the predetermined maximum number of elements of the list type, the list is formed from the keywords in the order in which they were detected, up to the maximum number of elements.
[0072]
Finally, the attribute value of the file name, path name, file type, media type, file creation date / time, title, and author acquired by the attribute acquisition unit 6 is set.
[0073]
As described above, the index information creating unit 7 creates index information.
[0074]
Next, with reference to FIG. 6, an operation in which the index information creating unit 7 registers index information in the index database 9 will be described.
[0075]
To check whether index information for the same document as the newly created index information already exists in the index database 9, the index information creation unit 7 checks whether index information having the same path name as that in the created index information exists in the index database 9 (step S11).
[0076]
If there is no index information with the same path name, the created index information is newly registered in the index database 9 (step S12).
[0077]
When index information with the same path name already exists, the attribute values of the registered index information and the created index information are compared. If one or more attribute values differ (step S13), the existing index information is updated with the index information created this time (step S14).
[0078]
If no different attribute value exists, the index information created this time is discarded because there is no change (step S15), and the process ends without registration.
[0079]
As described above, the index information creating unit 7 registers the index information in the index database 9.
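The registration procedure of steps S11 to S15 amounts to an insert-or-update keyed on the path name. The sketch below models the index database as a plain dictionary and index information as a dictionary of attributes; both are simplifications. Excluding the ID number from the comparison and keeping the existing ID on update are assumptions of this example, not something the text specifies.

```python
def register_index(database, new_info):
    """Register new_info in database (a dict keyed by path name).

    Returns "registered", "updated", or "discarded".
    """
    path = new_info["path_name"]
    existing = database.get(path)

    # Steps S11/S12: no entry with the same path name, so register as new.
    if existing is None:
        database[path] = dict(new_info)
        return "registered"

    # Step S13: compare attribute values (the ID number is not compared here).
    changed = any(
        existing.get(key) != value
        for key, value in new_info.items()
        if key != "id_number"
    )

    # Step S14: update the existing entry, keeping its ID (an assumption).
    if changed:
        database[path] = dict(new_info, id_number=existing["id_number"])
        return "updated"

    # Step S15: nothing changed, so the new index information is discarded.
    return "discarded"
```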
[0080]
Next, a document search processing operation in the search unit 10 will be described with reference to FIG.
[0081]
A search expression for the index database 9 is input to the search unit 10 via the user interface unit 15 (step S21). The form of this search expression depends on the index database 9. For example, when the index database 9 is an SQL (Structured Query Language) based database, a search expression written as an SQL statement is used.
[0082]
Subsequently, the search unit 10 searches the index database 9 for index information using this search formula (step S22), receives the search result, and sends it to the search result presenting unit 14 (step S23). The search result presentation unit 14 presents the searched index information to the user interface unit 15.
[0083]
If the desired document is not among the search results presented by the search result presentation unit 14, the user can discard the current search results and start the search again from step S21. If there are so many search results that the desired document cannot be found, an additional search within the current search results can be performed (step S24). For an additional search, the current search results, that is, the ID numbers of the index information of all retrieved documents, are stored in memory (step S25), and when index information is retrieved from the index database 9 in the additional search, only entries whose ID numbers match those in memory are taken. In addition, by saving the results of each search, it is possible to return to earlier search results or to perform several additional searches in parallel.
[0084]
As described above, the document search is executed in the search unit 10.
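The additional search of steps S24 and S25 can be illustrated as a search restricted to a saved set of ID numbers. In this sketch the index database is a list of dictionaries and the search expression is a Python predicate; both are stand-ins for whatever database and query language are actually used.

```python
def search(index_db, predicate, restrict_to_ids=None):
    """Return ID numbers of entries matching predicate.

    If restrict_to_ids is given (the saved result of an earlier search),
    only entries whose ID is in that set are returned.
    """
    return [
        entry["id_number"]
        for entry in index_db
        if predicate(entry)
        and (restrict_to_ids is None or entry["id_number"] in restrict_to_ids)
    ]


def additional_search(index_db, history, predicate):
    """Narrow the most recent result in history and append the new result."""
    narrowed = search(index_db, predicate, restrict_to_ids=set(history[-1]))
    history.append(narrowed)
    return narrowed
```

Keeping every result in `history` is what allows returning to an earlier result or branching several additional searches from the same starting point.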
[0085]
Next, with reference to FIG. 8, the search result acquisition operation in the search result presentation unit 14 will be described.
[0086]
First, the user selects a desired document from the index information presented on the user interface unit 15 (step S31). Next, the file name and path name are extracted from the index information of the selected document (step S32), and the file name and path name are transmitted to the browsing unit 1 (step S33). The browsing unit 1 reads the corresponding document file from the file storage unit 8 from the file name and path name and presents it (step S34).
[0087]
As described above, the retrieval result presenting unit 14 obtains the retrieval result.
[0088]
Next, a method for presenting search results in the search result presentation unit 14 will be described. As shown in FIG. 9, the search result presentation unit 14 presents the search results on the display screen of the display device in a tabular presentation window and a tree presentation window. In the tabular presentation window, the contents of the index information of each document are presented as a table in descending order of search score. The tree presentation window presents a tree built from the path information included in the retrieved index information.
[0089]
These two presentation windows operate in cooperation with each other by the cooperation control unit 13. For example, when an arbitrary document presented in the tabular presentation window is selected by clicking with the mouse or the like, the portion of the document on the tree presentation window is selected and can be recognized at a glance by highlighting or the like. Similarly, when an arbitrary document presented in the tree form presentation window is selected, the document on the table form presentation window is selected.
[0090]
For this purpose, the cooperation control unit 13 transmits the ID number of the index information for the document selected in one presentation window to the other presentation window.
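The linkage between the two windows can be pictured as a small mediator that forwards the selected ID number. The class and method names below are hypothetical and the windows are reduced to objects that only remember which ID is highlighted.

```python
class PresentationWindow:
    """Minimal stand-in for a tabular or tree presentation window."""

    def __init__(self, name):
        self.name = name
        self.selected_id = None

    def highlight(self, id_number):
        # A real window would scroll to and highlight the matching item.
        self.selected_id = id_number


class CooperationController:
    """Forwards a selection made in one window to the other window."""

    def __init__(self, table_window, tree_window):
        self._windows = (table_window, tree_window)

    def on_select(self, source, id_number):
        for window in self._windows:
            # Highlighting the source as well keeps both windows consistent.
            window.highlight(id_number)
```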
[0091]
Further, document files in each folder that were not retrieved by the search may also be presented in the tree form presentation window (see FIG. 10). In this case, the retrieved files are highlighted so that they can be distinguished at a glance from the files that were not retrieved.
[0092]
As described above, the search result presentation unit 14 presents the search result.
[0093]
(Second Embodiment)
The document management apparatus according to the second embodiment of the present invention is an example in which a WWW browser application is used as the browsing unit 1 in FIG. 1. Treating the WWW pages that the user browses in the WWW browser as the documents to be managed, it automatically creates the index information needed to search for those WWW pages later, and searches for a desired WWW page using the index information.
[0094]
In the case of a WWW page, the entity being browsed is usually outside the local machine. In addition, the entity is often updated daily, so index information filed for the entity itself is often useless at a later date. For this reason, in the second embodiment, a function of creating a copy of the WWW page in the file system on the local machine is added, and index information for the copy is created and registered in the index database.
[0095]
In addition, when WWW pages are browsed daily through a WWW browser, a huge number of pages end up being browsed without the user being aware of it. If all browsed documents were registered (that is, index information created for them) as in the first embodiment, the number of registrations would become enormous. Therefore, in the second embodiment, a function is added in which profile information is registered in advance and the browsed WWW pages are filtered based on the profile information. Note that the browsing unit of the document management apparatus according to the second embodiment is not limited to the WWW browser 103 shown in FIG. 11; an existing application such as an editor or a word processor may also be used. The former function, which copies an entity located outside the local machine and creates an index for the copy, can be applied when document files on removable media such as floppy disks and magneto-optical disks are viewed and edited with applications such as editors and word processors. The latter filtering process using profile information can be applied to document files as is.
[0096]
FIG. 11 shows a configuration example of the document management apparatus according to the second embodiment. In FIG. 11, the same parts as those in FIG. 1 are denoted by the same reference numerals, and only the different parts will be described. That is, in FIG. 11, the browsing unit 1 in FIG. 1 is replaced with the WWW browser 103, and the registration operation setting unit 2 and the registration operation definition table 3 in FIG. 1 are replaced with the profile setting unit 101, the profile registration table 102, the filtering unit 106, the WWW page acquisition / save unit 107, and the profile information presentation unit 117.
[0097]
The WWW browser 103 is configured by an existing browser application, but may be a dedicated one.
[0098]
The profile setting unit 101 is for setting the profile information (see FIG. 12) used in the document filtering process performed by the filtering unit 106. As shown in FIG. 12, for example, a plurality of keywords and a plurality of URLs (Uniform Resource Locators) can be set.
[0099]
The profile registration table 102 stores the profile information set in the profile setting unit 101 as shown in FIG. 12 together with the set time in the table format shown in FIG.
[0100]
The filtering unit 106 compares the keywords extracted by the keyword extraction unit 5 and the attribute values acquired by the attribute acquisition unit 6 with the latest profile information set in the profile registration table 102, determines whether or not registration should be performed, and notifies the registration control unit 4 of the result. When the keywords and attribute values extracted from the WWW page being browsed match the profile information, the registration control unit 4 registers the WWW page.
[0101]
The WWW page acquisition / save unit 107 copies each object constituting the page browsed by the WWW browser 103 into a single folder onto the file storage unit 8.
[0102]
The profile information presentation unit 117 is for graphically presenting the contents of the profile registration table 102 in time series, and can also search and delete a WWW page based on the presented profile information (described later).
[0103]
FIG. 14 shows the structure of the index information, which consists of the items ID number, URL, folder name, head file name, title, creation date and time, keyword, filtering type, and hit keyword. One piece of index information is created for each registered page.
[0104]
The ID numbers are numbered in the order of creation of index information in the order of “1”, “2”, “3”... And are unique integer data in all index information in the index database 9.
[0105]
The URL is character string data representing the URL of the WWW page to be registered.
[0106]
The folder name is character string data representing the path name of the folder storing the WWW page copied on the file storage unit 8.
[0107]
The head file name is character string data representing the file name of the file, among the files in the folder, in which the HTML document of the first page is copied.
[0108]
The title is character string data representing the title attached to the WWW page to be registered.
[0109]
The creation date / time is date-type data representing the date / time when the copied file was created or the last update date / time.
[0110]
The keyword is list-type data whose value is the keywords extracted from the text portion of the WWW page by the keyword extraction unit 5. If no keyword was extracted, this field is an empty list. When the number of keywords exceeds the predetermined maximum number of elements of the list type, the list is formed from the keywords in the order in which they were detected, up to the maximum number of elements.
[0111]
The filtering type is predefined numeric data indicating whether the filtering process for the WWW page to be registered was performed with a URL defined in the profile, with a keyword, with both, or not at all. It takes one of the definition values shown in FIG.
The hit keyword is list-type data whose value is the keywords that, when the filtering process was performed on the WWW page to be registered, matched the keywords of the profile information among the keywords extracted from the WWW page. If filtering by keyword was not performed, this field is an empty list.
[0112]
Next, with reference to FIG. 16, processing for creating and registering index information in the registration control unit 4 will be described.
[0113]
It is monitored whether a new WWW page is displayed on the WWW browser 103 (step S41). As this monitoring method, a method that is standardly used by an application resident in the system is used. For example, in the case of MS-Windows, a Windows message generated when a new WWW page is displayed is monitored, and it is detected that a new WWW page is displayed by hooking the message.
[0114]
When the display of a new WWW page is detected (step S42), the profile registration table 102 is referenced to check whether a URL is registered as a profile (step S43).
[0115]
If the URL is not registered in the profile, the process proceeds to step S46. On the other hand, if the URL is registered in the profile, the process proceeds to step S44, where the attribute acquisition unit 6 is activated, and the attribute acquisition unit 6 acquires the URL of the WWW page from the WWW browser 103 (step S44).
[0116]
Next, the filtering unit 106 is activated, and the URL registered as a profile is compared with the URL of the current WWW page to check whether there is the same URL (step S45). If there is the same URL, the following process is executed. If there is no same URL, the registration process is interrupted, and the process returns to step S41 to monitor the WWW browser.
[0117]
Next, referring to the profile registration table 102, it is checked whether or not a keyword is registered as a profile (step S46). If no keyword is registered in the profile, the process proceeds to step S49. On the other hand, if a keyword is registered in the profile, the process proceeds to step S47, where the keyword extraction unit 5 is activated to extract keywords from the text information in the WWW page (step S47). As described above, the keyword extraction is performed by a method commonly used in filing systems and search systems.
[0118]
Subsequently, the filtering unit 106 is activated, and the keyword extracted from the WWW page is compared with the keyword registered as a profile in the profile registration table 102 and collated (step S48). When there is no matching keyword, that is, when the same keyword as the keyword registered as a profile does not exist in the WWW page, the registration of the WWW page is interrupted, the process returns to step S41, and the WWW browser is monitored again.
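The filtering decisions of steps S43 to S48 can be condensed into one function. This is a sketch under stated assumptions: URLs are compared as exact strings (the text only says “the same URL”), the profile is passed in as two collections, and the function name and return shape are illustrative.

```python
def passes_filter(profile_urls, profile_keywords, page_url, page_keywords):
    """Decide whether a browsed page should go on to registration.

    Returns (should_register, hit_keywords).
    """
    # Steps S43-S45: if URLs are registered, the page URL must be one of them.
    if profile_urls and page_url not in profile_urls:
        return False, []

    # Step S46: if no keywords are registered, keyword filtering is skipped.
    if not profile_keywords:
        return True, []

    # Steps S47-S48: at least one extracted keyword must match the profile.
    hits = [keyword for keyword in page_keywords if keyword in profile_keywords]
    return bool(hits), hits
```

The returned `hits` list corresponds to the hit keyword item of the index information, which stays empty when keyword filtering is not performed.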
[0119]
Subsequently, it is checked whether index information having the same URL as the WWW page exists in the index database 9 (step S49). If there is no index information with the same URL, the process proceeds to step S51 and the subsequent registration process is executed. If index information having the same URL exists in the index database 9, the files of the copy of the WWW page in the folder pointed to by that index information are compared with the files constituting the browsed WWW page (step S50). If no file differs, it can be determined that the same WWW page as the one being browsed has already been registered, so the registration process is ended and the process returns to step S41 to monitor the WWW browser again. On the other hand, if even one file differs, the process proceeds to step S51 and the subsequent registration process is executed.
[0120]
In the WWW page registration process, first, the WWW page acquisition / save unit 107 is activated to create a copy of the WWW page in the file storage unit 8 (step S51). The copy is created using a method generally performed by an application having an automatic patrol function for WWW pages.
[0121]
Next, the attribute acquisition unit 6 is activated to acquire the attribute values of the folder name, the head file name, the title, and the creation date and time from the currently viewed WWW page and its copy (step S52). Of these, the title is acquired directly from the WWW browser 103. The folder name and the head file name are recorded when the copy is made and are read back at this point. For the creation date and time, the creation time of the folder holding the copy is acquired by the same method as in the first embodiment; alternatively, the current time may be obtained using the clock function of the OS and used as the creation time. For the URL, the value acquired in step S44 is used.
[0122]
Subsequently, the index information creating unit 7 is activated, and index information is created based on the keyword extracted by the keyword extracting unit 5 and each attribute value acquired by the attribute acquiring unit 6 (step S53). The method for creating index information is the same as that in the first embodiment. The created index information is newly registered in the index database 9 (step S54). Thereafter, the process returns to step S41, the WWW browser 103 is monitored, and the above processing is continued until the browser is terminated.
[0123]
Next, the search processing operation for registered WWW pages will be described. The document management apparatus shown in FIG. 11 has two search functions: a search function similar to that of the first embodiment, performed with a search expression input to the search unit 10, and a search function invoked from the profile information presentation unit 117.
[0124]
First, the search processing operation performed by the search expression input to the search unit 10 will be described. Here, the operation of the search result presentation unit 14 is different from that of the first embodiment. The search result presenting unit 14 presents the retrieved WWW pages in a tabular format presentation window and a tree format presentation window as shown in FIG. In the tabular presentation window, the index information content of each WWW page is presented as a table in descending order of search score. In the tree format presentation window, the WWW pages having the same URL are grouped into one hierarchy and presented as a two-level tree structure as shown in FIG. These two presentation windows operate in cooperation with each other through the cooperation control unit 13, as in the first embodiment.
[0125]
In the tree format presentation window, a tree display as shown in FIG. 18 is also possible, in which the domain portions of the URL are arranged hierarchically in order from the rightmost element, that is, the top-level domain.
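A domain-ordered tree of this kind can be built by reversing the labels of the host name. The sketch below uses Python's standard `urllib.parse.urlsplit`; the nested-dictionary representation and the function names are illustrative assumptions, not part of the described apparatus.

```python
from urllib.parse import urlsplit


def domain_path(url):
    """Host labels from the rightmost element (top-level domain) inward."""
    # urlsplit only recognizes a host when the string contains "//".
    parts = urlsplit(url if "//" in url else "//" + url)
    host = parts.hostname or ""
    return list(reversed(host.split(".")))


def build_domain_tree(urls):
    """Nested dictionaries keyed by domain label, top-level domain first."""
    root = {}
    for url in urls:
        node = root
        for label in domain_path(url):
            node = node.setdefault(label, {})
    return root
```

Pages from hosts that share a parent domain end up under the same branch, so the tree narrows from the top-level domain down to the individual server.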
[0126]
Next, the search processing operation from the profile information presentation unit 117 will be described with reference to FIGS. 19 and 20.
[0127]
FIG. 19 shows an example of a display screen displayed on the display device by the profile information presentation unit 117. The profile information presentation unit 117 graphically displays the contents of the profile registration table 102 along the time axis 201: every URL and keyword set as a profile is displayed in the URL display area 204 or the keyword display area 205 as a line segment representing its valid period (the period during which it was set as profile information). For example, the display in FIG. 19 confirms that the URL “www.aaa.ddd.edu” was valid profile information from mid-November 1999 to the end of January 2000, and that the keyword “network” has been valid since the end of January 2000. The display areas 204 and 205 can be scrolled left and right (along the time axis) by clicking the scroll buttons 202 and 203 attached to the left and right ends of the time axis 201 with a pointing device such as a mouse.
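One way to derive such valid periods is sketched below. It assumes, as a simplification, that the profile registration table is a time-ordered list of (input time, set of URLs and keywords) entries and that each entry stays valid until the next one is input; the function name and data layout are hypothetical.

```python
from datetime import datetime


def valid_periods(table, now):
    """Map each URL/keyword to the [start, end] periods in which it was set.

    table: list of (input_time, items) tuples sorted by input_time.
    now:   end of the period for the most recent entry.
    """
    periods = {}
    for i, (start, items) in enumerate(table):
        end = table[i + 1][0] if i + 1 < len(table) else now
        for item in items:
            spans = periods.setdefault(item, [])
            if spans and spans[-1][1] == start:
                spans[-1][1] = end          # contiguous: extend the segment
            else:
                spans.append([start, end])  # start a new line segment
    return periods
```

Each resulting [start, end] pair corresponds to one line segment that would be drawn along the time axis 201.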
[0128]
Hereinafter, the search processing operation from the profile information presentation unit 117 will be described with reference to the flowchart shown in FIG. 20.
[0129]
First, the user selects an arbitrary URL and keyword from the display screen shown in FIG. 19 displayed by the profile information presentation unit 117 (step S61). Hereinafter, a case where both a URL and a keyword are selected will be described as an example, but the same applies to the case where only one of them is selected. There are two selection methods: clicking the line segment that represents a URL or keyword with a pointing device, or moving the time-designating line segment 206, drawn as a vertical dotted line as shown in FIG. 19, left and right with the pointing device so that it overlaps the line segment of the desired URL or keyword. In the latter case, two line segments 206 may also be provided so that the URLs and keywords within the rectangular area enclosed by them are selected.
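The time-based selection can be sketched as interval tests. The sketch assumes each URL or keyword maps to a list of (start, end) valid periods, and it treats "inside the rectangular area" as any overlap with the interval between the two time lines; that interpretation and the function names are assumptions.

```python
def select_at(periods, t):
    """Items whose line segment is crossed by a single time line at t."""
    return sorted(
        item for item, spans in periods.items()
        if any(start <= t <= end for start, end in spans)
    )


def select_between(periods, t1, t2):
    """Items whose line segment overlaps the area between two time lines."""
    lo, hi = min(t1, t2), max(t1, t2)
    return sorted(
        item for item, spans in periods.items()
        if any(start <= hi and end >= lo for start, end in spans)
    )
```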
[0130]
Next, the search button 207 is pressed (step S62). The selected URL and keyword are then transmitted to the search unit 10 (step S63), and the index information of the already registered WWW pages that were filtered by that URL and keyword is retrieved by a field search over the filtering type, URL, and hit keyword fields (step S64).
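A field search of this kind might look like the following sketch. The field names `filtering_type`, `profile_url`, and `hit_keywords`, and the type values `"url"`, `"keyword"`, and `"both"`, are hypothetical; the actual filtering type values are defined in FIG. 15, which is not reproduced here.

```python
def field_search(index_db, urls=(), keywords=()):
    """Return index records that were registered via the given URLs/keywords."""
    urls, keywords = set(urls), set(keywords)
    hits = []
    for rec in index_db:
        ftype = rec["filtering_type"]
        # A record matches on URL only if it was registered by URL filtering.
        by_url = ftype in ("url", "both") and rec["profile_url"] in urls
        # A record matches on keyword only if a selected keyword hit it.
        by_keyword = ftype in ("keyword", "both") and bool(
            keywords & set(rec["hit_keywords"])
        )
        if by_url or by_keyword:
            hits.append(rec)
    return hits
```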
[0131]
Subsequently, the search result is transmitted to the search result presentation unit 14 and presented as shown in FIGS. 17 and 18, in the same manner as the result of a search by search expression (step S65).
[0132]
Thereafter, similarly to the search of the first embodiment, a WWW page is selected from the search result (step S66), and the file name and path name are acquired from the corresponding index information (step S67). Subsequently, the file name and path name are notified to the WWW browser 103 (step S68), and the WWW browser 103 reads the corresponding WWW page from the file storage unit 8 and presents it (step S69).
[0133]
Next, processing for deleting index information, a WWW page copy file, and profile information in the profile information presentation unit 117 will be described.
[0134]
In addition to the WWW page search function from profile information described above, the profile information presentation unit 117 has a function that deletes the retrieved index information and the copy files of the WWW pages from the index database 9 and the file storage unit 8, respectively, and deletes the selected profile information from the profile registration table 102. With this function, profile information set in the past that is no longer needed can be deleted efficiently together with the copy files and index information of the corresponding WWW pages. The deletion process will be described below with reference to the flowchart shown in FIG. 21.
[0135]
First, the user selects an arbitrary URL and keyword displayed by the profile information presentation unit 117 (step S71). Hereinafter, a case where both a URL and a keyword are selected will be described as an example, but the same applies to the case where only one of them is selected. There are two selection methods: clicking the line segment that represents a URL or keyword with a pointing device, or moving the time-designating line segment 206, drawn as a vertical dotted line as shown in FIG. 19, left and right with the pointing device so that it overlaps the line segment of the desired URL or keyword. In the latter case, it is effective to select only the line segments of URLs and keywords that were valid before the designated time, that is, those lying in the area to the left of the line segment 206. Alternatively, two line segments 206 may be provided so that the URLs and keywords within the rectangular area enclosed by them are selected.
[0136]
Next, the delete button 208 is pressed (step S72). At this time, it is checked whether a URL or keyword identical to a selected one also exists outside the selection (step S73). If so, those URLs and keywords are removed from the selection (step S74). It is then checked whether any selected URL or keyword remains (step S75). If one remains, the process proceeds to step S76; if none remains, the process ends without deleting anything.
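Steps S73 to S75 can be sketched as a set operation. The sketch assumes each entry of the profile display is a (value, start time) pair, so the same URL or keyword may appear more than once with different periods; this representation and the function name are hypothetical.

```python
def prune_selection(selected, table_entries):
    """Drop selected entries whose value also exists outside the selection.

    selected:      iterable of (value, start) pairs chosen by the user.
    table_entries: all (value, start) pairs currently displayed.
    """
    selected = set(selected)
    # Values that still occur in entries the user did not select (S73).
    unselected_values = {
        value for (value, start) in table_entries
        if (value, start) not in selected
    }
    # S74: remove those values from the selection.
    return {entry for entry in selected if entry[0] not in unselected_values}
```

If the returned set is empty, the procedure ends without deleting anything, which corresponds to the negative branch of step S75.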
[0137]
Subsequently, the selected URL and keyword are transmitted to the search unit 10 (step S76), and the index information of the registered WWW pages that were filtered by that URL and keyword is retrieved by a field search over the filtering type, URL, and hit keyword fields (step S77).
[0138]
Next, the profile information presentation unit 117 checks, from the field values of the filtering type, URL, and hit keyword, whether each item of the retrieved index information was also filtered by a URL or keyword other than the selected ones (step S78). Index information that was filtered by another URL or keyword is removed from the deletion targets (step S79). If index information to be deleted remains after this processing (step S80), the process proceeds to step S81; otherwise, it proceeds to step S84.
[0139]
The profile information presentation unit 117 acquires the file name and path name from the index information to be deleted (step S81). It then deletes the copy file of the corresponding WWW page from the file storage unit 8 (step S82), deletes the corresponding index information from the index database 9 (step S83), and finally deletes from the profile registration table 102 the URL and keyword that served as the profile information for the deleted WWW page files (step S84).
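Steps S78 to S84 can be summarized in the following sketch. It uses in-memory stand-ins (a dictionary for the file storage unit 8, a list for the index database 9, a set for the profile registration table 102), and the record fields `path`, `profile_url`, and `hit_keywords` are hypothetical names.

```python
def delete_for_profile(selected_urls, selected_keywords, hits,
                       file_store, index_db, profile_table):
    """Delete copies and index records tied only to the selected profile items."""
    selected = set(selected_urls) | set(selected_keywords)
    deleted_paths = []
    for rec in hits:
        # Profile items that caused this page to be registered.
        used = set(rec["hit_keywords"])
        if rec["profile_url"]:
            used.add(rec["profile_url"])
        if used - selected:
            continue                        # S78/S79: also hit by another item
        file_store.pop(rec["path"], None)   # S81-S82: remove the copy file
        index_db.remove(rec)                # S83: remove the index record
        deleted_paths.append(rec["path"])
    profile_table.difference_update(selected)  # S84: remove profile items
    return deleted_paths
```

The profile items are removed even when no index record qualifies for deletion, mirroring the jump from step S80 directly to step S84.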
[0140]
Note that the present invention is not limited to the above-described embodiment, and can be implemented with appropriate modifications, for example, by combining the first embodiment and the second embodiment within a range that does not change the gist.
[0141]
As described above, according to the above-described embodiments, operations in the browsing unit 1 are detected and, when a preset operation occurs, index information is automatically created for the document that the user actually created, edited, or browsed, so that the document can later be easily retrieved and called up using this index information.
[0142]
Further, by filtering browsed documents (for example, WWW pages) on the basis of a preset profile and copying the browsed document files, only those documents that hit the preset profile information, out of the huge number of documents obtained through, for example, a WWW browser, are automatically copied to the file storage unit 8 at the time of browsing, and index information for the copied data is automatically created, so that the documents can later be easily retrieved and called up using this index information.
[0143]
[Effect of the invention]
As described above, according to the present invention, documents actually created, edited, or viewed by the user can be easily retrieved later, and document management can be easily performed.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration example of a document management apparatus according to a first embodiment of the present invention.
FIG. 2 is a diagram showing the structure of index information.
FIG. 3 is a diagram showing a specific example of a value indicating a file type.
FIG. 4 is a diagram showing a specific example of a value indicating the type of media.
FIG. 5 is a flowchart for explaining an index information creation processing operation.
FIG. 6 is a flowchart for explaining an operation of registering index information in an index database.
FIG. 7 is a flowchart for explaining a search processing operation.
FIG. 8 is a flowchart for explaining the search result acquisition processing operation of the search result presentation unit.
FIG. 9 is a diagram showing an example of presenting search results.
FIG. 10 is a diagram showing another example of presentation of search results.
FIG. 11 is a diagram showing a configuration example of a document management apparatus according to a second embodiment of the present invention.
FIG. 12 is a diagram showing the structure of profile information.
FIG. 13 is a diagram showing the structure of a profile registration table.
FIG. 14 is a diagram showing the structure of index information.
FIG. 15 is a diagram showing a specific example of a value indicating a filtering type.
FIG. 16 is a flowchart for explaining an index information creation processing operation.
FIG. 17 is a diagram showing an example of presenting search results.
FIG. 18 is a diagram showing another example of presentation of search results.
FIG. 19 is a diagram for explaining the processing operation of the profile information presentation unit, showing the contents of the profile registration table displayed graphically along a time axis, with every URL and keyword set as a profile displayed as a line segment representing its valid period.
FIG. 20 is a flowchart for explaining a processing operation when searching for a document from the display content shown in FIG. 19.
FIG. 21 is a flowchart for explaining a processing operation when deleting document and index information from the display content shown in FIG. 19.
[Explanation of symbols]
1 ... Browsing unit
2 ... Registration operation setting unit
3 ... Registered action definition table
4 ... Registration control unit
5 ... Keyword extraction unit
6 ... Attribute acquisition unit
7 ... Index information creation unit
8 ... File storage unit
9 ... Index database
10 ... Search unit
11 ... Tabular format presentation unit
12 ... Tree format presentation unit
13 ... Cooperation control unit
14 ... Search result presentation unit
15 ... User interface unit
101 ... Profile setting unit
102 ... Profile registration table
103 ... WWW browser
106 ... Filtering unit
107 ... WWW page acquisition / storage unit
117 ... Profile information presentation unit

Claims

A creation / editing / display means for creating, editing, and displaying a document by displaying a screen for creating, editing, and displaying a document;
First storage means for storing a document created / edited / displayed by the creation / editing / display means;
Second storage means for storing index information generated for a document created / edited / displayed on the screen;
A document management method in a document management apparatus comprising:
A first step of setting, as a registration timing for registering index information of the document created, edited, and displayed on the screen, at least one of a plurality of timings including when the creation / editing / display means closes the screen, when the creation / editing / display means stores the document created / edited / displayed on the screen in the first storage means, and when the creation / editing / display means displays a new document;
A second step of detecting the registration timing generated by the creation / editing / display means;
A third step of extracting a keyword from the document created / edited / displayed on the screen;
A fourth step of acquiring attribute information including a file name of the document created / edited / displayed on the screen and a storage position in the first storage means in which the document is stored;
A fifth step of generating, when the registration timing is detected, index information including the keyword extracted in the third step and the attribute information acquired in the fourth step for the document created, edited, and displayed on the screen;
A sixth step of storing the index information in the second storage means;
A document management method comprising:

An input step of inputting profile information including at least one of a keyword and a URL (Uniform Resource Locator);
Storing the inputted profile information together with a time when the profile information is inputted in a third storage means provided in the document management apparatus;
A selection step of selecting, from among the documents created, edited, and displayed on the screen, a document having a URL included in the profile information and a document from which the same keyword as a keyword included in the profile information is extracted;
Further including
2. The document management method according to claim 1, wherein the fifth step generates, when the registration timing is detected, index information including the keyword extracted in the third step and the attribute information acquired in the fourth step for the document selected in the selection step.

An input step for inputting a search expression for searching for a desired document;
A search step of searching for a document from the first storage means using the search formula and the index information stored in the second storage means;
Displaying the index information of each searched document and the hierarchical structure indicating the storage position of the searched document in the first storage means on the display means provided in the document management device;
A step of highlighting, when one item of the index information displayed on the display means is selected, the storage position of the document corresponding to the selected index information on the hierarchical structure displayed on the display means;
The document management method according to claim 1, further comprising the above steps.

The third storage means stores a plurality of profile information input in time series by the input means together with the time when each profile information was input,
3. The document management method according to claim 2, wherein the selecting step selects a document by using profile information with the latest time among a plurality of profile information stored in the third storage unit.

Each keyword and each URL included in each profile information stored in the third storage means, and a period during which each keyword and each URL was used as profile information based on the input time of each profile information Displaying on the display means provided in the document management apparatus;
Selecting at least one of the keyword and URL displayed on the display means;
Further including
The document management method according to claim 4, wherein the search step searches the second storage means for index information using a search expression including at least one of the selected keyword and URL, and searches the first storage means for a document corresponding to the retrieved index information.

6. The document management method according to claim 5, further comprising the step of deleting the index information searched in the search step and the document corresponding to the index information from the first and second storage means.

A creation / editing / display means for creating, editing, and displaying a document by displaying a screen for creating, editing, and displaying a document;
First storage means for storing a document created / edited / displayed by the creation / editing / display means;
Setting means for setting, as a registration timing for registering index information of the document created, edited, and displayed on the screen, at least one of a plurality of timings including when the creation / editing / display means closes the screen, when the creation / editing / display means stores the document created / edited / displayed on the screen in the first storage means, and when the creation / editing / display means displays a new document;
Detection means for detecting the registration timing generated in the creation / editing / display means;
Extraction means for extracting a keyword from a document created, edited and displayed on the screen;
An acquisition means for acquiring attribute information including a file name of a document created, edited and displayed on the screen and a storage position in the first storage means in which the document is stored;
Generating means for generating, when the registration timing is detected, index information including the keyword extracted by the extraction means and the attribute information acquired by the acquisition means for the document created, edited, and displayed on the screen;
Second storage means for storing the index information;
A document management apparatus comprising:

Means for inputting profile information including at least one of a keyword and a URL (Uniform Resource Locator);
Third storage means for storing the profile information together with the time when the profile information was input;
Selection means for selecting, from among the documents created, edited, and displayed on the screen, a document having a URL included in the profile information and a document from which the same keyword as a keyword included in the profile information is extracted;
Further comprising
The document management apparatus according to claim 7, wherein the generating means generates, when the registration timing is detected, index information including the keyword extracted by the extraction means and the attribute information acquired by the acquisition means for the document selected by the selection means.

An input means for inputting a search expression for searching for a desired document;
Search means for searching for a document from the first storage means using the search formula and index information stored in the second storage means;
First display control means for displaying index information of each searched document and a hierarchical structure indicating a storage position of the searched document in the first storage means on the display means;
The document management apparatus according to claim 7, further comprising the above means, wherein, when one item of the index information displayed on the display means is selected, the storage position of the document corresponding to the selected index information is highlighted on the hierarchical structure displayed on the display means.

The third storage means stores a plurality of profile information input in time series by the input means together with the time when each profile information is input,
9. The document management apparatus according to claim 8, wherein the selection unit selects a document using profile information with the latest time among a plurality of profile information stored in the third storage unit.

Each keyword and each URL included in each profile information stored in the third storage means, and a period during which each keyword and each URL was used as profile information based on the input time of each profile information Second display control means for displaying on the display means;
Means for selecting at least one of the keyword and URL displayed by the display means;
Further comprising
The document management apparatus according to claim 10, wherein the search means searches the second storage means for index information using a search expression including at least one of the selected keyword and URL, and searches the first storage means for a document corresponding to the retrieved index information.

12. The document management apparatus according to claim 11, further comprising means for deleting, from the first and second storage means, the index information that, among the index information retrieved by the search means, includes only those keywords and URLs of the profile information that are included in the search expression, together with the document corresponding to that index information.

A creation / editing / display means for creating, editing, and displaying a document by displaying a screen for creating, editing, and displaying a document;
First storage means for storing a document created / edited / displayed by the creation / editing / display means;
Second storage means for storing index information generated for a document created / edited / displayed on the screen;
On a computer with
A first step of setting, as a registration timing for registering index information of the document created, edited, and displayed on the screen, at least one of a plurality of timings including when the creation / editing / display means closes the screen, when the creation / editing / display means stores the document created / edited / displayed on the screen in the first storage means, and when the creation / editing / display means displays a new document;
A second step of detecting the registration timing generated by the creation / editing / display means;
A third step of extracting a keyword from the document created / edited / displayed on the screen;
A fourth step of acquiring attribute information including a file name of the document created / edited / displayed on the screen and a storage position in the first storage means in which the document is stored;
A fifth step of generating, when the registration timing is detected, index information including the keyword extracted in the third step and the attribute information acquired in the fourth step for the document created, edited, and displayed on the screen;
A sixth step of storing the index information in the second storage means;
A machine-readable recording medium on which a program for causing the computer to execute the above steps is recorded.

The program is
An input step of inputting profile information including at least one of a keyword and a URL (Uniform Resource Locator);
Storing the input profile information together with the time at which the profile information was input in a third storage means provided in the computer;
A selection step of selecting, from among the documents created, edited, and displayed on the screen, a document having a URL included in the profile information and a document from which the same keyword as a keyword included in the profile information is extracted;
Further including
14. The recording medium according to claim 13, wherein the fifth step generates, when the registration timing is detected, index information including the keyword extracted in the third step and the attribute information acquired in the fourth step for the document selected in the selection step.

A screen for creating, editing, and displaying a document, and a creation / edit / display means for detecting creation, editing, and display of the document,
First storage means for storing a document created / edited / displayed by the creation / editing / display means;
Second storage means for storing index information generated for a document created / edited / displayed on the screen;
A document management method in a document management apparatus comprising:
A first step of setting, as a registration timing for registering index information of the document created, edited, and displayed on the screen, at least one of a plurality of timings including when the creation / editing / display means closes the screen, when the creation / editing / display means stores the document created / edited / displayed on the screen in the first storage means, and when the creation / editing / display means displays a new document;
A second step of detecting the registration timing generated by the creation / editing / display means;
A third step of extracting a keyword from the document created / edited / displayed on the screen;
A fourth step of acquiring attribute information including a file name of the document created / edited / displayed on the screen and a storage position in the first storage means in which the document is stored;
A fifth step of generating, when the registration timing is detected, index information including the keyword extracted in the third step and the attribute information acquired in the fourth step for the document created, edited, and displayed on the screen;
A sixth step of storing the index information in the second storage means;
A document management method comprising:

A screen for creating, editing, and displaying a document, and a creation / edit / display means for detecting creation, editing, and display of the document,
First storage means for storing a document created / edited / displayed by the creation / editing / display means;
Setting means for setting, as a registration timing for registering index information of the document created, edited, and displayed on the screen, at least one of a plurality of timings including when the creation / editing / display means closes the screen, when the creation / editing / display means stores the document created / edited / displayed on the screen in the first storage means, and when the creation / editing / display means displays a new document;
Detection means for detecting the registration timing generated in the creation / editing / display means;
Extraction means for extracting a keyword from a document created, edited and displayed on the screen;
An acquisition means for acquiring attribute information including a file name of a document created, edited and displayed on the screen and a storage position in the first storage means in which the document is stored;
Generating means for generating, when the registration timing is detected, index information including the keyword extracted by the extraction means and the attribute information acquired by the acquisition means for the document created, edited, and displayed on the screen;
Second storage means for storing the index information;
A document management apparatus comprising: