JP4253134B2 - Document processing apparatus, document processing method, program, and recording medium

Info

Publication number: JP4253134B2
Application number: JP2001037762A
Authority: JP
Inventors: 一寿武谷
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-02-14
Filing date: 2001-02-14
Publication date: 2009-04-08
Anticipated expiration: 2021-02-14
Also published as: JP2002245065A

Description

【０００１】
【発明の属する技術分野】
本発明は、電子化された文書を管理する文書処理装置、文書処理方法、プログラムおよび記録媒体に係り、詳細には、文書の属性を利用して検索や分類を行い文書群の分析を行う技術に関する。
【０００２】
【従来の技術】
一般に、電子化文書を管理するシステムでは、文書名、作成日や更新日などの書誌情報を利用して文書群の管理、検索や分類などの文書群の分析を支援している。
また、個々の文書にキーワードや文書内容の一部を抽出した情報を文書の書誌情報として利用し、文書群の分析を行ったり、キーワード抽出や文書内容からの情報抽出を自動化して、文書データベース等へ文書を登録するときの利便性を向上させることも従来から行われている（例えば、特開平９−６７９２号を参照）。
また、文書へ予め用意された付帯情報以外に、利用者が文書へ付帯させる情報を定義できるシステムでは、属性の項目名称や項目の型を定義することが可能で、文書登録時や文書更新時を含めた任意の時点で利用者が定義した項目の型に従い項目値を入力したり、自動で入力させたりすることもできる。
また、これまでに述べた文書の書誌情報や文書付帯項目を複数、纏めて文書の属性として定義することも従来から行われている。例えば、属性の名称を定義し、その属性に含まれる項目の名称や項目の型を複数定義して文書種として利用するものであって、属性内の項目の増減、項目の名称／型を編集することができ、文書の登録や更新時に文書の属性を選択し、個々の項目の型に従った項目値を代入することができる。
このような文書の書誌情報や文書属性に対して、書誌情報の値、文書属性の値を検索条件として指定して文書群からいくつかの文書を抽出したり、それらの値に基づいて分類して文書群の分析を行ったりすることも従来から行われている。
さらに、指定した検索条件で検索された文書や、すでに提示されている文書との類似での文書検索の指示や、その文書とはある観点が違っている文書の検索を支援する技術としては、特許第２５８１３７６号公報に記載された技術がある。
【０００３】
【発明が解決しようとする課題】
これらの従来の方法では、利用者が定義する文書の属性の名称、属性に含まれる項目の名称、項目の型の変更などの、文書属性の編集を行った場合に、その編集の前後で、属性種が異なるため、旧属性種を指定して検索を行うと新属性種に対応した文書が検索できず、また、このような編集記録を利用者が記録管理したとしても検索条件の指定が煩雑になるという問題があった。
また、システムが文書群の分類を行う際にも、このような編集の前後の関連性を考慮しないため、望んでいる検索結果や分類結果が得られないことがあった。
本発明は、上記問題点を解決するためになされたものであって、文書の属性の名称、属性に含まれるデータ項目の名称やデータ型等の変更があった場合にも、変更前後の属性を用いて検索や分類が行える文書処理装置、文書処理方法、プログラムおよび記録媒体を提供し、これにより利用者が文書の属性等の変更を自分で管理する労力を軽減することを目的とする。
【０００４】
【課題を解決するための手段】
上記の問題を解決するために、本願の請求項１の発明は、コンピュータを用いて電子化された文書の検索を行う文書処理装置であって、識別名称である属性ＩＤと、該属性ＩＤに対応付けた属性の名称と複数のデータ項目と、該データ項目毎のデータ型とからなる属性を保持する文書属性蓄積部と、前記属性を文書毎に付与して複数の文書を保持する文書蓄積部と、利用者が前記文書属性蓄積部に保持された前記属性の編集あるいは新規に属性を定義する文書属性編集部と、前記文書属性編集部で編集された編集履歴を編集もとになった属性ＩＤと関連付けて保持する文書属性編集履歴蓄積部と、利用者から入力された検索条件に適合する文書を前記文書蓄積部から検索する分析処理実行部とを備え、前記分析処理実行部は、入力された検索条件中の前記属性に対応する編集履歴を前記文書属性編集履歴蓄積部および前記文書属性蓄積部から取り出して、編集もとになった属性ＩＤに対応付けた属性の名称とデータ項目を検索条件に追加して該当文書を検索することを特徴とする。
また、本願の請求項２の発明は、前記分析処理実行部が検索した結果を出力装置に出力する請求項１に記載の文書処理装置を特徴とする。
【０００５】
また、本願の請求項３の発明は、コンピュータを用いて電子化された文書の検索を行う文書処理方法であって、文書属性蓄積部に識別名称である属性ＩＤと、該属性ＩＤに対応付けた属性の名称と複数のデータ項目と、該データ項目毎のデータ型とからなる属性を保持し、文書蓄積部に前記属性を文書毎に付与して複数の文書を保持し、文書属性編集部に利用者が前記文書属性蓄積部に保持された前記属性の編集あるいは新規に属性を定義し、文書属性編集履歴蓄積部に前記文書属性編集部で編集された編集履歴を編集もとになった属性ＩＤと関連付けて保持して、分析処理実行部が利用者から入力された検索条件に適合する文書を前記文書蓄積部から検索し、入力された検索条件中の前記属性に対応する編集履歴を前記文書属性編集履歴蓄積部および前記文書属性蓄積部から取り出して、編集もとになった属性ＩＤに対応付けた属性の名称とデータ項目を検索条件に追加して該当文書を検索することを特徴とする。
【０００６】
また、本願の請求項４の発明は、コンピュータを、識別名称である属性ＩＤと、該属性ＩＤに対応付けた属性の名称と複数のデータ項目と、該データ項目毎のデータ型とからなる属性を保持する文書属性蓄積部、前記属性を文書毎に付与して複数の文書を保持する文書蓄積部、利用者が前記文書属性蓄積部に保持された前記属性の編集あるいは新規に属性を定義する文書属性編集部、前記文書属性編集部で編集された編集履歴を編集もとになった属性ＩＤと関連付けて保持する文書属性編集履歴蓄積部、利用者から入力された検索条件に適合する文書を前記文書蓄積部から検索し、入力された検索条件中の前記属性に対応する編集履歴を前記文書属性編集履歴蓄積部および前記文書属性蓄積部から取り出して、編集もとになった属性ＩＤに対応付けた属性の名称とデータ項目を検索条件に追加して該当文書を検索する分析処理実行部、として機能させるためのプログラムを特徴とする。
【０００７】
また、本願の請求項５の発明は、コンピュータを、識別名称である属性ＩＤと、該属性ＩＤに対応付けた属性の名称と複数のデータ項目と、該データ項目毎のデータ型とからなる属性を保持する文書属性蓄積部、前記属性を文書毎に付与して複数の文書を保持する文書蓄積部、利用者が前記文書属性蓄積部に保持された前記属性の編集あるいは新規に属性を定義する文書属性編集部、前記文書属性編集部で編集された編集履歴を編集もとになった属性ＩＤと関連付けて保持する文書属性編集履歴蓄積部、利用者から入力された検索条件に適合する文書を前記文書蓄積部から検索し、入力された検索条件中の前記属性に対応する編集履歴を前記文書属性編集履歴蓄積部および前記文書属性蓄積部から取り出して、編集もとになった属性ＩＤに対応付けた属性の名称とデータ項目を検索条件に追加して該当文書を検索する分析処理実行部、として機能させるためのプログラムを記録したコンピュータ読み取り可能な記録媒体を特徴とする。
【０００８】
【発明の実施の形態】
以下に、図面を用いて本発明の実施の形態の構成および動作を詳細に述べる。
（１）機能構成
図１は、本発明の実施の形態の一例である文書処理装置の機能構成を示すブロック図である。
本実施の形態の文書処理装置は、制御部１００、文書登録部１１０、文書属性付与部１２０、文書属性編集部１３０、分析処理実行部１４０、文書蓄積部１５０、文書属性蓄積部１６０、文書属性編集履歴蓄積部１７０とから構成される。
制御部１００は、利用者がキーボード等の入力装置を操作して文書登録、文書属性編集、文書属性付与、分析実行などを指定したときの文書処理装置全体の制御を行う。
文書登録部１１０は、スキャナ、カメラ、キーボード等の入力装置またはネットワークを介して、文書を入力し、入力された文書を文書蓄積部１５０へ登録する。この文書の登録過程やその結果は、ディスプレイやプリンタ等の出力装置で確認することができる。
【０００９】
文書蓄積部１５０のデータ構造は、文書ごとに、文書を管理するための文書識別子（文書ＩＤ）、文書名、書誌事項（登録日、登録者名、更新日、文書の形式（例えば、テキスト情報、イメージ情報、音声情報等）、文書のサイズ、メモ等）、属性ＩＤ、データ項目一覧およびコンテンツリスト等を保持している（図２、図３参照）。ここで、属性ＩＤは、文書属性蓄積部１６０に登録された属性を識別する唯一名であり、データ項目一覧は、この属性ＩＤで示される属性のデータ項目に対応した実際の値を保持する。また、コンテンツリストは、文書の内容（テキスト、画像、音声等のデータ）を示すコンテンツを保存しているファイル識別子のリストである。
文書属性編集部１３０は、文書に付与する属性の名称と、この属性に含まれる項目の名称やこの項目のデータ型等を定義し、この属性を文書属性蓄積部１６０へ登録する。一度、登録された文書の属性を文書属性蓄積部１６０から取り出し、編集を行った結果を文書属性蓄積部１６０へ再登録して更新することができる。もちろん、属性自身やデータ項目を削除することもできる。
この文書の属性の編集の過程や結果は、ディスプレイやプリンタ等の出力装置で確認することができる。
文書属性蓄積部１６０のデータ構造（図４、図５参照）は、各文書に付与される属性のデータ項目を定義するものであって、属性ＩＤによって識別される。すなわち、各文書は、書誌事項の他にここで定義される可変データ項目をもち、このデータ項目は文書蓄積部１５０のそれに対応している。この可変データ項目は、この可変データ項目を使用するか否かのフラグ、このデータ項目のデータ型（文字型、文字列型、日付型、名前型、数値型、論理型等）およびこのデータ項目の名前との３つ組対で構成される。
【００１０】
図５は、本実施の形態の説明で使われる属性の例である。属性の名称は、任意の名前（例えば「abc」）を定義することができる。この例のように、属性には複数の項目をまとめて定義することができ、データ項目の名称とデータ項目のデータ型（値の型）として、データ項目の名称に文書印刷日、そのデータ項目に対するデータ型に日付型等のように定義する。
図５の文書属性蓄積部１６０に登録された属性を編集する例を図６（編集前の状態）と図７（編集後の状態）に示した。
この図６では、属性の名称を「abc」から「xyz」に、データ項目の名称「文書印刷日」を「最終印刷日」に、データ項目の名称「図面区別」のデータ型を「文字列型」から「数値型」に変更し、データ項目として名称に「顧客コード」、そのデータ型に「数値型」を追加している。また、属性「abc」自体を削除することもできる。
なお、属性の編集作業を図８に示したような画面を使って行うように構成してもよい。
図８の画面上部に示された登録属性参照画面は、既に文書属性蓄積部１６０に登録されている属性の一覧を表示する。また、図８の画面下部の新規定義画面は、新たに定義する属性の編集画面を示しており、この例では属性「new」を定義している。この新規定義画面では、一つ一つの項目型の名称、項目型を定義する。このとき、登録属性参照画面から属性の全体、または、それに含まれる項目の一部を選択して定義することもできる。ここでは、属性「abc」と属性「ghi」のすべてのデータ項目を選択して新たな属性「new」としている。
この属性の定義は、登録属性参照画面に示された二つ以上の属性を利用して新たな属性を定義することもできる。
例えば、登録属性参照画面に示された属性「Ａ」や「Ｂ」から、新規定義画面で属性「ａ」、「ｂ」、「ｃ」、「ｄ」を定義する場合に両者の関係として図９のような関係が可能である。図中左から、
・属性Ａの全項目と、新たに定義する項目を加えてａを定義する
・属性Ａの項目の一部を選択して、ｂを定義する
・属性Ａと属性Ｂの項目の和から、ｃを定義する
・属性Ａと属性Ｂの項目の積から、ｄを定義する
【００１１】
文書属性付与部１２０は、文書蓄積部１５０から取り出した文書と文書属性蓄積部１６０から取り出した属性とを対応づけたり、その属性に含まれる項目の値を代入する等を行う。この対応づけられた文書と属性は、文書蓄積部１５０へ登録される。この文書への属性の対応づけの処理過程やその結果は、ディスプレイやプリンタ等の出力装置で確認することができる。
例えば、図１０を用いて文書属性付与部１２０を説明する。この例では、文書登録部１１０で文書蓄積部１５０へ登録する文書あるいはすでに文書蓄積部１５０に登録されている文書「いろは」に、属性を対応づけを行っているところを示している。
この図１０では、すでに付与されている文書「いろは」の属性「xyz」を、文書属性蓄積部１６０に登録されている属性名リスト（画面の右の部分）の中から属性「abc」を選択して、変更している。さらに、属性「abc」を選択した後、その属性に定義されている各々の項目の値を図１１のように代入する。これらの編集結果を文書蓄積部１５０へ文書のコンテンツと対応させる形式で登録する。
分析処理実行部１４０は、利用者によりキーボードやポインティングデバイス等の入力装置から検索や分類の条件の設定が行われた後、文書蓄積部１５０から文書群を取り出し、指定された条件を参照して検索や分類等の処理を実行する（詳しくは後述する）。この文書群の分析過程やその結果は、ディスプレイやプリンタ等の出力装置で確認することができる。
【００１２】
文書属性編集履歴蓄積部１７０は、文書属性編集部１３０が実行する編集処理の履歴を記録する。文書属性編集履歴蓄積部１７０のデータ構造（図１２参照）は、各属性の編集登録日に対して、属性ＩＤ、その編集日と編集のもとになった属性の属性ＩＤとの３つ組みで表される。図１２の文書属性編集履歴蓄積部１７０のデータ構造の編集日を文書属性蓄積部１６０のデータ構造に加えるようにして、文書属性蓄積部１６０と文書属性編集履歴蓄積部１７０とを一つに管理してもよい。
例えば、図１２は、図６の属性「abc」が１９８３年３月３１日に新たに定義され、１９９６年７月２８日に図７の属性「xyz」のように編集を加えられ、さらに、同日にその属性「xyz」が変更された場合を示している。この編集のように、データ項目「顧客コード」が削除された場合は、使用可否のフラグが「否」となる。
なお、この属性の編集履歴を表現するときに、文書属性蓄積部１６０のデータ構造を図１３のように、属性のデータ項目ごとに文書属性蓄積部１６０内でユニークに管理される「管理ＮＯ．」とどの属性のどのデータ項目を流用したかを示す「from」とを追加し、さらにその属性の編集日を追加して構成すれば、文書属性編集履歴蓄積部１７０は文書属性蓄積部１６０へ統合され、且つ、編集のもとになる属性を２つ以上表現することができる。
例えば、図１４（文書属性蓄積部１６０）に示すように、１９８２年６月１０日に定義された属性「abc」のデータ項目の一部を利用して新たに属性「OPQ」を定義する場合を考える。
【００１３】
先ず、属性「abc」のデータ項目「印刷日」は、文書属性蓄積部１６０内でユニークに管理された管理番号「1」が割り振られている。これを利用したとすると属性「OPQ」の「from」にはこの管理番号「１」が代入され、それ自身の管理番号はユニークな管理番号「１０１」が与えられる。次に、属性「abc」のデータ項目「承認者」は、文書属性蓄積部１６０内でユニークに管理された管理番号「２」が割り振られて、これを属性「OPQ」では「所属長」と名称を変えたときには、「from」はこの管理番号「２」が代入され、名称や型を変更し、自身の管理番号はユニークな管理番号「１０２」が与えられる。
また、属性「abc」のすべてのデータ項目を利用して、新たな属性「UVW」を定義する場合には、属性「OPQ」のデータ項目「印刷日」の「from」には「1/AID001」が代入され、それ自身の管理番号はユニークな管理番号「２０１」が与えられている。この「／」の右側の「AID001」は、１９８２年６月１０日に編集登録された属性「abc」のユニークな識別子を表し、の左側の「１」はその属性のデータ項目「印刷日」のユニークな管理番号を示している。
また、図１４の属性ＩＤ「AID001」と「AID002」および「AID005」と「AID006」のように属性名だけを変更するような使い方としては、例えば、属性名称を「１９９９年発信文書」としていたものを、２０００年を契機に「２０００年発信文書」に変更したり、組織変更等で属性名称「開発室」を「第１開発課」というように変更したりするときに用いるとよい。
【００１４】
（２）分析処理実行部における検索処理
図１５は、分析処理実行部１４０において検索処理の処理手順を説明するためのフローチャートである。
先ず、利用者は、キーボード等の入力装置から文書蓄積部１５０に登録されている文書を検索するための検索条件を入力する（ステップＳ１００）。
入力された検索条件中に、属性名や属性の項目名が含まれているかを調べる（ステップＳ１１０）。含まれていないときには、ステップＳ１５０へ進んで、その検索条件で通常の検索を実行する。
含まれている場合は、文書属性蓄積部１６０と文書属性編集履歴蓄積部１７０を参照して、検索条件に含まれている属性名や属性の項目名に対応する編集履歴を取り出す（ステップＳ１２０）。
検索条件に含まれている属性名や属性の項目名に対応する編集履歴があるかどうか調べ（ステップＳ１３０）、なければステップＳ１５０進み、あれば、その取り出された履歴から新しい検索条件を追加する（ステップＳ１４０）。
例えば、図５と図１２を用いて説明すると、検索条件に属性「xyz」を持つ文書群を検索するように指定されている場合に、この属性を作成するもとになった属性「abc」をもつ文書群も検索するように検索条件に追加するようにする。
また、属性「xyz」のデータ項目「承認者」の値が検索条件として指定されたときには、その属性を作成するもととなった属性「abc」のさらに対応するデータ項目「文書検印者」を含めた検索条件を追加する。
【００１５】
この属性や項目の編集履歴を参照するには、文書属性蓄積部１６０のデータ構造を図１３のように構成したほうが見つけやすくなる。
例えば、図１４を使って説明すると、入力された検索条件の中に属性「OPQ」のデータ項目「印刷日」の値が指定されているとすると、このデータ項目を作成するときに参照したもとのデータ項目、即ち、属性「abc」のデータ項目「印刷日」を見つけられる。（図１４の管理番号「１０１」にその属性のデータ項目を定義した際に管理番号「１」を利用した記録が残されている。）
従って、管理番号「１」のデータ項目を文書属性蓄積部１６０から取得し、属性「abc」と対応づけられているデータ項目の名称「印刷日」も検索条件に追加することができる。
追加された検索条件（ステップＳ１１０（Ｎｏ）およびＳ１３０（Ｎｏ）からのときには入力された検索条件のまま）を用いて、文書蓄積部１５０の通常の検索を実行する（ステップＳ１５０）。
この検索結果をディスプレイやプリンタ等の出力装置へ出力する（ステップＳ１６０）。この結果を出力する際、利用者が指定した検索条件から直接検索された文書群と、本文書処理装置が検索対象を拡張した条件から検索された文書群を色別、グループ別等によって区別すると利用者には便利である。
本実施の形態をこのように構成すると、属性の名称、属性に含まれるデータ項目の名称やデータ型の変更などの編集が行われた際の履歴を記録しているので、その情報を利用して、編集の前、あるいは後の属性が条件として指定されて検索が行われた場合に、編集履歴に基づき検索対象を拡張して検索を行うことができる。
【００１６】
（３）分析処理実行部における分類処理
図１６は、分析処理実行部１４０において分類処理の処理手順を説明するためのフローチャートである。
先ず、文書蓄積部１５０の中の利用者は分類させたい文書群をキーボードやマウス等の入力装置によって指定し、この指定された文書群に対して行う分類手法やそれに付属した情報などの諸設定を行う（ステップＳ２００）。
次に、分類対象となった文書の書誌事項、内容、属性の数量化を行う（ステップＳ２１０）。
この数量化は、例えば、図１７に示したような同じ属性をもった文書Aと文書Bがあるとき、各データ項目の値を全一致で見た場合の相違したデータ項目の個数と考えれば、差は２(種類と文書番号)となる。
また、データ項目同士の値の差を使うとすれば、差は１＋(４５−３０)＝１６である。これは図面と請求書が異なるから１であり、文書番号はその値の差を計算して上記の数値を得る。
【００１７】
ここで、図面と請求書の差を１とはせず、図面は(x,y)=(4,7)、請求書は(x,y)=(10,15)などと定義して距離を差として使うこともできる。この場合、軸を「再利用度」や「閲覧されることの多さ」などを予め決めて、文書ごとに数値化しておくようにする。
また、項目数の差異だけでなく、編集された日時の差、データ項目の名称やデータ型の編集による差異などを数量化するようにしてもよい。例えば、図１８のように同じ文書Ｃが１２日に作成され、１５日と１８日に編集されたとすると、１２日の文書と１５日の文書の差異は、文書番号の差（４５−３０）に日にちの差３を乗じて計算するようにしてもよい。
また、文書内容の数量化は、含まれる単語の数の差を利用する方法などが考えられる。
更に、任意の２文書の差異を求めるのに、単語の頻度を用いて数値化することもできる。文書を形態素解析して、単語を抽出し、「文書×単語」の行列を作成し特定の文書ベクトル間距離を計算することを例にして説明する。
【００１８】
この単語集合を列、文書集合を行として、この行列の列を次のように増やす。増やす列名は、図１４にある管理番号（例えば、「１」、「２」、「１０１」や「１０２」）である。
例えば、文書の属性名称「OPQ」に属する場合には、列名「１０１」と「１０２」の要素に非０（これは１でもよいが、重みとしてもよい）を代入し、属性名称「OPQ」のfromを参照し、列名「１」と「２」の要素にも非０を代入する。
このようして拡張した行列を用いて、距離の計算（＝分類）を行う。
このようにして数量化された各文書に対して、クラスタ分析等の利用者から指定された分類手法によって、指定された文書群を分類し（ステップＳ２２０）、その分類結果をディスプレイやプリンタ等の出力装置に出力することによって利用者へ提示する（ステップＳ２３０）。上述したように実施の形態を構成することによって、属性の名称、属性に含まれるデータ項目の名称やデータ型の変更などの編集が行われた際の履歴を記録しているので、以下のような効果が達成できる。
（１）編集の前後の属性が検索条件として指定された場合であっても、この編集履歴に基づき検索対象を拡張して検索を行うことができるので、利用者は属性が変えられたことを気にせずに検索の実行ができる。
（２）文書群を分類させる場合に、その編集前後の属性間の近接度を数量化して利用し、文書属性の編集履歴を反映した文書群の分類を実行することができる。
【００１９】
＜コンピュータシステムによる実施の形態＞
本発明の文書処理装置は、図１７に上げたようなネットワーク接続されたコンピュータシステムによっても実現することができる。
すなわち、利用者のクライアント２００と文書データを管理しているサーバ３００とがネットワーク９を介して接続している。クライアント２００とサーバ３００とは、必要であればそれぞれ複数台が接続されていて良い。
このネットワーク９は、これらの利用者のクライアント２００とサーバ３００とを結合するための伝送路であって、一般には、ケーブルで実現され、通信プロトコルにはＴＣＰ／ＩＰが使われる。但し、伝送路としてはケーブルだけではなく、それらの間の通信プロトコルが一致するものであれば無線ＬＡＮや放送波を使ったものであっても良い。
これらクライアント２００とサーバ３００は、汎用的なコンピュータによって構成される。
すなわち、入力装置１はキーボード、マウス、タッチパネル、スキャナ等により構成され、情報の入力に使用される。
出力装置２は、種々の出力情報や入力装置１からの入力された情報などを出力させるディスプレイやプリンタで構成される。
ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ；中央処理ユニット）３は、種々のプログラムを動作させる。
メモリ４は、プログラム自身を保持し、またそのプログラムがＣＰＵ３によって実行されるときに一時的に作成される情報等を保持する。
記憶装置５は、データ、プログラムやプログラム実行時の一時的な情報等を保持する。
【００２０】
媒体駆動装置６は、プログラムやデータ等を記憶した記録媒体を装着してそれらを読み込み、メモリ４または記憶装置５へ格納するのに用いられる。また、直接データの入出力やプログラム実行するのに使ってもよい。
ネットワーク接続装置７は、クライアント２００やサーバ３００をネットワーク９へ接続するためのインタフェースである。
バス８は、上記各部を接続する。
このようなコンピュータにおいて、図１に示したクライアント２００を構成する各機能やサーバ３００の文書蓄積部１５０、文書属性蓄積部１６０や文書属性編集履歴蓄積部１７０を管理する機能をそれぞれプログラム化し、予めＣＤ−ＲＯＭ等の記録媒体に書き込んでおき、このＣＤ−ＲＯＭをクライアント２００やサーバ３００のＣＤ−ＲＯＭドライブのような媒体駆動装置６に装着して、プログラムをメモリ４あるいは記憶装置５に格納し、それを実行することによって、上記の実施の形態の機能を実現することができる。
なお、記録媒体としては半導体媒体（例えば、ＲＯＭ、ＩＣメモリカード等）、光媒体（例えば、ＤＶＤ−ＲＯＭ，ＭＯ，ＭＤ，ＣＤ−Ｒ等）、磁気媒体（例えば、磁気テープ、フレキシブルディスク等）のいずれであってもよい。
また、本発明の文書処理装置の機能を実現するプログラムは、媒体の形で頒布することができる。
また、本発明の文書処理装置の機能を実現するプログラムを磁気ディスク等の記憶装置に格納しておき、有線又は無線の通信ネットワークによりダウンロード等の形式で頒布したり、放送波によって配布したりすることで提供することもできる。
【００２１】
【発明の効果】
以上説明したように、本発明によれば、文書の属性の名称、属性に含まれるデータ項目の名称やデータ型等の変更があった場合にも、変更前後の属性を用いて検索や分類が行えるので、利用者が文書の属性等の変更を自分で管理する労力を軽減することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態の一例を示す文書処理装置の構成を説明するブロック図である。
【図２】文書蓄積部のデータ構造を示す図である。
【図３】文書蓄積部の文書の登録例を示す図である。
【図４】文書属性蓄積部のデータ構造を示す図である。
【図５】文書属性蓄積部の属性の登録例を示す図である。
【図６】属性を編集する例（編集前の状態）を示す図である。
【図７】属性を編集する例（編集後の状態）を示す図である。
【図８】画面を使った属性の編集作業の例を示す図である。
【図９】二つ以上の属性を利用して新たな属性を定義するときの概念図である。
【図１０】文書に属性を付与するときの例を示す図である。
【図１１】選択された属性に属するデータ項目へ値を設定するときの例を示す図である。
【図１２】文書属性編集履歴蓄積部の登録例を示す図である。
【図１３】文書属性蓄積部の他のデータ構造を示す図である。
【図１４】文書属性蓄積部の属性の登録例を示す図である。
【図１５】分析処理実行部において検索処理の処理手順を説明するためのフローチャートである。
【図１６】分析処理実行部において分類処理の処理手順を説明するためのフローチャートである。
【図１７】分類処理における数量化を説明するための図である。
【図１８】分類処理における数量化を説明するための図である。
【図１９】本発明の実施される文書処理装置のクライアントおよびサーバを実現するためのコンピュータハードウェアを示す構成図である。
【符号の説明】
１ …… 入力装置
２ …… 出力装置
３ …… ＣＰＵ
４ …… メモリ
５ …… 記憶装置
６ …… 媒体駆動装置
７ …… ネットワーク接続装置
８ …… バス
９ …… ネットワーク
１００…… 制御部
１１０…… 文書登録部
１２０…… 文書属性付与部
１３０…… 文書属性編集部
１４０…… 分析処理実行部
１５０…… 文書蓄積部
１６０…… 文書属性蓄積部
１７０…… 文書属性編集履歴蓄積部
２００…… クライアント
３００…… サーバ
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a document processing apparatus, a document processing method, a program, and a recording medium for managing electronic documents, and more specifically to a technique for analyzing a document group by searching and classifying documents using their attributes.
[0002]
[Prior art]
In general, systems that manage electronic documents use bibliographic information such as the document name, creation date, and update date to support the management of document groups and their analysis, such as search and classification.
It is also conventional practice to use keywords, or information extracted from part of a document's contents, as bibliographic information of each document when analyzing a document group, and to automate keyword extraction and information extraction from document contents so that registering documents in a document database or the like becomes more convenient (see, for example, JP-A-9-6792).
In a system that allows users to define information to attach to a document, in addition to the supplementary information prepared in advance, users can define the names and types of attribute items. Item values can then be entered according to the user-defined item types, manually or automatically, at any time, including at document registration and document update.
It is also conventional practice to group several of the bibliographic and supplementary items described above and define them as a document attribute. For example, an attribute name is defined together with the names and types of the items the attribute contains, and the attribute is used as a document type. Items can be added to or removed from the attribute, and item names and types can be edited. When a document is registered or updated, a document attribute is selected and each item is given a value that conforms to its type.
It is likewise conventional to specify bibliographic values and document attribute values as search conditions in order to extract some documents from a document group, or to classify documents on the basis of those values in order to analyze the document group.
Furthermore, Japanese Patent No. 2581376 describes a technique that supports instructing a search for documents similar to documents retrieved under specified search conditions or to documents already presented, and searching for documents that differ from a given document in some respect.
[0003]
[Problems to be solved by the invention]
In these conventional methods, when a document attribute is edited, for example by changing the user-defined attribute name, the names of the items in the attribute, or the item types, the attribute type differs before and after the edit. A search that specifies the old attribute type therefore cannot find documents that correspond to the new attribute type, and even if the user records and manages such edits, specifying the search conditions becomes cumbersome.
Likewise, when the system classifies a document group, it does not take the relationship between the attributes before and after such an edit into account, so the desired search or classification results are sometimes not obtained.
The present invention has been made to solve the above problems. Its object is to provide a document processing apparatus, a document processing method, a program, and a recording medium that can search and classify using the attributes from before and after a change, even when the name of a document attribute, or the name or data type of a data item in the attribute, has been changed, and thereby to reduce the effort users spend managing changes to document attributes themselves.
[0004]
[Means for Solving the Problems]
To solve the above problem, the invention of claim 1 of the present application is a document processing apparatus that searches electronic documents using a computer, comprising: a document attribute storage unit that holds attributes, each consisting of an attribute ID that is an identifying name, an attribute name and a plurality of data items associated with the attribute ID, and a data type for each data item; a document storage unit that holds a plurality of documents, each assigned one of the attributes; a document attribute editing unit with which a user edits an attribute held in the document attribute storage unit or defines a new attribute; a document attribute editing history storage unit that holds the history of edits made in the document attribute editing unit in association with the attribute ID of the attribute from which each edit was derived; and an analysis processing execution unit that searches the document storage unit for documents matching a search condition input by the user, wherein the analysis processing execution unit retrieves the editing history corresponding to the attribute in the input search condition from the document attribute editing history storage unit and the document attribute storage unit, adds the attribute name and data items associated with the source attribute ID to the search condition, and searches for the matching documents.
The invention of claim 2 of the present application is the document processing apparatus according to claim 1, wherein the results retrieved by the analysis processing execution unit are output to an output device.
[0005]
The invention of claim 3 of the present application is a document processing method for searching electronic documents using a computer, in which: a document attribute storage unit holds attributes, each consisting of an attribute ID that is an identifying name, an attribute name and a plurality of data items associated with the attribute ID, and a data type for each data item; a document storage unit holds a plurality of documents, each assigned one of the attributes; a user edits an attribute held in the document attribute storage unit, or defines a new attribute, in a document attribute editing unit; a document attribute editing history storage unit holds the history of edits made in the document attribute editing unit in association with the attribute ID of the attribute from which each edit was derived; and an analysis processing execution unit searches the document storage unit for documents matching a search condition input by the user, retrieves the editing history corresponding to the attribute in the input search condition from the document attribute editing history storage unit and the document attribute storage unit, adds the attribute name and data items associated with the source attribute ID to the search condition, and searches for the matching documents.
[0006]
The invention of claim 4 of the present application is a program that causes a computer to function as: a document attribute storage unit that holds attributes, each consisting of an attribute ID that is an identifying name, an attribute name and a plurality of data items associated with the attribute ID, and a data type for each data item; a document storage unit that holds a plurality of documents, each assigned one of the attributes; a document attribute editing unit with which a user edits an attribute held in the document attribute storage unit or defines a new attribute; a document attribute editing history storage unit that holds the history of edits made in the document attribute editing unit in association with the attribute ID of the attribute from which each edit was derived; and an analysis processing execution unit that searches the document storage unit for documents matching a search condition input by the user, retrieves the editing history corresponding to the attribute in the input search condition from the document attribute editing history storage unit and the document attribute storage unit, adds the attribute name and data items associated with the source attribute ID to the search condition, and searches for the matching documents.
[0007]
The invention of claim 5 of the present application is a computer-readable recording medium on which is recorded a program that causes a computer to function as: a document attribute storage unit that holds attributes, each consisting of an attribute ID that is an identifying name, an attribute name and a plurality of data items associated with the attribute ID, and a data type for each data item; a document storage unit that holds a plurality of documents, each assigned one of the attributes; a document attribute editing unit with which a user edits an attribute held in the document attribute storage unit or defines a new attribute; a document attribute editing history storage unit that holds the history of edits made in the document attribute editing unit in association with the attribute ID of the attribute from which each edit was derived; and an analysis processing execution unit that searches the document storage unit for documents matching a search condition input by the user, retrieves the editing history corresponding to the attribute in the input search condition from the document attribute editing history storage unit and the document attribute storage unit, adds the attribute name and data items associated with the source attribute ID to the search condition, and searches for the matching documents.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
The configuration and operation of the embodiment of the present invention will be described below in detail with reference to the drawings.
(1) Functional configuration
FIG. 1 is a block diagram showing a functional configuration of a document processing apparatus as an example of an embodiment of the present invention.
The document processing apparatus according to the present embodiment includes a control unit 100, a document registration unit 110, a document attribute assigning unit 120, a document attribute editing unit 130, an analysis processing execution unit 140, a document storage unit 150, a document attribute storage unit 160, and a document attribute editing history storage unit 170.
The control unit 100 controls the entire document processing apparatus when the user designates document registration, document attribute editing, document attribute assignment, analysis execution, etc. by operating an input device such as a keyboard.
The document registration unit 110 inputs a document via an input device such as a scanner, a camera, and a keyboard, or a network, and registers the input document in the document storage unit 150. The document registration process and its result can be confirmed by an output device such as a display or a printer.
[0009]
The data structure of the document storage unit 150 holds, for each document, a document identifier (document ID) for managing the document, a document name, bibliographic items (registration date, registrant name, update date, document format (for example, text, image, or audio information), document size, notes, and so on), an attribute ID, a data item list, a content list, and so on (see FIGS. 2 and 3). Here, the attribute ID is a unique name that identifies an attribute registered in the document attribute storage unit 160, and the data item list holds the actual values corresponding to the data items of the attribute indicated by that attribute ID. The content list is a list of identifiers of the files that store the document's content (text, image, audio, and other data).
The document attribute editing unit 130 defines the name of an attribute to be assigned to documents, the names of the items in the attribute, the data types of those items, and so on, and registers the attribute in the document attribute storage unit 160. An attribute that has already been registered can be retrieved from the document attribute storage unit 160, edited, and re-registered in the document attribute storage unit 160 to update it. Attributes themselves and individual data items can also be deleted.
The process and result of editing a document attribute can be checked on an output device such as a display or a printer.
The data structure of the document attribute storage unit 160 (see FIGS. 4 and 5) defines the data items of the attributes assigned to documents, and each attribute is identified by its attribute ID. That is, in addition to its bibliographic items, each document has the variable data items defined here, and these data items correspond to those in the document storage unit 150. Each variable data item is a triple consisting of a flag indicating whether the item is in use, the data type of the item (character, character string, date, name, numeric, logical, and so on), and the name of the item.
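The two data structures described above can be sketched as plain records. This is only an illustrative sketch: the class names, field names, attribute ID “AID001”, and sample values are assumptions made for the example and do not come from FIGS. 2 to 5.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataItem:
    """One variable data item: the (in-use flag, data type, name) triple."""
    in_use: bool
    data_type: str
    name: str


@dataclass
class AttributeDef:
    """An attribute as held in the document attribute storage unit 160."""
    attribute_id: str
    name: str
    items: list[DataItem] = field(default_factory=list)


@dataclass
class DocumentRecord:
    """One entry of the document storage unit 150 (field names are hypothetical)."""
    document_id: str
    document_name: str
    bibliographic: dict[str, Any]
    attribute_id: str          # points at an AttributeDef
    item_values: list[Any]     # the "data item list": one value per data item
    content_files: list[str]   # the "content list": file identifiers


def values_by_item_name(doc: DocumentRecord, attr: AttributeDef) -> dict[str, Any]:
    """Pair each stored value with its data item, skipping items not in use."""
    if doc.attribute_id != attr.attribute_id:
        raise ValueError("document does not carry this attribute")
    return {
        item.name: value
        for item, value in zip(attr.items, doc.item_values)
        if item.in_use
    }


# Hypothetical sample data for illustration only.
abc = AttributeDef("AID001", "abc", [
    DataItem(True, "date", "document printing date"),
    DataItem(True, "string", "drawing distinction"),
])
doc = DocumentRecord(
    document_id="D0001",
    document_name="Iroha",
    bibliographic={"registrant": "example user"},
    attribute_id="AID001",
    item_values=["2001-02-14", "plan view"],
    content_files=["iroha.txt"],
)
print(values_by_item_name(doc, abc))
```

Keeping the values in the document record and the item definitions in the attribute record mirrors the split between storage units 150 and 160; the attribute ID is the only link between them.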
[0010]
FIG. 5 shows an example of an attribute used in the description of the present embodiment. Any name (for example, “abc”) can be defined as the attribute name. As in this example, an attribute can group several items, each defined by a data item name and a data type (the type of its value); for instance, a data item named “document printing date” with the data type “date”.
An example of editing an attribute registered in the document attribute storage unit 160 in FIG. 5 is shown in FIG. 6 (state before editing) and FIG. 7 (state after editing).
In FIG. 6, the attribute name is changed from “abc” to “xyz”, the data item name “document printing date” is changed to “last printing date”, the data type of the data item “drawing distinction” is changed from “character string” to “numeric”, and a data item named “customer code” with the data type “numeric” is added. The attribute “abc” itself can also be deleted.
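The edit from “abc” to “xyz” can be sketched as a function that derives a new definition and leaves the original untouched. The dictionary layout, the function name `edit_attribute`, and the attribute IDs are assumptions made for this example; the patent does not prescribe an implementation.

```python
def edit_attribute(attr, new_id, new_name, renames=None, retypes=None, additions=None):
    """Return an edited copy of an attribute definition.

    renames   maps old item names to new item names
    retypes   maps old item names to new data types
    additions is a list of (name, data_type) pairs for new items
    """
    renames = renames or {}
    retypes = retypes or {}
    items = []
    for item in attr["items"]:
        items.append({
            "name": renames.get(item["name"], item["name"]),
            "type": retypes.get(item["name"], item["type"]),
            "in_use": item["in_use"],
        })
    for name, data_type in (additions or []):
        items.append({"name": name, "type": data_type, "in_use": True})
    return {"id": new_id, "name": new_name, "items": items}


# Attribute "abc" before editing (IDs are hypothetical).
abc = {
    "id": "AID001",
    "name": "abc",
    "items": [
        {"name": "document printing date", "type": "date", "in_use": True},
        {"name": "drawing distinction", "type": "string", "in_use": True},
    ],
}

# The edit described in the text: rename, retype, and add one item.
xyz = edit_attribute(
    abc,
    new_id="AID002",
    new_name="xyz",
    renames={"document printing date": "last printing date"},
    retypes={"drawing distinction": "numeric"},
    additions=[("customer code", "numeric")],
)
print([(i["name"], i["type"]) for i in xyz["items"]])
```

Returning a new definition under a new ID, rather than mutating the old one, keeps both versions available, which is what the editing history described later relies on.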
The attribute editing operation may be performed using a screen as shown in FIG.
The registered attribute reference screen shown in the upper part of FIG. 8 displays a list of the attributes already registered in the document attribute storage unit 160. The new definition screen in the lower part of FIG. 8 is the editing screen for a newly defined attribute; in this example, the attribute “new” is being defined. On this new definition screen, the name and type of each item are defined one by one. An entire attribute, or some of the items it contains, can also be selected from the registered attribute reference screen and used in the definition. Here, all data items of the attributes “abc” and “ghi” are selected to form the new attribute “new”.
A new attribute can also be defined by using two or more of the attributes shown on the registered attribute reference screen.
For example, when the attributes “a”, “b”, “c”, and “d” are defined on the new definition screen from the attributes “A” and “B” shown on the registered attribute reference screen, the relationships shown in FIG. 9 are possible. From the left in the figure:
・Define a from all items of attribute A plus newly defined items
・Define b by selecting some of the items of attribute A
・Define c from the union of the items of attribute A and attribute B
・Define d from the intersection of the items of attribute A and attribute B
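The four relationships of FIG. 9 map directly onto set operations over item names. The item names below are invented for illustration; only the four operations come from the text.

```python
# Item names of two registered attributes (hypothetical sample data).
A = {"print date", "approver", "drawing distinction"}
B = {"approver", "customer code"}

a = A | {"remarks"}   # all items of A plus a newly defined item
b = {"print date"}    # a selected part of the items of A
c = A | B             # union ("sum") of the items of A and B
d = A & B             # intersection ("product") of the items of A and B

assert b <= A         # b really is a subset of A
print(sorted(c))
print(sorted(d))
```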
[0011]
The document attribute assigning unit 120 associates a document retrieved from the document storage unit 150 with an attribute retrieved from the document attribute storage unit 160, and assigns values to the items in that attribute. The associated document and attribute are registered in the document storage unit 150. The process and result of associating an attribute with a document can be checked on an output device such as a display or a printer.
For example, the document attribute assigning unit 120 will be described with reference to FIG. 10. In this example, an attribute is being associated with a document that the document registration unit 110 is registering in the document storage unit 150, or with the document “Iroha” that is already registered in the document storage unit 150.
In FIG. 10, the attribute “xyz” already assigned to the document “Iroha” is changed by selecting the attribute “abc” from the list of attribute names registered in the document attribute storage unit 160 (the right part of the screen). After the attribute “abc” is selected, a value is entered for each item defined in that attribute, as shown in FIG. 11. These edits are registered in the document storage unit 150 in association with the document's content.
The analysis processing execution unit 140 retrieves a document group from the document storage unit 150 after the user sets search and classification conditions from an input device such as a keyboard and a pointing device, and refers to the specified conditions. Processing such as search and classification is executed (details will be described later). The analysis process of the document group and the result can be confirmed by an output device such as a display or a printer.
[0012]
The document attribute editing history storage unit 170 records a history of the editing processing executed by the document attribute editing unit 130. Its data structure (see FIG. 12) represents each registered edit of an attribute as a triple of the attribute ID, the edit date, and the attribute ID of the attribute from which the edit was derived. The edit date in the data structure of FIG. 12 may instead be added to the data structure of the document attribute storage unit 160, so that the document attribute storage unit 160 and the document attribute editing history storage unit 170 are managed as one.
For example, FIG. 12 shows a case in which the attribute “abc” of FIG. 6 was newly defined on March 31, 1983, was edited into the attribute “xyz” of FIG. 7 on July 28, 1996, and the attribute “xyz” was then changed again on the same day. When a data item such as “customer code” is deleted, as in this edit, its in-use flag is set to “No”.
To express this attribute editing history, the data structure of the document attribute storage unit 160 can be extended as shown in FIG. 13: for each data item of an attribute, a “management No.” that is unique within the document attribute storage unit 160 and a “from” field indicating which data item of which attribute was reused are added, together with the edit date of the attribute. With this structure, the document attribute editing history storage unit 170 is integrated into the document attribute storage unit 160, and two or more source attributes of an edit can be expressed.
For example, as shown in FIG. 14 (document attribute storage unit 160), consider defining a new attribute “OPQ” using some of the data items of the attribute “abc” defined on June 10, 1982.
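The history of FIG. 12 can be sketched as a list of (attribute ID, edit date, source attribute ID) triples, with a helper that follows the source links back to the original definition. The dates are taken from the text; the attribute IDs, the `EditRecord` type, and the `ancestors` function are assumptions made for this example.

```python
from datetime import date
from typing import NamedTuple, Optional


class EditRecord(NamedTuple):
    attribute_id: str
    edit_date: date
    source_id: Optional[str]  # None for a newly defined attribute


# "abc" defined, edited into "xyz", then "xyz" changed again the same day.
HISTORY = [
    EditRecord("AID001", date(1983, 3, 31), None),
    EditRecord("AID002", date(1996, 7, 28), "AID001"),
    EditRecord("AID003", date(1996, 7, 28), "AID002"),
]


def ancestors(attribute_id, history):
    """Return the IDs this attribute was derived from, nearest source first."""
    source_of = {record.attribute_id: record.source_id for record in history}
    chain = []
    seen = {attribute_id}  # guards against accidental cycles in the data
    current = source_of.get(attribute_id)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = source_of.get(current)
    return chain


print(ancestors("AID003", HISTORY))
```

A single source ID per record can only describe a linear chain of edits, which is why the text goes on to describe a per-item “from” field for attributes built from several sources.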
[0013]
First, the data item “print date” of the attribute “abc” has been assigned the management number “1”, which is unique within the document attribute storage unit 160. When this data item is reused, the management number “1” is substituted into “from” of the corresponding data item of the attribute “OPQ”, and a unique management number “101” is assigned as that data item's own management number. Next, the data item “approver” of the attribute “abc” has been assigned the management number “2”, likewise unique within the document attribute storage unit 160. When it is reused in the attribute “OPQ” under the new name “affiliation head”, the management number “2” is substituted into “from”, the name and type are changed, and a unique management number “102” is assigned as its own management number.
If a new attribute “UVW” is defined using all the data items of the attribute “abc”, “1 / AID001” is substituted into “from” of the data item “print date” of the new attribute, and a unique management number “201” is assigned as its own management number. “AID001” on the right side of “/” is the unique identifier of the attribute “abc” edited and registered on June 10, 1982, and “1” on the left side is the unique management number of the data item “print date” of that attribute.
Further, an edit that changes only the attribute name, as with the attribute IDs “AID001” and “AID002” or “AID005” and “AID006” in FIG. 14, may be used, for example, when the attribute name “1999 outgoing document” is changed to “2000 outgoing document” in 2000, or when the attribute name “development room” is changed to “first development section” because of an organizational change or the like.
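The management numbers and “from” values in this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the structure of FIG. 14 itself: the class `DataItem`, the field name `source` (used because `from` is a reserved word in Python), and the helper `parse_from` are hypothetical; data types and edit dates are omitted; and placing item “201” under the attribute “UVW” is an interpretation of the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataItem:
    management_number: int         # unique within the attribute store
    attribute_name: str
    name: str
    source: Optional[str] = None   # the "from" value, e.g. "1" or "1/AID001"


def parse_from(value: str) -> Tuple[int, Optional[str]]:
    """Split a "from" value into (management number, attribute ID or None)."""
    number, _, attribute_id = value.partition("/")
    return int(number.strip()), (attribute_id.strip() or None)


# Items following the example in the text (assumed layout).
items = [
    DataItem(1, "abc", "print date"),
    DataItem(2, "abc", "approver"),
    DataItem(101, "OPQ", "print date", source="1"),
    DataItem(102, "OPQ", "affiliation head", source="2"),
    DataItem(201, "UVW", "print date", source="1/AID001"),
]
```

Here `parse_from("1/AID001")` separates the management number of the source data item from the identifier of the source attribute, while a bare value such as `"2"` carries no attribute identifier.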
[0014]
(2) Search processing in the analysis processing execution unit
FIG. 15 is a flowchart for explaining a processing procedure of search processing in the analysis processing execution unit 140.
First, the user inputs a search condition for searching for a document registered in the document storage unit 150 from an input device such as a keyboard (step S100).
It is checked whether or not the input search condition includes an attribute name or an attribute item name (step S110). If not included, the process proceeds to step S150, and a normal search is executed with the search condition.
If it is included, the document attribute storage unit 160 and the document attribute editing history storage unit 170 are referred to, and the editing history corresponding to the attribute name or attribute item name included in the search condition is extracted (step S120).
It is checked whether an editing history exists for the attribute name or attribute item name included in the search condition (step S130). If not, the process proceeds to step S150. If there is, a new search condition derived from the extracted history is added (step S140).
For example, referring to FIGS. 5 and 12, when the search condition specifies a search for the document group having the attribute “xyz”, a condition is added so that the document group having the attribute “abc”, from which this attribute was created, is also searched.
In addition, when the value of the data item “approver” of the attribute “xyz” is specified as a search condition, a search condition that includes the corresponding data item “document checker” of the attribute “abc”, from which the attribute was created, is added.
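Steps S120 to S140 can be sketched as a function that widens one condition using the recorded history. The sketch below is illustrative only: the lookup tables `ATTRIBUTE_SOURCES` and `ITEM_SOURCES`, the function `expand_condition`, and the tuple layout of a condition are hypothetical; the attribute and item names follow the example in the text; and only expansion from an edited attribute back to its source is shown.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-ins for what steps S120/S130 extract from the editing history.
ATTRIBUTE_SOURCES = {"xyz": ["abc"]}
ITEM_SOURCES = {("xyz", "approver"): [("abc", "document checker")]}

Condition = Tuple[str, Optional[str], Optional[str]]  # (attribute, item, value)


def expand_condition(attribute: str,
                     item: Optional[str] = None,
                     value: Optional[str] = None) -> List[Condition]:
    """Return the conditions to be ORed: the input condition first, then added ones (S140)."""
    conditions: List[Condition] = [(attribute, item, value)]
    if item is None:
        # The condition names only an attribute: add each attribute it was created from.
        for source_attribute in ATTRIBUTE_SOURCES.get(attribute, []):
            conditions.append((source_attribute, None, value))
    else:
        # The condition names a data item: add the corresponding item of each source attribute.
        for source_attribute, source_item in ITEM_SOURCES.get((attribute, item), []):
            conditions.append((source_attribute, source_item, value))
    return conditions
```

When no history exists for the attribute, the list contains only the input condition, which corresponds to the “No” branch at step S130.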
[0015]
The editing history of attributes and items is easier to trace when the document attribute storage unit 160 uses the data structure in which each data item carries a management number and a “from” field.
For example, with reference to FIG. 14, if the value of the data item “print date” of the attribute “OPQ” is specified in the input search condition, the data item that was referred to when this data item was created, that is, the data item “print date” of the attribute “abc”, can be found. (When the data item with the management number “101” in FIG. 14 was defined, a record was left that the management number “1” had been used.)
Therefore, the data item having the management number “1” is acquired from the document attribute storage unit 160, and the name “print date” of the data item associated with the attribute “abc” can be added to the search condition.
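The lookup through “from” can be sketched as follows. This is a minimal sketch assuming an in-memory table keyed by management number: the table `ITEMS` and the functions `find_item` and `trace_sources` are hypothetical, and the entries mirror the example values given in the text.

```python
from typing import Dict, List, Optional, Tuple

# management number -> (attribute name, data item name, "from" management number or None)
ITEMS: Dict[int, Tuple[str, str, Optional[int]]] = {
    1: ("abc", "print date", None),
    2: ("abc", "approver", None),
    101: ("OPQ", "print date", 1),
    102: ("OPQ", "affiliation head", 2),
}


def find_item(attribute: str, item_name: str) -> Optional[int]:
    """Return the management number of the named data item, or None if it is unknown."""
    for number, (attr, name, _) in ITEMS.items():
        if attr == attribute and name == item_name:
            return number
    return None


def trace_sources(attribute: str, item_name: str) -> List[Tuple[str, str]]:
    """Follow "from" links and return the (attribute, item name) pairs the item derives from."""
    result: List[Tuple[str, str]] = []
    visited = set()
    number = find_item(attribute, item_name)
    while number is not None and number not in visited:
        visited.add(number)
        source = ITEMS[number][2]
        if source is None or source not in ITEMS:
            break
        source_attribute, source_name, _ = ITEMS[source]
        result.append((source_attribute, source_name))
        number = source
    return result
```

Each pair returned by `trace_sources` is a candidate to add to the search condition, and the loop continues through longer chains if a source item was itself derived from another item.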
A normal search of the document storage unit 150 is then executed using the search condition with these additions (when the process arrives from “No” at step S110 or step S130, the input search condition is used unchanged) (step S150).
The search result is output to an output device such as a display or a printer (step S160). When this result is output, it is convenient for the user if the document group retrieved directly from the search condition specified by the user and the document group retrieved from the condition by which the document processing apparatus expanded the search target are distinguished by color, grouping, or the like.
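Steps S150 and S160 can be sketched as a search that labels each hit by how it was found, so that an output routine could color or group the two sets differently. The sketch is illustrative only: the function `search`, the dictionary layout of a document, and the document names are hypothetical.

```python
from typing import Dict, List, Tuple


def search(documents: List[Dict[str, str]],
           input_attribute: str,
           added_attributes: List[str]) -> List[Tuple[str, str]]:
    """Return (document name, origin) pairs; origin is "direct" or "expanded"."""
    results: List[Tuple[str, str]] = []
    for doc in documents:
        if doc["attribute"] == input_attribute:
            # Matched the condition the user typed.
            results.append((doc["name"], "direct"))
        elif doc["attribute"] in added_attributes:
            # Matched only a condition added from the editing history.
            results.append((doc["name"], "expanded"))
    return results


# Hypothetical documents for illustration.
documents = [
    {"name": "doc1", "attribute": "xyz"},
    {"name": "doc2", "attribute": "abc"},
    {"name": "doc3", "attribute": "OPQ"},
]
```

With the input attribute “xyz” and the added attribute “abc”, `doc1` is labeled as a direct hit and `doc2` as an expanded hit, while `doc3` is not returned.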
With the embodiment configured in this way, the history of edits to the attribute name, the names of the data items included in the attribute, their data types, and so on is recorded. Using this information, when a search is performed with the attribute before or after editing specified as a condition, the search target can be expanded based on the editing history.
[0016]
(3) Classification processing in the analysis processing execution unit
FIG. 16 is a flowchart for explaining the processing procedure of the classification processing in the analysis processing execution unit 140.
First, using an input device such as a keyboard or a mouse, the user designates the document group in the document storage unit 150 that is to be classified, and makes various settings such as the classification method to be applied to the designated document group and the information accompanying it (step S200).
Next, the bibliographic items, contents, and attributes of the document to be classified are quantified (step S210).
For example, when there are a document A and a document B having the same attribute as shown in FIG. 17, one possible quantification is the number of data items whose values differ when the values are compared for an exact match. In this case the difference is 2 (type and document number).
If the difference between the values of the data items is used instead, the difference is 1 + (45 - 30) = 16. The 1 arises because “drawing” and “invoice” are different, and the document number contributes the difference between its values.
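Both quantifications can be checked with a short calculation. In this sketch, which of the two documents holds the document number 45 and which holds 30 is an assumption, as are the function names and the dictionary layout; only the two data items named in the text are modeled.

```python
# Assumed assignment of values to documents A and B; FIG. 17 is not reproduced here.
doc_a = {"type": "drawing", "document number": 45}
doc_b = {"type": "invoice", "document number": 30}


def count_differing_items(a: dict, b: dict) -> int:
    """Number of data items whose values are not an exact match."""
    return sum(1 for key in a if a[key] != b.get(key))


def value_difference(a: dict, b: dict) -> int:
    """Sum of per-item differences: numeric gap for numbers, 1 for any other mismatch."""
    total = 0
    for key in a:
        x, y = a[key], b.get(key)
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)
        elif x != y:
            total += 1
    return total


print(count_differing_items(doc_a, doc_b))  # 2
print(value_difference(doc_a, doc_b))       # 16
```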
[0017]
Here, instead of setting the difference between the drawing and the invoice to 1, the drawing may be defined as, for example, (x, y) = (4, 7) and the invoice as (x, y) = (10, 15), and the difference between these values may be used. In this case, quantities such as “degree of reuse” and “amount of browsing” are assigned to the axes in advance, and each document is converted into numerical values accordingly.
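The text does not say how the two coordinate pairs are turned into a single difference. One possibility, shown here purely as an assumption, is the Euclidean distance between the two points; the function name `type_distance` is hypothetical.

```python
import math

# Coordinates from the text; the choice of Euclidean distance is an assumption.
drawing = (4, 7)
invoice = (10, 15)


def type_distance(p, q) -> float:
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


print(type_distance(drawing, invoice))  # 10.0
```

With these coordinates the offsets are 6 and 8, so the distance is 10, which would replace the fixed value 1 used for a type mismatch.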
Further, not only the difference in the number of items but also the difference in the edit date and time, the difference arising from edits to the name or data type of a data item, and so on may be quantified. For example, as shown in FIG. 18, if the same document C was created on the 12th and edited on the 15th and the 18th, the difference between the versions of the 12th and the 15th may be calculated by multiplying the difference between the document numbers (45 - 30) by the date difference of 3.
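The date-weighted difference in this example works out as follows. The sketch assumes that the version of the 12th and the version of the 15th carry the document numbers 45 and 30 (the text does not say which is which) and represents each date by its day of the month; the function name is hypothetical.

```python
def date_weighted_difference(number_a: int, day_a: int, number_b: int, day_b: int) -> int:
    """Difference between document numbers multiplied by the gap in days."""
    return abs(number_a - number_b) * abs(day_b - day_a)


# Versions of document C on the 12th and the 15th (assumed numbers 45 and 30).
print(date_weighted_difference(45, 12, 30, 15))  # (45 - 30) * 3 = 45
```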
In addition, for quantification of document contents, a method using the difference in the number of included words can be considered.
Furthermore, the difference between two arbitrary documents can be quantified using word frequencies. As an example, consider the case in which morphological analysis is performed on the documents, words are extracted, a “document × word” matrix is created, and the distance between specific document vectors is calculated.
[0018]
With the word set as the columns and the document set as the rows, columns are added to this matrix as follows. The names of the added columns are the management numbers of the data items (for example, “1”, “2”, “101”, “102”).
For example, when a document has the attribute named “OPQ”, a non-zero value (which may be 1 or a weight) is substituted into the elements of the columns named “101” and “102”. In addition, “from” of the attribute “OPQ” is referenced, and a non-zero value is also substituted into the elements of the columns named “1” and “2”.
The distance is then calculated (that is, the documents are classified) using the matrix expanded in this way.
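The column extension can be sketched as follows. This is an illustration under stated assumptions: the vocabulary `WORDS`, the table `ATTRIBUTE_COLUMNS`, the function names, and the use of Euclidean distance are all hypothetical choices, and only the management numbers follow the example in the text.

```python
import math
from typing import Dict, List, Optional

WORDS = ["invoice", "design"]                # hypothetical vocabulary from morphological analysis
MANAGEMENT_COLUMNS = ["1", "2", "101", "102"]

# For each attribute: management numbers of its own items and of the items named in "from".
ATTRIBUTE_COLUMNS = {
    "abc": {"own": ["1", "2"], "from": []},
    "OPQ": {"own": ["101", "102"], "from": ["1", "2"]},
}


def document_vector(word_counts: Dict[str, int],
                    attribute: Optional[str],
                    weight: float = 1.0) -> List[float]:
    """Word-frequency columns followed by one column per management number."""
    vector = [float(word_counts.get(word, 0)) for word in WORDS]
    info = ATTRIBUTE_COLUMNS.get(attribute, {"own": [], "from": []})
    marked = set(info["own"]) | set(info["from"])
    vector += [weight if column in marked else 0.0 for column in MANAGEMENT_COLUMNS]
    return vector


def distance(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two document vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


opq_doc = document_vector({"invoice": 1}, "OPQ")
abc_doc = document_vector({"invoice": 1}, "abc")
other_doc = document_vector({"invoice": 1}, None)
```

Because the “OPQ” document also receives non-zero values in columns “1” and “2”, it ends up closer to the “abc” document than to a document with no related attribute, even though the three documents share the same words.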
For the documents quantified in this way, the designated document group is classified by a classification method specified by the user, such as cluster analysis (step S220), and the classification result is output to an output device such as a display or a printer and presented to the user (step S230). With the embodiment configured as described above, the history of edits to the attribute name, the names of the data items included in the attribute, and their data types is recorded, so the following effects can be achieved.
(1) Even when an attribute from before or after editing is specified as a search condition, the search target can be expanded based on the editing history, so the user can search without being concerned about changes to the attribute.
(2) When a document group is classified, the proximity between the attributes before and after editing can be quantified and used, so the document group is classified in a way that reflects the document attribute editing history.
[0019]
<Embodiment by computer system>
The document processing apparatus of the present invention can also be realized by a network-connected computer system as shown in FIG.
That is, the user's client 200 and the server 300 managing document data are connected via the network 9. A plurality of clients 200 and servers 300 may be connected if necessary.
The network 9 is a transmission path connecting the users' clients 200 and the servers 300. It is generally realized by cable, with TCP/IP used as the communication protocol. However, the transmission path is not limited to cable and may be a wireless LAN or broadcast waves, as long as the communication protocols on both sides match.
These client 200 and server 300 are constituted by general-purpose computers.
That is, the input device 1 includes a keyboard, a mouse, a touch panel, a scanner, and the like, and is used for inputting information.
The output device 2 includes a display and a printer that output various output information and information input from the input device 1.
A CPU (Central Processing Unit) 3 operates various programs.
The memory 4 holds the program itself, and holds information that is temporarily created when the program is executed by the CPU 3.
The storage device 5 holds data, programs, temporary information at the time of program execution, and the like.
[0020]
The medium driving device 6 is used to load a recording medium storing programs, data, and the like, read them, and store them in the memory 4 or the storage device 5. It may also be used for direct data input/output and program execution.
The network connection device 7 is an interface for connecting the client 200 and the server 300 to the network 9.
The bus 8 connects the above parts.
In such a computer, the functions constituting the client 200 shown in FIG. 1 and the functions of the server 300 for managing the document storage unit 150, the document attribute storage unit 160, and the document attribute editing history storage unit 170 are each implemented as programs and written to a recording medium such as a CD-ROM. The CD-ROM is mounted on the medium driving device 6, such as a CD-ROM drive, of the client 200 or the server 300, and the programs are loaded into the memory 4 or the storage device 5 and executed, whereby the functions of the embodiment described above can be realized.
As the recording medium, any of a semiconductor medium (for example, a ROM or an IC memory card), an optical medium (for example, a DVD-ROM, MO, MD, or CD-R), or a magnetic medium (for example, a magnetic tape or a flexible disk) may be used.
A program for realizing the functions of the document processing apparatus of the present invention can be distributed in the form of a medium.
Also, a program for realizing the functions of the document processing apparatus of the present invention can be stored in a storage device such as a magnetic disk and provided by distribution over a wired or wireless communication network, for example by download, or by distribution using broadcast waves.
[0021]
[Effect of the invention]
As described above, according to the present invention, even when the name of a document attribute, or the name or data type of a data item included in the attribute, has been changed, search and classification can be performed using the attributes from before and after the change. This reduces the effort the user must spend on personally managing changes to document attributes.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a document processing apparatus according to an example of an embodiment of the present invention.
FIG. 2 is a diagram illustrating a data structure of a document storage unit.
FIG. 3 illustrates an example of document registration in a document storage unit.
FIG. 4 is a diagram illustrating a data structure of a document attribute storage unit.
FIG. 5 is a diagram illustrating an example of registering attributes in a document attribute storage unit.
FIG. 6 is a diagram illustrating an example of editing an attribute (a state before editing).
FIG. 7 is a diagram illustrating an example of editing an attribute (a state after editing).
FIG. 8 is a diagram illustrating an example of attribute editing work using a screen.
FIG. 9 is a conceptual diagram when a new attribute is defined using two or more attributes.
FIG. 10 is a diagram illustrating an example when attributes are assigned to a document.
FIG. 11 is a diagram illustrating an example when a value is set to a data item belonging to a selected attribute.
FIG. 12 is a diagram illustrating a registration example of a document attribute editing history storage unit.
FIG. 13 is a diagram showing another data structure of the document attribute storage unit.
FIG. 14 is a diagram illustrating an example of registration of attributes in a document attribute storage unit.
FIG. 15 is a flowchart for explaining a processing procedure of search processing in an analysis processing execution unit.
FIG. 16 is a flowchart for explaining a processing procedure of classification processing in an analysis processing execution unit.
FIG. 17 is a diagram for explaining quantification in classification processing.
FIG. 18 is a diagram for explaining quantification in classification processing.
FIG. 19 is a configuration diagram showing computer hardware for realizing a client and a server of a document processing apparatus in which the present invention is implemented.
[Explanation of symbols]
1 ... Input device
2 ... Output device
3 ... CPU
4 ... Memory
5 ... Storage device
6 ... Medium driving device
7 ... Network connection device
8 ... Bus
9 ... Network
100 ... Control unit
110 ... Document registration unit
120 ... Document attribute assigning unit
130 ... Document attribute editing unit
140 ... Analysis processing execution unit
150 ... Document storage unit
160 ... Document attribute storage unit
170 ... Document attribute editing history storage unit
200 ... Client
300 ... Server

Claims

A document processing apparatus for searching for electronic documents using a computer, comprising:
a document attribute storage unit that holds attributes, each consisting of an attribute ID that is an identification name, an attribute name and a plurality of data items associated with the attribute ID, and a data type for each data item;
a document storage unit that holds a plurality of documents, with the attribute assigned to each document;
a document attribute editing unit with which a user edits the attribute held in the document attribute storage unit or newly defines an attribute;
a document attribute editing history storage unit that holds the editing history of edits made by the document attribute editing unit in association with the attribute ID that was the source of the edit; and
an analysis processing execution unit that searches the document storage unit for documents matching a search condition input by a user,
wherein the analysis processing execution unit extracts the editing history corresponding to the attribute in the input search condition from the document attribute editing history storage unit and the document attribute storage unit, adds the attribute name and data items associated with the attribute ID that was the source of the edit to the search condition, and searches for the corresponding documents.

The document processing apparatus according to claim 1, wherein the analysis processing execution unit outputs a search result to an output device.

A document processing method for searching for electronic documents using a computer, wherein:
a document attribute storage unit holds attributes, each consisting of an attribute ID that is an identification name, an attribute name and a plurality of data items associated with the attribute ID, and a data type for each data item;
a document storage unit holds a plurality of documents, with the attribute assigned to each document;
a user edits the attribute held in the document attribute storage unit or newly defines an attribute by means of a document attribute editing unit;
a document attribute editing history storage unit holds the editing history of edits made by the document attribute editing unit in association with the attribute ID that was the source of the edit; and
an analysis processing execution unit, when searching the document storage unit for documents matching a search condition input by the user, extracts the editing history corresponding to the attribute in the input search condition from the document attribute editing history storage unit and the document attribute storage unit, adds the attribute name and data items associated with the attribute ID that was the source of the edit to the search condition, and searches for the corresponding documents.

A program for causing a computer to function as:
a document attribute storage unit that holds attributes, each consisting of an attribute ID that is an identification name, an attribute name and a plurality of data items associated with the attribute ID, and a data type for each data item;
a document storage unit that holds a plurality of documents, with the attribute assigned to each document;
a document attribute editing unit with which a user edits the attribute held in the document attribute storage unit or newly defines an attribute;
a document attribute editing history storage unit that holds the editing history of edits made by the document attribute editing unit in association with the attribute ID that was the source of the edit; and
an analysis processing execution unit that, when searching the document storage unit for documents matching a search condition input by the user, extracts the editing history corresponding to the attribute in the input search condition from the document attribute editing history storage unit and the document attribute storage unit, adds the attribute name and data items associated with the attribute ID that was the source of the edit to the search condition, and searches for the corresponding documents.

A computer-readable recording medium on which is recorded a program for causing a computer to function as:
a document attribute storage unit that holds attributes, each consisting of an attribute ID that is an identification name, an attribute name and a plurality of data items associated with the attribute ID, and a data type for each data item;
a document storage unit that holds a plurality of documents, with the attribute assigned to each document;
a document attribute editing unit with which a user edits the attribute held in the document attribute storage unit or newly defines an attribute;
a document attribute editing history storage unit that holds the editing history of edits made by the document attribute editing unit in association with the attribute ID that was the source of the edit; and
an analysis processing execution unit that, when searching the document storage unit for documents matching a search condition input by the user, extracts the editing history corresponding to the attribute in the input search condition from the document attribute editing history storage unit and the document attribute storage unit, adds the attribute name and data items associated with the attribute ID that was the source of the edit to the search condition, and searches for the corresponding documents.