JP2004078343A

JP2004078343A - Document management system

Info

Publication number: JP2004078343A
Application number: JP2002234592A
Authority: JP
Inventors: Takamichi Sekido; 関戸　崇道
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2002-08-12
Filing date: 2002-08-12
Publication date: 2004-03-11

Abstract

PROBLEM TO BE SOLVED: To provide a document management system that easily and reliably registers search keywords in a database, shortens document search time, and allows keywords that have already been extracted to be reused even when a document is taken out of the database.

SOLUTION: A document management server 1 comprises a keyword extraction processing unit 101a, an external storage device 102, and a registration processing unit 101b. The keyword extraction processing unit 101a cuts out the character strings that form keywords on the basis of keyword identification marks and converts them into character data. The external storage device 102 holds at least one database that stores the electronic data of documents and the character data of keywords. The registration processing unit 101b stores the keyword character data in an attribute information storage area of the electronic data. It then stores in the database both the electronic data and the keyword character data held in that attribute information storage area.

COPYRIGHT: (C)2004,JPO

Description

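[0001]
[Technical Field of the Invention]
The present invention relates to a document management system, and more particularly to a document management system that registers, in a database, character strings that serve as keywords when searching for documents.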
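[0002]
[Prior Art]
Document management systems are conventionally known in which documents are registered as electronic data in a database for centralized management. Multiple users can then search for, retrieve, and use the documents they need from that database by computer.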
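[0003]
When a document to be registered in the database exists on paper, it must first be converted into electronic data. This is done by reading the document optically with a scanner or similar device and representing it as image data.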
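[0004]
When searching for a document, a user enters a search condition with an input device such as a keyboard. The search condition is a character string that the desired document is expected to contain. A control device made up of the computer's CPU (Central Processing Unit) and related components then checks whether each document contains the specified character string, and it retrieves the documents that do. The character string in the search condition is expressed as character codes. If the character strings in the documents being searched exist only as image data, that is, as collections of dots, the two are difficult to compare.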
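[0005]
For this reason, the character strings that are to serve as search targets are expressed as character codes, that is, as character data. These keywords are registered in the database in association with the documents that contain them. A search then checks whether any keyword contains a portion that matches the character string in the search condition. If one does, the document containing that keyword is retrieved.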
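[0006]
One known document management system of this kind registers character strings that an operator types on a keyboard as keywords in a database. This system consists of three parts:
- a document management server that registers documents in a database, manages them, and searches them;
- a document management client that asks the document management server to search for documents;
- a scanner.

The document management server consists of a control device, a keyboard, and a hard disk drive. The control device stores the image data of each document output from the scanner in a document management database held on the hard disk drive. It also stores the character strings typed by the operator as keywords in a keyword database held on the external storage device, in association with the documents that contain them.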
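[0007]
Also known is a document management system in which the document to be registered is itself converted into character data and registered in the database. This system likewise consists of a document management server, a document management client, and a scanner, and the server again consists of a control device, a keyboard, and a hard disk drive. The control device applies OCR (Optical Character Recognition) processing to the image data of each document output from the scanner, converting it into character data. It then stores the character data in a document management database on the hard disk drive.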
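[0008]
[Problems to Be Solved by the Invention]
However, the system that registers operator-typed character strings as keywords has two drawbacks. First, the operator must enter every keyword by keyboard, which is laborious and prone to input errors. Second, the electronic data of a document and its search keywords are stored in the database separately and independently. If the electronic data of a document is taken out of the database to be searched by keyword on another computer, the search keywords must therefore be extracted all over again, which is also laborious.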
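[0009]
The system that converts the documents themselves into character data has a different problem. Each search must check the entire text of every document for the character string in the search condition, so searches take a long time.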
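[0010]
Accordingly, an object of the present invention is to provide a document management system that achieves three things. It can easily and reliably register character strings contained in the documents to be searched as keywords in a database. It keeps document search times short. It also allows keywords that have already been extracted to be reused even when a document is taken out of the database.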
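[0011]
[Means for Solving the Problems]
To solve the above problems, the document management system according to the present invention comprises:
- a document management server that registers documents in a database, manages them, and searches them;
- a document management client that asks the document management server to search for documents;
- an external storage device that holds at least one database storing the electronic data of documents and the character data of keywords;
- a document reading device that reads a document as an image and outputs it as image data. Before the document is read, the character strings to be registered as search keywords are marked with a predetermined keyword identification mark.

The document management server comprises a keyword extraction processing unit and a registration processing unit. The keyword extraction processing unit recognizes the keyword identification marks in the image data output from the document reading device. Based on those marks, it cuts out the image data of the character strings that are to become keywords and converts them into character data. The registration processing unit stores the character data of the keywords in an attribute information storage area of the document's electronic data. It then stores in the database both the electronic data and the keyword character data held in that attribute information storage area.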
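[0012]
According to the present invention, the keyword extraction processing unit cuts document-search keywords out of the document represented as image data and converts them into character data. The registration processing unit then stores that character data in the database. Keywords are therefore registered in the database without operator input. A document search is performed by checking whether any keyword contains a portion matching the character string in the search condition.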
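[0013]
Also according to the present invention, the registration processing unit stores the character data of each keyword in the attribute information storage area of the electronic data of the document containing that keyword. The document's electronic data is stored in the database with that keyword character data still in the attribute information storage area. As a result, the electronic data of a document can be taken out of the database and searched by keyword on another computer. The search keywords can be registered on that computer from the keywords already held in the attribute information storage area.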
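[0014]
The document management system according to claim 2 is the invention according to claim 1 with two further features. The keyword identification mark is drawn with a writing instrument near the character string to be registered as a keyword. The keyword extraction processing unit recognizes the position of the keyword identification mark and cuts out the character string that lies within a fixed range of that position.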
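[0015]
According to this aspect, the keyword extraction processing unit recognizes the position of the keyword identification mark and cuts out the character string that lies within a fixed range of that position. The whole of the intended keyword string is cut out when the mark encloses the entire string. It is also cut out in full when the mark covers only part of the string, or when the mark is placed in the paper margin near it.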
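The position-based cut-out of claim 2 can be sketched as follows. The sketch assumes that the OCR stage has already produced text lines with vertical pixel extents, and that mark recognition yields a bounding box. The names `TextLine`, `Mark`, and `line_for_mark` are hypothetical. The 10-pixel `tolerance` is an assumed stand-in for the "fixed range". Only the vertical position of the mark is compared. A mark drawn around the whole string, over part of it, or in the margin beside it therefore selects the same line.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    text: str    # OCR result for one printed line
    top: int     # y coordinate of the line's upper edge, in pixels
    bottom: int  # y coordinate of the line's lower edge, in pixels


@dataclass
class Mark:
    # Bounding box of a recognised keyword identification mark.
    # The horizontal extent is kept for completeness but is not used below.
    left: int
    top: int
    right: int
    bottom: int


def line_for_mark(lines, mark, tolerance=10):
    """Return the text of the line nearest the mark's vertical centre, or None.

    A line qualifies if the centre falls inside it or within `tolerance`
    pixels of its upper or lower edge (a hypothetical "fixed range").
    """
    centre_y = (mark.top + mark.bottom) / 2
    best, best_dist = None, None
    for line in lines:
        if line.top <= centre_y <= line.bottom:
            dist = 0.0
        else:
            dist = min(abs(centre_y - line.top), abs(centre_y - line.bottom))
        if dist <= tolerance and (best_dist is None or dist < best_dist):
            best, best_dist = line, dist
    return None if best is None else best.text
```

For example, a small mark in the left margin at y = 105 to 125 selects the line "ABCD" spanning y = 100 to 130, even though the mark does not touch the characters.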
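[0016]
The document management system according to claim 3 is the system according to claim 1 or claim 2 with the following features. A plurality of types of keyword identification marks are defined in advance. The keyword extraction processing unit recognizes the type of each keyword identification mark and cuts out the image data of the character string in a predetermined range that depends on that type. The registration processing unit classifies the keywords by the type of keyword identification mark and stores them in the attribute information storage area of the document's electronic data.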
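[0017]
According to this aspect, the registration processing unit classifies the keywords by the type of keyword identification mark and stores them in the attribute information storage area of the document's electronic data. The electronic data is stored in the database with the keywords still held in that area. The keywords are therefore registered in the database already classified, for example as the document title, the author's name, or a sentence that expresses the content of the document.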
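[0018]
Also according to this aspect, the range from which a character string is cut out differs with the type of keyword identification mark. Choosing the mark type to suit the string being registered therefore cuts out a string of appropriate length. For a sentence, a mark type with a long cut-out range can be used. For the author's name, a mark type with a short cut-out range can be used. Mark types can be distinguished, for example, by shape (circle, ellipse, polygon, etc.), by line type (thick, broken, etc.), or by color (black, red, etc.).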
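A cut-out range that depends on the mark type can be modeled by mapping each type to a text unit, then expanding outward from the mark's position in the recognized text. This is a minimal sketch under several assumptions:
- The OCR text is available as a single string.
- Line breaks are `\n`, and paragraphs are separated by blank lines.
- Sentences end in `。` or `.`.

The `Unit` enum, `cut_out`, and the `RANGE_FOR_MARK` table are hypothetical. The phrase unit mentioned in [0026] is omitted because it needs morphological analysis.

```python
from enum import Enum


class Unit(Enum):
    LINE = "line"
    SENTENCE = "sentence"
    PARAGRAPH = "paragraph"


SENTENCE_ENDS = "。."  # assumed sentence terminators


def cut_out(text, offset, unit):
    """Return the line, sentence, or paragraph of `text` that contains index `offset`."""
    if unit is Unit.SENTENCE:
        # Start just after the previous terminator or line break (or at index 0).
        start = max(text.rfind(c, 0, offset) for c in SENTENCE_ENDS + "\n") + 1
        # End at (and include) the next terminator or line break, if any.
        ends = [i for i in (text.find(c, offset) for c in SENTENCE_ENDS + "\n") if i != -1]
        end = min(ends) + 1 if ends else len(text)
    elif unit is Unit.LINE:
        start = text.rfind("\n", 0, offset) + 1
        end = text.find("\n", offset)
    else:  # Unit.PARAGRAPH: paragraphs are separated by a blank line
        prev = text.rfind("\n\n", 0, offset)
        start = 0 if prev == -1 else prev + 2
        end = text.find("\n\n", offset)
    if end == -1:
        end = len(text)
    return text[start:end].strip()


# Hypothetical mark-type table: marks meant for longer text get longer ranges.
RANGE_FOR_MARK = {
    "triangle": Unit.LINE,
    "rectangle": Unit.SENTENCE,
    "thick_circle": Unit.PARAGRAPH,
}
```

The same mark position yields progressively longer keywords as the unit grows from line to sentence to paragraph. That growth is the behavior the paragraph above relies on.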
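[0019]
[Embodiments of the Invention]
An embodiment of the present invention is described below with reference to FIGS. 1 to 5. FIG. 1 shows an example configuration of a document management system according to the invention. As shown in FIG. 1, the system consists of a document management server 1, a scanner 2, and a document management client 3. These are connected through a network 4 such as a LAN.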
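[0020]
FIG. 2 is a block diagram of an example configuration of the document management system according to the invention. The document management server 1 registers documents in the database, manages the data, and searches for documents. The documents are represented as image data, which is one form of electronic data. The server 1 is a computer and, as shown in FIG. 2, has a control unit 101, an external storage device 102, and an input operation unit 103 such as a keyboard and mouse.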
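[0021]
The control unit 101 consists of a CPU and related components. It has a keyword extraction processing unit 101a, a registration processing unit 101b, a search processing unit 101c, and a memory 101d.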
[0022]
The memory 101d stores the programs for keyword extraction, for registering keywords in the database, and for document search.
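[0023]
The keyword extraction processing unit 101a receives the image data output from the scanner 2. In that image data it recognizes the type and position of each keyword identification mark. Based on the recognized type and position, it cuts out the character string within a fixed range of the mark as a keyword. It then converts that string into text character data, which completes keyword extraction. Both the cutting-out of character strings and their conversion into character data are performed by OCR processing.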
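[0024]
The scanner 2 is a device that optically reads the image data of a document. It includes multifunction peripherals that combine a scanner with a copier or similar machine. The scanner 2 reads a document to be registered in the database as an image and outputs it as image data. The output file format may be, for example, JPEG, BMP, or TIFF. Any format that a computer can handle as an image may be used.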
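[0025]
FIG. 3 gives an example of cutting out character strings. In it, "ABCD" is a title and "○田×郎" is an author's name. "XYZ, YYYYY・・・。" is a sentence that expresses the content of the document well, and "EFGHI" is a subtitle.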
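[0026]
The keyword identification marks are predefined as follows:
- an ellipse for the title;
- a triangle for the author's name;
- a rectangle for a sentence that expresses the content of the document well;
- a double-lined ellipse for the subtitle.

For an ellipse, a triangle, or a double-lined ellipse, the system is preset to cut out the single contiguous character string on the line bearing the mark. For a rectangle, it is set to cut out the one sentence contained in the line bearing the mark. The cut-out range may also be set in other ways. For example, it may be the phrase, line, sentence, or paragraph at the position of the mark.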
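[0027]
In FIGS. 3(a) to 3(c), the keyword identification mark encloses the entire character string to be used as the keyword. In FIGS. 3(d) to 3(f) and 3(j), the mark covers part of the string. In FIGS. 3(g) to 3(i), the mark is placed in the paper margin of the line containing the string.

An ellipse, a triangle, or a double-lined ellipse causes the single contiguous string on the marked line to be cut out. In FIGS. 3(a), (b), (d), (e), (g), (h), and (j), the strings "ABCD", "○田×郎", and "EFGHI" on the lines bearing an ellipse, a triangle, or a double-lined ellipse are therefore cut out, respectively. A rectangle causes the one sentence contained in the marked line to be cut out. In FIGS. 3(c), (f), and (i), the sentence "XYZ, YYYYY・・・。" on the line bearing the rectangle is therefore cut out.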
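The mark-type table of this embodiment can be written down as data, together with its two cut-out rules. This sketch assumes that the OCR text of the marked line is already available. `MarkType`, `MARK_TABLE`, `cut_from_line`, and `classify` are hypothetical names, as are the attribute item labels. The text does not define how a "contiguous string" is delimited, so treating it as the first whitespace-delimited run on the line is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkType:
    shape: str                 # "ellipse", "triangle" or "rectangle"
    double_line: bool = False  # True for the double-lined ellipse


# Embodiment's table: mark type -> (attribute item, cut-out rule)
MARK_TABLE = {
    MarkType("ellipse"): ("title", "chunk"),
    MarkType("triangle"): ("author", "chunk"),
    MarkType("rectangle"): ("content_sentence", "sentence"),
    MarkType("ellipse", double_line=True): ("subtitle", "chunk"),
}


def cut_from_line(line_text, rule):
    """Apply a cut-out rule to the OCR text of the line bearing the mark."""
    text = line_text.strip()
    if rule == "chunk":
        # Assumption: the contiguous string is the first whitespace-delimited run.
        return text.split()[0] if text else ""
    # "sentence": everything up to and including the first terminator on the line.
    for i, ch in enumerate(text):
        if ch in "。.":
            return text[: i + 1]
    return text


def classify(mark, line_text):
    """Return (attribute item, keyword text) for one recognised mark."""
    item, rule = MARK_TABLE[mark]
    return item, cut_from_line(line_text, rule)
```

Keeping the table as data rather than code means a site could add another mark type (for example a red circle) by adding one entry.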
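[0028]
The title, author's name, content sentence, and subtitle are cut out and converted into character data in this way. They are then output to the registration processing unit 101b.
The registration processing unit 101b stores the document-related data in the external storage device 102. This registers both the image data of the document and its document-search keywords in the database.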
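[0029]
The external storage device 102 is a large-capacity storage device such as a hard disk drive. As shown in FIG. 2, it holds two databases. The document management database 102a stores the image data of the documents. The keyword database 102b stores the keywords that are searched when looking up documents registered in the database.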
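[0030]
The registration processing unit 101b first stores the image data of a document output from the scanner 2 as a record in the document management database 102a. Later it receives, from the keyword extraction processing unit 101a, the character data of a keyword and information on the type of keyword identification mark. It then reads the image data of the document containing that keyword from the document management database 102a. It stores the keyword's character data in the attribute information storage area of that image data, classifying the keywords into items according to the type of keyword identification mark.

Next, the registration processing unit 101b stores the image data, with the keyword character data in its attribute information storage area, in the document management database 102a. It also stores the keyword character data held in that area in the keyword database 102b. At this point a pointer-type field is appended to the keyword's character data. The address in the document management database 102a where the image data of the document containing the keyword is stored is written into that field.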
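The registration step can be sketched with two in-memory stores. A document's list index plays the role of its address in the document management database, and each keyword record carries that index as its pointer. `DocumentRecord`, `KeywordRecord`, `Store`, and the item names are hypothetical. The `attributes` dictionary is only a stand-in for the attribute information storage area inside the image file.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    image: bytes                                    # scanned image data
    attributes: dict = field(default_factory=dict)  # stand-in for the attribute area


@dataclass(frozen=True)
class KeywordRecord:
    text: str      # keyword character data
    item: str      # classification from the mark type, e.g. "title"
    doc_addr: int  # pointer to the document's address in the document database


class Store:
    """Hypothetical in-memory document database (102a) and keyword database (102b)."""

    def __init__(self):
        self.documents = []  # list index serves as the document's address
        self.keywords = []

    def store_image(self, image):
        """Store scanned image data as a new record and return its address."""
        self.documents.append(DocumentRecord(image))
        return len(self.documents) - 1

    def register_keywords(self, addr, keywords):
        """Add (item, text) pairs to the document's attributes and index them with a pointer."""
        record = self.documents[addr]
        for item, text in keywords:
            record.attributes.setdefault(item, []).append(text)  # classified by item
            self.keywords.append(KeywordRecord(text, item, addr))
```

The keywords now exist in two places: inside the document record, where they travel with the data, and in the keyword list, where each one points back to its document for fast lookup.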
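[0031]
The search processing unit 101c responds to a search request signal from the document management client 3. It checks whether any keyword in the keyword database contains the character string specified as the search condition, and outputs the result to the client 3.

If a keyword contains that string, the unit follows the pointer in the keyword's data to an address in the document management database. That address holds the image data of the document containing the keyword. The unit then outputs to the client 3 a signal carrying the document's title and similar information. If no keyword contains the string, it outputs a signal to the client 3 indicating that no document matches the search condition.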
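A corresponding sketch of the search processing unit works in three steps:
1. Match the query as a substring of each stored keyword.
2. Follow each matching keyword's pointer to its document.
3. Report that document's title.

A second function serves the later read request by address. The record layout (documents as dictionaries with `image` and `attributes`) and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordRecord:
    text: str      # keyword character data
    doc_addr: int  # pointer: index of the document in `documents`


def search(keywords, documents, query):
    """Return (address, title) pairs for documents having a keyword that contains `query`.

    An empty list corresponds to the "no matching document" signal sent to the client.
    """
    addrs = []
    for kw in keywords:
        if query in kw.text and kw.doc_addr not in addrs:
            addrs.append(kw.doc_addr)
    # Follow each pointer to the stored document and report its title attribute.
    return [(a, documents[a]["attributes"].get("title", ["(untitled)"])[0]) for a in addrs]


def read_document(documents, addr):
    """Serve a document read request: return the image data stored at `addr`."""
    return documents[addr]["image"]
```

Only the short keyword strings are scanned, never the document images or their full text. This is the source of the shorter search time claimed for the invention.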
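[0032]
The search processing unit 101c also handles document read requests. After seeing the search results, the document management client 3 sends a document read request signal. The unit then reads the image data of the requested document from the document management database 102a and outputs it to the client 3.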
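[0033]
The document management client 3 asks the document management server to search for documents registered in the database. The client 3 is a personal computer or similar device. As shown in FIG. 2, it has a control unit 301, an input operation unit 302 such as a keyboard and mouse, and a display unit 303 such as a CRT (Cathode-Ray Tube) display or liquid crystal display.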
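[0034]
The control unit 301 consists of a CPU (Central Processing Unit) and related components. It has a search request processing unit 301a, a search result processing unit 301b, a document read request processing unit 301c, and a memory 301d.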
[0035]
The memory 301d stores the programs for search request processing, search result processing, and document read request processing.
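[0036]
The search request processing unit 301a receives from the input operation unit 302 a signal carrying the character string that the user entered as a search condition. Based on that signal, it outputs a document search request signal to the document management server 1.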
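[0037]
The search result processing unit 301b acts on the search result signal output from the document management server 1. It causes the display unit 303 to show either a list of the titles (and similar information) of documents matching the search condition, or a message that no document matches.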
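[0038]
The document read request processing unit 301c receives from the input operation unit 302 the signal with which the user selects a document to read. Based on that signal, it outputs a document read request signal to the document management server 1. The server 1 then reads out the image data of that document, and the search result processing unit 301b causes the display unit 303 to display it.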
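[0039]
Next, the flow of document registration is described with reference to FIG. 4.
- S1: The keyword extraction processing unit 101a receives the image data output from the scanner 2.
- S2: It recognizes the type and position of each keyword identification mark in that image data.
- S3: Based on the recognized type and position, it cuts out the character string within a fixed range of each mark as a keyword.
- S4: It converts the cut-out string into character data.
- S5: The registration processing unit 101b stores each keyword, now character data, in the attribute information storage area of the image data of the document containing it.

Finally, the registration processing unit 101b stores the image data, with the keyword character data in its attribute information storage area, in the document management database 102a. It also stores the keywords held in that area in the keyword database 102b. This completes registration of the document in the database.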
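The flowchart of FIG. 4 maps naturally onto a function whose mark-recognition and cut-out/OCR stages are passed in as arguments, since the text names neither a recognition algorithm nor an OCR engine. The callables `detect_marks` and `cut_out_and_ocr`, the list-based databases, and the `register_document` name are hypothetical. The sketch also assumes that `detect_marks` reports each mark type using the attribute item it stands for (for example `"title"`). Paragraph [0030] stores the image first and updates it later. This sketch instead follows the FIG. 4 order and stores the image once, after S5.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical stage signatures standing in for mark recognition and OCR.
DetectMarks = Callable[[bytes], Iterable[Tuple[str, tuple]]]  # image -> (mark type, position)
CutOutAndOcr = Callable[[bytes, str, tuple], str]             # image, type, position -> text


def register_document(image: bytes, detect_marks: DetectMarks,
                      cut_out_and_ocr: CutOutAndOcr,
                      doc_db: list, keyword_db: list) -> int:
    """Run the FIG. 4 registration flow for one scanned page and return its address."""
    # S1: `image` is the image data received from the scanner.
    attributes = {}
    # S2: recognise the type and position of each keyword identification mark.
    for mark_type, position in detect_marks(image):
        # S3 + S4: cut out the string near the mark and convert it to character data.
        text = cut_out_and_ocr(image, mark_type, position)
        # S5: store the keyword, classified by mark type, in the attribute area.
        attributes.setdefault(mark_type, []).append(text)
    # Store the image with its attributes, then index each keyword with a pointer.
    doc_db.append({"image": image, "attributes": attributes})
    addr = len(doc_db) - 1
    for mark_type, texts in attributes.items():
        for text in texts:
            keyword_db.append((text, mark_type, addr))
    return addr
```

Because the recognizer and OCR engine are arguments, the flow can be exercised with stub functions before either real component exists.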
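[0040]
Next, the flow of a document search is described.
1. The search request processing unit 301a receives from the input operation unit 302 a signal carrying the character string that the user entered as a search condition. Based on that signal, it outputs a document search request signal to the document management server 1.
2. In response, the search processing unit 101c checks whether any keyword in the keyword database contains that character string. It outputs the result to the document management client 3, and the search result processing unit 301b displays the result on the display unit 303.
3. The document read request processing unit 301c receives from the input operation unit 302 the signal with which the user selects a document to read. Based on that signal, it outputs a document read request signal to the server 1.
4. In response, the search processing unit 101c reads the image data of the requested document from the document management database 102a and outputs it to the client 3. The search result processing unit 301b then displays the document on the display unit 303.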
[0041]
In this embodiment the document management server 1 and the document management client 3 are separate computers, but they may instead run on the same computer.
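[0042]
In this embodiment the external storage device 102 holds two databases: the document management database 102a, which stores the document image data, and the keyword database 102b, which stores the keywords. It may instead hold a single database in which both the image data and the keywords are stored.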
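For the single-database variant, a relational store with one table for images and one for keywords is a natural fit. This minimal sketch uses Python's built-in `sqlite3` module. That choice is an assumption, since the text names no database product. Substring matching uses SQLite's core `instr()` function, so characters such as `%` in a query are not treated as wildcards. The table, column, and function names are hypothetical.

```python
import sqlite3


def open_single_db():
    """Open one SQLite database holding both document images and their keywords."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE documents (
            id    INTEGER PRIMARY KEY,
            image BLOB NOT NULL
        );
        CREATE TABLE keywords (
            keyword TEXT NOT NULL,
            item    TEXT NOT NULL,  -- title, author, ...
            doc_id  INTEGER NOT NULL REFERENCES documents(id)
        );
    """)
    return con


def add_document(con, image, keywords):
    """Insert an image and its (item, keyword) pairs; return the new document id."""
    cur = con.execute("INSERT INTO documents (image) VALUES (?)", (image,))
    doc_id = cur.lastrowid
    con.executemany(
        "INSERT INTO keywords (keyword, item, doc_id) VALUES (?, ?, ?)",
        [(text, item, doc_id) for item, text in keywords],
    )
    con.commit()
    return doc_id


def find_documents(con, query):
    """Return ids of documents having a keyword that contains `query` (case-sensitive)."""
    rows = con.execute(
        "SELECT DISTINCT doc_id FROM keywords WHERE instr(keyword, ?) > 0 ORDER BY doc_id",
        (query,),
    )
    return [row[0] for row in rows]
```

The `doc_id` column plays the role of the pointer described in [0030]. A join on it retrieves the image for any matching keyword.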
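[0043]
As described above, this embodiment reads a document to be registered as image data. It extracts the keywords the document contains, converts them into character data, and registers them in the database as search targets. Because keywords are registered without operator input, they can be registered easily and reliably. Searches also take little time, because a search only checks whether the keywords contain a portion matching the character string in the search condition.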
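[0044]
Also in this embodiment, the whole of the intended keyword string is cut out when the keyword identification mark encloses the entire string. It is also cut out in full when the mark covers only part of the string or is placed in the paper margin near it. Marking keywords therefore takes little effort. In addition, the marks need not be drawn with a colored pen. Keywords can therefore be extracted even when the document is read by a scanner that captures only black-and-white data.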
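[0045]
Furthermore, the keyword character data is stored in the attribute information storage area of the image data of the document containing the keyword, and this image data is stored in the document management database. Even when the image data of a document is taken out of the database, the keywords stored in its attribute information storage area therefore remain available.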
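[0046]
[Effects of the Invention]
According to the invention of claim 1, the keyword extraction processing unit cuts document-search keywords out of the document represented as image data and converts them into character data. The registration processing unit then stores that character data in the database. Keywords can therefore be registered easily and reliably. Searches take little time, because a search only checks whether the keywords contain a portion matching the character string in the search condition. In addition, the electronic data of a document can be taken out of the database and searched by keyword on another computer. On that computer, the search keywords can be registered from the keywords stored in the attribute information storage area of the electronic data. This saves the effort of extracting the keywords again.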
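[0047]
According to the invention of claim 2, the keyword extraction processing unit cuts out the character string that lies within a fixed range of the position of the keyword identification mark. The whole of the intended keyword string is cut out when the mark encloses the entire string. It is also cut out in full when the mark covers only part of the string or is placed in the paper margin near it. Marking keywords therefore takes little effort.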
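[0048]
According to the invention of claim 3, keywords are registered in the database already classified, for example as the document title, the author's name, or a sentence expressing the content of the document. Documents can therefore be searched by category. In addition, choosing the type of keyword identification mark to suit the string being registered cuts out a string of appropriate length.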
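[Brief Description of the Drawings]
[FIG. 1] A diagram showing an example of the overall configuration of a document management system according to the present invention.
[FIG. 2] A block diagram showing an example configuration of the document management system according to the present invention.
[FIG. 3] A diagram showing examples of keyword extraction in the document management system according to the present invention.
[FIG. 4] A flowchart showing the processing for registering a document in the database in the document management system according to the present invention.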
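[Description of Reference Numerals]
1     Document management server
2     Scanner
3     Document management client
101a  Keyword extraction processing unit
101b  Registration processing unit
101c  Search processing unit
102a  Document management database
102b  Keyword database
301a  Search request processing unit
301b  Search result processing unit
301c  Document read request processing unit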
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a document management system, and more particularly, to a document management system that registers a character string serving as a keyword when searching for a document in a database.
[0002]
[Prior art]
2. Description of the Related Art Conventionally, a document management system has generally been used in which a document is generally registered as electronic data in a database and centrally managed, and a plurality of users can search, retrieve, and use a required document from the database using a computer. is there.
[0003]
Here, when a document to be registered in the database is shown on paper, the document needs to be represented by electronic data. Therefore, the document is optically read by a scanner or the like and is represented as image data.
[0004]
When a user searches for a document, a character string that is considered to be included in the document to be searched is input as a search condition using an input device such as a keyboard. Then, a control device including a CPU (Central Processing Unit) or the like constituting the computer searches for a character string specified by a search condition in the document, and determines whether the document includes the character string. To retrieve the document by reading. Here, the character string specified by the search condition is represented by a character code. Therefore, in image data in which a character string included in a document to be searched is represented as a set of dots, it is difficult to compare the two.
[0005]
Therefore, when a document is searched, a character string to be searched is represented by a character code, that is, character data, and is associated with a document containing the keyword as a keyword and registered in a database. . Thus, a search is made for a keyword by searching for a portion that matches the character string specified by the search condition, and when there is a match, the document containing the keyword is read, thereby searching the document. .
[0006]
Conventionally, as a document management system that registers a keyword in a database, a document management system that registers a character string input by an operator from a keyboard as a keyword in a database is known. This document management system includes a document management server that registers and manages documents in a database and searches for documents, a document management client that requests the document management server to search for documents, and a scanner. The document management server includes a control device, a keyboard, and a hard disk device. The control device stores the image data of the document output from the scanner in the document management database stored in the hard disk device, associates the character string input from the keyboard by the operator with the document including the keyword as a keyword, and outputs the document to the external device. This is stored in a keyword database stored in a storage device.
[0007]
There is also known a document management system in which a document to be registered in a database is represented by character data and is registered in the database. This document management system includes a document management server, a document management client, and a scanner. The document management server includes a control device, a keyboard, and a hard disk device. The control device converts the image data of the document output from the scanner into character data by performing OCR (Optical Character Recognition) processing, and stores the character data in a document management database stored in a hard disk device.
[0008]
[Problems to be solved by the invention]
However, in a document management system in which a character string input by an operator from a keyboard is registered as a keyword in a database, the operator inputs the keyword from the keyboard, which is troublesome and may cause an input error. In addition, in order to store the electronic data of the document and the search keyword separately in the database, if the electronic data of the document is extracted from the database and the search is to be performed by another computer using the keyword, the search for the keyword is performed again. Keywords must be extracted, which is troublesome.
[0009]
On the other hand, a document management system that expresses a document itself to be registered in a database as character data and registers this in the database, searches for the entire document including a character string specified by a search condition when searching for a document. Therefore, there is a problem that the search time becomes longer.
[0010]
Therefore, according to the present invention, a character string included in a document to be searched can be easily and reliably registered as a keyword in a database, and the retrieval time of the document is short. Moreover, even when the document is extracted from the database, the keyword already extracted is used. An object of the present invention is to provide a document management system that can be used.
[0011]
[Means for Solving the Problems]
In order to solve the above problems, a document management system according to the present invention includes a document management server that registers and manages a document in a database and searches for the document, a document management client that requests the document management server to search for the document, An external storage device that stores at least one database that stores electronic data of a document and character data of a keyword; and reads a document in which a character string to be registered in advance as a search keyword with a predetermined keyword identification mark as an image is read as an image. A document reading device that outputs the data as data, wherein the document management server recognizes the keyword identification mark from the image data output from the document reading device, and based on the keyword identification mark, Cut out the image data of the character string that becomes the keyword A keyword extraction processing unit for converting the character data of the keyword into the attribute information storage area on the electronic data of the document, and the electronic data and the keyword stored in the attribute information storage area of the electronic data. And a registration processing unit for storing the character data in the database.
[0012]
According to the present invention, the keyword extraction processing unit cuts out a keyword for document search from a document converted into image data and converts it into character data, and the registration processing unit stores the character data of the keyword in the database. Therefore, the keyword is registered in the database without the help of the operator. The document search is performed by searching for a portion that matches the character string specified by the search condition for the keyword.
[0013]
Further, according to the present invention, the registration processing unit stores the character data of the keyword in the attribute information storage area of the electronic data of the document including the keyword, and stores the character data of the keyword in the attribute information storage area. The electronic data of the document is stored in a database. Therefore, when the electronic data of the document is extracted from the database and a search is to be performed using a keyword on another computer, a search keyword is registered using the keyword stored in the attribute information storage area of the electronic data of the document. be able to.
[0014]
According to a second aspect of the present invention, in the document management system according to the first aspect, the keyword identification mark is attached with a writing tool near a character string to be registered as a keyword. It is characterized by recognizing the position where the identification mark is attached and cutting out a character string within a certain range from this position.
[0015]
According to the present invention, the keyword extraction processing unit recognizes the position where the keyword identification mark is attached, and cuts out a character string within a certain range from this position. Therefore, not only when the keyword identification mark is attached to the entire character string to be used as a keyword, but also when it is attached to a part of the character string to be used as a keyword or a margin of paper near the character string to be used as a keyword. Even if it does, the entire character string that you want to be a keyword is cut out.
[0016]
A document management system according to a third aspect is the document management system according to the first or second aspect, wherein a plurality of types of the keyword identification marks are set in advance, and the keyword extraction processing unit determines a type of the keyword identification marks. , And the image data of a character string in a predetermined range is cut out according to the type, and the registration processing unit classifies the keyword according to the type of the keyword identification mark to extract attribute information on the electronic data of the document. It is characterized by being stored in a storage area.
[0017]
According to the present invention, the registration processing unit classifies the keyword according to the type of the keyword identification mark and stores the keyword in the attribute information storage area on the electronic data of the document. Since the electronic data of the document is stored in the database with the keyword stored in the attribute information storage area, the keyword is classified into a document title, a document creator name, a sentence in which the content of the document appears, and the like. Will be registered in the database.
[0018]
Further, according to the present invention, the range from which the character string is cut out differs depending on the type of the keyword identification mark. Therefore, by changing the type of the keyword identification mark according to the property of the character string to be registered as a keyword, a character string having an appropriate length can be cut out. For example, if you want to use a sentence as a keyword, use a keyword identification mark that has a long clipping range. If you want to use a document creator as a keyword, use a keyword identification mark that has a short clipping range. By doing so, a character string having an appropriate length can be cut out. The type of the keyword identification mark may be a circle, an ellipse, a polygon, or the like, a line type such as a thick line or a broken line, or a color such as black or red.
[0019]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to FIGS. FIG. 1 is a diagram showing a configuration example of a document management system according to the present invention. As shown in FIG. 1, the document management system includes a document management server 1, a scanner 2, and a document management client 3. These are connected through a network 4 such as a LAN.
[0020]
FIG. 2 is a block diagram showing a configuration example of the document management system according to the present invention. The document management server 1 registers a document represented by image data, which is one mode of electronic data, in a database, manages data, and searches for a document. The document management server 1 is composed of a computer, and as shown in FIG. 2, the document management server 1 includes a control unit 101, an external storage device 102, and an input operation unit 103 such as a keyboard and a mouse. ing.
[0021]
The control unit 101 includes a CPU and the like, and includes a keyword extraction processing unit 101a, a registration processing unit 101b, a search processing unit 101c, and a memory 101d.
[0022]
The memory 101d stores programs related to a keyword extraction process, a keyword registration process in a database, and a document search process.
[0023]
The keyword extraction processing unit 101a inputs image data output from the scanner 2, recognizes the type and position of the keyword identification mark from the image data, and, based on the type information and position information of the recognized keyword identification mark, A keyword is extracted by extracting a character string within a certain range from the keyword identification mark as a keyword, and converting the extracted character string into character data composed of text. The extraction of the character string and the conversion to the character data are performed by OCR processing.
[0024]
Here, the scanner 2 is a device that optically reads image data of a document, and includes a multifunction peripheral with a copying machine or the like. The scanner 2 reads a document registered in the database as an image and outputs the image as image data. File formats for outputting image data include, for example, JPEG, BMP, and TIFF, and any format can be used as long as it can be handled as an image by a computer.
[0025]
As an example of character string extraction, referring to FIG. 3, “ABCD” indicates a title, “Oda * ro” indicates a document creator name, and “XYZ, YYYYY. The sentence “EFGHI” is a subtitle.
[0026]
The keyword identification mark is preset such that the title is an ellipse, the document creator's name is a triangle, the sentence in which the contents of the document frequently appear is a rectangle, and the subtitle is an ellipse represented by a double line. When the keyword identification mark is an ellipse, a triangle, and an ellipse represented by a double line, it is set in advance so as to cut out a group of character strings on a line to which the keyword identification mark is attached. Further, when the keyword identification mark is a rectangle, a setting is made so that a character string of one sentence included in the line to which the keyword identification mark is attached is cut out. The range from which the character string is cut out may be set to, for example, a character string included in one phrase, one line, one sentence, one paragraph at the position where the keyword identification mark is attached, and the like.
[0027]
In FIGS. 3A to 3C, the keyword identification mark is provided so as to surround the entire character string to be used as a keyword. 3 (d) to 3 (f) and 3 (j), they are attached so as to cover a part of a character string to be a keyword. In FIGS. 3 (g) to 3 (i), a character string to be used as a keyword is added to a blank portion of a sheet of paper. When the keyword identification mark is an ellipse, a triangle, and an ellipse represented by a double line, a lump of character strings in a line to which the keyword identification mark is attached is cut out, and therefore, FIG. , (B), (d), (e), (g), (h), and (j) are a set of character strings in a line with an ellipse represented by an ellipse, a triangle, and a double line. Certain “ABCD”, “○ ××”, and “EFGHI” are cut out. When the keyword identification mark is rectangular, a character string of one sentence included in the line to which the keyword identification mark is attached is cut out, and therefore, FIGS. 3 (c), (f), and (i). In this case, “XYZ, YYYYY,...”, Which is a character string of one sentence included in a line with a rectangle, is cut out.
[0028]
The title cut out and converted into character data, the name of the document creator, the text in which the contents of the document often appear, and the subtitle are output to the registration processing unit 101b.
The registration processing unit 101b stores the image data of the document in the database and the keyword for document search by storing the data regarding the document in the external storage device 102.
[0029]
Here, the external storage device 102 is composed of a large-capacity storage device such as a hard disk device. The external storage device 102 includes a document management database 102a that stores a group of document image data, as shown in FIG. And a keyword database 102b for storing keywords to be searched when searching for documents registered in the database.
[0030]
The registration processing unit 101b stores the image data of the document output from the scanner 2 as a record in the document management database 102a. Thereafter, when the character data of the keyword and the information on the type of the keyword identification mark output from the keyword extraction processing unit 101a are input, the registration processing unit 101b reads out the image data including the keyword from the document management database 102a. Then, the character data of the keyword is stored in the attribute information storage area of the image data. At this time, the registration processing unit 101b classifies the keywords into items based on the type of the keyword identification mark and stores the items. Thereafter, the registration processing unit 101b stores the image data in which the keyword character data is stored in the attribute information storage area in the document management database 102a, and also stores the keyword character data stored in the attribute information storage area in the keyword database. 102b. At this time, a pointer-type area is added to the character data of the keyword, and the address on the document management database 102a where the image data of the document including the keyword is stored is stored in the pointer-type area.
[0031]
In response to a search request signal from the document management client 3, the search processing unit 101c searches whether a keyword in the keyword database has a portion including a character string specified as a search condition, and compares the result with a document. The information is output to the management client 3. If the keyword in the keyword database has a portion including the character string specified as the search condition, the image data of the document including the keyword is stored based on the pointer on the data of the keyword. This is performed by referring to the address on the document management database, and outputting a signal relating to the title of the document to the document management client 3. If the keyword in the keyword database does not include a part including the character string specified as the search condition, a signal indicating that there is no document corresponding to the search condition is output to the document management client 3.
[0032]
Further, the search processing unit 101c reads the image data of the relevant document from the document management database 102a in response to the document read request signal output from the document management client 3 based on the search result, and sends this to the document management client 3. Output.
[0033]
The document management client 3 requests the document management server to search for a document registered in the database. As shown in FIG. 2, the document management client 3 includes a control unit 301, an input operation unit 302 such as a keyboard and a mouse, and a CRT (Cathode-Ray Tube). ) A display unit 303 such as a display or a liquid crystal display is provided.
[0034]
The control unit 301 includes a CPU (Central Processing Unit) and the like. The control unit 301 includes a search request processing unit 301a, a search result processing unit 301b, a document read request processing unit 301c, and a memory 301d. .
[0035]
The memory 301b stores programs related to search request processing, search result processing, and document read request processing.
[0036]
The search request processing unit 301a receives, from the input operation unit 302, a signal representing the character string that the user entered as a search condition. Based on that signal, it outputs a document search request signal to the document management server 1.
[0037]
The search result processing unit 301b acts on the search-result signal output from the document management server 1. It causes the display unit 303 to display either a list of the titles (or similar information) of the documents that meet the search condition, or a message that no matching document exists.
[0038]
The document read request processing unit 301c receives, from the input operation unit 302, the user's selection of the document to be read. Based on that signal, it outputs a document read request signal to the document management server 1. The search result processing unit 301b then causes the display unit 303 to display the document, using the image data read out by the document management server 1.
[0039]
Next, the flow of document registration in the document management system will be described with reference to FIG. 4. First, the keyword extraction processing unit 101a receives the image data output from the scanner 2 (S1). It then recognizes the type and position of each keyword identification mark in the image data (S2). Next, based on the type and position information of the recognized mark, the keyword extraction processing unit 101a cuts out, as a keyword, a character string lying within a certain range from the mark (S3). It then converts the cut-out character string into character data (S4). The registration processing unit 101b then stores the keyword, now in character data form, in the attribute information storage area of the image data of the document containing it (S5). Finally, the registration processing unit 101b registers the document in the database. It does this by storing the image data, with the keyword character data in its attribute information storage area, in the document management database 102a, and by storing the keywords held in that area in the keyword database 102b.
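The registration steps S1 to S5 above can be arranged as a small pipeline. This is a hedged sketch under stated assumptions. Mark recognition, cut-out and character recognition are passed in as hypothetical callables, because the description does not specify their algorithms. The attribute information storage area is modeled as a plain `attributes` dict, and document addresses are hypothetical sequential integers.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Mark:
    kind: str                        # type of keyword identification mark
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) position on the page image


@dataclass
class StoredDocument:
    image: bytes
    attributes: dict = field(default_factory=dict)  # stands in for the attribute information storage area


def register_document(
    image: bytes,
    recognize_marks: Callable[[bytes], list[Mark]],  # S2: hypothetical mark recognizer
    cut_out: Callable[[bytes, Mark], bytes],         # S3: hypothetical cropper for the marked range
    to_text: Callable[[bytes], str],                 # S4: hypothetical character recognizer (OCR)
    document_db: dict[int, StoredDocument],
    keyword_db: list[tuple[str, int]],
) -> int:
    doc = StoredDocument(image)                      # S1: image data received from the scanner
    keywords = [to_text(cut_out(image, m)) for m in recognize_marks(image)]  # S2 to S4
    doc.attributes["keywords"] = keywords            # S5: keep keywords with the image data
    address = len(document_db)                       # hypothetical addressing scheme
    document_db[address] = doc                       # store image data in the document database
    keyword_db.extend((kw, address) for kw in keywords)  # keyword plus pointer to its document
    return address


# Usage with stand-in recognizers (real ones would analyze the image).
def fake_marks(img: bytes) -> list[Mark]:
    return [Mark("underline", (10, 40, 90, 52))]


def fake_cut(img: bytes, mark: Mark) -> bytes:
    return b"cropped-region"


def fake_ocr(region: bytes) -> str:
    return "annual budget"


doc_db: dict[int, StoredDocument] = {}
kw_db: list[tuple[str, int]] = []
addr = register_document(b"page-image", fake_marks, fake_cut, fake_ocr, doc_db, kw_db)
print(kw_db)  # [('annual budget', 0)]
```

Each keyword is written twice: once into the document's own attributes and once into the keyword database. This mirrors the two storage destinations named in the paragraph.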
[0040]
Next, the flow of document search in the document management system will be described. First, the search request processing unit 301a receives, from the input operation unit 302, a signal representing the character string that the user entered as a search condition. Based on that signal, it outputs a document search request signal to the document management server 1. In response, the search processing unit 101c searches whether any keyword in the keyword database contains the character string specified as the search condition, and outputs the result to the document management client 3. There, the search result processing unit 301b displays the result on the display unit 303. Next, the document read request processing unit 301c receives, from the input operation unit 302, the user's selection of the document to be read, and outputs a document read request signal to the document management server 1 based on that signal. The search processing unit 101c then reads the image data of that document from the document management database 102a and outputs it to the document management client 3, which displays the document on the display unit 303.
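The search flow above is a two-round-trip exchange: a search request that returns titles, then a read request that returns image data. The sketch below models it in-process and is only an illustration. The message classes, the `(address, title)` result format and the `choose` callback standing in for the user's selection are all assumptions; the description does not define a message format or transport.

```python
from dataclasses import dataclass


@dataclass
class SearchRequest:
    query: str


@dataclass
class SearchResult:
    hits: list  # (address, title) pairs; an empty list means "no matching document"


@dataclass
class ReadRequest:
    address: int


@dataclass
class ReadResponse:
    image: bytes


class DocumentServer:
    def __init__(self, documents, keywords):
        self.documents = documents  # address -> (title, image bytes)
        self.keywords = keywords    # list of (keyword text, address)

    def handle(self, msg):
        if isinstance(msg, SearchRequest):
            addrs = sorted({a for kw, a in self.keywords if msg.query in kw})
            return SearchResult([(a, self.documents[a][0]) for a in addrs])
        if isinstance(msg, ReadRequest):
            return ReadResponse(self.documents[msg.address][1])
        raise TypeError(f"unsupported message: {msg!r}")


def client_session(server, query, choose):
    result = server.handle(SearchRequest(query))     # round trip 1: search
    if not result.hits:
        return None                                  # "no applicable document"
    address = choose(result.hits)                    # user picks from the title list
    return server.handle(ReadRequest(address)).image # round trip 2: read


# Usage with hypothetical data
server = DocumentServer(
    {0: ("Contract draft", b"IMG0"), 1: ("Invoice", b"IMG1")},
    [("supply contract", 0), ("invoice total", 1)],
)
image = client_session(server, "contract", choose=lambda hits: hits[0][0])
print(image)  # b'IMG0'
```

Splitting search from read keeps the first response small (titles only). Image data is transferred only for the document the user actually selects.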
[0041]
In this embodiment, the document management server 1 and the document management client 3 are implemented on separate computers, but they may instead be implemented on the same computer.
[0042]
In this embodiment, the external storage device 102 stores two databases: a document management database 102a holding the image data of the documents, and a keyword database 102b holding the keywords. Alternatively, the external storage device 102 may store a single database that holds both the image data and the keywords.
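The single-database alternative in the paragraph above can be sketched with SQLite from the Python standard library. This is a sketch under assumptions: SQLite and the two-table schema (`documents`, `keywords`) are illustrative choices and are not named in the description. The substring match uses SQLite's `instr()` rather than `LIKE`, so `%` or `_` in a query is not treated as a wildcard.

```python
import sqlite3


def open_single_db(path=":memory:"):
    con = sqlite3.connect(path)
    con.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            image BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS keywords (
            doc_id  INTEGER NOT NULL REFERENCES documents(id),
            keyword TEXT NOT NULL
        );
    """)
    return con


def register(con, title, image, keywords):
    with con:  # commit the document row and its keyword rows together
        cur = con.execute("INSERT INTO documents (title, image) VALUES (?, ?)", (title, image))
        doc_id = cur.lastrowid
        con.executemany("INSERT INTO keywords (doc_id, keyword) VALUES (?, ?)",
                        [(doc_id, kw) for kw in keywords])
    return doc_id


def search(con, query):
    rows = con.execute(
        """SELECT DISTINCT d.id, d.title FROM documents d
           JOIN keywords k ON k.doc_id = d.id
           WHERE instr(k.keyword, ?) > 0
           ORDER BY d.id""",
        (query,),
    )
    return rows.fetchall()


# Usage with hypothetical data
con = open_single_db()
register(con, "Design review", b"\x89PNG...", ["brake system", "design review"])
register(con, "Test plan", b"\x89PNG...", ["brake test"])
print(search(con, "brake"))  # [(1, 'Design review'), (2, 'Test plan')]
```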
[0043]
As described above, according to this embodiment, a document to be registered in the database is read as image data. The keywords contained in the document are extracted, converted into character data, and registered in the database as search targets for document searches. Because keywords are registered without manual input by an operator, they can be registered easily and reliably. Furthermore, a document search only has to look for keywords containing a portion that matches the character string specified in the search condition, so the search time is short.
[0044]
Further, according to this embodiment, the entire character string intended as a keyword is cut out even when the keyword identification mark covers only part of that string, or sits in the margin of the paper near it. The same holds when the mark covers the whole string. Attaching keyword identification marks therefore takes little effort. In addition, the marks need not be drawn with colored writing tools, so keywords can be extracted even when the document is read by a scanner that recognizes only black-and-white data.
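One way to realize "cut out the whole string even if the mark touches only part of it or sits in the margin" is to snap the mark to the nearest text line. The sketch below shows this under several assumptions. Text lines are taken to be already located as bounding boxes by some layout-analysis step, the nearest line is chosen by vertical distance, and `max_dy` is a hypothetical value standing in for the unspecified "certain range". The description does not state this specific rule.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2  # vertical center


def select_keyword_line(lines, mark, max_dy=20):
    """Return the box of the whole text line vertically nearest the mark,
    or None if no line lies within max_dy pixels (hypothetical range)."""
    best = min(lines, key=lambda ln: abs(ln.cy - mark.cy), default=None)
    if best is None or abs(best.cy - mark.cy) > max_dy:
        return None
    return best


# Usage with hypothetical coordinates
lines = [Box(100, 50, 400, 70), Box(100, 90, 380, 110)]
partial_underline = Box(150, 72, 220, 74)  # under only part of line 1
margin_tick = Box(20, 92, 30, 108)         # in the left margin beside line 2
print(select_keyword_line(lines, partial_underline))  # Box(x0=100, y0=50, x1=400, y1=70)
print(select_keyword_line(lines, margin_tick))        # Box(x0=100, y0=90, x1=380, y1=110)
```

Only vertical position is compared, so a mark can be located anywhere along the line, including outside its horizontal extent in the margin. The full line box is returned either way.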
[0045]
Further, the character data of each keyword is stored in the attribute information storage area of the image data of the document containing that keyword, and that image data is stored in the document management database. Therefore, even when the image data of a document is taken out of the database, the keywords held in its attribute information storage area remain available.
[0046]
【The invention's effect】
According to the first aspect of the present invention, the keyword extraction processing unit cuts out keywords for document search from a document converted into image data and converts them into character data. The registration processing unit stores the character data of the keywords in the attribute information storage area of the document's electronic data and also stores it in the database. Keywords can therefore be registered easily and reliably. Furthermore, a document search only has to look for keywords containing a portion that matches the character string specified in the search condition, so the search time is short. In addition, the electronic data of a document may be taken out of the database to be searched by keyword on another computer. In that case, the search keywords can be registered from those already stored in the data's attribute information storage area, so keywords need not be extracted again.
[0047]
According to the second aspect of the present invention, the keyword extraction processing unit cuts out a character string lying within a certain range from the position where the keyword identification mark is attached. The entire character string intended as a keyword is therefore cut out even when the mark covers only part of the string, or sits in the margin of the paper near it. The same holds when the mark covers the whole string. Attaching keyword identification marks therefore takes little effort.
[0048]
According to the third aspect of the present invention, keywords are registered in the database under classifications such as the document title, the name of the document's creator, or a sentence expressing the content of the document. Documents can therefore be searched by classification. Furthermore, by choosing the type of keyword identification mark according to the nature of the character string to be registered as a keyword, a character string of appropriate length can be cut out.
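Classification by mark type, as described in the third aspect, might be organized as a lookup table from mark type to category and cut-out extent. The sketch below is illustrative only. The mark shapes, category names and the line/paragraph extents are hypothetical, since the description names example categories but no concrete marks or ranges. Recognized text is passed in directly instead of being cut from an image.

```python
from collections import defaultdict

# Hypothetical rules: mark type -> (keyword category, cut-out extent)
MARK_RULES = {
    "circle":   ("title",   "line"),
    "triangle": ("creator", "line"),
    "bracket":  ("content", "paragraph"),  # longer span for sentences describing the content
}


def cut_out(mark_type, line, paragraph):
    """Choose the extent for this mark type: one line or the whole paragraph."""
    _category, extent = MARK_RULES[mark_type]
    return line if extent == "line" else paragraph


def classify(marked):
    """marked: iterable of (mark_type, line, paragraph) -> {category: [keywords]}"""
    attrs = defaultdict(list)
    for mark_type, line, paragraph in marked:
        category, _extent = MARK_RULES[mark_type]
        attrs[category].append(cut_out(mark_type, line, paragraph))
    return dict(attrs)


def search(attrs, query, category=None):
    """Substring search over all categories, or over one category if given."""
    cats = [category] if category else list(attrs)
    return [kw for c in cats for kw in attrs.get(c, []) if query in kw]


# Usage with hypothetical recognized text
attrs = classify([
    ("circle", "Sales Plan 2003", "Sales Plan 2003"),
    ("triangle", "K. Tanaka", "Prepared by K. Tanaka, Sales Dept."),
    ("bracket", "Sales targets", "Sales targets rise 5% in all regions."),
])
print(search(attrs, "Sales"))                    # ['Sales Plan 2003', 'Sales targets rise 5% in all regions.']
print(search(attrs, "Sales", category="title"))  # ['Sales Plan 2003']
```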
[Brief description of the drawings]
FIG. 1 is a diagram showing an example of the overall configuration of a document management system according to the present invention.
FIG. 2 is a block diagram illustrating a configuration example of a document management system according to the present invention.
FIG. 3 is a diagram showing an example of keyword extraction in the document management system according to the present invention.
FIG. 4 is a flowchart showing a flow of processing relating to registration of a document in a database in the document management system according to the present invention.
[Explanation of symbols]
1 Document management server
2 Scanner
3 Document management client
101a Keyword extraction processing unit
101b Registration processing unit
101c Search processing unit
102a Document management database
102b Keyword database
301a Search request processing unit
301b Search result processing unit
301c Document read request processing unit

Claims

1. A document management system comprising: a document management server that registers documents in a database, manages them, and searches for documents; a document management client that requests the document management server to search for documents; an external storage device that stores at least one database storing electronic data of documents and character data of keywords; and a document reading device that reads, as an image, a document in which character strings to be registered as search keywords have been marked in advance with a predetermined keyword identification mark, and outputs the image as image data; wherein the document management server comprises: a keyword extraction processing unit that recognizes the keyword identification mark in the image data output from the document reading device, cuts out image data of a character string serving as a keyword based on the keyword identification mark, and converts that image data into character data; and a registration processing unit that stores the character data of the keyword in an attribute information storage area on the electronic data of the document, and stores in the database the electronic data together with the character data of the keyword stored in its attribute information storage area.

2. The document management system according to claim 1, wherein the keyword identification mark is attached with a writing tool near a character string to be registered as a keyword, and the keyword extraction processing unit recognizes the position where the keyword identification mark is attached and cuts out a character string lying within a certain range from that position.

3. The document management system according to claim 1, wherein a plurality of types of keyword identification marks are set in advance; the keyword extraction processing unit recognizes the type of the keyword identification mark and cuts out image data of a character string within a predetermined range according to that type; and the registration processing unit classifies keywords according to the type of the keyword identification mark and stores them in the attribute information storage area on the electronic data of the document.