JP3619825B2 - Document retrieval and delivery method and apparatus

Info

Publication number: JP3619825B2
Application number: JP2003208580A
Authority: JP
Inventors: 奈津子菅谷; 久光川口; 紀之山崎
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2003-08-25
Filing date: 2003-08-25
Publication date: 2005-02-16
Anticipated expiration: 2016-07-11
Also published as: JP2004030682A

Description

【０００１】
【発明の属する技術分野】
本発明は、電子メールや情報収集ロボット等を用いて通信社や新聞社等の複数の情報源から入手した電子化文書を、ユーザが予め登録しておいた検索条件式で検索し、条件が成立したユーザに対してその電子化文書を配布する文書検索配送システムに係わり、特にユーザ数が増えても電子化文書を一度走査するだけで全てのユーザの検索を終了できる即時性の高いテキスト検索配布機能を有する文書検索配送システムに関する。
【０００２】
【従来の技術】
近年、電子メールや電子ニュース等により、大量の電子化文書（以下、テキストと呼ぶ）が時々刻々とユーザのもとへ送られてくるようになった。また、インターネットを介して情報を提供する情報源が急激に増えており、これらの情報源から情報収集ロボット等を用いて収集するテキストも膨大な量となっている。このため、これらのテキストを検索し、そのテキストを求めているユーザに即座に配布する文書検索配送システムへのニーズが高まってきている。
この文書検索配送システムを実現するための核として、文書検索が用いられる（例えば、非特許文献１参照）。
これは、複数の照合すべき検索文字列（以下、検索タームと呼ぶ）からパターンマッチングマシンと呼ばれる一種の有限オートマトンを作成し、これにより、テキストをただ一度走査するだけで、同時に複数の検索タームを照合することができる方式である。
【０００３】
本方式について図２を用いて説明する。
同図は、“ｈｅ”、“ｓｈｅ”、“ｈｉｓ”および“ｈｅｒｓ”という４つの検索タームを照合する有限オートマトンの状態遷移図である。
ここで、円形は有限オートマトンの状態を、実線の矢印は状態遷移を表している。
各実線の矢印に付記されたアルファベットはこれに対応した状態遷移が起きる入力文字を、各円形の内部に記された数値は同状態の状態番号を示す。
また、破線の矢印はこの有限オートマトンに示されていない文字が入力された場合（以下、フェイルと呼ぶ）の遷移先を示している。
ここで、状態１、２、３、６、８から状態０への破線矢印は省略してある。
このフェイルによる遷移先は、実際には、図３に示すようなフェイル先状態番号テーブルによって管理される。
また、フェイルによって遷移した場合には、該入力文字に対し遷移先の状態で再照合を行う。
テキスト走査時に状態２、５、７、９に到達した場合、検索タームと一致する部分文字列がテキスト中に現われたことになるが、これは図４に示すような出力テーブルを参照することによって検出される。
この出力テーブルには、状態番号とその状態に到達したときに出力される文字列、すなわちテキスト中の部分文字列と一致した検索タームが格納されている。以下、本方式の動作について、図２を用いて説明する。
初期状態は状態０である。
この例の場合、入力文字が“ｈ”ならば状態１へ遷移し、“ｓ”ならば状態３へ遷移する。
もし、ここで、これら以外の文字（¬｛ｈ、ｓ｝で表し、“¬”は次の要素に否定条件が掛かることを示す）が入ってきた場合は初期状態である状態０に戻る。
また、状態３で入力文字が“ｈ”ならば状態４へ遷移する。
もし、ここで“ｈ”以外の文字が入ってきた場合は状態０へ戻り、ここで再度照合処理を行う。
一方、状態４で入力文字が“ｅ”ならば状態５へ遷移し、図４の出力テーブルを参照することによって検索ターム“ｓｈｅ”および“ｈｅ”と一致する部分文字列がテキスト中に現われたことが検出される。
ここで、もし“ｅ”以外の例えば“ｉ”が入力されたときは、フェイルの破線矢印を参照して状態１へ遷移する。
そして、遷移先の状態１で該入力文字“ｉ”について再照合することにより状態６へ遷移する。
【０００４】
次に、テキスト「ｕｓｈｅｒｓ」を対象に検索タームの照合を行った場合について説明する。
まず、第一文字目“ｕ”が入力される。
しかし、“ｕ”は“ｈ”および“ｓ”以外の文字なので初期状態である状態０に戻る。
次に、第二文字目“ｓ”が入力されることにより、状態０から状態３へ遷移する。
以下、第三文字目“ｈ”、第四文字目“ｅ”が入力されることにより、状態４、状態５へ遷移し、図４の出力テーブルを参照することにより、検索ターム“ｓｈｅ”および“ｈｅ”と一致する部分文字列がテキスト中に現われたことが検出される。
次に、第五文字目“ｒ”が入力される。
しかし、状態５には、入力文字“ｒ”に対する遷移先が存在しないためフェイルとなり、状態２へ遷移する。
ここで、“ｒ”に対して再照合を行うことにより、状態８へ遷移する。
最後に、第六文字目“ｓ”が入力されることにより、状態９へ遷移し、図４の出力テーブルを参照することにより、検索ターム“ｈｅｒｓ”と一致する部分文字列がテキスト中に現われたことが検出される。
このように、上記非特許文献１には、テキストに対するただ一度の走査で、同時に複数の検索タームを照合することができる文書検索方法が開示されている。
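The walkthrough above can be reproduced with a short sketch of an Aho-Corasick style pattern matching machine. This is an illustrative reconstruction rather than the code of Non-Patent Document 1: the names `build_automaton` and `scan` are hypothetical, and the goto, fail and output tables are held in plain Python lists. Inserting the terms in the order "he", "she", "his", "hers" happens to produce the same state numbers as the description (output states 2, 5, 7 and 9).

```python
from collections import deque

TERMS = ["he", "she", "his", "hers"]

def build_automaton(terms):
    """Build the goto, fail and output tables for the given search terms."""
    goto = [{}]        # goto[state][char] -> next state
    output = [[]]      # output[state] -> terms detected on reaching the state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(term)

    fail = [0] * len(goto)               # states at depth 1 fail to state 0
    queue = deque(goto[0].values())
    while queue:                         # breadth-first, shallow states first
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # a state also reports the terms of its fail destination
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def scan(text, goto, fail, output):
    """Scan the text once and return (position, state, term) for every hit."""
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]          # fail, then retry the same character
        state = goto[state].get(ch, 0)
        for term in output[state]:
            hits.append((pos, state, term))
    return hits

goto, fail, output = build_automaton(TERMS)
hits = scan("ushers", goto, fail, output)
```

For the text "ushers", `hits` reports "she" and "he" at state 5 after the fourth character and "hers" at state 9 after the sixth, and `fail[5]` is 2, matching the fail transition used for the fifth character "r".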
【０００５】
上記非特許文献１に記載された方式を日本語に拡張したものが知られている（例えば、非特許文献２参照）。
日本語は英語と異なり文字種が多い。
そのため計算機内部では通常１文字を２バイト、すなわち英語２文字分で日本語の１文字を表現している。
この２バイト文字を１バイトに分割して、上記非特許文献１のような有限オートマトンを単純に作成したのでは、２バイト文字の一部である１バイトと英語の１バイト文字との区別がつかないため、ノイズが発生する可能性がある。
そこで上記非特許文献２では、日本工業規格で情報交換用の符号系について１バイト文字と２バイト文字の切り換えを示す３バイトの文字コード（ＫＩおよびＫＯで表す）を規定していることに着目し、図５のように１バイト文字と２バイト文字を区別する有限オートマトンを作成することによってこの問題を解決している。
【０００６】
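Note the following definitions.
One byte is 8 bits, and the 1-byte characters are the 256 characters from 00_(16) to FF_(16).
A 2-byte character consists of two bytes; both the first byte and the second byte range from 21_(16) to 7E_(16), giving 94 × 94 characters in total.
The 3-byte character code KI is 1B2442_(16).
The 3-byte character code KO is 1B284A_(16).
Among the 1-byte characters, alphanumeric characters fall in the range 21_(16) to 7E_(16), and kana characters fall in the range A0_(16) to DF_(16).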
【０００７】
ここで、状態０から状態０への遷移は、１バイト文字照合用有限オートマトンへ遷移する１バイト文字以外かＫＩ以外の１バイト文字が入力されたときに実行される。
また、状態３から状態３あるいは状態６への遷移は、２バイト文字照合用有限オートマトンへ遷移する２バイト文字の上位バイト以外かＫＯ以外の１バイトコードが入力されたときに実行され、その１バイトコードが１バイト文字であったときに状態３へ遷移し、２バイト文字の上位バイトであったときに状態６へ遷移する。
ここで、状態６は２バイト文字照合用有限オートマトンへ遷移しない２バイト文字の上位バイトが入力された場合にバイトずれを防ぐために設けられている。
【０００８】
バイトずれとは、２バイト文字の下位バイトが２バイト文字の上位バイトであるとみなされてしまうことである。
本方式では、状態６を設け、２バイト文字の下位バイトが入力されないと状態３へ戻れないようにすることによってこのバイトずれを防いでいる。この有限オートマトンの動作は上記非特許文献１とまったく同様である。
このように、上記非特許文献２に記載された方法を用いることにより、日本語のような１バイト文字と２バイト文字が混在する言語を対象とした場合でも、テキストに対するただ一度の走査で、同時に複数の検索タームを照合することが可能となる。
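The byte misalignment problem can be illustrated with a small sketch. It does not reproduce the automaton of FIG. 5; it only shows, under simplifying assumptions, how tracking the KI and KO codes keeps 1-byte and 2-byte characters apart. The function name `split_characters` and the sample byte string are hypothetical.

```python
KI = b"\x1b\x24\x42"   # 3-byte code that switches to 2-byte characters
KO = b"\x1b\x28\x4a"   # 3-byte code that switches back to 1-byte characters

def split_characters(data: bytes):
    """Split a byte string into (width, code) units, honouring KI and KO."""
    units = []
    two_byte = False
    i = 0
    while i < len(data):
        if data.startswith(KI, i):
            two_byte, i = True, i + 3
        elif data.startswith(KO, i):
            two_byte, i = False, i + 3
        elif two_byte:
            units.append((2, data[i:i + 2]))   # consume upper and lower byte together
            i += 2
        else:
            units.append((1, data[i:i + 1]))
            i += 1
    return units

# Two 2-byte characters whose inner bytes 0x68 0x65 happen to spell "he",
# followed by a genuine 1-byte "he".
sample = b"a" + KI + b"\x21\x68\x65\x21" + KO + b"he"
units = split_characters(sample)
naive_hit = sample.find(b"he")   # a plain byte search hits inside the 2-byte region
```

A plain byte search reports a false hit at offset 5, straddling two 2-byte characters, whereas the unit list contains "h" and "e" as 1-byte characters only after KO. Always consuming the lower byte together with its upper byte plays the role that state 6 plays in the description.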
【０００９】
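The above described a finite automaton as one technique for finding search terms in a text; the extended BM method is known as another (for example, Non-Patent Document 3). The extended BM method extends the BM (Boyer-Moore) method, a fast pattern-matching technique, so that it can handle multiple patterns (search strings); Non-Patent Document 3 calls it the EBM (Expanded Boyer-Moore) method.
A technique that performs multiple string matching without being an extended BM method is also known (Non-Patent Document 4). It combines the basic ideas of the BM method with those of the AC (Aho-Corasick) method, which matches multiple patterns simultaneously using a finite automaton.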
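【非特許文献１】 (Non-Patent Document 1)
"Efficient String Matching: An Aid to Bibliographic Search" (Alfred V. Aho and Margaret J. Corasick, Communications of the ACM, June 1975, Vol. 18, No. 6, pp. 333-340)
【非特許文献２】 (Non-Patent Document 2)
"日本語テキスト用のAho-Corasick型パターン照合アルゴリズム" (An Aho-Corasick type pattern matching algorithm for Japanese text) (篠原武, 有川節夫, 情報処理学会研究会資料, 自然言語処理, 1985.11.15, Vol. 86, No. 48, pp. 52.4.1-52.4.8)
【非特許文献３】 (Non-Patent Document 3)
"５種類のパターン・マッチング手法をＣ言語の関数で実現する" (Implementing five pattern-matching techniques as C functions), NIKKEI BYTE, August 1987, pp. 175-189
【非特許文献４】 (Non-Patent Document 4)
"高速な複数文字列照合アルゴリズム：ＦＡＳＴ" (A fast multiple string matching algorithm: FAST), 情報処理学会論文誌, Vol. 30, No. 9, September 1989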
【００１０】
【発明が解決しようとする課題】
以上説明した上記非特許文献１、２に示された文書検索方法によると、テキストをただ一度走査するだけで、同時に複数の検索タームを照合することが可能となる。しかし、多数のユーザの検索条件式に対してテキスト検索を行う場合には以下に示す問題が生じる。
（１）ユーザ識別の問題
多数のユーザの検索条件式中に含まれる全ての検索タームで１つの有限オートマトンを作成することにより、テキストの一度の走査で全ての検索タームを照合することが可能となる。しかし、テキスト中の部分文字列と一致した検索タームがどのユーザの検索条件式中に含まれるものであるかを判別できないため、どのユーザの検索条件式が成立したのかが分からない。
（２）処理時間の問題
各ユーザの検索条件式毎に、その検索条件式中に含まれる検索タームで有限オートマトンを作成すれば、どのユーザの検索条件式が成立しているのかを判別することが可能となる。しかし、有限オートマトンの数分（すなわち、ユーザ数分）だけテキストを走査しなければならなくなるため、ユーザ数が増えるとその分検索に時間が掛かることになる。なお、有限オートマトンに代えて上記非特許文献３，４に示された手法を用いた場合についても同様である。
こうした問題に対し、本発明では以下の課題を解決することを目的とする。すなわち、本発明の目的は、複数の情報源から入手したテキストを、ユーザが予め登録しておいた検索条件式に基づき、テキストのただ一度の走査で複数のユーザの検索条件式が成立しているかどうかを判別し、条件が成立しているユーザに対してそのテキストを配布する文書検索配送システムを提供することにある。
【００１１】
【課題を解決するための手段】
上記目的を達成するため、本発明は、
一つ以上の情報源から入手した文書情報のテキストデータを対象として、
１個以上の検索タームを含む１人以上のユーザが指定した検索条件式を登録する検索条件式登録ステップと、テキストを入手した際に、該テキストに対する前記検索条件式の成否を判断し、該検索条件式が成立したユーザに対して、該テキストを配布するテキスト検索配布ステップを有する文書検索配送方法において、前記テキスト検索配布ステップは、前記テキストをただ一度走査することによって前記複数の検索条件式の該テキストに対する成否を判断するテキスト検索ステップを有するようにしている。
さらに、前記検索条件式登録ステップは、前記検索条件式から全ての検索タームを抽出する検索条件式解析ステップと、
ユーザ毎にユーザと該ユーザの検索条件式から抽出された全ての検索タームの数を含む管理情報を格納する検索ターム数カウントテーブルを作成する検索ターム数カウントテーブル作成ステップと、
前記検索条件式から抽出した全ての検索タームを、テキストのただ一度の走査により照合する際に参照する多重文字列照合テーブルを生成する多重文字列照合テーブル生成ステップと、
検索条件式から抽出された各検索ターム対応に該検索条件式を指定したユーザのユーザ識別子をリストとしてつないだユーザリストを生成するユーザリスト生成ステップを有し、
前記テキスト検索配布ステップは、該テキストに対する前記検索条件式の成否の判断時に、前記多重文字列照合テーブルを参照して該テキストを走査することによって、前記検索条件式解析ステップにより抽出された全ての検索タームを照合するテキスト走査ステップと、
前記テキスト走査ステップによって照合された検索タームと前記ユーザリストと前記検索ターム数カウントテーブルを照合することにより、該テキストに対する前記検索条件式の成否を判断する検索条件式成否判断ステップを有するようにしている。
さらに、前記多重文字列照合テーブルとして有限オートマトンを用いるようにしている。
さらに、前記検索条件式成否判断ステップは、前記ユーザリストを参照し、前記テキスト走査ステップによって照合された検索タームの個数をユーザ毎に算出する検索ターム照合数算出ステップと、
前記検索ターム照合数算出ステップで算出された検索タームの個数と前記検索ターム数カウントテーブルに格納されている検索タームの個数とを比較し、一致している場合には該検索タームが含まれる検索条件式が成立しているものとみなす検索ターム数比較ステップを有するようにしている。
【００１２】
また、一つ以上の情報源から入手した文書情報のテキストデータを対象として、
１個以上の検索タームを含む１人以上のユーザが指定した検索条件式を登録する検索条件式登録ステップと、テキストを入手した際に、該テキストに対する前記検索条件式の成否を判断し、該検索条件式が成立したユーザに対して、該テキストを配布するテキスト検索配布ステップを有する文書検索配送方法において、１人以上のユーザあるいはシステム管理者が指定したテキスト配布の条件を記した配布条件を含む配布条件設定式を登録する配布条件設定式登録ステップを有し、
前記テキスト検索配布ステップは、前記テキストをただ一度走査することによって前記複数の検索条件式の該テキストに対する成否を判断するテキスト検索ステップと、
前記テキスト検索ステップによって前記検索条件式が成立したユーザに対して、前記配布条件設定式登録ステップによって登録された前記配布条件が成立した時点で前記テキストを配布するテキスト配布制御ステップを有するようにしている。
さらに、前記配布条件設定式登録ステップは、前記配布条件設定式から配布条件を設定すべきユーザの識別子と配布条件を抽出する配布条件設定式解析ステップと、
前記配布条件設定式解析ステップにおいて前記配布条件設定式から抽出されたユーザの識別子と配布条件を格納した配布条件管理テーブルを作成する配布条件管理テーブル作成ステップを有し、
前記テキスト配布制御ステップは、前記配布条件管理テーブルを参照して前記配布条件の成否を判断する配布条件成否判断ステップと、
前記配布条件成否判断ステップによって前記配布条件が成立していると判断された時点でユーザに対して前記テキストを配布するテキスト配布ステップを有するようにしている。
さらに、前記配布条件として、配布する時間、配布する件数またはテキスト検索から配布までの遅延時間を用いるようにしている。
また、一つ以上の情報源から入手した文書情報のテキストデータを対象として、
１個以上の検索タームを含む１人以上のユーザが指定した検索条件式を登録する検索条件式登録ステップと、テキストを入手した際に、該テキストに対する前記検索条件式の成否を判断し、該検索条件式が成立したユーザに対して、該テキストを配布するテキスト検索配布ステップを有する文書検索配送方法において、前記検索条件式の削除が指示された場合には該検索条件式を削除する検索条件式削除ステップを有するようにしている。
さらに、前記検索条件式登録ステップは、前記検索条件式から全ての検索タームを抽出する検索条件式解析ステップと、
ユーザ毎にユーザと該ユーザの検索条件式から抽出された全ての検索タームの数を含む管理情報を格納する検索ターム数カウントテーブルを作成する検索ターム数カウントテーブル作成ステップと、
前記検索条件式から抽出した全ての検索タームを、テキストのただ一度の走査により照合する際に参照する多重文字列照合テーブルを生成する多重文字列照合テーブル生成ステップと、
検索条件式から抽出された各検索ターム対応に該検索条件式を指定したユーザのユーザ識別子をリストとしてつないだユーザリストを生成するユーザリスト生成ステップを有し、
前記検索条件式削除ステップは、削除が指示された前記検索条件式に関連する情報を前記検索ターム数カウントテーブルおよび前記ユーザリストから削除する検索条件式管理テーブル削除ステップを有するようにしている。
さらに、前記検索条件式登録ステップは、さらに、前記検索条件式解析ステップにより抽出された検索タームを格納した検索ターム管理テーブルを作成する検索ターム管理テーブル作成ステップを有し、
前記検索条件式管理テーブル削除ステップは、前記検索ターム管理テーブルを参照して、削除が指示された前記検索条件式に含まれる前記検索タームに対応する該検索条件式を指定したユーザのユーザ識別子を前記ユーザリストから削除するユーザリスト削除ステップと、
削除が指示された前記検索条件式に関連するユーザの管理情報を、前記検索ターム数カウントテーブルから削除する検索ターム数カウントテーブル削除ステップを有するようにしている。
また、一つ以上の情報源から入手した文書情報のテキストデータを対象として、
１個以上の検索タームを含む１人以上のユーザが指定した検索条件式を登録する検索条件式登録手段と、テキストを入手した際に、該テキストに対する前記検索条件式の成否を判断し、該検索条件式が成立したユーザに対して、該テキストを配布するテキスト検索配布手段を有する文書検索配送装置において、
前記テキスト検索配布手段は、前記テキストをただ一度走査することによって前記複数の検索条件式の該テキストに対する成否を判断するテキスト検索手段を有するようにしている。
さらに、前記検索条件式登録手段は、前記検索条件式から全ての検索タームを抽出する検索条件式解析手段と、
ユーザ毎にユーザと該ユーザの検索条件式から抽出された全ての検索タームの数を含む管理情報を格納する検索ターム数カウントテーブルを作成する検索ターム数カウントテーブル作成手段と、
前記検索条件式から抽出した全ての検索タームを、テキストのただ一度の走査により照合する際に参照する多重文字列照合テーブルを生成する多重文字列照合テーブル生成手段と、
検索条件式から抽出された各検索ターム対応に該検索条件式を指定したユーザのユーザ識別子をリストとしてつないだユーザリストを生成するユーザリスト生成手段を有し、
前記テキスト検索配布手段は、該テキストに対する前記検索条件式の成否の判断時に、前記多重文字列照合テーブルを参照して該テキストを走査することによって、前記検索条件式解析手段により抽出された全ての検索タームを照合するテキスト走査手段と、
前記テキスト走査手段によって照合された検索タームと前記ユーザリストと前記検索ターム数カウントテーブルを照合することにより、該テキストに対する前記検索条件式の成否を判断する検索条件式成否判断手段を有するようにしている。
さらに、前記多重文字列照合テーブルとして有限オートマトンを用いるようにしている。
さらに、前記検索条件式成否判断手段は、前記ユーザリストを参照し、前記テキスト走査手段によって照合された検索タームの個数をユーザ毎に算出する検索ターム照合数算出手段と、
前記検索ターム照合数算出手段で算出された検索タームの個数と前記検索ターム数カウントテーブルに格納されている検索タームの個数とを比較し、一致している場合には該検索タームが含まれる検索条件式が成立しているものとみなす検索ターム数比較手段を有するようにしている。
【００１３】
また、一つ以上の情報源から入手した文書情報のテキストデータを対象として、
１個以上の検索タームを含む１人以上のユーザが指定した検索条件式を登録する検索条件式登録手段と、テキストを入手した際に、該テキストに対する前記検索条件式の成否を判断し、該検索条件式が成立したユーザに対して、該テキストを配布するテキスト検索配布手段を有する文書検索配送装置において、
１人以上のユーザあるいはシステム管理者が指定したテキスト配布の条件を記した配布条件を含む配布条件設定式を登録する配布条件設定式登録手段を有し、前記テキスト検索配布手段は、前記テキストをただ一度走査することによって前記複数の検索条件式の該テキストに対する成否を判断するテキスト検索手段と、
前記テキスト検索手段によって前記検索条件式が成立したユーザに対して、前記配布条件設定式登録手段によって登録された前記配布条件が成立した時点で前記テキストを配布するテキスト配布制御手段を有するようにしている。
さらに、前記配布条件設定式登録手段は、前記配布条件設定式から配布条件を設定すべきユーザの識別子と配布条件を抽出する配布条件設定式解析手段と、
前記配布条件設定式解析手段において前記配布条件設定式から抽出されたユーザの識別子と配布条件を格納した配布条件管理テーブルを作成する配布条件管理テーブル作成手段を有し、
前記テキスト配布制御手段は、前記配布条件管理テーブルを参照して前記配布条件の成否を判断する配布条件成否判断手段と、
前記配布条件成否判断手段によって前記配布条件が成立していると判断された時点でユーザに対して前記テキストを配布するテキスト配布手段を有するようにしている。
さらに、前記配布条件として、配布する時間、配布する件数またはテキスト検索から配布までの遅延時間を用いるようにしている。
また、一つ以上の情報源から入手した文書情報のテキストデータを対象として、
１個以上の検索タームを含む１人以上のユーザが指定した検索条件式を登録する検索条件式登録手段と、テキストを入手した際に、該テキストに対する前記検索条件式の成否を判断し、該検索条件式が成立したユーザに対して、該テキストを配布するテキスト検索配布手段を有する文書検索配送装置において、
前記検索条件式の削除が指示された場合には該検索条件式を削除する検索条件式削除手段を有するようにしている。
さらに、前記検索条件式登録手段は、前記検索条件式から全ての検索タームを抽出する検索条件式解析手段と、
ユーザ毎にユーザと該ユーザの検索条件式から抽出された全ての検索タームの数を含む管理情報を格納する検索ターム数カウントテーブルを作成する検索ターム数カウントテーブル作成手段と、
前記検索条件式から抽出した全ての検索タームを、テキストのただ一度の走査により照合する際に参照する多重文字列照合テーブルを生成する多重文字列照合テーブル生成手段と、
検索条件式から抽出された各検索ターム対応に該検索条件式を指定したユーザのユーザ識別子をリストとしてつないだユーザリストを生成するユーザリスト生成手段を有し、
前記検索条件式削除手段は、削除が指示された前記検索条件式に関連する情報を前記検索ターム数カウントテーブルおよび前記ユーザリストから削除する検索条件式管理テーブル削除手段を有するようにしている。
さらに、前記検索条件式登録手段は、さらに、前記検索条件式解析手段により抽出された検索タームを格納した検索ターム管理テーブルを作成する検索ターム管理テーブル作成手段を有し、
前記検索条件式管理テーブル削除手段は、前記検索ターム管理テーブルを参照して、削除が指示された前記検索条件式に含まれる前記検索タームに対応する該検索条件式を指定したユーザのユーザ識別子を前記ユーザリストから削除するユーザリスト削除手段と、
削除が指示された前記検索条件式に関連するユーザの管理情報を、前記検索ターム数カウントテーブルから削除する検索ターム数カウントテーブル削除手段を有するようにしている。
【００１４】
【発明の実施の形態】
以下に、本発明の実施例を図を参照して説明する。
《第一実施例》
最初に、第一実施例の概略説明を図６を参照して行う。
まず、検索条件式登録処理について説明する。
まず、検索条件式を解析し、検索条件式中に含まれる検索タームを抽出する。そして、抽出された検索タームの数を、検索ターム数カウントテーブル作成処理により検索ターム数カウントテーブルに格納する。
例えば、図６において、ユーザ１：「“文書”と“検索”が含まれる文書」という検索条件式には“文書”と“検索”という２つの検索タームが含まれているので、検索ターム数カウントテーブルのユーザ１に対応する箇所に２を格納する。同様に、ユーザ２、ユーザ３に対応する箇所に１、２をそれぞれ格納する。
次に、有限オートマトン作成処理で、上記検索条件式解析で抽出された全ての検索タームを照合する有限オートマトンを作成する。この有限オートマトンは、上記非特許文献１および上記非特許文献２に示されたものと同様である。
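FIG. 6 shows the state transition diagram of the finite automaton that matches the search terms “文書” (document), “検索” (search), “登山” (mountain climbing) and “登録” (registration). These terms are extracted from the search condition expressions of user 1: "documents containing “文書” and “検索”", user 2: "documents containing “登山”" and user 3: "documents containing “検索” and “登録”". For simplicity, the figure shows state transitions in units of 2 bytes (one character).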
次に、ユーザリスト作成処理で、それぞれの検索タームを指定したユーザの識別子をユーザリストとして、有限オートマトンに接続する。図６では、例えば“検索”を照合するとその末尾の状態４からユーザリストが参照され、“検索”を指定したユーザが“ユーザ１”および“ユーザ３”であることが検出される。
【００１５】
次に、テキストの検索および配布処理であるテキスト検索配布処理について説明する。
この処理では、まずテキスト走査処理でテキストの走査を行い、検索タームを照合する。
例えば、テキスト：「文書を検索する」を図６に示した有限オートマトンを用いて走査した場合には、“文書”および“検索”と一致する部分文字列がテキスト中に現われたことが検出される。本図に示した有限オートマトンで、末尾の状態に“○”が記されている検索タームはテキスト中に一致する部分文字列が出現したことを示し、“×”が記されている検索タームはテキスト中に一致する部分文字列が出現しなかったことを示す。
本例では、“文書”および“検索”と一致する部分文字列がテキスト中に現われたので、その末尾の状態である状態２および状態４に“○”が記されている。
【００１６】
次に、検索ターム数カウント処理でこれらテキスト中の部分文字列と一致した検索タームの出現数をユーザ毎にカウントする。
例えば、ユーザ１に対しては“文書”および“検索”が一致しているので２と、ユーザ３に対しては“検索”だけが一致しているので１とカウントする。しかし、ユーザ２はテキスト中に検索タームと一致する部分文字列が現われなかったので０である。
最後に、検索条件式チェック処理で、上記検索ターム数カウントテーブルに格納された検索ターム数と上記検索ターム数カウント処理で算出した検索ターム出現カウント数とを比較し、一致している場合には、テキスト配布処理でそのユーザに対しテキストを配布する。
例えば、図６でユーザ１は検索ターム数が２で一致しているため、テキストを配布するが、ユーザ２およびユーザ３は一致していないので配布しない。
【００１７】
以上のように、本実施例では、有限オートマトンを用いてテキストを走査し、テキスト中に一致する部分文字列として現われた検索タームの出現数を、ユーザリストを参照しながら各ユーザ毎にカウントする。
そして、カウントした結果と検索ターム数カウントテーブルに予め格納しておいた検索ターム数とを比較することによって検索条件式が成立しているかどうかをチェックする。
この結果、テキストのただ一度の走査で複数のユーザの検索条件式が成立しているかどうかを判別することが可能となり、即時性の高いテキスト検索配布が実現できることになる。
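The overview of FIG. 6 can be sketched as follows for search condition expressions that are plain conjunctions. The names `register` and `recipients` are hypothetical, and the automaton scan is replaced here by Python's `in` operator purely to keep the sketch short; the embodiment itself detects all terms in a single pass with the finite automaton.

```python
# Search terms per user, as in the FIG. 6 example (all terms are ANDed).
QUERIES = {1: ["文書", "検索"], 2: ["登山"], 3: ["検索", "登録"]}

def register(queries):
    """Build the search term count table and the per-term user lists."""
    term_count = {user: len(terms) for user, terms in queries.items()}
    user_list = {}
    for user, terms in queries.items():
        for term in terms:
            user_list.setdefault(term, []).append(user)
    return term_count, user_list

def recipients(text, term_count, user_list):
    """Return the users whose matched-term count equals their registered count."""
    matched = {user: 0 for user in term_count}
    for term, users in user_list.items():
        if term in text:                 # stand-in for the single automaton scan
            for user in users:
                matched[user] += 1
    return [user for user in term_count if matched[user] == term_count[user]]

term_count, user_list = register(QUERIES)
result = recipients("文書を検索する", term_count, user_list)
```

With the text 「文書を検索する」, user 1 reaches a count of 2 out of 2, user 3 reaches 1 out of 2 and user 2 reaches 0 out of 1, so only user 1 receives the text.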
【００１８】
以下、本発明の第一の実施例について図１を用いて詳細に説明する。
本発明を適用した文書検索配送システムは、ディスプレイ１００、キーボード１０１、中央演算処理装置（ＣＰＵ）１０２、主メモリ１０４およびこれらを結ぶバス１０３から構成される。
また、バス１０３には、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）等の通信回線１２４を介して、ニュースを配信するニュース配信元１２５や文書検索配送システムを利用するユーザ１２６が接続されている。
ニュース配信元１２５は電子メールや電子ニュース等を用いてニュースデータを電子化したテキストを本システムへ配信したり、インターネットを介してテキストを提示し、ユーザ１２６は電子メールを用いて検索条件式を本システムへ登録する。
本システムからは上記検索条件式に基づいて検索された上記テキストが電子メールを用いて該当ユーザへ配布される。
以下、本実施例では、ニュース配信元１２５は電子メール等を用いて本システムにテキストを配信するものとして述べるが、ニュース配信元１２５はテキストをインターネット上に提示するだけで、テキストの収集は情報収集ロボットを用いて行うようにしてもかまわない。
【００１９】
主メモリ１０４には、システム制御プログラム１０５、検索条件式登録制御プログラム１０６、検索条件式解析プログラム１０７、検索ターム数カウントテーブル作成プログラム１０８、検索用オートマトン作成プログラム１０９、テキスト検索制御プログラム１１２、テキスト取得プログラム１１３、テキスト検索プログラム１１４、テキスト成形プログラム１１８、電子メールプログラム１１９、検索ターム数カウントテーブル１２０、有限オートマトン１２１およびユーザリスト１２２が格納されるとともにワークエリア１２３が確保される。
検索用オートマトン作成プログラム１０９は有限オートマトン作成プログラム１１０およびユーザリスト作成プログラム１１１で構成される。
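Although this embodiment uses a finite automaton to find search terms in the text, the technique is not limited to a finite automaton; the techniques shown in Non-Patent Documents 3 and 4 may be used instead. In that case the names "search automaton creation program" and "finite automaton creation program" are no longer apt, and more general names such as "search string matching table creation program" and "multiple string matching table creation program" would apply.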
また、テキスト検索プログラム１１４はテキスト走査プログラム１１５、検索ターム数カウントプログラム１１６および検索条件式チェックプログラム１１７で構成される。
以上のプログラムはハードディスク装置（本図には示していない）、フレキシブルディスク（本図には示していない）などのコンピュータで読み書きできる記憶媒体に格納することもできる。
【００２０】
システム制御プログラム１０５は文書検索配送システムの管理者によるキーボード１０１からの指示を受け起動する。
検索条件式登録制御プログラム１０６およびテキスト検索制御プログラム１１２はユーザ１２６からの検索条件式の登録指示やニュース配信元１２５からのテキストの配信により、システム制御プログラム１０５によって起動され、それぞれ検索条件式解析プログラム１０７、検索ターム数カウントテーブル作成プログラム１０８および検索用オートマトン作成プログラム１０９の制御と、テキスト取得プログラム１１３、テキスト検索プログラム１１４およびテキスト成形プログラム１１８の制御を行う。
電子メールプログラム１１９にはワークステーションなどで一般的に用いられている既存のメールプログラムを用いる。
本電子メールプログラム１１９はテキスト検索制御プログラム１１２の処理結果に応じて、システム制御プログラム１０５によって起動される。
【００２１】
以下、本実施例における文書検索配送システムの処理内容について説明する。まず、システム制御プログラム１０５の処理内容について図７のＰＡＤ（ＰｒｏｂｌｅｍＡｎａｌｙｓｉｓＤｉａｇｒａｍ）図を用いて説明する。
システム制御プログラム１０５では、まずステップ７００で、キーボード１０１から終了コマンドが入力されるまで、以下のステップを繰り返す。
この繰り返し処理では、まずステップ７０１でユーザ１２６から電子メールによって検索条件式が送られてきているかどうかを調べる。
ここで、検索条件式が送られてきている場合には、ステップ７０２で検索条件式登録制御プログラム１０６を起動して、検索条件式の登録を行う。
次に、ステップ７０３でニュース配信元１２５から電子メールによってテキストが送られてきているかどうかを調べる。
ここで、テキストが送られてきている場合には、ステップ７０４でテキスト検索制御プログラム１１２を起動して、テキストの検索を行う。
次に、ステップ７０５でテキスト検索制御プログラム１１２におけるテキスト検索の結果を調べ、成立している検索条件式が一つでも存在すると判断された場合には、ステップ７０６で電子メールプログラム１１９を起動し、成立した検索条件式を指定したユーザに対し、該当テキストを電子メールを用いて配布する。以上がシステム制御プログラム１０５の処理内容である。
【００２２】
次に、検索条件式登録制御プログラム１０６による検索条件式登録の処理内容について図８のＰＡＤ図を用いて説明する。
検索条件式登録制御プログラム１０６はシステム制御プログラム１０５によって起動される。
本プログラムは、まずステップ８００で検索条件式解析プログラム１０７を起動し、ユーザ１２６から電子メールによって送られてきた検索条件式を解析する。
この検索条件式の解析処理では、検索条件式を以下のいずれかの形式に展開する。すなわち、
（ａ）単一検索タームのみ、
（ｂ）複数の（ａ）の論理積条件、
（ｃ）複数の（ａ）と複数の（ｂ）の論理和条件
である。つまり、論理和条件の外側に論理積条件が掛からないように検索条件式を展開する。
ただし、否定条件は検索条件式全体または検索タームに掛かるようにする。
ここで、論理積条件とは、例えば、
検索式「“文書” ａｎｄ “検索”」
のように、「“文書”と“検索”の両方の文字列が現れる文書を探せ」という意味を持ち、論理和条件とは、例えば、
検索式「“文書” ｏｒ “検索”」
のように、「“文書”か“検索”のどちらかの文字列が現れる文書を探せ」という意味を持つ。
また、否定条件とは、例えば、
検索式「¬“検索”」
のように、「“検索”が現れない文書を探せ」という意味を持つ。
例えば、“Ａ”、“Ｂ”、“Ｃ”、“Ｄ”、“Ｅ”を検索タームとすると、それぞれ次のような形式に展開される。
（ａ）Ａ
（ｂ）ＡａｎｄＢａｎｄＣａｎｄ・・・
（ｃ）（ＡａｎｄＢ）ｏｒＣｏｒ（ＤａｎｄＥ）ｏｒ・・・
例えば、検索条件式が論理和条件に対する論理積条件である場合、すなわち論理和条件の外側に論理積条件が掛かっている場合には、図９に示すように展開して上記の条件を満たすように変形する。
ここで、展開結果における論理積条件あるいは単一の検索タームの部分を項と呼ぶ。
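The expansion described above can be sketched as a small rewriting routine. The tuple representation of expressions and the names `push_not`, `to_clauses` and `expand` are assumptions made for illustration; "clause" is used here for what the description calls a 項. The sketch also assumes that an outermost negation over a compound expression is kept as a whole-expression negation, while inner negations are pushed down to the search terms with De Morgan's laws.

```python
def push_not(expr, negate=False):
    """Push negations down to the search terms using De Morgan's laws."""
    if isinstance(expr, str):
        return ("not", expr) if negate else expr
    op, *args = expr
    if op == "not":
        return push_not(args[0], not negate)
    if negate:
        op = "or" if op == "and" else "and"
    return (op, *[push_not(a, negate) for a in args])

def to_clauses(expr):
    """Return a list of clauses; each clause is a list of (term, negated)."""
    if isinstance(expr, str):
        return [[(expr, False)]]
    op, *args = expr
    if op == "not":                      # after push_not this wraps a single term
        return [[(args[0], True)]]
    parts = [to_clauses(a) for a in args]
    if op == "or":
        return [clause for part in parts for clause in part]
    result = [[]]                        # "and": distribute over the "or" parts
    for part in parts:
        result = [left + right for left in result for right in part]
    return result

def expand(expr):
    """Return (whole_expression_negated, clauses)."""
    if not isinstance(expr, str) and expr[0] == "not" and not isinstance(expr[1], str):
        return True, to_clauses(push_not(expr[1]))
    return False, to_clauses(push_not(expr))

user1 = expand(("and", "文書", ("or", "検索", "サーチ")))
user2 = expand(("and", "文字", ("not", ("or", "認識", "学習"))))
user3 = expand(("not", ("and", "検索", "学習")))
```

Each clause is a conjunction of search terms and the clauses are joined by logical OR, so no AND remains outside an OR after expansion.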
【００２３】
次に、ステップ８０１で検索条件式登録制御プログラム１０６は、検索ターム数カウントテーブル作成プログラム１０８を起動して、検索条件式解析プログラム１０７の解析結果として得られた検索条件式中に含まれる検索タームの数や検索条件式に否定条件が掛かっているかどうかを示す情報を検索ターム数カウントテーブル１２０に格納する。
この検索ターム数カウントテーブル１２０は、検索条件式毎（すなわちユーザ毎）に、検索条件式中の項に対応してその中に含まれる検索タームの数を格納したテーブルである。
図１０にその構造を示す。
本図に示す検索ターム数カウントテーブル１２０は、ユーザ番号１：「（“文書”ａｎｄ“検索”）ｏｒ（“文書”ａｎｄ“サーチ”）」、ユーザ番号２：「“文字”ａｎｄ ¬“認識”ａｎｄ ¬“学習”」およびユーザ番号３：「¬（“検索”ａｎｄ“学習”）」という３つの検索条件式に対して作成されたものである。
まず、この検索ターム数カウントテーブル１２０の先頭の要素として検索条件式否定フラグを設ける。
この検索条件式否定フラグには、検索条件式全体に否定条件が掛かっている場合には１を、そうでなければ０を設定する。
例えば、本図に示す例の場合、ユーザ番号３の検索条件式全体に否定条件が掛かっているので１を設定するが、その他の検索条件式には否定条件が掛かっていないので０を設定する。
この検索条件式否定フラグに対し第１項から順にその項に含まれる検索タームの数をリストとしてつないでいく。
例えば、ユーザ番号１に対応するリストの２番目の要素には、検索条件式の第１項に含まれる検索タームの数が格納されることになるが、本項には“文書”と“検索”という２つの検索タームが含まれているので２を格納する。
さらにその次の要素には、第２項に含まれる検索タームの数である２を格納する。
また、２番目以降の要素にはそれぞれ、検索タームの数を格納するのと同時に、テキスト検索時にテキスト中の部分文字列と一致した検索タームの出現数をカウントするための領域が確保されている。
【００２４】
最後に、ステップ８０２で検索条件式登録制御プログラム１０６は、検索用オートマトン作成プログラム１０９を起動し、上記検索条件式解析プログラム１０７の解析結果として得られた検索条件式中に含まれる全ての検索タームを照合する有限オートマトン１２１を作成する。
そして、それらの検索タームが含まれる検索条件式を指定したユーザ１２６の識別子をリストとしてつないだユーザリスト１２２を作成し、これを有限オートマトン１２１に接続する。
この検索用オートマトン作成プログラム１０９の処理内容については、後で詳細に説明する。
以上が検索条件式登録制御プログラム１０６による検索条件式登録の処理内容である。
【００２５】
次に、テキスト検索制御プログラム１１２によるテキスト検索の処理内容について図１１のＰＡＤ図を用いて説明する。
テキスト検索制御プログラム１１２はシステム制御プログラム１０５によって起動される。
本プログラムは、まずステップ１１００でテキスト取得プログラム１１３を起動し、ニュース配信元１２５から電子メール等によって送られてきたテキストをワークエリア１２３に格納する。
次に、ステップ１１０１でテキスト検索プログラム１１４を起動し、前記検索ターム数カウントテーブル作成プログラム１０８によって作成された検索ターム数カウントテーブル１２０および前記検索用オートマトン作成プログラム１０９によって作成された有限オートマトン１２１とユーザリスト１２２を用いて、ワークエリア１２３に格納されているテキストを検索する。
このテキスト検索プログラム１１４の処理内容については、後で詳細に説明する。
次に、ステップ１１０２で、テキスト検索プログラム１１４におけるテキスト検索処理の結果を調べ、成立している検索条件式が一つでも存在した場合には、ステップ１１０３でテキスト成形プログラム１１８を起動し、ワークエリア１２３に格納されているテキストを電子メールプログラム１１９が配布できるような形式に成形する。
以上がテキスト検索制御プログラム１１２によるテキスト検索の処理内容である。
【００２６】
次に、図８に示した検索条件式登録制御プログラム１０６による検索条件式登録処理における検索用オートマトン作成プログラム１０９の処理内容について、図１２のＰＡＤ図を用いて説明する。
検索用オートマトン作成プログラム１０９は、図１２に示すように、まずステップ１２００で有限オートマトン作成プログラム１１０を起動し、前記検索条件式解析プログラム１０７によって抽出された全ての検索タームを照合する有限オートマトン１２１を作成する。
この有限オートマトン１２１の作成方法には、上記非特許文献１および上記非特許文献２に開示されている方法を用いる。
次に、ステップ１２０１でユーザリスト作成プログラム１１１を起動し、前記検索条件式解析プログラム１０７によって抽出された検索タームが含まれる検索条件式を指定したユーザ１２６の識別番号（ユーザ番号）とその検索タームが含まれる項の番号（項番号）からユーザリスト１２２を作成し、ステップ１２０２でポインタを介して有限オートマトン１２１の出力テーブルに接続する。
【００２７】
前記検索ターム数カウントテーブル１２０の説明に用いた３つの検索条件式から作成される有限オートマトン１２１とユーザリスト１２２の例を図１３に示す。
本図に示した有限オートマトン１２１は、検索条件式中に含まれる“文書”、“文字”、“検索”、“サーチ”、“認識”および“学習”という６つの検索タームを照合するものである。
本図では簡単のため、状態遷移を２バイト（１文字）単位で示している。
この有限オートマトン１２１は上記非特許文献１および上記非特許文献２で示したものと同様なものであるが、出力テーブルの部分が異なる。
出力テーブルに格納されている各状態番号に対応して出現フラグが設けてある。この出現フラグはテキストの走査開始時に０にリセットしておき、検索タームと一致する部分文字列がテキスト中に現われた場合に１をセットする。
また、出力テーブルの末尾にはポインタが設けてあり、その検索タームを含む検索条件式のユーザ番号および項番号をリストとしてつないだユーザリスト１２２を指し示すようにしている。
ユーザリスト１２２の一つ一つの要素には、検索ターム否定フラグが設けてあり、検索条件式においてその検索タームに否定条件が掛かっている場合には１を、そうでなければ０を設定する。
例えば、本図において“文書”という検索タームはユーザ番号１の検索条件式の項番号１および２の項に否定条件無しで存在し、“認識”という検索タームはユーザ番号２の検索条件式の項番号１の項に否定条件付きで存在している。
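One possible in-memory shape for user list 122 is sketched below. The class name `UserListEntry`, the function `build_user_lists` and the dictionary keyed by search term are assumptions; in the embodiment each list hangs off an output table entry of finite automaton 121 through a pointer. "Clause" again stands for 項.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserListEntry:
    user: int        # user number
    clause: int      # clause number within that user's expanded expression
    negated: bool    # search term negation flag (True corresponds to 1)

# Expanded expressions of the three example users: {user: clauses}.
EXPANDED = {
    1: [[("文書", False), ("検索", False)], [("文書", False), ("サーチ", False)]],
    2: [[("文字", False), ("認識", True), ("学習", True)]],
    3: [[("検索", False), ("学習", False)]],
}

def build_user_lists(expanded):
    """Chain, for every search term, the (user, clause, negated) entries using it."""
    user_lists = {}
    for user, clauses in expanded.items():
        for clause_no, clause in enumerate(clauses, start=1):
            for term, negated in clause:
                entry = UserListEntry(user, clause_no, negated)
                user_lists.setdefault(term, []).append(entry)
    return user_lists

user_lists = build_user_lists(EXPANDED)
```

Here `user_lists["文書"]` holds entries for clauses 1 and 2 of user number 1 without negation, and `user_lists["認識"]` holds a single negated entry for clause 1 of user number 2, as in the example above.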
【００２８】
次に、図１１に示したテキスト検索制御プログラム１１２におけるテキスト検索処理を実行するテキスト検索プログラム１１４の処理内容について、図１４のＰＡＤ図を用いて説明する。
テキスト検索プログラム１１４は、本図に示すように、まずステップ１４００でテキスト検索の初期設定として、検索ターム数カウントテーブル１２０に設けてある検索タームの出現数カウント用領域および有限オートマトン１２１の出力テーブルに設けてある出現フラグを０にリセットする。
次に、ステップ１４０１で、テキスト走査プログラム１１５を起動し、ワークエリア１２３に格納されているテキストを、有限オートマトン作成プログラム１１０によって作成された有限オートマトン１２１で走査し、検索タームを照合する。
The search terms are matched by finite automaton 121 using the methods disclosed in Non-Patent Document 1 and Non-Patent Document 2.
このとき、テキスト中に一致する部分文字列が出現した検索タームについては、その検索タームに対応する出力テーブルの出現フラグを１に設定する。
次に、ステップ１４０２で検索ターム数カウントプログラム１１６を起動し、テキスト中に一致する部分文字列が出現した検索タームのカウントを行う。
これは、出力テーブルに設けてある出現フラグが１であるユーザリスト１２２を辿り、検索ターム否定フラグが０であるユーザ番号と項番号に対応する検索ターム数カウントテーブル１２０における検索タームの出現数カウント用領域の値を１ずつ増やしていくことにより実現する。
次に、ステップ１４０３でテキスト中に一致する部分文字列が出現しなかった検索タームのカウントを行う。
これは、出力テーブルに設けてある出現フラグが０であるユーザリスト１２２を辿り、検索ターム否定フラグが１であるユーザ番号と項番号に対応する検索ターム数カウントテーブル１２０における検索タームの出現数カウント用領域の値を１ずつ増やしていくことにより実現する。
【００２９】
次に、ステップ１４０４で検索条件式チェックプログラム１１７を起動し、検索ターム数カウントテーブル１２０を参照し、検索条件式が成立しているかどうかを調べる。
ここで、以下の２つの条件のうち、どちらかを満たしている検索条件式は成立しているものとみなせる。
条件（１）：検索条件式否定フラグが０（すなわち、検索条件式に否定条件が掛かっていない）で、検索ターム数が一致している項番号が少なくとも一つある。
条件（２）：検索条件式否定フラグが１（すなわち、検索条件式に否定条件が掛かっている）で、検索ターム数が一致している項番号が一つもない。
【００３０】
この検索条件式の成否の判定について、図１０を用いて説明する。
本発明では本図に示すように、ユーザ１２６が指定した検索条件式を項が論理和条件でつながれた形式に変形し、その項毎に含まれる検索ターム数を検索ターム数カウントテーブル１２０に格納している。
項が論理和条件でつながれているということは、それらの項のどれか一つが成立すればその検索条件式全体が成立することになる。
ここで、項は単一の検索タームあるいは検索タームの論理積条件である。
そのため、その項の中に含まれる検索タームと一致する部分文字列全てがテキスト中に出現した場合、すなわち予め検索ターム数カウントテーブル１２０に格納しておいた検索ターム数と検索ターム数カウントプログラム１１６によって算出された検索ターム出現カウント数が一致した場合、その項が成立することになる。
その結果、項の論理和条件で構成される検索条件式も成立することになる。
このように、上記条件（１）を満たせば、検索条件式は成立しているものとみなせる。
【００３１】
上記条件（２）では、条件（１）と逆になる。
検索条件式に否定条件が掛かっている場合、検索条件式から否定条件を取った検索条件式が成立していれば、否定条件が掛かった元の検索条件式は成立せず、検索条件式から否定条件を取った検索条件式が成立していなければ、否定条件が掛かった元の検索条件式は成立していることになる。
予め検索ターム数カウントテーブル１２０に格納しておいた検索ターム数と検索ターム数カウントプログラム１１６によって算出された検索ターム出現カウント数が一致している項が一つもなければ、否定条件を取った検索条件式は成立せず、否定条件が掛かった元の検索条件式が成立することになる。
このように、上記条件（２）を満たせば、検索条件式は成立しているものとみなせる。
上記条件のどちらかを満たしている検索条件式は成立しているとみなせるため、ステップ１４０５でそのユーザ番号をテキスト検索制御プログラム１１２へ出力し、本プログラムを終了する。
以上が本発明の文書検索方法の実施例である。
【００３２】
以下、図８に示した本実施例における検索条件式登録制御プログラム１０６の処理手順について具体的に説明する。
まず、図８の検索条件式登録制御プログラム１０６のステップ８００における検索条件式解析プログラム１０７の処理について説明する。
検索条件式解析プログラム１０７は検索条件式登録制御プログラム１０６によって起動される。
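For example, Equation (1) shows the expansion of user 1's search condition expression "“文書” and (“検索” or “サーチ”)", that is, "documents that contain “文書” and also contain “検索” or “サーチ”". Equation (2) shows the expansion of user 2's search condition expression "“文字” and ¬(“認識” or “学習”)", that is, "documents that contain “文字” but contain neither “認識” nor “学習”". Equation (3) shows the expansion of user 3's search condition expression "¬(“検索” and “学習”)", that is, "documents that do not contain both “検索” and “学習”".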
【００３３】
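【数１】 (Equation 1)
“文書” and (“検索” or “サーチ”) = (“文書” and “検索”) or (“文書” and “サーチ”)

【数２】 (Equation 2)
“文字” and ¬(“認識” or “学習”) = “文字” and ¬“認識” and ¬“学習”

【数３】 (Equation 3)
¬(“検索” and “学習”) = ¬(“検索” and “学習”)
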
この結果、論理和条件の外側に論理積条件が掛かっていない検索条件式が、すなわち、「（“文書”ａｎｄ“検索”）ｏｒ（“文書”ａｎｄ“サーチ”）」、「“文字”ａｎｄ ¬“認識”ａｎｄ ¬“学習”」、「¬（“検索”ａｎｄ “学習”）」が得られる。
これらの検索条件式中に含まれる検索タームをユーザ番号と項番号という観点から表にまとめると表１のようになる。
【００３４】
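【表１】 (Table 1)
User number | Clause number 1 | Clause number 2
1 | “文書”, “検索” | “文書”, “サーチ”
2 | “文字”, ¬“認識”, ¬“学習” | (none)
¬3 | “検索”, “学習” | (none)
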
ここで、ユーザ番号の前に“¬”が付与されているのは検索条件式全体に否定条件が掛かることを、検索タームの前に“¬”が付与されているのは検索タームに否定条件が掛かることを示す。
例えば、ユーザ番号１の検索条件式の項番号１には“文書”と“検索”という２つの検索タームが、項番号２には“文書”と“サーチ”という２つの検索タームが含まれ、ユーザ番号２の検索条件式の項番号１には“文字”、“認識”および“学習”という３つの検索タームが含まれ、このうち“認識”と“学習”には否定条件が掛かることを表している。
【００３５】
次に、図８の検索条件式登録制御プログラム１０６のステップ８０１における検索ターム数カウントテーブル作成プログラム１０８の処理について説明する。検索ターム数カウントテーブル作成プログラム１０８は検索条件式登録制御プログラム１０６によって、検索条件式解析プログラム１０７の次に起動される。
本プログラムは、検索条件式解析プログラム１０７による解析結果に基づき検索ターム数カウントテーブル１２０を作成する。
表１の解析結果から作成される検索ターム数カウントテーブル１２０は図１０のようになる。
本テーブルには、各ユーザ番号毎に項番号に対応して検索タームの数が格納される。
また前述したように、検索条件式否定フラグには検索条件式全体に否定条件が掛かっているなら１を、そうでなければ０を設定する。
【００３６】
最後に、図８の検索条件式登録制御プログラム１０６のステップ８０２における検索用オートマトン作成プログラム１０９の処理について説明する。
本プログラムの処理内容は図１２に示した。本プログラムは図１に示したように、有限オートマトン作成プログラム１１０およびユーザリスト作成プログラム１１１から構成される。
以下、順に説明する。
【００３７】
有限オートマトン作成プログラム１１０では、検索条件式解析プログラム１０７によって抽出された全ての検索タームを照合する有限オートマトン１２１を作成する。
例えば、数（１）、数（２）、数（３）および表１に示した例の場合、検索条件式解析プログラム１０７の解析結果から“文書”、“検索”、“サーチ”、“文字”、“認識”および“学習”の６つの検索タームが得られる。
上記非特許文献１および上記非特許文献２に開示されている方法を用いて、これらの検索タームを照合する有限オートマトン１２１を作成すると、図１３に示したような有限オートマトン１２１が得られる。
ただし、ここでは簡単のため、状態遷移を１文字すなわち２バイト単位で示している。
【００３８】
ユーザリスト作成プログラム１１１では、検索条件式解析プログラム１０７によって得られた検索タームが含まれる検索条件式のユーザ番号、項番号およびその検索タームに否定条件が掛かっているかどうかという情報をリストでつなぐことによってユーザリスト１２２を作成し、ポインタを介して有限オートマトン１２１の出力テーブルに接続する。
このユーザリスト１２２の作成方法は前述した通りである。
表１の解析結果から作成されるユーザリスト１２２を図１５に示す。
例えば、“学習”という検索タームはユーザ番号２の検索条件式の項番号１の項に否定条件付きで含まれ、ユーザ番号３の検索条件式の項番号１の項に否定条件無しで含まれるので、これらに対応する番号をリストでつないだ形でユーザリスト１２２が作成される。
そして、このようにして作成されたユーザリスト１２２は、有限オートマトン１２１の出力テーブルにポインタを介して接続される。
以上が本実施例における検索条件式登録制御プログラム１０６における検索条件式登録の詳細な手順である。
【００３９】
以下、図１１に示した本実施例におけるテキスト検索制御プログラム１１２の処理手順について具体的に説明する。
まず、図１１のテキスト検索制御プログラム１１２のステップ１１００におけるテキスト取得プログラム１１３の処理について説明する。
テキスト取得プログラム１１３はテキスト検索制御プログラム１１２によって起動される。
本プログラムでは、電子メールによって配信されたテキストをワークエリア１２３に格納する。
本プログラムによって、「検索した文書の書式を解析し、文字列部分を認識する」というテキストがワークエリア１２３に格納されたものとして、以下の説明を行う。
【００４０】
図１１のテキスト検索制御プログラム１１２のステップ１１０１におけるテキスト検索プログラム１１４の処理について説明する。
本プログラムの処理内容は図１４に示した。
本プログラムは図１に示したように、テキスト走査プログラム１１５、検索ターム数カウントプログラム１１６および検索条件式チェックプログラム１１７から構成される。
以下、順に説明する。
まず、上記テキスト走査プログラム１１５、検索ターム数カウントプログラム１１６および検索条件式チェックプログラム１１７が実行される前に初期設定が行われる。
ここでは、図１０および図１５に示すように、検索ターム数カウントテーブル１２０の検索タームの出現数カウント用領域および出力テーブルの出現フラグが０にリセットされる。
【００４１】
テキスト走査プログラム１１５では、ワークエリア１２３に格納されているテキストを、有限オートマトン１２１で走査して、検索タームを照合する。
ここで、テキスト中に一致する部分文字列が出現した検索タームに対応する出現フラグを１に設定する。
例えば、図１６に示したようにテキスト「検索した文書の書式を解析し、文字列部分を認識する」を走査すると、まず、“検索”がテキスト中に現われる。
【００４２】
そこで、“検索”に対応する出現フラグを１に設定する。
以下、“文書”、“文字”および“認識”の順に出現するので、それらの検索タームに対応する出現フラグを１に設定する。
“サーチ”および“学習”という検索タームについては、テキスト中に一致する部分文字列が現われないので、それらの検索タームに対応する出現フラグは０のままである。
【００４３】
検索ターム数カウントプログラム１１６では、まず、テキスト中に一致する部分文字列が出現した検索タームのカウントを行う。
ここでは、テキスト中に一致する部分文字列が出現し、すなわち出現フラグが１で、検索タームに否定条件が掛かっていない、すなわち検索ターム否定フラグが０であるユーザ番号と項番号に対応する検索ターム数カウントテーブル１２０の検索タームの出現数カウント用領域を１増やす。
例えば、図１７の例では、検索ターム“検索”に関してはユーザ番号３の項番号１の検索ターム否定フラグは０なのでカウントするが、検索ターム“認識”についてはユーザ番号２の項番号１の検索ターム否定フラグが１なのでカウントしない。
次に、テキスト中に一致する部分文字列が出現しなかった検索タームのカウントを行う。
ここでは、テキスト中に一致する部分文字列が出現せず、すなわち出現フラグが０で、検索タームに否定条件が掛かっている、すなわち検索ターム否定フラグが１であるユーザ番号と項番号に対応する検索ターム数カウントテーブル１２０の検索タームの出現数カウント用領域を１増やす。
例えば、図１８の例では、検索ターム“学習”に関してはユーザ番号２の項番号１の検索ターム否定フラグは１なのでカウントするが、ユーザ番号３の項番号１の検索ターム否定フラグは０なのでカウントしない。
【００４４】
次に、検索条件式チェックプログラム１１７では、検索ターム数カウントテーブル１２０を参照し、検索条件式が成立しているかどうかを調べる。
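A search condition expression that satisfies either of the following two conditions is regarded as satisfied, so the number of the user who specified it is output.
Condition (1): the search condition expression negation flag is 0, that is, the expression as a whole is not negated, and at least one clause number has a matching search term count.
Condition (2): the search condition expression negation flag is 1, that is, the expression as a whole is negated, and no clause number has a matching search term count.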
例えば、図１９の例の場合、ユーザ番号１に対応する検索条件式否定フラグが０で、項番号１の検索ターム数と検索ターム出現カウント数が一致しているため、上記条件（１）を満たしている。
また、ユーザ番号３に対応する検索条件式否定フラグが１で、検索ターム数と検索ターム出現カウント数が一致している項番号が存在しないため、上記条件（２）を満たしている。
しかし、ユーザ番号２では、検索条件式否定フラグが０であるにもかかわらず、検索ターム数と検索ターム出現カウント数が一致している項が存在しないため、上記条件をいずれも満たしてはいない。
したがって、ユーザ番号１およびユーザ番号３の検索条件式が成立しているとみなせるため、これらのユーザ番号を出力する。
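The counting and checking steps of this worked example can be traced with the sketch below. The name `evaluate` and the data layout are hypothetical, "clause" stands for 項, and the appearance flags are filled in with Python's `in` operator as a stand-in for the scan by finite automaton 121.

```python
# (user number, whole-expression negation flag, clauses of (term, negated))
QUERIES = [
    (1, False, [[("文書", False), ("検索", False)], [("文書", False), ("サーチ", False)]]),
    (2, False, [[("文字", False), ("認識", True), ("学習", True)]]),
    (3, True, [[("検索", False), ("学習", False)]]),
]

def evaluate(text, queries):
    """Return the user numbers whose search condition expression is satisfied."""
    terms = {term for _, _, clauses in queries for clause in clauses for term, _ in clause}
    appeared = {term: term in text for term in terms}   # stand-in for appearance flags
    satisfied = []
    for user, negated_whole, clauses in queries:
        clause_holds = []
        for clause in clauses:
            # Count a term when it appeared and is not negated,
            # or when it did not appear and is negated.
            count = sum(1 for term, negated in clause if appeared[term] != negated)
            clause_holds.append(count == len(clause))
        # Condition (1): not negated and some clause matches.
        # Condition (2): negated and no clause matches.
        if any(clause_holds) != negated_whole:
            satisfied.append(user)
    return satisfied

result = evaluate("検索した文書の書式を解析し、文字列部分を認識する", QUERIES)
```

For this text, clause 1 of user number 1 reaches 2 of 2, the single clause of user number 2 reaches 2 of 3 and the single clause of user number 3 reaches 1 of 2, so `result` contains user numbers 1 and 3.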
【００４５】
最後に、図１１のテキスト検索制御プログラム１１２のステップ１１０３におけるテキスト成形プログラム１１８の処理について説明する。
テキスト成形プログラム１１８は、上記テキスト検索プログラム１１４の結果、ユーザ番号が出力された場合のみ、テキスト検索制御プログラム１１２によって起動される。
本プログラムでは、ワークエリア１２３に格納されているテキストを電子メールプログラム１１９が配布できるような形式に成形する。
例えば、テキストの先頭にヘッダと呼ばれる制御情報を付加する。
図２０に本プログラムの処理結果の例を示す。
In the figure, the lines "To:", "Subject:" and "From:" have been added as the header.
“Ｔｏ：”行にはテキストを配布する宛て先、例えば電子メールの送り先のアドレスを付加する。
図２０では、ユーザ１およびユーザ３にテキストを配布するため、“ユーザ１”および“ユーザ３”と記述されている。
“Ｓｕｂｊｅｃｔ：”行にはユーザが識別しやすい情報を付加する。
本図では、配布するテキストの最初の数文字を抜き出して記述しているが、ここには何を付加してもよい。
“Ｆｒｏｍ：”行にはテキストの送り元、例えば電子メールの送り元のアドレスを付加する。
本図では、テキストを配布するシステムの名称である“文書検索配送システム”と記述されている。
以上が本実施例におけるテキスト検索制御プログラム１１２におけるテキスト検索の詳細な手順である。
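The shaping step could look like the following sketch, which uses `EmailMessage` from Python's standard `email` package. The function name `shape_text`, the example addresses and the choice of ten characters for the subject are assumptions; the description only requires that the header carry the destination, a recognisable subject and the sender.

```python
from email.message import EmailMessage

def shape_text(text, recipients, sender="delivery-system@example.com", subject_len=10):
    """Attach To, Subject and From headers so a mail program can deliver the text."""
    msg = EmailMessage()
    msg["To"] = ", ".join(recipients)        # users whose expression was satisfied
    msg["Subject"] = text[:subject_len]      # first few characters of the text
    msg["From"] = sender                     # name or address of the delivering system
    msg.set_content(text)
    return msg

msg = shape_text(
    "検索した文書の書式を解析し、文字列部分を認識する",
    ["user1@example.com", "user3@example.com"],
)
```

The resulting message can be handed to any mail transport; the transport itself is outside the scope of this sketch.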
【００４６】
テキスト検索制御プログラム１１２の処理の結果、成立している検索条件式が一つでも存在した場合には、図７に示したように、テキスト検索制御プログラム１１２の終了後、電子メールプログラム１１９がシステム制御プログラム１０５によって起動される。
本プログラムでは、テキスト成形プログラム１１８によって付加されたヘッダを参照して、テキストを電子メールによって配布する。
例えば、図２０の例の場合、ヘッダの“Ｔｏ：”行に相当する部分を参照し、そこに記述されている宛て先にヘッダが付加されたテキストを送付する。
本図では、“Ｔｏ：”行に“ユーザ１”および“ユーザ３”と記述されているため、ユーザ１とユーザ３にテキストを配布し、処理を終了する。
【００４７】
以上説明したように、本発明によれば、複数ユーザの検索条件式の登録時に、それらの検索条件式中に含まれる検索タームを指定したユーザの識別情報とそのユーザが指定した検索条件式中に含まれる検索タームの数を記憶しておき、テキストの検索時に、テキスト中に一致する部分文字列が出現した検索タームのユーザ毎の数と記憶しておいたユーザ毎の検索ターム数とを比較することによって全ての検索条件式が成立しているかどうかを判別することができるため、ただ一度のテキスト走査で全てのユーザの検索条件式の成否を判定でき、全ユーザの検索条件式に関する検索処理を一度に行うことが可能となる。
その結果、複数の情報源から入手したテキストを、ユーザが予め登録しておいた検索条件式に基づき、テキストのただ一度の走査で複数ユーザの検索条件式が成立しているかどうかを判別し、条件が成立しているユーザに対して即座にそのテキストを配布することができる即時性の高い文書検索配送システムを実現することが可能となる。
また、この文書検索配送システムは即時性が高いので、ユーザがシステムに検索条件式を通知してから検索されたテキストが配送されるまでの時間が短く、この時間を監視することにより本発明が適用されているか否かを判定することが可能である。
【００４８】
《第二実施例》
次に、本発明の第二の実施例について説明する。
本実施例で示す文書検索配送システムでは、ユーザ毎に配布条件を管理することにより、ある程度まとめてテキストを配布したり、決まった時間に配布するなど、ユーザの希望に応じてテキストを配布することが可能となる。
また、商業的なシステムとして用いる場合には、ユーザの契約条件に応じて時間遅延を設けてテキストを配布することも可能となる。
【００４９】
本実施例は第一の実施例（図１）と基本的に同様の構成をとるが、その中の主メモリ１０４内の構成が異なる。
この主メモリ１０４内の構成は図２１に示すようなものとなる。
図２１に示すように、主メモリ１０４ａに配布管理テーブル２１０８を確保し、システム制御プログラム１０５ａの制御下に配布条件登録制御プログラム２１００およびテキスト配布制御プログラム２１０４を新たに設ける。
また、配布条件登録制御プログラム２１００の制御下に配布条件解析プログラム２１０１および配布条件登録プログラム２１０２を、テキスト検索制御プログラム１１２ａの制御下に配布情報格納プログラム２１０３を、テキスト配布制御プログラム２１０４の制御下にテキスト配布プログラム２１０５を設ける。
このテキスト配布プログラム２１０５は配布条件チェックプログラム２１０６、電子メールプログラム１１９および配布情報修正プログラム２１０７で構成される。
電子メールプログラム１１９にはワークステーションなどで一般的に用いられている既存のメールプログラムを用いる。
以上のプログラムはハードディスク装置、フレキシブルディスクなどのコンピュータで読み書きできる記憶媒体に格納することもできる。
【００５０】
システム制御プログラム１０５ａは文書検索配送システムの管理者によるキーボード１０１からの指示を受けて起動される。
配布条件登録制御プログラム２１００、検索条件式登録制御プログラム１０６、テキスト検索制御プログラム１１２ａおよびテキスト配布制御プログラム２１０４はユーザ１２６からの配布条件や検索条件式の登録指示、キーボード１０１からの配布条件の登録指示およびニュース配信元１２５からのテキストの配信により、システム制御プログラム１０５ａによって起動され、それぞれ配布条件解析プログラム２１０１および配布条件登録プログラム２１０２の制御、検索条件式解析プログラム１０７、検索ターム数カウントテーブル作成プログラム１０８および検索用オートマトン作成プログラム１０９の制御、テキスト取得プログラム１１３、テキスト検索プログラム１１４、テキスト成形プログラム１１８および配布情報格納プログラム２１０３の制御、テキスト配布プログラム２１０５の制御を行う。
以下、本実施例における文書検索配送システムの処理内容について説明する。
【００５１】
まず、システム制御プログラム１０５ａの処理内容について図２２のＰＡＤ図を用いて説明する。
システム制御プログラム１０５ａでは、まずステップ２２００で、キーボード１０１から終了コマンドが入力されるまで、以下のステップを繰り返す。
この繰り返し処理では、まずステップ２２０１でユーザ１２６からの電子メールあるいはキーボード１０１の入力によって配布条件が送られてきているかどうかを調べる。
ここで、配布条件が送られてきている場合には、ステップ２２０２で配布条件登録制御プログラム２１００を起動して、配布条件の登録を行う。
次に、ステップ２２０３でユーザ１２６から電子メールによって検索条件式が送られてきているかどうかを調べる。
ここで、検索条件式が送られてきている場合には、ステップ２２０４で検索条件式登録制御プログラム１０６を起動して、検索条件式の登録を行う。
次に、ステップ２２０５でニュース配信元１２５から電子メールによってテキストが送られてきているかどうかを調べる。
ここで、テキストが送られてきている場合には、ステップ２２０６でテキスト検索制御プログラム１１２ａを起動して、テキストの検索を行う。
最後に、ステップ２２０７でテキスト配布制御プログラム２１０４を起動し、配布条件を判定してその条件を満たしているユーザに対してのみテキストを配布する。
以上がシステム制御プログラム１０５ａの処理内容である。
【００５２】
以下、第一の実施例にはない配布条件登録制御プログラム２１００とテキスト配布制御プログラム２１０４および第一の実施例と処理が異なるテキスト検索制御プログラム１１２ａの処理内容について説明する。
まず、配布条件登録制御プログラム２１００による配布条件登録の処理内容について図２３のＰＡＤ図を用いて説明する。
配布条件登録制御プログラム２１００はシステム制御プログラム１０５ａによって起動される。
本プログラムは、まずステップ２３００で配布条件解析プログラム２１０１を起動し、ユーザ１２６からの電子メールあるいはキーボード１０１の入力によって送られてきた配布条件を解析する。
この配布条件の解析処理では、配布条件から以下の情報を抽出する。
（Ａ）配布条件を設定するユーザの識別子
（Ｂ）配布条件の形式
（Ｃ）配布条件の設定値
上記（Ｂ）の配布条件の形式としては、「配布時間」、「配布件数」、「遅延時間」などの配布条件の種別を抽出する。
（Ｃ）の配布条件の設定値として抽出する値は、例えば（Ｂ）が「配布時間」ならその時間、「配布件数」なら配布する件数、「遅延時間」なら検索してから実際に配布するまでの経過時間である。
例えば、
ユーザ番号１：配布時間（１８：００）
という“ユーザ番号１に対して１８：００に配布する”ことを意味する配布条件が送られてきた場合には「ユーザ番号１」、「配布時間」および「１８：００」を抽出する。
ユーザ番号２：配布件数（５）
という“ユーザ番号２に対して５件たまったら配布する”ことを意味する配布条件が送られてきた場合には「ユーザ番号２」、「配布件数」および「５」を抽出する。
ユーザ番号３：遅延時間（０１：３０）
という“ユーザ番号３に対して１時間３０分遅れて配布する”ことを意味する配布条件が送られてきた場合には「ユーザ番号３」、「遅延時間」および「０１：３０」を抽出する。
最後に、ステップ２３０１で配布条件登録プログラム２１０２を起動し、配布条件解析プログラム２１０１によって解析された結果を配布管理テーブル２１０８に格納する。
図２４に配布管理テーブル２１０８の例を示す。
配布管理テーブル２１０８には配布条件解析プログラム２１０１によって抽出された配布条件の形式と設定値がユーザ番号に対応する形で格納され、配布条件チェック用領域および配布テキスト番号格納用領域が確保されている。
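The analysis of a delivery condition can be sketched with a regular expression. The exact input syntax is not specified beyond the three examples above, so the pattern below is an assumption modelled on them; `parse_condition` is a hypothetical name. NFKC normalisation is applied first so that full-width digits, colons and parentheses are handled as their ASCII counterparts.

```python
import re
import unicodedata

# Pattern modelled on the three example conditions (after NFKC normalisation).
PATTERN = re.compile(r"ユーザ番号(\d+):(配布時間|配布件数|遅延時間)\((.+)\)")

def parse_condition(line):
    """Extract (user number, condition type, setting value) from one condition."""
    normalized = unicodedata.normalize("NFKC", line)
    match = PATTERN.fullmatch(normalized)
    if match is None:
        raise ValueError(f"unrecognised delivery condition: {line!r}")
    return int(match.group(1)), match.group(2), match.group(3)

parsed = [
    parse_condition("ユーザ番号１：配布時間（１８：００）"),
    parse_condition("ユーザ番号２：配布件数（５）"),
    parse_condition("ユーザ番号３：遅延時間（０１：３０）"),
]
```

Each tuple corresponds to items (A), (B) and (C) above and could be stored as one row of delivery management table 2108.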
【００５３】
次に、テキスト検索制御プログラム１１２ａによるテキスト検索の処理内容について図２５のＰＡＤ図を用いて説明する。
テキスト検索制御プログラム１１２ａはシステム制御プログラム１０５ａによって起動される。
図２５に示す本プログラムの処理内容のうち、ステップ２５００〜２５０３におけるテキスト取得プログラム１１３、テキスト検索プログラム１１４およびテキスト成形プログラム１１８の処理内容は第一の実施例で述べた通りである。
テキスト検索制御プログラム１１２ａはステップ２５０４で、配布情報格納プログラム２１０３を起動し、配布管理テーブル２１０８の配布テキスト番号格納用領域に検索条件が成立した配布すべきテキストの番号を追加格納する。
次に、ステップ２５０５で配布管理テーブル２１０８の配布テキスト番号格納用領域に格納してあるテキスト番号の数、あるいは現在の時間を本テーブルの配布条件チェック用領域に格納する。
このとき、配布条件の形式が「配布件数」の場合には格納してあるテキスト番号の数を、「遅延時間」の場合には現在の時間を格納する。「配布時間」の場合には何も格納する必要はない。
その後、ステップ２５０６でテキスト成形プログラム１１８によって成形されたテキストをワークエリア１２３に格納する。
【００５４】
最後に、テキスト配布制御プログラム２１０４によるテキスト配布の処理内容について図２６のＰＡＤ図を用いて説明する。
テキスト配布制御プログラム２１０４はシステム制御プログラム１０５ａによって起動される。
本プログラムは、ステップ２６００でテキスト配布プログラム２１０５を起動し、ユーザ毎に配布条件を判定し、その条件を満たしているユーザに対してテキストを配布する。
【００５５】
テキスト配布プログラム２１０５の詳細な処理内容を図２７に示す。
テキスト配布プログラム２１０５は、まずステップ２７００で、配布管理テーブル２１０８に配布条件が格納されている全てのユーザ番号に対して以下のステップを繰り返す。
この繰り返し処理では、まずステップ２７０１で配布条件チェックプログラム２１０６を起動し、配布管理テーブル２１０８を用いて配布条件を満たしているか否かを判定する。
ここで、以下の条件を満たしていれば配布条件を満たしているとみなす。
条件（１）：配布条件の形式が「配布時間」で、配布条件の設定値と現在の時間が一致するか、あるいは配布条件の設定値より現在の時間の方が超過している。
条件（２）：配布条件の形式が「配布件数」で、配布条件の設定値と配布条件チェック用領域に格納されている件数が一致する。
条件（３）：配布条件の形式が「遅延時間」で、配布条件の設定値と配布条件チェック用領域に格納されている時間から現在の時間までの経過時間が一致するか、あるいは配布条件の設定値より経過時間の方が超過している。
上記の条件を満たしている場合には、ステップ２７０２で配布条件を満たしていると判断し、ステップ２７０３で電子メールプログラム１１９を起動して配布管理テーブル２１０８の配布テキスト番号格納用領域に格納されている番号のテキストをそのユーザ番号に配布する。
最後に、ステップ２７０４で配布情報修正プログラム２１０７を起動し、テキストを配布したユーザ番号に対応する配布管理テーブル２１０８の配布条件チェック用領域と配布テキスト番号格納用領域をリセットする。
これは、配布条件チェック用領域をＮＵＬＬクリアし、配布テキスト番号格納用領域からテキスト番号を削除することで実現する。
以上が本発明の文書検索システムの実施例である。
【００５６】
以下、図２３に示した本実施例における配布条件登録制御プログラム２１００の処理手順について図２８を用いて具体的に説明する。
まず、図２３の配布条件登録制御プログラム２１００のステップ２３００における配布条件解析プログラム２１０１の処理について説明する。
配布条件解析プログラム２１０１は配布条件登録制御プログラム２１００によって起動される。
本プログラムは、ユーザ１２６から電子メールで送られてきた配布条件あるいはキーボード１０１から入力された配布条件を解析する。
例として「ユーザ番号１：配布時間（１８：００）」、「ユーザ番号２：配布件数（５）」および「ユーザ番号３：遅延時間（０１：３０）」という配布条件を解析した結果を図２８に示す。
例えば「ユーザ番号１：配布時間（１８：００）」という配布条件の場合、解析結果として、配布条件を設定するユーザ番号「１」、配布条件の形式「配布時間」、配布条件の設定値「１８：００」が得られる。
【００５７】
次に、図２３の配布条件登録制御プログラム２１００のステップ２３０１における配布条件登録プログラム２１０２の処理について説明する。
配布条件登録プログラム２１０２は配布条件登録制御プログラム２１００によって、配布条件解析プログラム２１０１の次に起動される。
本プログラムは、配布条件解析プログラム２１０１による解析結果に基づき配布管理テーブル２１０８を作成する。
作成された配布管理テーブル２１０８の例を図２８に示す。
本テーブルには、配布条件解析プログラム２１０１による解析結果に基づき、各ユーザ番号に対応して配布条件の形式および設定値を格納する。また、配布条件チェック用領域および配布テキスト番号格納用領域を確保する。
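The above is the detailed procedure of the delivery condition registration processing in delivery condition registration control program 2100 of this embodiment.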
【００５８】
以下、図２５に示した本実施例におけるテキスト検索制御プログラム１１２ａの処理手順について具体的に説明する。
図２５に示す本プログラムの処理内容のうち、ステップ２５００〜２５０３におけるテキスト取得プログラム１１３、テキスト検索プログラム１１４およびテキスト成形プログラム１１８の処理内容は第一の実施例で詳しく述べた通りである。
以下は、ステップ２５０４〜２５０６における配布情報格納プログラム２１０３の詳細な処理内容である。
配布情報格納プログラム２１０３は、テキスト検索制御プログラム１１２ａによってテキスト成形プログラム１１８の次に起動される。
本プログラムは、まず、ステップ２５０４でテキストの番号を、検索条件式が成立したユーザ番号に対応する配布管理テーブル２１０８の配布テキスト番号格納用領域に格納する。
図２９に本プログラムの処理内容の例を示す。
本図は、５９番のテキストに対してユーザ番号１およびユーザ番号２の検索条件式が成立した場合の例である。
そのため、配布管理テーブル２１０８の配布テキスト番号格納用領域のユーザ番号１およびユーザ番号２に対応する場所にテキスト番号“５９”が格納される。
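Next, in step 2505, delivery information storage program 2103 stores, in the delivery condition check area of delivery management table 2108, either the number of text numbers held in the delivery text number storage area of that table or the current time.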
このとき、配布条件の形式が「配布件数」の場合には格納してあるテキスト番号の数を、「遅延時間」の場合には現在の時間を格納する。「配布時間」の場合には何も格納する必要はない。
図２９の例の場合、ユーザ番号２の配布条件の形式は「配布件数」であるので、配布条件チェック用領域の値を１増やして“５”にするが、ユーザ番号１の配布条件の形式は「配布時間」であるため何もしない。
最後に、配布情報格納プログラム２１０３はステップ２５０６で、テキスト成形プログラム１１８によって成形されたテキストをワークエリア１２３に格納して終了する。
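Steps 2504 to 2506 can be sketched as below, using the values of the FIG. 29 example for user number 2. The dictionary layout, the function name, the date, and the empty initial list for user number 1 are assumptions made for illustration; the source specifies only what is stored for each condition format.

```python
from datetime import datetime

def store_distribution_info(table, work_area, text_no, shaped_text,
                            matched_users, now):
    """Record one matched text for every user whose expression was satisfied."""
    for user in matched_users:
        entry = table[user]
        entry["texts"].append(text_no)                 # step 2504
        if entry["kind"] == "count":                   # step 2505
            entry["check"] = len(entry["texts"])       # number of stored texts
        elif entry["kind"] == "delay":
            entry["check"] = now                       # current time, as stated
        # kind == "time": nothing is stored in the check area
    work_area[text_no] = shaped_text                   # step 2506

# Hypothetical table state just before text 59 arrives.
table = {
    1: {"kind": "time", "value": "18:00", "check": None, "texts": []},
    2: {"kind": "count", "value": "5", "check": 4, "texts": [19, 24, 33, 42]},
}
work_area = {}
store_distribution_info(table, work_area, 59, "shaped text 59", [1, 2],
                        datetime(2003, 8, 25, 14, 0))
```

After the call, the check area of user number 2 holds 5, matching the increment described for FIG. 29, while the check area of user number 1 is untouched.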
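[0059]
Finally, the processing of the text distribution program 2105 in step 2600 of the text distribution control program 2104 in FIG. 26 is described.
The detailed processing of the text distribution program 2105 is as shown in FIG. 27.
First, in step 2700, the following processing is repeated for every user whose distribution condition is stored in the distribution management table 2108.
In this loop, step 2701 first starts the distribution condition check program 2106 and evaluates the distribution condition.
The distribution condition is regarded as satisfied when one of the following conditions holds.
Condition (1): the distribution condition format is "distribution time", and the current time equals or is later than the set value of the distribution condition.
Condition (2): the distribution condition format is "distribution count", and the count stored in the distribution condition check area equals the set value of the distribution condition.
Condition (3): the distribution condition format is "delay time", and the time elapsed from the time stored in the distribution condition check area to the current time equals or exceeds the set value of the distribution condition.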
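[0060]
The distribution condition format of user number 1 is "distribution time".
However, the current time "14:00" has not yet reached the set value "18:00", so the distribution condition is not satisfied, and processing moves to the next iteration.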
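[0061]
The distribution condition format of user number 2 is "distribution count", and the count stored in the distribution condition check area equals the set value "5", so step 2703 starts the e-mail program 119 and distributes the texts whose numbers are stored in the distributed text number storage area of the distribution management table 2108. In this figure, the distributed text number storage area for user number 2 holds text numbers 19, 24, 33, 42, and 59, so the texts with those numbers held in the work area 123 are distributed to user number 2.
Next, step 2704 starts the distribution information correction program 2107, which resets the distribution condition check area and the distributed text number storage area of the distribution management table 2108 for the user number to which the texts were distributed, that is, user number 2, and processing moves to the next iteration.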
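[0062]
The distribution condition format of user number 3 is "delay time", and the time elapsed from the time stored in the distribution condition check area to the current time equals the set value "01:30", so step 2703 starts the e-mail program 119 and distributes the texts whose numbers are stored in the distributed text number storage area of the distribution management table 2108.
In this figure, the distributed text number storage area for user number 3 holds text number 53, so the text with that number held in the work area 123 is distributed to user number 3.
Next, step 2704 starts the distribution information correction program 2107, which resets the distribution condition check area and the distributed text number storage area of the distribution management table 2108 for the user number to which the text was distributed, that is, user number 3.
FIG. 30 shows the distribution management table 2108 when all iterations have finished.
Because texts were distributed to user number 2 and user number 3, the corresponding distribution condition check areas and distributed text number storage areas have been reset.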
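The distribution loop and its three conditions can be sketched as follows with the values of the worked example (current time 14:00). This is an illustration under assumptions: the dictionary layout, the function names, the date, and the stored time 12:30 for user number 3 are not from the source; the last is chosen so that the elapsed time equals the set value 01:30.

```python
from datetime import datetime

def minutes(hhmm):
    """Convert an "HH:MM" set value to minutes."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def is_due(entry, now):
    """Conditions (1) to (3) of the distribution condition check."""
    kind, value, check = entry["kind"], entry["value"], entry["check"]
    if kind == "time":      # (1) current time has reached the set time
        return now.hour * 60 + now.minute >= minutes(value)
    if kind == "count":     # (2) stored count equals the set count
        return check == int(value)
    if kind == "delay":     # (3) elapsed time has reached the set delay
        return check is not None and (now - check).total_seconds() >= minutes(value) * 60
    return False

def distribute(table, now, send):
    """Steps 2700 to 2704: check each user, send, then reset."""
    for user, entry in table.items():
        if not is_due(entry, now):
            continue
        send(user, list(entry["texts"]))   # step 2703
        entry["check"] = None              # step 2704: clear check area
        entry["texts"].clear()             # step 2704: delete text numbers

now = datetime(2003, 8, 25, 14, 0)
table = {
    1: {"kind": "time", "value": "18:00", "check": None, "texts": [59]},
    2: {"kind": "count", "value": "5", "check": 5, "texts": [19, 24, 33, 42, 59]},
    3: {"kind": "delay", "value": "01:30",
        "check": datetime(2003, 8, 25, 12, 30), "texts": [53]},
}
sent = []
distribute(table, now, lambda user, texts: sent.append((user, texts)))
```

The sketch follows the stated conditions literally, so a "distribution time" user would also be treated as due when no text numbers are stored; how that case is handled is not described in the text.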
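[0063]
As described above, according to the present invention, a distribution condition is set for each user, and the texts for which the search condition expression was satisfied are distributed according to that condition. This makes it possible to distribute texts as each user wishes, for example in batches or at a fixed time.
When the system is used commercially, it also becomes possible to distribute texts with a time delay that depends on the user's contract terms.
As a result, it becomes possible to realize a highly flexible document retrieval and delivery system that, for texts obtained from multiple information sources, determines in a single scan of each text whether the search condition expressions registered in advance by multiple users are satisfied, and distributes the text according to each user's desired distribution condition.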
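[0064]
<<Third Embodiment>>
Next, a third embodiment of the present invention is described.
In the document retrieval and delivery system of this embodiment, the search terms contained in the search condition expression specified by each user are managed per user. When a user instructs deletion of a search condition expression, the finite automaton is searched with the managed search terms and the pointers of the user list are relinked, so that the earlier information can easily be removed from the user list.
According to this embodiment, a search condition expression can also be changed easily when a user instructs a change.
This embodiment has basically the same configuration as the first embodiment (FIG. 1), but the configuration inside the main memory 104 differs.
The configuration inside the main memory is as shown in FIG. 31.
As shown in FIG. 31, a search term management table 3106 is reserved in the main memory 104b, and a search condition expression deletion control program 3100 is newly provided under the control of the system control program 105b.
In addition, a user list correction program 3101 and a search term count table correction program 3104 are provided under the control of the search condition expression deletion control program 3100, and a search term management table creation program 3105 is provided under the control of the search condition expression registration control program 106b.
The user list correction program 3101 consists of a finite automaton search program 3102 and a user list partial deletion program 3103.
These programs can also be stored on a computer-readable and writable storage medium such as a hard disk device or a flexible disk.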
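[0065]
The system control program 105b is started by an instruction from the keyboard 101 given by the administrator of the document retrieval and delivery system.
The search condition expression deletion control program 3100, the search condition expression registration control program 106b, and the text search control program 112 are started by the system control program 105b in response to instructions from the user 126 to delete or register a search condition expression and to the delivery of text from the news distribution source 125. The search condition expression deletion control program 3100 controls the user list correction program 3101 and the search term count table correction program 3104. The search condition expression registration control program 106b controls the search condition expression analysis program 107, the search term count table creation program 108, the search automaton creation program 109, and the search term management table creation program 3105. The text search control program 112 controls the text acquisition program 113, the text search program 114, and the text shaping program 118.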
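[0066]
The processing of the document retrieval and delivery system of this embodiment is described below. First, the processing of the system control program 105b is described with reference to the PAD diagram of FIG. 32.
In step 3200, the system control program 105b repeats the following steps until a termination command is entered from the keyboard 101.
In this loop, step 3201 first checks whether an instruction to delete a search condition expression has been sent from the user 126 by e-mail.
If a deletion instruction has been sent, step 3202 starts the search condition expression deletion control program 3100 and deletes the search condition expression.
Next, step 3203 checks whether a search condition expression has been sent from the user 126 by e-mail.
If a search condition expression has been sent, step 3204 starts the search condition expression registration control program 106b and registers the search condition expression.
Next, step 3205 checks whether a text has been sent from the news distribution source 125 by e-mail.
If a text has been sent, step 3206 starts the text search control program 112 and searches the text.
Next, step 3207 examines the result of the text search by the text search control program 112. If at least one search condition expression is judged to be satisfied, step 3208 starts the e-mail program 119 and distributes the text by e-mail to the users who specified the satisfied search condition expressions.
The above is the processing of the system control program 105b.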
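One pass of the loop body (steps 3201 to 3208) can be sketched as a dispatcher. This is only an outline under assumptions: the `inbox` dictionary stands in for the e-mail checks, and the four callables stand in for programs 3100, 106b, 112, and 119; none of these names come from the source.

```python
def run_cycle(inbox, delete, register, search, mail):
    """One iteration of the control loop, in the order given in FIG. 32."""
    for user in inbox.pop("delete", []):            # steps 3201-3202
        delete(user)
    for user, expr in inbox.pop("register", []):    # steps 3203-3204
        register(user, expr)
    for text in inbox.pop("text", []):              # steps 3205-3206
        matched = search(text)
        if matched:                                 # steps 3207-3208
            mail(matched, text)

# Hypothetical stand-ins that only log what was called.
log = []
run_cycle(
    {"register": [(2, "a and b")], "delete": [2], "text": ["t1", "t2"]},
    delete=lambda user: log.append(("delete", user)),
    register=lambda user, expr: log.append(("register", user, expr)),
    search=lambda text: [2] if text == "t1" else [],
    mail=lambda users, text: log.append(("mail", users, text)),
)
```

Because deletion is handled before registration within a pass, a deletion and a registration for the same user arriving together behave as an update.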
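[0067]
The following describes the processing of the search condition expression deletion control program 3100, which does not exist in the first embodiment, and of the search condition expression registration control program 106b, whose processing differs from the first embodiment.
First, the deletion of a search condition expression by the search condition expression deletion control program 3100 is described with reference to the PAD diagram of FIG. 33.
The search condition expression deletion control program 3100 is started by the system control program 105b.
In step 3300, the program first starts the user list correction program 3101, which deletes the entries of the user list 122 that correspond to the user number whose search condition expression is to be deleted.
Then, in step 3301, it starts the search term count table correction program 3104, which deletes the entry of the search term count table 120 that corresponds to that user number.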
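[0068]
FIG. 34 shows the detailed processing of the user list correction program 3101.
In step 3400, the program repeats the following steps once for each search term stored in the search term management table 3106. The search term management table 3106 is described later.
In the loop, step 3401 first starts the finite automaton search program 3102, which searches the finite automaton 121 with the search term to obtain a pointer to the user list 122.
Then step 3402 starts the user list partial deletion program 3103, which relinks the pointers of the user list 122 to delete the list element for the user number whose search condition expression is being deleted.
The above is the processing of search condition expression deletion by the search condition expression deletion control program 3100.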
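[0069]
Next, the registration of a search condition expression by the search condition expression registration control program 106b is described with reference to the PAD diagram of FIG. 35.
The search condition expression registration control program 106b is started by the system control program 105b.
Of the processing shown in FIG. 35, the processing of the search condition expression analysis program 107, the search term count table creation program 108, and the search automaton creation program 109 in steps 3500 to 3502 is as described for the first embodiment.
In step 3503, the search condition expression registration control program 106b starts the search term management table creation program 3105, which stores the search terms contained in the search condition expression in the search term management table 3106.
FIG. 36 shows an example of the search term management table 3106 that is created.
The table in the figure was created from the search terms extracted from user 1's search condition expression 「“文書” and (“検索” or “サーチ”)」, user 2's search condition expression 「“文字” and ¬(“認識” or “学習”)」, and user 3's search condition expression 「¬(“検索” and “学習”)」.
The above is the embodiment of the document retrieval method of the present invention.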
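[0070]
The processing procedure of the search condition expression deletion control program 3100 of this embodiment, shown in FIG. 33, is described concretely below with reference to FIG. 37.
First, the processing of the user list correction program 3101 in step 3300 of the search condition expression deletion control program 3100 in FIG. 33 is described.
The user list correction program 3101 is started by the search condition expression deletion control program 3100. Its detailed processing is as shown in FIG. 34.
In step 3400, the program repeats steps 3401 and 3402 once for each search term stored in the search term management table 3106.
Step 3401 starts the finite automaton search program 3102, which searches the finite automaton 121 with the search term to obtain a pointer to the user list 122.
Step 3402 starts the user list partial deletion program 3103, which deletes from the user list 122 the element for the user number whose search condition expression is to be deleted.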
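[0071]
FIG. 37 shows an example of the processing of the user list correction program 3101.
The figure shows the case of deleting the search condition expression of user number 2, which contains the search terms 「文字」, 「認識」, and 「学習」.
The search term management table 3106 holds 「文字」, 「認識」, and 「学習」, and the loop is executed for these search terms.
Within the loop, step 3401 searches the finite automaton 121 with 「文字」, 「認識」, and 「学習」 and obtains a pointer to the user list 122 for each.
Step 3402 follows each of those user lists 122 and deletes the elements for user number 2.
In the figure, the parts of the user list 122 with user number "2" are deleted.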
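The deletion by pointer relinking can be sketched as below, using the three search condition expressions quoted for FIG. 36. This is a simplified model under assumptions: the automaton is reduced to its goto transitions (a trie), each term's final state carries the head pointer of a singly linked user list, and all class and function names are hypothetical.

```python
class Node:
    """One cell of a singly linked user list."""
    def __init__(self, user, next=None):
        self.user = user
        self.next = next

class State:
    """Automaton state: goto transitions plus a user-list head pointer."""
    def __init__(self):
        self.goto = {}
        self.users = None

def add_term(root, term, user):
    state = root
    for ch in term:
        state = state.goto.setdefault(ch, State())
    state.users = Node(user, state.users)   # prepend the user to the list

def find_state(root, term):
    """Counterpart of step 3401: follow the term through the automaton."""
    state = root
    for ch in term:
        state = state.goto[ch]
    return state

def unlink_user(state, user):
    """Counterpart of step 3402: relink pointers around the user's cells."""
    prev, node = None, state.users
    while node is not None:
        if node.user == user:
            if prev is None:
                state.users = node.next
            else:
                prev.next = node.next
        else:
            prev = node
        node = node.next

def delete_expression(root, term_table, user):
    for term in term_table.pop(user, []):   # loop of step 3400
        unlink_user(find_state(root, term), user)

def users_of(root, term):
    out, node = [], find_state(root, term).users
    while node is not None:
        out.append(node.user)
        node = node.next
    return out

term_table = {1: ["文書", "検索", "サーチ"], 2: ["文字", "認識", "学習"], 3: ["検索", "学習"]}
root = State()
for user, terms in term_table.items():
    for term in terms:
        add_term(root, term, user)
before = users_of(root, "学習")
delete_expression(root, term_table, 2)
after = users_of(root, "学習")
```

Only the lists reached through user number 2's own terms are visited, which is the reason for keeping the per-user search term management table.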
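[0072]
Next, the processing of the search term count table correction program 3104 in step 3301 of the search condition expression deletion control program 3100 in FIG. 33 is described.
The search term count table correction program 3104 is started by the search condition expression deletion control program 3100 after the user list correction program 3101.
This program deletes the entry of the search term count table 120 that corresponds to the user number whose search condition expression is to be deleted.
FIG. 38 shows an example of this processing.
As the figure shows, the search term count table entry for user number 2 is deleted.
The above is the detailed procedure of search condition expression deletion in the search condition expression deletion control program 3100 of this embodiment.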
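[0073]
The processing procedure of the search condition expression registration control program 106b of this embodiment, shown in FIG. 35, is described concretely below.
Of the processing shown in FIG. 35, the processing of the search condition expression analysis program 107, the search term count table creation program 108, and the search automaton creation program 109 in steps 3500 to 3502 is as described for the first embodiment.
The following describes in detail the processing of the search term management table creation program 3105 in step 3503.
The search term management table creation program 3105 is started by the search condition expression registration control program 106b after the search automaton creation program 109.
This program stores in the search term management table 3106 the search terms contained in the search condition expression, as obtained from the analysis by the search condition expression analysis program 107.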
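[0074]
FIG. 39 shows an example of the processing of this program.
The figure shows the case in which the user with user number 2 specifies the search condition expression 「“構造” and “認識”」, that is, "documents that contain both “構造” and “認識”".
Analyzing this search condition expression with the search condition expression analysis program 107 yields the two search terms “構造” and “認識”.
These search terms are stored in the search term management table 3106 in association with the user number. In the figure, “構造” and “認識” are stored at the position for user number 2.
The above is the detailed procedure of search condition expression registration in the search condition expression registration control program 106b of this embodiment.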
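The table creation of step 3503 can be sketched as follows. The regular expression is an assumption about how quoted search terms appear in an expression; in the source, the terms come from the search condition expression analysis program 107, whose parsing is described for the first embodiment.

```python
import re

# Assumed notation: each search term is enclosed in curly or straight quotes.
TERM = re.compile(r'[“"]([^”"]+)[”"]')

def extract_terms(expression):
    """Return the distinct search terms of an expression, in order of appearance."""
    terms = []
    for term in TERM.findall(expression):
        if term not in terms:
            terms.append(term)
    return terms

def store_terms(table, user, expression):
    """Store the user's search terms in the (hypothetical) management table."""
    table[user] = extract_terms(expression)
    return table

management_table = {}
store_terms(management_table, 2, "“構造” and “認識”")
```

Operators such as and, or, and ¬ are ignored here because the management table only needs the terms themselves to locate the user lists later.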
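[0075]
As described above, according to the present invention, the search terms contained in the search condition expression specified by each user are managed per user, and when a user instructs deletion of a search condition expression, the finite automaton is searched with the managed search terms and the pointers of the user list are relinked, so that the earlier information can easily be removed from the user list.
As a result, in a document retrieval and delivery system that determines in a single scan of each text, obtained from multiple information sources, whether the search condition expressions registered in advance by multiple users are satisfied and immediately distributes the text to the users whose conditions are satisfied, search condition expressions can be changed at any time at the user's request.
This embodiment described the case in which the deletion instruction and the registration instruction for a search condition expression arrive separately. When they arrive together, that is, when an update instruction arrives, it can clearly be handled by performing the deletion and the registration in succession.
Also, although this embodiment adds search condition expression deletion to the first embodiment, it can clearly be applied to the second embodiment as well.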
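[0076]
[Effect of the invention]
According to the present invention, even when the number of users, that is, the number of search condition expressions, is large, whether every search condition expression is satisfied can be determined in a single scan of the text, so high-speed text search can be realized.
As a result, it becomes possible to provide a document retrieval and delivery system that performs fast, real-time text search and distribution whose speed does not depend on the number of users, even as the number of users grows.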
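[Brief description of the drawings]
FIG. 1 is a diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the finite automaton in prior art 1.
FIG. 3 is a diagram showing the configuration of the fail-destination state number table in prior art 1.
FIG. 4 is a diagram showing the configuration of the output table in prior art 1.
FIG. 5 is a diagram showing an example of the finite automaton for Japanese text in prior art 2.
FIG. 6 is a diagram showing an outline of the processing of the present invention.
FIG. 7 is a PAD diagram showing the processing of the system control program 105.
FIG. 8 is a PAD diagram showing the processing of the search condition expression registration control program 106.
FIG. 9 is a diagram for explaining the method of analyzing a search condition expression.
FIG. 10 is a diagram for explaining the method of creating the search term count table 120.
FIG. 11 is a PAD diagram showing the processing of the text search control program 112.
FIG. 12 is a PAD diagram showing the processing of the search automaton creation program 109.
FIG. 13 is a diagram for explaining the method of creating the finite automaton 121 and the user list 122.
FIG. 14 is a PAD diagram showing the processing of the text search program 114.
FIG. 15 is a diagram for explaining the creation of the user list 122.
FIG. 16 is a diagram for explaining the text scanning processing.
FIG. 17 is a diagram for explaining the counting of search terms for which a matching substring appeared in the text.
FIG. 18 is a diagram for explaining the counting of search terms for which no matching substring appeared in the text.
FIG. 19 is a diagram for explaining the check of whether a search condition expression is satisfied.
FIG. 20 is a diagram for explaining the text shaping processing.
FIG. 21 is a diagram showing the configuration of the second embodiment of the present invention.
FIG. 22 is a PAD diagram showing the processing of the system control program 105a.
FIG. 23 is a PAD diagram showing the processing of the distribution condition registration control program 2100.
FIG. 24 is a diagram showing the configuration of the distribution management table 2108.
FIG. 25 is a PAD diagram showing the processing of the text search control program 112a.
FIG. 26 is a PAD diagram showing the processing of the text distribution control program 2104.
FIG. 27 is a PAD diagram showing the processing of the text distribution program 2105.
FIG. 28 is a diagram for explaining the distribution condition registration processing.
FIG. 29 is a diagram for explaining the distribution information storage processing.
FIG. 30 is a diagram for explaining the distribution condition check processing and the distribution information correction processing.
FIG. 31 is a diagram showing the configuration of the third embodiment of the present invention.
FIG. 32 is a PAD diagram showing the processing of the system control program 105b.
FIG. 33 is a PAD diagram showing the processing of the search condition expression deletion control program 3100.
FIG. 34 is a PAD diagram showing the processing of the user list correction program 3101.
FIG. 35 is a PAD diagram showing the processing of the search condition expression registration control program 106b.
FIG. 36 is a diagram showing the configuration of the search term management table 3106.
FIG. 37 is a diagram for explaining the user list correction processing.
FIG. 38 is a diagram for explaining the search term count table correction processing.
FIG. 39 is a diagram for explaining the search term management table creation processing.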
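[Explanation of symbols]
100 Display
101 Keyboard
102 CPU
103 Bus
104, 104a, 104b Main memory
105, 105a, 105b System control program
106, 106b Search condition expression registration control program
107 Search condition expression analysis program
108 Search term count table creation program
109 Search automaton creation program
110 Finite automaton creation program
111 User list creation program
112, 112a Text search control program
113 Text acquisition program
114 Text search program
115 Text scanning program
116 Search term count program
117 Search condition expression check program
118 Text shaping program
119 E-mail program
120 Search term count table
121 Finite automaton
122 User list
123 Work area
124 LAN
125 News distribution source
126 User of the document retrieval and delivery system
2100 Distribution condition registration control program
2101 Distribution condition analysis program
2102 Distribution condition registration program
2103 Distribution information storage program
2104 Text distribution control program
2105 Text distribution program
2106 Distribution condition check program
2107 Distribution information correction program
2108 Distribution management table
3100 Search condition expression deletion control program
3101 User list correction program
3102 Finite automaton search program
3103 User list partial deletion program
3104 Search term count table correction program
3105 Search term management table creation program
3106 Search term management table
[0001]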
BACKGROUND OF THE INVENTION
The present invention relates to a document retrieval and delivery system that searches electronic documents, obtained from multiple information sources such as news agencies and newspaper companies by e-mail, information-collecting robots, and the like, against search condition expressions registered in advance by users, and distributes each document to the users whose conditions are satisfied. In particular, it relates to a document retrieval and delivery system with a highly immediate text search and distribution function that completes the search for all users by scanning each electronic document only once, even as the number of users grows.
[0002]
[Prior art]
In recent years, a large amount of electronic documents (hereinafter referred to as “text”) has been sent to users every moment by e-mail and electronic news. In addition, the number of information sources that provide information via the Internet is increasing rapidly, and the amount of text collected from these information sources using an information collecting robot or the like is enormous. For this reason, there is an increasing need for a document retrieval and delivery system that retrieves these texts and immediately distributes them to users who are seeking the texts.
Document retrieval is used as a core for realizing the document retrieval / delivery system (see, for example, Non-Patent Document 1).
This method builds a kind of finite automaton, called a pattern matching machine, from the multiple character strings to be matched (hereinafter, search terms), so that multiple search terms can be matched simultaneously in a single scan of the text.
[0003]
This method will be described with reference to FIG. 2.
This figure is a state transition diagram of a finite automaton that collates four search terms “he”, “she”, “his”, and “hers”.
Here, a circle represents a finite automaton state, and a solid arrow represents a state transition.
The alphabets attached to each solid line arrow indicate input characters in which the corresponding state transition occurs, and the numerical values indicated in each circle indicate the state number of the same state.
A broken arrow indicates a transition destination when a character not shown in the finite automaton is input (hereinafter referred to as “fail”).
Here, the broken-line arrows from states 1, 2, 3, 6, and 8 to state 0 are omitted.
In practice, the transition destinations on a fail are managed by a fail-destination state number table as shown in FIG. 3.
When a transition is caused by a fail, the input character is matched again in the destination state.
When state 2, 5, 7, or 9 is reached during text scanning, a substring that matches a search term has appeared in the text; this is detected by referring to an output table as shown in FIG. 4.
This output table stores each state number together with the character strings output when that state is reached, that is, the search terms that match a substring of the text. The operation of this method is described below with reference to FIG. 2.
The initial state is state 0.
In this example, if the input character is “h”, the state transitions to state 1, and if “s”, the state transitions to state 3.
If a character other than these (expressed by ¬ {h, s} and "¬" indicates that a negative condition is applied to the next element) is entered, the state returns to the initial state 0.
If the input character is “h” in state 3, the state transitions to state 4.
If a character other than “h” is entered here, the process returns to the state 0, and collation processing is performed again here.
On the other hand, if the input character in state 4 is “e”, the state transitions to state 5, and by referring to the output table of FIG. 4 it is detected that substrings matching the search terms “she” and “he” have appeared in the text.
Here, if, for example, “i” other than “e” is input, the state transitions to the state 1 with reference to the broken arrow of the fail.
Then, the state transitions to state 6 by re-verifying the input character “i” in state 1 of the transition destination.
[0004]
Next, a case where search terms are matched against the text “ushers” will be described.
First, the first character “u” is input.
However, since “u” is a character other than “h” and “s”, the state returns to the initial state 0.
Next, a transition from state 0 to state 3 is made by inputting the second character “s”.
Next, when the third character “h” and the fourth character “e” are input, the state transitions to state 4 and then to state 5, and by referring to the output table of FIG. 4 it is detected that substrings matching the search terms “she” and “he” have appeared in the text.
Next, the fifth character “r” is input.
However, since there is no transition destination for the input character “r” in the state 5, the state becomes a failure and the state transitions to the state 2.
Here, a transition to the state 8 is made by performing re-verification on “r”.
Finally, when the sixth character “s” is input, the state transitions to state 9, and by referring to the output table of FIG. 4 it is detected that a substring matching the search term “hers” has appeared in the text.
As described above, the non-patent document 1 discloses a document search method capable of collating a plurality of search terms at the same time with only one scan of text.
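The walkthrough above corresponds to the Aho-Corasick algorithm of Non-Patent Document 1. The following is a compact sketch of that algorithm, not code from the source; the function names and the list-based tables are choices made here. Inserting the terms in the order "he", "she", "his", "hers" happens to reproduce the state numbers used in the description.

```python
from collections import deque

def build(terms):
    """Build goto, fail, and output tables for the given search terms."""
    goto, fail, out = [{}], [0], [[]]
    for term in terms:
        s = 0
        for ch in term:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(term)
    queue = deque(goto[0].values())          # depth-1 states fail to state 0
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:   # follow fails until ch can be read
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit outputs of the fail state
            queue.append(t)
    return goto, fail, out

def scan(text, machine):
    """Scan the text once; return (start index, term) for every match."""
    goto, fail, out = machine
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:       # fail, then re-match the same character
            s = fail[s]
        s = goto[s].get(ch, 0)
        for term in out[s]:
            hits.append((i - len(term) + 1, term))
    return hits

machine = build(["he", "she", "his", "hers"])
hits = scan("ushers", machine)
# hits == [(1, "she"), (2, "he"), (2, "hers")]
```

The fail table built here sends state 5 to state 2, which is the transition taken on the fifth character "r" in the walkthrough.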
[0005]
An extension of the method described in Non-Patent Document 1 to Japanese is known (for example, see Non-Patent Document 2).
Unlike English, Japanese has many character types.
Therefore, in the computer, one character is usually represented by two bytes, that is, two English characters.
If these 2-byte characters are simply split into single bytes and a finite automaton is built as in Non-Patent Document 1, a byte that is part of a 2-byte character cannot be distinguished from a 1-byte English character, so noise (false matches) may occur.
Non-Patent Document 2 therefore notes that the Japanese Industrial Standard code for information interchange defines 3-byte character codes (denoted KI and KO) that signal switching between 1-byte and 2-byte characters, and solves this problem by building a finite automaton that distinguishes 1-byte characters from 2-byte characters, as shown in FIG. 5.
[0006]
In addition,
1 byte is 8 bits, and a 1-byte character takes one of the 256 values from 00₍₁₆₎ to FF₍₁₆₎.
A 2-byte character consists of 2 bytes, and both the first byte and the second byte range from 21₍₁₆₎ to 7E₍₁₆₎, giving 94 x 94 characters in total.
The 3-byte character code KI is 1B2442₍₁₆₎.
The 3-byte character code KO is 1B284A₍₁₆₎.
Among 1-byte characters, alphanumeric characters belong to 21₍₁₆₎ to 7E₍₁₆₎ and kana characters belong to A0₍₁₆₎ to DF₍₁₆₎.
[0007]
Here, the transition from state 0 to state 0 is taken when the input is neither KI nor a 1-byte character that leads into the 1-byte character matching finite automaton.
The transition from state 3 to state 3 or state 6 is taken when the input is neither KO nor the upper byte of a 2-byte character that leads into the 2-byte character matching finite automaton. When the byte code is a 1-byte character, the state transitions to state 3, and when it is the upper byte of a 2-byte character, the state transitions to state 6.
State 6 is provided to prevent a byte shift when the input is the upper byte of a 2-byte character that does not lead into the 2-byte character matching finite automaton.
[0008]
Byte shift means that the lower byte of a 2-byte character is regarded as the upper byte of a 2-byte character.
In this method, state 6 is provided, and this byte shift is prevented by preventing return to state 3 unless the lower byte of a 2-byte character is input. The operation of this finite automaton is exactly the same as in Non-Patent Document 1.
In this way, by using the method described in Non-Patent Document 2 above, even when targeting a language in which 1-byte characters and 2-byte characters such as Japanese are mixed, the text can be scanned only once. It is possible to collate a plurality of search terms at the same time.
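The byte-shift problem can be illustrated with a plain tokenizer. This is not the automaton of FIG. 5; it is a simpler stand-in, introduced here, that uses the KI and KO codes given above to split a byte stream into 1-byte and 2-byte characters. The sample data and function name are assumptions.

```python
KI = b"\x1b\x24\x42"   # switches to 2-byte characters
KO = b"\x1b\x28\x4a"   # switches back to 1-byte characters

def split_characters(data):
    """Split a KI/KO-delimited byte stream into whole characters."""
    chars, i, two_byte = [], 0, False
    while i < len(data):
        if data.startswith(KI, i):
            two_byte, i = True, i + 3
        elif data.startswith(KO, i):
            two_byte, i = False, i + 3
        elif two_byte:
            chars.append(data[i:i + 2])   # consume upper and lower byte together
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    return chars

# "A", two 2-byte characters (30 21 and 42 30), then "B".
data = b"A" + KI + b"\x30\x21\x42\x30" + KO + b"B"
chars = split_characters(data)

# A byte-level search for 21 42 finds a hit that straddles the two
# 2-byte characters; no whole character equals it.
naive_hit = b"\x21\x42" in data
aligned_hit = b"\x21\x42" in chars
```

Here `naive_hit` is true while `aligned_hit` is false, which is the kind of false match that state 6 of the described automaton is meant to prevent.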
[0009]
In the above, a finite automaton was described as one method for finding search terms in text; the extended BM method is known as another method (for example, Non-Patent Document 3). The extended BM method extends the BM (Boyer-Moore) method, a high-speed pattern matching method, so that it can handle multiple patterns (search character strings). Non-Patent Document 3 calls it the EBM (Expanded Boyer-Moore) method.
Further, although not an extended BM method, a method of performing multiple character string matching is also known (Non-Patent Document 4). This method is a combination of the basic idea of the BM method, which is a high-speed pattern matching method, and the AC (Aho-Corasick) method that simultaneously matches a plurality of patterns using a finite automaton.
[Non-Patent Document 1]
“Efficient String Matching: An Aid to Bibliographic Search” (Alfred V. Aho and Margaret J. Corasick, Communications of the ACM, June 1975, Vol.
[Non-Patent Document 2]
“Aho-Corasick pattern matching algorithm for Japanese text” (Takeshi Shinohara, Setsuo Arikawa, Information Processing Society of Japan, Natural Language Processing, 1985.11.15, Vol. 86, No. 48, pp. 52.4.1-52.4.8)
[Non-Patent Document 3]
“Five types of pattern matching methods are realized by C language functions” NIKKEI BYTE, August 1987, pp. 175-189
[Non-Patent Document 4]
"High-speed multiple character string matching algorithm: FAST" IPSJ Journal, Vol. 30, No. 9, September 1989
[0010]
[Problems to be solved by the invention]
According to the document search methods shown in Non-Patent Documents 1 and 2 above, multiple search terms can be matched simultaneously by scanning the text once. However, the following problems arise when text search is performed for the search condition expressions of a large number of users.
(1) User identification problems
By creating one finite automaton with all the search terms included in the search condition formulas of a large number of users, it becomes possible to match all the search terms with a single scan of the text. However, since it is impossible to determine which user's search condition expression contains the search term that matches the partial character string in the text, it is not known which user's search condition expression is satisfied.
(2) Problem of processing time
If a separate finite automaton is built for each user from the search terms in that user's search condition expression, it is possible to determine which user's search condition expression is satisfied. However, the text must then be scanned once per finite automaton (that is, once per user), so the search takes longer as the number of users increases. The same applies when the techniques shown in Non-Patent Documents 3 and 4 are used instead of the finite automaton.
To solve these problems, the object of the present invention is to provide a document retrieval and delivery system that, based on search condition expressions registered in advance by users, determines in a single scan of each text whether the search condition expressions of multiple users are satisfied, and distributes the text to the users whose conditions are satisfied.
[0011]
[Means for Solving the Problems]
To achieve the above object, the present invention provides a document search and distribution method for text data of document information obtained from one or more information sources.
The method has a search condition expression registration step of registering search condition expressions, each specified by one or more users and containing one or more search terms, and a text search distribution step of determining, when the text is obtained, whether each search condition expression is satisfied for the text and distributing the text to the users whose search condition expressions are satisfied.
In this method, the text search distribution step has a text search step that determines, by scanning the text only once, whether each of the plurality of search condition expressions is satisfied for the text.
Further, the search condition expression registration step includes a search condition expression analysis step for extracting all search terms from the search condition expression,
A search term count table creation step for creating a search term count table that stores management information including the number of all search terms extracted from the user and the search condition formula of the user for each user;
A multiple character string collation table generating step for generating a multiple character string collation table to be referred to when collating all the search terms extracted from the search condition expression by a single scan of the text;
A user list generation step of generating a user list in which user identifiers of users who have specified the search condition expression are connected as a list corresponding to each search term extracted from the search condition expression;
The text search distribution step scans the text with reference to the multiple character string matching table at the time of determination of success or failure of the search condition formula for the text, and thereby extracts all of the text extracted by the search condition formula analysis step. A text scanning step to match the search terms;
and a search condition expression success/failure determination step of determining whether each search condition expression is satisfied for the text by comparing the search terms matched in the text scanning step against the user list and the search term count table.
Further, a finite automaton is used as the multiple character string matching table.
Further, the search condition expression success/failure determination step has a search term match count calculation step of referring to the user list and calculating, for each user, the number of search terms matched in the text scanning step,
and a search term count comparison step of comparing the number of search terms calculated in the search term match count calculation step with the number of search terms stored in the search term count table and, if they are equal, regarding the search condition expression containing those search terms as satisfied.
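The count comparison stated above can be sketched as follows. The sketch covers only the comparison of per-user match counts with the stored counts; the sample terms, user numbers, and dictionary layout are hypothetical, and how expressions with "or" or negation are reflected in the counts is described for the first embodiment rather than here.

```python
def satisfied_users(matched_terms, user_lists, required_counts):
    """Return the users whose matched-term count equals the stored count."""
    counts = {}
    for term in set(matched_terms):              # each matched term counts once
        for user in user_lists.get(term, []):    # user list attached to the term
            counts[user] = counts.get(user, 0) + 1
    return sorted(user for user, n in counts.items()
                  if n == required_counts.get(user))

# Hypothetical registrations: user 1 wants "document" and "search",
# user 2 wants "document" and "recognition".
user_lists = {"document": [1, 2], "search": [1], "recognition": [2]}
required_counts = {1: 2, 2: 2}

result = satisfied_users(["document", "search", "document"],
                         user_lists, required_counts)
```

Because the user lists are indexed by search term, the per-user counts are obtained from the terms found in one scan, without rescanning the text for each user.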
[0012]
In addition, a document search and distribution method for text data of document information obtained from one or more information sources, having a search condition expression registration step of registering search condition expressions, each specified by one or more users and containing one or more search terms, and a text search distribution step of determining, when the text is obtained, whether each search condition expression is satisfied for the text and distributing the text to the users whose search condition expressions are satisfied, further has a distribution condition setting expression registration step of registering a distribution condition setting expression that contains a distribution condition describing the conditions of text distribution specified by one or more users or a system administrator.
The text search distribution step has a text search step of determining, by scanning the text only once, whether each of the plurality of search condition expressions is satisfied for the text,
and a text distribution control step of distributing the text to the users whose search condition expressions were satisfied in the text search step when the distribution condition registered in the distribution condition setting expression registration step is satisfied.
Further, the distribution condition setting expression registration step includes a distribution condition setting expression analysis step for extracting a user identifier and distribution condition for which a distribution condition is to be set from the distribution condition setting expression;
A distribution condition management table creating step for creating a distribution condition management table storing a user identifier and a distribution condition extracted from the distribution condition setting expression in the distribution condition setting expression analyzing step;
The text distribution control step has a distribution condition success/failure determination step of determining whether the distribution condition is satisfied by referring to the distribution condition management table,
and a text distribution step of distributing the text to the user when the distribution condition success/failure determination step determines that the distribution condition is satisfied.
Further, as the distribution condition, the distribution time, the number of distributions, or the delay time from text search to distribution is used.
In addition, for text data of document information obtained from one or more information sources,
A search condition expression registration step for registering a search condition expression specified by one or more users including one or more search terms, and when the text is obtained, determining whether the search condition expression for the text is successful, In a document search / delivery method having a text search / distribution step for distributing the text to a user who has satisfied the search condition formula, when the deletion of the search condition formula is instructed, the search condition for deleting the search condition formula An expression deletion step is included.
Further, the search condition expression registration step includes a search condition expression analysis step for extracting all search terms from the search condition expression,
A search term count table creation step for creating a search term count table that stores management information including the number of all search terms extracted from the user and the search condition formula of the user for each user;
A multiple character string collation table generating step for generating a multiple character string collation table to be referred to when collating all the search terms extracted from the search condition expression by a single scan of the text;
A user list generation step of generating a user list in which user identifiers of users who have specified the search condition expression are connected as a list corresponding to each search term extracted from the search condition expression;
The search condition expression deletion step includes a search condition expression management table deletion step of deleting information related to the search condition expression instructed to be deleted from the search term count table and the user list.
Furthermore, the search condition formula registration step further includes a search term management table creation step for creating a search term management table storing the search terms extracted by the search condition formula analysis step,
The search condition expression management table deletion step refers to the search term management table, and determines a user identifier of a user who specified the search condition expression corresponding to the search term included in the search condition expression instructed to be deleted. A user list deletion step of deleting from the user list;
A search term count table deleting step for deleting, from the search term count table, user management information related to the search condition expression instructed to be deleted.
In addition, for text data of document information obtained from one or more information sources,
A search condition expression registration means for registering a search condition expression designated by one or more users including one or more search terms, and when the text is obtained, the success or failure of the search condition expression for the text is determined; In a document search / delivery device having a text search / distribution means for distributing the text to users who satisfy the search condition formula,
The text search / distribution means includes text search means for judging success or failure of the plurality of search condition expressions for the text by scanning the text only once.
Further, the search condition expression registering means is a search condition expression analyzing means for extracting all search terms from the search condition expression,
A search term count table creating means for creating a search term count table that stores management information including the number of all search terms extracted from the user and the search condition formula of the user for each user;
Multiple character string matching table generating means for generating a multiple character string matching table to be referred to when all the search terms extracted from the search condition expression are verified by a single scan of text;
User list generation means for generating a user list in which user identifiers of users who have specified the search condition formula are connected as a list corresponding to each search term extracted from the search condition formula,
The text search / distribution means scans the text with reference to the multiple character string matching table at the time of determination of success or failure of the search condition expression for the text, and thereby extracts all of the text extracted by the search condition expression analysis means. Text scanning means for matching search terms;
and search condition expression success/failure judging means for judging whether each search condition expression is satisfied for the text by comparing the search terms matched by the text scanning means against the user list and the search term count table.
Further, a finite automaton is used as the multiple character string matching table.
Further, the search condition expression success / failure determining means includes search term matching number calculating means for calculating, with reference to the user list, the number of search terms collated by the text scanning means for each user,
and search term number comparison means for comparing the number of search terms calculated by the search term matching number calculating means with the number of search terms stored in the search term count table and, if they match, regarding the search condition expression including those search terms as satisfied.
[0013]
In addition, for text data of document information obtained from one or more information sources,
A search condition expression registration means for registering a search condition expression designated by one or more users including one or more search terms, and when the text is obtained, the success or failure of the search condition expression for the text is determined; In a document search / delivery device having a text search / distribution means for distributing the text to users who satisfy the search condition formula,
A distribution condition setting expression registration means is provided for registering a distribution condition setting expression including a distribution condition that describes a condition of text distribution specified by one or more users or a system administrator, and the text search / distribution means includes text search means for determining success or failure of the plurality of search condition expressions for the text by scanning the text only once,
and text distribution control means for distributing the text, when the distribution condition registered by the distribution condition setting expression registration means is satisfied, to a user whose search condition expression has been found satisfied by the text search means.
Further, the distribution condition setting expression registration means includes distribution condition setting expression analysis means for extracting, from the distribution condition setting expression, the identifier of the user for whom a distribution condition is to be set and the distribution condition,
and distribution condition management table creating means for creating a distribution condition management table that stores the user identifier and the distribution condition extracted by the distribution condition setting expression analysis means;
The text distribution control means includes distribution condition success / failure determination means for referring to the distribution condition management table to determine whether the distribution condition is satisfied,
and text distribution means for distributing the text to the user when the distribution condition success / failure determination means determines that the distribution condition is satisfied.
Further, as the distribution condition, the distribution time, the number of distributions, or the delay time from text search to distribution is used.
In addition, for text data of document information obtained from one or more information sources,
A search condition expression registration means for registering a search condition expression designated by one or more users including one or more search terms, and when the text is obtained, the success or failure of the search condition expression for the text is determined; In a document search / delivery device having a text search / distribution means for distributing the text to users who satisfy the search condition formula,
When deletion of the search condition formula is instructed, search condition formula deletion means for deleting the search condition formula is provided.
Further, the search condition expression registering means includes search condition expression analyzing means for extracting all search terms from the search condition expression,
A search term count table creating means for creating a search term count table that stores, for each user, management information including the number of all search terms extracted from that user's search condition expression;
Multiple character string matching table generating means for generating a multiple character string matching table to be referred to when all the search terms extracted from the search condition expression are verified by a single scan of text;
User list generation means for generating a user list in which user identifiers of users who have specified the search condition formula are connected as a list corresponding to each search term extracted from the search condition formula,
The search condition expression deletion means includes search condition expression management table deletion means for deleting information related to the search condition expression instructed to be deleted from the search term count table and the user list.
Further, the search condition expression registration means further includes a search term management table creation means for creating a search term management table storing the search terms extracted by the search condition expression analysis means,
The search condition expression management table deleting means includes user list deletion means for referring to the search term management table and deleting, from the user lists corresponding to the search terms included in the search condition expression instructed to be deleted, the user identifier of the user who specified that search condition expression;
and search term count table deletion means for deleting, from the search term count table, the user management information related to the search condition expression instructed to be deleted.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
<< First Example >>
First, a schematic description of the first embodiment will be given with reference to FIG.
First, the search condition expression registration process will be described.
In this process, the search condition expression is first analyzed and the search terms it contains are extracted. The search term count table creation process then stores the number of extracted search terms in the search term count table.
For example, in FIG. 6, the search condition expression of user 1, “documents including “document” and “search””, contains the two search terms “document” and “search”, so 2 is stored in the location of the search term count table corresponding to user 1. Similarly, 1 and 2 are stored in the locations corresponding to user 2 and user 3, respectively.
Next, in the finite automaton creation process, a finite automaton that collates all the search terms extracted by the search condition formula analysis is created. This finite automaton is the same as that shown in Non-Patent Document 1 and Non-Patent Document 2.
FIG. 6 shows the state transition diagram of the finite automaton that collates the search terms “document”, “search”, “climbing”, and “registration” extracted from the three search condition expressions of user 1: “documents including “document” and “search””, user 2: “documents including “climbing””, and user 3: “documents including “search” and “registration””. In this figure, for simplicity, state transitions are shown in units of 2 bytes (one character).
Next, in the user list creation process, the identifiers of the users who specify the respective search terms are connected as a user list to the finite automaton. In FIG. 6, for example, when “search” is collated, the user list is referenced from state 4 at the end thereof, and it is detected that the users who have designated “search” are “user 1” and “user 3”.
[0015]
Next, the text search distribution process, which searches text and distributes it, will be described.
In this process, the text is first scanned by the text scanning process and the search terms are collated.
For example, when the text “search document” is scanned using the finite automaton shown in FIG. 6, it is detected that substrings matching “document” and “search” appear in the text. In the finite automaton shown in this figure, a search term marked “○” at its end had a matching substring appear in the text, and a search term marked “×” did not.
In this example, since substrings matching “document” and “search” appear in the text, “○” is marked on states 2 and 4 at the ends of those search terms.
[0016]
Next, the search term count process counts, for each user, the number of search terms that matched a substring in the text.
For example, user 1 is counted as 2 because both “document” and “search” match, and user 3 is counted as 1 because only “search” matches. User 2 is counted as 0 because no substring matching the search term appeared in the text.
Finally, the search condition expression check process compares the number of search terms stored in the search term count table with the number of matched search terms calculated by the search term count process. If the two are equal, the search condition expression is regarded as satisfied, and the text is distributed to that user by the text distribution process.
For example, in FIG. 6, the text is distributed to user 1 because the count is 2 and matches the stored number of search terms, but it is not distributed to user 2 or user 3 because their counts do not match.
[0017]
As described above, in this embodiment the text is scanned using a finite automaton, and the number of search terms that appear in the text as matching substrings is counted for each user while referring to the user list.
Then, by comparing the count with the number of search terms stored in advance in the search term count table, it is checked whether the search condition expression is satisfied.
As a result, whether the search condition expressions of a plurality of users are satisfied can be determined by scanning the text only once, and highly immediate text search distribution can be realized.
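The overview above can be illustrated with a minimal Python sketch. It assumes conditions made only of logical products, uses the three registrations of the FIG. 6 example, and replaces the single-pass finite automaton with plain substring tests for brevity. The names `conditions`, `term_count`, `user_list`, `match_terms`, and `recipients` are hypothetical and do not come from the patent.

```python
# Hypothetical registrations modeled on the overview example (AND-only conditions).
conditions = {
    "user1": ["document", "search"],
    "user2": ["climbing"],
    "user3": ["search", "registration"],
}

# Search term count table: number of search terms per user.
term_count = {user: len(terms) for user, terms in conditions.items()}

# User list: for each search term, the users who specified it.
user_list = {}
for user, terms in conditions.items():
    for term in terms:
        user_list.setdefault(term, []).append(user)


def match_terms(text):
    """Stand-in for the text scan: return the set of search terms found.

    Assumption: substring tests are used here instead of a finite automaton.
    """
    return {term for term in user_list if term in text}


def recipients(text):
    """Return the users whose condition is satisfied by the text."""
    hits = {user: 0 for user in conditions}
    for term in match_terms(text):
        for user in user_list[term]:
            hits[user] += 1
    # A condition is satisfied when every one of its search terms matched.
    return [user for user in conditions if hits[user] == term_count[user]]


print(recipients("search document"))
```

For the text “search document”, only user 1 reaches the stored count of 2, which agrees with the example in the overview.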
[0018]
Hereinafter, a first embodiment of the present invention will be described in detail with reference to FIG.
A document retrieval / delivery system to which the present invention is applied includes a display 100, a keyboard 101, a central processing unit (CPU) 102, a main memory 104, and a bus 103 connecting them.
In addition, a news distribution source 125 that distributes news and users 126 who use the document search / delivery system are connected to the bus 103 via a communication line 124 such as a LAN (Local Area Network).
The news distribution source 125 distributes text obtained by digitizing news to this system using e-mail, electronic news, or the like, or presents the text via the Internet, and the user 126 registers a search condition expression with this system by e-mail.
This system distributes text found on the basis of a search condition expression to the corresponding user by e-mail.
In the following description of this embodiment, the news distribution source 125 is assumed to distribute text to the system by e-mail or the like. However, the news distribution source 125 may merely present text on the Internet, in which case the text may be collected using an information collection robot.
[0019]
The main memory 104 stores a system control program 105, a search condition expression registration control program 106, a search condition expression analysis program 107, a search term count table creation program 108, a search automaton creation program 109, a text search control program 112, a text acquisition program 113, a text search program 114, a text shaping program 118, an e-mail program 119, a search term count table 120, a finite automaton 121, and a user list 122, and a work area 123 is secured in it.
The search automaton creation program 109 includes a finite automaton creation program 110 and a user list creation program 111.
In this embodiment, a finite automaton is used as the method for searching text for search terms. However, the method is not limited to a finite automaton, and the methods shown in Non-Patent Documents 3 and 4 may be used instead. In that case, the names “search automaton creation program” and “finite automaton creation program” are no longer appropriate; more general names would be “search character string collation table creation program” and “multiple character string collation table creation program”.
The text search program 114 includes a text scanning program 115, a search term count program 116, and a search condition formula check program 117.
The above programs can also be stored in a computer readable / writable storage medium such as a hard disk device (not shown in the figure) or a flexible disk (not shown in the figure).
[0020]
The system control program 105 is activated in response to an instruction from the keyboard 101 by the administrator of the document search / delivery system.
The search condition expression registration control program 106 and the text search control program 112 are activated by the system control program 105 in response to a search condition expression registration instruction from the user 126 and text distribution from the news distribution source 125, respectively. The former controls the search condition expression analysis program 107, the search term count table creation program 108, and the search automaton creation program 109; the latter controls the text acquisition program 113, the text search program 114, and the text shaping program 118.
As the e-mail program 119, an existing mail program generally used in a workstation or the like is used.
The electronic mail program 119 is activated by the system control program 105 in accordance with the processing result of the text search control program 112.
[0021]
Hereinafter, processing contents of the document search / delivery system in this embodiment will be described. First, processing contents of the system control program 105 will be described with reference to a PAD (Problem Analysis Diagram) diagram of FIG.
In the system control program 105, first, in step 700, the following steps are repeated until an end command is input from the keyboard 101.
In this iterative process, first, in step 701, it is checked whether or not a search condition expression is sent from the user 126 by e-mail.
If a search condition expression has been sent, the search condition expression registration control program 106 is activated in step 702 to register the search condition expression.
Next, in step 703, it is checked whether or not a text is sent from the news distribution source 125 by e-mail.
If text has been sent, the text search control program 112 is activated in step 704 to search for text.
Next, in step 705, the text search result of the text search control program 112 is examined. If at least one search condition expression is found to be satisfied, the e-mail program 119 is started in step 706, and the text is distributed by e-mail to the users who specified the satisfied search condition expressions. The above is the processing content of the system control program 105.
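The control flow of steps 700 to 706 can be sketched as a simple event loop. This is only an illustration under assumptions: incoming e-mail is modeled as a sequence of `(kind, payload)` events, and `register`, `search_text`, and `send_mail` are hypothetical callbacks standing in for programs 106, 112, and 119.

```python
def control_loop(events, register, search_text, send_mail):
    """Process incoming events until an end command arrives.

    events yields ("end", None), ("condition", expression) or ("text", body).
    The three callbacks are hypothetical stand-ins, not the patent's programs.
    """
    for kind, payload in events:            # step 700: repeat until end command
        if kind == "end":
            break
        if kind == "condition":             # steps 701-702: register expression
            register(payload)
        elif kind == "text":                # steps 703-704: search the text
            users = search_text(payload)
            if users:                       # steps 705-706: distribute by mail
                send_mail(users, payload)


registered, sent = [], []
control_loop(
    [("condition", "A and B"), ("text", "A B"), ("text", "C"),
     ("end", None), ("text", "A B")],
    register=registered.append,
    search_text=lambda body: ["user1"] if "A" in body and "B" in body else [],
    send_mail=lambda users, body: sent.append((users, body)),
)
print(registered, sent)
```

Nothing after the end command is processed, and mail is sent only for texts with at least one satisfied search condition expression.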
[0022]
Next, the processing contents of search condition expression registration by the search condition expression registration control program 106 will be described with reference to the PAD diagram of FIG.
The search condition expression registration control program 106 is activated by the system control program 105.
In step 800, the program starts the search condition expression analysis program 107, and analyzes the search condition expression sent from the user 126 by e-mail.
In the search condition expression analysis processing, the search condition expression is expanded into one of the following formats:
(a) a single search term,
(b) a logical product condition of a plurality of (a),
(c) a logical sum condition of a plurality of (a) and a plurality of (b).
That is, the search condition expression is expanded so that no logical product condition is applied outside a logical sum condition.
However, a negative condition is applied either to the entire search condition expression or to an individual search term.
Here, the logical product condition is, for example,
Search formula: “document” and “search”
which means “find a document in which both of the strings “document” and “search” appear”.
The logical sum condition is, for example,
Search formula: “document” or “search”
which means “find a document in which either “document” or “search” appears”.
The negative condition is, for example,
Search formula: ¬“search”
which means “find a document in which “search” does not appear”.
For example, if “A”, “B”, “C”, “D”, and “E” are search terms, the expanded formats are as follows.
(a) A
(b) A and B and C and ...
(c) (A and B) or C or (D and E) or ...
For example, when the search condition expression is a logical product condition applied to a logical sum condition, that is, when a logical product condition is applied outside a logical sum condition, the search condition expression is expanded and transformed as shown in the corresponding figure.
Here, a logical product condition or a single search term in the expansion result is called a term.
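The expansion described above amounts to a conversion into a logical sum of terms. The following Python sketch shows one way to do it, under assumptions: expressions are modeled as nested tuples (`("term", s)`, `("not", e)`, `("and", ...)`, `("or", ...)`), a negation covering the whole expression is kept as a flag, and inner negations are pushed down to search terms with De Morgan's laws. The representation and the function names are hypothetical.

```python
from itertools import product


def push_not(expr, negate=False):
    """Push negations down to individual search terms (De Morgan's laws)."""
    kind = expr[0]
    if kind == "term":
        return ("lit", expr[1], negate)
    if kind == "not":
        return push_not(expr[1], not negate)
    if negate:  # under a negation, "and" and "or" swap
        kind = "or" if kind == "and" else "and"
    return (kind, *[push_not(sub, negate) for sub in expr[1:]])


def to_terms(expr):
    """Return a list of terms; each term is a list of (search_term, negated)."""
    kind = expr[0]
    if kind == "lit":
        return [[(expr[1], expr[2])]]
    parts = [to_terms(sub) for sub in expr[1:]]
    if kind == "or":
        return [term for part in parts for term in part]
    # "and": distribute over the alternatives of every operand
    return [sum(combo, []) for combo in product(*parts)]


def expand(expr):
    """Expand to (whole_expression_negated, list_of_terms)."""
    if expr[0] == "not" and expr[1][0] != "term":
        return True, to_terms(push_not(expr[1]))
    return False, to_terms(push_not(expr))


a, b, c = ("term", "A"), ("term", "B"), ("term", "C")
print(expand(("and", a, ("or", b, c))))   # A and (B or C)
print(expand(("and", a, ("not", ("or", b, c)))))
print(expand(("not", ("and", a, b))))
```

With this sketch, A and (B or C) becomes the two terms (A and B) or (A and C), while a negation over the whole expression is reported through the returned flag instead of being distributed.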
[0023]
Next, in step 801, the search condition expression registration control program 106 activates the search term count table creation program 108, which stores in the search term count table 120 the number of search terms included in the search condition expression obtained as the analysis result of the search condition expression analysis program 107, together with information indicating whether a negative condition is applied to the search condition expression.
The search term count table 120 stores, for each search condition expression (that is, for each user), the number of search terms included in each term of that expression.
FIG. 10 shows the structure.
The search term count table 120 shown in the figure is created for three search condition expressions: user number 1: “(“document” and “search”) or (“document” and “search”)”, user number 2: ““character” and ¬“recognition” and ¬“learning””, and user number 3: “¬(“search” and “learning”)”.
First, a search condition expression negation flag is provided as the first element of each list in the search term count table 120.
This flag is set to 1 when a negative condition is applied to the entire search condition expression, and to 0 otherwise.
In the example shown in this figure, the flag is 1 for user number 3, whose entire search condition expression is negated, and 0 for the other search condition expressions.
Following this flag, the numbers of search terms included in the terms are linked as a list, in order from the first term.
For example, the second element of the list for user number 1 stores the number of search terms included in the first term of the search condition expression; this term contains the two search terms “document” and “search”, so 2 is stored.
The next element stores 2, the number of search terms included in the second term.
Each of the second and subsequent elements stores the number of search terms and also has an area for counting, during text search, the number of search terms that matched a substring in the text.
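The layout of the search term count table can be sketched as a small Python data structure. This is an illustrative assumption about representation, not the patent's storage format: `CountEntry` holds the negation flag and one `TermSlot` per term, and each slot pairs the stored number of search terms with the work area counted during text search. Because the translated text renders two distinct search terms of user number 1 both as “search”, the sketch uses “retrieval” as a hypothetical stand-in for the second one.

```python
from dataclasses import dataclass, field


@dataclass
class TermSlot:
    required: int     # number of search terms contained in the term
    matched: int = 0  # work area incremented during text search


@dataclass
class CountEntry:
    negated: bool                               # expression negation flag
    slots: list = field(default_factory=list)   # one TermSlot per term


def build_count_table(expanded):
    """expanded maps user number -> (negated_whole, list of terms)."""
    return {
        user: CountEntry(negated, [TermSlot(len(term)) for term in terms])
        for user, (negated, terms) in expanded.items()
    }


# Hypothetical expanded conditions modeled on the three example expressions.
expanded = {
    1: (False, [["document", "search"], ["document", "retrieval"]]),
    2: (False, [["character", "recognition", "learning"]]),
    3: (True, [["search", "learning"]]),
}
table = build_count_table(expanded)
print(table[1])
```

Keeping the work area next to the stored count means the later check is a simple comparison of two integers per term.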
[0024]
Finally, in step 802, the search condition expression registration control program 106 activates the search automaton creation program 109, which creates a finite automaton 121 that collates all the search terms included in the search condition expressions obtained as the analysis result of the search condition expression analysis program 107.
It then creates a user list 122 by linking, as a list, the identifiers of the users 126 who specified search condition expressions including those search terms, and connects the list to the finite automaton 121.
The processing contents of the search automaton creation program 109 will be described later in detail.
The above is the processing content of the search condition expression registration by the search condition expression registration control program 106.
[0025]
Next, text search processing contents by the text search control program 112 will be described with reference to the PAD diagram of FIG.
The text search control program 112 is activated by the system control program 105.
First, in step 1100, the program starts the text acquisition program 113 and stores the text sent from the news distribution source 125 by e-mail or the like in the work area 123.
Next, in step 1101, the text search program 114 is activated, and the text stored in the work area 123 is searched using the search term count table 120 created by the search term count table creation program 108 and the finite automaton 121 and user list 122 created by the search automaton creation program 109.
The processing contents of the text search program 114 will be described in detail later.
Next, in step 1102, the result of the text search by the text search program 114 is examined. If at least one search condition expression is satisfied, the text shaping program 118 is started in step 1103, and the text stored in the work area 123 is shaped into a format that can be distributed by the e-mail program 119.
The above is the text search processing content by the text search control program 112.
[0026]
Next, the processing contents of the search automaton creation program 109 in the search condition expression registration process by the search condition expression registration control program 106 shown in FIG. 8 will be described using the PAD diagram of FIG.
As shown in FIG. 12, the search automaton creation program 109 first starts the finite automaton creation program 110 in step 1200 and creates a finite automaton 121 that collates all the search terms extracted by the search condition expression analysis program 107.
The finite automaton 121 is created by the methods disclosed in Non-Patent Document 1 and Non-Patent Document 2.
Next, in step 1201, the user list creation program 111 is activated, and the user list 122 is created from the identification number (user number) of each user 126 who specified a search condition expression including a search term extracted by the search condition expression analysis program 107 and the number of the term containing that search term (term number). In step 1202, the user list is connected to the output table of the finite automaton 121 via a pointer.
[0027]
FIG. 13 shows an example of the finite automaton 121 and the user list 122 created from the three search condition formulas used for explaining the search term count table 120.
The finite automaton 121 shown in the figure collates the six search terms “document”, “character”, “search”, “search”, “recognition”, and “learning” included in the search condition expressions.
In this figure, for the sake of simplicity, the state transition is shown in units of 2 bytes (one character).
The finite automaton 121 is the same as that shown in Non-Patent Document 1 and Non-Patent Document 2, but the output table is different.
An appearance flag is provided for each state number stored in the output table. The appearance flag is reset to 0 at the start of text scanning and set to 1 when a substring matching the search term appears in the text.
In addition, each entry of the output table ends with a pointer to the user list 122, in which the user numbers and term numbers of the search condition expressions containing the search term are linked as a list.
Each element of the user list 122 has a search term negation flag, which is set to 1 when a negative condition is applied to the search term in the search condition expression and to 0 otherwise.
For example, in this figure, the search term “document” appears without a negative condition in terms 1 and 2 of the search condition expression of user number 1, and the search term “recognition” appears with a negative condition in term 1 of the search condition expression of user number 2.
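A finite automaton with fail transitions, an output table, appearance flags, and an attached user list can be sketched in Python as follows. The construction is a common Aho-Corasick style build and is offered as an assumption about how such a matcher might look, not as the method of the cited non-patent documents. The class name `SearchAutomaton`, the dictionary-based user list, and the stand-in term “retrieval” (for the second term that the translation also renders as “search”) are hypothetical, as is the assignment of “search” to user number 3.

```python
from collections import deque


class SearchAutomaton:
    """Multi-term matcher whose outputs are linked to a user list."""

    def __init__(self, user_list):
        # user_list: search term -> [(user number, term number, negated)]
        self.user_list = user_list
        self.goto = [{}]   # state -> {character: next state}
        self.fail = [0]    # fail destination state number table
        self.out = [[]]    # output table: state -> search terms ending there
        for term in user_list:
            state = 0
            for ch in term:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append(term)
        # Breadth-first computation of the fail destinations.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def scan(self, text):
        """Scan the text once and return {search term: appearance flag}."""
        appeared = {term: 0 for term in self.user_list}  # reset flags to 0
        state = 0
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]                 # fail, then retry
            state = self.goto[state].get(ch, 0)
            for term in self.out[state]:
                appeared[term] = 1
        return appeared


user_list = {
    "document": [(1, 1, False), (1, 2, False)],
    "search": [(1, 1, False), (3, 1, False)],
    "retrieval": [(1, 2, False)],
    "character": [(2, 1, False)],
    "recognition": [(2, 1, True)],
    "learning": [(2, 1, True), (3, 1, False)],
}
flags = SearchAutomaton(user_list).scan("document search")
print(flags)
```

The returned flags play the role of the appearance flags in the output table, and each flagged term leads to its user list entries through the `user_list` dictionary.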
[0028]
Next, processing contents of the text search program 114 for executing text search processing in the text search control program 112 shown in FIG. 11 will be described with reference to the PAD diagram of FIG.
As shown in the figure, in step 1400 the text search program 114 first performs the initial setting for text search by resetting to 0 the search term appearance count areas provided in the search term count table 120 and the appearance flags provided in the output table of the finite automaton 121.
Next, in step 1401, the text scanning program 115 is activated, the text stored in the work area 123 is scanned with the finite automaton 121 created by the finite automaton creation program 110, and the search terms are collated.
The search term collation by the finite automaton 121 uses the methods disclosed in Non-Patent Document 1 and Non-Patent Document 2.
At this time, for each search term for which a matching substring appears in the text, the appearance flag of the corresponding output table entry is set to 1.
Next, in step 1402, the search term count program 116 is started, and the search terms for which a matching substring appeared in the text are counted.
This is done by tracing the user lists 122 of the output table entries whose appearance flag is 1 and, for each element whose search term negation flag is 0, incrementing by 1 the appearance count area of the search term count table 120 corresponding to that element's user number and term number.
Next, in step 1403, the search terms for which no matching substring appeared in the text are counted.
This is done by tracing the user lists 122 of the output table entries whose appearance flag is 0 and, for each element whose search term negation flag is 1, incrementing by 1 the appearance count area of the search term count table 120 corresponding to that element's user number and term number.
[0029]
Next, in step 1404, the search condition formula check program 117 is activated, and the search term count table 120 is referenced to check whether the search condition formula is satisfied.
Here, a search condition expression that satisfies either of the following two conditions is regarded as satisfied.
Condition (1): The search condition expression negation flag is 0 (that is, no negative condition is applied to the search condition expression), and there is at least one term whose stored number of search terms matches its counted number of appearances.
Condition (2): The search condition expression negation flag is 1 (that is, a negative condition is applied to the search condition expression), and there is no term whose stored number of search terms matches its counted number of appearances.
[0030]
Determination of success or failure of this search condition formula will be described with reference to FIG.
In the present invention, as shown in the figure, the search condition expression designated by the user 126 is transformed into a form in which terms are connected by logical sum conditions, and the number of search terms included in each term is stored in the search term count table 120.
Because the terms are connected by logical sum conditions, the entire search condition expression is satisfied if any one of the terms is satisfied.
Here, a term is a single search term or a logical product condition of search terms.
Therefore, a term is satisfied when substrings matching all the search terms included in the term appear in the text, that is, when the number of search terms stored in advance in the search term count table 120 matches the search term appearance count calculated by the search term count program 116.
As a result, the search condition expression formed as the logical sum of the terms is also satisfied.
Thus, if condition (1) holds, the search condition expression can be regarded as satisfied.
[0031]
Condition (2) is the converse of condition (1).
When a negative condition is applied to the whole search condition expression, the original (negated) expression is not satisfied if the expression with the negative condition removed is satisfied, and is satisfied if the expression with the negative condition removed is not satisfied.
If there is no term for which the number of search terms stored in advance in the search term count table 120 matches the search term appearance count calculated by the search term count program 116, the expression with the negative condition removed is not satisfied, and therefore the original negated search condition expression is satisfied.
Thus, if condition (2) holds, the search condition expression can be regarded as satisfied.
Since a search condition expression meeting either of the above conditions is regarded as satisfied, the user number is output to the text search control program 112 in step 1405, and this program terminates.
The above is the embodiment of the document search method of the present invention.
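Steps 1400 to 1405 can be summarized in a short Python sketch. It assumes the text scan has already produced the set of search terms that appeared, and it models the user list and the search term count table as plain dictionaries. The function name `satisfied_users`, the data layout, and the stand-in term “retrieval” (for the second term that the translation also renders as “search”) are hypothetical.

```python
def satisfied_users(appeared, user_list, count_table):
    """Return user numbers whose search condition expression is satisfied.

    appeared:    set of search terms found by the text scan
    user_list:   term -> [(user number, term number, term negated)]
    count_table: user number -> (expression negated, [required count per term])
    """
    # Step 1400: reset the appearance count areas.
    counts = {u: [0] * len(req) for u, (_, req) in count_table.items()}
    # Steps 1402 and 1403: a plain term counts when it appeared,
    # a negated term counts when it did not appear.
    for term, entries in user_list.items():
        present = term in appeared
        for user, term_no, negated in entries:
            if present != negated:
                counts[user][term_no - 1] += 1
    # Step 1404: condition (1) or condition (2).
    result = []
    for user, (expr_negated, required) in count_table.items():
        any_term = any(c == r for c, r in zip(counts[user], required))
        if any_term != expr_negated:
            result.append(user)     # step 1405: output the user number
    return result


user_list = {
    "document": [(1, 1, False), (1, 2, False)],
    "search": [(1, 1, False), (3, 1, False)],
    "retrieval": [(1, 2, False)],
    "character": [(2, 1, False)],
    "recognition": [(2, 1, True)],
    "learning": [(2, 1, True), (3, 1, False)],
}
count_table = {1: (False, [2, 2]), 2: (False, [3]), 3: (True, [2])}
print(satisfied_users({"document", "search"}, user_list, count_table))
```

When “document” and “search” appear, user number 1 is satisfied through its first term, and user number 3 is satisfied because its negated expression has no fully matched term.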
[0032]
The processing procedure of the search condition expression registration control program 106 in this embodiment shown in FIG. 8 will be specifically described below.
First, the processing of the search condition expression analysis program 107 in step 800 of the search condition expression registration control program 106 of FIG. 8 will be described.
The search condition expression analysis program 107 is activated by the search condition expression registration control program 106.
For example, Expression (1) shows the expansion result of the search condition expression of user 1, ““document” and (“search” or “search”)”, that is, “a document that includes “document” and also includes “search” or “search””. Expression (2) shows the expansion result of the search condition expression of user 2, ““character” and ¬(“recognition” or “learning”)”, that is, “a document that includes “character” but includes neither “recognition” nor “learning””. Expression (3) shows the expansion result of the search condition expression of user 3, “¬(“search” and “learning”)”, that is, “a document that does not include both “search” and “learning””.
[0033]
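[Expression 1]
“document” and (“search” or “search”) = (“document” and “search”) or (“document” and “search”)
[Expression 2]
“character” and ¬(“recognition” or “learning”) = “character” and ¬“recognition” and ¬“learning”
[Expression 3]
¬(“search” and “learning”)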
As a result, search condition expressions in which no logical product condition is applied outside a logical sum condition are obtained: “(“document” and “search”) or (“document” and “search”)”, ““character” and ¬“recognition” and ¬“learning””, and “¬(“search” and “learning”)”.
Table 1 summarizes the search terms included in these search condition expressions by user number and term number.
[0034]
[Table 1]
User number    Term number    Search terms
1              1              “document”, “search”
1              2              “document”, “search”
2              1              “character”, ¬“recognition”, ¬“learning”
¬3             1              “search”, “learning”
Here, “¬” before a user number indicates that a negative condition is applied to the entire search condition expression, and “¬” before a search term indicates that a negative condition is applied to that search term.
For example, term number 1 of the search condition expression of user number 1 includes the two search terms “document” and “search”, and term number 2 includes the two search terms “document” and “search”. Term number 1 of the search condition expression of user number 2 includes the three search terms “character”, “recognition”, and “learning”, of which “recognition” and “learning” are subject to a negative condition.
[0035]
Next, the processing of the search term count table creation program 108 in step 801 of the search condition expression registration control program 106 of FIG. 8 will be described. The search term count table creation program 108 is started next to the search condition expression analysis program 107 by the search condition expression registration control program 106.
This program creates a search term count table 120 based on the analysis result by the search condition expression analysis program 107.
The search term count table 120 created from the analysis result of Table 1 is as shown in FIG.
This table stores the number of search terms corresponding to the item number for each user number.
As described above, the search condition expression negative flag is set to 1 if a negative condition is applied to the entire search condition expression, and 0 otherwise.
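A search term count table of this kind might be built as in the sketch below. The dictionary layout, the field names, and the placeholder term `search_alt` are assumptions made for illustration; only the stored quantities (number of search terms per item number, an appearance count area, and the expression negation flag) come from the description.

```python
# Analysis result in the spirit of Table 1:
# user number -> (expression negation flag, {item number: [(term, term negation flag)]})
ANALYSIS = {
    1: (0, {1: [("document", 0), ("search", 0)],
            2: [("document", 0), ("search_alt", 0)]}),   # "search_alt" is a placeholder
    2: (0, {1: [("character", 0), ("recognition", 1), ("learning", 1)]}),
    3: (1, {1: [("search", 0), ("learning", 0)]}),
}

def build_count_table(analysis):
    """Store, per user and item number, the number of terms and a zeroed appearance count."""
    table = {}
    for user, (expr_negated, items) in analysis.items():
        table[user] = {
            "expr_negated": expr_negated,
            "items": {item_no: {"num_terms": len(terms), "appearances": 0}
                      for item_no, terms in items.items()},
        }
    return table

count_table = build_count_table(ANALYSIS)
print(count_table[2])
```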
[0036]
Finally, the processing of the search automaton creation program 109 in step 802 of the search condition expression registration control program 106 of FIG. 8 will be described.
The processing contents of this program are shown in FIG. As shown in FIG. 1, this program includes a finite automaton creation program 110 and a user list creation program 111.
These are described in order below.
[0037]
The finite automaton creation program 110 creates a finite automaton 121 that collates all search terms extracted by the search condition expression analysis program 107.
For example, in the case of Expressions (1) to (3) and Table 1, the analysis result of the search condition expression analysis program 107 yields six search terms: “document”, “search”, “search”, “character”, “recognition”, and “learning”.
When a finite automaton 121 for collating these search terms is created using the methods disclosed in Non-Patent Document 1 and Non-Patent Document 2, a finite automaton 121 as shown in FIG. 13 is obtained.
However, here, for the sake of simplicity, the state transition is shown in units of one character, that is, 2 bytes.
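The following sketch shows one way such a finite automaton could be built and run. It assumes an Aho-Corasick-style pattern matching machine with goto transitions, a fail table, and an output table, as in the prior-art discussion; the methods of Non-Patent Documents 1 and 2 are not reproduced here, and the function names are hypothetical. Transitions are per character, matching the simplification above.

```python
from collections import deque

def build_automaton(terms):
    goto = [{}]        # goto[state][char] -> next state
    fail = [0]         # fail[state] -> state to retry from on a mismatch
    output = [set()]   # output[state] -> terms recognised on reaching state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(term)
    queue = deque(goto[0].values())    # depth-1 states fail to state 0
    while queue:                       # breadth-first: shallower states first
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def scan(text, automaton):
    """Scan the text once and return every term that occurs in it."""
    goto, fail, output = automaton
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]        # follow fail links, then re-collate
        state = goto[state].get(ch, 0)
        found |= output[state]
    return found

machine = build_automaton(["he", "she", "his", "hers"])
print(sorted(scan("ushers", machine)))
```

A single pass over the text reports all matching terms, which is the property the search processing relies on.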
[0038]
The user list creation program 111 creates a user list 122 by linking, as a list, the user number and item number of each search condition expression that includes a search term obtained by the search condition expression analysis program 107, together with information on whether a negative condition is applied to that search term, and connects the list to the output table of the finite automaton 121 via a pointer.
The method for creating the user list 122 is as described above.
FIG. 15 shows a user list 122 created from the analysis result of Table 1.
For example, the search term “learning” is included in item number 1 of the search condition expression of user number 2 with a negative condition, and in item number 1 of the search condition expression of user number 3 without a negative condition. The user list 122 for this term is therefore created by linking the entries corresponding to these two occurrences.
The user list 122 created in this way is connected to the output table of the finite automaton 121 via a pointer.
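As a rough sketch, the user list can be thought of as a mapping from each search term to its entries. A Python dictionary stands in here for the pointer from the output table; the names, the tuple layout, and the placeholder term `search_alt` are assumptions.

```python
from collections import defaultdict

# user number -> {item number: [(term, negated?)]}, in the spirit of Table 1
ANALYSIS = {
    1: {1: [("document", False), ("search", False)],
        2: [("document", False), ("search_alt", False)]},   # placeholder term
    2: {1: [("character", False), ("recognition", True), ("learning", True)]},
    3: {1: [("search", False), ("learning", False)]},
}

def build_user_list(analysis):
    """Link (user number, item number, term negation flag) entries to each term."""
    user_list = defaultdict(list)
    for user, items in analysis.items():
        for item_no, terms in items.items():
            for term, negated in terms:
                user_list[term].append((user, item_no, negated))
    return dict(user_list)

user_list = build_user_list(ANALYSIS)
print(user_list["learning"])
```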
The above is the detailed procedure for registering the search condition formula in the search condition formula registration control program 106 in this embodiment.
[0039]
The processing procedure of the text search control program 112 in this embodiment shown in FIG. 11 will be specifically described below.
First, the processing of the text acquisition program 113 in step 1100 of the text search control program 112 in FIG. 11 will be described.
The text acquisition program 113 is activated by the text search control program 112.
In this program, the text distributed by electronic mail is stored in the work area 123.
The following description will be given on the assumption that the text “analyze the retrieved document format and recognize the character string portion” is stored in the work area 123 by this program.
[0040]
Processing of the text search program 114 in step 1101 of the text search control program 112 in FIG. 11 will be described.
The processing contents of this program are shown in FIG.
As shown in FIG. 1, this program includes a text scanning program 115, a search term count program 116, and a search condition expression check program 117.
These are described in order below.
First, initialization is performed before the text scanning program 115, the search term count program 116, and the search condition expression check program 117 are executed.
Here, as shown in FIGS. 10 and 15, the search term appearance count area in the search term count table 120 and the appearance flag in the output table are reset to zero.
[0041]
In the text scanning program 115, the text stored in the work area 123 is scanned by the finite automaton 121, and the search terms are collated.
Here, the appearance flag corresponding to the search term in which the matching partial character string appears in the text is set to 1.
For example, when the text “analyze the format of the retrieved document and recognize the character string portion” is scanned as shown in FIG. 16, first, “search” appears in the text.
[0042]
Therefore, the appearance flag corresponding to “search” is set to 1.
Hereinafter, since “document”, “character”, and “recognition” appear in this order, the appearance flags corresponding to those search terms are set to 1.
For the search terms “search” and “learn”, no matching partial character string appears in the text, so the appearance flags corresponding to those search terms remain 0.
[0043]
The search term count program 116 first counts the search terms in which matching partial character strings appear in the text.
Here, for each search term whose matching partial character string appears in the text (appearance flag 1) and which is not negated (search term negation flag 0), the search term appearance count area in the search term count table 120 corresponding to the user number and item number is incremented by one.
For example, in the example of FIG. 17, the search term “search” is counted for item number 1 of user number 3 because its search term negation flag is 0, whereas the search term “recognition” is not counted for item number 1 of user number 2 because its search term negation flag is 1.
Next, search terms for which no matching partial character string appears in the text are counted.
Here, for each search term whose matching partial character string does not appear in the text (appearance flag 0) and which is negated (search term negation flag 1), the search term appearance count area in the search term count table 120 corresponding to the user number and item number is incremented by one.
For example, in the example of FIG. 18, the search term “learning” is counted for item number 1 of user number 2 because its search term negation flag is 1, but it is not counted for item number 1 of user number 3 because its search term negation flag is 0.
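The two counting passes can be folded into one rule: an entry is counted when its term appears and is not negated, or is absent and negated. The sketch below assumes a dictionary-based user list and a set of appeared terms; those structures and the placeholder term `search_alt` are illustrative, not taken from the figures.

```python
USER_LIST = {   # term -> [(user number, item number, term negation flag)]
    "document":    [(1, 1, False), (1, 2, False)],
    "search":      [(1, 1, False), (3, 1, False)],
    "search_alt":  [(1, 2, False)],                  # hypothetical placeholder term
    "character":   [(2, 1, False)],
    "recognition": [(2, 1, True)],
    "learning":    [(2, 1, True), (3, 1, False)],
}

def count_terms(user_list, appeared):
    """Return appearance counts keyed by (user number, item number)."""
    counts = {}
    for term, entries in user_list.items():
        present = term in appeared                   # appearance flag
        for user, item_no, negated in entries:
            if present != negated:                   # present and plain, or absent and negated
                counts[(user, item_no)] = counts.get((user, item_no), 0) + 1
    return counts

# Terms found by the text scan in the running example.
appeared = {"search", "document", "character", "recognition"}
counts = count_terms(USER_LIST, appeared)
print(counts)
```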
[0044]
Next, the search condition expression check program 117 refers to the search term count table 120 to check whether the search condition expression is satisfied.
Here, a search condition expression that satisfies either of the following two conditions is regarded as satisfied, and the number of the user who specified it is output.
Condition (1): The search condition expression negation flag is 0, that is, no negative condition is applied to the search condition expression, and there is at least one item number whose number of search terms equals its search term appearance count.
Condition (2): The search condition expression negation flag is 1, that is, a negative condition is applied to the search condition expression, and there is no item number whose number of search terms equals its search term appearance count.
For example, in the example of FIG. 19, the search condition expression negation flag corresponding to user number 1 is 0, and for item number 1 the number of search terms matches the search term appearance count, so condition (1) is satisfied.
Further, the search condition expression negation flag corresponding to user number 3 is 1, and there is no item number for which the number of search terms matches the search term appearance count, so condition (2) is satisfied.
However, for user number 2, although the search condition expression negation flag is 0, there is no item number for which the number of search terms matches the search term appearance count, so neither condition is satisfied.
Therefore, since the search condition expressions of user numbers 1 and 3 can be regarded as satisfied, these user numbers are output.
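Conditions (1) and (2) reduce to comparing "some item number is fully matched" against the expression negation flag. The sketch below uses a hypothetical table layout in which each item number holds a pair (number of search terms, appearance count); the values follow the running example.

```python
COUNT_TABLE = {
    1: {"expr_negated": 0, "items": {1: (2, 2), 2: (2, 1)}},
    2: {"expr_negated": 0, "items": {1: (3, 2)}},
    3: {"expr_negated": 1, "items": {1: (2, 1)}},
}

def satisfied_users(count_table):
    """Return the user numbers whose search condition expression is satisfied."""
    result = []
    for user, entry in sorted(count_table.items()):
        # True when at least one item number has all of its terms counted.
        any_match = any(num_terms == seen for num_terms, seen in entry["items"].values())
        # Condition (1): flag 0 and a match exists. Condition (2): flag 1 and no match.
        if any_match != bool(entry["expr_negated"]):
            result.append(user)
    return result

print(satisfied_users(COUNT_TABLE))
```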
[0045]
Finally, the processing of the text shaping program 118 in step 1103 of the text search control program 112 in FIG. 11 will be described.
The text shaping program 118 is activated by the text search control program 112 only when a user number is output as a result of the text search program 114.
In this program, the text stored in the work area 123 is formed into a format that can be distributed by the e-mail program 119.
For example, control information called a header is added to the head of text.
FIG. 20 shows an example of the processing result of this program.
In this figure, the lines “To:”, “Subject:”, and “From:” are added.
An address to which the text is distributed, for example, an e-mail destination address is added to the “To:” line.
In FIG. 20, “user 1” and “user 3” are described in order to distribute the text to user 1 and user 3.
Information that can be easily identified by the user is added to the “Subject:” line.
In this figure, the first few characters of the text to be distributed are extracted and described, but anything may be added here.
In the “From:” line, an address of a text sender, for example, an email sender is added.
In this figure, “document search and delivery system”, which is the name of the system that distributes text, is described.
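A header of this shape could be produced as in the following sketch. The function name, the blank separator line, and the 20-character subject length are assumptions; the description only says that the first few characters of the text are used for the subject.

```python
def shape_text(text, user_numbers,
               sender="document search and delivery system", subject_len=20):
    """Prepend To:, Subject: and From: lines to the text."""
    to_line = ", ".join(f"user {number}" for number in user_numbers)
    subject = text[:subject_len]          # first few characters of the text
    header = f"To: {to_line}\nSubject: {subject}\nFrom: {sender}\n"
    return header + "\n" + text           # blank line separates header and body

body = "analyze the retrieved document format and recognize the character string portion"
shaped = shape_text(body, [1, 3])
print(shaped)
```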
The above is the detailed procedure of the text search in the text search control program 112 in the present embodiment.
[0046]
If even one search condition expression is satisfied as a result of the processing of the text search control program 112, the e-mail program 119 is started by the system control program 105 after the text search control program 112 ends, as shown in FIG. 7.
In this program, the text is distributed by electronic mail with reference to the header added by the text shaping program 118.
For example, in the example of FIG. 20, a portion corresponding to the “To:” line of the header is referred to, and the text with the header added is sent to the destination described there.
In this figure, since “User 1” and “User 3” are described in the “To:” line, the text is distributed to the user 1 and the user 3, and the process is terminated.
[0047]
As described above, according to the present invention, when search condition expressions for a plurality of users are registered, the identification information of the user who specified each search term included in those expressions and the number of search terms included in the expression specified by that user are stored. When a text is searched, whether each search condition expression is satisfied can be determined by comparing the number of search terms for which a matching partial character string appears in the text with the number of search terms stored for each user. The success or failure of the search condition expressions of all users can therefore be determined in a single scan of the text, so the search processing for all users is performed at once.
As a result, for text obtained from a plurality of information sources, whether the search condition expressions registered in advance by a plurality of users are satisfied is determined in a single scan of the text, which makes it possible to realize a highly immediate document search / delivery system that can immediately distribute the text to the users whose conditions are satisfied.
In addition, because this document retrieval / delivery system has high immediacy, the time from when a user notifies the system of a search condition expression until the retrieved text is delivered is short, so the user can quickly judge whether the search condition expression is appropriate.
[0048]
《Second embodiment》
Next, a second embodiment of the present invention will be described.
In the document retrieval and delivery system of this embodiment, distribution conditions are managed for each user, so that text can be distributed according to the user's wishes, for example by distributing texts once a certain number have accumulated or by distributing them at a fixed time.
When used as a commercial system, it is also possible to distribute text with a time delay according to the contract conditions of the user.
[0049]
This embodiment has basically the same configuration as the first embodiment (FIG. 1), but the configuration in the main memory 104 is different.
The configuration in the main memory 104 is as shown in FIG.
As shown in FIG. 21, a distribution management table 2108 is secured in the main memory 104a, and a distribution condition registration control program 2100 and a text distribution control program 2104 are newly provided under the control of the system control program 105a.
In addition, the distribution condition analysis program 2101 and the distribution condition registration program 2102 are provided under the control of the distribution condition registration control program 2100, the distribution information storage program 2103 is provided under the control of the text search control program 112a, and the text distribution program 2105 is provided under the control of the text distribution control program 2104.
This text distribution program 2105 includes a distribution condition check program 2106, an e-mail program 119, and a distribution information correction program 2107.
As the e-mail program 119, an existing mail program generally used in a workstation or the like is used.
The above programs can also be stored in a storage medium that can be read and written by a computer, such as a hard disk device or a flexible disk.
[0050]
The system control program 105a is activated in response to an instruction from the keyboard 101 by the administrator of the document search / delivery system.
The distribution condition registration control program 2100, the search condition expression registration control program 106, the text search control program 112a, and the text distribution control program 2104 are activated by the system control program 105a in response to, for example, a registration instruction for distribution conditions or a search condition expression from the user 126, a distribution condition registration instruction from the keyboard 101, or text distribution from the news distribution source 125. They respectively control the distribution condition analysis program 2101 and the distribution condition registration program 2102; the search condition expression analysis program 107, the search term count table creation program 108, and the search automaton creation program 109; the text acquisition program 113, the text search program 114, the text shaping program 118, and the distribution information storage program 2103; and the text distribution program 2105.
Hereinafter, processing contents of the document search / delivery system in this embodiment will be described.
[0051]
First, the processing contents of the system control program 105a will be described with reference to the PAD diagram of FIG.
In the system control program 105a, first, in step 2200, the following steps are repeated until an end command is input from the keyboard 101.
In this repetitive process, first, in step 2201, it is checked whether or not a distribution condition is sent by an e-mail from the user 126 or an input from the keyboard 101.
If distribution conditions have been sent, the distribution condition registration control program 2100 is activated in step 2202 to register the distribution conditions.
Next, in step 2203, it is checked whether or not a search condition expression has been sent from the user 126 by e-mail.
If a search condition formula has been sent, the search condition formula registration control program 106 is activated in step 2204 to register the search condition formula.
Next, in step 2205, it is checked whether or not text has been sent from the news distribution source 125 by e-mail.
If text has been sent, the text search control program 112a is activated in step 2206 to search for the text.
Finally, in step 2207, the text distribution control program 2104 is activated to determine the distribution condition and distribute the text only to users who satisfy the condition.
The above is the processing content of the system control program 105a.
[0052]
The processing contents of the distribution condition registration control program 2100, the text distribution control program 2104, and the text search control program 112a, which are different from the first embodiment, will be described below.
First, the contents of the distribution condition registration process performed by the distribution condition registration control program 2100 will be described with reference to the PAD diagram of FIG.
The distribution condition registration control program 2100 is activated by the system control program 105a.
In step 2300, the program first activates the distribution condition analysis program 2101, and analyzes the distribution conditions sent by e-mail from the user 126 or input from the keyboard 101.
In the distribution condition analysis process, the following information is extracted from the distribution condition.
(A) User identifier for setting distribution conditions
(B) Distribution condition format
(C) Distribution condition settings
As the format of the distribution condition (B), the distribution condition type such as “distribution time”, “number of distributions”, “delay time” is extracted.
The value extracted as the setting value of the distribution condition in (C) is, for example, the distribution time if (B) is “distribution time”, the number of texts to distribute if (B) is “number of distributions”, and the elapsed time from the search until the actual distribution if (B) is “delay time”.
For example,
User number 1: Distribution time (18:00)
When a distribution condition that means “distribute to user number 1 at 18:00” is sent, “user number 1”, “distribution time”, and “18:00” are extracted.
User number 2: Number of distributions (5)
When a distribution condition that means “distribute when 5 items are collected for user number 2” is sent, “user number 2”, “number of distributions”, and “5” are extracted.
User number 3: delay time (01:30)
When a distribution condition that means “distribute to user number 3 with a delay of 1 hour 30 minutes” is sent, “user number 3”, “delay time”, and “01:30” are extracted.
Finally, in step 2301, the distribution condition registration program 2102 is activated, and the result analyzed by the distribution condition analysis program 2101 is stored in the distribution management table 2108.
FIG. 24 shows an example of the distribution management table 2108.
The distribution management table 2108 stores the distribution condition format and setting value extracted by the distribution condition analysis program 2101 in a form corresponding to the user number, and secures a distribution condition check area and a distribution text number storage area.
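The analysis and registration of distribution conditions might look like the sketch below. The input syntax is taken from the three examples above; the regular expression, the function names, the table layout, and the choice to hold times as minutes are assumptions.

```python
import re

CONDITION = re.compile(
    r"user number\s+(\d+)\s*:\s*"
    r"(distribution time|number of distributions|delay time)\s*\(\s*([^)]+?)\s*\)",
    re.IGNORECASE,
)

def analyze_condition(line):
    """Extract (A) the user number, (B) the format and (C) the setting value."""
    match = CONDITION.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised distribution condition: {line!r}")
    user, kind, value = int(match.group(1)), match.group(2).lower(), match.group(3)
    if kind == "number of distributions":
        setting = int(value)
    else:                                    # "HH:MM" held as minutes
        hours, minutes = map(int, value.split(":"))
        setting = hours * 60 + minutes
    return user, kind, setting

def register_condition(table, line):
    """Store the analysed condition and reserve the check and text number areas."""
    user, kind, setting = analyze_condition(line)
    table[user] = {"format": kind, "setting": setting,
                   "check": None, "text_numbers": []}
    return table

table = {}
for condition in ("User number 1: Distribution time (18:00)",
                  "User number 2: Number of distributions (5)",
                  "User number 3: delay time (01:30)"):
    register_condition(table, condition)
print(table)
```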
[0053]
Next, text search processing contents by the text search control program 112a will be described with reference to the PAD diagram of FIG.
The text search control program 112a is activated by the system control program 105a.
Of the processing contents of this program shown in FIG. 25, the processing contents of the text acquisition program 113, the text search program 114, and the text shaping program 118 in steps 2500 to 2503 are as described in the first embodiment.
In step 2504, the text search control program 112a starts the distribution information storage program 2103, and additionally stores the number of text to be distributed that satisfies the search condition in the distribution text number storage area of the distribution management table 2108.
In step 2505, the number of text numbers stored in the distribution text number storage area of the distribution management table 2108 or the current time is stored in the distribution condition check area of this table.
At this time, the number of stored text numbers is stored when the format of the distribution condition is “number of distributions”, and the current time is stored when the format is “delay time”. In the case of “distribution time”, nothing needs to be stored.
Thereafter, the text shaped by the text shaping program 118 is stored in the work area 123 in step 2506.
[0054]
Finally, processing contents of text distribution by the text distribution control program 2104 will be described with reference to the PAD diagram of FIG.
The text distribution control program 2104 is activated by the system control program 105a.
In step 2600, the program activates the text distribution program 2105, determines the distribution condition for each user, and distributes the text to users who satisfy the condition.
[0055]
The detailed processing contents of the text distribution program 2105 are shown in FIG.
In step 2700, the text distribution program 2105 first repeats the following steps for all user numbers whose distribution conditions are stored in the distribution management table 2108.
In this repetitive processing, first, the distribution condition check program 2106 is started in step 2701, and it is determined whether or not the distribution conditions are satisfied using the distribution management table 2108.
Here, if the following conditions are satisfied, it is considered that the distribution conditions are satisfied.
Condition (1): The format of the distribution condition is “distribution time”, and the setting value of the distribution condition matches the current time, or the current time exceeds the setting value of the distribution condition.
Condition (2): The format of the distribution condition is “number of distributions”, and the setting value of the distribution condition matches the number of cases stored in the distribution condition check area.
Condition (3): The distribution condition format is “delay time” and the set value of the distribution condition matches the elapsed time from the time stored in the distribution condition check area to the current time, or the distribution condition The elapsed time exceeds the set value.
If any of the above conditions is satisfied, it is determined in step 2702 that the distribution condition is satisfied, and in step 2703 the e-mail program 119 is activated to distribute the texts whose numbers are stored in the distribution text number storage area of the distribution management table 2108 to that user number.
Finally, in step 2704, the distribution information correction program 2107 is activated to reset the distribution condition check area and distribution text number storage area of the distribution management table 2108 corresponding to the user number to which the text was distributed.
This is realized by NULL clearing the distribution condition check area and deleting the text number from the distribution text number storage area.
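Steps 2700 to 2704 can be sketched as below. Times are held as minutes since midnight, the table layout and the `send` callback are hypothetical, and the time 12:30 stored for user number 3 is an assumed value. Skipping users with no stored text numbers is also an assumption, added so that a passed distribution time does not trigger empty mails.

```python
def condition_met(entry, now):
    """Return True when one of conditions (1) to (3) holds."""
    kind, setting, check = entry["format"], entry["setting"], entry["check"]
    if kind == "distribution time":            # condition (1)
        return now >= setting
    if kind == "number of distributions":      # condition (2)
        return check == setting
    if kind == "delay time":                   # condition (3)
        return check is not None and now - check >= setting
    return False

def distribute(table, now, send):
    for user, entry in table.items():          # step 2700: every registered user
        # Assumption: users with nothing stored are skipped.
        if entry["text_numbers"] and condition_met(entry, now):
            send(user, list(entry["text_numbers"]))   # step 2703
            entry["check"] = None                     # step 2704: NULL clear
            entry["text_numbers"].clear()             # step 2704: delete numbers

table = {
    1: {"format": "distribution time", "setting": 18 * 60,
        "check": None, "text_numbers": [59]},
    2: {"format": "number of distributions", "setting": 5,
        "check": 5, "text_numbers": [19, 24, 33, 42, 59]},
    3: {"format": "delay time", "setting": 90,
        "check": 12 * 60 + 30, "text_numbers": [53]},   # 12:30 is an assumed value
}
sent = {}
distribute(table, now=14 * 60, send=lambda user, numbers: sent.update({user: numbers}))
print(sent)
```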
The above is the embodiment of the document search system of the present invention.
[0056]
The processing procedure of the distribution condition registration control program 2100 in this embodiment shown in FIG. 23 will be specifically described below with reference to FIG.
First, the processing of the distribution condition analysis program 2101 in step 2300 of the distribution condition registration control program 2100 of FIG. 23 will be described.
The distribution condition analysis program 2101 is activated by the distribution condition registration control program 2100.
This program analyzes the distribution condition sent from the user 126 by e-mail or the distribution condition input from the keyboard 101.
As an example, the results of analyzing distribution conditions of “user number 1: distribution time (18:00)”, “user number 2: distribution number (5)”, and “user number 3: delay time (01:30)” are shown in FIG. 28.
For example, in the case of the distribution condition “user number 1: distribution time (18:00)”, the analysis yields the user number “1” for which the distribution condition is set, the distribution condition format “distribution time”, and the distribution condition setting value “18:00”.
[0057]
Next, processing of the distribution condition registration program 2102 in step 2301 of the distribution condition registration control program 2100 of FIG. 23 will be described.
The distribution condition registration program 2102 is started next to the distribution condition analysis program 2101 by the distribution condition registration control program 2100.
This program creates a distribution management table 2108 based on the analysis result by the distribution condition analysis program 2101.
An example of the created distribution management table 2108 is shown in FIG.
In this table, the format and setting value of the distribution condition are stored corresponding to each user number based on the analysis result by the distribution condition analysis program 2101. Also, a distribution condition check area and a distribution text number storage area are secured.
The above is the detailed procedure of the distribution condition registration process in the distribution condition registration control program 2100 in the present embodiment.
[0058]
The processing procedure of the text search control program 112a in the present embodiment shown in FIG. 25 will be specifically described below.
Of the processing contents of this program shown in FIG. 25, the processing contents of the text acquisition program 113, the text search program 114, and the text shaping program 118 in steps 2500 to 2503 are as described in detail in the first embodiment.
The following is detailed processing contents of the distribution information storage program 2103 in steps 2504 to 2506.
The distribution information storage program 2103 is started next to the text shaping program 118 by the text search control program 112a.
In step 2504, the program first stores the text number in the distribution text number storage area of the distribution management table 2108 corresponding to the user number for which the search condition expression is satisfied.
FIG. 29 shows an example of processing contents of this program.
This figure is an example when the search condition expressions of user number 1 and user number 2 are established for the 59th text.
Therefore, the text number “59” is stored in a location corresponding to user number 1 and user number 2 in the distribution text number storage area of the distribution management table 2108.
Next, in step 2505, the distribution information storage program 2103 stores, in the distribution condition check area of the distribution management table 2108, either the number of text numbers held in the distribution text number storage area of that table after step 2504 or the current time.
At this time, the number of stored text numbers is stored when the format of the distribution condition is “number of distributions”, and the current time is stored when the format is “delay time”. In the case of “distribution time”, nothing needs to be stored.
In the example of FIG. 29, the format of the distribution condition for user number 2 is “number of distributions”, so the value of the distribution condition check area is incremented by 1 to “5”, whereas nothing is done for user number 1 because its distribution condition format is “distribution time”.
Finally, the distribution information storage program 2103 stores the text shaped by the text shaping program 118 in the work area 123 in step 2506 and ends.
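Steps 2504 and 2505 can be sketched as follows. The table layout and function name are hypothetical, and the initial contents assumed for user number 1 (no stored text numbers) are illustrative only.

```python
def store_distribution_info(table, matched_users, text_number, now):
    """Record a matched text for each user and update the check area."""
    for user in matched_users:
        entry = table[user]
        entry["text_numbers"].append(text_number)           # step 2504
        if entry["format"] == "number of distributions":    # step 2505
            entry["check"] = len(entry["text_numbers"])
        elif entry["format"] == "delay time":
            entry["check"] = now
        # "distribution time": nothing needs to be stored

table = {
    1: {"format": "distribution time", "setting": 18 * 60,
        "check": None, "text_numbers": []},                 # assumed initial state
    2: {"format": "number of distributions", "setting": 5,
        "check": 4, "text_numbers": [19, 24, 33, 42]},
}
# Text number 59 satisfies the expressions of user numbers 1 and 2.
store_distribution_info(table, matched_users=[1, 2], text_number=59, now=14 * 60)
print(table[2])
```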
[0059]
Finally, the processing of the text distribution program 2105 in step 2600 of the text distribution control program 2104 in FIG. 26 will be described.
The detailed processing contents of the text distribution program 2105 are as shown in FIG.
First, in step 2700, the following processing is repeated for all users whose distribution conditions are stored in the distribution management table 2108.
In this iterative process, first, in step 2701, the distribution condition check program 2106 is activated to determine the distribution conditions.
Here, if the following conditions are satisfied, it is considered that the distribution conditions are satisfied.
Condition (1): The format of the distribution condition is “distribution time”, and the setting value of the distribution condition matches the current time, or the current time exceeds the setting value of the distribution condition.
Condition (2): The format of the distribution condition is “number of distributions”, and the setting value of the distribution condition matches the number of cases stored in the distribution condition check area.
Condition (3): The distribution condition format is “delay time” and the set value of the distribution condition matches the elapsed time from the time stored in the distribution condition check area to the current time, or the distribution condition The elapsed time exceeds the set value.
[0060]
The format of the distribution condition for user number 1 is “distribution time”.
However, since the current time “14:00” does not exceed the set value “18:00” of the distribution condition, the distribution condition is not satisfied. Therefore, it moves to the next repetition.
[0061]
The format of the distribution condition for user number 2 is “number of distributions”, and the number stored in the distribution condition check area and the set value of the distribution condition are both “5”. Therefore, in step 2703, the e-mail program 119 is activated and the texts whose numbers are stored in the distribution text number storage area of the distribution management table 2108 are distributed. In this figure, the distribution text number storage area corresponding to user number 2 stores text numbers 19, 24, 33, 42, and 59, so the texts with those numbers stored in the work area 123 are distributed to user number 2.
Next, in step 2704, the distribution information correction program 2107 is activated to reset the distribution condition check area and distribution text number storage area of the distribution management table 2108 corresponding to the user number to which the text was distributed, that is, user number 2, and processing moves to the next repetition.
[0062]
The format of the distribution condition for user number 3 is “delay time”, and the elapsed time from the time stored in the distribution condition check area to the current time matches the set value of the distribution condition, “01:30”. Therefore, the e-mail program 119 is started in step 2703, and the text whose number is stored in the distribution text number storage area of the distribution management table 2108 is distributed.
In the drawing, since the text number 53 is stored in the distribution text number storage area corresponding to the user number 3, the text with that number stored in the work area 123 is distributed to the user number 3.
Next, in step 2704, the distribution information correction program 2107 is activated to reset the distribution condition check area and distribution text number storage area of the distribution management table 2108 corresponding to the user number to which the text was distributed, that is, user number 3.
FIG. 30 shows the distribution management table 2108 when all the repetition processes are completed.
Since text distribution processing has been performed for user number 2 and user number 3, the distribution condition check area and the distribution text number storage area corresponding to them are reset.
[0063]
As described above, according to the present invention, a distribution condition is set for each user, and texts satisfying the search condition formula are distributed in accordance with that condition. Texts can therefore be distributed as each user wishes, for example at a fixed time.
When used as a commercial system, it is also possible to distribute text with a time delay according to the contract conditions of the user.
As a result, in a system that obtains texts from a plurality of information sources and determines, by a single scan of each text, whether the search condition formulas registered in advance by a plurality of users are satisfied, it is possible to realize a highly flexible document search / delivery system that distributes the text according to the distribution condition desired by each user.
[0064]
《Third embodiment》
Next, a third embodiment of the present invention will be described.
In the document search / delivery system of the present embodiment, the search terms included in the search condition formula designated by each user are managed for each user. When a user instructs deletion of a search condition formula, the finite automaton is searched using the managed search terms and the pointers of the user list are changed, so that the previously registered information can easily be deleted from the user list.
According to this embodiment, a search condition formula can therefore be changed easily even when the user instructs such a change.
This embodiment has basically the same configuration as the first embodiment (FIG. 1), but the configuration in the main memory 104 is different.
The configuration in the main memory 104 is as shown in FIG. 31.
As shown in FIG. 31, a search term management table 3106 is secured in the main memory 104b, and a search condition expression deletion control program 3100 is newly provided under the control of the system control program 105b.
Further, a user list correction program 3101 and a search term count table correction program 3104 are provided under the control of the search condition expression deletion control program 3100, and a search term management table creation program 3105 is provided under the control of the search condition expression registration control program 106b.
The user list correction program 3101 includes a finite automaton search program 3102 and a user list part deletion program 3103.
The above programs can also be stored in a storage medium that can be read and written by a computer, such as a hard disk device or a flexible disk.
[0065]
The system control program 105b is activated by an instruction from the keyboard 101 by the administrator of the document search / delivery system.
The search condition expression deletion control program 3100, the search condition expression registration control program 106b, and the text search control program 112 are activated by the system control program 105b when, respectively, a search condition expression deletion instruction arrives from the user 126, a search condition expression registration arrives from the user 126, or a text is distributed from the news distribution source 125. The search condition expression deletion control program 3100 controls the user list correction program 3101 and the search term count table correction program 3104. The search condition expression registration control program 106b controls the search condition formula analysis program 107, the search term count table creation program 108, the search automaton creation program 109, and the search term management table creation program 3105. The text search control program 112 controls the text acquisition program 113, the text search program 114, and the text shaping program 118.
[0066]
Hereinafter, the processing contents of the document search / delivery system in this embodiment will be described. First, the processing contents of the system control program 105b will be described with reference to the PAD diagram of FIG. 32.
In the system control program 105b, first, in step 3200, the following steps are repeated until an end command is input from the keyboard 101.
In this iterative process, first, in step 3201, it is checked whether or not a search condition expression deletion instruction is sent from the user 126 by e-mail.
If a search condition expression deletion instruction has been sent, the search condition expression deletion control program 3100 is activated in step 3202 to delete the search condition expression.
Next, in step 3203, it is checked whether or not a search condition formula has been sent from the user 126 by e-mail.
If a search condition formula has been sent, the search condition formula registration control program 106b is activated in step 3204 to register the search condition formula.
Next, in step 3205, it is checked whether or not a text is sent from the news distribution source 125 by e-mail.
If text has been sent, the text search control program 112 is activated in step 3206 to search for text.
Next, in step 3207, the text search result of the text search control program 112 is checked. If at least one search condition expression is found to be satisfied, the e-mail program 119 is started in step 3208, and the text is distributed by e-mail to the users who specified the satisfied search condition expressions.
The above is the processing content of the system control program 105b.
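The control flow of steps 3200 to 3208 can be sketched as a simple dispatch loop. This is only an illustrative Python sketch: the event tuples, the `"end"` marker, and the four callbacks are hypothetical stand-ins for the keyboard command, the incoming e-mails, and the programs 3100, 106b, 112, and 119. It also assumes one incoming item is handled per iteration, whereas the PAD checks the three kinds of input in sequence.

```python
from typing import Callable, Iterable


def system_control(events: Iterable[tuple],
                   delete_expr: Callable,
                   register_expr: Callable,
                   search_text: Callable,
                   send_mail: Callable) -> None:
    """Dispatch incoming items until an end command is seen."""
    for kind, payload in events:          # step 3200: repeat until the end command
        if kind == "end":
            break
        if kind == "delete":              # steps 3201-3202: deletion instruction
            delete_expr(payload)
        elif kind == "register":          # steps 3203-3204: new search condition formula
            register_expr(*payload)
        elif kind == "text":              # steps 3205-3206: text from a news source
            matched_users = search_text(payload)
            if matched_users:             # steps 3207-3208: distribute on a match
                send_mail(matched_users, payload)
```

Passing the handlers in as callbacks keeps the loop independent of how deletion, registration, search, and mailing are actually carried out.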
[0067]
The processing contents of the search condition expression deletion control program 3100 that is not in the first embodiment and the search condition expression registration control program 106b that is different in processing from the first embodiment will be described below.
First, the contents of the search condition expression deletion processing by the search condition expression deletion control program 3100 will be described with reference to the PAD diagram of FIG. 33.
The search condition expression deletion control program 3100 is activated by the system control program 105b.
In step 3300, the program first activates the user list correction program 3101, which deletes from the user list 122 the entries corresponding to the user number for which deletion of the search condition formula was instructed.
In step 3301, the search term count table correction program 3104 is activated, and the search term count table 120 corresponding to that user number is deleted.
[0068]
The detailed processing contents of the user list correction program 3101 are shown in FIG. 34.
First, the program repeats the following steps for the number of search terms stored in the search term management table 3106 in step 3400. The search term management table 3106 will be described later.
In the iterative process, first, in step 3401, the finite automaton search program 3102 is activated, and the finite automaton 121 is searched by the search term to obtain a pointer to the user list 122.
Finally, in step 3402, the user list partial deletion program 3103 is activated, and the list corresponding to the user number for deleting the search condition formula is deleted by changing the pointer of the user list 122.
The above is the processing content of the search condition expression deletion by the search condition expression deletion control program 3100.
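The deletion procedure (steps 3400 to 3402, followed by step 3301) can be sketched as below. This is a hedged Python illustration rather than the embodiment itself: the `Node` class, the function names, and the table layouts are hypothetical, and a plain dictionary from search term to list head stands in for searching the finite automaton 121.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One cell of a user list: a user number and a pointer to the next cell."""
    user: int
    next: Optional["Node"] = None


def unlink_user(head: Optional[Node], user: int) -> Optional[Node]:
    """Return the list head after bypassing every cell that holds `user`."""
    while head is not None and head.user == user:
        head = head.next
    node = head
    while node is not None and node.next is not None:
        if node.next.user == user:
            node.next = node.next.next   # change the pointer to skip the cell
        else:
            node = node.next
    return head


def to_users(head: Optional[Node]) -> list:
    """Collect the user numbers of a list, for inspection only."""
    users = []
    while head is not None:
        users.append(head.user)
        head = head.next
    return users


def delete_search_condition(user: int, term_table: dict,
                            automaton: dict, count_table: dict) -> None:
    # Step 3400: loop over the search terms managed for this user.
    # The text does not say whether the management table entry itself is
    # cleared afterwards, so this sketch leaves it untouched.
    for term in term_table.get(user, []):
        # Step 3401: look the term up to reach the head of its user list
        # (a dict lookup here, assumed in place of the automaton search).
        head = automaton.get(term)
        # Step 3402: remove the user by changing pointers.
        automaton[term] = unlink_user(head, user)
    # Step 3301: delete the user's search term count table.
    count_table.pop(user, None)
```

Because only the terms recorded for the user are visited, the cost of a deletion depends on the size of that user's search condition formula, not on the total number of registered terms.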
[0069]
Next, the contents of the search condition expression registration process by the search condition expression registration control program 106b will be described with reference to the PAD diagram of FIG. 35.
The search condition expression registration control program 106b is started by the system control program 105b.
Among the processing contents of this program shown in FIG. 35, the processing contents of the search condition expression analysis program 107, the search term count table creation program 108, and the search automaton creation program 109 in steps 3500 to 3502 are as described in the first embodiment.
In step 3503, the search condition expression registration control program 106 b starts the search term management table creation program 3105 and stores the search terms included in the search condition expression in the search term management table 3106.
An example of the created search term management table 3106 is shown in FIG. 36.
The search term management table 3106 shown in the figure is created from the search terms extracted from the search condition expressions of users 1 to 3, namely “character” and ¬(“recognition” or “learning”), “document” and (“search” or “search”), and, for user 3, ¬(“search” and “learning”).
The above is the embodiment of the document search method of the present invention.
[0070]
The processing procedure of the search condition expression deletion control program 3100 in the present embodiment shown in FIG. 33 will be specifically described below with reference to the drawings.
First, the processing of the user list correction program 3101 in step 3300 of the search condition expression deletion control program 3100 of FIG. 33 will be described.
The user list correction program 3101 is activated by the search condition expression deletion control program 3100. The detailed processing contents of this program are as shown in FIG. 34.
In step 3400, the program repeats step 3401 and step 3402 for the number of search terms stored in the search term management table 3106.
In step 3401, the finite automaton search program 3102 is started, the finite automaton 121 is searched by the search term, and a pointer to the user list 122 is obtained.
In step 3402, the user list partial deletion program 3103 is activated to delete the user list 122 corresponding to the user number designated to delete the search condition formula.
[0071]
A processing example of the user list correction program 3101 is shown in FIG. 37.
This figure shows an example in which the search condition formula of user number 2 including “character”, “recognition”, and “learning” in the search condition formula is deleted.
The search term management table 3106 stores “character”, “recognition”, and “learning”, and these search terms are repeatedly processed.
In the repetitive processing, in step 3401, the finite automaton 121 is searched by “character”, “recognition”, and “learning”, and a pointer to the user list 122 is obtained.
In step 3402, the user list 122 is traced and the user list 122 related to the user number 2 is deleted.
In this figure, the user number “2” in the user list 122 is deleted.
[0072]
Next, processing of the search term count table correction program 3104 in step 3301 of the search condition expression deletion control program 3100 of FIG. 33 will be described.
The search term count table correction program 3104 is started next to the user list correction program 3101 by the search condition expression deletion control program 3100.
In this program, the search term count table 120 corresponding to the user number designated to delete the search condition formula is deleted.
FIG. 38 shows a processing example of this program.
As shown in the figure, the search term count table corresponding to the user number 2 is deleted.
The above is the detailed procedure for deleting the search condition formula in the search condition formula deletion control program 3100 in the present embodiment.
[0073]
The processing procedure of the search condition expression registration control program 106b in this embodiment shown in FIG. 35 will be specifically described below.
Among the processing contents of this program shown in FIG. 35, the processing contents of the search condition expression analysis program 107, the search term count table creation program 108, and the search automaton creation program 109 in steps 3500 to 3502 are as described in the first embodiment.
The detailed processing contents of the search term management table creation program 3105 in step 3503 will be described below.
The search term management table creation program 3105 is started next to the search automaton creation program 109 by the search condition expression registration control program 106b.
In this program, the search terms included in the search condition formula obtained as an analysis result by the search condition formula analysis program 107 are stored in the search term management table 3106.
[0074]
A processing example of this program is shown in FIG. 39.
This figure shows an example in which the user of user number 2 designates the search condition expression “structure” and “recognition”, that is, a request for documents that contain both “structure” and “recognition”.
When this search condition expression is analyzed by the search condition expression analysis program 107, two search terms "structure" and "recognition" are obtained.
These search terms are stored in the search term management table 3106 in a form corresponding to the user number. In this figure, “structure” and “recognition” are stored at a location corresponding to the user number 2.
The above is the detailed procedure for registering the search condition formula in the search condition formula registration control program 106b in this embodiment.
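The creation of the search term management table in step 3503 can be sketched as follows. This is a minimal Python illustration under assumptions: the quoted-term syntax and the regular expression are a stand-in for the search condition formula analysis program 107, and the function names and the dictionary layout are hypothetical.

```python
import re


def extract_search_terms(expression: str) -> list:
    """Return the quoted search terms of an expression, in order, without duplicates."""
    terms = []
    for term in re.findall(r'"([^"]+)"', expression):
        if term not in terms:
            terms.append(term)
    return terms


def register_terms(term_table: dict, user: int, expression: str) -> None:
    """Store the terms of `expression` at the location for `user`."""
    term_table[user] = extract_search_terms(expression)
```

For the example above, `register_terms(table, 2, '"structure" and "recognition"')` stores the two terms at the location for user number 2, which is the information the deletion process later needs in order to find the user lists to correct.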
[0075]
As described above, according to the present invention, the search terms included in the search condition formula designated by each user are managed for each user. When a user instructs deletion of a search condition formula, the finite automaton is searched using the managed search terms and the pointers of the user list are changed, so that the previously registered information can easily be deleted from the user list.
As a result, in a document search / delivery system that obtains texts from a plurality of information sources, determines by a single scan of each text whether the search condition formulas registered in advance by a plurality of users are satisfied, and immediately distributes the text to the users whose conditions are satisfied, the search condition formula can be changed at any time in response to a user's request.
In this embodiment, the case where the search condition expression deletion instruction and the registration instruction are sent separately has been described. However, it will be clear that when the two are sent together, that is, when an update instruction is sent, it can be handled by performing the deletion process and the registration process in succession.
Further, although this embodiment is obtained by adding a search condition expression deletion process to the first embodiment, it will be apparent that the present embodiment can also be applied to the second embodiment.
[0076]
【Effect of the Invention】
According to the present invention, even when the number of users, that is, the number of search condition formulas, is large, whether or not each of the search condition formulas is satisfied can be determined by scanning the text only once, so the text search for all users is completed in that single scan.
As a result, even when the number of users increases, it is possible to provide a document search and delivery system that can perform high-speed real-time text search and distribution that does not depend on the number of users.
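The single-scan property can be illustrated with a small pattern matching machine of the kind described in the prior art section (goto transitions, fail transitions, and an output table, as in the Aho-Corasick algorithm). The Python sketch below is an illustration under assumptions, not the embodiment: the function names are hypothetical, and each user's search condition formula is represented as a predicate over the set of matched terms instead of the search term count table.

```python
from collections import deque


def build_automaton(terms):
    """Build goto, fail, and output tables for the given search terms."""
    goto, fail, out = [{}], [0], [set()]
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(term)
    queue = deque(goto[0].values())       # states at depth 1 fail to state 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]    # inherit terms that end at the fail state
    return goto, fail, out


def scan(text, automaton):
    """Scan the text once and return every search term that occurs in it."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


def matching_users(text, conditions, automaton):
    """Evaluate every user's condition from the result of one scan."""
    found = scan(text, automaton)         # the only pass over the text
    return [user for user, cond in conditions.items() if cond(found)]
```

The text is traversed once regardless of how many users are registered; only the evaluation of the per-user conditions grows with the number of users.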
【Brief Description of the Drawings】
FIG. 1 is a diagram showing a configuration of a first embodiment of the present invention.
FIG. 2 is a diagram showing a configuration of a finite automaton in prior art 1.
FIG. 3 is a diagram showing a configuration of a fail destination state number table in prior art 1.
FIG. 4 is a diagram showing a configuration of an output table in prior art 1.
FIG. 5 is a diagram showing an example of a finite automaton for Japanese text in prior art 2.
FIG. 6 is a diagram showing a schematic processing content of the present invention.
FIG. 7 is a PAD showing the processing contents of the system control program 105.
FIG. 8 is a PAD showing the processing contents of the search condition expression registration control program 106.
FIG. 9 is a diagram for explaining a search condition expression analysis method.
FIG. 10 is a diagram for explaining a method for creating the search term count table 120.
FIG. 11 is a PAD showing the processing contents of the text search control program 112.
FIG. 12 is a PAD showing the processing contents of the search automaton creation program 109.
FIG. 13 is a diagram for explaining a method of creating the finite automaton 121 and the user list 122.
FIG. 14 is a PAD showing the processing contents of the text search program 114.
FIG. 15 is a diagram for explaining the user list 122 creation process.
FIG. 16 is a diagram for explaining text scanning processing.
FIG. 17 is a diagram for explaining a search term counting process in which a matching partial character string appears in text.
FIG. 18 is a diagram for explaining a search term counting process in which a matching partial character string does not appear in text.
FIG. 19 is a diagram for explaining search condition expression validation checking processing.
FIG. 20 is a diagram for explaining text shaping processing.
FIG. 21 is a diagram showing a configuration of a second embodiment of the present invention.
FIG. 22 is a PAD showing the processing contents of the system control program 105a.
FIG. 23 is a PAD showing the processing contents of the distribution condition registration control program 2100.
FIG. 24 is a diagram showing a configuration of the distribution management table 2108.
FIG. 25 is a PAD showing the processing contents of the text search control program 112a.
FIG. 26 is a PAD showing the processing contents of the text distribution control program 2104.
FIG. 27 is a PAD showing the processing contents of the text distribution program 2105.
FIG. 28 is a diagram for explaining distribution condition registration processing.
FIG. 29 is a diagram for explaining distribution information storage processing.
FIG. 30 is a diagram for explaining distribution condition check processing and distribution information correction processing.
FIG. 31 is a diagram showing a configuration of a third embodiment of the present invention.
FIG. 32 is a PAD showing the processing contents of the system control program 105b.
FIG. 33 is a PAD showing the processing contents of the search condition expression deletion control program 3100.
FIG. 34 is a PAD showing the processing contents of the user list correction program 3101.
FIG. 35 is a PAD showing the processing contents of the search condition expression registration control program 106b.
FIG. 36 is a diagram showing the structure of the search term management table 3106.
FIG. 37 is a diagram for explaining user list correction processing.
FIG. 38 is a diagram for explaining search term count table correction processing.
FIG. 39 is a diagram for explaining search term management table creation processing.
【Explanation of Symbols】
100 Display
101 Keyboard
102 CPU
103 Bus
104, 104a, 104b Main memory
105, 105a, 105b System control program
106, 106b Search condition expression registration control program
107 Search condition formula analysis program
108 Search term count table creation program
109 Search automaton creation program
110 Finite automaton creation program
111 User list creation program
112, 112a Text search control program
113 Text acquisition program
114 Text search program
115 Text scanning program
116 Search term count program
117 Search condition formula check program
118 Text shaping program
119 E-mail program
120 Search term count table
121 Finite automaton
122 User list
123 Work area
124 LAN
125 News distributor
126 User of document retrieval and delivery system
2100 Distribution condition registration control program
2101 Distribution condition analysis program
2102 Distribution condition registration program
2103 Distribution information storage program
2104 Text distribution control program
2105 Text distribution program
2106 Distribution condition check program
2107 Distribution information correction program
2108 Distribution management table
3100 Search condition expression deletion control program
3101 User list correction program
3102 Finite automaton search program
3103 User list part deletion program
3104 Search term count table correction program
3105 Search term management table creation program
3106 Search term management table

Claims

In a document retrieval / delivery device for delivering a document retrieved from text data of document information obtained from one or more information sources,
Search condition expression registration means for registering a search condition expression specified by one or more users including one or more search terms;
Text search distribution means for determining success or failure of the search condition formula for the text when the text is obtained, and distributing the text to a user who satisfies the search condition formula;
A search condition expression deleting means for deleting the search condition expression when an instruction to delete the search condition expression is given;
The search condition expression registration means includes:
Search condition expression analyzing means for extracting all search terms from the search condition expression;
A search term count table creating means for creating a search term count table that stores management information including the number of all search terms extracted from the user and the search condition formula of the user for each user;
Multiple character string matching table generating means for generating a multiple character string matching table to be referred to when all the search terms extracted from the search condition expression are verified by a single scan of text;
User list generation means for generating a user list in which user identifiers of users who have specified the search condition expression are connected as a list in correspondence with each search term extracted from the search condition expression;
A search term management table creating means for creating a search term management table storing the search terms extracted by the search condition expression analyzing means;
The search condition expression deleting means is:
A search condition expression management table deleting means for deleting information related to the search condition expression instructed to be deleted from the search term count table and the user list;
The search condition expression management table deleting means includes:
User list deletion means for deleting from the user list, with reference to the search term management table, the user identifier of the user who specified the search condition expression, for each search term included in the search condition expression instructed to be deleted; and
Search term count table deleting means for deleting, from the search term count table, the user management information related to the search condition expression instructed to be deleted; the document search / delivery device being characterized by comprising the above means.

The document search / delivery apparatus according to claim 1,
Wherein, when a search condition expression update instruction including a search condition expression deletion instruction and a registration instruction is received, the search condition expression deletion process by the search condition expression deletion means and the search condition expression registration process by the search condition expression registration means are performed in succession.

The document search / delivery device according to claim 1 or 2,
Further comprising distribution condition registration means for registering a distribution condition entered by a user, the distribution condition being a condition for distributing searched text after a certain period of time, a condition for distributing text when a predetermined number of searched texts have accumulated, or a condition for batch distribution of searched texts at a predetermined time,
The text search distribution means includes:
Text scanning means for determining success or failure of the search condition formula for the text by scanning the text only once;
Text distribution control means for distributing the text to the user whose search condition expression was found satisfied by the text scanning means, when the distribution condition registered by the distribution condition registration means is satisfied; the document search / delivery device being characterized by comprising the above means.