JP2002269136A

JP2002269136A - Document retrieval system and program

Info

Publication number: JP2002269136A
Application number: JP2001071472A
Authority: JP
Inventors: Junichi Yamagata; 純一山形; Kazushige Asada; 一繁浅田; Hiroshi Takegawa; 弘志竹川; Tetsuya Ikeda; 哲也池田; Takuya Hiraoka; 卓也平岡; Katsumi Kanezaki; 克己金崎
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-03-14
Filing date: 2001-03-14
Publication date: 2002-09-20

Abstract

PROBLEM TO BE SOLVED: In a search that absorbs variation among variant-notation character strings, to enable the side that outputs the search results to highlight (in reverse video) the pre-normalization string by pattern matching. SOLUTION: A document registration request processing means 4 applies variant-notation normalization to a document whose registration in a database 3 is requested through a database operation request input means 2, creates an index, and retains in the created index the character string as it was before normalization. When documents are retrieved from the database 3, a document search request processing means 5 normalizes the search character string, performs the search, and outputs each hit document together with the pre-normalization form, held in the index, of the character string that was hit.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document search system and a program.

[0002]

[Description of the Related Art] Japanese Unexamined Patent Publication No. H10-334118 discloses extracting text from documents to create an index, and using a regular-expression dictionary and a word index to reduce the index size and absorb variation among variant notations.

[0003] Japanese Unexamined Patent Publication No. H08-263508 discloses generating multiple variant-notation strings from a search character string and performing, without an index, a full-text search that absorbs notational variation.

[0004]

[Problem to Be Solved by the Invention] For example, when building a full-text search index over multiple documents in a format that can be displayed and edited only by a specific application, the procedure is as follows: 1. Extract the full text from each target document. 2. Index the extracted full text.

[0005] When a search condition is given, the index is looked up using the search character string. With this method, however, variant-notation strings cannot be found. For example, when searching for 「インターフォン」 (interphone), 「インタフォーン」 and 「インタフォン」 have the same meaning but a different notation, so they are not matched.
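
The limitation described in [0005] can be illustrated with a minimal sketch. The whitespace tokenizer, the function name `build_exact_index`, and the three sample documents are assumptions made for illustration; the patent does not specify them.

```python
from collections import defaultdict


def build_exact_index(docs):
    """Map each whitespace-delimited token to the IDs of documents containing it.

    No normalization is applied, so each spelling gets its own index entry.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return index


# Hypothetical sample documents using three spellings of "interphone".
docs = {
    1: "インターフォン を 設置",
    2: "インタフォン の 修理",
    3: "インタフォーン 故障",
}
index = build_exact_index(docs)

# Only the document with the identical spelling is found.
hits = index.get("インターフォン", set())
```

With an exact-match index of this kind, documents 2 and 3 are missed even though they refer to the same thing.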

[0006] One remedy is to apply variant-notation normalization to the extracted full text, converting variant strings such as those in the example above to a single unified notation, and to index the result. If the search character string is normalized in the same way, a search can be performed without regard to notational variants.

[0007] With this method, however, it is impossible to tell what a hit character string looked like in the original document. Consequently, when an application displays a hit document and tries to highlight the hit string in reverse video, the application must determine the exact position of the hit. To do so, it must normalize the document with the same algorithm as the full-text search system and search it again, which is inefficient.

[0008] An object of the present invention is, in a search that absorbs variation among variant-notation strings, to enable the side that outputs the search results to highlight the pre-normalization string by pattern matching.

[0009] Another object of the present invention is to make it possible, when a search produces a hit, to output as the search result the character string as it was before variant-notation normalization rather than the normalized string.

[0010] A further object of the present invention is to make variant-notation normalization unnecessary on the client side of the system when the client highlights a character string hit by a search.

[0011]

[Means for Solving the Problems] The invention according to claim 1 is a document search system that creates an index of a document whose registration is requested, registers the document in a database, and uses the index to search the plurality of documents in the database based on a given search character string, the system comprising: document registration request processing means that creates the index by applying variant-notation normalization to the registration-requested document and, even after the normalization, retains in the created index the character string as it was before normalization; and document search request processing means that normalizes the search character string, performs the search, and outputs each document hit by the search together with the pre-normalization form of the character string that was hit.

[0012] Therefore, in a search that absorbs variation among variant-notation strings, the side that outputs the search results can highlight the pre-normalization string by pattern matching.

[0013] The invention according to claim 2 is the document search system according to claim 1, wherein the document registration request processing means extracts text from the registration-requested document and creates the index from the full text after variant-notation normalization.

[0014] Therefore, text can be extracted from the registration-requested document, and the index can be created from the normalized full text.

[0015] The invention according to claim 3 is the document search system according to claim 1 or 2, wherein, for each document hit by the search, the document search request processing means outputs the pre-normalization character string held in the index in place of the normalized character string.

[0016] Therefore, when a search produces a hit, the character string as it was before variant-notation normalization, rather than the normalized string, can be output as the search result.

[0017] The invention according to claim 4 is the document search system according to any one of claims 1 to 3, wherein the system is built as a client-server system, and the variant-notation normalization by the document registration request processing means and the document search request processing means is performed on the server, so that the client highlights the character string hit by the search on its display without performing variant-notation normalization.

[0018] Therefore, the client side does not need to perform variant-notation normalization when highlighting a character string hit by the search.

[0019] The invention according to claim 5 is a program that causes a computer to execute processing that creates an index of a document whose registration is requested, registers the document in a database, and uses the index to search the plurality of documents in the database based on a given search character string, the program causing the computer to execute: document registration request processing that creates the index by applying variant-notation normalization to the registration-requested document and, even after the normalization, retains in the created index the character string as it was before normalization; and document search request processing that normalizes the search character string, performs the search, and outputs each document hit by the search together with the pre-normalization form of the character string that was hit.

[0020] Therefore, in a search that absorbs variation among variant-notation strings, the side that outputs the search results can highlight the pre-normalization string by pattern matching.

[0021] The invention according to claim 6 is the program according to claim 5, wherein the document registration request processing extracts text from the registration-requested document and creates the index from the full text after variant-notation normalization.

[0022] Therefore, text can be extracted from the registration-requested document, and the index can be created from the normalized full text.

[0023] The invention according to claim 7 is the program according to claim 5 or 6, wherein, for each document hit by the search, the document search request processing outputs the pre-normalization character string held in the index in place of the normalized character string.

[0024] Therefore, when a search produces a hit, the character string as it was before variant-notation normalization, rather than the normalized string, can be output as the search result.

[0025]

[Embodiments of the Invention] An embodiment of the present invention will now be described.

[0026] FIG. 1 is a functional block diagram of the document search system 1 of the present invention. As shown in FIG. 1, the database operation request input means 2 consists of, for example, a client terminal for entering operation requests directed at the database 3 provided on a server. The user enters database operation requests through the database operation request input means 2 via a graphical or character-based interface. The result of processing a request is sent back to the database operation request input means 2 and conveyed to the user, for example by showing it on a display. When handling documents whose content can be displayed and edited only by a specific application, that application must be installed on the user's client terminal, which serves as the database operation request input means 2.

[0027] The database 3 registers each document whose registration is requested through the database operation request input means 2, and also stores the index created from that document.

[0028] An example of the processing performed by the document registration request processing means 4 is described with reference to the flowchart in FIG. 2. As shown in FIG. 2, on receiving a document registration request and the document to be registered (step S1), the means inspects the document (step S2). If the full text can be extracted (Y in step S3), processing continues; otherwise it ends. When extraction is possible (Y in step S3), the document is saved in the database and assigned a unique number (hereinafter, document ID), and its full text is extracted (step S4). Variant-notation normalization is then applied to the extracted full text to remove notational variation (step S5), and an index is created from the normalized full text (step S6).
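
Steps S1 to S6 can be sketched roughly as follows. The class name `DocumentStore`, the pluggable `extract_text` and `normalize` callables, and the whitespace tokenization are hypothetical. The sketch also folds the extractability check (S2, S3) and the extraction itself (S4) into one call, which is a simplification of the flowchart.

```python
from itertools import count


class DocumentStore:
    """Toy registration pipeline following steps S1-S6 of the flowchart."""

    def __init__(self, extract_text, normalize):
        self.extract_text = extract_text  # returns str, or None if no text can be extracted
        self.normalize = normalize        # variant-notation normalizer (assumed)
        self.documents = {}               # document ID -> stored document
        self.index = {}                   # normalized token -> set of document IDs
        self._ids = count(1)

    def register(self, document):
        # S1: receive the request and the document.
        # S2/S3: inspect the document; stop if the full text cannot be extracted.
        text = self.extract_text(document)
        if text is None:
            return None
        # S4: store the document under a unique document ID.
        doc_id = next(self._ids)
        self.documents[doc_id] = document
        # S5: normalize the extracted full text.
        normalized = self.normalize(text)
        # S6: index the normalized text.
        for token in normalized.split():
            self.index.setdefault(token, set()).add(doc_id)
        return doc_id


store = DocumentStore(extract_text=lambda d: d.get("text"), normalize=str.lower)
first_id = store.register({"text": "ABC abc"})
rejected = store.register({"binary": b"\x00"})
```

Here case folding stands in for the full normalizer, so `ABC` and `abc` collapse into a single index entry, and the document without extractable text is not stored.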

[0029] The "variant-notation normalization" performed in step S5 is the following processing. Some character strings differ in notation but mean the same thing. Examples are the presence or absence of a long-vowel mark, as in 「コンピューター」 and 「コンピュータ」 (computer); the difference between uppercase and lowercase, as in 「ＡＢＣ」 and 「ａｂｃ」; whether a small kana is used, as in 「アクティブ」 and 「アクテイブ」 (active); and different ways of writing the same pronunciation, as in 「ベランダ」 and 「ヴェランダ」 (veranda). Converting such strings to a single canonical notation is called variant-notation normalization.
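
A minimal rule-based normalizer covering the four kinds of variation listed in [0029] might look like the sketch below. The rule tables `V_SOUND` and `SMALL_TO_LARGE`, the choice of canonical forms, and the use of Unicode NFKC folding are all assumptions; the patent does not state which rules or canonical forms its system uses.

```python
import unicodedata

# Hypothetical rule tables; a real system would likely use a larger dictionary.
V_SOUND = [("ヴァ", "バ"), ("ヴィ", "ビ"), ("ヴェ", "ベ"), ("ヴォ", "ボ"), ("ヴ", "ブ")]
SMALL_TO_LARGE = str.maketrans("ァィゥェォ", "アイウエオ")


def normalize(text: str) -> str:
    """Reduce notational variants to one canonical (internal) form."""
    # Fold full-width letters to ASCII, then fold case: ＡＢＣ -> abc.
    text = unicodedata.normalize("NFKC", text).lower()
    # Unify "v" sounds: ヴェランダ -> ベランダ.
    for variant, canonical in V_SOUND:
        text = text.replace(variant, canonical)
    # Unify small and large vowel kana: アクティブ -> アクテイブ.
    text = text.translate(SMALL_TO_LARGE)
    # Drop the long-vowel mark: コンピューター -> コンピュタ.
    return text.replace("ー", "")
```

The canonical form is only an internal key and need not be a natural spelling. Under these rules the three spellings of "interphone" from [0005] all reduce to the same key.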

[0030] The document registration request processing means 4 retains the pre-normalization character string in the index even after variant-notation normalization. To do so, it records the pre-normalization string alongside each normalized portion of the full text. For example, if 「ヘキサ」 is normalized to 「ヘクサ」, the full text after normalization becomes 「……ヘ“キ，ク”サ……」. Here 「“」 acts as an escape character and must be treated specially within the full text. If 「”」 was used in the full text before normalization, it is distinguished by writing it twice in succession, as in 「“”」.
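
The inline annotation in [0030] can be sketched as an encoder and a parser. This is one possible reading of the format: it uses a single ASCII quotation mark as the escape character and an ASCII comma as the separator, and it assumes that rewritten segments contain neither character. The function names `annotate` and `parse` are hypothetical.

```python
ESC = '"'  # escape/delimiter character (assumed: one symmetric quotation mark)
SEP = ","  # separates the original string from its normalized form


def annotate(segments):
    """Encode (original, normalized) segment pairs as annotated full text."""
    out = []
    for original, normalized in segments:
        if original == normalized:
            # Unchanged text: double any literal escape character.
            out.append(original.replace(ESC, ESC + ESC))
        else:
            out.append(f"{ESC}{original}{SEP}{normalized}{ESC}")
    return "".join(out)


def parse(annotated):
    """Return (normalized_text, [(offset_in_normalized_text, original, normalized)])."""
    parts, variants, pos, i = [], [], 0, 0
    while i < len(annotated):
        if annotated[i] != ESC:
            parts.append(annotated[i])
            pos += 1
            i += 1
        elif annotated.startswith(ESC + ESC, i):
            parts.append(ESC)  # doubled escape character = one literal quote
            pos += 1
            i += 2
        else:
            end = annotated.index(ESC, i + 1)
            original, normalized = annotated[i + 1:end].split(SEP, 1)
            variants.append((pos, original, normalized))
            parts.append(normalized)
            pos += len(normalized)
            i = end + 1
    return "".join(parts), variants


encoded = annotate([("ヘ", "ヘ"), ("キ", "ク"), ("サ", "サ")])
decoded = parse(encoded)
```

For the patent's example, `encoded` is `ヘ"キ,ク"サ`, and parsing it recovers both the normalized text `ヘクサ` and the fact that `ク` at offset 1 was originally `キ`.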

[0031] The index creation in step S6 is performed as follows. An index is created from the full text normalized by the means described above. The index can be likened to a telephone directory: if a character string is the person's name, then the document ID, the number of occurrences, the positions of occurrence, and (if the string was normalized) the pre-normalization string are the telephone number. Just as finding a name in a telephone directory yields the telephone number, finding the search string in the index yields the corresponding document ID, number of occurrences, positions of occurrence, and (if the string was normalized) the pre-normalization string. Because the pre-normalization string is recorded, it can be retrieved even in a full-text search that uses normalized strings.
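
The "telephone directory" entry of [0031] could be modeled as below. The `Posting` dataclass, the token-level positions, and the toy normalizer are assumptions made for illustration, not the patent's data layout.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Posting:
    """What the index stores for one normalized string in one document."""
    doc_id: int
    positions: list = field(default_factory=list)  # token positions of each occurrence
    originals: list = field(default_factory=list)  # pre-normalization spellings, if any

    @property
    def count(self):
        """Number of occurrences in the document."""
        return len(self.positions)


def add_document(index, doc_id, tokens, normalize):
    """Index tokens under their normalized form, remembering original spellings."""
    for position, token in enumerate(tokens):
        key = normalize(token)
        posting = index[key].setdefault(doc_id, Posting(doc_id))
        posting.positions.append(position)
        if token != key and token not in posting.originals:
            posting.originals.append(token)


def toy_normalize(token):
    # Stand-in normalizer: drop long-vowel marks and fold case.
    return token.replace("ー", "").lower()


index = defaultdict(dict)  # normalized string -> {doc_id: Posting}
add_document(index, 1, ["コンピューター", "と", "コンピュータ"], toy_normalize)
posting = index["コンピュタ"][1]
```

Looking up the normalized key yields the document ID, the occurrence count, the positions, and every spelling that actually appeared in the document.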

[0032] An example of the processing performed by the document search request processing means 5 is described with reference to the flowchart in FIG. 3. As shown in FIG. 3, on receiving a search character string together with a document search request (step S11), the means applies variant-notation normalization to the search string to absorb variation (step S12). It then performs a fast full-text search using the index created at document registration (step S13). On a hit, it returns the document ID of each document containing the string and, where one exists, the pre-normalization string; on a miss, it reports that nothing was found (step S14). The application then pattern-matches with the returned string to highlight the hit in reverse video (step S15).
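
Steps S11 to S15 can be sketched as a server-side `search` and a client-side `highlight`. Both function names, the index layout (normalized string mapped to document IDs and their original spellings), and the bracket markers standing in for reverse video are hypothetical.

```python
import re


def search(index, query, normalize):
    """S12-S14: normalize the query, look it up, return (doc_id, surface forms)."""
    key = normalize(query)
    postings = index.get(key)
    if not postings:
        return None  # S14: report a miss
    # Fall back to the key itself when no pre-normalization spelling was recorded.
    return [(doc_id, originals or [key]) for doc_id, originals in sorted(postings.items())]


def highlight(text, surface_forms, open_mark="[", close_mark="]"):
    """S15: client-side plain pattern match; no normalization is needed here."""
    ordered = sorted(surface_forms, key=len, reverse=True)  # prefer longer matches
    pattern = "|".join(re.escape(form) for form in ordered)
    return re.sub(pattern, lambda m: open_mark + m.group(0) + close_mark, text)


def toy_normalize(text):
    # Stand-in normalizer: drop long-vowel marks.
    return text.replace("ー", "")


# Hypothetical index: document 7 contains the spelling インターフォン.
index = {"インタフォン": {7: ["インターフォン"]}}
results = search(index, "インタフォーン", toy_normalize)
marked = highlight("玄関のインターフォンが故障", results[0][1])
```

The query uses a different spelling from the document, yet the client receives the document's own spelling and can mark it with an ordinary string match.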

[0033] FIG. 4 is a block diagram schematically showing the hardware configuration of the document search system 1. FIG. 4 shows the configuration of the server, in which a CPU 12 that controls each unit, a ROM 13 storing the BIOS and the like, and a RAM 14 serving as the work area of the CPU 12 are connected by a bus 15.

[0034] Connected to the bus 15 are a hard disk 16; an input device 17 such as a keyboard or mouse; a display device 18 such as a CRT or LCD; a storage medium reading device 20 that reads data from a storage medium 19 such as a CD, DVD, or FD; and a communication control device 22 that connects to a network 21.

[0035] The hard disk 16 stores various programs, including a document management program that implements the program of the present invention. These programs are installed on the hard disk 16 by reading them from the storage medium 19 with the storage medium reading device 20, or by downloading them from the Internet or the like via the network 21. This installation makes the document search system 1 operable. A program such as the document management program may form part of the functions of a particular application software package, and it may run on a given OS.

[0036] Through programs such as the document management program, the database 3 is built on the server, and the functions of the document registration request processing means 4 and the document search request processing means 5 are realized on the server. The network 21 is a LAN and connects to the client (not shown) that implements the database operation request input means 2.

[0037]

[Effects of the Invention] According to the invention of claim 1, in a search that absorbs variation among variant-notation strings, the side that outputs the search results can highlight the pre-normalization string by pattern matching.

[0038] According to the invention of claim 2, in the document search system according to claim 1, text can be extracted from the registration-requested document, and the index can be created from the normalized full text.

[0039] According to the invention of claim 3, in the document search system according to claim 1 or 2, when a search produces a hit, the character string as it was before variant-notation normalization, rather than the normalized string, can be output as the search result.

[0040] According to the invention of claim 4, in the document search system according to any one of claims 1 to 3, the client side does not need to perform variant-notation normalization when highlighting a character string hit by the search.

[0041] According to the invention of claim 5, in a search that absorbs variation among variant-notation strings, the side that outputs the search results can highlight the pre-normalization string by pattern matching.

[0042] According to the invention of claim 6, in the program according to claim 5, text can be extracted from the registration-requested document, and the index can be created from the normalized full text.

[0043] According to the invention of claim 7, in the program according to claim 5 or 6, when a search produces a hit, the character string as it was before variant-notation normalization, rather than the normalized string, can be output as the search result.

[Brief description of the drawings]

FIG. 1 is a functional block diagram of a document search system according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating the processing performed by the document registration request processing means of the document search system.

FIG. 3 is a flowchart illustrating the processing performed by the document search request processing means of the document search system.

FIG. 4 is a block diagram showing the electrical connections of the document search system.

[Explanation of symbols]

1 Document search system
3 Database
4 Document registration request processing means
5 Document search request processing means

Continuation of front page
(72) Inventor: Hiroshi Takegawa, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Tetsuya Ikeda, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Takuya Hiraoka, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Katsumi Kanezaki, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
F-terms (reference): 5B009 QA12 QA15 RB32 VA02; 5B075 ND03 NK02 NK35 NK49 PQ02 PQ22

Claims

[Claims]

1. A document search system that creates an index of a document whose registration is requested, registers the document in a database, and uses the index to search a plurality of documents in the database based on a given search character string, the system comprising: document registration request processing means that creates the index by applying variant-notation normalization to the registration-requested document and, even after the normalization, retains in the created index the character string as it was before normalization; and document search request processing means that normalizes the search character string, performs the search, and outputs each document hit by the search together with the pre-normalization form of the character string that was hit.

2. The document search system according to claim 1, wherein the document registration request processing means extracts text from the registration-requested document and creates the index from the full text after variant-notation normalization.

3. The document search system according to claim 1 or 2, wherein, for each document hit by the search, the document search request processing means outputs the pre-normalization character string held in the index in place of the normalized character string.

4. The document search system according to any one of claims 1 to 3, wherein the system is built as a client-server system, and the variant-notation normalization by the document registration request processing means and the document search request processing means is performed on the server, so that the client highlights the character string hit by the search on a display without performing variant-notation normalization.

5. A program that causes a computer to execute processing that creates an index of a document whose registration is requested, registers the document in a database, and uses the index to search a plurality of documents in the database based on a given search character string, the program causing the computer to execute: document registration request processing that creates the index by applying variant-notation normalization to the registration-requested document and, even after the normalization, retains in the created index the character string as it was before normalization; and document search request processing that normalizes the search character string, performs the search, and outputs each document hit by the search together with the pre-normalization form of the character string that was hit.

6. The program according to claim 5, wherein the document registration request processing extracts text from the registration-requested document and creates the index from the full text after variant-notation normalization.

7. The program according to claim 5 or 6, wherein, for each document hit by the search, the document search request processing outputs the pre-normalization character string held in the index in place of the normalized character string.