JPH07319892A

JPH07319892A - Character string collation device

Info

Publication number: JPH07319892A
Application number: JP6136572A
Authority: JP
Inventors: Akio Yamashita; 明男山下; Hiroshi Yamaguchi; 浩山口; Makoto Ando; 誠安藤; Kazuo Aihara; 一雄相原; Tatsuomi Kita; 辰臣喜多
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1994-05-26
Filing date: 1994-05-26
Publication date: 1995-12-08

Abstract

PURPOSE: To quickly and appropriately search for a character string in which the user has written an intended word in any notation, including its similar (variant) notations. CONSTITUTION: The device comprises a text storage means 5 that holds a text, a character string storage means 2 that holds the character string to be searched for, a character string transformation means 8 that changes the notation of character strings in the text and of the search string according to the same rule 9, and a collation means 6 that checks whether the transformed search string is contained in the transformed text. The transformation means 8 applies the same rule to the search string held in the character string storage means 2 and to the text held in the text storage means 5, bringing both into the same notation. The collation means 6 then collates the transformed search string against the text to determine whether the search string is contained in it.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character string collation device that checks whether a character string to be searched for is contained in a target text. In particular, it relates to a character string collation device that eliminates the effect of variation (fluctuation) in how character strings are written.

[0002]

[Prior Art] Functions and tools that scan a text file for a character string specified by the user are widely used, not only in information retrieval systems but also in editors and similar systems. Such conventional collation devices search the text file for character strings that exactly match the string the user specified. In Japanese, however, katakana spellings of the same word are not uniform, so several similar notations coexist. Examples include 「ウィンドウ」 and 「ウインドウ」 (both "window"), 「インデックス」 and 「インデクス」 (both "index"), and 「インターフェース」 and 「インタフェイス」 (both "interface"). The problem is not limited to katakana. Okurigana (the kana endings attached to kanji stems) are also used inconsistently, as in 「読み取り装置」, 「読取り装置」 and 「読取装置」 (all "reading device"). Consequently, a search for strings that exactly match the user's string misses strings written in similar notations, and the word the user intends cannot be found in the text file.

[0003] The following techniques have been known as ways of alleviating this problem. The first is the information retrieval method (egrep) described in "The UNIX Programming Environment" (Brian W. Kernighan and Rob Pike; ASCII Publishing, 1985). It turns the search string into a pattern that covers similar notations and displays every line of text containing a string that matches the pattern.

To cover both 「ウインドウ」 and 「ウィンドウ」, for example, the alternation operator "|" is used to form the pattern 'ウインドウ|ウィンドウ' or 'ウ(イ|ィ)ンドウ'. To cover 「読み取り」, 「読取り」 and 「読取」, the operator "?" is used to form the pattern '読み?取り?'. This operator means zero or one occurrence of the preceding character. With this method, collation can therefore include every similar notation the user anticipates.
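The two egrep patterns above can be tried with Python's `re` module, which stands in here for egrep's extended regular expressions. The syntax of these particular patterns is the same in both. The sample lines are hypothetical. The third sample line shows the weakness discussed later: a variant the user did not anticipate, 「ウインドー」, is missed.

```python
import re

# Patterns from paragraph [0003]: "|" is alternation, "?" makes the
# preceding character optional (zero or one occurrence).
WINDOW = re.compile("ウ(イ|ィ)ンドウ")
READER = re.compile("読み?取り?")

def matching_lines(pattern, lines):
    """Return every line containing a match, as egrep would print it."""
    return [line for line in lines if pattern.search(line)]

lines = [
    "ウインドウを開く",
    "ウィンドウを閉じる",
    "ウインドーを移動する",   # variant the pattern does not cover
    "読み取り装置",
    "読取り装置",
    "読取装置",
]

print(matching_lines(WINDOW, lines))
print(matching_lines(READER, lines))
```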

[0004] The search method described in Japanese Patent Laid-Open No. 62-11932 automatically expands the search string into similar notations, even across different scripts such as kana, romaji and kanji. It then displays every line of text containing a string that matches any notation in this group. This method can therefore collate against all of the automatically expanded notations.

The search method described in Japanese Patent Laid-Open No. 63-211023 converts katakana strings in the text into a standard notation in advance, for example by replacing long-vowel marks with the corresponding vowel, and stores the standard notation in a search file. At search time, the input string is converted into the standard notation in the same way and the search file is searched to retrieve the corresponding text. This method can therefore collate katakana words including their similar notations.

A further known method registers strings that have the same meaning but different notations in a synonym dictionary in advance. At search time, the input string is expanded into its similar notations by consulting the dictionary, and the search is performed on all of them.
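The standard-notation approach of Japanese Patent Laid-Open No. 63-211023 can be sketched roughly as follows. This approximates only the long-vowel step described above. It assumes each 「ー」 is replaced by the vowel of the preceding katakana, derived here from the character's Unicode name. The pre-built search file is modelled as a simple list of pairs. The function names are hypothetical, and any other normalizations in the cited publication are not reproduced.

```python
import unicodedata

VOWEL_KANA = {"A": "ア", "I": "イ", "U": "ウ", "E": "エ", "O": "オ"}

def to_standard(s):
    """Replace each long-vowel mark 「ー」 with the vowel of the preceding katakana."""
    out = []
    for ch in s:
        if ch == "ー" and out:
            name = unicodedata.name(out[-1], "")   # e.g. "KATAKANA LETTER SA"
            vowel = None
            if name.startswith("KATAKANA LETTER"):
                vowel = VOWEL_KANA.get(name[-1])
            out.append(vowel or ch)   # keep 「ー」 if no vowel can be derived
        else:
            out.append(ch)
    return "".join(out)

def build_search_file(lines):
    """Convert every text line in advance, as done when the text is registered."""
    return [(to_standard(line), line) for line in lines]

def search(search_file, query):
    """Convert the query the same way and look it up in the search file."""
    key = to_standard(query)
    return [original for standard, original in search_file if key in standard]

search_file = build_search_file(["バイナリサーチ", "階層インデックス"])
print(to_standard("サーチ"))            # サアチ
print(search(search_file, "サアチ"))    # ['バイナリサーチ']
```

The text must be converted before it can be searched, and the converted copy must be kept. This is the preparation and storage cost the document raises against this method in paragraph [0006].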

[0005]

[Problems to Be Solved by the Invention] With the conventional egrep method described above, however, the user must specify the similar notations. The user therefore has to know in advance in which parts of the string the notation varies and which similar notations exist. As a result, the accuracy of collation depends heavily on the user's knowledge.

With the method of Japanese Patent Laid-Open No. 62-11932, the string is automatically expanded into similar notations that even span different scripts. Katakana strings that are never written in kanji, for example, are still converted into kanji notation for collation. The device therefore collates notations the user does not need, which increases processing time and the size of the system.

[0006] With the method of Japanese Patent Laid-Open No. 63-211023, a text cannot be collated unless it was converted into standard notation when it was registered, which forces cumbersome preparation. A search file must also be kept for the standardized notation, which enlarges the system.

With the method that expands the string into similar notations by consulting a synonym dictionary, strings absent from the dictionary cannot be found. An enormous synonym dictionary must therefore be prepared to avoid missed hits, which again enlarges the system.

[0007] The present invention was made in view of the above circumstances. Its object is to solve these problems rationally and to provide a character string collation device that can appropriately search for a character string in which the user's intended word is written in any notation, including its similar notations.

[0008]

[Means for Solving the Problems] The character string collation device of the present invention checks whether a character string to be searched for is contained in a text. It comprises:
- text storage means that holds the text;
- character string storage means that holds the character string to be searched for;
- character string transformation means that changes the notation of character strings in the text and of the search string according to the same rule; and
- collation means that checks whether the transformed search string is contained in the transformed text.
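A minimal sketch of this arrangement, assuming a single Python function serves as the rule. The same `rule` is applied to the search string and to every text line, and collation is then a plain substring test. The deletion rule here (dropping 「ッ」, 「ー」 and 「・」) comes from the examples in paragraph [0013]. Applying it to the whole line, rather than only to katakana runs as in the embodiment, is a simplification. All names are hypothetical.

```python
from typing import Callable, Iterable, List

Rule = Callable[[str], str]

def collate(query: str, lines: Iterable[str], rule: Rule) -> List[str]:
    """Return the lines whose transformed form contains the transformed query."""
    key = rule(query)                                      # transform the search string
    return [line for line in lines if key in rule(line)]   # same rule on the text

# Deletion rule taken from the examples in [0013], applied to the whole string.
DELETE = str.maketrans("", "", "ッー・")

def katakana_rule(s: str) -> str:
    return s.translate(DELETE)

text = ["階層インデックスの構成", "検索はバイナリ・サーチを用いて"]
print(collate("バイナリーサーチ", text, katakana_rule))
print(collate("インデクス", text, katakana_rule))
```

Because the rule is a parameter, the collation step itself never needs to know which notation variants exist. Only the rule does.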

[0009] The present invention is not limited to katakana strings. Depending on the notation-conversion rules, it can also collate strings in other scripts, such as mixed kanji-kana notation or romaji, and in those cases too it can find similar notations appropriately. Various rules may be set for transforming the notation. All that is required is that a rule always transforms a given character string into one fixed notation.
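To illustrate that only the rule changes, here is the same collation function with a different rule for okurigana variants such as 「読み取り装置」, 「読取り装置」 and 「読取装置」 from paragraph [0002]. The rule is purely hypothetical: it deletes 「み」 and 「り」 whenever they directly follow a kanji. It is not taken from the patent and is far too coarse for real text, since it also strips unrelated 「み」 and 「り」 and can cause false matches. It only shows a rule that maps each of these variants to one fixed notation.

```python
def is_kanji(ch):
    # CJK Unified Ideographs block only; a simplification
    return "\u4e00" <= ch <= "\u9fff"

def okurigana_rule(s):
    """Hypothetical rule: drop み/り that directly follow a kanji."""
    out = []
    for k, ch in enumerate(s):
        if ch in "みり" and k > 0 and is_kanji(s[k - 1]):
            continue
        out.append(ch)
    return "".join(out)

def collate(query, lines, rule):
    """Same collation as before: transform both sides, then test containment."""
    key = rule(query)
    return [line for line in lines if key in rule(line)]

text = ["読取装置の構成", "読み取り装置を使う", "印刷装置"]
print(okurigana_rule("読み取り装置"))        # 読取装置
print(collate("読取り装置", text, okurigana_rule))
```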

[0010]

[Operation] In the character string collation device of the present invention, the transformation means applies the same rule to the search string held in the character string storage means and to the text held in the text storage means, bringing both into the same notation. The collation means then collates the transformed search string against the text to determine whether the search string is contained in it.

[0011]

[Embodiment] A character string collation device according to one embodiment of the present invention, applied to searching for katakana strings, is described below with reference to the drawings. As shown in FIG. 1, the device of this embodiment comprises:
- search string input means 1, through which the user enters the character string to be searched for;
- search string storage means 2, which holds the search string and other information;
- text reading means 4, which reads the text lines to be collated from a text file 3;
- text storage means 5, which holds these text lines and other information;
- character string collation means 6, which collates the search string against the text lines;
- display means 7, which displays the collation results;
- transformation means 8, which transforms the notation of the search string and of the text lines; and
- a transformation rule dictionary 9, which records the rules for transforming notations.

[0012] Control codes are entered through the search string input means 1 in addition to the character codes of the search string. In this embodiment, entering the control code for a line feed starts the character string collation means 6. The search string storage means 2 holds the search string and its length (that is, its number of characters). It also holds the search string after its notation has been transformed, together with that string's length.

The text reading means 4 reads the text from the text file 3 one line at a time, as instructed by the collation means 6, and stores it in the text storage means 5. Processing by the device ends when no lines remain to be read from the target text.

As shown in FIG. 5, the text storage means 5 stores the following variables, on which the collation process described below is based:
- line: the content of the text line that was read;
- llen: the number of characters in the text line;
- i: the collation start position in the text line;
- str: the content of a substring of the text line;
- inc: the increment to i.

[0013] The collation means 6 successively collates the search string held in the search string storage means 2 against the content (str) of substrings of the text line held in the text storage means 5. When the strings match, it displays that text line on the display means 7. When the substring being collated (str) or the search string is written in katakana, the collation means 6 first has the transformation means 8 transform that string's notation.

The transformation rule dictionary 9 records the notation-transformation rules shown in FIGS. 2 and 3. The transformation means 8 uses these rules to transform katakana strings by deleting or replacing characters. For example, characters such as 「ッ」 (small tsu), 「ー」 (long-vowel mark) and 「・」 (middle dot) are deleted. As a result, 「インデックス」 becomes 「インデクス」, and both 「バイナリーサーチ」 and 「バイナリ・サーチ」 become 「バイナリサチ」.
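FIGS. 2 and 3 are not reproduced in this text, so the sketch below covers only the three deletions named above. It assumes the dictionary can be modelled as an ordered list of (old, new) pairs, where an empty replacement means deletion. Any replacement rules from the figures would be further pairs in the same list. `RULES` and `transform` are hypothetical names.

```python
# Hypothetical model of the transformation rule dictionary 9: ordered
# (old, new) pairs; an empty "new" means the character is deleted.
RULES = [
    ("ッ", ""),   # small tsu
    ("ー", ""),   # long-vowel mark
    ("・", ""),   # middle dot
]

def transform(s, rules=RULES):
    """Apply every rule to s, in order, wherever its pattern occurs."""
    for old, new in rules:
        s = s.replace(old, new)
    return s

for word in ["インデックス", "バイナリーサーチ", "バイナリ・サーチ"]:
    print(word, "->", transform(word))
```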

[0014] The character string collation device configured as above collates character strings according to the procedure shown in FIG. 4. The following description assumes that the text file 3 holds two lines of text. The first line is 「階層インデックスの構成」 ("structure of the hierarchical index") and the second is 「検索はバイナリ・サーチを用いて」 ("the search uses binary search"). It also assumes that the search string 「バイナリーサーチ」 ("binary search") has been entered through the search string input means 1.

[0015] First, the user enters a search string through the search string input means 1. The string and its length are sent to the search string storage means 2, which holds the string as the variable pat and its length as the variable plen (step S1). Here, the storage means 2 holds 「バイナリーサーチ」 as pat and 8 as plen. A line-feed code is entered through the input means 1 together with the search string, and this starts the collation means 6.

Next, the collation means 6 determines whether pat contains katakana (step S2). If it does not, processing continues from step S4. If it does, the collation means 6 instructs the transformation means 8 to transform pat (step S3). Here, the transformation means 8 applies rule No. 5 of FIG. 2 in the transformation rule dictionary 9 twice to 「バイナリーサーチ」, which yields 「バイナリサチ」. The storage means 2 holds this result as the new pat. plen, however, remains 8 in the storage means 2.
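Steps S1 to S3 can be sketched as follows. The sketch assumes katakana is detected by the Unicode Katakana block (U+30A0 to U+30FF, which includes 「・」 and 「ー」). It models only the deletion rules listed in paragraph [0013], and `prepare_pattern` is a hypothetical name. `plen` is measured before the transformation, which is why it stays 8 in the example.

```python
DELETE = str.maketrans("", "", "ッー・")   # deletions named in [0013]

def is_katakana(ch):
    # Katakana block U+30A0-U+30FF; includes 「・」 and 「ー」
    return "\u30a0" <= ch <= "\u30ff"

def prepare_pattern(pattern):
    """Steps S1-S3: return (pat, plen) as held in the storage means 2."""
    plen = len(pattern)                          # S1: length of the input string
    if any(is_katakana(ch) for ch in pattern):   # S2: contains katakana?
        pattern = pattern.translate(DELETE)      # S3: transform the notation
    return pattern, plen

print(prepare_pattern("バイナリーサーチ"))   # ('バイナリサチ', 8)
print(prepare_pattern("階層"))             # ('階層', 2)
```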

[0016] Next, as instructed by the collation means 6, the text reading means 4 reads the text from the text file 3 one line at a time. The text storage means 5 holds each line as the variable line (step S4). As shown in FIG. 5(a), the first line of the text is now held as line.

The collation means 6 then determines whether line is empty (step S5). If it is empty, processing ends. Otherwise, the collation means sets llen to the number of characters in line, sets the collation start position i to its initial value, and stores both in the text storage means 5 (step S6). Here, as shown in FIG. 5(b), llen is set to 11 and i to 1.

The collation means 6 then determines whether i is less than or equal to llen (step S7). If i exceeds llen, collation has reached the last character of the text line, so processing returns to step S4 for the next line. If i is less than or equal to llen, part of the line has not yet been collated, so processing of that line continues. At this point i is 1 and llen is 11, so processing continues.
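Steps S4 to S7 can be sketched as a loop over a file-like object, with `io.StringIO` standing in for the text file 3. The sketch assumes that step S5 treats an empty result from `readline()` (end of file) as "empty", consistent with paragraph [0012]. It only sets up llen and i for each line; the per-position work of the later steps is omitted. `line_setups` is a hypothetical name.

```python
import io

def line_setups(textfile):
    """Yield (line, llen, i) for each line, following steps S4-S7."""
    while True:
        line = textfile.readline()     # S4: read one line
        if line == "":                 # S5: nothing left to read -> end
            return
        line = line.rstrip("\n")
        llen, i = len(line), 1         # S6: i is 1-based as in FIG. 5
        if i <= llen:                  # S7: fails only for an empty line,
            yield line, llen, i        #     which is skipped (back to S4)

text3 = io.StringIO("階層インデックスの構成\n検索はバイナリ・サーチを用いて\n")
for line, llen, i in line_setups(text3):
    print(line, llen, i)
```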

[0017] Next, to decide whether the notation needs to be transformed, the collation means 6 determines whether the i-th character of line is katakana (step S8). When the character at start position i is katakana, the notation is transformed as described below (steps S10 and S11). At this point i is 1 and the first character of line is 「階」, which is not katakana, so no transformation is performed.

Instead, the collation means 6 sets str to the substring of line that starts at the i-th (first) character and is plen characters long, and sets inc to its initial value (step S9). As shown in FIG. 5(c), str holds the eight-character string 「階層インデックス」 and inc holds 1.

The collation means 6 then determines whether str contains the search string held in pat (step S12). If it does, the display means 7 displays the content of line (step S15). If it does not, the collation start position is advanced. Here 「バイナリサチ」 is not contained in 「階層インデックス」, so the collation means 6 sets i to i + inc (step S13).
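When the start character is not katakana, steps S9, S12 and S13 amount to taking a fixed-length window and advancing by one. The following minimal sketch uses 1-based positions as in the text and reproduces FIG. 5(c) and the next position. `window_step` is a hypothetical name.

```python
def window_step(line, i, plen, pat):
    """Steps S9, S12, S13 for a start character that is not katakana.

    Returns (str_, matched, next_i); positions are 1-based as in FIG. 5.
    """
    str_ = line[i - 1:i - 1 + plen]    # S9: plen characters from position i
    inc = 1                            # S9: initial increment
    matched = pat in str_              # S12: does str contain pat?
    return str_, matched, i + inc      # S13: advance the start position

print(window_step("階層インデックスの構成", 1, 8, "バイナリサチ"))
# ('階層インデックス', False, 2)
print(window_step("階層インデックスの構成", 2, 8, "バイナリサチ"))
# ('層インデックスの', False, 3)
```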

[0018] Next, the collation means 6 compares plen with llen - i + 1, the number of characters from position i to the end of the line. This tells it whether the uncollated characters in the text line have become fewer than the characters in the search string (step S14). If they have, processing returns to step S4 to read the next text line. Otherwise, it returns to step S7 to continue within the current line.

Here plen is 8, llen is 11 and i is 2. That leaves 10 uncollated characters, no fewer than the 8 in the search string, so processing returns to step S7. The collation means 6 confirms that i is less than or equal to llen (step S7) and then determines whether the i-th (second) character of line is katakana (step S8).
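The step S14 test reduces to a single comparison. The sketch below uses the hypothetical function name `go_to_next_line` and checks the values of i that occur in this walkthrough for the first line (plen = 8, llen = 11).

```python
def go_to_next_line(plen, llen, i):
    """Step S14: True when fewer than plen characters remain from position i."""
    remaining = llen - i + 1           # characters from i to the end of the line
    return plen > remaining

for i in (2, 3, 9):
    print(i, go_to_next_line(8, 11, i))
# prints: 2 False, 3 False, 9 True (one pair per line)
```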

[0019] Here the second character is 「層」, so no transformation is performed. The collation means 6 sets str to the eight-character substring 「層インデックスの」 starting at the second character of line, and sets inc to 1 (step S9). It then checks whether str contains the search string held in pat (step S12). 「バイナリサチ」 is not contained in 「層インデックスの」, so the collation means sets i to i + inc = 3 (step S13).

Comparing plen with llen - i + 1 (step S14) gives plen = 8, llen = 11 and i = 3. That leaves 9 uncollated characters, no fewer than the search string's 8, so processing returns once more to step S7.

[0020] The collation means 6 confirms that i is less than or equal to llen (step S7) and then determines whether the i-th (third) character of line is katakana (step S8). The third character is 「イ」, so the notation is transformed. The collation means 6 sets str to the run of consecutive katakana characters beginning at the third character of line, and sets inc to the number of characters in str (step S10). As shown in FIG. 5(d), str holds 「インデックス」 and inc is set to its length, 6.

The collation means 6 then instructs the transformation means 8 to transform the katakana string in str (step S11). As shown in FIG. 5(e), the transformation means 8 applies the rules recorded in the transformation rule dictionary 9 (FIGS. 2 and 3) and turns 「インデックス」 into 「インデクス」.
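Steps S10 and S11 can be sketched as collecting the longest run of consecutive katakana starting at position i and then transforming it. As before, katakana is assumed to mean the Unicode block U+30A0 to U+30FF, and only the deletion rules of paragraph [0013] are modelled. `katakana_run` is a hypothetical name. inc is the length of the run before transformation (6), not after it (5), so the next start position skips the whole original run.

```python
DELETE = str.maketrans("", "", "ッー・")   # deletions named in [0013]

def is_katakana(ch):
    return "\u30a0" <= ch <= "\u30ff"

def katakana_run(line, i):
    """Step S10: return (str, inc) for the katakana run starting at 1-based i."""
    j = i - 1
    while j < len(line) and is_katakana(line[j]):
        j += 1
    run = line[i - 1:j]
    return run, len(run)

run, inc = katakana_run("階層インデックスの構成", 3)
print(run, inc)                 # インデックス 6
print(run.translate(DELETE))    # インデクス (step S11)
```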

[0021] The collation means 6 then checks whether str contains the search string held in pat (step S12). 「バイナリサチ」 is not contained in 「インデクス」, so the collation means sets i to i + inc = 3 + 6 = 9 (step S13). It then compares plen with llen - i + 1 (step S14). With plen = 8, llen = 11 and i = 9, only 3 uncollated characters remain, fewer than the search string's 8. Processing therefore returns to step S4 to read the next text line.
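Putting steps S1 to S15 together, the procedure of FIG. 4 can be sketched as a single function. This is an illustration under several assumptions, not the embodiment itself:
- katakana means the Unicode block U+30A0 to U+30FF;
- only the deletion rules named in paragraph [0013] are modelled;
- reading stops when the lines run out (step S5);
- after a line is displayed (step S15), collation moves on to the next line, which the text does not spell out.

Run on the two example lines, the sketch skips the first line exactly as traced above and reports the second one. In the second line, the katakana run 「バイナリ・サーチ」 becomes 「バイナリサチ」.

```python
DELETE = str.maketrans("", "", "ッー・")   # deletions named in [0013]

def is_katakana(ch):
    # Katakana block U+30A0-U+30FF (includes 「・」 and 「ー」)
    return "\u30a0" <= ch <= "\u30ff"

def transform(s):
    return s.translate(DELETE)

def collate(pattern, lines):
    """Return the lines in which the search string is found (FIG. 4 sketch)."""
    pat, plen = pattern, len(pattern)           # S1: plen is taken before S3
    if any(is_katakana(ch) for ch in pat):      # S2
        pat = transform(pat)                    # S3
    hits = []
    for line in lines:                          # S4/S5: stop when lines run out
        llen, i = len(line), 1                  # S6: 1-based start position
        while i <= llen:                        # S7
            if is_katakana(line[i - 1]):        # S8
                j = i - 1
                while j < llen and is_katakana(line[j]):
                    j += 1
                str_ = line[i - 1:j]            # S10: katakana run from i
                inc = len(str_)
                str_ = transform(str_)          # S11
            else:
                str_ = line[i - 1:i - 1 + plen] # S9: plen characters from i
                inc = 1
            if pat in str_:                     # S12
                hits.append(line)               # S15: display the line
                break                           # assumed: go on to next line
            i += inc                            # S13
            if plen > llen - i + 1:             # S14: too few characters left
                break
    return hits

text3 = ["階層インデックスの構成", "検索はバイナリ・サーチを用いて"]
print(collate("バイナリーサーチ", text3))
# ['検索はバイナリ・サーチを用いて']
```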

[0022] The text reading means 4 then reads the second line of the text, 「検索はバイナリ・サーチを用いて」, from the text file 3, and the text storage means 5 holds it as line (step S4). The collation means 6 determines whether line is empty (step S5). It is not, so, as shown in FIG. 5(f), llen is set to 15 (the number of characters in line) and i to its initial value 1, and both are stored in the text storage means 5 (step S6). The collation means 6 then checks whether i is less than or equal to llen (step S7). Here i is 1 and llen is 15, so processing continues.

[0023] Next, the character string collating means 6 determines whether the i-th (first) character of the variable line is katakana (step S8). Here the first character is 「検」, a kanji, so no notation transformation is performed. The character string collating means 6 then sets the variable str of the text storage means 5 to 「検索はバイナリ・」. This string consists of 8 characters (the number of characters in the search character string), starting from the i-th (first) character of the variable line. The collating means also sets the initial value 1 in the variable inc (step S9). Next, it determines whether the content of the variable str includes the search character string 「バイナリサチ」 held in the variable pat (step S12). It does not, so the character string collating means 6 sets the variable i to the value of (i + inc), that is, 2, which moves the collation start position (step S13).
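When the character at the start position is not katakana, steps S8, S9, S12 and S13 reduce to checking a fixed-width window. Below is a minimal Python sketch with these assumptions: positions are 1-based as in the text, and the range U+30A1 to U+30FA counts as katakana letters. The name `window_step` is hypothetical.

```python
from typing import Tuple


def is_katakana(ch: str) -> bool:
    """Step S8 test: katakana letters ァ (U+30A1) through ヺ (U+30FA)."""
    return "\u30a1" <= ch <= "\u30fa"


def window_step(line: str, i: int, plen: int, pat: str) -> Tuple[str, bool, int]:
    """Steps S9, S12 and S13 for a 1-based position i whose character is not katakana."""
    s = line[i - 1:i - 1 + plen]   # S9: plen characters starting at position i
    inc = 1                        # S9: advance one character at a time
    found = pat in s               # S12: does the window contain the search string?
    return s, found, i + inc       # S13: next collation start position


line = "検索はバイナリ・サーチを用いて"
print(is_katakana(line[0]))                     # False, so no transformation (S8)
print(window_step(line, 1, 8, "バイナリサチ"))   # ('検索はバイナリ・', False, 2)
```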

[0024] Next, the character string collating means 6 compares the value of the variable plen with the value of (llen − i + 1). This determines whether the number of uncollated characters in the text line has become smaller than the number of characters in the search character string (step S14). Here plen is 8, llen is 15 and i is 2, so at least as many uncollated characters remain as the search character string has, and processing returns to step S7.

Steps S7 to S14 are then repeated in the same way. This time the variable str holds 「索はバイナリ・サ」, so the transformation of step S11 is not performed. This string does not contain the search character string 「バイナリサチ」 (step S12). The only change is that i becomes 3, while plen is still 8 and llen 15, so the determination of step S14 again returns processing to step S7.

Steps S7 to S14 are performed once more in the same way. The variable str now holds 「はバイナリ・サー」, so again no transformation is performed in step S11, and the string does not contain 「バイナリサチ」 (step S12). With plen at 8 and llen at 15, i simply becomes 4, and the determination of step S14 returns processing to step S7 again.
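The step S14 test and the windows examined on the way can be checked numerically. Below is a sketch under the same assumptions: 1-based i, plen = 8 and llen = 15. The helper name `too_few_remaining` is hypothetical. With these values the last start position that still has a full window is llen − plen + 1 = 8.

```python
def too_few_remaining(plen: int, llen: int, i: int) -> bool:
    """Step S14: True when fewer than plen uncollated characters remain."""
    return plen > llen - i + 1


line = "検索はバイナリ・サーチを用いて"
plen, llen = 8, len(line)
for i in (2, 3):
    window = line[i - 1:i - 1 + plen]   # what step S9 puts in the variable str
    print(i, window, llen - i + 1, too_few_remaining(plen, llen, i))
# 2 索はバイナリ・サ 14 False
# 3 はバイナリ・サー 13 False
```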

[0025] Steps S7 to S14 are then performed again in the same way. This time, however, the fourth character of the variable line is 「バ」 (katakana), so the notation is transformed. The character string collating means 6 sets in the variable str the run of consecutive katakana characters starting from the i-th (fourth) character of the variable line. It also sets the number of characters in str in the variable inc (step S10). That is, as shown in FIG. 5(g), the variable str of the text storage means 5 holds 「バイナリ・サーチ」, and the variable inc is set to its character count, 8. In this embodiment, 「ー」 and 「・」 that follow katakana are treated as katakana. Next, the character string collating means 6 instructs the transformation means 8 to transform the notation of the katakana string held in the variable str (step S11). As shown in FIG. 5(h), the transformation means 8 uses the rules recorded in the transformation dictionary 9 (FIGS. 2 and 3) and transforms 「バイナリ・サーチ」 in the variable str into 「バイナリサチ」.
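Steps S10 and S11 can be sketched as follows. The actual rule tables of FIGS. 2 and 3 are not reproduced in this text, so `TRANSFORM_RULES` is a hypothetical stand-in. It holds only the two deletions the worked example implies: removing 「・」 (U+30FB) and 「ー」 (U+30FC). A run starts at a katakana letter and, as stated above, continues through any following 「・」 and 「ー」.

```python
# Hypothetical rules: only the deletions implied by the worked example.
TRANSFORM_RULES = [("\u30fb", ""), ("\u30fc", "")]   # ・ middle dot, ー long-vowel mark


def is_katakana(ch: str) -> bool:
    """Katakana letters ァ (U+30A1) through ヺ (U+30FA): may start a run (step S8)."""
    return "\u30a1" <= ch <= "\u30fa"


def continues_katakana(ch: str) -> bool:
    """Letters plus ・ (U+30FB) and ー (U+30FC), which are treated as katakana."""
    return "\u30a1" <= ch <= "\u30fc"


def katakana_run(line: str, i: int) -> str:
    """Step S10: consecutive katakana starting at 1-based position i."""
    start = end = i - 1
    while end < len(line) and continues_katakana(line[end]):
        end += 1
    return line[start:end]


def transform(s: str) -> str:
    """Step S11: apply each (old, new) rule in order."""
    for old, new in TRANSFORM_RULES:
        s = s.replace(old, new)
    return s


line = "検索はバイナリ・サーチを用いて"
run = katakana_run(line, 4)
inc = len(run)
print(run, inc, transform(run))   # バイナリ・サーチ 8 バイナリサチ
```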

[0026] The character string collating means 6 then determines whether the content of the variable str includes the search character string held in the variable pat (step S12). Here the search character string 「バイナリサチ」 is contained in the string 「バイナリサチ」 held in the variable str. The character string collating means 6 therefore outputs the content of the variable line to the display means 7, and 「検索はバイナリ・サーチを用いて」 appears on the screen of the display means 7 as a text line containing the search character string (step S15). Next, the text reading means 4 is to read the next line from the text file 3 (step S4). The text file 3 stores no text on the second or subsequent lines, so the text reading means 4 sets the variable line of the text storage means 5 to "empty". The character string collating means 6 detects this state (step S5), and the series of character string collation steps ends.
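The first embodiment as a whole (steps S4 to S15) can be put together in one Python sketch. Several assumptions are not spelled out in this section:
- The search string is transformed by the same rules to give pat.
- plen counts the search string as entered. This matches plen = 8 for 「バイナリ・サーチ」, even though pat 「バイナリサチ」 has 6 characters.
- The rules are the hypothetical deletions used above.

`collate_text` and `line_matches` are hypothetical names.

```python
import io
from typing import List, TextIO

# Hypothetical rules (FIGS. 2 and 3 are not reproduced): delete ・ and ー.
TRANSFORM_RULES = [("\u30fb", ""), ("\u30fc", "")]


def is_katakana(ch: str) -> bool:
    return "\u30a1" <= ch <= "\u30fa"          # katakana letters


def continues_katakana(ch: str) -> bool:
    return "\u30a1" <= ch <= "\u30fc"          # letters plus ・ and ー


def transform(s: str) -> str:
    for old, new in TRANSFORM_RULES:
        s = s.replace(old, new)
    return s


def line_matches(line: str, pat: str, plen: int) -> bool:
    """Steps S6 to S14 for one text line (1-based positions)."""
    llen, i = len(line), 1                              # S6
    while i <= llen:                                    # S7
        if is_katakana(line[i - 1]):                    # S8
            end = i - 1
            while end < llen and continues_katakana(line[end]):
                end += 1
            s, inc = transform(line[i - 1:end]), end - (i - 1)   # S10, S11
        else:
            s, inc = line[i - 1:i - 1 + plen], 1        # S9
        if pat in s:                                    # S12
            return True                                 # continue to S15
        i += inc                                        # S13
        if plen > llen - i + 1:                         # S14
            return False
    return False


def collate_text(text_file: TextIO, search: str) -> List[str]:
    """Steps S4 to S15: return the lines that would go to display means 7."""
    pat = transform(search)   # assumption: same rules applied to the search string
    plen = len(search)        # assumption: length as entered (8 in the example)
    hits: List[str] = []
    while True:
        raw = text_file.readline()                      # S4
        if raw == "":                                   # S5: empty, so processing ends
            return hits
        line = raw.rstrip("\n")
        if line_matches(line, pat, plen):
            hits.append(line)                           # S15


text = io.StringIO("検索はバイナリ・サーチを用いて\nバイナリーサーチ法\n線形探索を用いる\n")
print(collate_text(text, "バイナリ・サーチ"))
# ['検索はバイナリ・サーチを用いて', 'バイナリーサーチ法']
```

The second line shows the point of the device. The variant notation 「バイナリーサーチ」 is found because both sides are reduced to 「バイナリサチ」 by the same rules.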

[0027] In document search work and similar tasks, it is important for search efficiency to detect where in the text the user-designated search character string is located. In view of this, the second embodiment of the present invention also aims to detect the position at which the search character string appears in the collated text. FIG. 6 shows the characteristic processing procedure of the character string collating device according to the second embodiment. This embodiment is described by reference to the first embodiment. Identical parts carry identical reference numerals, and redundant description is omitted.

[0028] The character string collating device of this embodiment has the configuration shown in FIG. 1, except that the search character string storage means 2 also holds position information of the search character string within the text line. The device performs the collation process in the same way as steps S1 to S14 shown in FIG. 4, with one addition. When the character string collating means 6 determines that the content of the variable str includes the search character string held in the variable pat (step S12), it obtains from the text storage means 5 the collation range of the search character string within the text line (the content of the variable line). It then stores this range in the search character string storage means 2 (step S16). The start position of the collation range is the value of the variable i, and the end position is the value of (i + inc − 1). For example, when the search character string 「バイナリ・サーチ」 from the first embodiment is matched, the collation range is (4, 11) in character positions within the text line.
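Step S16 can be added to the same scan. The sketch below assumes that, after step S16, processing continues with steps S13 and S14 as in FIG. 4. It records (i, i + inc − 1) for every hit. The helpers and the hypothetical rules are repeated so the block runs on its own. `collation_ranges` is a hypothetical name.

```python
from typing import List, Tuple

# Hypothetical rules (FIGS. 2 and 3 are not reproduced): delete ・ and ー.
TRANSFORM_RULES = [("\u30fb", ""), ("\u30fc", "")]


def is_katakana(ch: str) -> bool:
    return "\u30a1" <= ch <= "\u30fa"


def continues_katakana(ch: str) -> bool:
    return "\u30a1" <= ch <= "\u30fc"


def transform(s: str) -> str:
    for old, new in TRANSFORM_RULES:
        s = s.replace(old, new)
    return s


def collation_ranges(line: str, pat: str, plen: int) -> List[Tuple[int, int]]:
    """Steps S6 to S14 plus S16: collect 1-based (start, end) ranges."""
    llen, i = len(line), 1
    ranges: List[Tuple[int, int]] = []
    while i <= llen:                                    # S7
        if is_katakana(line[i - 1]):                    # S8
            end = i - 1
            while end < llen and continues_katakana(line[end]):
                end += 1
            s, inc = transform(line[i - 1:end]), end - (i - 1)   # S10, S11
        else:
            s, inc = line[i - 1:i - 1 + plen], 1        # S9
        if pat in s:                                    # S12
            ranges.append((i, i + inc - 1))             # S16: start i, end i + inc - 1
        i += inc                                        # S13
        if plen > llen - i + 1:                         # S14
            break
    return ranges


print(collation_ranges("検索はバイナリ・サーチを用いて", transform("バイナリ・サーチ"), 8))
# [(4, 11)]
```

For a match inside a katakana run, inc is the run length, so the range covers the whole run. For a fixed-width window, inc is 1, and the formula as stated marks only the window's start position.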

[0029] At step S14, the character string collating means 6 compares the value of the variable plen with the value of (llen − i + 1). When it finds that the number of uncollated characters in the text line has become smaller than the number of characters in the search character string, processing returns to step S4 to read the next text line. Before doing so, the collating means determines whether a collation range was stored in the search character string storage means 2 in step S16 (step S17). If a range is stored, the character string collating means 6 outputs the content of the variable line and the position information of the collation range to the display means 7. The text line containing the search character string is then displayed on the screen of the display means 7, with the search character string highlighted or otherwise marked (step S18). Thus, according to this embodiment, the location of the search character string in the collated text can be detected, which makes collation work more efficient.
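Steps S17 and S18 need only the stored ranges. In the sketch below, "highlighting" is represented by wrapping each range in text markers. The `[[` and `]]` markers and the names `highlight` and `display_line` are hypothetical, and the ranges are assumed to be 1-based, inclusive and non-overlapping.

```python
from typing import List, Optional, Tuple


def highlight(line: str, ranges: List[Tuple[int, int]],
              open_mark: str = "[[", close_mark: str = "]]") -> str:
    """Wrap each 1-based, inclusive, non-overlapping range in markers."""
    parts: List[str] = []
    prev = 0                                   # 0-based index after the last range
    for start, end in sorted(ranges):
        parts.append(line[prev:start - 1])
        parts.append(open_mark + line[start - 1:end] + close_mark)
        prev = end
    parts.append(line[prev:])
    return "".join(parts)


def display_line(line: str, ranges: List[Tuple[int, int]]) -> Optional[str]:
    """Step S17: show nothing if no range was stored; step S18: mark the hits."""
    if not ranges:
        return None
    return highlight(line, ranges)


print(display_line("検索はバイナリ・サーチを用いて", [(4, 11)]))
# 検索は[[バイナリ・サーチ]]を用いて
```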

[0030] The above embodiments take a single search character string as input, but several search character strings separated by a special symbol may be input instead. In that case, the character string collating means 6 extracts each search character string delimited by the special symbol and collates each one against the text line in the same manner as described above. Notation transformation may be set to apply only to katakana strings, yet the input search character string may be something other than a katakana word, such as a kanji string. In that case, the intended collation is still performed: the notation transformation is simply skipped. The text to be collated may also be read and processed two or more lines at a time instead of one line at a time.
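The multi-string variant can be sketched by splitting the input and running the single-string check once per part. The patent says only "a special symbol", so the `|` separator here is a hypothetical choice. The helper functions and the two deletion rules are the same assumptions as in the earlier sketches. A kanji-only string passes through `transform` unchanged, which matches the "skip the transformation" case described above.

```python
from typing import List

SEPARATOR = "|"                                        # hypothetical special symbol
TRANSFORM_RULES = [("\u30fb", ""), ("\u30fc", "")]     # hypothetical: delete ・ and ー


def is_katakana(ch: str) -> bool:
    return "\u30a1" <= ch <= "\u30fa"


def continues_katakana(ch: str) -> bool:
    return "\u30a1" <= ch <= "\u30fc"


def transform(s: str) -> str:
    for old, new in TRANSFORM_RULES:
        s = s.replace(old, new)
    return s


def line_matches(line: str, pat: str, plen: int) -> bool:
    """Steps S6 to S14 for one line, as in the first embodiment."""
    llen, i = len(line), 1
    while i <= llen:
        if is_katakana(line[i - 1]):
            end = i - 1
            while end < llen and continues_katakana(line[end]):
                end += 1
            s, inc = transform(line[i - 1:end]), end - (i - 1)
        else:
            s, inc = line[i - 1:i - 1 + plen], 1
        if pat in s:
            return True
        i += inc
        if plen > llen - i + 1:
            return False
    return False


def split_queries(query: str, sep: str = SEPARATOR) -> List[str]:
    """Extract each search string delimited by the special symbol."""
    return [q for q in query.split(sep) if q]


def matched_queries(line: str, query: str) -> List[str]:
    """Collate every search string against the line; keep those that match."""
    return [q for q in split_queries(query)
            if line_matches(line, transform(q), len(q))]


print(matched_queries("検索はバイナリ・サーチを用いて", "バイナリサーチ|検索|ハッシュ"))
# ['バイナリサーチ', '検索']
```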

[0031]

[Effects of the Invention] As described in detail above, the character string collating device of the present invention collates the character string to be searched against text whose notation has been changed according to the same rule. It therefore needs neither a huge synonym dictionary nor processing that normalizes notation in advance at registration time, and it does not depend on the user's knowledge. A character string that expresses the intended word in any free notation can be searched quickly and appropriately, including similar notations.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a character string collating device according to an embodiment of the present invention.

FIG. 2 is a conceptual diagram showing a notation transformation rule table.

FIG. 3 is a conceptual diagram showing a notation transformation rule table.

FIG. 4 is a flowchart showing the character string collation procedure according to an embodiment of the present invention.

FIG. 5 is a conceptual diagram showing states of character string collation according to an embodiment of the present invention.

FIG. 6 is a flowchart showing the character string collation procedure according to another embodiment of the present invention.

[Explanation of symbols]

2 Search character string storage means
5 Text storage means
6 Character string collating means
8 Transformation means
9 Transformation rule dictionary

Continuation of the front page
(72) Inventor: Makoto Ando, c/o Fuji Xerox Co., Ltd., KSP R&D Business Park Building, 3-2-1 Sakado, Takatsu-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Kazuo Aihara, c/o Fuji Xerox Co., Ltd., KSP R&D Business Park Building, 3-2-1 Sakado, Takatsu-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Tatsuomi Kita, c/o Fuji Xerox Co., Ltd., KSP R&D Business Park Building, 3-2-1 Sakado, Takatsu-ku, Kawasaki-shi, Kanagawa

Claims

1. A character string collating device for collating whether a character string to be searched is included in a text, comprising: text storage means for holding the text; character string storage means for holding the character string to be searched; character string transformation means for changing, according to the same rule, the notation of character strings in the text and the notation of the character string to be searched; and collating means for collating whether the character string whose notation has been changed is included in the text whose notation has been changed.