JP2005182361A

JP2005182361A - String collating device, string collating system, string collating method, program, and recording medium

Info

Publication number: JP2005182361A
Application number: JP2003420717A
Authority: JP
Inventors: Hiroshi Takegawa; 弘志竹川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2003-12-18
Filing date: 2003-12-18
Publication date: 2005-07-07

Abstract

PROBLEM TO BE SOLVED: To provide a character string collation device, a character string collation system, a character string collation method, a program, and a recording medium that can check whether a search string is contained in a text to be collated merely by specifying the search string, even when the notation of the search string varies.

SOLUTION: Each text to be collated is stored in a text storage means in association with the language of that text. For every available language, the notation of the specified search string is transformed by the transformation rule corresponding to that language. For each text stored in the text storage means, the notation of the text is transformed by the transformation rule corresponding to the language of that text, and the transformed text is collated with the transformed search strings.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a character string collation device, a character string collation system, a character string collation method, a program, and a recording medium for checking whether a character string to be searched for is contained in a text to be collated. More specifically, it relates to a collation technique for cases in which the notation of the search string varies.

The function of collating a character string specified by a user is widely used in document retrieval, document creation, and similar tasks. Conventional collation methods include methods that find character strings exactly matching the user-specified string and methods that tolerate some variation in notation (for example, Patent Document 1).
Patent Document 1 converts the notation of both the texts held in a text storage means and the search string according to the same rule. It then checks whether the converted search string is contained in the converted text, and in this way collates character strings while tolerating notational variation.

For example, notation change rules for Japanese unify inconsistent spellings into a single form. One group is katakana variants such as 「ウィンドウ」 and 「ウインドウ」 ("window"), 「インデックス」 and 「インデクス」 ("index"), and 「インターフェース」 and 「インタフェイス」 ("interface"). Another group is inconsistent okurigana (kana suffixes) such as 「読み取り装置」, 「読取り装置」, and 「読取装置」 ("reading device").
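A table-driven version of such a unification rule might look like the minimal Python sketch below. The variant pairs come from the examples above. The choice of canonical spelling and the names `JA_VARIANTS` and `normalize_ja` are illustrative assumptions and do not come from the source.

```python
# Hypothetical table: each variant spelling maps to the spelling chosen as canonical.
JA_VARIANTS = {
    "ウィンドウ": "ウインドウ",            # "window"
    "インデクス": "インデックス",          # "index"
    "インタフェイス": "インターフェース",  # "interface"
    "読み取り装置": "読取装置",            # "reading device"
    "読取り装置": "読取装置",
}


def normalize_ja(text: str) -> str:
    """Rewrite every known variant spelling in `text` to its canonical form."""
    for variant, canonical in JA_VARIANTS.items():
        text = text.replace(variant, canonical)
    return text


print(normalize_ja("ウィンドウを開く") == normalize_ja("ウインドウを開く"))
```

Because the same function is applied to both the stored text and the search string, the two variant spellings become identical before the substring check.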

Moreover, English, German, French, and similar languages add inflectional endings to nouns, verbs, adjectives, and other words. For these languages, a rule that converts each word to its base form can be adopted.
Examples of such rules include the following:
English: remove the plural "s" at the end of a word, as in "papers" → "paper".
German: remove the plural "e" at the end of a word, as in "papiere" → "papier".
French: remove the plural "s" at the end of a word, as in "papiers" → "papier".
Patent Document 1: JP H07-319892 A
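The plural-removal rules listed above could be expressed as one function per language, as in the minimal sketch below. The language codes, the `RULES` table, and the minimum-stem guard (which keeps short words such as "is" intact) are assumptions added for illustration. Real stemming is considerably more involved.

```python
from typing import Callable, Dict

Rule = Callable[[str], str]


def strip_plural(suffix: str) -> Rule:
    """Build a toy rule that drops `suffix` from every word longer than len(suffix) + 2.

    Words are split on whitespace and re-joined with single spaces.
    """
    def rule(text: str) -> str:
        return " ".join(
            w[: -len(suffix)] if w.endswith(suffix) and len(w) > len(suffix) + 2 else w
            for w in text.split()
        )
    return rule


# Hypothetical per-language rule table keyed by language code.
RULES: Dict[str, Rule] = {
    "en": strip_plural("s"),  # papers  -> paper
    "de": strip_plural("e"),  # papiere -> papier
    "fr": strip_plural("s"),  # papiers -> papier
}

for lang, word in [("en", "papers"), ("de", "papiere"), ("fr", "papiers")]:
    print(lang, RULES[lang](word))
```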

In the conventional collation system described above, the notation of both the stored document texts and the search string is first changed according to a specified rule, and only then is collation performed.
However, every document is necessarily written in some language. A database that accumulates documents will therefore contain documents written in many different languages.

When searching such a database for a character string in a particular language, the conventional technique can apply only one fixed transformation rule. Two steps are therefore required: (1) the user first finds, among the stored texts, the documents written in that language; and (2) the found documents and the search string are each converted according to the transformation rule for that language and then collated.

The user therefore has to specify the language in which the search string is expressed, which is troublesome, and must also know what that language is.

The present invention has been made in view of the above circumstances. Its object is to provide a character string collation device, a character string collation system, a character string collation method, a program, and a recording medium that can check whether a search string is contained in a text to be collated merely by specifying the search string, even when the notation of the search string varies.

To solve the above problems, the invention of claim 1 is a character string collation device for checking whether a search string is contained in a text. The device comprises: character string transformation means for transforming the notation of a character string using transformation rules prepared for each language; text storage means for holding each text to be collated in association with the language of that text; language acquisition means for acquiring the languages available on the device; transformed search string generation means for transforming the notation of the specified search string with the character string transformation means for every acquired language; and character string collation means that, for each text held in the text storage means, transforms the notation of the text with the character string transformation means in the language of that text and collates the transformed text with the transformed search strings.
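The collation flow of claim 1 can be sketched in Python as shown below, under the following assumptions. The text storage is a list of `(text, language)` pairs. The available languages are taken to be the languages that have a rule. The toy plural rules stand in for real transformation rules. A text counts as a hit if it contains any of the transformed search strings, which is one of the two readings that claims 7 and 9 distinguish. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[str], str]


def strip_plural(suffix: str) -> Rule:
    """Toy rule: drop `suffix` from every word longer than len(suffix) + 2."""
    def rule(text: str) -> str:
        return " ".join(
            w[: -len(suffix)] if w.endswith(suffix) and len(w) > len(suffix) + 2 else w
            for w in text.split()
        )
    return rule


RULES: Dict[str, Rule] = {"en": strip_plural("s"), "de": strip_plural("e"), "fr": strip_plural("s")}

# Text storage: (text, language of the text) pairs.
TEXTS: List[Tuple[str, str]] = [
    ("the papers are on the desk", "en"),
    ("die papiere liegen hier", "de"),
    ("les papiers sont ici", "fr"),
    ("nothing relevant here", "en"),
]


def collate(query: str, texts: List[Tuple[str, str]], rules: Dict[str, Rule]) -> List[int]:
    """Return indices of texts that contain the query under some language's rule."""
    # One transformed search string per available language.
    transformed_queries = [rules[lang](query) for lang in rules]
    hits = []
    for i, (text, lang) in enumerate(texts):
        transformed_text = rules[lang](text)  # transform each text in its own language
        if any(q in transformed_text for q in transformed_queries):
            hits.append(i)
    return hits


print(collate("papiere", TEXTS, RULES))
```

The user never states a language. The German-rule form "papier" of the query is enough to find both the German and the French text.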

The invention of claim 2 is a character string collation device for checking whether a search string is contained in a text. The device comprises: character string transformation means for transforming the notation of a character string using transformation rules prepared for each language; text storage means for holding each text to be collated, after its notation has been transformed by the character string transformation means in the language of that text, in association with that language; language acquisition means for acquiring the languages available on the device; transformed search string generation means for transforming the notation of the specified search string with the character string transformation means for every acquired language; and character string collation means for collating each transformed text held in the text storage means with the transformed search strings.
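The difference in claim 2 is that each text is transformed once, when it is registered, rather than at every collation. The minimal sketch below illustrates this. The `PretransformedStore` class, its methods, and the toy rules are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[str], str]


def strip_plural(suffix: str) -> Rule:
    """Toy rule: drop `suffix` from every word longer than len(suffix) + 2."""
    def rule(text: str) -> str:
        return " ".join(
            w[: -len(suffix)] if w.endswith(suffix) and len(w) > len(suffix) + 2 else w
            for w in text.split()
        )
    return rule


RULES: Dict[str, Rule] = {"en": strip_plural("s"), "de": strip_plural("e"), "fr": strip_plural("s")}


class PretransformedStore:
    """Text storage that keeps each text already transformed in its own language."""

    def __init__(self, rules: Dict[str, Rule]) -> None:
        self.rules = rules
        self.entries: List[Tuple[str, str]] = []  # (transformed text, language)

    def register(self, text: str, language: str) -> int:
        # The text is transformed once, at registration time.
        self.entries.append((self.rules[language](text), language))
        return len(self.entries) - 1

    def collate(self, query: str) -> List[int]:
        # Per query, only the search string is transformed (once per available language).
        queries = {rule(query) for rule in self.rules.values()}
        return [i for i, (t, _) in enumerate(self.entries) if any(q in t for q in queries)]


store = PretransformedStore(RULES)
store.register("die papiere liegen hier", "de")
store.register("the papers are on the desk", "en")
print(store.collate("papiere"))
```

This is the trade-off behind the processing-cost advantage stated for the invention: registration costs more, and each search costs less.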

The invention of claim 3 is the character string collation device according to claim 1 or 2, further comprising collation target registration means for registering texts selected from the text storage means as collation targets in collation target storage means. The character string collation means collates the texts registered in the collation target storage means instead of those in the text storage means.

The invention of claim 4 is the character string collation device according to claim 1 or 2, wherein the language acquisition means acquires, as the available languages, the set of all languages that have transformation rules referenced by the character string transformation means.

The invention of claim 5 is the character string collation device according to claim 1 or 2, wherein the language acquisition means acquires, as the available languages, the set of languages of all texts held in the text storage means, with duplicates removed.

The invention of claim 6 is the character string collation device according to claim 3, wherein the language acquisition means acquires, as the available languages, the set of languages of all texts registered in the collation target storage means, with duplicates removed.
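The three ways of obtaining the available languages in claims 4 to 6 can be sketched as follows. The function names, the language codes, and the sample data are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def languages_from_rules(rule_languages: Iterable[str]) -> Set[str]:
    """Claim 4 variant: every language for which a transformation rule exists."""
    return set(rule_languages)


def languages_from_texts(texts: Iterable[Tuple[str, str]]) -> Set[str]:
    """Claims 5 and 6 variant: languages of the given texts, duplicates removed.

    Pass the whole text store (claim 5) or only the registered collation targets (claim 6).
    """
    return {lang for _, lang in texts}


RULE_LANGS = ["en", "de", "fr", "ja"]
TEXT_STORE: List[Tuple[str, str]] = [("t1", "en"), ("t2", "de"), ("t3", "en"), ("t4", "fr")]
TARGETS = [TEXT_STORE[0], TEXT_STORE[2]]  # texts selected as collation targets (claim 3)

print(sorted(languages_from_rules(RULE_LANGS)))
print(sorted(languages_from_texts(TEXT_STORE)))
print(sorted(languages_from_texts(TARGETS)))
```

The narrower the language set, the fewer transformed search strings are generated per query. In this data, restricting collation to the registered targets leaves only English.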

The invention of claim 7 is the character string collation device according to any one of claims 1 to 6. The transformed search string generation means treats the transformed search strings generated for the acquired languages as a plurality of transformed search strings, irrespective of language. The character string collation means checks whether any one of these transformed search strings is contained in the text.

The invention of claim 8 is the character string collation device according to claim 7, wherein the transformed search string generation means removes duplicates from the plurality of transformed search strings.
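Claims 7 and 8 can be sketched as follows, assuming toy plural rules and hypothetical function names. The language-independent list of transformed search strings (claim 7) is deduplicated in first-seen order (claim 8). For example, the English and French rules both turn "papers" into "paper", so one copy is dropped.

```python
from typing import Callable, Dict, List

Rule = Callable[[str], str]


def strip_plural(suffix: str) -> Rule:
    """Toy rule: drop `suffix` from every word longer than len(suffix) + 2."""
    def rule(text: str) -> str:
        return " ".join(
            w[: -len(suffix)] if w.endswith(suffix) and len(w) > len(suffix) + 2 else w
            for w in text.split()
        )
    return rule


RULES: Dict[str, Rule] = {"en": strip_plural("s"), "de": strip_plural("e"), "fr": strip_plural("s")}


def transformed_search_strings(query: str, rules: Dict[str, Rule], dedupe: bool = True) -> List[str]:
    # Claim 7: one transformed string per language, pooled without language labels.
    generated = [rule(query) for rule in rules.values()]
    if dedupe:
        # Claim 8: drop duplicates while keeping first-seen order.
        generated = list(dict.fromkeys(generated))
    return generated


def contains_any(transformed_text: str, queries: List[str]) -> bool:
    return any(q in transformed_text for q in queries)


print(transformed_search_strings("papers", RULES, dedupe=False))
print(transformed_search_strings("papers", RULES))
```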

The invention of claim 9 is the character string collation device according to any one of claims 1 to 6. The transformed search string generation means holds the transformed search string generated for each acquired language in association with that language. The character string collation means collates each transformed text with the transformed search string that corresponds to the language of that text.
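The per-language variant of claim 9 can be sketched as shown below, assuming toy plural rules and hypothetical names. Each text is checked only against the search string transformed by its own language's rule.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[str], str]


def strip_plural(suffix: str) -> Rule:
    """Toy rule: drop `suffix` from every word longer than len(suffix) + 2."""
    def rule(text: str) -> str:
        return " ".join(
            w[: -len(suffix)] if w.endswith(suffix) and len(w) > len(suffix) + 2 else w
            for w in text.split()
        )
    return rule


RULES: Dict[str, Rule] = {"en": strip_plural("s"), "de": strip_plural("e"), "fr": strip_plural("s")}

TEXTS: List[Tuple[str, str]] = [
    ("die papiere liegen hier", "de"),
    ("les papiers sont ici", "fr"),
]


def transformed_by_language(query: str, rules: Dict[str, Rule]) -> Dict[str, str]:
    # The transformed search string is kept per language rather than pooled.
    return {lang: rule(query) for lang, rule in rules.items()}


def collate_per_language(query: str, texts: List[Tuple[str, str]], rules: Dict[str, Rule]) -> List[int]:
    by_lang = transformed_by_language(query, rules)
    return [i for i, (text, lang) in enumerate(texts) if by_lang[lang] in rules[lang](text)]


print(collate_per_language("papiere", TEXTS, RULES))
```

With the same data, the pooled approach of claim 7 would also match the French text through the German form "papier". Per-language matching avoids such cross-language hits, at the price of missing them when they are wanted.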

The invention of claim 10 is a character string collation device for checking whether a search string is contained in a text. The device comprises: character string transformation means for transforming the notation of a character string using transformation rules prepared for each language; text storage means for holding each text to be collated in association with the language of that text; and character string collation means that, for each text held in the text storage means, transforms the notation of both the text and the specified search string with the character string transformation means in the language of that text and collates the transformed text with the transformed search string.
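Claim 10 needs no separate language acquisition step, because the search string is transformed in whatever language each text happens to use. The minimal sketch below illustrates this. The per-language cache is an added optimization assumption, not something the claim requires, and the names and toy rules are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[str], str]


def strip_plural(suffix: str) -> Rule:
    """Toy rule: drop `suffix` from every word longer than len(suffix) + 2."""
    def rule(text: str) -> str:
        return " ".join(
            w[: -len(suffix)] if w.endswith(suffix) and len(w) > len(suffix) + 2 else w
            for w in text.split()
        )
    return rule


RULES: Dict[str, Rule] = {"en": strip_plural("s"), "de": strip_plural("e"), "fr": strip_plural("s")}

TEXTS: List[Tuple[str, str]] = [
    ("die papiere liegen hier", "de"),
    ("les papiers sont ici", "fr"),
    ("the papers are here", "en"),
]


def collate_lazily(query: str, texts: List[Tuple[str, str]], rules: Dict[str, Rule]) -> List[int]:
    cache: Dict[str, str] = {}  # language -> transformed query (assumed optimization)
    hits = []
    for i, (text, lang) in enumerate(texts):
        if lang not in cache:
            # Transform the query only for languages that actually occur in the texts.
            cache[lang] = rules[lang](query)
        if cache[lang] in rules[lang](text):
            hits.append(i)
    return hits


print(collate_lazily("papers", TEXTS, RULES))
```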

The invention of claim 11 is a character string collation device for checking whether a search string is contained in a text. The device comprises: character string transformation means for transforming the notation of a character string using transformation rules prepared for each language; text storage means for holding each text to be collated, after its notation has been transformed by the character string transformation means in the language of that text, in association with that language; and character string collation means that, for each transformed text held in the text storage means, transforms the notation of the specified search string with the character string transformation means in the language of that text and collates the transformed text with the transformed search string.

The invention of claim 12 is the character string collation device according to claim 10 or 11, further comprising collation target registration means for registering texts selected from the text storage means as collation targets in collation target storage means. The character string collation means collates the texts registered in the collation target storage means instead of those in the text storage means.

The invention of claim 13 is a character string collation system in which a character string collation server is connected to one or more terminals. The server collates a search string specified from a terminal against texts and returns the collation result to that terminal. The terminal comprises: text input means for inputting a text to be collated and the language of the text and transmitting a registration request to the server; search string input means for inputting a search string and transmitting a collation request for it to the server; and collation result display means for receiving the collation result from the server and displaying it. The server comprises: text registration means for registering the text received from the terminal in text storage means in association with its language; character string transformation means for transforming the notation of a character string using transformation rules prepared for each language; language acquisition means for acquiring the available languages; transformed search string generation means for transforming the notation of the search string with the character string transformation means for every acquired language; character string collation means that, for each text held in the text storage means, transforms the notation of the text with the character string transformation means in the language of that text and collates the transformed text with the transformed search strings; and collation result transmission means for transmitting the collation result to the terminal.
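The request and response exchange of claim 13 can be sketched in-process as follows. No networking is involved. The JSON message fields (`type`, `text`, `language`, `query`, `hits`), the `CollationServer` class, and the toy rules are hypothetical; the source does not define a message format.

```python
import json
from typing import Callable, Dict, List, Tuple

Rule = Callable[[str], str]


def strip_plural(suffix: str) -> Rule:
    """Toy rule: drop `suffix` from every word longer than len(suffix) + 2."""
    def rule(text: str) -> str:
        return " ".join(
            w[: -len(suffix)] if w.endswith(suffix) and len(w) > len(suffix) + 2 else w
            for w in text.split()
        )
    return rule


RULES: Dict[str, Rule] = {"en": strip_plural("s"), "de": strip_plural("e"), "fr": strip_plural("s")}


class CollationServer:
    """In-process stand-in for the collation server; terminals send JSON strings."""

    def __init__(self, rules: Dict[str, Rule]) -> None:
        self.rules = rules
        self.texts: List[Tuple[str, str]] = []  # text storage: (text, language)

    def handle(self, message: str) -> str:
        request = json.loads(message)
        if request["type"] == "register":  # registration request from text input means
            self.texts.append((request["text"], request["language"]))
            return json.dumps({"status": "ok", "text_id": len(self.texts) - 1})
        if request["type"] == "collate":  # collation request from search string input means
            queries = {rule(request["query"]) for rule in self.rules.values()}
            hits = [i for i, (t, lang) in enumerate(self.texts)
                    if any(q in self.rules[lang](t) for q in queries)]
            return json.dumps({"status": "ok", "hits": hits})
        return json.dumps({"status": "error", "reason": "unknown request type"})


server = CollationServer(RULES)
print(server.handle(json.dumps({"type": "register", "text": "die papiere liegen hier", "language": "de"})))
print(server.handle(json.dumps({"type": "collate", "query": "papiere"})))
```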

The invention of claim 14 is a character string collation system in which a character string collation server is connected to one or more terminals. The server collates a search string specified from a terminal against texts and returns the collation result to that terminal. The terminal comprises: text input means for inputting a text to be collated and the language of the text and transmitting a registration request to the server; search string input means for inputting a search string and transmitting a collation request for it to the server; and collation result display means for receiving the collation result from the server and displaying it. The server comprises: character string transformation means for transforming the notation of a character string using transformation rules prepared for each language; text registration means for transforming the notation of the text received from the terminal with the character string transformation means in the language of that text and registering the transformed text in text storage means in association with that language; language acquisition means for acquiring the available languages; transformed search string generation means for transforming the notation of the search string with the character string transformation means for every acquired language; character string collation means for collating each transformed text held in the text storage means with the transformed search strings; and collation result transmission means for transmitting the collation result to the terminal.

The invention of claim 15 is a character string collation system in which a character string collation server is connected to one or more terminals. The server collates a search string specified from a terminal against texts and returns the collation result to that terminal. The terminal comprises: text input means for inputting a text to be collated and the language of the text and transmitting a registration request to the server; search string input means for inputting a search string and transmitting a collation request for it to the server; and collation result display means for receiving the collation result from the server and displaying it. The server comprises: text registration means for registering the text received from the terminal in text storage means in association with its language; character string transformation means for transforming the notation of a character string using transformation rules prepared for each language; character string collation means that, for each text held in the text storage means, transforms the notation of both the text and the search string with the character string transformation means in the language of that text and collates the transformed text with the transformed search string; and collation result transmission means for transmitting the collation result to the terminal.

The invention of claim 16 is a character string collation system in which a character string collation server is connected to one or more terminals. The server collates a search string specified from a terminal against texts and returns the collation result to that terminal. The terminal comprises: text input means for inputting a text to be collated and the language of the text and transmitting a registration request to the server; search string input means for inputting a search string and transmitting a collation request for it to the server; and collation result display means for receiving the collation result from the server and displaying it. The server comprises: character string transformation means for transforming the notation of a character string using transformation rules prepared for each language; text registration means for transforming the notation of the text received from the terminal with the character string transformation means in the language of that text and registering the transformed text in text storage means in association with that language; character string collation means that, for each transformed text held in the text storage means, transforms the notation of the search string with the character string transformation means in the language of that text and collates the transformed text with the transformed search string; and collation result transmission means for transmitting the collation result to the terminal.

The invention of claim 17 is a character string collation method for checking whether a search string is contained in a text. The method comprises: holding each text to be collated in text storage means in association with the language of that text; transforming, for every available language, the notation of the specified search string by the transformation rule corresponding to that language; and, for each text held in the text storage means, transforming its notation by the transformation rule corresponding to the language of that text and collating the transformed text with the transformed search strings.

The invention of claim 18 is a character string collation method for checking whether a search string is contained in a text. The method comprises: transforming the notation of each text to be collated in the language of that text and holding the transformed text in text storage means in association with that language; transforming, for every available language, the notation of the specified search string by the transformation rule corresponding to that language; and collating each transformed text held in the text storage means with the transformed search strings.

The invention of claim 19 is a character string collation method for checking whether a search string is contained in a text. The method comprises: holding each text to be collated in text storage means in association with the language in which the text was written; and, for each text held in the text storage means, transforming the notation of both the text and the specified search string by the transformation rule corresponding to the language of that text and collating the transformed text with the transformed search string.

The invention of claim 20 is a character string collation method for checking whether a search string is contained in a text. The method comprises: transforming the notation of each text to be collated by the transformation rule corresponding to the language of that text and holding the transformed text in text storage means in association with that language; and, for each transformed text held in the text storage means, transforming the notation of the specified search string by the transformation rule corresponding to the language of that text and collating the transformed text with the transformed search string.

The invention of claim 21 is a program for causing a computer to execute the functions of the character string collation device according to any one of claims 1 to 12, or the functions of the character string collation system according to any one of claims 13 to 16.
The invention of claim 22 is a computer-readable recording medium on which the program according to claim 21 is recorded.

According to the present invention, the user no longer needs to specify the language in which the search string is expressed, which reduces the effort needed to specify a search.
In addition, the texts to be collated can be transformed in advance by the transformation rules and stored, so that collation runs on the stored transformed texts. The target texts then do not have to be converted at every collation, which keeps the processing cost low.

Preferred embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the overall configuration of the character string collation system of this embodiment. As shown in FIG. 1, this embodiment comprises a character string collation server 10, any number of file servers 40, any number of terminals 20, and a communication network 30.

The character string collation server 10 is a computer that is connected to the terminals 20 via the communication network 30. It performs collation in response to collation requests from the terminals 20 and returns the results to them. The server 10 includes a CPU, memory, and a hard disk. It can execute various programs, such as the requested collation processing and the exchange of data with the terminals 20 and file servers 40 via the communication network 30. Data managed by the server 10, such as the texts to be searched, search strings, and collation results, are recorded on recording media such as memory, a hard disk, a CD, or a DVD.

The terminal 20 is a computer that is connected to the character string collation server 10 via the communication network 30. The user operates it to enter queries for the server 10 and to output collation results.
The terminal 20 is generally equipped with a keyboard and a pointing device such as a mouse for entering search strings and texts to be stored, and with a display device for showing collation results. The terminal 20 also includes a CPU, memory, and a hard disk. It can execute various programs, such as character string input, collation result output, and the exchange of data with the server 10 and file servers 40 via the communication network 30.

The file server 40 is a computer that is connected, as needed, to the character string collation server 10 and the terminals 20 via the communication network 30. It has recording media, such as a hard disk, CD, or DVD, for recording data such as the texts to be searched, search strings, and collation results.
When a text found by collation resides on a file server 40, that text may also be transmitted in response to a request from a terminal 20.
If all data such as the texts to be searched, search strings, and collation results are managed by the character string collation server 10, the file servers 40 may be omitted. The following description assumes that all of the above data are managed by the server 10.

The communication network 30 is a transmission path connecting the character string collation server 10, the terminal 20, and the file server 40. It is generally realized with cables and uses TCP/IP as its communication protocol. However, the transmission path is not limited to cables: it may be wired or wireless, as long as the connected devices share a common communication protocol. For example, a LAN (Local Area Network) or WAN (Wide Area Network) over public or leased lines, or the Internet, can be used.

FIG. 2 is a block diagram showing the functional configuration of the present embodiment. In FIG. 2, the character string collation server 10 includes at least a server control unit 110, a text registration unit 120, a text storage unit 125, a collation target registration unit 130, a collation target storage unit 135, a character string transformation unit 140, a transformation rule dictionary 145, a character string collation unit 150, a collation result storage unit 155, and a collation result transmission unit 160. The terminal 20 includes at least a terminal control unit 210, a text input unit 220, a collation target selection unit 230, a search string input unit 240, and a collation result display unit 250.

The server control unit 110 controls the overall functions of the character string collation server 10, and the terminal control unit 210 controls the overall functions of the terminal 20.
The text input unit 220 prompts the user to input two items: the body of a text to be registered in the system (or the identifier of a file storing the text body, hereinafter called a file identifier), and the type of the main language in which the text is written (Japanese, English, German, French, etc.). It then passes a text registration request carrying the input text body (or file identifier) and language type to the terminal control unit 210, which transmits the request to the character string collation server 10 via the communication network 30. When a text is written in more than one language, the predominant language is determined and its type is specified.
A classification may also be assigned to the text to be registered. In this case, the text registration request also carries the classification.
Bibliographic information about the text, such as its name, author, and registration date, may also be added to the request.

When the server control unit 110 receives a text registration request, it passes the request to the text registration unit 120. The text registration unit 120 stores the received text body (or file identifier) and language type in the text storage unit 125 (see FIG. 3A).
If the text storage unit 125 is a table, the text body (or file identifier) and language type are registered as one row of the table, and the identifier of the text is its row number counted from the top of the table. If the text storage unit 125 is a database, the text body (or file identifier) and language type are registered together with an assigned identifier.
When the registration request carries a classification, the text body is stored in the text storage unit 125 under that classification (see FIG. 4A).
When bibliographic information about the text is transmitted, it is also stored in the text storage unit 125. Registered bibliographic information can then serve as a guide when selecting collation targets.
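As a rough illustration of the table-format storage described above, the following Python sketch keeps each registered text as one row and uses the 1-based row number as the text identifier. The class and field names (`TextRecord`, `TextStore`, `bibliography`) are hypothetical and not part of the original disclosure. A database-backed store would assign identifiers differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TextRecord:
    """One row of the text storage unit 125 (hypothetical field names)."""
    body: str                      # text body, or a file identifier
    language: str                  # main language type, e.g. "English"
    classification: Optional[str] = None
    bibliography: dict = field(default_factory=dict)  # name, author, date, ...


class TextStore:
    """Table-format text storage: the identifier is the 1-based row number."""

    def __init__(self) -> None:
        self.rows: list = []

    def register(self, body: str, language: str,
                 classification: Optional[str] = None, **bibliography) -> int:
        """Append one row and return its row number as the text identifier."""
        self.rows.append(TextRecord(body, language, classification, bibliography))
        return len(self.rows)

    def get(self, text_id: int) -> TextRecord:
        """Look up a row by its identifier (row number counted from the top)."""
        return self.rows[text_id - 1]
```

For example, `TextStore().register("paper,papers", "English")` returns 1 for the first row.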

The text storage unit 125 is implemented on a recording medium such as the memory or hard disk of the character string collation server 10, or on a recording medium such as the hard disk of the file server 40.

Instead of storing the text body as-is in the text storage unit 125, the system may use the character string transformation unit 140 (described later) to create a transformed text. The notation of this text is transformed with the transformation rule for the text's main language, and the transformed text is stored.
When the terminal 20 sends a file identifier instead of a text body, the text body is retrieved from the file indicated by that file identifier. Its notation is then transformed with the transformation rule for the specified language, and the transformed text is stored.
Transforming the notation at registration time avoids transforming the text again at every collation, which reduces the processing cost of the collation process as a whole.
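The registration-time variant can be sketched as follows, assuming word-level rules stored as plain dictionaries. The names `RULES`, `transform`, and `register_transformed`, and the German rule entry, are illustrative assumptions. Only the English mapping “papers” → “paper” comes from the example in this document. Treating a file identifier as a local file path is also an assumption.

```python
import re
from pathlib import Path

# Hypothetical per-language rules: variant word -> unified (base) form.
RULES = {
    "English": {"papers": "paper"},
    "German": {"papiere": "papier"},
}


def transform(text: str, language: str) -> str:
    """Replace each word that has a rule for `language` with its unified form."""
    rules = RULES.get(language, {})
    return re.sub(r"\w+", lambda m: rules.get(m.group(0), m.group(0)), text)


def register_transformed(store: list, language: str, body=None, file_id=None) -> int:
    """Store the pre-transformed text so that collation need not transform it again."""
    if body is None:
        # A file identifier was sent instead of the text body: read the file first.
        body = Path(file_id).read_text(encoding="utf-8")
    store.append({"text": transform(body, language), "language": language})
    return len(store)  # row number used as the text identifier
```

Because collation can then read the stored `text` field directly, the per-collation transformation of the text can be skipped.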

The collation target selection unit 230 prompts the user to input either a condition for selecting the texts to be collated from among those stored in the text storage unit 125, or the result of such a selection. It passes a collation target registration request carrying the input selection condition or selection result to the terminal control unit 210, which transmits the request to the character string collation server 10 via the communication network 30.
The selection condition is a search condition on information about the texts, for example a search expression built from classifications, bibliographic items, and the like.
To obtain a selection result, the terminal 20 first acquires from the character string collation server 10 a list of the texts registered in the text storage unit 125 (showing, for example, part of each text or information about it). It then shows this list on its display device and lets the user select the texts to be collated. In this case, the set of identifiers of the texts selected from the list is transmitted to the character string collation server 10 instead of a selection condition.

When the server control unit 110 receives a collation target registration request, it passes the request to the collation target registration unit 130.
When the collation target registration unit 130 receives a registration request carrying a selection condition, it searches the text storage unit 125 with this condition and stores the set of identifiers of the matching texts in the collation target storage unit 135.
When the collation target registration unit 130 receives a registration request carrying a set of identifiers, it stores this set in the collation target storage unit 135.
When a set of identifiers is stored in the collation target storage unit 135 in this way, the text body and language type needed for collation or language acquisition are taken from the entries of the text storage unit 125 that the identifiers point to. If the text storage unit 125 stores a file identifier instead of the text body, the text body is read from the file indicated by that file identifier.
The collation target storage unit 135 is implemented on a recording medium such as the memory or hard disk of the character string collation server 10, or on a recording medium such as the hard disk of the file server 40.

Alternatively, the collation target storage unit 135 may have the same data structure as the text storage unit 125 instead of holding a set of identifiers. In this case, the text body (or file identifier) and language type corresponding to each identifier are extracted from the text storage unit 125 and stored in the collation target storage unit 135.

For example, when the texts are classified as shown in FIG. 4A and the classification “catalog” is used as the selection condition, the texts belonging to the “catalog” classification are selected, as shown in FIG. 4B.
Because the user restricts the texts to be collated in this way, the character string collation server 10 no longer processes texts the user does not need. This reduces the processing cost of the collation process as a whole.

Alternatively, the collation target registration unit 130 may store only the selection condition in the collation target storage unit 135. When the character string collation unit 150 later collates the texts with the search string, it retrieves from the text storage unit 125 the set of texts matching the stored selection condition.
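A short sketch can contrast the two ways of registering collation targets by condition. The eager version evaluates the condition once and keeps only identifiers. The lazy version keeps the condition and resolves it at collation time. The sample records, the “manual” classification, and the function names are hypothetical. A real selection condition would be a search expression over classifications and bibliographic items, modelled here simply as a predicate.

```python
from typing import Callable, Dict

# Hypothetical contents of the text storage unit 125: identifier -> record.
TEXTS: Dict[str, dict] = {
    "0001": {"language": "Japanese", "classification": "manual"},
    "0002": {"language": "English", "classification": "catalog"},
    "0003": {"language": "German", "classification": "catalog"},
}

# A selection condition is modelled as a predicate over one text's record.
Condition = Callable[[dict], bool]


def register_by_condition(texts: Dict[str, dict], condition: Condition) -> set:
    """Eager: search the store now and keep the set of matching identifiers."""
    return {tid for tid, rec in texts.items() if condition(rec)}


def resolve_at_collation(texts: Dict[str, dict], condition: Condition) -> Dict[str, dict]:
    """Lazy: only the condition was stored; fetch the matching texts when collating."""
    return {tid: rec for tid, rec in texts.items() if condition(rec)}


def is_catalog(rec: dict) -> bool:
    """Example condition: the text belongs to the "catalog" classification."""
    return rec.get("classification") == "catalog"
```

The difference shows up when texts are added after registration. The eager identifier set stays fixed, while the lazy condition also picks up newly registered texts that match.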

The character string transformation unit 140 retrieves the transformation rule for the specified language type from the transformation rule dictionary 145. It transforms the notation of the text to be collated, or of the search string, according to that rule and returns the result.
The character string transformation unit 140 is invoked by the text registration unit 120 or by the character string collation unit 150. The text registration unit 120 calls it with the text to be registered and the type of that text's main language. The character string collation unit 150 calls it either with a text to be collated and its language type, or with the search string and a language type.

The transformation rule dictionary 145 stores a transformation rule for each language type supported by the system. Each rule unifies the notation variants and inflected forms of its language.
For Japanese, for example, the rule unifies notation such as katakana spellings and okurigana. For inflected languages (e.g., English, German, French), the rule reduces inflected forms to their base form.
The transformation rule dictionary 145 is stored on a recording medium such as a hard disk. Once it has been read into memory, it may be kept there as a cache to speed up subsequent retrieval.

The search string input unit 240 prompts the user to input a string to search for in the texts to be collated. It passes a search request carrying this search string to the terminal control unit 210, which transmits the request to the character string collation server 10 via the communication network 30.
When the character string collation server 10 receives this search request, the server control unit 110 passes it to the character string collation unit 150.

The character string collation unit 150 temporarily stores the received search string in a storage medium such as a memory or a hard disk. It then checks whether the search string is contained in the texts to be collated, obtained from the text storage unit 125 or the collation target storage unit 135, and stores the collation result in the collation result storage unit 155.

The character string collation unit 150 first acquires the types of all available languages by one of the following methods (language acquisition means).

(1) The language of each transformation rule stored in the transformation rule dictionary 145 is determined, and these languages are taken as the set of languages available on the character string collation server 10.

(2) Some languages available on the character string collation server 10 may have no texts registered in the text storage unit 125.
In this case, the transformation rule dictionary 145 is not consulted. Instead, the language types of all texts stored in the text storage unit 125 are extracted, duplicates are removed, and the resulting unique language types are taken as the set of available languages.
With this method, when the text storage unit 125 holds no text at all in a given language, no collation is performed with the search string transformed for that language. This reduces the processing cost and also reduces results the user did not intend.

(3) When the collation targets have been narrowed down and the texts to be collated are stored in the collation target storage unit 135, the number of language types may also decrease.
In this case, the language types of all texts stored in the collation target storage unit 135 are extracted, duplicates are removed, and the resulting unique language types are taken as the set of available languages.
With this method, the search string is transformed only for the languages of the selected texts, and collation is performed only with those transformed search strings. This reduces the processing cost and also reduces results the user did not intend.
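The three language acquisition methods differ only in where the language types are read from, as the sketch below shows. The stores are modelled as plain dictionaries (language → rule, and identifier → (body, language)), which is an assumption for illustration. The set comprehensions perform the duplicate removal.

```python
def languages_from_rules(rule_dictionary: dict) -> set:
    """Method (1): every language that has a rule in the transformation rule dictionary."""
    return set(rule_dictionary)


def languages_from_texts(text_store: dict) -> set:
    """Method (2): unique language types of all texts in the text storage unit."""
    return {language for _body, language in text_store.values()}


def languages_from_targets(text_store: dict, target_ids) -> set:
    """Method (3): unique language types of the narrowed-down collation targets."""
    return {text_store[tid][1] for tid in target_ids}
```

Take hypothetical data with rules for Japanese, English, German, French, and Spanish, but texts only in Japanese, English, and German. Method (2) then drops French and Spanish, and method (3) narrows the set further to the languages of the selected targets.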

Next, for each language in the set of available languages, the character string collation unit 150 passes the temporarily stored search string and the language to the character string transformation unit 140. It obtains a string whose notation has been transformed with the transformation rule for that language (transformed search string generation means).
These per-language transformed search strings are generated and temporarily stored by one of the following methods.

(1) The transformed search strings are stored together as one collection, regardless of language.
In this case, when different languages yield the same transformed search string, the duplicates may be removed before storage. When the search string consists of several words, two transformed strings count as identical only if their notation, including word order, is the same.
Removing duplicate transformed search strings avoids reprocessing a search string that has already been processed, which reduces the processing cost.
(2) The transformed search strings are stored separately for each language.

For example, suppose that the transformation rules described above under “Background Art” are used for English, German, and French. Applying each of these rules to the search string “papers” gives the following results.
Applying the English transformation rule: “papers” → “paper”.
Applying the German transformation rule: “papers” → “papers”. No German rule covers “papers”, so no transformed notation is produced and the string is left unchanged.
Applying the French transformation rule: “papers” → “paper”.
When these transformed search strings are stored together, the English and French rules yield the same notation “paper”. Therefore only two transformed search strings are stored: “paper” and “papers”.
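The following sketch reproduces the “papers” example for both storage methods. The rule tables encode only the mappings stated above: “papers” → “paper” for English and French, and no rule for German. Everything else, including the function names, is an illustrative assumption.

```python
import re

# Rules limited to the example: English and French map "papers" -> "paper";
# German has no rule covering "papers".
RULES = {
    "English": {"papers": "paper"},
    "German": {},
    "French": {"papers": "paper"},
}


def transform(text: str, language: str) -> str:
    """Replace every word that has a rule for `language` with its unified form."""
    rule = RULES.get(language, {})
    return re.sub(r"\w+", lambda m: rule.get(m.group(0), m.group(0)), text)


def per_language(search: str, languages) -> dict:
    """Method (2): keep one transformed search string per language."""
    return {lang: transform(search, lang) for lang in languages}


def lumped(search: str, languages) -> list:
    """Method (1): one collection with duplicates removed (first-seen order kept)."""
    return list(dict.fromkeys(transform(search, lang) for lang in languages))
```

`dict.fromkeys` removes duplicates while preserving the order of first appearance. Because whole strings are compared, a multi-word search string counts as a duplicate only when its words appear in the same order.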

Next, the character string collation unit 150 checks whether the texts to be collated contain a transformed search string, using the following procedure.
(1) When the transformed search strings are stored together as one collection, regardless of language:

(1A) The identifier of an unprocessed text to be collated is selected from the text storage unit 125 or the collation target storage unit 135.

(1B) The content and language type of the text indicated by the selected identifier are retrieved and passed to the character string transformation unit 140. It returns a transformed text whose notation has been transformed with the transformation rule for that language. If the transformation rule was applied in advance and the transformed text was registered in the text storage unit 125, this transformation step can be skipped.
For example, the text with identifier “0002” in FIG. 3A has the content “paper,papers” and the language type “English”, so applying the English transformation rule yields the transformed text “paper,paper”. Similarly, applying the German transformation rule to the text with identifier “0003” yields the transformed text “papier,papier”. Applying the French transformation rule to the text with identifier “0004” also yields the transformed text “papier,papier”.

(1C) It is checked whether the transformed text contains any of the transformed search strings. If it does, the identifier of the text just checked is recorded.
For example, the transformed text of identifier “0002” contains “paper”, which is the first and third of the transformed search strings (those for English and French), so identifier “0002” is recorded.
The transformed text of identifier “0003” contains none of the transformed search strings, and neither does the transformed text of identifier “0004”.

(1D) Steps (1A) to (1C) are repeated until no unprocessed text to be collated remains. When all texts to be collated have been processed, the set of text identifiers recorded in (1C) is stored in the collation result storage unit 155 as the collation result. In the above example, identifier “0002” is stored in the collation result storage unit 155.
Here the collation result is the set of identifiers of the texts containing the search string. It may instead consist of other information, such as the texts themselves, the positions in the texts at which the search string was found, or the portions of the texts containing it.
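Steps (1A) to (1D) can be sketched as a single loop. Plain substring containment is assumed for “contains”, so, for instance, “paper” would also match inside “newspaper”. The language set is taken from the collation targets, as in the third language acquisition method. The German and French rules for “papiere” and “papiers”, and the bodies of texts 0003 and 0004, are invented so that those texts transform to “papier,papier” as in the example. The actual contents of FIG. 3A are not reproduced here.

```python
import re

# Hypothetical rules. Only "papers" -> "paper" (English, French) comes from the
# example; the "papiere"/"papiers" entries are invented for the sample texts.
RULES = {
    "English": {"papers": "paper"},
    "German": {"papiere": "papier"},
    "French": {"papers": "paper", "papiers": "papier"},
}


def transform(text: str, language: str) -> str:
    """Replace every word that has a rule for `language` with its unified form."""
    rule = RULES.get(language, {})
    return re.sub(r"\w+", lambda m: rule.get(m.group(0), m.group(0)), text)


def collate_lumped(search: str, targets: dict) -> set:
    """Method (1): one language-independent collection of transformed strings."""
    languages = sorted({language for _body, language in targets.values()})
    # Transformed search strings for all languages, duplicates removed.
    patterns = list(dict.fromkeys(transform(search, lang) for lang in languages))
    result = set()
    for text_id, (body, language) in targets.items():      # (1A) next unprocessed text
        transformed_text = transform(body, language)        # (1B) transform the text
        if any(p in transformed_text for p in patterns):    # (1C) contains any string?
            result.add(text_id)
    return result                                           # (1D) collation result
```

With the sample targets, only “0002” is returned, matching the outcome described above.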

(2) When the transformed search strings are stored separately for each language:

(2A) The identifier of an unprocessed text to be collated is selected from the text storage unit 125 or the collation target storage unit 135.
(2B) The content and language type of the text indicated by the selected identifier are retrieved and passed to the character string transformation unit 140. It returns a transformed text whose notation has been transformed with the transformation rule for that language. If the transformation rule was applied in advance and the transformed text was registered in the text storage unit 125, this transformation step can be skipped.
(2C) It is checked whether the transformed text contains the transformed search string for the text's own language. If it does, the identifier of the text just checked is recorded.
(2D) Steps (2A) to (2C) are repeated until no unprocessed text to be collated remains. When all texts to be collated have been processed, the set of text identifiers recorded in (2C) is stored in the collation result storage unit 155 as the collation result.
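Method (2) differs from method (1) only in step (2C): each text is compared solely with the transformed search string for its own language. The hypothetical data below include a German text that happens to contain the English word “paper”. It is not matched, because the German transformation of “papers” is “papers”. Under the lumped method (1), it would match the English-derived “paper”. Rules, data, and names are assumptions for illustration.

```python
import re

# Hypothetical word-level rules (English/French "papers" -> "paper" as in the example).
RULES = {
    "English": {"papers": "paper"},
    "German": {"papiere": "papier"},
    "French": {"papers": "paper", "papiers": "papier"},
}


def transform(text: str, language: str) -> str:
    """Replace every word that has a rule for `language` with its unified form."""
    rule = RULES.get(language, {})
    return re.sub(r"\w+", lambda m: rule.get(m.group(0), m.group(0)), text)


def collate_per_language(search: str, targets: dict, languages) -> set:
    """Method (2): compare each text only with its own language's search string."""
    by_language = {lang: transform(search, lang) for lang in languages}
    result = set()
    for text_id, (body, language) in targets.items():          # (2A)
        transformed_text = transform(body, language)             # (2B)
        pattern = by_language.get(language)
        if pattern is not None and pattern in transformed_text:  # (2C)
            result.add(text_id)
    return result                                                # (2D)
```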

(3) As a variant of (2), transformed search strings are not obtained for all available languages in advance. Instead, whenever the transformed notation of a text to be collated is obtained in that text's language, the search string is transformed with the same language's rule and then collated:

(3A) The identifier of an unprocessed text to be collated is selected from the text storage unit 125 or the collation target storage unit 135.
(3B) The content and language type of the text indicated by the selected identifier are retrieved and passed to the character string transformation unit 140. It returns a transformed text whose notation has been transformed with the transformation rule for that language. If the transformation rule was applied in advance and the transformed text was registered in the text storage unit 125, this transformation step can be skipped.
(3C) The language type of the text indicated by the selected identifier and the search string are passed to the character string transformation unit 140. It returns a transformed search string whose notation has been transformed with the transformation rule for that language.
(3D) It is checked whether the transformed text contains the transformed search string. If it does, the identifier of the text just checked is recorded.
(3E) Steps (3A) to (3D) are repeated until no unprocessed text to be collated remains. When all texts to be collated have been processed, the set of text identifiers recorded in (3D) is stored in the collation result storage unit 155 as the collation result.
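Method (3) moves the search-string transformation inside the loop, so only languages that actually occur among the targets are used. The sketch below assumes word-level dictionary rules. The `pretransformed` flag stands for the case where the stored text was already transformed at registration. All names and data are hypothetical.

```python
import re

# Hypothetical word-level rules; only English "papers" -> "paper" is from the example.
RULES = {"English": {"papers": "paper"}, "German": {"papiere": "papier"}}


def transform(text: str, language: str) -> str:
    """Replace every word that has a rule for `language` with its unified form."""
    rule = RULES.get(language, {})
    return re.sub(r"\w+", lambda m: rule.get(m.group(0), m.group(0)), text)


def collate_on_demand(search: str, targets: dict, pretransformed: bool = False) -> set:
    """Method (3): transform the search string per text, in that text's language."""
    result = set()
    for text_id, (body, language) in targets.items():                              # (3A)
        transformed_text = body if pretransformed else transform(body, language)    # (3B)
        transformed_search = transform(search, language)                            # (3C)
        if transformed_search in transformed_text:                                  # (3D)
            result.add(text_id)
    return result                                                                   # (3E)
```

As written, step (3C) is repeated for every text. Caching the transformed search string per language would avoid redundant work, although the description does not specify this.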

Finally, when collation is complete, control returns from the character string collation unit 150 to the server control unit 110. The server control unit 110 activates the collation result transmission unit 160, which transmits the contents of the collation result storage unit 155 to the terminal 20 that issued the search request.
Alternatively, the collation result transmission unit 160 may only notify the terminal 20 that collation has finished, and transmit the collation result for actual display when the terminal 20 requests it. As another option, the collation result may be stored in a storage area unique to each collation request (collation result storage unit 155). The collation result transmission unit 160 then sends the terminal 20 the URL of that storage area, and the result is returned when the terminal 20 accesses this URL.
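The URL-based delivery option can be sketched as a store that creates a unique area per collation request and hands back only an address. The base URL, the use of `uuid4` keys, and the class name are assumptions. The description does not specify how the unique storage area or its URL is formed.

```python
import uuid

BASE_URL = "https://collation-server.example/results/"  # hypothetical address


class ResultStore:
    """Collation result storage with one unique area per collation request."""

    def __init__(self) -> None:
        self._areas: dict = {}

    def save(self, result_ids) -> str:
        """Store a result in a fresh area and return only its URL for the terminal."""
        key = uuid.uuid4().hex  # unique per collation request
        self._areas[key] = sorted(result_ids)
        return BASE_URL + key

    def fetch(self, url: str) -> list:
        """Serve the stored result when the terminal accesses the URL."""
        return self._areas[url.removeprefix(BASE_URL)]
```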

On receiving the collation result from the character string collation server 10, the terminal control unit 210 activates the collation result display unit 250 and passes it the collation result. The collation result display unit 250 displays the received collation result on the display device of the terminal 20. The collation result may include the texts containing the search string, the substrings of those texts that contain it, or the positions in the texts at which it was found. It may also include information about the texts containing the search string (only when such information is stored in the text storage unit 125).

Next, the processing procedure of the character string collation unit 150 is described in detail with reference to the flowcharts of FIGS. 5 and 6. FIG. 5 is a flowchart showing the processing procedure of the character string collation unit 150 when the search string is transformed using the transformation rules of all available languages.

The search string sent in the search request from the terminal 20 is acquired and stored in a temporary memory such as a buffer (step S10).

The types of all available languages are acquired by one of the following methods (step S11).
(1) The languages corresponding to all transformation rules stored in the transformation rule dictionary 145 are acquired as the set of available languages.
(2) The language types of all texts stored in the text storage unit 125 are extracted, duplicates are removed, and the resulting unique language types are taken as the set of available languages.
(3) The language types of all texts stored in the collation target storage unit 135 are extracted, duplicates are removed, and the resulting unique language types are taken as the set of available languages.

Each language in the language set obtained in step S11, together with the search character string stored in the buffer, is passed to the character string transformation means 140, which applies the transformation rule for that language to the search character string to obtain a transformed search character string (steps S12 and S13).
The transformed search character strings can be held in either of two ways:
(1) The transformed search character strings obtained for all languages are held together as a single group. Identical strings may be held only once.
(2) Alternatively, a transformed search character string may be held separately for each language.
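Steps S12 and S13, with both ways of holding the result, could look roughly like the sketch below. The transformation rules are hypothetical placeholders chosen for illustration, since the patent does not fix them here: English is assumed to ignore case and hyphens, and Japanese is assumed to apply Unicode NFKC normalization (which folds half-width katakana and full-width ASCII) and then map katakana to hiragana. `RULES` and the function names are likewise invented.

```python
import unicodedata
from typing import Callable, Dict, Iterable, Set


def katakana_to_hiragana(text: str) -> str:
    # Katakana U+30A1..U+30F6 sit exactly 0x60 above hiragana U+3041..U+3096.
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)


# Hypothetical transformation rules (stand-in for the rule dictionary 145).
RULES: Dict[str, Callable[[str], str]] = {
    "en": lambda s: s.casefold().replace("-", ""),  # ignore case and hyphens
    "ja": lambda s: katakana_to_hiragana(unicodedata.normalize("NFKC", s)),
}


def transform_per_language(search: str, languages: Iterable[str]) -> Dict[str, str]:
    """Variant (2): one transformed search string held per language."""
    return {language: RULES[language](search) for language in languages}


def transform_pooled(search: str, languages: Iterable[str]) -> Set[str]:
    """Variant (1): all transformed strings held together as one group.

    Collecting them into a set keeps identical strings only once.
    """
    return set(transform_per_language(search, languages).values())


print(transform_per_language("E-Mail", ["en", "ja"]))  # {'en': 'email', 'ja': 'E-Mail'}
print(transform_pooled("data", ["en", "ja"]))          # {'data'}
```

The per-language dictionary keeps track of which language produced each string, which the check in step S17 can exploit; the pooled set is simpler and drops duplicates automatically.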

Next, the identifier of an unprocessed text to be collated is selected from the text storage means 125 or the collation target storage means 135 (step S14). If no unprocessed text remains (NO in step S15), the process proceeds to step S19.

If an unprocessed text remains (YES in step S15), the content and language of the text indicated by the selected identifier are retrieved and passed to the character string transformation means 140, which transforms the notation using the transformation rule for that language to obtain a transformed text (step S16).
If the transformation rule was applied in advance and the text was registered in the text storage means 125 in transformed notation, this transformation of the text to be collated can be omitted.
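Step S16, including the shortcut of transforming a text once when it is registered, might be sketched as below. The `StoredText` record, the `pre_transform` flag, and the rules are all hypothetical; the sketch only shows that the same language-specific rule is applied either at registration time or at collation time.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def katakana_to_hiragana(text: str) -> str:
    # Shift katakana U+30A1..U+30F6 onto the matching hiragana code points.
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)


# Hypothetical per-language rules (stand-in for the rule dictionary).
RULES: Dict[str, Callable[[str], str]] = {
    "en": lambda s: s.casefold().replace("-", ""),
    "ja": lambda s: katakana_to_hiragana(unicodedata.normalize("NFKC", s)),
}


@dataclass
class StoredText:
    identifier: str
    language: str
    content: str
    transformed: Optional[str] = None  # set only if rules ran at registration


def register(identifier: str, language: str, content: str, pre_transform: bool) -> StoredText:
    record = StoredText(identifier, language, content)
    if pre_transform:
        record.transformed = RULES[language](content)
    return record


def transformed_text(record: StoredText) -> str:
    """Step S16: apply the text's own language rule, unless already done."""
    if record.transformed is not None:
        return record.transformed
    return RULES[record.language](record.content)


a = register("t1", "ja", "ｶﾗｰ印刷", pre_transform=True)
b = register("t2", "en", "E-Mail Server", pre_transform=False)
print(transformed_text(a))  # からー印刷
print(transformed_text(b))  # email server
```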

It is then checked whether the transformed text contains any of the transformed search character strings (step S17).
If the transformed search character strings were grouped together regardless of language, the transformed text is checked for any one of them. If instead they were held per language, only the transformed search character string for the language of the text being collated needs to be checked, which shortens collation time and reduces processing cost.
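The two forms of the check in step S17 can be contrasted in a short sketch. The transformed search strings are given as literals that a hypothetical English rule (ignore case and hyphens) and a hypothetical Japanese rule (NFKC plus katakana-to-hiragana) would produce for the search string `E-Mail`; the function names are illustrative only.

```python
from typing import Dict, Iterable


def contains_any(transformed_text: str, pooled_searches: Iterable[str]) -> bool:
    """Pooled variant: the text matches if any transformed search string occurs in it."""
    return any(search in transformed_text for search in pooled_searches)


def contains_for_language(transformed_text: str, language: str,
                          per_language_searches: Dict[str, str]) -> bool:
    """Per-language variant: only the search string for the text's language is tested."""
    search = per_language_searches.get(language)
    return search is not None and search in transformed_text


# Transformed forms of "E-Mail" under the assumed rules.
per_language = {"en": "email", "ja": "E-Mail"}
pooled = set(per_language.values())

japanese_text = "emailの設定"  # an already transformed Japanese text
print(contains_any(japanese_text, pooled))                        # True
print(contains_for_language(japanese_text, "ja", per_language))   # False
print(contains_for_language("email server", "en", per_language))  # True
```

The per-language check tests one substring instead of one per language. With these assumed rules the variants can also differ in outcome: the pooled check matches the Japanese text through the string produced by the English rule, while the per-language check does not.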

If the transformed text contains a transformed search character string (YES in step S17), the identifier of the text just examined is stored (step S18), and the process returns to step S14 to process the remaining texts to be collated.
The collation result is not limited to the set of identifiers of the texts containing the search character string; it may also include the texts themselves, the position at which the search character string occurs in each text, or the portion of the text that contains it.
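One way to record the richer collation results mentioned above (position and a surrounding excerpt, in addition to the identifier) is sketched below. The `Match` record and the `context` parameter are hypothetical. Because the search runs on the transformed text, the reported positions refer to the transformed text; mapping them back to the original would need length-preserving rules or an offset table, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    identifier: str
    position: int  # offset within the *transformed* text
    excerpt: str   # surrounding part of the transformed text


def find_matches(identifier: str, transformed_text: str, search: str,
                 context: int = 5) -> List[Match]:
    """Collect every occurrence (overlapping ones included) with an excerpt."""
    matches: List[Match] = []
    start = transformed_text.find(search)
    while start != -1:
        lo = max(0, start - context)
        hi = start + len(search) + context
        matches.append(Match(identifier, start, transformed_text[lo:hi]))
        start = transformed_text.find(search, start + 1)
    return matches


for m in find_matches("t2", "reply by email or email server", "email", context=3):
    print(m.position, repr(m.excerpt))
# 9 'by email or'
# 18 'or email se'
```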

When no unprocessed text to be collated remains (NO in step S15), the set of text identifiers stored in step S18 is stored as the collation result in the collation result storage means 155 (step S19), the server control means 110 is notified that collation has finished, and the process ends.
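Putting steps S10 to S19 together, the FIG. 5 procedure could be approximated as below. The sketch assumes hypothetical rules (English: ignore case and hyphens; Japanese: NFKC plus katakana-to-hiragana), uses method (2) of step S11 (languages of the stored texts), and holds the transformed search strings per language; storing the result and notifying the server control means 110 are left to the caller. `collate_fig5` and `RULES` are invented names.

```python
import unicodedata
from typing import Callable, Dict, List, Set, Tuple


def katakana_to_hiragana(text: str) -> str:
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)


# Hypothetical rules; stand-in for the transformation rule dictionary 145.
RULES: Dict[str, Callable[[str], str]] = {
    "en": lambda s: s.casefold().replace("-", ""),
    "ja": lambda s: katakana_to_hiragana(unicodedata.normalize("NFKC", s)),
}

TextRecord = Tuple[str, str, str]  # (identifier, language, content)


def collate_fig5(search: str, texts: List[TextRecord]) -> List[str]:
    # S11: available languages, here method (2): languages of the stored texts.
    languages: Set[str] = {language for _id, language, _content in texts}
    # S12-S13: transform the search string once per language, held per language.
    transformed_search = {lang: RULES[lang](search) for lang in languages}
    hits: List[str] = []
    # S14-S15: visit each unprocessed text in turn.
    for identifier, language, content in texts:
        transformed = RULES[language](content)           # S16
        if transformed_search[language] in transformed:  # S17 (per-language)
            hits.append(identifier)                      # S18
    return hits  # S19: the caller stores this as the collation result


texts = [("t1", "ja", "ｶﾗｰ印刷の設定"), ("t2", "en", "Colour printing"), ("t3", "ja", "からー表示")]
print(collate_fig5("カラー", texts))  # ['t1', 't3']
```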

FIG. 6 is a flowchart showing the processing procedure of the character string collating means 150 when, each time a text to be collated is read, the search character string is converted into transformed notation only for the language of that text.

The search character string sent with the search request from the terminal 20 is acquired and stored in a temporary memory such as a buffer (step S20).
The identifier of an unprocessed text to be collated is selected from the text storage means 125 or the collation target storage means 135 (step S21). If no unprocessed text remains (NO in step S22), the process proceeds to step S27.

If an unprocessed text remains (YES in step S22), the content and language of the text indicated by the selected identifier are retrieved and passed to the character string transformation means 140, which transforms the notation using the transformation rule for that language to obtain a transformed text (step S23).
If the transformation rule was applied in advance and the text was registered in the text storage means 125 in transformed notation, this transformation of the text to be collated can be omitted.

The language of the text being collated and the search character string stored in the buffer are passed to the character string transformation means 140, which applies the transformation rule for that language to the search character string to obtain a transformed search character string (step S24).

It is checked whether the transformed text contains the transformed search character string (step S25).
Because only the transformed search character string for the language of the text being collated is checked, collation time and processing cost are reduced.

If the transformed text contains the transformed search character string (YES in step S25), the identifier of the text just examined is stored (step S26), and the process returns to step S21 to process the remaining texts to be collated.
The collation result is not limited to the set of identifiers of the texts containing the search character string; it may also include the texts themselves, the position at which the search character string occurs in each text, or the portion of the text that contains it.

When no unprocessed text to be collated remains (NO in step S22), the set of text identifiers stored in step S26 is stored as the collation result in the collation result storage means 155 (step S27), the server control means 110 is notified that collation has finished, and the process ends.
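The FIG. 6 variant differs mainly in where the search string is transformed: inside the loop, and only for the language of the current text (step S24), so no language set has to be acquired beforehand. A minimal sketch under the same hypothetical rules as before (`collate_fig6` and `RULES` are invented names):

```python
import unicodedata
from typing import Callable, Dict, List, Tuple


def katakana_to_hiragana(text: str) -> str:
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)


# Hypothetical rules; stand-in for the transformation rule dictionary.
RULES: Dict[str, Callable[[str], str]] = {
    "en": lambda s: s.casefold().replace("-", ""),
    "ja": lambda s: katakana_to_hiragana(unicodedata.normalize("NFKC", s)),
}

TextRecord = Tuple[str, str, str]  # (identifier, language, content)


def collate_fig6(search: str, texts: List[TextRecord]) -> List[str]:
    hits: List[str] = []
    # S21-S22: take each unprocessed text in turn.
    for identifier, language, content in texts:
        transformed_text = RULES[language](content)   # S23
        transformed_search = RULES[language](search)  # S24: this language only
        if transformed_search in transformed_text:    # S25
            hits.append(identifier)                   # S26
    return hits  # S27: stored by the caller as the collation result


texts = [("t1", "ja", "ｶﾗｰ印刷の設定"), ("t2", "en", "Colour printing"), ("t3", "ja", "からー表示")]
print(collate_fig6("カラー", texts))  # ['t1', 't3']
```

Because the search string is re-transformed for every text, a dictionary keyed by language and filled on first use would avoid repeated work when many texts share a language; that is an optional refinement, not part of the described flow.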

In the embodiment described above, the character string collation server 10 and the terminal 20 are separate computers. However, the functions of the character string collation server 10, the terminal 20, and the file server 40 can also be executed on a single computer, in which case the communication network 30 is not required. In this configuration, a control means 300 that integrates the terminal control means 210 and the server control means 110 of FIG. 2 is provided, and the collation result transmission means 160 becomes unnecessary, giving the functional configuration shown in FIG. 7. In FIG. 7, parts having the same functions as in FIG. 2 are denoted by the same reference numerals.

The control means 300 accepts user input instructions from the text input means 220, the collation target selection means 230, and the search character string input means 240, and starts the text registration means 120, the collation target registration means 130, and the character string collating means 150, respectively.
When the control means 300 is notified by the character string collating means 150 that collation has finished, it starts the collation result display means 250 and has the collation result displayed on the display device.

Furthermore, the object of the present invention is achieved by implementing each function of the terminal and of the character string collation server in the above embodiment as programs, writing them in advance to a recording medium such as a CD-ROM, loading the recording medium into the recording medium reading devices of the terminal and the character string collation server, and executing the programs on their CPUs.
In this case, the programs read from the recording medium themselves realize the above embodiment, so the programs and the recording medium on which they are recorded also constitute the present invention.

The recording medium may be any of a semiconductor medium (e.g., ROM or a nonvolatile memory card), an optical medium (e.g., DVD, MO, MD, or CD-R), or a magnetic medium (e.g., magnetic tape or a flexible disk).
Alternatively, the programs stored in a storage device may be supplied directly from a server computer via a communication network such as the Internet. In this case, the storage device of that server computer is also included in the recording medium of the present invention.

The invention also covers the case in which the functions of the above embodiment are realized not only by executing the loaded programs, but also by processing performed in cooperation with the operating system or other application programs according to the programs' instructions.

Implementing the functions of the above embodiment as programs and distributing them in this way reduces cost and improves portability and versatility.

Furthermore, when each function of the character string collation server described above is implemented as a program stored in a storage device, such as a magnetic disk, of a server computer, and is offered through an ASP (application service provider) that receives execution instructions from a user's computer connected via a communication network such as the Internet, executes the program, and returns the result to the user's computer, the storage device of the server computer and the program are also included in the present invention.

FIG. 1 is a block diagram showing the overall configuration of the character string collation system according to the embodiment of the present invention.
FIG. 2 is a block diagram showing the functional configuration of the character string collation system according to the embodiment of the present invention.
FIG. 3 shows an example of the data structure of the text storage means.
FIG. 4 shows (A) an example data structure of the text storage means when texts are classified, and (B) an example data structure of the collation target storage means, which stores texts selected from the text storage means.
FIG. 5 is a flowchart showing the processing procedure of the character string collating means when the search character string is converted into transformed notations using the transformation rules of all available languages.
FIG. 6 is a flowchart showing the processing procedure of the character string collating means when, each time a text to be collated is read, the search character string is converted into transformed notation only for the language of that text.
FIG. 7 is a block diagram showing another functional configuration of the character string collation system according to the embodiment of the present invention (the system implemented on a single computer).

Explanation of symbols

10 ... character string collation server, 20 ... terminal, 30 ... communication network, 40 ... file server, 110 ... server control means, 120 ... text registration means, 125 ... text storage means, 130 ... collation target registration means, 135 ... collation target storage means, 140 ... character string transformation means, 145 ... transformation rule dictionary, 150 ... character string collating means, 155 ... collation result storage means, 160 ... collation result transmission means, 210 ... terminal control means, 220 ... text input means, 230 ... collation target selection means, 240 ... search character string input means, 250 ... collation result display means, 300 ... control means.

Claims

1. A character string collating device for collating whether or not a character string to be searched for is included in a text, comprising: character string transformation means for transforming the notation of a character string using a transformation rule prepared for each language; text storage means for storing a text to be collated in association with the language of the text; language acquisition means for acquiring the languages available in the device; transformed search character string generation means for causing the character string transformation means to transform a designated search character string for every acquired language; and character string collating means for causing, for each text held in the text storage means, the character string transformation means to transform the notation of the text according to the language of the text, and for collating the transformed text with the transformed search character strings.

2. A character string collating device for collating whether or not a character string to be searched for is included in a text, comprising: character string transformation means for transforming the notation of a character string using a transformation rule prepared for each language; text storage means for holding a text to be collated, whose notation has been transformed by the character string transformation means according to the language of the text, in association with the language of the text; language acquisition means for acquiring the languages available in the device; transformed search character string generation means for causing the character string transformation means to transform the notation of a designated search character string for every acquired language; and character string collating means for collating, for each transformed text held in the text storage means, the transformed text with the transformed search character strings.

3. The character string collating device according to claim 1, further comprising collation target registration means for registering texts to be collated, selected from the text storage means, in collation target storage means, wherein the character string collating means performs collation on the texts registered in the collation target storage means instead of the text storage means.

4. The character string collating device according to claim 1, wherein the language acquisition means acquires, as the available languages, the set of all languages to which the transformation rules referred to by the character string transformation means correspond.

5. The character string collating device according to claim 1, wherein the language acquisition means acquires, as the available languages, the set of languages of all texts held in the text storage means with duplicate languages removed.

6. The character string collating device according to claim 3, wherein the language acquisition means acquires, as the available languages, the set of languages of all texts registered in the collation target storage means with duplicate languages removed.

7. The character string collating device according to claim 1, wherein the transformed search character string generation means treats the transformed search character strings generated for the respective acquired languages as a plurality of transformed search character strings irrespective of language, and the character string collating means determines that the search character string is included when any one of the plurality of transformed search character strings is included.

8. The character string collating device according to claim 7, wherein the transformed search character string generation means excludes duplicates from the plurality of transformed search character strings.

9. The character string collating device according to claim 1, wherein the transformed search character string generation means holds the transformed search character string generated for each acquired language separately for each language, and the character string collating means collates each transformed text to be collated with the transformed search character string corresponding to the language of that text.

10. A character string collating device for collating whether or not a character string to be searched for is included in a text, comprising: character string transformation means for transforming the notation of a character string using a transformation rule prepared for each language; text storage means for holding a text to be collated in association with the language of the text; and character string collating means for causing, for each text held in the text storage means, the character string transformation means to transform the notation of the text and of a designated search character string according to the language of the text, and for collating the transformed text with the transformed search character string.

11. A character string collating device for collating whether or not a character string to be searched for is included in a text, comprising: character string transformation means for transforming the notation of a character string using a transformation rule prepared for each language; text storage means for holding a text to be collated, whose notation has been transformed by the character string transformation means according to the language of the text, in association with the language of the text; and character string collating means for causing, for each transformed text held in the text storage means, the character string transformation means to transform the notation of a designated search character string according to the language of that text, and for collating the transformed text with the transformed search character string.

12. The character string collating device according to claim 10, further comprising collation target registration means for registering texts to be collated, selected from the text storage means, in collation target storage means, wherein the character string collating means performs collation on the texts registered in the collation target storage means instead of the text storage means.

13. A character string collation system in which a character string collation server and one or more terminals are connected, the character string collation server collating a search character string designated at a terminal against texts and returning the collation result to the terminal, wherein:
the terminal comprises text input means for inputting a text to be collated and the language of the text and sending a registration request to the character string collation server, search character string input means for inputting a character string to be searched for and sending a collation request for that search character string to the character string collation server, and collation result display means for receiving the collation result from the character string collation server and displaying it; and
the character string collation server comprises text registration means for registering the text received from the terminal in text storage means in association with the language of the text; character string transformation means for transforming the notation of a character string using a transformation rule prepared for each language; language acquisition means for acquiring the available languages; transformed search character string generation means for causing the character string transformation means to transform the notation of the search character string for every acquired language; character string collating means for causing, for each text held in the text storage means, the character string transformation means to transform the notation of the text according to the language of the text, and for collating the transformed text with the transformed search character strings; and collation result transmission means for transmitting the collation result to the terminal.

14. A character string collation system in which a character string collation server and one or more terminals are connected, the character string collation server collating a search character string designated at a terminal against texts and returning the collation result to the terminal, wherein:
the terminal comprises text input means for inputting a text to be collated and the language of the text and sending a registration request to the character string collation server, search character string input means for inputting a character string to be searched for and sending a collation request for that search character string to the character string collation server, and collation result display means for receiving the collation result from the character string collation server and displaying it; and
the character string collation server comprises character string transformation means for transforming the notation of a character string using a transformation rule prepared for each language; text registration means for causing the character string transformation means to transform the text received from the terminal according to the language of the text, and for registering the transformed text in text storage means in association with the language of the text; language acquisition means for acquiring the available languages; transformed search character string generation means for causing the character string transformation means to transform the notation of the search character string for every acquired language; character string collating means for collating, for each transformed text held in the text storage means, the transformed text with the transformed search character strings; and collation result transmission means for transmitting the collation result to the terminal.

15. A character string collation system in which a character string collation server and one or more terminals are connected, the character string collation server collating a search character string designated at a terminal against texts and returning the collation result to the terminal, wherein:
the terminal comprises text input means for inputting a text to be collated and the language of the text and sending a registration request to the character string collation server, search character string input means for inputting a character string to be searched for and sending a collation request for that search character string to the character string collation server, and collation result display means for receiving the collation result from the character string collation server and displaying it; and
the character string collation server comprises text registration means for registering the text received from the terminal in text storage means in association with the language of the text; character string transformation means for transforming the notation of a character string using a transformation rule prepared for each language; character string collating means for causing, for each text held in the text storage means, the character string transformation means to transform the notation of the text and of the search character string according to the language of the text, and for collating the transformed text with the transformed search character string; and collation result transmission means for transmitting the collation result to the terminal.

16. A character string collation system in which a character string collation server and one or more terminals are connected, the character string collation server collating a search character string designated at a terminal against texts and returning the collation result to the terminal, wherein:
the terminal comprises text input means for inputting a text to be collated and the language of the text and sending a registration request to the character string collation server, search character string input means for inputting a character string to be searched for and sending a collation request for that search character string to the character string collation server, and collation result display means for receiving the collation result from the character string collation server and displaying it; and
the character string collation server comprises character string transformation means for transforming the notation of a character string using a transformation rule prepared for each language; text registration means for causing the character string transformation means to transform the text received from the terminal according to the language of the text, and for registering the transformed text in text storage means in association with the language of the text; character string collating means for causing, for each transformed text held in the text storage means, the character string transformation means to transform the notation of the search character string according to the language of that text, and for collating the transformed text with the transformed search character string; and collation result transmission means for transmitting the collation result to the terminal.

17. A character string collating method for collating whether or not a character string to be searched for is included in a text, comprising: storing a text to be collated in text storage means in association with the language of the text; for every available language, transforming the notation of a designated search character string using the transformation rule corresponding to that language; and, for each text held in the text storage means, transforming the notation of the text using the transformation rule corresponding to the language of the text and collating the transformed text with the transformed search character strings.

18. A character string collating method for collating whether or not a character string to be searched for is included in a text, comprising: transforming the notation of a text to be collated according to the language of the text and holding the transformed text in text storage means in association with the language of the text; for every available language, transforming the notation of a designated search character string using the transformation rule corresponding to that language; and, for each transformed text held in the text storage means, collating the transformed text with the transformed search character strings.

19. A character string collating method for collating whether or not a character string to be searched for is included in a text, comprising: storing a text to be collated in text storage means in association with the language in which the text is written; and, for each text stored in the text storage means, transforming the notation of the text and of a designated search character string using the transformation rule corresponding to the language of the text, and collating the transformed text with the transformed search character string.

20. A character string collating method for collating whether or not a character string to be searched for is included in a text, comprising: transforming the notation of a text to be collated using the transformation rule corresponding to the language of the text and storing the transformed text in text storage means in association with the language of the text; and, for each transformed text stored in the text storage means, transforming the notation of a designated search character string using the transformation rule corresponding to the language of that text, and collating the transformed text with the transformed search character string.

21. A program for causing a computer to execute the functions of the character string collating device according to any one of claims 1 to 12, or the functions of the character string collation system according to any one of claims 13 to 16.

22. A computer-readable recording medium on which the program according to claim 21 is recorded.