JP5190192B2 - SEARCH DEVICE, SEARCH METHOD, AND PROGRAM

Info

Publication number: JP5190192B2
Application number: JP2006283227A
Authority: JP
Inventors: 勲園部
Original assignee: NS Solutions Corp
Current assignee: NS Solutions Corp
Priority date: 2006-10-18
Filing date: 2006-10-18
Publication date: 2013-04-24
Anticipated expiration: 2026-10-18
Also published as: JP2008102641A

Description

The present invention relates to a search device, a search method, and a program.

Conventionally, a search for similar character strings exhaustively derived every similar expansion string permitted by the similarity criterion and then matched each one exactly. For example, with the similarity criterion "any differing character strings are in the relationship of FIG. 4" and the query pronunciation "エヌエス" ("NS"), the similar expansion strings "イヌエス, ヘヌエス, エムエス, ..., エヌエシュ" number 3 × 4 × 3 × 2 − 1 = 71.
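The count of 71 can be checked with a short calculation. The sketch below assumes that the factors 3, 4, 3, and 2 are the numbers of candidate characters (the original plus its similar alternatives) at each of the four positions of "エヌエス"; the variable names are illustrative only.

```python
import math

# Assumed number of candidates at each position of "エヌエス",
# counting the original character itself (taken from the 3 x 4 x 3 x 2 product).
candidates_per_position = [3, 4, 3, 2]

# Every combination of candidates, minus the unmodified query string itself.
expansions = math.prod(candidates_per_position) - 1
print(expansions)  # 71
```

The product grows multiplicatively with the length of the query, which is why the exhaustive approach becomes costly.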

In such a conventional method, the number of similar expansion strings becomes enormous and each of them is compared many times, so computational efficiency is very poor and memory usage is enormous.

On the other hand, a method of searching for similar character strings using an automaton has been disclosed (see, for example, Patent Document 1).

Patent Document 1: JP-A-8-339378

A similar character string search method using an automaton has good computational and memory efficiency, but its rules are difficult to create and, once created, difficult to understand, so system maintainability is a problem.

The present invention has been made in view of these problems, and its object is to provide a search device, a search method, and a program that offer good system maintainability together with good computational and memory efficiency.

Accordingly, the search device of the present invention comprises: search target data string holding means for holding a search target data string; search data string acquisition means for acquiring a search data string; setting means for setting, for the search data string acquired by the search data string acquisition means and in accordance with a predetermined rule, a range of the number of times an object is used, the object defining a similarity relationship that indicates whether the search data string and the search target data string are similar; and search means for comparing, as search processing and within the range of the number of times set by the setting means, whether the search data string acquired by the search data string acquisition means and the search target data string held by the search target data string holding means are in the similarity relationship defined in the object. The search means, within the range of the number of times set by the setting means and using the object, compares the unit data constituting the search data string with the unit data constituting the search target data string, one unit of data at a time, and, when unit data that do not match exist, compares, within the range of the number of times set by the setting means and using the object, whether the non-matching unit data are in the similarity relationship defined in the object.

With this configuration, the range of the number of times the object is used is set in accordance with a predetermined rule, and the search data string and the search target data string are compared within this range. A search device with good system maintainability as well as good computational and memory efficiency can therefore be provided.

Note that the object corresponds to, for example, the similarity relationship table described later.

The present invention may also be embodied as a search method and a program.

According to the present invention, a search device or the like with good computational efficiency, memory efficiency, and system maintainability can be provided.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

<Embodiment 1>
FIG. 1 is a hardware configuration diagram showing an example of the information processing apparatus 1.

As shown in FIG. 1, the information processing apparatus 1 includes, as its hardware configuration, an input device 11, a display device 12, a recording medium drive device 13, a ROM (Read Only Memory) 15, a RAM (Random Access Memory) 16, at least one CPU (Central Processing Unit) 17, an interface device 18, and an HD (Hard Disk) 19.

The input device 11 consists of a keyboard, a mouse, and the like operated by an operator (or user) of the information processing apparatus 1, and is used to input various kinds of operation information to the information processing apparatus 1. The display device 12 consists of a display or the like used by the operator of the information processing apparatus 1, and is used to display various kinds of information (or screens).

The interface device 18 is an interface that connects the information processing apparatus 1 to a network or the like. A program implementing the similarity search functions of the information processing apparatus 1 described later, or the similarity search flowchart described later, is provided to the information processing apparatus 1 on a recording medium 14 such as a CD-ROM, or is downloaded through a network or the like. The recording medium 14 is set in the recording medium drive device 13, and the program is installed from the recording medium 14 onto the HD 19 via the recording medium drive device 13.

The ROM 15 stores the program and other data that are read first when the information processing apparatus 1 is powered on. The RAM 16 is the main memory of the information processing apparatus 1. The CPU 17 reads the program from the HD 19 as necessary, stores it in the RAM 16, and executes it, thereby providing all or part of the similarity search functions described later and executing the flowchart relating to those functions. In addition to the program, the HD 19 stores, for example, the rules described later, the applicable count and threshold of the similarity relationship table corresponding to each rule, the search target character strings, and the similarity relationship table.

FIG. 2 shows an example of the functional configuration of the information processing apparatus 1, which is realized by the CPU 17, the RAM 16, the HD 19, the program, and the like. FIG. 2 is a functional configuration diagram (part 1) of an example of the information processing apparatus 1. As shown in FIG. 2, the information processing apparatus 1 includes, as its functional configuration, a search character string acquisition unit 21, a setting unit 22, a rule holding unit 23, a search unit 24, a search target character string holding unit 25, a similarity relationship table holding unit 26, and a search result aggregation unit 27.

The search character string acquisition unit 21 acquires a search character string, for example the character string that a user entered on a search screen or the like. The setting unit 22 sets, for the search character string acquired by the search character string acquisition unit 21 and in accordance with the rules held in the rule holding unit 23, the applicable count and threshold of the similarity relationship table, as shown in FIG. 3. FIG. 3 is a diagram (part 1) showing an example of the applicable count and threshold of the similarity relationship table. Here, the applicable count is the maximum number of times the similarity relationship table may be used (referenced), and the threshold is the minimum number of times the similarity relationship table must be used (referenced).

The rule holding unit 23 holds the rules and, for each rule, the applicable count and threshold of the similarity relationship table as shown in FIG. 3. A rule is, for example, "one to two sounds are in the relationship of similarity relationship table A". The rules are assumed to be described in, for example, a file.
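One way to picture a rule together with its applicable count and threshold is as a small record. The following is a minimal sketch under assumptions: the class name `Rule`, its field names, and the `allows` helper are hypothetical, and the values 2 and 1 are chosen to match the example rule "one to two sounds".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical record for one rule held by the rule holding unit."""
    description: str
    table_name: str          # which similarity relationship table the rule refers to
    applicable_count: int    # maximum number of times the table may be used
    threshold: int           # minimum number of times the table must be used

    def allows(self, applied: int) -> bool:
        """True when `applied` table uses fall inside the permitted range."""
        return self.threshold <= applied <= self.applicable_count


rule_a = Rule(
    description="one to two sounds are in the relationship of table A",
    table_name="A",
    applicable_count=2,
    threshold=1,
)
print(rule_a.allows(0), rule_a.allows(1), rule_a.allows(2), rule_a.allows(3))
# False True True False
```

Keeping the range as plain data is what lets a rule be edited in a file without touching the comparison logic.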

There may be one rule or a plurality of rules. When a plurality of rules exist, the information processing apparatus 1 may automatically select one or more of them according to, for example, the search conditions that the user selected (or entered) on the search screen, and apply them to the search character string; alternatively, the user may be allowed to select one or more rules on the search screen or the like.

The search unit 24 compares the search character string with the search target character strings held in the search target character string holding unit 25, within the range given by the applicable count and threshold of the similarity relationship table set by the setting unit 22, and outputs the search target character strings that are hits. Details of the comparison processing in the search unit 24 are described later with reference to FIG. 5.

The search target character string holding unit 25 holds the search target character strings. Taking a trademark pronunciation search as an example, a search target character string is a trademark (a trademark consisting of a character string). The search target character string holding unit 25 holds the search target character strings in, for example, a trie structure.
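For readers unfamiliar with tries, the following is a minimal sketch of how search target character strings could be held in one. The class and method names are hypothetical, and the description does not specify how the patented device lays out or traverses its trie.

```python
class TrieNode:
    """One node per character; `terminal` marks the end of a stored string."""

    def __init__(self):
        self.children = {}
        self.terminal = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.terminal


targets = Trie()
for name in ("アエユエオ", "アエユルレ"):
    targets.insert(name)

print("アエユエオ" in targets)  # True
print("アエユ" in targets)      # False (a prefix only, not a stored string)
```

Strings that share a prefix share nodes, so a comparison that fails partway through a prefix can rule out every stored string below that node at once.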

The similarity relationship table holding unit 26 holds a similarity relationship table as shown in FIG. 4. FIG. 4 is a diagram (part 1) showing an example of the similarity relationship table.

The search result aggregation unit 27 aggregates the results of the search processing in the search unit 24 (for example, by merging duplicate data) and outputs the search results (a list of search results) to a search result screen or the like.
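Merging duplicate hits can be sketched as follows. The function name `aggregate` and the assumption that hits arrive as one list per applied rule are illustrative only; the description does not say how the aggregation is implemented.

```python
def aggregate(hits_per_rule):
    """Merge several hit lists, dropping duplicates and keeping first-seen order."""
    return list(dict.fromkeys(hit for hits in hits_per_rule for hit in hits))


merged = aggregate([["アエユエオ", "アイユエオ"], ["アイユエオ", "アエウエオ"]])
print(merged)  # ['アエユエオ', 'アイユエオ', 'アエウエオ']
```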

FIG. 5 is a flowchart showing an example of the comparison processing. The comparison processing shown in FIG. 5 compares one search character string with one search target character string.
In step S10, the search unit 24 resets the similarity relationship table application counter (sets it to zero).

Next, in step S11, the search unit 24 sets the position of the character of interest (the target character position) in the search character string and the search target character string to the beginning. In step S12, the search unit 24 extracts the characters at the target character position from the search character string and the search target character string. Hereinafter, the character of the search character string at the target character position extracted in step S12 is called the search character, and the character of the search target character string at the target character position extracted in step S12 is called the search target character.

In step S13, the search unit 24 compares the search character with the search target character without applying the similarity relationship table. Here, the search unit 24 represents the type of the search character and the type of the search target character by the position of the bit set to 1 in a bit string, and uses a bit operation to determine whether the search character and the search target character match.
For example, four types of characters are represented with 4 bits as
ア: 0001, イ: 0010, ウ: 0100, エ: 1000,
and "ア or イ or ウ" (= ア | イ | ウ) is represented by the logical sum "0111". For simplicity of description the characters are represented here with 4 bits, but more bits may be used to represent more types of characters. Naturally, the number of bits need not equal the number of character types.
That is, when the search character is ア "0001" and the search target character is ア "0001", the search unit 24 performs a bitwise AND operation and calculates
0001 & 0001 = 0001 ≠ 0.
When the search character is ア "0001" and the search target character is イ "0010", the search unit 24 performs a bitwise AND operation and calculates
0001 & 0010 = 0000 = 0.
A bitwise AND operation can also be used to determine whether any of several candidate characters matches. When the search character after applying the similarity relationship table is "ア or イ or ウ" "0111" and the search target character is ア "0001", the search unit 24 performs a bitwise AND operation and calculates
0111 & 0001 = 0001 ≠ 0.
Thus, even when there are several characters to be compared, the comparison against one character can be carried out in a single bitwise AND operation, which makes the computation very efficient.

In step S14, the search unit 24 determines, based on the result of the bitwise AND operation in step S13, whether the search character and the search target character match. When the result of the bitwise AND operation is not 0, the search unit 24 determines that they match (YES in step S14) and proceeds to step S19. When the result is 0, the search unit 24 determines that they do not match (NO in step S14) and proceeds to step S15.

In step S15, the search unit 24 determines whether the applicable count of similarity relationship table A is greater than the similarity relationship table application counter. If it is greater (YES in step S15), the search unit 24 proceeds to step S16. If it is not greater (NO in step S15), the search unit 24 ends the processing shown in FIG. 5.

In step S16, the search unit 24 compares the search character with the search target character by applying, for example, similarity relationship table A. In step S17, the search unit 24 determines whether the search character and the search target character are in a similarity relationship. If they are (YES in step S17), the search unit 24 proceeds to step S18. If they are not (NO in step S17), the search unit 24 ends the processing shown in FIG. 5.

In step S18, the search unit 24 increments the similarity relationship table application counter by one. In step S19, the search unit 24 determines whether the search character string has a next character. If it does (YES in step S19), the search unit 24 proceeds to step S20. If it does not (NO in step S19), the search unit 24 proceeds to step S21.

In step S20, the search unit 24 advances the target character position in the search character string and the search target character string by one (one character). The search unit 24 then returns to step S12.

In step S21, the search unit 24 determines whether the similarity relationship table application counter is greater than or equal to the threshold. If it is (YES in step S21), the search unit 24 proceeds to step S22. If it is not (NO in step S21), the search unit 24 ends the processing shown in FIG. 5.

In step S22, the search unit 24 outputs the search target character string under examination to the search result aggregation unit 27 as a character string similar to the search character string, that is, as a search target character string that is a hit.

In the description above, the processing shown in FIG. 5 ends when the determination in step S15, step S17, or step S21 is NO. Instead of simply ending the processing, however, the search unit 24 may output to the search result aggregation unit 27 a result indicating, for example, that the search target character string under examination was not similar to the search character string.
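The flow of steps S10 to S22 can be condensed into one function. This is a minimal sketch under stated assumptions, not the patented implementation: plain equality and set membership stand in for the bitwise AND comparisons, the function name `compare` and its parameters are hypothetical, `TABLE_A` contains only the two similar pairs that the description's example names (FIG. 4 itself is not reproduced here), and strings of different lengths are treated as non-hits because FIG. 5 does not describe that case.

```python
# Hypothetical excerpt of similarity relationship table A: character -> similar characters.
TABLE_A = {"イ": {"エ"}, "ウ": {"ユ"}}


def compare(search, target, similar, applicable_count, threshold):
    """Return True when `target` is a hit for `search`, following FIG. 5."""
    if len(search) != len(target):           # assumption: not covered by FIG. 5
        return False
    applied = 0                              # S10: reset the application counter
    for s_ch, t_ch in zip(search, target):   # S11, S12, S19, S20: walk both strings
        if s_ch == t_ch:                     # S13, S14: exact comparison
            continue
        if applicable_count <= applied:      # S15: table may not be applied again
            return False
        if t_ch not in similar.get(s_ch, ()):  # S16, S17: not similar
            return False
        applied += 1                         # S18: count this application
    return applied >= threshold              # S21, S22: enough applications to hit


print(compare("アイウエオ", "アエユエオ", TABLE_A, 2, 1))  # True
print(compare("アイウエオ", "アイウエオ", TABLE_A, 2, 1))  # False
```

With a threshold of 1, an exactly matching string is not reported, because the counter stays at 0 and step S21 fails; the threshold is what restricts hits to strings that differ in at least that many positions.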

Next, the comparison processing performed by the search unit 24 is described using a more concrete example, in which the search unit 24 compares the search character string "アイウエオ" with the search target character string "アエユエオ".

As described above, in step S10 the search unit 24 resets the similarity relationship table application counter (sets it to zero). Next, in step S11, the search unit 24 sets the target character position in the search character string and the search target character string to the beginning. In step S12, the search unit 24 extracts the characters at the target character position. That is, the search unit 24 first extracts "ア", the first character of the search character string, as the search character, and "ア", the first character of the search target character string, as the search target character.

In step S13, the search unit 24 uses a bitwise AND operation to compare the search character with the search target character without applying the similarity relationship table. In step S14, the search unit 24 determines whether they match. Since both are "ア" and match, the search unit 24 proceeds to step S19 and determines whether the search character string has a next character. Since it does, the search unit 24 proceeds to step S20, advances the target character position by one, and returns to step S12.

In step S12, the search unit 24 extracts "イ", the second character of the search character string, as the search character, and "エ", the second character of the search target character string, as the search target character. In step S13, the search unit 24 uses a bitwise AND operation to compare them without applying the similarity relationship table. In step S14, the search unit 24 determines whether they match by the bit-based comparison described above. Since the search character "イ" and the search target character "エ" do not match, the search unit 24 proceeds to step S15 and determines whether the applicable count of similarity relationship table A is greater than the similarity relationship table application counter. In this example the applicable count is 2 and the counter is 0, so the search unit 24 proceeds to step S16 and compares the search character with the search target character by applying similarity relationship table A.

According to similarity relationship table A, "イ" and "エ" are in a similarity relationship, so the search unit 24 determines by the bit-based comparison described above that they are similar, proceeds to step S18, and increments the similarity relationship table application counter to 1. The search unit 24 then proceeds to step S19 and determines whether the search character string has a next character. Since it does, the search unit 24 proceeds to step S20, advances the target character position by one, and returns to step S12.

In step S12, the search unit 24 extracts "ウ", the third character of the search character string, as the search character, and "ユ", the third character of the search target character string, as the search target character. In step S13, the search unit 24 uses a bitwise AND operation to compare them without applying the similarity relationship table. In step S14, the search unit 24 determines whether they match. Since the search character "ウ" and the search target character "ユ" do not match, the search unit 24 proceeds to step S15 and determines whether the applicable count of similarity relationship table A is greater than the similarity relationship table application counter. In this example the applicable count is 2 and the counter is 1, so the search unit 24 proceeds to step S16 and compares the search character with the search target character by applying similarity relationship table A.

According to similarity relationship table A, "ウ" and "ユ" are in a similarity relationship, so the search unit 24 determines that they are similar, proceeds to step S18, and increments the similarity relationship table application counter to 2. The search unit 24 then proceeds to step S19 and determines whether the search character string has a next character. Since it does, the search unit 24 proceeds to step S20, advances the target character position by one, and returns to step S12.

In step S12, the search unit 24 extracts "エ", the fourth character of the search character string, as the search character, and "エ", the fourth character of the search target character string, as the search target character.

In step S13, the search unit 24 uses a bitwise AND operation to compare the search character with the search target character without applying the similarity relationship table. In step S14, the search unit 24 determines whether they match. Since both are "エ" and match, the search unit 24 proceeds to step S19 and determines whether the search character string has a next character. Since it does, the search unit 24 proceeds to step S20, advances the target character position by one, and returns to step S12.

In step S12, the search unit 24 extracts "オ", the fifth character of the search character string, as the search character, and "オ", the fifth character of the search target character string, as the search target character.

In step S13, the search unit 24 uses a bitwise AND operation to compare the search character with the search target character without applying the similarity relationship table. In step S14, the search unit 24 determines whether they match. Since both are "オ" and match, the search unit 24 proceeds to step S19 and determines whether the search character string has a next character. Since there is no next character, the search unit 24 proceeds to step S21 and determines whether the similarity relationship table application counter is greater than or equal to the threshold.

In this case, the similarity relationship table application counter is 2 and the threshold is 1, so the search unit 24 determines that the counter is equal to or greater than the threshold, proceeds to step S22, and outputs the search target character string "アエユエオ" to the search result aggregation unit 27 as a character string similar to the search character string "アイウエオ", that is, as a search target character string hit by the search.

In addition, when the applicable number of times of the similarity relationship table A is not greater than the similarity relationship table application counter, the comparison processing need not be performed for the entire search character string. As an example, consider the case where the search target character string is "アエユルレ" and all other conditions are the same as above. Processing up to the third character position of the search character string is the same as described above and is omitted. The character position of interest moves to the fourth position, and in step S14 the search unit 24 determines whether the characters match by the bit-based comparison described above. Here the search character is "ウ" and the search target character is "ル", which do not match, so the search unit 24 proceeds to step S15 and determines whether the applicable number of times of the similarity relationship table A is greater than the similarity relationship table application counter. Because the similarity relationship table A was applied in the comparisons at the second and third positions of the search character string, the applicable number of times of the similarity relationship table A is 2 and the similarity relationship table application counter is 2. The search unit 24 therefore determines that the applicable number of times of the similarity relationship table A is not greater than the similarity relationship table application counter (NO in step S15) and ends the processing shown in FIG. 5.
In other words, the search target character string "アエユルレ" can be determined to be a character string that is not similar to the search character string "アイウエオ" without executing the processing that compares the remaining fourth and fifth characters of the search character string (steps S16 and S13).
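The two worked examples above can be traced with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `compare`, the set `SIMILAR_A`, and the equal-length restriction are hypothetical, and the table holds only the two pairs implied by the example (the actual contents of similarity relationship table A are given in FIG. 4, which is not reproduced here).

```python
# Hypothetical subset of similarity relationship table A: only the pairs
# implied by the worked example (search character, search target character).
SIMILAR_A = {("イ", "エ"), ("ウ", "ユ")}


def compare(query, target, applicable, threshold, table=SIMILAR_A):
    """Return the table application counter on a hit, or None on a miss."""
    if len(query) != len(target):
        return None  # simplification of this sketch: equal lengths only
    counter = 0  # similarity relationship table application counter
    for q, t in zip(query, target):
        if q == t:                 # S13/S14: comparison without the table
            continue
        if applicable <= counter:  # S15: no applications left, stop early
            return None
        if (q, t) not in table:    # S16: comparison with the table applied
            return None
        counter += 1               # one more table application was used
    # S21/S22: a hit only if the counter reaches the threshold
    return counter if counter >= threshold else None


print(compare("アイウエオ", "アエユエオ", applicable=2, threshold=1))
print(compare("アイウエオ", "アエユルレ", applicable=2, threshold=1))
```

With an applicable number of 2 and a threshold of 1, the first call reports a hit with a counter of 2, while the second stops at the fourth character because both permitted table applications are already used.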

As described above, according to the present embodiment, the type of sound is represented by the position of the bit set to 1 in a bit string, and comparison is performed using a bit product operation, which shortens the comparison time (or search time). That is, calculation efficiency can be improved.

Further, according to the present embodiment, the applicable number of times and the threshold of the similarity relationship table are set according to the rule, and comparison is performed within this range. Unnecessary comparison processing is therefore avoided, and the comparison time (or search time) can be shortened; that is, calculation efficiency can be improved. Because unnecessary similar expansion is also avoided, memory efficiency can be improved. In addition, compared with a method using an automaton, the user can set rules more easily and the rules are easier to understand, so maintainability is good.

For simplicity, the description above used comparison with a single search target character string as an example, but it goes without saying that the present application can also handle comparison with all of the multiple search target character strings held in the search target character string holding unit 25. For example, if the search target character string holding unit 25 holds the search target character strings in a trie structure, backtracking is used; if it holds them in a list format, the processing shown in FIG. 5 is repeated. In either case, comparison is performed with all the search target character strings held in the search target character string holding unit 25, and the results can be output to the search result aggregation unit 27.
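For the trie case, one possible reading is a depth-first walk that abandons a branch as soon as a character can be neither matched nor covered by a remaining table application, then backtracks to the next sibling. The sketch below assumes a nested-dict trie, equal-length strings, and a hypothetical two-pair table; `build_trie` and `search_trie` are illustrative names, not names from the source.

```python
SIMILAR_A = {("イ", "エ"), ("ウ", "ユ")}  # hypothetical subset of table A


def build_trie(words):
    """Build a nested-dict trie; the key None marks the end of a string."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[None] = w
    return root


def search_trie(root, query, applicable, threshold, table=SIMILAR_A):
    """Collect (target, counter) pairs for every hit stored in the trie."""
    hits = []

    def walk(node, pos, counter):
        if pos == len(query):
            if None in node and counter >= threshold:
                hits.append((node[None], counter))
            return
        q = query[pos]
        for ch, child in node.items():
            if ch is None:
                continue
            if ch == q:
                walk(child, pos + 1, counter)
            elif counter < applicable and (q, ch) in table:
                walk(child, pos + 1, counter + 1)
            # Otherwise the subtree is skipped; returning from walk()
            # backtracks to the next sibling branch.

    walk(root, 0, 0)
    return hits


words = ["アエユエオ", "アエユルレ", "アイウエオ", "アイユエオ"]
print(search_trie(build_trie(words), "アイウエオ", applicable=2, threshold=1))
```

Shared prefixes are compared once for all the strings below them, whereas the list format repeats the single-string comparison for every entry.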

<Embodiment 2>
FIG. 6 is a functional configuration diagram (part 2) of an example of the information processing apparatus 1. As shown in FIG. 6, the information processing apparatus 1 includes, as its functional configuration, a search character string acquisition unit 21, a setting unit 22, a rule holding unit 23, a search unit 24, a search target character string holding unit 25, a similarity relationship table holding unit 26, a search result aggregation unit 27, and a rule operation unit 28.

The functional configuration other than the rule operation unit 28 is the same as in Embodiment 1, so its description is omitted in this embodiment.
In response to a user operation using the input device 11, the display device 12, and the like, the rule operation unit 28 changes a rule held in the rule holding unit 23 and the applicable number of times, threshold, and the like of the similarity relationship table corresponding to that rule, or sets a new rule and the corresponding applicable number of times, threshold, and the like of the similarity relationship table in the rule holding unit 23.

For example, Embodiment 1 was described using the rule "one to two sounds are in the relationship of similarity relationship table A". To change this rule to "zero to two sounds are in the relationship of similarity relationship table A", the rule operation unit 28, in response to a user operation using the input device 11, the display device 12, and the like, changes the text "one to two sounds are in the relationship of similarity relationship table A" in the file held in the rule holding unit 23 to "zero to two sounds are in the relationship of similarity relationship table A". In accordance with the rule, the rule operation unit 28 also changes the threshold, among the applicable number of times and threshold of the similarity relationship table held in the rule holding unit 23 as shown in FIG. 3, from 1 to 0. With this change, the specification becomes one in which a search target character string that exactly matches the search character string is also output to the search result aggregation unit 27 as a character string similar to the search character string, that is, as a search target character string hit by the search.
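The effect of this rule change can be illustrated with a small sketch. The `Rule` record and the `is_hit` helper are hypothetical; they assume that a rule "m to n sounds" maps to a threshold of m and an applicable number of times of n, as in the example above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    table: str       # name of the similarity relationship table
    threshold: int   # minimum number of table applications for a hit
    applicable: int  # maximum number of table applications allowed


def is_hit(counter, rule):
    """A comparison that finished with `counter` applications is a hit."""
    return rule.threshold <= counter <= rule.applicable


rule_1_to_2 = Rule(table="A", threshold=1, applicable=2)
rule_0_to_2 = replace(rule_1_to_2, threshold=0)  # the change described above

# An exact match finishes with a counter of 0.
print(is_hit(0, rule_1_to_2), is_hit(0, rule_0_to_2))
```

Only the threshold differs between the two rules, so the comparison procedure itself does not need to change.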

As described above, according to the present embodiment, the user can more easily set and change rules and the applicable number of times, threshold, and the like of the similarity relationship table corresponding to each rule. That is, maintainability can be improved.

<Other Embodiments>
Embodiment 1 was described using the rule "one to two sounds are in the relationship of similarity relationship table A". However, the rule may also be, for example, "one to two sounds are in the relationship of similarity relationship table A, similarity relationship table B, or similarity relationship table C". In such a case, in step S16 of FIG. 5, the search unit 24 applies similarity relationship table A, similarity relationship table B, and similarity relationship table C to compare whether the search character and the search target character are similar. If a priority order is assigned to the tables, the search unit 24 performs the comparison using the similarity relationship tables in accordance with that priority order.

In such a case, the applicable number of times of the similarity relationship table may be set for each table, or a single value may be set for all tables. Likewise, the threshold may be set for each table, or a single value may be set for all tables.
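One way to picture the multi-table case is a lookup that tries the tables in priority order and keeps a separate application counter per table. Everything in the sketch below is an assumption made for illustration: the contents of tables A and B, the per-table applicable numbers, and the name `match_with_tables`.

```python
# Hypothetical tables in priority order: (name, similar pairs, applicable number).
TABLES = [
    ("A", {("イ", "エ"), ("ウ", "ユ")}, 2),
    ("B", {("イ", "エ"), ("オ", "ヲ")}, 1),
]


def match_with_tables(q, t, tables, counters):
    """Return the name of the first usable table that relates q to t.

    `counters` maps a table name to the number of times it has been applied
    so far and is updated in place when a table is used.
    """
    for name, pairs, applicable in tables:
        used = counters.get(name, 0)
        if used < applicable and (q, t) in pairs:
            counters[name] = used + 1
            return name
    return None


counters = {}
print(match_with_tables("イ", "エ", TABLES, counters), counters)
```

Setting a single applicable number for all tables would correspond to replacing the per-table counters with one shared counter.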

In the embodiments described above, in step S22 of FIG. 5, the search unit 24 outputs the search target character string under consideration to the search result aggregation unit 27 as a character string similar to the search character string, that is, as a search target character string hit by the search.

However, in step S22 of FIG. 5, the search unit 24 may output the search target character string to the search result aggregation unit 27 together with the similarity relationship table application counter. The search result aggregation unit 27 can then perform weighting according to the similarity relationship table application counter and output the search results according to this weighting.

For example, a search target character string whose similarity relationship table application counter is 1 can be said to be more similar to the search character string than one whose counter is 2. The search result aggregation unit 27 may therefore give a higher weight to the search target character string whose counter is 1 than to the one whose counter is 2 and, for example, output the search target character strings as search results in descending order of weight (that is, most similar first).

In this way, the user can obtain the search target character strings in order of, for example, their similarity to the search character string.
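A minimal sketch of this ordering, assuming the search unit reports each hit as a (search target character string, counter) pair and that a smaller counter simply means a higher weight; the function name `rank_hits` is hypothetical and the source does not prescribe a particular weighting formula.

```python
def rank_hits(hits):
    """Order hits from most to least similar (fewest table applications first).

    `hits` is a list of (target, counter) pairs; sorted() is stable, so hits
    with equal counters keep their original relative order.
    """
    return [target for target, _counter in sorted(hits, key=lambda h: h[1])]


print(rank_hits([("アエユエオ", 2), ("アイユエオ", 1)]))
```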

In the embodiments described above, the applicable number of times of the similarity relationship table is set for the search character string as a whole. However, as shown in FIG. 7 or FIG. 8, the applicable number of times may be set for each character (single character) constituting the search character string, or for each set of characters (that is, two characters, three characters, and so on) constituting the search character string. When the applicable number of times is set for each set of characters, the search character string and the search target character string are compared not one character at a time but one set of characters (that is, two characters, three characters, and so on) at a time.

In the embodiments described above, the similar character corresponding to one applicable character is a single character. However, as shown in FIG. 9, the similar character corresponding to one applicable character may be a character string consisting of multiple characters.

In the embodiments described above, a table (similarity relationship table) is used to express the similarity relationship between characters (or the sounds of characters). However, the present invention is not limited to a table; for example, a file (a file in which the similarity relationship is described) may be used instead.

In the embodiments described above, the search character string and the search target character string are compared starting from the beginning of each string. However, the present invention is not limited to starting from the beginning; for example, the comparison may start from the end of each string or from any specific position.

In the embodiments described above, the search character string acquisition unit 21 acquires, as the search character string, a character string that the user wants to search for and has entered on a search screen or the like. However, the present invention is not limited to this. The search character string acquisition unit 21 may, for example, acquire a character string read by an OCR (Optical Character Reader) or the like as the character string to be searched for. In this case, for example, the information processing apparatus 1 may correspond to an image forming apparatus or the like and read the character string itself, or it may acquire the character string from an image forming apparatus that has read it. The OCR reads, for example, an underlined character string as the search character string.

Embodiment 1 was described using the rule "one to two sounds are in the relationship of similarity relationship table A", so the search was a similarity search. However, the present invention is not limited to similarity search; a sequence of symbols that follows a certain rule may be searched for. For example, suppose that DNA sequences have the same property as the partial DNA sequence "A (adenine) G (guanine) C (cytosine) T (thymine)" if the first and second amino acid sequences from the beginning are the same. To search for DNA with the same property under this condition, the application relationship table, applicable number of times, and threshold shown in FIG. 10 are set and processing is performed in the same way as in the embodiments described above. DNA sequences with the same property, such as "AGGT", "AGCA", and "AGTT", can then be retrieved. The application relationship table here is a table that plays the same role as the similarity relationship table (FIG. 4) in Embodiment 1.
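The DNA example can be sketched as below. FIG. 10 is not reproduced here, so the settings are assumptions chosen only to reproduce the stated results: the application relationship table relates every symbol to every other symbol, the applicable number of times is set per position (0 for the first two positions, 1 for the last two), and the threshold is 0. The names `APPLY_TABLE`, `PER_POSITION_APPLICABLE`, and `same_property` are hypothetical.

```python
SYMBOLS = "AGCT"

# Assumed application relationship table: any symbol may stand for any other.
APPLY_TABLE = {(a, b) for a in SYMBOLS for b in SYMBOLS if a != b}

# Assumed applicable number of times per position: the first two positions
# must match exactly, the last two may each use the table once.
PER_POSITION_APPLICABLE = [0, 0, 1, 1]


def same_property(query, target):
    """True if target matches query under the assumed table and limits."""
    if len(query) != len(target) or len(query) != len(PER_POSITION_APPLICABLE):
        return False
    for q, t, limit in zip(query, target, PER_POSITION_APPLICABLE):
        if q == t:
            continue
        if limit < 1 or (q, t) not in APPLY_TABLE:
            return False
    return True  # threshold assumed to be 0, so an exact match also hits


for candidate in ["AGGT", "AGCA", "AGTT", "ATCT"]:
    print(candidate, same_property("AGCT", candidate))
```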

In the embodiments described above, the search target character string is a character string in which part or all of the search character string has been replaced. However, the present invention is not limited to this; the search target character string may be one in which any number of spaces, characters, symbols, and the like have been inserted at any position in the search character string.

In the embodiments described above, similarity search is described using character strings and symbols as examples. However, the present invention is not limited to these and may be applied to searches between any kind of data strings, such as audio or images.

As described above, according to each of the embodiments, it is possible to provide a search device, a search method, and a program that offer good system maintainability together with good calculation efficiency and memory efficiency.
The embodiments described above may be implemented in any combination.

The preferred embodiments of the present invention have been described in detail above. However, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

FIG. 1 is a hardware configuration diagram of an example of the information processing apparatus 1.
FIG. 2 is a functional configuration diagram (part 1) of an example of the information processing apparatus 1.
FIG. 3 is a diagram (part 1) showing an example of the applicable number of times and the threshold of the similarity relationship table.
FIG. 4 is a diagram (part 1) showing an example of the similarity relationship table.
FIG. 5 is a flowchart showing an example of the comparison processing.
FIG. 6 is a functional configuration diagram (part 2) of an example of the information processing apparatus 1.
FIG. 7 is a diagram (part 2) showing an example of the applicable number of times and the threshold of the similarity relationship table.
FIG. 8 is a diagram (part 3) showing an example of the applicable number of times and the threshold of the similarity relationship table.
FIG. 9 is a diagram (part 2) showing an example of the similarity relationship table.
FIG. 10 is a diagram showing an example of the similarity relationship table together with the applicable number of times and the threshold of the similarity relationship table.

Explanation of symbols

1 Information processing apparatus
11 Input device
12 Display device
13 Recording medium drive device
14 Recording medium
15 ROM
16 RAM
17 CPU
18 Interface device
19 HD
21 Search character string acquisition unit
22 Setting unit
23 Rule holding unit
24 Search unit
25 Search target character string holding unit
26 Similarity relationship table holding unit
27 Search result aggregation unit
28 Rule operation unit

Claims

Search target data string holding means for storing the search target data string;
A search data string acquisition means for acquiring a search data string;
A range of the number of times of using an object that defines a similar relationship as to whether or not the search data sequence and the search target data sequence are similar to the search data sequence acquired by the search data sequence acquisition means Setting means for setting according to a predetermined rule;
As search processing, within the range of the number of times set in the setting means, the search data string acquired in the search data string acquisition means, the search target data string held in the search target data string holding means, Search means for performing a comparison as to whether or not there is a similarity relationship defined for the object;
Have
A search device characterized in that the search means compares, for each unit data and within the range of the number of times set by the setting means, the unit data constituting the search data string with the unit data constituting the search target data string using the object and, when there is unit data that does not match, compares, using the object and within the range of the number of times set by the setting means, whether the non-matching unit data are in the similarity relationship defined in the object.

Search target data string holding means for storing the search target data string;
A search data string acquisition means for acquiring a search data string;
A range of the number of times of using an object that defines a similar relationship as to whether or not the search data sequence and the search target data sequence are similar to the search data sequence acquired by the search data sequence acquisition means Setting means for setting according to a predetermined rule;
As search processing, within the range of the number of times set in the setting means, the search data string acquired in the search data string acquisition means, the search target data string held in the search target data string holding means, Search means for performing a comparison as to whether or not there is a similarity relationship defined for the object;
Have
The setting means uses the table that holds the relationship between unit data and similar unit data related to the search target data that can be applied to the search data string to be output as a search result as an object, and the search data string acquisition means For the search data string acquired in step 1, the range of the number of times the object is used is set according to a predetermined rule,
A search device characterized in that the search means compares, for each unit data and within the range of the number of times set by the setting means, the unit data constituting the search data string with the unit data constituting the search target data string using the object and, when there is unit data that does not match, compares, using the object and within the range of the number of times set by the setting means, whether the non-matching unit data are in the similarity relationship defined in the object.

Search target data string holding means for storing the search target data string;
A search data string acquisition means for acquiring a search data string;
A range of the number of times of using an object that defines a similar relationship as to whether or not the search data sequence and the search target data sequence are similar to the search data sequence acquired by the search data sequence acquisition means Setting means for setting according to a predetermined rule;
As search processing, within the range of the number of times set in the setting means, the search data string acquired in the search data string acquisition means, the search target data string held in the search target data string holding means, Search means for performing a comparison as to whether or not there is a similarity relationship defined for the object;
Have
A search device characterized in that the search means counts the number of times the object is used to compare the unit data constituting the search data string with the unit data constituting the search target data string, weights the degree of applicability between the search data string and the search target data string according to the counted number, and outputs the search target data strings applicable to the search data string in accordance with the weighting.

The search device according to any one of claims 1 to 3, characterized in that the search means uses a bit product operation to compare the unit data constituting the search data string with the unit data constituting the search target data string.

There are a plurality of the objects,
The search device according to claim 1, wherein the search means, as the search process and within the range of the number of times set by the setting means, uses one of the plurality of objects in accordance with a priority order to compare the search data string acquired by the search data string acquisition means with the search target data string held in the search target data string holding means.

A search method in a search device,
A search data string acquisition stage for acquiring a search data string;
For the search data string acquired in the search data string acquisition step, a range of the number of times to use an object that defines a similar relationship as to whether the search data string and the search target data string are similar A setting stage for setting according to a predetermined rule;
As a search process, within the range of the number of times set in the setting stage, the search data string acquired in the search data string acquisition stage, the search target data string held in the search target data string holding stage, A search stage for comparing whether or not there is a similarity relationship defined for the object;
Including
A search method characterized in that, in the search stage, the unit data constituting the search data string is compared with the unit data constituting the search target data string, for each unit data, using the object and within the range of the number of times set in the setting stage, and, when there is unit data that does not match, whether the non-matching unit data are in the similarity relationship defined in the object is compared using the object and within the range of the number of times set in the setting stage.

A search method in a search device,
A search data string acquisition stage for acquiring a search data string;
For the search data string acquired in the search data string acquisition step, a range of the number of times to use an object that defines a similar relationship as to whether the search data string and the search target data string are similar A setting stage for setting according to a predetermined rule;
As a search process, within the range of the number of times set in the setting stage, the search data string acquired in the search data string acquisition stage, the search target data string held in the search target data string holding stage, A search stage for comparing whether or not there is a similarity relationship defined for the object;
Including
In the setting step, the search data string acquisition step is performed using, as an object, a table holding a relationship between the unit data and the similar unit data related to the search target data applicable to the search data string to be output as a search result. For the search data string acquired in step 1, the range of the number of times the object is used is set according to a predetermined rule,
A search method characterized in that, in the search stage, the unit data constituting the search data string is compared with the unit data constituting the search target data string, for each unit data, using the object and within the range of the number of times set in the setting stage, and, when there is unit data that does not match, whether the non-matching unit data are in the similarity relationship defined in the object is compared using the object and within the range of the number of times set in the setting stage.

A search method in a search device,
A search data string acquisition stage for acquiring a search data string;
For the search data string acquired in the search data string acquisition step, a range of the number of times to use an object that defines a similar relationship as to whether the search data string and the search target data string are similar A setting stage for setting according to a predetermined rule;
As a search process, within the range of the number of times set in the setting stage, the search data string acquired in the search data string acquisition stage, the search target data string held in the search target data string holding stage, A search stage for comparing whether or not there is a similarity relationship defined for the object;
Including
A search method characterized in that, in the search stage, the number of times the object is used to compare the unit data constituting the search data string with the unit data constituting the search target data string is counted, the degree of applicability between the search data string and the search target data string is weighted according to the counted number, and the search target data strings applicable to the search data string are output in accordance with the weighting.

A program for causing a computer to execute the search method according to any one of claims 6 to 8.