JP2007034870A

JP2007034870A - Language processor based on concept of distance

Info

Publication number: JP2007034870A
Application number: JP2005219908A
Authority: JP
Inventors: Hidehiko Murata; 秀彦村田
Original assignee: TEAM LAB Inc
Current assignee: TEAM LAB Inc
Priority date: 2005-07-29
Filing date: 2005-07-29
Publication date: 2007-02-08
Anticipated expiration: 2025-07-29
Also published as: JP4705430B2

Abstract

PROBLEM TO BE SOLVED: To provide a character evaluation system that allows rapid and easy retrieval of similar character strings.

SOLUTION: The system evaluates similarity by finding a 'distance' between the two character strings being compared, taking an evaluation value based on that distance as the similarity, and comparing evaluation values. Specifically, the evaluation system for evaluating the similarity between a stored character string and an input character string has: a character string input means 2; a character number calculation means 3; a numerical value allocation means 4 for allocating a numerical value of n/N to each character of the input character string such that the characters between the head and the end have equal intervals; a storage means 5 for allocating a numerical value of m/M to the m-th character of the stored character string and storing it; a character consistency decision means 6; a difference means 7; a square means 8; and an evaluation value calculation means 9.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to devices that use a geometric technique to calculate the distance between words, such as a device for correcting variation in how a word is written and a device for calculating similarity.

In an information retrieval device that takes keywords as input, relevant results can be missed because of variation in how a keyword is written. Various similar-keyword search devices have been devised to eliminate such omissions.

For example, when no character string matching the input keyword exists in the documents being searched (that is, when there is no "hit"), one such similar-keyword search device automatically determines and extracts character strings similar to the input keyword and runs a new search using those similar strings as keywords. Even when there is a hit, any other keywords similar to the input keyword are also included in the search.

Various methods of determining similar keywords have been devised for such devices. For example, one method formulates rules for character string replacements that are likely to occur and applies them to the input keyword. The various character strings obtained by replacement are compared with the database dictionary (word index) or the like, and those found in the vocabulary are regarded as similar character strings. Another method aligns the input character string with word index entries character by character, compares them, and uses the entries that are similar as similar character strings.

When similar keywords are determined with the rule-based replacement method described above, maintaining the replacement rules is burdensome. If the replacement rules are not appropriate, nothing is hit and no similar keyword can be output. In practice, an input keyword often fails to hit because it is a new word, a coined word, a proper noun, or the like, so the rule-based method cannot always cope.

Similar character string search devices have therefore been studied that can accurately determine and output keywords similar to the input keyword, so that variation in keyword notation does not cause search omissions. As one example, Japanese Patent No. 3531222 (Patent Document 1 below) discloses the following invention.

It is a similar character string search device having a function of outputting character strings similar to an input character string, characterized by comprising: a character replacement table that associates a character with the characters that can replace it; a character component table creation unit that takes the character component table of the input character string, which holds a list of the characters constituting the input character string and information indicating the position at which each character appears in the input character string, and, when the character replacement table shows that the input character string contains a replaceable character, creates a character component table to which the replacement character and information indicating its appearance position are added, the appearance position of the replacement character being that of the character it replaces in the input character string; a dictionary in which various character strings are registered; a character string correspondence table creation unit that searches the character component table for each character of a character string registered in the dictionary, retrieves the appearance position of each character that is found, and creates a character string correspondence table representing the one-to-one correspondence between the characters of the registered character string and those of the input character string; and a similarity determination unit that, for each character string registered in the dictionary, obtains an evaluation value of the similarity between the input character string and the registered character string by starting from the value obtained when all characters of the two strings correspond as optimally corresponding characters and deducting points, based on the character string correspondence table, for the characters that do not correspond, and that determines the similarity between the input character string and each registered character string in a way that reflects likely character replacements by making the deduction for substituting a replacement character smaller than the deduction for a character mismatch (see claim 1 of Patent Document 1 below).

However, such a similar character string search device must create a group of replacement characters by referring to the character replacement table, and evaluating similarity requires a large amount of computation, so it cannot quickly retrieve character strings with high similarity.

Patent Document 1: Japanese Patent No. 3531222

The basic object of the present invention is to provide a character evaluation system that can retrieve similar character strings quickly and easily.

Another object of the present invention is to provide an evaluation system that can correct an input word to a pre-registered word and can thereby easily tabulate questionnaire data in which various words are entered for the same concept.

A further object of the present invention is to provide a system that can correct an input character string to a pre-registered character string and that has an automatic word correction function.

The above problems are basically solved by a system that obtains the "distance" between the two character strings being compared, takes an evaluation value based on that distance as the similarity, and evaluates similarity by comparing evaluation values.

The system of the present invention evaluates similarity based on the concept of a "distance" between characters and does not require a replacement character table. It therefore has no need to evaluate the similarity of multiple groups of replacement characters generated from such a table, and it can evaluate similarity with a simple algorithm. As a result, the system is easy to implement, evaluates similarity quickly and accurately, and can retrieve and extract character strings with high similarity.

According to the system of the present invention, a replacement means replaces the input character string with the stored character string that has the smallest evaluation value, so the input character string can be corrected to a pre-registered character string. The system can therefore be used, for example, to easily tabulate questionnaire data in which various words are entered for the same concept.

In addition, because the replacement means replaces the input character string with the stored character string that has the smallest evaluation value, the input can be corrected to a pre-registered word, so the system can be used as word processing software with an automatic word correction function. The system can also obtain the similarity between the input word and each word stored in a database containing various words, and the similarity required for a word to be selected can be set, so it can be used as the retrieval system of an Internet search engine.

An evaluation system according to a first aspect (pattern example) of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the evaluation system according to the first aspect. This evaluation system is particularly effective for evaluating the similarity between an input character string and a stored character string that have the same number of characters.

As shown in FIG. 1, the system 1 according to the first embodiment is an evaluation system for evaluating the similarity between an input character string and a stored character string, comprising: a character string input means 2 for inputting a character string; a character number calculation means 3 for calculating the number of characters in the input character string entered through the character string input means; a numerical value allocation means 4 for allocating a numerical value of n/N to the n-th character of the input character string, where N is the number of characters calculated by the character number calculation means, so that the first character is 0, the last character is 1, and the characters in between are equally spaced; a storage means 5 for storing various stored character strings and for allocating and storing a numerical value of m/M for the m-th character of each stored character string, where M is the number of characters in the stored character string, so that the first character is 0, the last character is 1, and the characters in between are equally spaced; a character consistency decision means 6 for determining whether each character of the input character string matches each character of a stored character string held in the dictionary; a difference means 7 for obtaining, when the character consistency decision means determines that there is a matching character, the difference between the numerical value allocated to that character of the input character string and the value allocated to the matching character of the stored character string; a square means 8 for squaring the difference obtained by the difference means; and an evaluation value calculation means 9 for obtaining an evaluation value by summing the values obtained by the square means for each character. In the figure, 10 denotes memory such as ROM, RAM, external memory, or a hard disk, and 11 denotes an output device such as a printer, a monitor, or an external terminal.

The evaluation system of the present invention may be a computer such as a personal computer, or a computer server connected to the Internet or an intranet. FIG. 2 shows an example of the system as a computer connected to the Internet. As shown in FIG. 2, the computer 21 of the present invention is connected to other computers 23 via the Internet 22. The form of connection is not particularly limited: it may be a wireless LAN or the like, and the system may be able to exchange information with mobile phones and similar devices. Such a computer comprises, for example, a main memory storing a program that causes the computer to function as each of the above means, an input device for entering information into the system, an output device for outputting information from the system, a storage device such as a database that stores information temporarily or semi-permanently, and a central processing unit (CPU) that performs the various computations.

An example of the operation of the system according to the first aspect is described below. FIG. 3 is a flowchart explaining this operation; in FIG. 3, S denotes a step. First, a character string input means such as an input device inputs a character string into the system (S101). This character string is called the "input character string". The character string may be input to, for example, a server connected to the Internet or to a computer. Here, assume that the character string "ABC" has been input to the computer.

Next, the character number calculation means calculates the number of characters in the input character string (S102). The count may be calculated by hardware such as a character counting circuit. Alternatively, a character number calculation means such as a CPU may calculate it under instructions from a control program in main memory. In this specification, the number of characters in the input character string is denoted N. For the input character string "ABC", the number of characters is calculated as 3. This can be done, for example, by splitting the input character string "ABC" into individual characters and taking the number of characters obtained as the count. The n-th character is assigned the ordinal n; in this example, the characters are numbered from 1 to 3.

The numerical value allocation means then allocates a numerical value (coordinate value) to each character of the input character string so that the first character is 0, the last character is 1, and the characters in between are equally spaced: with N being the number of characters calculated by the character number calculation means, the n-th character of the character string is allocated the value (n−1)/(N−1) (S103). A CPU or the like may allocate these values under instructions from a control program in main memory. For example, n−1 and N−1 may be obtained by a difference circuit or difference program, and (n−1)/(N−1) by a division circuit or division program. Alternatively, a table storing, for each character count, the coordinate value of each character may be prepared and the coordinate values read from it. N is, for example, 20 or less, and preferably 15 or less. For the input character string "ABC" above, 0 is allocated to the character A, 0.5 (from 1/2) to the character B, and 1 to the character C. This is written (A, B, C) = (0, 0.5, 1). If the input character string is "ABCDE", the values are allocated as (A, B, C, D, E) = (0, 0.25, 0.5, 0.75, 1).
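The coordinate allocation in S103 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_coordinates` is hypothetical, and the handling of strings with fewer than two characters is an assumption, since the text does not cover N < 2.

```python
def assign_coordinates(text: str) -> list[tuple[str, float]]:
    """Pair each character with the coordinate (n-1)/(N-1), for n = 1..N."""
    total = len(text)  # N, the character count obtained in S102
    if total < 2:
        # Assumption: the source does not define N < 2; a lone character gets 0.
        return [(ch, 0.0) for ch in text]
    return [(ch, (n - 1) / (total - 1)) for n, ch in enumerate(text, start=1)]


print(assign_coordinates("ABC"))
print(assign_coordinates("ABCDE"))
```

The two calls reproduce the allocations given in the text for "ABC" and "ABCDE".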

Meanwhile, a storage means such as a stored dictionary or database holds various stored character strings. Each character of a stored character string is allocated a numerical value so that the first character is 0, the last character is 1, and the characters in between are equally spaced: with M being the number of characters in the stored character string, the m-th character is allocated the value (m−1)/(M−1), which is stored. This information may be stored on a computer's hard disk, in ROM, or the like. It is preferable to store, in association with one another, the stored character string, the characters that constitute it, its number of characters, and the coordinate value of each character, for example as a table or database. When the information is stored as a database, a known data model such as the hierarchical, network, or relational data model can be used to classify and organize the data. Of these, the relational data model is particularly preferable because it makes data easily independent of the application, makes it easy to create new tables in response to input information, and makes data manipulation easy.

Here, the relational data model is a data model that uses the concept of a two-dimensional table and manages data so that each vertical column of the table corresponds to an item and each horizontal row to a record. Operations such as projection, selection, join, and division can be used to extract arbitrary data from a table and create a new table (a view, for example). A new table can also be created by extracting data from several tables. A table consists of multiple tuples. Each tuple consists of several pieces of information and forms a row. Data with the same attribute are arranged vertically in the table to form a column. Each tuple is preferably given a numerical value for management purposes (an index), and the values for the set of tuples preferably form a domain. Setting an index is preferable because it allows the storage location of the data on the storage device to be reached quickly. If no index is assigned, the database is preferably organized systematically so that information can be searched using keywords or the like. Commands for the relational database may be written in a language such as SQL.

For example, suppose that "ABC" and "ACB" are stored in the storage means. In this case, (A, B, C) = (0, 0.5, 1) and (A, C, B) = (0, 0.5, 1) are stored. When this information is stored in a relational database, the stored character string ABC, the character count 3, the coordinate value of A (0), the coordinate value of B (0.5), and the coordinate value of C (1) are, for example, stored in association with one another, each such group forming a tuple.
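As a rough illustration of how stored character strings and their coordinate values might be held in a relational table, the following sketch uses Python's standard `sqlite3` module with an in-memory database. The table name `stored_char`, its column names, and the one-row-per-character layout are assumptions made for this example; the text only requires that the string, its characters, its character count, and the coordinate values be associated with one another.

```python
import sqlite3


def build_store(words: list[str]) -> sqlite3.Connection:
    """Create an in-memory table with one row per character of each stored string."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stored_char ("
        "word TEXT, length INTEGER, position INTEGER, ch TEXT, coord REAL)"
    )
    for word in words:
        total = len(word)  # M, the character count of the stored string
        for m, ch in enumerate(word, start=1):
            # (m-1)/(M-1); a single-character string gets 0 (assumption).
            coord = (m - 1) / (total - 1) if total > 1 else 0.0
            conn.execute(
                "INSERT INTO stored_char VALUES (?, ?, ?, ?, ?)",
                (word, total, m, ch, coord),
            )
    return conn


conn = build_store(["ABC", "ACB"])
rows = conn.execute(
    "SELECT ch, coord FROM stored_char WHERE word = ? ORDER BY position",
    ("ACB",),
).fetchall()
print(rows)
```

Querying the rows for "ACB" returns its characters in order with the coordinates 0, 0.5, and 1, matching the stored values described above.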

The character consistency decision means determines whether each character constituting the input character string matches each character constituting a stored character string held in the dictionary (S104). This means may be implemented in hardware, such as a circuit that determines character matches. Alternatively, a CPU or the like may determine character matches under instructions from a control program in main memory. Such a program can easily be created using known software such as Excel (registered trademark) or Access (registered trademark).

When the character consistency decision means determines that a character of the input character string matches a character of the stored character string, the difference means obtains the difference between the numerical value allocated to that character of the input character string and the numerical value of the matching character in the stored character string (S105). The difference means may be a difference circuit that computes the difference between two values, or the difference may be looked up in a table of stored difference values. In either case, two coordinate values are input to the circuit or table and the corresponding difference is obtained. The difference means may also be realized by a program that causes a computer to function as a means for obtaining the difference. Such a program can easily be created using known software.

For example, when the input character string and the stored character string are both "ABC", the per-character differences are Δ = (0, 0, 0). When the input character string is "ABC" and the stored character string is "ACB", Δ = (0, −0.5, 0.5): A sits at 0 in both strings, B sits at 0.5 in the input and at 1 in the stored string, and C sits at 1 in the input and at 0.5 in the stored string.
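Steps S104 and S105 can be sketched as follows. The function names are hypothetical, and the sketch assumes that each character occurs at most once in each string and that characters with no match in the stored string are simply skipped; the text for the first aspect does not say how repeated or unmatched characters are treated.

```python
def coordinates(text: str) -> dict[str, float]:
    """Map each character to (n-1)/(N-1); assumes no repeated characters."""
    total = len(text)
    if total < 2:
        return {ch: 0.0 for ch in text}
    return {ch: i / (total - 1) for i, ch in enumerate(text)}


def differences(input_text: str, stored_text: str) -> list[tuple[str, float]]:
    """For each input character that also occurs in the stored string (S104),
    return the input coordinate minus the stored coordinate (S105)."""
    stored = coordinates(stored_text)
    return [
        (ch, x - stored[ch])
        for ch, x in coordinates(input_text).items()
        if ch in stored  # unmatched characters are skipped (assumption)
    ]


print(differences("ABC", "ABC"))
print(differences("ABC", "ACB"))
```

For "ABC" against "ACB" this yields the differences 0, −0.5, and 0.5 for A, B, and C, as in the example above.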

The square means squares each difference obtained by the difference means (S106). The square means may be a squaring circuit, or it may consist of a table that stores the squares of input values and from which the squared value is looked up. It may also be a CPU or the like that obtains the squared value under instructions from a control program in main memory. Such a program can easily be created using known software.

For example, when the input character string and the stored character string are both "ABC", the per-character differences were Δ = (0, 0, 0), so the squared differences are all 0. When the input character string is "ABC" and the stored character string is "ACB", the differences were Δ = (0, −0.5, 0.5), so the squared values for the characters of the input character string "ABC" are (0, 0.25, 0.25).

The evaluation value calculation means obtains the evaluation value by summing the values that the square means obtained for each character (S107). This means may be a circuit for computing the sum or hardware such as a table that stores sums. Alternatively, a CPU or the like may compute the sum of the squared values under instructions from a control program in main memory. Such a program can easily be created using known software. The evaluation value obtained is stored in memory or the like in association with the stored character string. The evaluation value or its stored character string is then output by the output device: specifically, it is displayed on a monitor or printed by a printer. The information may also be uploaded to a website.

For example, when the input character string and the stored character string are both "ABC", the squared differences are all 0, so the evaluation value, which is their sum, is also 0. When the input character string is "ABC" and the stored character string is "ACB", the squared differences for the characters of the input character string are (0, 0.25, 0.25), so the evaluation value, their sum, is 0.5. In other words, the evaluation value is 0 when the input character string and the stored character string match exactly, and it grows as they diverge. In this way the system can evaluate the similarity between the input character string and a stored character string. The system takes differences of coordinate values, squares them, and adds the squares together, which resembles computing a (squared) distance and comparing (squared) distances. This is why the system of the present invention can be regarded as evaluating character similarity based on the concept of distance.
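Putting S103 through S107 together, the evaluation value can be sketched as a sum of squared coordinate differences. The function names are hypothetical, and the sketch assumes that no character repeats within a string and that unmatched characters contribute nothing, neither of which the text specifies.

```python
def coordinates(text: str) -> dict[str, float]:
    """Map each character to (n-1)/(N-1); assumes no repeated characters."""
    total = len(text)
    if total < 2:
        return {ch: 0.0 for ch in text}
    return {ch: i / (total - 1) for i, ch in enumerate(text)}


def evaluation_value(input_text: str, stored_text: str) -> float:
    """Sum of squared coordinate differences over the matching characters."""
    stored = coordinates(stored_text)
    return sum(
        (x - stored[ch]) ** 2  # S105 difference, S106 square
        for ch, x in coordinates(input_text).items()
        if ch in stored  # S104: only characters present in both strings
    )  # S107: sum of the squares


print(evaluation_value("ABC", "ABC"))
print(evaluation_value("ABC", "ACB"))
```

With these assumptions the sketch gives 0 for identical strings and 0.5 for "ABC" against "ACB", in line with the worked example.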

In the example above, the evaluation value was obtained by taking the differences of coordinate values, squaring them, and summing the squares. However, this sum may instead be treated as a provisional evaluation value, and the evaluation value may be the provisional value divided by a given number (for example, N, M, or N+M). The square root of the provisional evaluation value may also be used as the evaluation value. Furthermore, the evaluation value may be 1, or some other number, divided by the provisional evaluation value; in that case, a larger evaluation value means higher similarity.
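The alternative evaluation values listed above are simple transformations of the provisional value. A minimal sketch follows; the function name and dictionary keys are hypothetical, and returning infinity for the reciprocal of a zero provisional value (an exact match) is an assumption, since the text does not address that case.

```python
import math


def evaluation_variants(provisional: float, n: int, m: int) -> dict[str, float]:
    """Derive alternative evaluation values from the provisional sum of squares.

    n and m are the character counts N and M of the input and stored strings.
    """
    return {
        "divided_by_n": provisional / n,
        "divided_by_m": provisional / m,
        "divided_by_n_plus_m": provisional / (n + m),
        "square_root": math.sqrt(provisional),
        # Larger means more similar here; an exact match (provisional == 0)
        # is mapped to infinity as an assumption.
        "reciprocal": 1.0 / provisional if provisional > 0 else math.inf,
    }


variants = evaluation_variants(0.5, 3, 3)  # "ABC" against "ACB"
print(variants)
```

Note that the reciprocal variant reverses the ordering, so any ranking or threshold comparison built on it has to be reversed as well.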

With such a system, the similarity between the input character string and each stored character string in the database can be evaluated. Displaying the stored character strings in ascending order of evaluation value therefore makes the system well suited to search engines for websites and the like.

A preferred aspect of the evaluation system according to the first aspect further comprises evaluation value setting means for setting an evaluation value, and evaluation value comparison means for comparing the calculated evaluation value, which is the evaluation value calculated by the evaluation value calculation means, with the set evaluation value, which is the evaluation value set by the evaluation value setting means. The stored character string is selected when the comparison shows that the calculated evaluation value is smaller than the set evaluation value. In the evaluation system according to this aspect, setting an evaluation value makes it possible to select stored character strings related to the input character string.

In the system of this aspect, the evaluation value setting means sets an evaluation value in advance. The set value can be changed as appropriate by having the input means enter a predetermined value into the server or computer; that is, the evaluation value is set by entering it from a client or the like into the server or computer. The set evaluation value may be stored temporarily, for example in RAM, or stored long-term in ROM, on a hard disk, or the like. It may also be implemented in hardware as a circuit for the set value.

Examples of the set evaluation value are 0.7 and 0.3. For example, 0.7 is typed as the set evaluation value on a keyboard or similar device and entered into the system through the input device of the computer.

The evaluation value comparison means compares the calculated evaluation value, which is the evaluation value calculated by the evaluation value calculation means, with the set evaluation value, which is the evaluation value set by the evaluation value setting means. The evaluation value comparison means may be configured as a comparison circuit. Alternatively, a CPU or the like may, under instructions from a control program in main memory, read out the stored set evaluation value and calculated evaluation value and compare them. Such a program can easily be written using known programs. The comparison result is stored in memory or the like. A CPU or the like acting as selection means then, under instructions from the control program in main memory, reads out the comparison result and selects the stored character string when the calculated evaluation value is smaller than the set evaluation value. In this way, stored character strings that clear the set value, and are therefore judged similar to the input character string, are selected.

For example, when the input character string and the stored character string are both “ABC”, the evaluation value is 0, whereas when the input character string is “ABC” and the stored character string is “ACB”, the evaluation value is 0.5. In that case, when the set evaluation value is 0.7, the calculated evaluation values for the stored character strings “ABC” and “ACB” are both 0.7 or less, so both character strings are selected. When the set evaluation value is 0.3, only the stored character string “ABC” has a calculated evaluation value smaller than the set evaluation value, so only it is selected as a similar stored character string.

Another preferred aspect of the above system has sorting means that, when the selection means has selected a plurality of stored character strings, rearranges them in ascending order of calculated evaluation value. Such a system can arrange the selected stored character strings in order of similarity (that is, in ascending order of evaluation value). For example, each calculated evaluation value is stored in memory or the like in association with its stored character string. A CPU or the like, under instructions from the control program in main memory, compares the calculated evaluation values and orders them from smallest to largest, and the stored character strings are ordered accordingly, starting with the one that has the smallest calculated evaluation value.

For example, when the input character string and the stored character string are both “ABC”, the evaluation value is 0, whereas when the input character string is “ABC” and the stored character string is “ACB”, the evaluation value is 0.5. In that case, when the set evaluation value is 0.7, the calculated evaluation values for the stored character strings “ABC” and “ACB” are both 0.7 or less, so both character strings are selected. They are then rearranged in ascending order of calculated evaluation value, that is, in the order “ABC”, “ACB”.
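The selection and sorting steps can be sketched together in Python. The function name `select_and_sort` is hypothetical, the calculated evaluation values are taken from the example above rather than computed, and the strict comparison follows the "smaller than" wording of the aspect (an assumption where the example says "0.7 or less").

```python
def select_and_sort(calculated, set_value):
    """Keep strings whose evaluation value is below set_value, most similar first."""
    selected = [(value, text) for text, value in calculated.items() if value < set_value]
    # Sorting the (value, text) pairs orders them by ascending evaluation value.
    return [text for value, text in sorted(selected)]


# Calculated evaluation values for the input "ABC", taken from the example.
calculated = {"ACB": 0.5, "ABC": 0.0}

print(select_and_sort(calculated, 0.7))  # ['ABC', 'ACB']
print(select_and_sort(calculated, 0.3))  # ['ABC']
```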

Another preferred aspect of the present system is an evaluation system in which, when the stored character string does not contain one of the characters of the input character string, the difference value for that character is set to 1. Because the evaluation value is calculated in this way, it can be calculated even when, for example, the input character string is “ABC” and the stored character string is “ABD” or “ADB”. For example, each character of the input character string is compared with each character of the stored character string, and if any character of the input character string is not contained in the stored character string, that information is stored in memory or the like. A CPU or the like then, under instructions from the control program in main memory, reads out that information and assigns 1 as the difference value of that character. The fact that the difference value is 1 is stored in memory or the like.

In the example above, characters A and B of the input character string are also present in the stored character string, whereas character C is not, so the difference value for character C is set to 1. For “ABD”, the difference values Δ are therefore (0, 0, 1) and the evaluation value is 1. For “ADB”, the difference values are (0, −0.5, 1) and the evaluation value is 1.25. If the set evaluation value is, for example, 0.7, neither of these stored character strings is selected.
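A minimal Python sketch of this rule follows. It assumes the (n − 1)/(N − 1) coordinate assignment and strings without repeated characters; the function names are hypothetical.

```python
def coordinates(text):
    """Coordinate (n - 1) / (N - 1) for each position; 0 for a one-character string."""
    count = len(text)
    return [i / (count - 1) if count > 1 else 0.0 for i in range(count)]


def evaluation_value(input_str, stored_str):
    """Sum of squared differences; a character absent from stored_str contributes 1."""
    # Assumption: no repeated characters, so each character has a single coordinate.
    stored = dict(zip(stored_str, coordinates(stored_str)))
    total = 0.0
    for ch, x in zip(input_str, coordinates(input_str)):
        diff = x - stored[ch] if ch in stored else 1.0
        total += diff ** 2
    return total


print(evaluation_value("ABC", "ABD"))  # 1.0
print(evaluation_value("ABC", "ADB"))  # 1.25
```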

The evaluation value and the character string that produced it are stored temporarily or semi-permanently in evaluation value information storage means such as an evaluation value table. The evaluation value and the stored character string that produced it (or the stored character string alone) are then read out under instructions from the control program in main memory and displayed on display means such as a display.

In another aspect of the system according to the first aspect, when the character match determination means determines that there is a matching character, the difference means obtains the difference between the numerical value of each character of the stored character string and the numerical value of the corresponding character of the input character string. For example, when the input character string is “ABC” and the stored character string is “ABD”, the input character string does not contain the character D. The difference values for the character string “ABD” are therefore (0, 0, 1), and the evaluation value of the input character string based on the stored character string is 1.

The evaluation system of the present invention according to the second aspect is the evaluation system described above in which the difference means is means for obtaining both (i) the difference between the numerical value assigned to each character of the input character string and the numerical value assigned to the same character in the stored character string, and (ii) the difference between the numerical value assigned to each character of the stored character string and the numerical value assigned to the same character in the input character string. The system comprises input-derived evaluation value calculation means, which squares the differences of (i), sums the squares to obtain a provisional evaluation value, and divides that provisional evaluation value by N to obtain an input-derived evaluation value; and stored-derived evaluation value calculation means, which squares the differences of (ii), sums the squares to obtain a provisional evaluation value, and divides that provisional evaluation value by M to obtain a stored-derived evaluation value. The sum of the input-derived evaluation value and the stored-derived evaluation value is used as the evaluation value.

The step of obtaining each provisional evaluation value is the same as the step of obtaining the evaluation value in the first aspect. Each provisional evaluation value is divided by N or M by division means such as a division circuit or division program, and the evaluation value is obtained by addition means such as an addition circuit or addition program. In such a system, the evaluation value is normalized to a maximum of 2 regardless of the number of characters in the input character string or the stored character string, so similarity can be evaluated appropriately. Division or multiplication means may be combined as appropriate so that the maximum evaluation value becomes 1, 100, or the like.

For example, suppose the input character string is “ABC” and its similarity to the stored character string “ABDE” is to be evaluated. The coordinate values of “ABC” are (A, B, C) = (0, 0.5, 1), and the coordinate values of “ABDE” are (A, B, D, E) = (0, 0.33, 0.66, 1). The difference values Δ of the input character string “ABC” against “ABDE” are (A, B, C) = (0, 0.17, 1). The provisional evaluation value is therefore 0² + 0.17² + 1² = 1.0289, and dividing this by 3 gives 0.343. Conversely, the difference values Δ of the stored character string “ABDE” against “ABC” are (A, B, D, E) = (0, −0.17, 1, 1), so the provisional evaluation value is 0² + 0.17² + 1² + 1² = 2.0289, and dividing this by 4 gives 0.507. The evaluation value is therefore 0.343 + 0.507 = 0.850.
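The two-sided calculation of the second aspect can be sketched in Python as follows. The function names are hypothetical, the strings are assumed to be non-empty and free of repeated characters, and a character with no counterpart is given a difference value of 1 as in the earlier aspect. With unrounded coordinates the sketch gives about 0.343 for the input-derived term and about 0.507 for the stored-derived term (2.0289 divided by 4 is roughly 0.507).

```python
def coordinates(text):
    """Coordinate (n - 1) / (N - 1) for each position; 0 for a one-character string."""
    count = len(text)
    return [i / (count - 1) if count > 1 else 0.0 for i in range(count)]


def provisional_value(source, target):
    """Sum of squared differences for each character of source, measured against target."""
    target_coords = dict(zip(target, coordinates(target)))
    total = 0.0
    for ch, x in zip(source, coordinates(source)):
        diff = x - target_coords[ch] if ch in target_coords else 1.0
        total += diff ** 2
    return total


def normalized_evaluation(input_str, stored_str):
    """Input-derived term (divided by N) plus stored-derived term (divided by M)."""
    input_part = provisional_value(input_str, stored_str) / len(input_str)
    stored_part = provisional_value(stored_str, input_str) / len(stored_str)
    return input_part, stored_part, input_part + stored_part


input_part, stored_part, total = normalized_evaluation("ABC", "ABDE")
print(round(input_part, 3), round(stored_part, 3), round(total, 3))  # 0.343 0.507 0.85
```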

The evaluation system according to the third aspect of the present invention is the evaluation system described above in which the difference means is means for obtaining both (i) the difference between the numerical value assigned to each character of the input character string and the numerical value assigned to the same character in the stored character string, and (ii) the difference between the numerical value assigned to each character of the stored character string and the numerical value assigned to the same character in the input character string. The differences of (i) are squared and summed to obtain one provisional evaluation value, the differences of (ii) are squared and summed to obtain another, the two provisional evaluation values are added, and the sum divided by (N + M) is used as the evaluation value. The provisional evaluation values can be obtained by the steps described for the first aspect.

For example, suppose the input character string is “ABC” and its similarity to the stored character string “ABDE” is to be evaluated. The coordinate values of “ABDE” are (A, B, D, E) = (0, 0.33, 0.66, 1). The difference values Δ from the input character string “ABC” are (A, B, C) = (0, 0.17, 1), so the provisional evaluation value is 0² + 0.17² + 1² = 1.0289. The difference values Δ from the character string “ABDE” are (A, B, D, E) = (0, −0.17, 1, 1), so the provisional evaluation value is 0² + 0.17² + 1² + 1² = 2.0289. The sum of the provisional evaluation values is 3.0578. Dividing this by 7 gives 0.4368, which is the evaluation value.
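The third aspect differs only in how the two provisional evaluation values are combined, which the Python sketch below illustrates under the same assumptions as before (hypothetical function names, non-empty strings without repeated characters, difference value 1 for a character with no counterpart). Because the sketch keeps unrounded coordinates, it yields about 0.4365, slightly below the 0.4368 obtained in the text by rounding the difference for B to 0.17.

```python
def provisional_value(source, target):
    """Sum of squared differences for each character of source, measured against target."""
    def coord(index, length):
        return index / (length - 1) if length > 1 else 0.0

    target_coords = {ch: coord(i, len(target)) for i, ch in enumerate(target)}
    total = 0.0
    for i, ch in enumerate(source):
        x = coord(i, len(source))
        diff = x - target_coords[ch] if ch in target_coords else 1.0
        total += diff ** 2
    return total


def combined_evaluation(input_str, stored_str):
    """Add both provisional values, then divide once by (N + M)."""
    total = provisional_value(input_str, stored_str) + provisional_value(stored_str, input_str)
    return total / (len(input_str) + len(stored_str))


print(round(combined_evaluation("ABC", "ABDE"), 4))  # 0.4365
```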

Another preferred aspect of the system of the present invention is an evaluation system in which, when the input character string or the stored character string contains the same character two or more times, the difference between assigned numerical values is obtained using the occurrence that gives the smaller absolute difference. With this configuration, an appropriate evaluation value can be obtained even when the input character string or the stored character string contains the same character two or more times.

The system according to this aspect operates, for example, as follows. Determination means determines whether each character string contains the same character two or more times. More specifically, it checks whether the first character is the same as the next character, then whether the first character is the same as the character after that, and so on. Once the first character has been found to differ from all the other characters, it checks whether the second character is the same as the third, then whether the second is the same as the fourth, and the process is repeated in this way. Even when two or more characters of a character string are found to be the same, a coordinate value is assigned to each character as in the ordinary case. When difference values are obtained, the absolute value of each candidate difference value is computed, the absolute values are compared by comparison means such as a comparison circuit or comparison program, and the smallest is adopted as the difference value.

For example, when the input character string is “ABB”, the first character A differs from the second and third characters, but the second character B matches the third character B. Because the character string contains identical characters, a circuit or program for handling them is invoked. Coordinate values are assigned as in the ordinary case, that is, (A, B1, B2) = (0, 0.5, 1). When the stored character string is “ABC”, (A, B, C) = (0, 0.5, 1), so the difference values based on the input character string (A, B, B) are Δ = (0, 0, 0.5) for (A, B1, B2), and the evaluation value is 0.25. Next, consider the difference values for the stored character string (A, B, C). The coordinate of A in the input character string is 0, and the input character string contains no character C, so the difference values for characters A and C of the stored character string are 0 and 1, respectively. For character B of the stored character string, the input character string contains two corresponding characters, B1 and B2. In this aspect, the difference value is obtained for both: against B1 it is 0.5 − 0.5 = 0, and against B2 it is 0.5 − 1 = −0.5. Their absolute values are 0 and 0.5, so the smaller one, 0, is adopted as the difference value of character B. The difference values Δ based on the stored character string (A, B, C) are therefore (0, 0, 1), and its evaluation value is 0² + 0² + 1² = 1. This processing can be carried out, for example, by a CPU or the like that reads the information needed for the calculation from memory and, under instructions from the control program in main memory, computes the evaluation value from that information.
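The handling of repeated characters can be sketched in Python by keeping every coordinate at which a character occurs and taking the smallest absolute difference. The function names are hypothetical, and the sketch assumes the (n − 1)/(N − 1) coordinate assignment together with a difference value of 1 for characters that have no counterpart.

```python
def occurrences(text):
    """Map each character to the list of coordinates at which it occurs."""
    count = len(text)
    table = {}
    for i, ch in enumerate(text):
        table.setdefault(ch, []).append(i / (count - 1) if count > 1 else 0.0)
    return table


def provisional_value(source, target):
    """Sum of squared differences, using the nearest occurrence of each character."""
    target_table = occurrences(target)
    count = len(source)
    total = 0.0
    for i, ch in enumerate(source):
        x = i / (count - 1) if count > 1 else 0.0
        if ch in target_table:
            # Adopt the candidate with the smallest absolute difference.
            diff = min(abs(x - y) for y in target_table[ch])
        else:
            diff = 1.0
        total += diff ** 2
    return total


print(provisional_value("ABB", "ABC"))  # 0.25 (based on the input "ABB")
print(provisional_value("ABC", "ABB"))  # 1.0  (based on the stored "ABC")
```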

In a system that performs the above processing, when comparing, for example, the character string (ABCDEF) with the character string (ABCDEA), the difference value for A based on (ABCDEF) is not computed against the coordinate value of the trailing A of (ABCDEA), so an appropriate evaluation value can be calculated.

Another preferred aspect of the present invention comprises replacement means for replacing the input character string with the stored character string that has the smallest evaluation value. Such replacement means makes it possible to replace the input character string with the stored character string that has the smallest evaluation value. This processing can be carried out by a CPU that reads predetermined information from memory and, under instructions from the control program in main memory, converts the input character string into the stored character string.

The system of this aspect can be used, for example, as an autocorrect system for a word processor. In questionnaires conducted on websites, character strings such as “ルイヴィトン” (Louis Vuitton, registered trademark), “ルイビトン”, and “ビトン” are entered as words referring to the same concept. Such input character strings can be corrected to the pre-registered stored character string “ルイヴィトン” (registered trademark), so the system is effective for, among other things, easily tabulating questionnaire data in which various spellings are entered for the same concept.
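The replacement step can be sketched in Python as picking the registered string with the smallest evaluation value. This is only an illustration: the source does not state which evaluation value variant the replacement means uses, so the sketch assumes the input-based sum of squared differences with a difference value of 1 for missing characters. The function names are hypothetical, and the second registered entry, “シャネル”, is made up for the example.

```python
def provisional_value(source, target):
    """Input-based sum of squared differences, using the nearest occurrence of each character."""
    def coord(index, length):
        return index / (length - 1) if length > 1 else 0.0

    target_table = {}
    for i, ch in enumerate(target):
        target_table.setdefault(ch, []).append(coord(i, len(target)))

    total = 0.0
    for i, ch in enumerate(source):
        x = coord(i, len(source))
        if ch in target_table:
            diff = min(abs(x - y) for y in target_table[ch])
        else:
            diff = 1.0  # character not present in the registered string
        total += diff ** 2
    return total


def autocorrect(input_str, registered):
    """Return the registered string with the smallest evaluation value."""
    return min(registered, key=lambda stored: provisional_value(input_str, stored))


# "シャネル" is a hypothetical second entry, added only to give the selection a choice.
registered = ["シャネル", "ルイヴィトン"]
for typed in ["ルイビトン", "ビトン"]:
    print(typed, "->", autocorrect(typed, registered))
```

In practice a threshold such as the set evaluation value described earlier would probably be needed as well, so that unrelated input is not forced onto the nearest registered string.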

Another preferred aspect of the present invention is the evaluation system described above in which the storage means stores various stored character strings, and which further comprises numerical value assignment means for assigning the numerical value (m − 1)/(M − 1) to the m-th character of the stored character string, where M is the number of characters in the stored character string, so that the first character is 0, the last character is 1, and the characters in between are equally spaced.

The evaluation system according to this aspect does not assign numerical values to the stored character strings in advance; instead, when obtaining an evaluation value, it computes the numerical values for the stored character string as well as for the input character string. Processing is therefore expected to be slower than in the evaluation systems of the aspects described earlier. However, because the coordinate values of the stored character strings need not be set in advance, the system can evaluate the similarity between an input character string and character strings stored in any database or the like.

A system that obtains the evaluation value taking the strength of co-occurrence relationships into account is another preferred aspect of the present invention. A co-occurrence relationship expresses how frequently, in statistical terms, a combination of attributes appears when a language is used; the more frequently a combination of attributes appears, the stronger their co-occurrence relationship. The importance of an attribute expresses how strongly the searcher wants an attribute contained in the search expression the searcher specified to be contained in the similar search expressions generated by the system; the more strongly the searcher wants this, the higher the importance of that attribute.

Specifically, a similar search method such as the following may be used as appropriate (for example, the invention described in claim 1 of Japanese Patent Laid-Open No. 9-153068, “Similar search method and apparatus”). When a search expression contains a plurality of attributes A1, …, Ar, the importance is calculated so that the stronger the co-occurrence relationship between an attribute As and the other attributes A1, …, As−1, As+1, …, Ar, the higher the importance of As. The similarity between different search expressions is calculated using the similarity of how attributes appear as the measure: the smaller the difference between (a) the strength of the co-occurrence relationship between As and the other attributes A1, …, As−1, As+1, …, Ar and (b) the strength of the co-occurrence relationship between an arbitrary attribute At different from As and those same other attributes, the higher the similarity between As and At. In the search expression, an attribute of lower importance is either replaced with another attribute or removed from the search expression, which yields a similar search expression with high similarity to the original search expression.

To obtain a similar search expression, an attribute of lower importance is either replaced with another attribute or removed from the search expression, yielding a similar search expression with high similarity to the one entered by the searcher. Here, the measure of similarity is the similarity of how attributes appear. That is, if the search expression contains a plurality of attributes A1, …, Ar, then the smaller the difference between the strength of the co-occurrence relationship between an attribute As and the other attributes A1, …, As−1, As+1, …, Ar, and the strength of the co-occurrence relationship between an arbitrary attribute At different from As and those other attributes, the higher the similarity between As and At is taken to be.

This removes the need for the system operator to set the importance of and similarity between attributes in advance, keeps the apparatus manageable even as the number of attributes grows, and yields importance and similarity values that match the intuition of many searchers.

By using the evaluation system of the present invention as a similarity calculation unit, a similar character string search system can be provided.

Such a similar character string search system comprises, for example: a keyword extraction unit that extracts keywords from an input character string; a reading unit that sequentially reads all items from the item of interest in a hierarchically structured database down to a predetermined number of levels below it; a similarity calculation unit that calculates the similarity between each item read by the reading unit and a keyword; and an item specifying unit that specifies an item based on the similarity calculated by the similarity calculation unit.

The keyword extraction unit consists of a key decomposition unit and a command determination unit. The key decomposition unit cuts keywords out of the input sentence. The command determination unit determines a command (search, register, update, delete, etc.) from the keywords that the key decomposition unit cut out of the input sentence. The item search unit searches the data for the item corresponding to a keyword and consists of the reading unit, the similarity calculation unit, and the item specifying unit. The reading unit sequentially reads all items registered in the hierarchy from the item of interest in the data down to a predetermined lower level. The similarity calculation unit calculates, for the keyword (input character string), an evaluation value representing the similarity of each item (stored character string) read from the data by the reading unit, and stores the minimum evaluation value and the corresponding item character string in a similarity table. Because a smaller evaluation value means a closer match, the minimum evaluation value corresponds to the maximum similarity, and its character string is the most similar character string. The item specifying unit specifies the item corresponding to the keyword based on the maximum similarity calculated by the similarity calculation unit and the corresponding most similar character string. The data consists of the items and the like stored in the database. The similarity table stores the maximum similarity calculated by the similarity calculation unit and the corresponding most similar character string.

Next, the operation of this similar character string search system is described. First, the key decomposition unit, which is an optional element, decomposes the input character string into keywords and passes the keywords to the command determination unit. On receiving the keywords, the command determination unit determines the command; for example, based on the input character string entered as a keyword, it decides that the processing to perform on the input sentence is a search (search command) and notifies the item search unit. The reading unit of the item search unit sequentially reads the candidate items for the keyword from the data, for example all items from the item of interest down to two levels below it, and passes them to the similarity calculation unit. The similarity calculation unit calculates the similarity between each item received and the keyword, and stores the minimum evaluation value and the corresponding item character string in the similarity table. The item specifying unit specifies the item to focus on based on the minimum evaluation value and the most similar character string stored in the similarity table. The same steps are then repeated for the next keyword: all items from the newly focused item down to two levels below it are read, the minimum evaluation value and the most similar character string are stored in the similarity table, and the corresponding item is specified.

As described above, for each keyword cut out of the input sentence in turn, all items from the item of interest down to a predetermined number of levels below it are read and the most similar item is specified; attention then moves to that item and the process is repeated. In this way, the item most similar to the keywords cut out of the input sentence can be found easily with a simple system.
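The level-by-level search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the nested-dictionary tree, the function names `items_below` and `locate`, and the stand-in scorer built on Python's `difflib` are all assumptions made for the example. Any evaluation function for which a smaller value means a closer match could be passed in as `score`.

```python
from difflib import SequenceMatcher


def stand_in_score(keyword, item):
    """Hypothetical scorer: 0.0 for identical strings, larger when less alike."""
    return 1.0 - SequenceMatcher(None, keyword, item).ratio()


def items_below(tree, depth):
    """Yield (name, subtree) for every item up to `depth` levels below `tree`."""
    if depth == 0:
        return
    for name, subtree in tree.items():
        yield name, subtree
        yield from items_below(subtree, depth - 1)


def locate(tree, keywords, score=stand_in_score, depth=2):
    """For each keyword, pick the closest item near the current focus, then descend."""
    focus = tree
    path = []
    for keyword in keywords:
        candidates = list(items_below(focus, depth))
        if not candidates:
            break  # nothing left below the current focus
        # The smallest evaluation value is treated as the most similar item.
        name, subtree = min(candidates, key=lambda c: score(keyword, c[0]))
        path.append(name)
        focus = subtree  # the next keyword is searched from this item
    return path


# Hypothetical hierarchy used only for illustration.
catalog = {
    "music": {"rock": {"queen": {}}, "jazz": {"davis": {}}},
    "movies": {"drama": {}},
}
print(locate(catalog, ["musik", "jazz"]))
```

Passing the scorer as a parameter keeps the traversal independent of how similarity is computed, so the distance-based evaluation value of this document could be substituted without changing the search loop.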

In Example 1, the validity of the similarity evaluation value was verified. The practicability of the system of the present invention was confirmed using the distributed-song list of a certain karaoke machine model. Specifically, a program that causes a computer to function as the system of the present invention was stored on a personal computer, and the computer was made to function as each of the means.

The algorithm described above was used.

A dictionary consisting of a relational database storing about 8,000 song titles was created on the computer's hard disk. The phrase "ダンシング夏祭り" ("Dancing Summer Festival") was then entered into the system of the present invention, and songs with high similarity were searched for.

The results were as follows.

Rank  Song title (English gloss)                   Similarity
1     ダンシング！夏祭り (Dancing! Summer Festival)   0.113746
2     だんじり (Danjiri)                            0.757937
3     ダーリング (Darling)                          0.795599
4     黄昏ダンシング (Twilight Dancing)              0.819262
5     夏祭り (Summer Festival)                      0.917304
6     夏祭り (Summer Festival)                      0.917304
7     シンシア (Cynthia)                            0.949830
8     ダンシング・オールナイト (Dancing All Night)    0.974581
9     リンダリンダ (Linda Linda)                     1.020226
10    ギャンブリング (Gambling)                      1.037901

(Only the 10 closest songs are shown; a smaller value indicates a closer match.)

Example 2 uses the evaluation system of the present invention as a questionnaire system that evaluates which of the entries registered in the dictionary an input character string is closest to. In this example, the system of the present invention was configured as a server. On a website, the question "Please name your favorite brand" was asked and answers were collected.

According to the system of the present invention, input words such as "ルイヴィトン" (Louis Vuitton, registered trademark), "ルイビトン", and "ビトン" can all be corrected to the pre-registered word "ルイヴィトン" (registered trademark). The system can therefore be used, for example, to tally questionnaire data easily when respondents enter a variety of words for the same concept.

Because the system of the present invention can correct input to pre-registered words, it can also be used in word processing software with an automatic word correction function. In addition, the system can calculate the similarity between an input word and each word stored in a database containing a variety of words, and the similarity required for a word to be selected can be set, so the system can be used as a retrieval system in an Internet search engine.

FIG. 1 is a block diagram of the evaluation system according to the first aspect of the present invention.
FIG. 2 is a diagram showing an example of the system of the present invention implemented as a computer connected to the Internet.
FIG. 3 is a flowchart explaining an example of the operation of the system of the present invention according to the first aspect.

Explanation of symbols

1 System according to the first embodiment
2 Character string input means
3 Character number calculation means
4 Numerical value assignment means
5 Storage means
6 Character matching judgment means
7 Difference means
8 Squaring means
9 Evaluation value calculation means
10 Memory
11 Output device
21 Computer
22 Internet
23 Computer

Claims

1. An evaluation system for evaluating the similarity between an input character string and a stored character string, comprising:
character string input means for inputting a character string;
character number calculating means for calculating the number of characters constituting the input character string input by the character string input means;
numerical value assigning means for assigning a numerical value to each character of the input character string such that the leading character is 0, the trailing character is 1, and the characters between the head and the end are equally spaced, the nth character of the input character string being assigned the value (n-1)/(N-1), where N is the number of characters calculated by the character number calculating means;
storage means for storing various stored character strings together with a numerical value assigned to each character of each stored character string such that the leading character is 0, the trailing character is 1, and the characters between the head and the end are equally spaced, the mth character of the stored character string being assigned the value (m-1)/(M-1), where M is the number of characters in the stored character string;
character matching judgment means for judging whether each character constituting the input character string matches each character constituting the stored character string stored in the dictionary;
difference means for calculating the difference between the numerical value assigned to each character of the input character string and the numerical value assigned to the same character in the stored character string;
squaring means for squaring the difference obtained by the difference means for each character of the input character string; and
evaluation value calculating means for calculating an evaluation value by summing the squared values obtained by the squaring means for the characters of the input character string.
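The calculation in claim 1 can be sketched in a few lines of Python. This is an illustrative reading rather than the applicant's code: the function names are hypothetical, a character absent from the other string is given a difference of 1 and a repeated character uses the occurrence with the smaller absolute difference (both rules are stated in later claims), and assigning 0.0 to the only character of a one-character string is an assumption made here to avoid dividing by zero.

```python
def assign_positions(text):
    """Give the k-th character (1-based) the value (k-1)/(K-1): head 0, end 1."""
    count = len(text)
    if count == 1:
        return [0.0]  # assumption: the claims do not cover one-character strings
    return [index / (count - 1) for index in range(count)]


def directional_sum(source, target):
    """Sum of squared position differences, taken over the characters of `source`."""
    source_positions = assign_positions(source)
    target_positions = assign_positions(target)
    total = 0.0
    for char, position in zip(source, source_positions):
        # Positions of every occurrence of the same character in the other string.
        matches = [q for c, q in zip(target, target_positions) if c == char]
        if matches:
            # Repeated characters: use the occurrence with the smallest difference.
            difference = min(abs(position - q) for q in matches)
        else:
            # Character not contained in the other string: difference is 1.
            difference = 1.0
        total += difference ** 2
    return total


print(directional_sum("ダンシング夏祭り", "ダンシング！夏祭り"))
```

For the input "ダンシング夏祭り" against the stored title "ダンシング！夏祭り", every input character is found in the stored string at a slightly shifted position, so the sum is small (35/3136, about 0.01116).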

2. The evaluation system according to claim 1, further comprising:
evaluation value setting means for setting an evaluation value in advance;
evaluation value comparing means for comparing a calculated evaluation value, which is the evaluation value calculated by the evaluation value calculating means, with a set evaluation value, which is the evaluation value set by the evaluation value setting means; and
selection means for selecting the stored character string when, as a result of the comparison by the evaluation value comparing means, the calculated evaluation value is smaller than the set evaluation value.

3. The evaluation system according to claim 2, further comprising: a sorting unit that rearranges the calculated evaluation values of each stored character string in ascending order when there are a plurality of stored character strings selected by the selecting unit.
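The selection and sorting steps of claims 2 and 3 amount to a threshold filter followed by an ascending sort. The sketch below assumes the evaluation values have already been computed; the function name `select_and_sort` and the threshold 0.8 are illustrative choices, and the sample values are taken from the Example 1 results.

```python
def select_and_sort(scored, set_value):
    """Keep stored strings whose evaluation value is below `set_value`, closest first.

    `scored` is an iterable of (stored_string, calculated_evaluation_value) pairs.
    """
    selected = [(text, value) for text, value in scored if value < set_value]
    return sorted(selected, key=lambda pair: pair[1])  # ascending evaluation value


# Values from the Example 1 results, deliberately given out of order.
scored = [
    ("黄昏ダンシング", 0.819262),
    ("ダーリング", 0.795599),
    ("ダンシング！夏祭り", 0.113746),
    ("だんじり", 0.757937),
]
for text, value in select_and_sort(scored, 0.8):
    print(text, value)
```

Lowering the set evaluation value makes the system stricter; setting it near zero would accept only near-exact matches.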

4. The evaluation system as described above, wherein, when any character of the input character string is not contained in the stored character string, the difference value for that character is set to 1.

5. The evaluation system according to any one of claims 1 to 4, wherein the difference means is means for obtaining the difference between the numerical value assigned to each character of the stored character string and the numerical value assigned to the same character in the input character string.

6. The evaluation system as described above, wherein the difference means is means for obtaining both
the difference between the numerical value assigned to each character of the input character string and the numerical value assigned to the same character in the stored character string, and
the difference between the numerical value assigned to each character of the stored character string and the numerical value assigned to the same character in the input character string, the system further comprising:
input-derived evaluation value calculating means for squaring the differences, obtained by the difference means, between the numerical value assigned to each character of the input character string and the numerical value assigned to the same character in the stored character string, summing them to obtain a temporary evaluation value, and dividing the temporary evaluation value by N to obtain an input-derived evaluation value; and
memory-derived evaluation value calculating means for squaring the differences between the numerical value assigned to each character of the stored character string and the numerical value assigned to the same character in the input character string, summing them to obtain a temporary evaluation value, and dividing the temporary evaluation value by M to obtain a memory-derived evaluation value,
wherein the input-derived evaluation value calculated by the input-derived evaluation value calculating means and the memory-derived evaluation value calculated by the memory-derived evaluation value calculating means are combined into the evaluation value.
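As a worked illustration of this claim, the combined evaluation value can be written as below. The symbols S_in, S_mem, d_n, e_m and E are introduced here for the example only, and reading "combined" as a simple sum is an assumption, although it is one that reproduces the figures reported in Example 1.

```latex
S_{\mathrm{in}} = \sum_{n=1}^{N} d_n^{2}, \qquad
S_{\mathrm{mem}} = \sum_{m=1}^{M} e_m^{2}, \qquad
E = \frac{S_{\mathrm{in}}}{N} + \frac{S_{\mathrm{mem}}}{M}

% Input of N = 8 characters against a stored title of M = 9 characters:
S_{\mathrm{in}} = \frac{1 + 4 + 9 + 16 + 4 + 1}{56^{2}} = \frac{35}{3136}, \qquad
S_{\mathrm{mem}} = 1 + \frac{35}{3136}

E = \frac{35}{3136 \cdot 8} + \frac{1}{9}\left(1 + \frac{35}{3136}\right)
  \approx 0.001395 + 0.112351 = 0.113746
```

Here d_n is the position difference for the nth input character and e_m the difference for the mth stored character. The worked numbers are for the input "ダンシング夏祭り" and the stored title "ダンシング！夏祭り": the six shifted characters differ by 1/56, 2/56, 3/56, 4/56, 2/56 and 1/56, the repeated "ン" is matched to its nearer occurrence, and the "！" that is missing from the input contributes a difference of 1. The result agrees with the value 0.113746 listed for rank 1 in Example 1. The same reading gives 0.917304 for the stored title "夏祭り", matching ranks 5 and 6.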

7. The evaluation system according to any one of claims 1 to 6, wherein the difference means is means for obtaining both
the difference between the numerical value assigned to each character of the input character string and the numerical value assigned to the same character in the stored character string, and
the difference between the numerical value assigned to each character of the stored character string and the numerical value assigned to the same character in the input character string,
wherein the differences between the numerical value assigned to each character of the input character string and the numerical value assigned to the same character in the stored character string are squared and summed to obtain one temporary evaluation value, and the differences between the numerical value assigned to each character of the stored character string and the numerical value assigned to the same character in the input character string are squared and summed to obtain another temporary evaluation value, and
wherein the sum of these temporary evaluation values is obtained, and the value obtained by dividing that sum by (N + M) is used as the evaluation value.
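The variant in this claim pools the two directional sums before normalizing. In the notation below, which is introduced for illustration only, S_in and S_mem are the sums of squared differences taken over the input characters and the stored characters respectively.

```latex
E' = \frac{S_{\mathrm{in}} + S_{\mathrm{mem}}}{N + M}

% Input "ダンシング夏祭り" (N = 8) against stored "ダンシング！夏祭り" (M = 9):
E' = \frac{\frac{35}{3136} + \left(1 + \frac{35}{3136}\right)}{8 + 9}
   = \frac{1.022321\ldots}{17} \approx 0.060137
```

Compared with dividing each sum by its own string length, this form weights every character of both strings equally, so a single unmatched character in a short string is penalized less heavily.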

8. The evaluation system according to claim 1, wherein, when the input character string or the stored character string contains the same character two or more times, the difference between the assigned numerical values is calculated using the occurrence that gives the smaller absolute difference.

9. The evaluation system according to any one of claims 1 to 8, wherein the storage means is means for storing various stored character strings, the system further comprising
numerical value assigning means for assigning a numerical value to each character of a stored character string such that the leading character is 0, the trailing character is 1, and the characters between the head and the end are equally spaced, the mth character of the stored character string being assigned the value (m-1)/(M-1), where M is the number of characters in the stored character string.

10. A program for causing a computer or a server to function as the evaluation system according to any one of claims 1 to 9.

11. A recording medium storing the program according to claim 10.