JP3230641B2

JP3230641B2 - String search device

Info

Publication number: JP3230641B2
Application number: JP10979495A
Authority: JP
Inventors: 泰親岸元
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1995-05-08
Filing date: 1995-05-08
Publication date: 2001-11-19
Anticipated expiration: 2016-11-19
Also published as: JPH08305722A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character string search device that is used in electronic equipment provided with a character reading device, and that searches a database for data containing a character string matching a search character string.

[0002]

[Prior Art] Conventionally, a database search is sometimes performed on character strings such as the title, keywords, or body of each data item in the database. Such a character string search (string searching, or string pattern matching) specifies a search character string (the pattern) and determines whether the searched character string (the text) in the field to be searched (the search key) of each data item contains a character string that matches the search character string. A conventional character string search process is described below. In this process, as shown in FIG. 10, a search character string P is specified and the searched character string T of each data item in the database is searched. Each character is shown as □ in the figure. The search character string P has m characters (m = 4 in the figure) and its i-th character is written P[i]; the searched character string T has n characters and its i-th character is written T[i]. The search process checks whether P matches each m-character partial character string contained in T. Such partial character strings exist only when T has at least as many characters as P (n ≥ m), and T then contains n - m + 1 of them. That is, the string T[1] to T[m], from the first to the m-th character of T, is the first partial character string, T[2] to T[m+1] is the second, and so on, until T[n-m+1] to T[n] is the (n - m + 1)-th and last partial character string.
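The enumeration of partial character strings can be illustrated with a minimal Python sketch. The function name `partial_strings` is hypothetical, and Python indexes from 0 whereas the description counts characters from 1, so the first partial character string T[1] to T[m] corresponds to `text[0:m]` here.

```python
def partial_strings(text, m):
    """Return every m-character partial string of text, in order."""
    n = len(text)
    # When n < m the range is empty, so no partial string is produced.
    return [text[top:top + m] for top in range(n - m + 1)]


windows = partial_strings("ABCDEF", 4)
print(windows)       # the n - m + 1 = 3 partial strings
print(len(windows))
```

For a six-character text and m = 4, the sketch yields three partial strings, in line with the count n - m + 1 given above.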

[0003] As shown in FIG. 11, the first step of the search process, step 51 (steps are hereinafter abbreviated "S"), initializes the head position top of the partial character string to 0. The value of top indicates how many characters the first character of the partial character string under examination is offset from the first character T[1] of the searched character string T. It is then determined whether m characters remain from the head position top onward in the n-character string T (S52). If fewer than m characters remain and no partial character string exists, the search process ends with the result "no match". The process ends here with "no match" at its very start only when the number of characters n of T is smaller than the number of characters m of P.

[0004] If S52 determines that m characters remain and a partial character string exists, the counter i is initialized to 1 (S53), and it is determined whether the i-th character P[i] of the search character string P matches T[top+i], the i-th character from the head position top of the partial character string in the searched character string T (S54). If the characters match, 1 is added to the counter i (S55), and the process returns to S54 and repeats until the value of i exceeds the number of characters m of P (S56). If S56 determines that i has exceeded m, the search process ends with the result "match". Thus, when every character of the m-character partial character string starting at top in T matches the character at the same position in P, the loop S54 to S56 is repeated m times and the process then ends with a match.

[0005] If even a single character mismatch is detected during the loop S54 to S56, however, the process leaves the loop at S54. The head position top is then advanced by one character to set the next partial character string (S57), and the process returns to S52 and repeats the above. The loop S52 to S57 therefore checks in turn whether each partial character string matches P, from the first partial character string T[1] to T[m] (top = 0) shown in FIG. 10 through the (n - m + 1)-th partial character string T[n-m+1] to T[n] (top = n - m). When a matching partial character string is found, the process leaves the loop at S56 and ends with a match. When no partial character string matches, S52 determines that no further partial character string exists, and the process leaves the loop and ends with no match.
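The flow of S51 to S57 can be sketched in Python as follows. This is an illustrative reading of the flowchart description, not code from the patent; the function name `naive_search` is hypothetical, and the 1-based indices P[i] and T[top+i] are shifted by one to fit Python's 0-based strings.

```python
def naive_search(pattern, text):
    """Return True if pattern occurs in text, following steps S51 to S57."""
    m, n = len(pattern), len(text)
    top = 0                      # S51: offset of the current partial string
    while top + m <= n:          # S52: do m characters remain from top?
        i = 1                    # S53: 1-based character counter
        while i <= m:            # S56: leave the loop once i exceeds m
            # S54: compare P[i] with T[top+i] (1-based in the description)
            if pattern[i - 1] != text[top + i - 1]:
                break
            i += 1               # S55
        if i > m:
            return True          # every character matched: result "match"
        top += 1                 # S57: advance to the next partial string
    return False                 # no partial string left: result "no match"


print(naive_search("abc", "xxabcxx"))
print(naive_search("abd", "xxabcxx"))
```

The outer loop runs at most n - m + 1 times and the inner loop at most m times per partial character string, which is the repeated comparison of the same characters that faster algorithms avoid.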

[0006] In a database character string search, the above search process is executed on the searched character string of each data item, and a data item is extracted when the process ends with the search character string matching one of the partial character strings of that item's searched character string.
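As a rough illustration of this extraction step, the sketch below filters a small in-memory list of records. The record layout, the field name `title`, and the function name `search_database` are assumptions made for the example, and Python's `in` operator stands in for the per-string search process.

```python
def search_database(records, pattern, field="title"):
    """Return the records whose `field` text contains `pattern`."""
    # The `in` operator plays the role of the exact-match search process.
    return [record for record in records if pattern in record[field]]


records = [
    {"id": 1, "title": "string search device"},
    {"id": 2, "title": "image scanner"},
    {"id": 3, "title": "full text search"},
]
hits = search_database(records, "search")
print([record["id"] for record in hits])
```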

[0007] Because the algorithm shown in FIG. 11 wastefully compares the same characters repeatedly, actual character string searches often use a faster algorithm such as the BM method (the Boyer-Moore string pattern matching algorithm).
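The patent only names the BM method and gives no details. For orientation, the sketch below shows the Horspool simplification of Boyer-Moore, which keeps only a bad-character shift table; it is a swapped-in variant chosen for brevity, not the full Boyer-Moore algorithm and not part of the patent. The function name `horspool_find` is hypothetical.

```python
def horspool_find(pattern, text):
    """Return the 0-based index of the first occurrence of pattern, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift for each character: distance from its last occurrence
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    top = 0
    while top + m <= n:
        if text[top:top + m] == pattern:
            return top
        # Skip ahead based on the text character aligned with the pattern end.
        top += shift.get(text[top + m - 1], m)
    return -1


print(horspool_find("abcabd", "abcabcabd"))
print(horspool_find("xyz", "hello"))
```

Unlike the one-character advance of S57, the head position here can jump by up to m characters when the character under the pattern's last position does not occur in the pattern.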

[0008]

[Problems to Be Solved by the Invention] Recent electronic devices are increasingly equipped with a handwritten character reader (handwritten character recognition system) that uses a coordinate input device (pointing device) such as a tablet, or with an optical character reader (OCR) that uses an image input device such as an image scanner. These character reading devices recognize characters by selecting the standard pattern that matches the features extracted from the input pattern, and output the character code of that standard pattern. When a character string is input through such a device, however, it is unavoidable that some character codes cannot be fully determined, in which case several possible candidate characters are listed. Misrecognition can also occur, so that a character code different from the input character is determined.

[0009] A conventional character string search, however, cannot be executed unless the character codes of both the search character string and the searched character string have been determined. In an electronic device that inputs the search character string or the searched character string through a character reading device, incomplete recognition must therefore be followed, before the search is executed, by a confirmation operation that determines which candidate character is the correct input character. This confirmation is extremely tedious because the operator must interactively select one of the listed candidate characters. Moreover, when a large volume of searched character strings has been input into the database from printed documents or the like, the confirmation may be practically impossible.

[0010] In addition, a conventional character string search cannot judge the search character string and the searched character string to match unless their character codes match exactly, so the search process is not executed correctly when the character reading device misrecognizes a character.

[0011] Conventional electronic devices equipped with a character reading device therefore had the problem that character string search could not be used effectively.

[0012] The present invention solves the above conventional problems. Its object is to provide a character string search device that makes searching possible even for character strings whose characters have not been determined, by calculating a degree of matching.

[0013]

[Means for Solving the Problems] The character string search device of the present invention comprises: search character string input means for inputting, as a search character string, an uncertain character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; a database consisting of a set of data having searched character strings whose characters are determined; matching degree calculation means which, for each partial character string that is contained in each searched character string of each data item in the database and has the same number of characters as the search character string, performs an operation on the matching degree of that partial character string, whenever a character of the partial character string matches any candidate character of the corresponding character at the same position in the search character string, based on the certainty value of the matched candidate character; and data extraction means which compares the matching degree of each partial character string calculated by the matching degree calculation means with the matching degree of another partial character string or with a predetermined value and, based on the comparison, extracts from the database the data having the searched character string that contains the selected partial character string. The above object is thereby achieved.
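A minimal sketch of this arrangement follows, under several assumptions that the passage itself does not fix: each character of the search character string is modeled as a dict from candidate character to certainty value, the certainty values are scores that are simply added, and the names `matching_degree` and `best_partial_string` are hypothetical.

```python
def matching_degree(pattern, part):
    """pattern: list of {candidate: certainty}; part: definite string, same length."""
    degree = 0
    for candidates, ch in zip(pattern, part):
        # A character matching no candidate adds nothing (assumed scoring).
        degree += candidates.get(ch, 0)
    return degree


def best_partial_string(pattern, text):
    """Return (matching degree, 0-based head position) of the best partial string."""
    m = len(pattern)
    scored = [(matching_degree(pattern, text[top:top + m]), top)
              for top in range(len(text) - m + 1)]
    # With a key function, max() returns the first of several equal maxima.
    return max(scored, key=lambda s: s[0], default=(0, -1))


# The second character was recognized ambiguously as "a" (score 6) or "o" (score 4).
query = [{"c": 10}, {"a": 6, "o": 4}, {"t": 10}]
print(best_partial_string(query, "a cot and a cat"))
```

In this example both "cot" and "cat" match every position, but "cat" matches the more certain candidate and therefore receives the higher matching degree.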

[0014] Preferably, the device comprises: search character string input means for inputting a search character string whose characters are determined; a database consisting of a set of data having searched character strings that are uncertain character strings, in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; matching degree calculation means which, for each partial character string that is contained in each searched character string of each data item in the database and has the same number of characters as the search character string, performs an operation on the matching degree of that partial character string, whenever any candidate character of a character of the partial character string matches the corresponding character at the same position in the search character string, based on the certainty value of the matched candidate character; and data extraction means which compares the matching degree of each partial character string calculated by the matching degree calculation means with the matching degree of another partial character string or with a predetermined value and, based on the comparison, extracts from the database the data having the searched character string that contains the selected partial character string.

[0015] More preferably, the device comprises: search character string input means for inputting, as a search character string, an uncertain character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; a database consisting of a set of data having searched character strings that are likewise uncertain character strings, in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; matching degree calculation means which, for each partial character string that is contained in each searched character string of each data item in the database and has the same number of characters as the search character string, performs an operation on the matching degree of that partial character string, whenever any candidate character of a character of the partial character string matches any candidate character of the corresponding character at the same position in the search character string, based on the certainty values of both matched candidate characters; and data extraction means which compares the matching degree of each partial character string calculated by the matching degree calculation means with the matching degree of another partial character string or with a predetermined value and, based on the comparison, extracts from the database the data having the searched character string that contains the selected partial character string.
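The case where both strings are uncertain can be sketched as below. The passage says only that the operation is based on the certainty values of both matched candidates; multiplying the two values, taking the best product when several candidates coincide, and summing over positions are all assumptions made for this illustration, as are the names `position_degree`, `matching_degree`, and `definite`.

```python
def position_degree(p_candidates, t_candidates):
    """Contribution of one character position; both arguments map candidate -> certainty."""
    common = p_candidates.keys() & t_candidates.keys()
    # Assumed combination rule: product of the two certainties, best pair wins.
    return max((p_candidates[c] * t_candidates[c] for c in common), default=0.0)


def matching_degree(pattern, part):
    """Sum the per-position contributions of two uncertain strings of equal length."""
    return sum(position_degree(p, t) for p, t in zip(pattern, part))


def definite(text):
    """Model a determined string as one candidate per character with certainty 1.0."""
    return [{ch: 1.0} for ch in text]


query = [{"c": 1.0}, {"a": 0.5, "o": 0.5}, {"t": 1.0}]
stored = [{"c": 1.0}, {"a": 0.5, "e": 0.5}, {"t": 1.0}]
print(matching_degree(query, stored))
print(matching_degree(query, definite("cot")))
```

Modeling a determined string as single candidates with full certainty lets the same function cover the two simpler configurations in which only one side is uncertain.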

[0016] More preferably, the search character string input means identifies each character of an input character string with a character reading device, selects one or more candidate characters for each character, and attaches to each selected candidate character a certainty value indicating the accuracy of the recognition.

[0017] More preferably, each searched character string of each data item in the database has been input by identifying each character of an input character string with a character reading device, selecting one or more candidate characters for each character, and attaching to each selected candidate character a certainty value indicating the accuracy of the recognition.

[0018]

[0019]

[Operation] With the above configuration, the matching degree calculation means can calculate, as a matching degree, the extent to which each partial character string of the searched character string matches a search character string consisting of an uncertain character string. Moreover, even when a character that matches none of the candidate characters is detected, the processing of the matching degree calculation means continues to the last character, so a matching degree can also be calculated for partial character strings that contain mismatched characters. The data extraction means compares the matching degree of each partial character string with the matching degrees of other partial character strings or with a predetermined value. It can thereby find the maximum matching degree among the partial character strings, sort the partial character strings in descending order of matching degree, or select the partial character strings whose matching degree is at or above a predetermined value, and so extract from the database the data the operator wants.
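The three selection strategies mentioned here (maximum, descending sort, threshold) can be sketched as follows, assuming the matching degrees have already been computed and are held as (data identifier, matching degree) pairs. The function name `extract` and the sample identifiers are hypothetical.

```python
def extract(scored, threshold=None):
    """scored: list of (data_id, matching_degree); return pairs sorted best first."""
    if threshold is not None:
        # Keep only the data whose matching degree reaches the predetermined value.
        scored = [pair for pair in scored if pair[1] >= threshold]
    # sorted() is stable, so data with equal degrees keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


scored = [("d1", 12), ("d2", 26), ("d3", 24)]
print(extract(scored))                 # all data, highest matching degree first
print(extract(scored, threshold=20))   # only data at or above the threshold
print(extract(scored)[0])              # the single best-matching data item
```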

[0020] Therefore, even when the search character string is an uncertain character string whose characters have not been determined, the searched character string of each data item in the database can be searched as is. For example, the data with the highest matching degree can be extracted, or the data whose matching degree is at or above a predetermined value can be sorted in descending order of matching degree and extracted. Moreover, even when some characters do not match, the data can still be made a target of extraction if the other characters match and a high matching degree is obtained.

[0021] A candidate character normally has a specific character code, but a candidate that represents a particular character type, or a metacharacter that matches every character of the searched character string, can also be used. The term "database" is used here in a broad sense that covers not only a set of data defined by a database schema but also, for example, a file of text data. In that case each data item can be, for example, one line of the text data.

[0022] A certainty value may be associated explicitly with each candidate character, or it may be associated so that it is determined automatically from, for example, the total number of candidate characters for the character or the rank of the candidate character. The certainty value may be a score that is higher the more likely the corresponding candidate character is to be the actual character, or a numerical value such as the probability that it is the actual character. The matching degree of each partial character string is first given a suitable initial value, and each time a character of the partial character string matches a candidate character of the search character string, the matching degree calculation means performs an operation based on that candidate's certainty value. When the certainty value is a score, for example, setting the initial matching degree to 0 and adding each certainty value in turn yields a higher matching degree the more likely the candidate characters that the partial character string matches. Omitting the addition for mismatched characters keeps the matching degree from becoming a high score. When the certainty value is a probability between 0 and 1 inclusive, for example, setting the initial matching degree to 1 and multiplying by each certainty value in turn yields a matching degree closer to 1 the higher the probability that the partial character string matches the search character string. In this case, however, the matching degree must also be multiplied by a predetermined, sufficiently low probability whenever a mismatched character is found.
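The probability-based variant can be sketched as below. The description requires only "a predetermined, sufficiently low probability" for mismatched characters; the value 0.01, the constant name `MISMATCH_PROBABILITY`, and the function name `degree_by_probability` are assumptions for illustration.

```python
MISMATCH_PROBABILITY = 0.01  # assumed value; the description fixes no number


def degree_by_probability(pattern, part, mismatch=MISMATCH_PROBABILITY):
    """pattern: list of {candidate: probability}; part: definite string, same length."""
    degree = 1.0  # initial value for the multiplicative scheme
    for candidates, ch in zip(pattern, part):
        # A matched candidate contributes its probability; a mismatched
        # character contributes the low penalty instead of leaving 1.0 in place.
        degree *= candidates.get(ch, mismatch)
    return degree


query = [{"c": 1.0}, {"a": 0.5, "o": 0.5}, {"t": 1.0}]
print(degree_by_probability(query, "cat"))  # all positions match a candidate
print(degree_by_probability(query, "cut"))  # second position matches no candidate
```

The penalty is what distinguishes this scheme from the additive one: without it, a mismatched position would leave the product unchanged, and a partial character string with mismatches could score higher than one that matches only low-probability candidates.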

[0023] When the data extraction means compares the matching degree of a partial character string with that of another partial character string, the other partial character string may be in the same searched character string, in another searched character string of the same data item, or in a searched character string of another data item. Comparison with partial character strings of the same or another searched character string within the same data item generally serves to find the maximum matching degree, whereas comparison across data items serves sorting, such as arranging the data in descending order of matching degree.

[0024] With the above configuration, the matching degree calculation means can also calculate, as a matching degree, the extent to which each partial character string of a searched character string consisting of an uncertain character string matches the search character string. Therefore, even when the searched character strings of the data in the database are uncertain character strings whose characters have not been determined, a search with the search character string can be performed as is.

[0025] Furthermore, with the above configuration, the matching degree calculation means can calculate, as a matching degree, the extent to which each partial character string of a searched character string consisting of an uncertain character string matches a search character string that also consists of an uncertain character string. Therefore, even when the search character string is an uncertain character string whose characters have not been determined and the searched character strings of the data in the database are likewise uncertain character strings, the search can be performed as is.

[0026] Furthermore, with the above configuration, the search character string consisting of an uncertain character string can be one input through a character reading device. Thus, for example, in an electronic device equipped with a handwritten character reader, when the search character string is specified by handwriting, the search can be executed immediately, without confirming characters for which several candidates were selected and without correcting characters that were misrecognized.

[0027] Furthermore, with the above configuration, the searched character string consisting of an uncertain character string can be one input through a character reading device. Thus, for example, in an electronic device equipped with an optical character reader, the search can be executed immediately even when printed documents or the like have been read in mechanically and used as the searched character strings of the data in the database without any confirmation or correction of characters.

[0028] Furthermore, with the above configuration, a searched character string can be made a search target only when its number of characters equals that of the search character string.

[0029]

[Embodiments] Embodiments of the present invention are described below.

[0030] FIGS. 1 to 5 show a first embodiment of the present invention. FIG. 1 is a block diagram showing the configuration of the character string search device. FIG. 2 is a block diagram showing the configuration of an electronic device equipped with the character string search device. FIG. 3 is a flowchart showing the operation of the search process in the character string search device. FIG. 4 is a flowchart showing the operation of the matching degree calculation in the search process. FIG. 5 is a diagram showing a state in which candidate characters of the search character string match characters of a partial character string.

[0031] This embodiment describes an electronic device equipped with a character string search device that searches a database based on a search character string that is an uncertain character string. As shown in FIG. 2, the electronic device has an arithmetic unit 1 that controls the entire device and performs various computations. A main storage device 2 consisting of semiconductor memory or the like is connected to the arithmetic unit 1. The main storage device 2 is used to load programs such as the character string search device for execution, and to provide a work area for storing the search character string and holding matching degree scores while the program runs. Also connected to the arithmetic unit 1 are an auxiliary storage device 3, such as a hard disk drive or floppy disk drive, and a display device 4, such as a CRT (cathode ray tube) display or an LCD (liquid crystal display). The auxiliary storage device 3 stores the database and programs such as the character string search device. The display device 4 displays the input search character string and the search results while the character string search program runs. A tablet 5 and an image scanner 6 are further connected to the arithmetic unit 1 as input devices. The tablet 5 is a coordinate input device for inputting patterns such as the strokes of characters handwritten with an input pen 7. By executing the handwritten character reader program stored in the auxiliary storage device 3, the handwritten patterns can be recognized as character codes and stored in the main storage device 2 as the search character string. The image scanner 6 is an image input device that optically reads printed documents and the like and inputs them as image patterns. By executing the optical character reader program stored in the auxiliary storage device 3, the image patterns can be recognized as character codes and stored in the auxiliary storage device 3 as the searched character strings of the data in the database. The tablet 5 may be integrated with the display device 4, and another coordinate input device such as a mouse may be used in place of the tablet 5.

[0032] FIG. 1 shows the configuration of the character string search device that executes database search processing in the electronic device having the above hardware configuration. In this character string search device, the pattern of each character handwritten on the tablet 5 with the input pen 7 is recognized by the handwritten character reading device 11 and input to the search processing unit 12 as a search character string. In this embodiment, however, even when the handwritten character reading device 11 cannot determine a handwritten input character, the input is passed on as an uncertain character string without any confirmation work. An uncertain character string is a character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character. When a handwritten input character can be determined, that character consists of a single candidate character, with a full score as its certainty value. When it cannot be determined which candidate character the handwritten character corresponds to, a plurality of candidate characters are enumerated, and each candidate character is associated with a score that is closer to the full score the more likely it is to be the handwritten character. Note that misrecognition, in which the actual handwritten input character corresponds to none of the candidate characters, can occur not only when there are several candidate characters but also when there is only one.

[0033] The uncertain character string can be realized by any data structure. For example, when the maximum number of candidate characters per character is fixed, it can be realized as a two-dimensional array. In this case, the subscript of the first dimension selects the character and the subscript of the second dimension selects the candidate character. The certainty values can be associated by storing them in a numeric two-dimensional array with the same number of elements. If the certainty value can be represented in the same data type as a character, the uncertain character string may instead be a three-dimensional array in which the subscripts of the last dimension designate the certainty value elements. Furthermore, the certainty value can also be assigned automatically according to the total number of candidate characters for each character, the rank of the candidate character, and so on: for example, a full score when there is only one candidate character and, when there are two, 60% of the full score for the first candidate and 40% for the second. In that case the certainty value is not attached to the uncertain character string as data but is associated with it by an algorithm.
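
The two-dimensional-array form and the rule-based certainty values described above might be sketched as follows. This is only an illustration under stated assumptions: the names `EMPTY`, `FULL_SCORE`, `candidates`, `certainty` and `rank_based_certainty`, the full score of 10, and the sample characters are all hypothetical and do not come from the specification.

```python
EMPTY = ""        # hypothetical "empty symbol" marking an unused candidate slot
FULL_SCORE = 10   # hypothetical full score; the specification does not fix a scale

# candidates[i][j]: j-th candidate of the i-th character (at most 3 per character)
candidates = [
    ["c", "e", EMPTY],
    ["a", EMPTY, EMPTY],   # a determined character: a single candidate
    ["t", "l", "i"],
]

# certainty[i][j]: certainty value of candidates[i][j], kept in a parallel array
certainty = [
    [6, 4, 0],
    [FULL_SCORE, 0, 0],    # a determined character gets the full score
    [5, 3, 2],
]


def rank_based_certainty(total, rank):
    """Certainty derived by rule instead of being stored as data.

    Only the two cases named in the text are covered: one candidate gets the
    full score; two candidates get 60% and 40% of it. `rank` is 0-based.
    """
    if total == 1 and rank == 0:
        return FULL_SCORE
    if total == 2 and rank in (0, 1):
        return FULL_SCORE * (6, 4)[rank] // 10
    raise ValueError("no rule given for this number of candidates")
```

With a rule of this kind the `certainty` array is unnecessary, at the cost of losing any per-character information the recognizer might have supplied.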

[0034] Alternatively, the one or more candidate characters of each character may be arranged in order and a special delimiter symbol inserted between characters so that character boundaries can be identified. The uncertain character string can then be handled as sequential data such as a one-dimensional array, and the number of candidate characters per character becomes unlimited. In this case too, the certainty value can be placed immediately after each candidate character, stored in separate sequential data, or associated by an algorithm. The uncertain character string can also be realized as a tree structure.
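
As an illustration of the sequential form with a delimiter, the following sketch decodes a flat sequence in which each candidate character is immediately followed by its certainty value. The delimiter `"|"`, the function name `parse_sequential`, and the sample data are assumptions made for this example only.

```python
DELIM = "|"   # hypothetical delimiter symbol separating characters


def parse_sequential(seq):
    """Decode a flat sequence into a list of candidate lists.

    `seq` alternates candidate character and certainty value, with DELIM
    between characters. The result holds, per character, a list of
    (candidate, certainty) pairs of arbitrary length.
    """
    chars = [[]]
    it = iter(seq)
    for item in it:
        if item == DELIM:
            chars.append([])          # a delimiter starts the next character
        else:
            chars[-1].append((item, next(it)))  # candidate, then its certainty
    return chars


# Three characters with 2, 1 and 2 candidates respectively.
SEQ = ["c", 6, "e", 4, DELIM, "a", 10, DELIM, "t", 7, "l", 3]
```

Because the boundary is marked by the delimiter rather than by a fixed array width, no upper limit on the number of candidates has to be chosen in advance.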

[0035] The search processing unit 12 searches the searched character strings in each data item of the database stored in the auxiliary storage device 3 on the basis of this uncertain search character string. When searching the database, the field to be searched in each data item is designated as the search key. The search key field is the title, the keywords, the data body, or the like: for a title field, the character string of the title is the searched character string; for a keyword field, the character string of each listed keyword is a searched character string; and for the data body, each character string making up the body is a searched character string. The search processing unit 12 first calculates, for each partial character string of the searched character string that has the same number of characters as the search character string, a matching degree (described later) based on the certainty values of the matched candidate characters, and obtains the highest matching degree among the partial character strings of that searched character string as the maximum matching degree. Data with a higher maximum matching degree is then given priority in the search result list on the display device 4.

[0036] The operation of the search processing unit 12 configured as above will be described with reference to the flowcharts of FIGS. 3 and 4. The search processing unit 12 sequentially reads the searched character string of each data item from the database stored in the auxiliary storage device 3 and executes the search processing on each one. Here, the search character string P has m characters and the searched character string T has n characters. In the search processing, as shown in FIG. 3, the maximum matching degree score is first initialized to "0" (S1), and the head position top of the partial character string is also initialized to "0" (S2). The value of top indicates the position of the first character of the partial character string under examination, counted from the first character of the searched character string T (an offset of 0 for the first partial character string). It is then determined whether m characters remain in the n-character searched character string T from the head position top onward (S3); if fewer than m characters remain and no partial character string exists, the search processing ends. If the number of characters n of the searched character string T is smaller than the number of characters m of the search character string P, the processing ends here immediately at the start of the search, and the maximum matching degree score remains at its initial value.

[0037] If it is determined in S3 that m characters remain and a partial character string exists, the matching degree is calculated for the m-character partial character string starting at the head position top (S4). The matching degree calculated in S4 is compared with the maximum matching degree (S5). If the maximum matching degree score is lower, it is overwritten with the matching degree score calculated in S4 (S6); if it is higher or equal, it is left unchanged. In either case the head position top is advanced by one character to set the next partial character string (S7), and the process returns to S3 and repeats until all partial character strings of the searched character string T have been processed. This search processing thus cuts out m-character partial character strings one after another from the beginning of the searched character string T and calculates the matching degree of each with the search character string P. When the search processing ends, the maximum matching degree holds the highest matching degree score among the partial character strings of the searched character string T.
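
The loop of FIG. 3 (S1 to S7) can be sketched roughly as below. The representation of the search character string as a list of `(candidate, certainty)` lists, the names `PATTERN`, `matching_degree` and `max_matching_degree`, and the sample certainty values are assumptions for illustration; the step S4 is shown here only in compact form.

```python
# Hypothetical uncertain search string: one list of (candidate, certainty)
# pairs per character, e.g. a handwritten "cat" with two ambiguous characters.
PATTERN = [[("c", 6), ("e", 4)], [("a", 10)], [("t", 7), ("l", 3)]]


def matching_degree(pattern, text, top):
    """Compact stand-in for S4: sum the certainty of the first matching candidate."""
    score = 0
    for i, cands in enumerate(pattern):
        for cand, certainty in cands:
            if cand == text[top + i]:
                score += certainty
                break
    return score


def max_matching_degree(pattern, text):
    m, n = len(pattern), len(text)
    best = 0                                           # S1
    top = 0                                            # S2
    while top + m <= n:                                # S3: m characters remain?
        score = matching_degree(pattern, text, top)    # S4
        if best < score:                               # S5
            best = score                               # S6
        top += 1                                       # S7
    return best
```

When the text is shorter than the pattern the `while` condition fails at once, so the function returns the initial value 0, as the description of S3 requires.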

[0038] The matching degree calculation in S4 will be described in detail with reference to FIG. 4. Here, the j-th candidate character of the i-th character of the search character string P is written P[i,j], and the i-th character of the searched character string T is written T[i]. At the start of the calculation, the matching degree score is initialized to "0" (S11), the counter i is initialized to "1" (S12), and the counter j is also initialized to "1" (S13). It is then determined whether the j-th candidate character P[i,j] of the i-th character of the search character string P matches the character T[top+i], that is, the i-th character counted from the head position top of the partial character string in the searched character string T (S14).

[0039] If it is determined in S14 that the candidate character P[i,j] does not match the character T[top+i], "1" is added to the counter j (S15) and it is determined whether a next candidate character P[i,j] exists (S16); if it does, the process returns to S14 and repeats. Each character T[top+i] of the partial character string is thus compared in turn with the one or more candidate characters P[i,j] at the corresponding character position in the search character string P. Whether a next candidate character P[i,j] exists in S16 can be detected by a method suited to the data structure of the uncertain character string. With a two-dimensional array, for example, a special empty symbol can be stored in elements that hold no candidate character, so that the presence of a next candidate character is detected by checking whether P[i,j] is this empty symbol. With sequential data using a delimiter symbol, it is detected by checking whether the character following the preceding candidate character is the delimiter.

[0040] If it is determined in S14 that the candidate character P[i,j] matches the character T[top+i], comparison with the remaining candidate characters is abandoned and the certainty value associated with that candidate character P[i,j] is added to the matching degree score (S17). When the score has been added in S17, or when it is determined in S16 that comparison with all candidate characters of the character is complete, "1" is added to the counter i to advance to the next character (S18); until the value of the counter i exceeds the number of characters m of the search character string P, the process returns to S13, reinitializes the counter j, and repeats (S19). If it is determined in S19 that the value of the counter i has exceeded m, the matching degree calculation ends. Accordingly, whenever a character of the partial character string matches one of the candidate characters at the corresponding position in the search character string P, the certainty value of that candidate character is added to the matching degree score. When a character matches none of the candidate characters, nothing is added, but processing continues with the subsequent characters. As a result, a partial character string in which no character matches any candidate character of the search character string P keeps the initial score of "0", and in general the more characters match, the higher the score. Moreover, even for a match at the same character position, the matching degree varies with which candidate character was matched: the more likely the matched candidate, the higher the score.
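
A minimal sketch of the FIG. 4 procedure (S11 to S19), written with explicit counters so that each step can be followed, is given below. It assumes the two-dimensional-array form with an empty symbol; the names `EMPTY`, `P_CHARS`, `P_CERT` and `matching_degree` and the sample values are hypothetical, and the counters are 0-based here whereas the description counts from 1.

```python
EMPTY = None   # hypothetical empty symbol for unused candidate slots

# Search string P as two parallel 2-D arrays (3 candidate slots per character).
P_CHARS = [["c", "e", EMPTY], ["a", EMPTY, EMPTY], ["t", "l", "i"]]
P_CERT = [[6, 4, 0], [10, 0, 0], [5, 3, 2]]


def matching_degree(p_chars, p_cert, text, top):
    """Matching degree of the partial string text[top : top + m] against P."""
    m = len(p_chars)
    score = 0                                        # S11
    i = 0                                            # S12
    while i < m:                                     # S19: all m characters done?
        j = 0                                        # S13
        # S16: a next candidate exists while the slot is in range and not EMPTY
        while j < len(p_chars[i]) and p_chars[i][j] is not EMPTY:
            if p_chars[i][j] == text[top + i]:       # S14
                score += p_cert[i][j]                # S17: add its certainty
                break                                # skip remaining candidates
            j += 1                                   # S15
        i += 1                                       # S18
    return score
```

A character with no matching candidate simply contributes nothing, so the loop continues to the next character instead of rejecting the whole partial string.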

[0041] FIG. 5 illustrates how the candidate characters of each character of the search character string P match the characters of a partial character string contained in the searched character string T in this calculation. Each character or candidate character is shown as □, and a matched character or candidate character is shown with a black dot inside the □. The search character string P has four characters (m = 4): the first character has candidates up to the fifth, the second up to the third, the third up to the fourth, and the fourth up to the fifth. In the first loop, with counter i = 1, the second candidate character P[1,2] of the first character of the search character string P matches the first character T[top+1] of the partial character string when counter j = 2, so the certainty value of this candidate character P[1,2] is added to the matching degree score in S17. In the loop with counter i = 2, the first candidate character P[2,1] of the second character matches the second character T[top+2] of the partial character string when counter j = 1, so the certainty value of P[2,1] is added to the score. In the loop with counter i = 3, the second candidate character P[3,2] of the third character matches the third character T[top+3] when counter j = 2, so the certainty value of P[3,2] is added to the score. Finally, in the loop with counter i = 4, the third candidate character P[4,3] of the fourth character matches the fourth character T[top+4] when counter j = 3, so the certainty value of P[4,3] is added to the score. The matching degree calculated here is therefore the sum of the certainty values of the candidate characters P[1,2], P[2,1], P[3,2], and P[4,3].

[0042] When the search processing for a searched character string ends, the search processing unit 12 takes the maximum matching degree score as the score of that data item. When one data item contains several searched character strings, the largest of their maximum matching degree scores is taken as the score of the data item. Then, for example, only data whose score is at or above a predetermined value is extracted from the database, and this data is displayed in the search result list on the display device 4 in descending order of score. FIG. 1 shows a state in which the search processing of the search processing unit 12 has extracted two data items from the database stored in the auxiliary storage device 3, and the search result list on the display device 4 shows data 2, which has the highest score, and data 5, which has the next highest.
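
The scoring and listing of data items described above might look like the following sketch. The function name `rank_data`, the dictionary layout, and all scores are invented for illustration; the data numbers 2 and 5 merely echo the example of FIG. 1.

```python
def rank_data(max_degrees_per_data, threshold):
    """Return (data_id, score) pairs at or above `threshold`, best first.

    `max_degrees_per_data` maps a data id to the maximum matching degrees of
    its searched character strings (one value per title, keyword, and so on).
    """
    hits = []
    for data_id, degrees in max_degrees_per_data.items():
        if not degrees:
            continue                      # data without searched strings
        score = max(degrees)              # largest of the maximum matching degrees
        if score >= threshold:
            hits.append((data_id, score))
    # Display order: descending score.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical scores for four data items.
SCORES = {1: [3], 2: [23, 17], 5: [17, 4], 7: [0]}
```

With a threshold of 10, `rank_data(SCORES, 10)` keeps only data 2 and data 5, listed in that order.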

[0043] As described above, with the character string search device of the electronic device of this embodiment, when a search character string is input through the handwritten character reading device 11 using the tablet 5 and the input pen 7, the search can be executed without confirming or correcting characters even if the search character string contains undetermined or misrecognized characters. That is, even if the search character string contains an undetermined character consisting of several candidate characters, the matching degree is high as long as the corresponding character of the searched character string matches one of the candidate characters, so data having that searched character string can be reliably extracted. Also, even when the handwritten character reading device 11 misrecognizes some of the handwritten characters, a relatively high matching degree is obtained as long as the remaining characters match, which raises the likelihood of extracting data whose searched character string matches the intended search character string.

[0044] In the matching degree calculation shown in FIG. 4, most partial character strings are in practice entirely different from the search character string. Moreover, once more than a certain number of characters that match none of the candidate characters have been detected, the matching degree cannot become high even if the calculation is continued to the last character. Therefore, to speed up processing, the matching degree calculation for the remaining characters of a partial character string may be terminated once a predetermined number or more of mismatched characters have been detected.
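
One possible reading of this early termination is sketched below. The specification does not say what score an abandoned partial character string receives; returning the score accumulated so far is an assumption of this sketch, as are the names `PATTERN`, `matching_degree_with_cutoff` and `max_mismatches`.

```python
# Hypothetical uncertain search string: (candidate, certainty) pairs per character.
PATTERN = [[("c", 6), ("e", 4)], [("a", 10)], [("t", 7), ("l", 3)]]


def matching_degree_with_cutoff(pattern, text, top, max_mismatches):
    """Matching degree that stops once `max_mismatches` characters have failed."""
    score = 0
    mismatches = 0
    for i, cands in enumerate(pattern):
        for cand, certainty in cands:
            if cand == text[top + i]:
                score += certainty
                break
        else:
            # No candidate matched this character.
            mismatches += 1
            if mismatches >= max_mismatches:
                break   # abandon the remaining characters (assumed behaviour)
    return score
```

For the partial string "xyt" and a limit of 2, the calculation stops after the second mismatch, so the final "t" is never compared and the score stays 0 instead of 7.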

[0045] When the search uniformly extracts all data whose searched character string has a maximum matching degree score at or above a predetermined value, regardless of the actual score, the detection of a partial character string whose matching degree is at or above that value in the search processing of FIG. 3 already settles that the data will be extracted. The matching degree calculation of S4 for the subsequent partial character strings may therefore be omitted.
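
The shortcut for threshold-only extraction can be sketched as a variant of the FIG. 3 loop that returns as soon as one partial character string qualifies. The names `PATTERN` and `reaches_threshold` and the sample values are hypothetical.

```python
# Hypothetical uncertain search string: (candidate, certainty) pairs per character.
PATTERN = [[("c", 6), ("e", 4)], [("a", 10)], [("t", 7), ("l", 3)]]


def reaches_threshold(pattern, text, threshold):
    """True if some partial string of `text` has a matching degree >= threshold."""
    m = len(pattern)
    for top in range(len(text) - m + 1):   # empty range when the text is too short
        score = 0
        for i, cands in enumerate(pattern):
            for cand, certainty in cands:
                if cand == text[top + i]:
                    score += certainty
                    break
        if score >= threshold:
            return True                    # extraction is settled; skip the rest
    return False
```

Since the result is only a yes or no, this variant cannot be used when the data items must later be ordered by score.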

[0046] FIGS. 6 and 7 show a second embodiment of the present invention. FIG. 6 is a flowchart showing the operation of the matching degree calculation in the search processing, and FIG. 7 is a diagram showing a state in which the characters of the search character string match candidate characters of a partial character string. Components having the same functions as in the first embodiment shown in FIGS. 1 to 5 are given the same reference numerals.

[0047] This embodiment describes an electronic device provided with a character string search device that searches searched character strings stored in the database as uncertain character strings, on the basis of a search character string in which every character is determined. The hardware configuration of this electronic device is the same as that of the first embodiment shown in FIG. 2, and the configuration of the character string search device may also be the same as that of the first embodiment shown in FIG. 1. In this embodiment, however, the searched character string in each data item of the database stored in the auxiliary storage device 3 is an uncertain character string of the same kind as described in the first embodiment. Also, when the handwritten character reading device 11 cannot determine a handwritten input character while the search character string is being input with the tablet 5 and the input pen 7, confirmation work is performed so that all characters are determined before the string is input to the search processing unit 12. The search character string may therefore also be input from a character input device such as a keyboard (not shown). The searched character strings in the database are assumed to have been entered, for example, by reading printed documents as image patterns with the image scanner 6 shown in FIG. 2 and recognizing them with the optical character reading device, without any confirmation work. They may also have been entered through the handwritten character reading device 11 using the tablet 5 and the input pen 7.

[0048] The search processing unit 12 of this embodiment searches the searched character strings, which are uncertain character strings, in each data item of the database stored in the auxiliary storage device 3 on the basis of a search character string in which every character is determined. The search processing of the search processing unit 12 is the same as in the first embodiment shown in FIG. 3, but the content of the matching degree calculation in S4 differs from that of the first embodiment shown in FIG. 4.

[0049] The matching degree calculation of this embodiment will be described in detail with reference to FIG. 6. Here, the i-th character of the search character string P is written P[i], and the j-th candidate character of the i-th character of the searched character string T is written T[i,j]. At the start of the calculation, the matching degree score is initialized to "0" (S21), and the counter i and the counter j are initialized to "1" (S22, S23). It is then determined whether the i-th character P[i] of the search character string P matches the candidate character T[top+i,j], that is, the j-th candidate character of the i-th character counted from the head position top of the partial character string in the searched character string T (S24).

[0050] If it is determined in S24 that the character P[i] does not match the candidate character T[top+i,j], "1" is added to the counter j (S25) and it is determined whether a next candidate character T[top+i,j] exists (S26); if it does, the process returns to S24 and repeats. Each character P[i] of the search character string P is thus compared in turn with the one or more candidate characters T[top+i,j] at the corresponding character position in the partial character string.

[0051] If it is determined in S24 that the character P[i] matches the candidate character T[top+i,j], comparison with the remaining candidate characters is abandoned and the certainty value associated with that candidate character T[top+i,j] is added to the matching degree score (S27). When the score has been added in S27, or when it is determined in S26 that comparison with all candidate characters of the character is complete, "1" is added to the counter i to advance to the next character (S28); until the value of the counter i exceeds the number of characters m of the search character string P, the process returns to S23, reinitializes the counter j, and repeats (S29). If it is determined in S29 that the value of the counter i has exceeded m, the matching degree calculation ends. Accordingly, whenever a character of the search character string P matches one of the candidate characters at the corresponding position in the partial character string, the certainty value of that candidate character is added to the matching degree score. When a character matches none of the candidate characters, nothing is added, but processing continues with the subsequent characters. As in the first embodiment, therefore, the more closely a partial character string matches the search character string P, the higher its matching degree score.
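
The second embodiment swaps the roles of the two strings: the search string is an ordinary string and the searched string carries the candidates. A rough sketch follows, with 0-based indices instead of the counters starting at 1 in FIG. 6; the names `TEXT`, `matching_degree` and `max_matching_degree` and all sample values are assumptions for illustration.

```python
# Hypothetical uncertain searched string T (e.g. OCR output): one list of
# (candidate, certainty) pairs per character position.
TEXT = [
    [("t", 10)],
    [("h", 7), ("b", 3)],
    [("e", 6), ("c", 4)],
    [("c", 5), ("e", 3), ("o", 2)],
    [("a", 10)],
    [("t", 6), ("l", 4)],
]


def matching_degree(pattern, text, top):
    """FIG. 6: compare each determined P[i] with the candidates T[top+i, j]."""
    score = 0                                   # S21
    for i, ch in enumerate(pattern):            # S22, S28, S29
        for cand, certainty in text[top + i]:   # S23, S25, S26
            if ch == cand:                      # S24
                score += certainty              # S27
                break
    return score


def max_matching_degree(pattern, text):
    """Same outer loop as FIG. 3, applied to the uncertain searched string."""
    m = len(pattern)
    return max(
        (matching_degree(pattern, text, top) for top in range(len(text) - m + 1)),
        default=0,
    )
```

Searching `TEXT` for "cat" scores 21 at offset 3, where the first candidate of each of the three positions matches.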

【００５２】FIG. 7 illustrates how, in the above calculation of the degree of coincidence, each character of the search character string P matches a candidate character of the corresponding character of a partial character string included in the searched character string T. Here again, each character or candidate character is shown as □, and a matched character or candidate character is marked with a black circle inside the □. The search character string P has four characters (m = 4). In the partial character string of the searched character string T, the first character has up to seven candidate characters, the second up to five, the third up to two, and the fourth up to three. In the first loop, with counter i = 1, when counter j = 2 the first character P[1] of the search character string P matches the second candidate character T[top+1,2] of the first character of the partial character string, so in S27 the certainty value of T[top+1,2] is added to the score of the degree of coincidence. In the loop with i = 2, when j = 3 the second character P[2] matches the third candidate character T[top+2,3] of the second character of the partial character string, so the certainty value of T[top+2,3] is added to the score. In the loop with i = 3, when j = 1 the third character P[3] matches the first candidate character T[top+3,1] of the third character of the partial character string, so the certainty value of T[top+3,1] is added to the score. Finally, in the loop with i = 4, when j = 3 the fourth character P[4] matches the third candidate character T[top+4,3] of the fourth character of the partial character string, so the certainty value of T[top+4,3] is added to the score. The degree of coincidence calculated here is therefore the sum of the certainty values of the candidate characters T[top+1,2], T[top+2,3], T[top+3,1], and T[top+4,3].
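The per-character loop of the second embodiment can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the name `score_substring`, the list-of-tuples layout for candidate characters, the 0-based offset `top`, and the integer certainty values in the sample data are all assumptions made for the sketch. It also assumes that the candidate loop stops at the first match for each character.

```python
from typing import List, Tuple

# One text position = list of (candidate character, certainty value) pairs.
Candidates = List[Tuple[str, int]]


def score_substring(pattern: str, text: List[Candidates], top: int) -> int:
    """Score the substring of `text` that starts at 0-based offset `top`."""
    score = 0
    for i, p_char in enumerate(pattern):          # counter i
        for cand, certainty in text[top + i]:     # counter j
            if cand == p_char:
                score += certainty                # corresponds to S27
                break                             # assumed: first match wins
        # No match: nothing is added, but the loop still moves on.
    return score


# Hypothetical data shaped like FIG. 7: 7, 5, 2 and 3 candidates per position,
# with matches at candidates 2, 3, 1 and 3 respectively.
text = [
    [("A", 40), ("W", 30), ("B", 10), ("C", 8), ("D", 6), ("E", 4), ("F", 2)],
    [("G", 50), ("H", 25), ("X", 15), ("I", 7), ("J", 3)],
    [("Y", 80), ("K", 20)],
    [("L", 45), ("M", 35), ("Z", 20)],
]
print(score_substring("WXYZ", text, 0))  # 30 + 15 + 80 + 20 = 145
```

A pattern character that matches no candidate simply contributes nothing, so `score_substring("WQYZ", text, 0)` still scores the three matching positions.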

【００５３】When the search processing unit 12 finishes searching a searched character string, then, as in the first embodiment, the maximum degree of coincidence is taken as the score of that data. For example, only data whose score is equal to or greater than a predetermined value are extracted from the database, and these data are displayed in the search result list on the display device 4 in descending order of score.
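The extraction step described above (best substring score per data item, a threshold, and descending order) might look like the following Python sketch. The function name `rank_records`, the dictionary input of precomputed substring scores, and the sample values are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def rank_records(
    substring_scores: Dict[str, List[int]], threshold: int
) -> List[Tuple[str, int]]:
    """Keep records whose best substring score reaches `threshold`,
    sorted from highest to lowest score."""
    ranked = []
    for record_id, scores in substring_scores.items():
        if not scores:              # text shorter than the pattern: no substrings
            continue
        best = max(scores)          # maximum degree of coincidence = record score
        if best >= threshold:
            ranked.append((record_id, best))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical scores for four records.
scores = {"a": [10, 180, 40], "b": [90], "c": [], "d": [250, 5]}
print(rank_records(scores, 100))  # [('d', 250), ('a', 180)]
```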

【００５４】As described above, according to the character string search device of the electronic apparatus of this embodiment, when the searched character strings in the data of the database have been entered by an optical character reading device using the image scanner 6, a search can be executed without confirming or correcting characters, even if the searched character strings contain unconfirmed or misrecognized characters. Consequently, a search by search character string can be performed even on a database built by mechanically reading large volumes of printed type documents and the like.

【００５５】As in the first embodiment, the calculation of the degree of coincidence shown in FIG. 6 and the search processing shown in FIG. 3 may be cut short where appropriate to speed up processing, for example when a predetermined number or more of mismatched characters have been detected, or when the score of the degree of coincidence has reached a predetermined value or more.
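One way the early termination mentioned above could be realized is sketched below in Python. The source does not say what result an aborted calculation should yield, so returning the partial score together with a reason string is purely an assumption of this sketch, as are the names `score_with_cutoff`, `max_mismatches`, and `score_cap`.

```python
from typing import List, Tuple

Candidates = List[Tuple[str, int]]  # (candidate character, certainty value)


def score_with_cutoff(
    pattern: str,
    text: List[Candidates],
    top: int,
    max_mismatches: int,
    score_cap: int,
) -> Tuple[int, str]:
    """Score one substring, stopping early on too many mismatches or a high score."""
    score = 0
    mismatches = 0
    for i, p_char in enumerate(pattern):
        # Certainty of the first candidate equal to p_char, or None if none matches.
        matched = next((c for ch, c in text[top + i] if ch == p_char), None)
        if matched is None:
            mismatches += 1
            if mismatches >= max_mismatches:
                return score, "too_many_mismatches"
        else:
            score += matched
            if score >= score_cap:
                return score, "score_cap_reached"
    return score, "complete"


# Hypothetical four-position text with one candidate per position.
text = [[("a", 50)], [("x", 10)], [("y", 10)], [("d", 50)]]
print(score_with_cutoff("abcd", text, 0, max_mismatches=2, score_cap=1000))
```

With these sample values the second mismatch occurs at the third character, so the fourth position is never examined.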

【００５６】FIGS. 8 and 9 show a third embodiment of the present invention. FIG. 8 is a flowchart showing the operation of calculating the degree of coincidence in the search processing, and FIG. 9 is a diagram showing a state in which candidate characters of the search character string match candidate characters of a partial character string. Components having the same functions as those of the first embodiment shown in FIGS. 1 to 5 are denoted by the same reference numerals.

【００５７】This embodiment describes an electronic apparatus equipped with a character string search device that searches searched character strings consisting of uncertain character strings in a database on the basis of a search character string that is itself an uncertain character string. The hardware configuration of this electronic apparatus is the same as that of the first embodiment shown in FIG. 2. The configuration of the character string search device is also the same as that of the first embodiment shown in FIG. 1, and the search character string entered through the handwritten character reading device 11 using the tablet 5 and the input pen 7 is an uncertain character string. In this embodiment, however, as in the second embodiment, the searched character string in each data of the database stored in the auxiliary storage device 3 is also an uncertain character string.

【００５８】The search processing unit 12 of this embodiment uses a search character string that is an uncertain character string to search the searched character strings, which are also uncertain character strings, in each data of the database stored in the auxiliary storage device 3. The search processing of the search processing unit 12 is the same as in the first embodiment shown in FIG. 3, but the calculation of the degree of coincidence in S4 differs from that of the first embodiment shown in FIG. 4 and that of the second embodiment shown in FIG. 6.

【００５９】The calculation of the degree of coincidence in this embodiment is described in detail with reference to FIG. 8. Here, the j-th candidate character of the i-th character of the search character string P is denoted P[i,j], and the k-th candidate character of the i-th character of the searched character string T is denoted T[i,k]. At the start of the calculation, the score of the degree of coincidence is first initialized to "0" (S31), and the counters i, j, and k are each initialized to "1" (S32 to S34). It is then determined whether the j-th candidate character P[i,j] of the i-th character of the search character string P matches the k-th candidate character T[top+i,k] of the i-th character counted from the head position top of the partial character string in the searched character string T (S35).

【００６０】If it is determined in S35 that the candidate character P[i,j] does not match the candidate character T[top+i,k], "1" is added to the counter k (S36) and it is determined whether a next candidate character T[top+i,k] exists (S37). If it does, the process returns to S35 and is repeated. If it is determined that no next candidate character T[top+i,k] exists, "1" is added to the counter j (S38) and it is determined whether a next candidate character P[i,j] exists (S39). If it does, the process returns to S34, the counter k is reinitialized, and the process is repeated. In this way, the one or more candidate characters P[i,j] of each character of the search character string P are compared exhaustively, in order, with the one or more candidate characters T[top+i,k] at the corresponding character position in the partial character string.

【００６１】If it is determined in S35 that the candidate character P[i,j] matches the candidate character T[top+i,k], comparison with the remaining candidate characters is abandoned, and the certainty values associated with the candidate character P[i,j] and the candidate character T[top+i,k] are both added to the score of the degree of coincidence (S40). When the score has been increased in S40, or when it is determined in S39 that all candidate characters of the current character have been compared, "1" is added to the counter i to advance to the next character (S41). The process then returns to S33, reinitializes the counters j and k, and repeats until the value of the counter i exceeds the number of characters m of the search character string P (S42). When it is determined in S42 that the value of the counter i has exceeded m, the calculation of the degree of coincidence ends. Thus, whenever any candidate character of a character of the search character string P matches any candidate character at the corresponding character position in the partial character string, the certainty values of both matched candidate characters are added to the score. When no candidate characters match, nothing is added to the score, but processing continues with the subsequent characters. Consequently, as in the first and second embodiments, the more closely a partial character string matches the search character string P, the higher its score.
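The exhaustive candidate comparison of the third embodiment (steps S35 to S42) can be sketched in Python as below. This is an illustrative reading of the flowchart description, not the authors' code: the names `first_match` and `score_uncertain`, the 0-based offset `top`, the tuple layout, and the integer certainty values are assumptions.

```python
from typing import List, Optional, Tuple

Candidates = List[Tuple[str, int]]  # (candidate character, certainty value)


def first_match(p_cands: Candidates, t_cands: Candidates) -> Optional[int]:
    """Return the summed certainty values of the first matching pair, if any."""
    for p_char, p_val in p_cands:          # counter j (outer loop)
        for t_char, t_val in t_cands:      # counter k (inner loop)
            if p_char == t_char:           # comparison of S35
                return p_val + t_val       # S40: both certainty values are added
    return None                            # no pair matched at this position


def score_uncertain(
    pattern: List[Candidates], text: List[Candidates], top: int
) -> int:
    """Score the substring of `text` at 0-based offset `top` against `pattern`."""
    score = 0
    for i, p_cands in enumerate(pattern):  # counter i
        gained = first_match(p_cands, text[top + i])
        if gained is not None:
            score += gained
    return score


# Hypothetical two-character uncertain pattern and three-position uncertain text.
pattern = [[("a", 60), ("o", 30)], [("b", 90)]]
text = [[("x", 10)], [("c", 50), ("o", 40)], [("h", 70), ("b", 20)]]
print(score_uncertain(pattern, text, 1))  # (30 + 40) + (90 + 20) = 180
```

Because the search stops at the first matching pair, the order of the candidate lists affects which certainty values are added when more than one pair could match.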

【００６２】FIG. 9 illustrates how, in the above calculation of the degree of coincidence, a candidate character of each character of the search character string P matches a candidate character of the corresponding character of a partial character string included in the searched character string T. Here again, each character or candidate character is shown as □, and a matched character or candidate character is marked with a black circle inside the □. The search character string P has four characters (m = 4), and each of its characters has the same number of candidate characters as in the first embodiment shown in FIG. 5. Each character of the partial character string has the same number of candidate characters as in the second embodiment shown in FIG. 7. In the first loop, with counter i = 1, when j = 2 and k = 2 the second candidate character P[1,2] of the first character of the search character string P matches the second candidate character T[top+1,2] of the first character of the partial character string, so in S40 the certainty values of P[1,2] and T[top+1,2] are added to the score of the degree of coincidence. In the loop with i = 2, when j = 1 and k = 3 the first candidate character P[2,1] of the second character matches the third candidate character T[top+2,3] of the second character of the partial character string, so the certainty values of P[2,1] and T[top+2,3] are added to the score. In the loop with i = 3, when j = 2 and k = 1 the second candidate character P[3,2] of the third character matches the first candidate character T[top+3,1] of the third character of the partial character string, so the certainty values of P[3,2] and T[top+3,1] are added to the score. Finally, in the loop with i = 4, when j = 3 and k = 3 the third candidate character P[4,3] of the fourth character matches the third candidate character T[top+4,3] of the fourth character of the partial character string, so the certainty values of P[4,3] and T[top+4,3] are added to the score. The degree of coincidence calculated here is therefore the total of the sum of the certainty values of P[1,2], P[2,1], P[3,2], and P[4,3] and the sum of the certainty values of T[top+1,2], T[top+2,3], T[top+3,1], and T[top+4,3].
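The total described for the FIG. 9 example can be written compactly as follows. The symbols $c_P$ and $c_T$ for the certainty values of candidates of P and T, and $S$ for the score, are notation introduced here for illustration and do not appear in the original.

```latex
\[
S = \bigl[c_P(1,2) + c_P(2,1) + c_P(3,2) + c_P(4,3)\bigr]
  + \bigl[c_T(\mathit{top}+1,2) + c_T(\mathit{top}+2,3)
        + c_T(\mathit{top}+3,1) + c_T(\mathit{top}+4,3)\bigr]
\]
```

More generally, under the same notation, the score is the sum of $c_P(i,j_i) + c_T(\mathit{top}+i,k_i)$ over those positions $i$ at which a matching pair $(j_i, k_i)$ was found.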

【００６３】When the search processing unit 12 finishes searching a searched character string, then, as in the first and second embodiments, the maximum degree of coincidence is taken as the score of that data. For example, only data whose score is equal to or greater than a predetermined value are extracted from the database, and these data are displayed in the search result list on the display device 4 in descending order of score.

【００６４】As described above, according to the character string search device of the electronic apparatus of this embodiment, when the search character string and the searched character strings have been entered by character reading devices, a search can be executed without confirming or correcting characters, even if these character strings contain unconfirmed or misrecognized characters.

【００６５】As in the first and second embodiments, the calculation of the degree of coincidence shown in FIG. 8 and the search processing shown in FIG. 3 may be cut short where appropriate to speed up processing, for example when a predetermined number or more of mismatched characters have been detected, or when the score of the degree of coincidence has reached a predetermined value or more.

【００６６】In the search processing of the first to third embodiments, comparison of the subsequent characters continues even when a character that matches none of the candidate characters is detected, so a speed-up algorithm such as the BM method cannot be used. However, if, for example, comparison of the subsequent characters is cut off once a predetermined number or more of mismatched characters have been detected, a speed-up algorithm that applies the BM method or the like can also be used.

【００６７】Furthermore, in the first and third embodiments the search is performed with a search character string that is an uncertain character string. Such an uncertain character string can be expressed as a regular expression in which the candidate characters of each character are combined by union and the characters are combined by concatenation, so the above search processing can be replaced by search processing that uses a finite automaton constructed from this regular expression. In that case, however, in addition to the states for the candidate characters of each character, a state that accepts every character other than these candidate characters must be added, a certainty value must be associated with the state of each candidate character, and the processing must successively add up the certainty values of the states actually entered as characters are accepted from the partial character string.
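The regular-expression view and the scoring automaton described above can be illustrated with the following Python sketch for the case of an uncertain search character string and a fixed partial character string. It is a simplified stand-in, not a construction given in the source: the per-position lookup tables play the role of the automaton states, the default value 0 plays the role of the added catch-all state, and the names `to_regex`, `build_tables`, and `run_automaton` are assumptions.

```python
import re
from typing import Dict, List, Tuple

Candidates = List[Tuple[str, int]]  # (candidate character, certainty value)


def to_regex(pattern: List[Candidates]) -> str:
    """Union of candidates per character, concatenated over characters."""
    return "".join(
        "(?:" + "|".join(re.escape(ch) for ch, _ in cands) + ")"
        for cands in pattern
    )


def build_tables(pattern: List[Candidates]) -> List[Dict[str, int]]:
    """One transition table per position: candidate character -> certainty."""
    tables = []
    for cands in pattern:
        table: Dict[str, int] = {}
        for ch, certainty in cands:
            table.setdefault(ch, certainty)  # keep the first listed candidate
        tables.append(table)
    return tables


def run_automaton(tables: List[Dict[str, int]], substring: str) -> int:
    """Sum the certainty values of the states entered; assumes equal lengths."""
    score = 0
    for table, ch in zip(tables, substring):
        score += table.get(ch, 0)  # catch-all state: accept, add nothing
    return score


# Hypothetical three-character uncertain search string.
pattern = [[("a", 70), ("o", 20)], [("b", 90)], [("c", 60), ("e", 30)]]
tables = build_tables(pattern)
print(to_regex(pattern))              # (?:a|o)(?:b)(?:c|e)
print(run_automaton(tables, "obe"))   # 20 + 90 + 30 = 140
print(run_automaton(tables, "oxe"))   # 20 + 0 + 30 = 50
```

The plain regular expression rejects "oxe" outright, whereas the scoring automaton accepts it with a lower score, which is why the extra catch-all state is needed.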

【００６８】In the search processing of the first to third embodiments, every partial character string included in the searched character string is examined. Alternatively, the search may be performed only when the number of characters of the searched character string equals that of the search character string, with the entire searched character string treated as the partial character string.
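The two search scopes described above differ only in which starting positions are examined. A small Python sketch, with the hypothetical helper `substring_offsets` and 0-based offsets as assumptions:

```python
from typing import List


def substring_offsets(n: int, m: int, whole_only: bool = False) -> List[int]:
    """Starting offsets of the substrings of length m in a text of length n.

    With whole_only=True, the text is examined only when n equals m.
    """
    if whole_only:
        return [0] if n == m else []
    return list(range(n - m + 1)) if n >= m else []


print(substring_offsets(6, 4))                   # [0, 1, 2]
print(substring_offsets(4, 4, whole_only=True))  # [0]
print(substring_offsets(6, 4, whole_only=True))  # []
```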

【００６９】[0069]

[Effects of the Invention] As described above, the character string search device of the present invention calculates the degree of coincidence between the search character string and each partial character string included in a searched character string on the basis of the certainty values of the candidate characters, so that the more likely the matched candidate characters are to be the correct characters, the more favorably the partial character string is ranked when degrees of coincidence are compared. A search can therefore be performed using, as the search character string or the searched character string, an uncertain character string whose characters, entered by a character reading device or the like, have not been confirmed. Either the search character string or the searched character string may be the uncertain character string, or both may be. Moreover, even if some characters match none of the candidate characters of the corresponding characters of the uncertain character string, a relatively high degree of coincidence is still obtained as long as the other characters match accurately, so a search remains possible even when character recognition failed entirely or misrecognition occurred. Consequently, a search can be executed without confirming or correcting characters, for example when the search character string is entered by a handwritten character reading device, or on searched character strings entered by an optical character reading device.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram showing the configuration of the character string search device according to the first embodiment of the present invention.

【図２】FIG. 2 is a block diagram showing the configuration of an electronic apparatus equipped with the character string search device according to the first embodiment of the present invention.

【図３】FIG. 3 is a flowchart showing the operation of the search processing in the character string search device according to the first embodiment of the present invention.

【図４】FIG. 4 is a flowchart showing the operation of calculating the degree of coincidence in the search processing according to the first embodiment of the present invention.

【図５】FIG. 5 is a diagram showing a state in which candidate characters of the search character string match characters of a partial character string according to the first embodiment of the present invention.

【図６】FIG. 6 is a flowchart showing the operation of calculating the degree of coincidence in the search processing according to the second embodiment of the present invention.

【図７】FIG. 7 is a diagram showing a state in which characters of the search character string match candidate characters of a partial character string according to the second embodiment of the present invention.

【図８】FIG. 8 is a flowchart showing the operation of calculating the degree of coincidence in the search processing according to the third embodiment of the present invention.

【図９】FIG. 9 is a diagram showing a state in which candidate characters of the search character string match candidate characters of a partial character string according to the third embodiment of the present invention.

【図１０】FIG. 10 is a diagram showing the relationship between a search character string and each partial character string of a searched character string in a conventional example.

【図１１】FIG. 11 is a flowchart showing the operation of character string search processing in a conventional example.

[Explanation of symbols]

1 arithmetic unit
3 auxiliary storage device
5 tablet
6 image scanner
7 input pen
11 handwritten character reading device
12 search processing unit

Claims

(57) [Claims]

1. A character string search device comprising: search character string input means for inputting, as a search character string, an uncertain character string in which each character is composed of one or more candidate characters and a certainty value is associated with each candidate character; a database consisting of a set of data each having a searched character string whose characters are fixed; coincidence degree calculating means for calculating, for each partial character string that is included in a searched character string of the data in the database and has the same number of characters as the search character string, a degree of coincidence of the partial character string on the basis of the certainty value of the candidate character matched for each character, when a character of the partial character string matches any of the candidate characters of the corresponding character at the same character position in the search character string; and data extraction means for comparing the degree of coincidence of each partial character string calculated by the coincidence degree calculating means with the degree of coincidence of other partial character strings or with a predetermined value, and extracting from the database the data having a searched character string that includes a partial character string selected on the basis of the comparison result.

2. A character string search device comprising: search character string input means for inputting a search character string whose characters are fixed; a database consisting of a set of data each having a searched character string that is an uncertain character string in which each character is composed of one or more candidate characters and a certainty value is associated with each candidate character; coincidence degree calculating means for calculating, for each partial character string that is included in a searched character string of the data in the database and has the same number of characters as the search character string, a degree of coincidence of the partial character string on the basis of the certainty value of the candidate character matched for each character, when any candidate character of a character of the partial character string matches the corresponding character at the same character position in the search character string; and data extraction means for comparing the degree of coincidence of each partial character string calculated by the coincidence degree calculating means with the degree of coincidence of other partial character strings or with a predetermined value, and extracting from the database the data having a searched character string that includes a partial character string selected on the basis of the comparison result.

3. A character string search device comprising: search character string input means for inputting, as a search character string, an uncertain character string in which each character is composed of one or more candidate characters and a certainty value is associated with each candidate character; a database consisting of a set of data each having a searched character string that is an uncertain character string in which each character is composed of one or more candidate characters and a certainty value is associated with each candidate character; coincidence degree calculating means for calculating, for each partial character string that is included in a searched character string of the data in the database and has the same number of characters as the search character string, a degree of coincidence of the partial character string on the basis of the certainty values of both candidate characters matched for each character, when any candidate character of a character of the partial character string matches any candidate character of the corresponding character at the same character position in the search character string; and data extraction means for comparing the degree of coincidence of each partial character string calculated by the coincidence degree calculating means with the degree of coincidence of other partial character strings or with a predetermined value, and extracting from the database the data having a searched character string that includes a partial character string selected on the basis of the comparison result.

4. The character string search device according to claim 1, wherein the search character string input means identifies each character of an input character string by means of a character reading device, selects one or more candidate characters for each character, and attaches to each selected candidate character a certainty value indicating the accuracy of recognition.

5. The character string search device according to any one of claims 2 to 4, wherein each searched character string of the data in the database is obtained by identifying each character of an input character string by means of a character reading device, selecting one or more candidate characters for each character, and attaching to each selected candidate character a certainty value indicating the accuracy of recognition.