JP5224847B2

JP5224847B2 - Pattern recognition method, character recognition method, pattern recognition program, and character recognition program

Info

Publication number: JP5224847B2
Application number: JP2008039137A
Authority: JP
Inventors: 倫行浜村; 文平入江; 直毅名取; 琢磨赤木
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2008-02-20
Filing date: 2008-02-20
Publication date: 2013-07-03
Anticipated expiration: 2028-02-20
Also published as: JP2009199256A

Description

The present invention relates to a pattern recognition method, a character recognition method, a pattern recognition program, and a character recognition program used in, for example, a character recognition device that recognizes characters written on an object to be read, or a biometric matching device that recognizes a person from biometric information.

Various techniques have been proposed for pattern recognition. For example, JP 2004-178280 A (Patent Document 1) and JP 2001-283156 A (Patent Document 2) disclose pattern recognition methods for address information having a hierarchical structure.
JP 2004-178280 A discloses a technique in which the sum of the reliabilities of the words in each recognition result candidate at a given level is used as an evaluation value; only the E candidates with the highest evaluation values are kept, and the rest are discarded.
JP 2001-283156 A discloses a technique that discards recognition result candidates in which the characters in each word are not recognized at or above a fixed proportion of the word length.

In addition, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 1, pp. 68-83, Jan. 1989 (Non-Patent Document 1) discloses a technique that uses the posterior probability divided by the prior probability as the evaluation value and takes the candidate with the largest evaluation value as the next search target. The posterior probability is written P(c|x) and is defined as the probability that candidate c is correct given the output x obtained by the recognition process. The prior probability is written P(c) and is defined as the probability that candidate c is correct before any recognition process has been performed.
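As a minimal sketch of the evaluation value described for Non-Patent Document 1, the following ranks candidates by P(c|x) / P(c). The candidate names and probability values are hypothetical and chosen only for illustration; they are not taken from the cited paper. The example also shows how a candidate with a very small prior can rank first even though its posterior is low.

```python
def ratio_score(posterior: float, prior: float) -> float:
    """Evaluation value: posterior P(c|x) divided by prior P(c)."""
    if prior <= 0.0:
        raise ValueError("prior must be positive")
    return posterior / prior


# Hypothetical candidates: name -> (posterior P(c|x), prior P(c)).
candidates = {
    "common": (0.60, 0.30),   # ratio 2
    "rare": (0.05, 0.001),    # ratio about 50
}


def next_target(cands: dict) -> str:
    """Return the candidate with the largest posterior/prior ratio."""
    return max(cands, key=lambda name: ratio_score(*cands[name]))
```

With these illustrative numbers, `next_target(candidates)` picks `"rare"`, because dividing by a small prior inflates its evaluation value.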

However, the technique disclosed in JP 2004-178280 A may mistakenly discard the correct candidate. This is because the recognition result candidates are ranked by an evaluation value based on the sum of the reliabilities of the individual words.
The technique disclosed in JP 2001-283156 A may be impractical in actual operation. When the recognition process for each pattern to be recognized is complex and detailed, every recognition result that is not discarded becomes a search target, so an enormous amount of processing time is required.
The technique disclosed in IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 1, pp. 68-83, Jan. 1989 takes a long time to reach the correct recognition result. In particular, when the prior probabilities of the patterns to be recognized are skewed, the processing time is expected to become enormous. This is because the posterior probability divided by the prior probability is used as the evaluation value.
Patent Document 1: JP 2004-178280 A
Patent Document 2: JP 2001-283156 A
Non-Patent Document 1: IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 1, pp. 68-83, Jan. 1989

An object of one aspect of the present invention is to provide a pattern recognition method, a character recognition method, a pattern recognition program, and a character recognition program that can recognize patterns efficiently.

A pattern recognition method according to one aspect of the present invention is used in an information processing apparatus that recognizes a pattern in a plurality of stages. The method expands the next-stage recognition candidates belonging to a recognition candidate; estimates the time required for recognition processing of the recognition candidates in the next and subsequent stages that are subordinate to each recognition candidate; calculates, for each expanded recognition candidate, an evaluation value based on the posterior probability conditioned on all recognition processing results for the recognition candidates already processed and on the estimated time required for recognition processing; selects a recognition candidate based on the evaluation value calculated for each recognition candidate; and determines the pattern recognition result from the selected recognition candidate.
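The steps of this method (expand, estimate time, score, select) can be sketched as follows. This is an illustration under stated assumptions, not the formula of the invention: the text here only says that the evaluation value is based on the posterior probability and the estimated time, so the rule "posterior divided by estimated time" is an assumption, as are the `Candidate` class, the simple summation used as the time estimate, the inclusion of the candidate's own cost in the denominator, and all numeric values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    name: str
    posterior: float   # assumed approximation of P(candidate | all results so far)
    own_time: float    # assumed cost (seconds) of recognizing this candidate itself
    children: List["Candidate"] = field(default_factory=list)


def subordinate_time(c: Candidate) -> float:
    """Estimated time for every candidate in the stages below c (a simple sum)."""
    return sum(ch.own_time + subordinate_time(ch) for ch in c.children)


def evaluation_value(c: Candidate) -> float:
    """Assumed scoring rule: posterior per unit of estimated processing time."""
    return c.posterior / (c.own_time + subordinate_time(c))


def select(parent: Candidate) -> Candidate:
    """Expand the next-stage candidates of parent and pick the best-scoring one."""
    return max(parent.children, key=evaluation_value)


# Hypothetical data: "a" is more probable but has a costly subtree.
a = Candidate("a", 0.5, 1.0, [Candidate("a1", 0.3, 4.0), Candidate("a2", 0.2, 5.0)])
b = Candidate("b", 0.4, 1.0, [Candidate("b1", 0.4, 1.0)])
root = Candidate("root", 1.0, 0.0, [a, b])
```

Under this assumed rule, `select(root)` returns candidate `b`: its posterior is slightly lower than that of `a`, but the estimated time needed to resolve it is much smaller.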

A character recognition method according to one aspect of the present invention is used in a character recognition device that recognizes character information composed of information in a plurality of layers. The method expands the next-layer word candidates belonging to a word candidate; estimates the time required for recognition processing of the recognition candidates in the next and subsequent layers that are subordinate to each word candidate; calculates, for each expanded word candidate, an evaluation value based on the posterior probability conditioned on all character recognition results for the word candidates already processed and on the estimated time required for recognition processing; selects a word candidate based on the evaluation value calculated for each word candidate; and determines the recognition result for the character information as a whole from the selected word candidate.

A pattern recognition program according to one aspect of the present invention causes a computer to recognize a pattern in a plurality of stages. The program causes the computer to realize: a function of expanding the next-stage recognition candidates belonging to a recognition candidate; a function of estimating the time required for recognition processing of the recognition candidates in the next and subsequent stages that are subordinate to each recognition candidate; a function of calculating, for each expanded recognition candidate, an evaluation value based on the posterior probability conditioned on all recognition processing results for the recognition candidates already processed and on the estimated time required for recognition processing; a function of selecting a recognition candidate based on the evaluation value calculated for each recognition candidate; and a function of determining the pattern recognition result from the selected recognition candidate.

A character recognition program according to one aspect of the present invention causes a computer to recognize character information composed of information in a plurality of layers. The program causes the computer to realize: a function of expanding the next-stage word candidates belonging to a word candidate; a function of estimating the time required for recognition processing of the recognition candidates in the next and subsequent layers that are subordinate to each word candidate; a function of calculating, for each expanded word candidate, an evaluation value based on the posterior probability conditioned on all character recognition results for the word candidates already processed and on the estimated time required for recognition processing; a function of selecting a word candidate based on the evaluation value calculated for each word candidate; and a function of determining the recognition result for the character information as a whole from the selected word candidate.

According to one aspect of the present invention, it is possible to provide a pattern recognition method, a character recognition method, a pattern recognition program, and a character recognition program that can recognize patterns efficiently.

Embodiments of the present invention will be described in detail with reference to the drawings.
First, the pattern recognition process of this embodiment is outlined.
Pattern recognition often has to finish within a fixed time, so high-speed processing is frequently required. A typical example is the recognition of character information as a whole that is composed of information in a plurality of layers, such as address information. In address recognition, the upper layer, which has few candidates, is commonly recognized first, and the result is used to narrow down the candidates in the lower layers. In pattern recognition made up of multiple stages of this kind (multi-stage pattern recognition), the early stages run recognition with a coarse classifier that operates at high speed (that is, recognition that selects candidates with priority on processing speed) in order to speed up the process as a whole. The later stages may then use the results of the preceding stage to narrow the processing range or the processing targets step by step. The problem of selecting the correct candidate from such multi-stage candidates can be treated as a search problem.

That is, in multi-stage pattern recognition, the recognition results of an earlier stage can be used to narrow the processing range or targets of a later stage. This amounts to performing a beam search. The multi-stage pattern recognition described in this embodiment applies a search technique that successively narrows down the candidates at each stage based on an evaluation value such as the posterior probability. Possible search methods include narrowing down to a predetermined number of candidates at each stage and best-first search. The posterior probability is written P(c|x) and means the probability that candidate c is correct given the output x obtained by the recognition process. An approximate value of P(c|x) is calculated by the approximation method described later. The prior probability is written P(c) and means the probability that candidate c is correct before any recognition process has been performed.

Typical examples of multi-stage pattern recognition include recognizing a person from biometric information such as a face image, and recognizing address information composed of multiple layers of information. Address information, for example, consists of multiple layers such as first-layer information (for example, a city name), second-layer information (for example, a town name or street name), and third-layer information (for example, a lot number or street number). The total number of addresses formed by combining the information in these layers ranges from several million to several tens of millions, so running recognition against every address is inefficient. Applied to the recognition of information composed of multiple layers (for example, address information), the pattern recognition method described in this embodiment can obtain recognition results efficiently and at high speed.

As techniques related to word recognition, the following documents disclose methods for calculating a posterior probability as the evaluation value of a word candidate: Document 1 (Tomoyuki Hamamura, Takuma Akagi, Hiroyuki Mizutani, Bunpei Irie: "Word Matching by Word-Length-Normalized Bayesian Estimation", Proceedings of the Symposium on Image Recognition and Understanding (MIRU2000) II, pp. 1-6 (Jul. 2000)); Document 2 (Tomoyuki Hamamura, Takuma Akagi, Bunpei Irie: "An Evaluation Function Using Posterior Probability in Word Recognition", IEICE Technical Report, PRMU2006-92 (Oct. 2006)); and Document 3 (Tomoyuki Hamamura, Takuma Akagi, Bunpei Irie: "Analytical Word Recognition Using Posterior Probability: Normalization of the Number of Character Segmentations", IEICE Technical Report, PRMU2006-238 (Mar. 2007)).

Document 1 describes that, by approximating the recognition results of the individual characters as independent, the posterior probability ratio of a word candidate (the ratio between the prior and posterior probabilities) can be decomposed into the product of the posterior probability ratios of the individual characters. Document 2 describes a technique for calculating a normalized posterior probability ratio by expanding the expressions so as to avoid the approximation in Document 1 that introduces a large error. Document 3 describes a technique for calculating an extended posterior probability ratio by exploiting the fact that some characters are also written at locations other than the one of interest.
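The decomposition attributed to Document 1 can be written out as below. The symbols are introduced here only for illustration and do not appear in the source: a word candidate w is assumed to consist of characters c_1, ..., c_L, with x_j the recognition result for the j-th character. The direction of the ratio (posterior over prior) is also an assumption, chosen to match the evaluation value described for Non-Patent Document 1; the exact form used in Document 1 may differ.

```latex
\frac{P(w \mid x_1, \dots, x_L)}{P(w)}
  \;\approx\; \prod_{j=1}^{L} \frac{P(c_j \mid x_j)}{P(c_j)}
```

Under the independence approximation, each factor depends on only one character, so the word-level ratio can be accumulated from the outputs of individual character recognition.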

However, calculating the normalized posterior probability ratio of Document 2 or the extended posterior probability ratio of Document 3 requires processing every word in the word dictionary. This suggests that the more words the dictionary contains, the longer the calculation takes. In contrast, this embodiment describes later an expansion of the expressions that allows the idea of Document 3 to be applied even when only the single word of interest has been processed. That is, the pattern recognition method described in this embodiment calculates the posterior probability with a formula that can be used in techniques that allow efficient candidate search.

Next, the relationship between multi-stage pattern recognition and search problems is described.
Face recognition and address recognition are used here as examples of multi-stage pattern recognition.
First, face recognition is described.
Face recognition is a form of biometrics that identifies an individual from a face image of the person, used as biometric information. Broadly, it obtains a recognition result through three stages: the first stage detects images of persons in the acquired image (the input image), the second stage detects a face image in each detected person image, and the third stage matches each detected face image against the face images of the registrants.

The first stage is person detection, performed by applying a person detection classifier to the input image. The person detection classifier detects a person image by superimposing a person detection template, varied in position and size, on the input image. The first stage is run on a low-resolution image, for example, so that results are obtained quickly.
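Varying a template in position and size over the input image can be sketched as a multi-scale sliding window. The code below is only an illustration: the function names, the stride and scale parameters, and the `score` callback that stands in for the person detection classifier are all hypothetical, and no actual template matching is performed.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def windows(img_w: int, img_h: int, base_w: int, base_h: int,
            scales: Sequence[float], stride: int) -> Iterator[Window]:
    """Yield every template placement: each scale, then each position."""
    for s in scales:
        w, h = int(base_w * s), int(base_h * s)
        if w <= 0 or h <= 0 or w > img_w or h > img_h:
            continue  # template does not fit at this scale
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


def detect(img_w: int, img_h: int, base_w: int, base_h: int,
           scales: Sequence[float], stride: int,
           score: Callable[[Window], float], threshold: float) -> List[Window]:
    """Keep the placements whose (hypothetical) classifier score passes the threshold."""
    return [win for win in windows(img_w, img_h, base_w, base_h, scales, stride)
            if score(win) >= threshold]
```

Each window returned by `detect` would correspond to one first-stage candidate. Running on a low-resolution image reduces the number of placements that have to be scored.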

The second stage is face detection, performed by applying a face detection classifier to the person image obtained in the first stage. The face detection classifier detects a face image by superimposing a face detection template, varied in position and size, on the person image. The second stage is run on a high-resolution image, for example, so that faces are detected with high accuracy.

The third stage is face matching, which matches the face image obtained in the second stage against the face images of the registrants. The third stage is run on a high-resolution face image, for example, so that faces are identified with high accuracy.

FIG. 1 represents the stages of the face recognition process as a search tree. A search tree is a conceptual diagram showing how a search proceeds when candidates in a plurality of layers are drawn as nodes. The search tree of FIG. 1 assumes three registrants.
In a search tree like that of FIG. 1, each candidate at each stage is represented as a node. In the first stage, the person detection classifier is run at various positions and sizes to obtain a plurality of person detection results. In the search tree of FIG. 1, the first-level nodes correspond to the person detection results obtained as first-stage candidates. That is, each first-level node subordinate to the root node corresponds to the result of the person detection classifier at one position and size. In the second stage, a plurality of face detection results are obtained for each person detection result (person image) from the first stage. In the search tree of FIG. 1, the second-level nodes correspond to the face detection results obtained in the second stage. Accordingly, each first-level node has a plurality of second-level nodes subordinate to it. In the third stage, one matching result per registrant is obtained for each face detection result (face image) from the second stage. In the search tree of FIG. 1, the third-level nodes correspond to the matching results against each registrant obtained in the third stage. Accordingly, each second-level node has as many third-level nodes subordinate to it as there are registrants.

The parent-child relationship between nodes in FIG. 1 represents inclusion. For example, node A, one of the first-stage results in FIG. 1, has four nodes B to E subordinate to it. Nodes B to E correspond to four second-stage results (face detection results) based on the result of node A. Node C has three nodes F to H subordinate to it. Nodes F to H correspond to three third-stage results (matching results against the three registrants) based on node C as a second-stage result.

Ideally, face recognition would run the processing for every third-level node. To speed up the process, however, the third-stage processing (matching) must be performed efficiently by using the first-stage and second-stage results. This corresponds to solving the search problem of efficiently searching a search tree like the one in FIG. 1.

Next, an example of address information recognition is described.
FIG. 2 shows an example of an address database. The address information to be recognized is assumed to consist of information in multiple stages. In the example of FIG. 2, the first stage is the city name, the second stage is the town name, and the third stage is the lot number.

FIG. 3 shows an example of how address information is written. It is an example of an image given as the input to the recognition process, such as an image obtained by reading, with a scanner or the like, a medium on which address information is written. For an image containing address information like that of FIG. 3, line candidate detection, word candidate segmentation, character candidate segmentation, individual character recognition for each character candidate, and so on are performed. Once the individual character recognition results are obtained, a plurality of word candidates are selected based on them. The following description covers the process of matching each word candidate against each word in the address database.

FIG. 4 shows an example of word candidates obtained from the address image of FIG. 3. In this example, candidates I1 to I10 are detected as word candidates. FIG. 5 represents the matching process as a search tree. A pair consisting of a word candidate Ii and a word in the address database corresponds to one node of the search tree, and matching one pair corresponds to searching one node.

Ideally, address recognition would match every word in the address database against every word candidate. However, the number of words stored in the address database is enormous. Multi-stage pattern recognition therefore searches for the solution efficiently (that is, determines the recognition result for the address information as a whole) by performing the matching from the upper layers downward.

In multi-stage pattern recognition of this kind, processing normally starts from the upper stage, and the candidates are narrowed down at each stage before the next stage is processed. In the example of FIG. 1, all first-level nodes are processed and narrowed down to n nodes. Next, all second-level nodes subordinate to the remaining first-level nodes are processed and narrowed down to n nodes. Then all third-level nodes subordinate to the remaining second-level nodes are processed, and the best recognition result is determined from among them. Viewed as a search problem, this sequence of processing amounts to a beam search.
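The stage-by-stage narrowing described above can be sketched as a small beam search. This is a minimal illustration, assuming the tree is given as a parent-to-children dictionary and that every node already has a score comparable within its own stage; the node names (`p1`, `f1`, `m1`, ...) and score values are hypothetical and are not taken from FIG. 1.

```python
from typing import Callable, Dict, List


def beam_search(root: str, children: Dict[str, List[str]],
                score: Callable[[str], float], beam_width: int) -> str:
    """Process one stage at a time, keeping only the top beam_width nodes."""
    frontier = [root]
    while True:
        # Process every node subordinate to the nodes that survived so far.
        expanded = [c for node in frontier for c in children.get(node, [])]
        if not expanded:
            # The frontier is the last stage, sorted best first.
            return frontier[0]
        frontier = sorted(expanded, key=score, reverse=True)[:beam_width]


# Hypothetical three-stage tree and per-node scores.
children = {
    "root": ["p1", "p2", "p3"],
    "p1": ["f1", "f2"], "p2": ["f3"], "p3": ["f4"],
    "f1": ["m1", "m2"], "f2": ["m3"], "f3": ["m4"], "f4": ["m5"],
}
scores = {
    "p1": 0.9, "p2": 0.6, "p3": 0.1,
    "f1": 0.3, "f2": 0.8, "f3": 0.7, "f4": 0.95,
    "m1": 0.9, "m2": 0.2, "m3": 0.5, "m4": 0.6, "m5": 0.99,
}
```

With a beam width of 2, `beam_search("root", children, scores.get, 2)` returns `"m4"`; with a width of 3 it returns `"m5"`, because `p3` is no longer pruned at the first stage. Scores are only ever compared among nodes of the same stage.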

Best-first search, on the other hand, is known as an excellent method for solving search problems. Best-first search keeps track of all unprocessed child nodes subordinate to the processed nodes and takes the one with the highest evaluation value as the next search target. In the example of FIG. 1, suppose nodes A and C have been processed and the other nodes have not. The unprocessed child nodes of nodes A and C are then the six nodes B, D, E, F, G, and H, and the next search target is chosen from these six. If best-first search of this kind could also be used in multi-stage pattern recognition, search efficiency could be expected to improve.
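A minimal sketch of best-first search over the nodes named in the FIG. 1 example follows. The node `A2`, the evaluation values, and the choice of third-level nodes as goals are assumptions made for illustration; the sketch also assumes that the evaluation values are comparable across stages.

```python
import heapq
from typing import Callable, Dict, List, Optional, Set, Tuple


def unprocessed_children(children: Dict[str, List[str]], processed: Set[str]) -> List[str]:
    """Child nodes of processed nodes that have not been processed themselves."""
    return sorted(c for p in processed for c in children.get(p, []) if c not in processed)


def best_first(root: str, children: Dict[str, List[str]],
               score: Callable[[str], float],
               is_goal: Callable[[str], bool]) -> Tuple[Optional[str], List[str]]:
    """Always process the open node with the highest evaluation value next."""
    # heapq is a min-heap, so scores are negated to pop the largest first.
    open_heap = [(-score(c), c) for c in children.get(root, [])]
    heapq.heapify(open_heap)
    order: List[str] = []
    while open_heap:
        _, node = heapq.heappop(open_heap)
        order.append(node)
        if is_goal(node):
            return node, order
        for child in children.get(node, []):
            heapq.heappush(open_heap, (-score(child), child))
    return None, order


# Nodes A to H follow the FIG. 1 example; "A2" and all values are hypothetical.
children = {"root": ["A", "A2"], "A": ["B", "C", "D", "E"], "C": ["F", "G", "H"]}
scores = {"A": 0.9, "A2": 0.2, "B": 0.1, "C": 0.8, "D": 0.3,
          "E": 0.05, "F": 0.1, "G": 0.7, "H": 0.15}
```

Here `unprocessed_children(children, {"A", "C"})` yields the six nodes B, D, E, F, G, and H from the text, and `best_first("root", children, scores.get, lambda n: n in {"F", "G", "H"})` processes A, C, and G in that order before stopping at G.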

In general, beam search only needs to compare nodes within the same stage. The similarity output by the classifier of that stage, for example, can therefore serve as the evaluation value. Best-first search, however, requires comparisons between nodes at different stages, and comparing raw classifier outputs across stages is meaningless. In other words, the evaluation values used in existing beam search cannot be used for best-first search.

This embodiment therefore provides a pattern recognition method that uses the posterior probability as the evaluation value. In general, the posterior probability as defined in Non-Patent Document 3 and elsewhere, mentioned above, is difficult to calculate directly. According to the methods of Documents 1, 2, and 3, however, the posterior probability can be calculated approximately, for example in word recognition from the outputs of individual character recognition. The barcode recognition process of this embodiment calculates an approximate posterior probability as the evaluation value by applying the calculation techniques described in Documents 1, 2, and 3.

次に、最良優先探索に用いる評価値としての事後確率の計算法について説明する。
ここで、ノードｎｉに対応する処理結果をｘｉとする。たとえば、最良優先探索を顔認識に応用する場合、各ノードｎｉに対応する各段階の識別器の出力が処理結果ｘｉに相当する。また、最良優先探索を住所情報の認識処理に応用する場合、各ノードｎｉに対応する各段階における各単語候補内の文字認識などの結果が処理結果ｘｉに相当する。なお、以下の説明では、Ｘ、ｎｉ、Ｕｉ、Ｘｅｔｃを以下のように定義する。処理済の全ノードに対応する全処理結果をＸとする。ｎｉの親ノードを辿ることで到達できるノードの集合をＵｉとする。ｎｉが集合Ｕｉの元でないとする。Ｕｉに含まれないノードのうち、処理済のノードの全処理結果をＸｅｔｃとする。 Next, a method for calculating the posterior probability used as the evaluation value for best-first search will be described.
Here, let xi be the processing result corresponding to node ni. For example, when best-first search is applied to face recognition, the output of the classifier at each stage corresponding to each node ni is the processing result xi. When best-first search is applied to address information recognition, the results of, for example, character recognition within each word candidate at each stage corresponding to each node ni are the processing result xi. In the following description, X, ni, Ui, and Xetc are defined as follows. Let X be the set of all processing results corresponding to all processed nodes. Let Ui be the set of nodes that can be reached by following the parent nodes of ni, where ni itself is not an element of Ui. Let Xetc be the processing results of all processed nodes that are not included in Ui.

図６は、探索木で表される各ノードの状態を示す図である。図６において、黒丸は処理済のノード、白丸と二重丸は未処理のノードであり、二重丸は処理済のノードを親に持つノードである。つまり、図６に示す例では、二重丸で表されたノードが次の探索対象の候補である。これらの各ノードについては、事後確率Ｐ（ｎｉ｜Ｘ）が計算される。ここで、事後確率Ｐ（ｎｉ｜Ｘ）とは、ノードｎｉだけでなく、Ｕｉに含まれる全ノードが満たされる確率を意味するものとする。たとえば、住所情報の認識処理の例では、単独の階層（町等）の単語が書かれている確率ではなく、その上位の階層の単語も同時に書かれている確率となり、住所が書かれている確率に相当する。なお、顔認識処理の例では、子ノードが満たされれば、親ノードが自動的に満たされる。

FIG. 6 is a diagram illustrating the state of each node in the search tree. In FIG. 6, black circles are processed nodes, white circles and double circles are unprocessed nodes, and double circles are unprocessed nodes whose parent has been processed. That is, in the example shown in FIG. 6, the nodes drawn as double circles are the candidates for the next search target. A posterior probability P(ni|X) is calculated for each of these nodes. Here, the posterior probability P(ni|X) means the probability that not only node ni but also every node included in Ui is satisfied. In the address recognition example, it is therefore not the probability that the word of a single hierarchy (a town, for instance) is written, but the probability that the words of the higher hierarchies are written as well; it corresponds to the probability that the address as a whole is written. In the face recognition example, the parent node is automatically satisfied whenever the child node is satisfied.

式（２）では、Ｕｉに含まれる各ノードの処理結果が、他の処理結果とは独立に起こっているとする近似を用いている。式（３）では、Ｐ（ｘｊ｜ｎｉ）≒Ｐ（ｘｊ｜ｎｊ）とする近似を用いた。式（３）を用いることで、ノードｎｉの事後確率Ｐ（ｎｉ｜Ｘ）を近似的に計算できる。顔認識処理の例では、ｘｊは単独の識別器の出力である。このため、データを集めることにより、Ｐ（ｘｊ｜ｎｊ）、Ｐ（ｘｊ）は容易に得ることが可能であり、事後確率の計算が可能となる。一方、住所情報の認識処理の例では、ｘｊには単語候補内の複数の文字認識結果が含まれている。このため、式（３）の計算が単純ではない。住所情報の認識処理における事後確率の計算方法については、後で詳細にするものとする。 Equation (2) uses an approximation that the processing result of each node included in Ui occurs independently of other processing results. In equation (3), an approximation of P (xj | ni) ≈P (xj | nj) is used. By using Expression (3), the posterior probability P (ni | X) of the node ni can be calculated approximately. In the example of face recognition processing, xj is the output of a single classifier. Therefore, by collecting data, P (xj | nj) and P (xj) can be easily obtained, and the posterior probability can be calculated. On the other hand, in the example of address information recognition processing, xj includes a plurality of character recognition results in word candidates. For this reason, the calculation of Formula (3) is not simple. The posterior probability calculation method in the address information recognition process will be described later in detail.
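The equations referred to as (1) to (3) are not reproduced in this text. Read from the approximations just described, they can plausibly be written as below. This is a reconstruction from the surrounding prose, not a copy of the published formulas; in particular, dropping the factor for Xetc (by treating Xetc as independent of ni) is an assumption.

```latex
\begin{align}
P(n_i \mid X) &= \frac{P(X \mid n_i)\,P(n_i)}{P(X)} \tag{1}\\
              &\approx P(n_i) \prod_{n_j \in U_i} \frac{P(x_j \mid n_i)}{P(x_j)} \tag{2}\\
              &\approx P(n_i) \prod_{n_j \in U_i} \frac{P(x_j \mid n_j)}{P(x_j)} \tag{3}
\end{align}
```

Line (1) is Bayes' theorem, line (2) applies the independence approximation for the processing results of the nodes in Ui, and line (3) applies P(xj|ni) ≈ P(xj|nj).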

式（３）は、更に、

Equation (3) can be further transformed as follows.

と変形することができる。事前確率と事後確率との比を事後確率比と呼ぶことにすると、「ノードｎｉの事後確率比は、その親ノードの事後確率比の積である」と表現できる。 If the ratio of the posterior probability to the prior probability is called the posterior probability ratio, this can be stated as “the posterior probability ratio of node ni is the product of the posterior probability ratios of its parent nodes (the nodes in Ui)”.
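As a minimal numerical sketch of this product rule: if each processed node nj in Ui contributes a ratio P(xj|nj)/P(xj), which by Bayes' theorem equals P(nj|xj)/P(nj), then the approximate posterior of ni is its prior multiplied by those ratios. The function names and all numbers below are hypothetical and serve only to illustrate the arithmetic.

```python
from math import prod

def posterior_ratio(likelihood, evidence):
    """Ratio P(x_j | n_j) / P(x_j) contributed by one processed node."""
    return likelihood / evidence

def approx_posterior(prior, ancestor_ratios):
    """Approximate P(n_i | X) as the prior P(n_i) times the ratios of the nodes in U_i."""
    return prior * prod(ancestor_ratios)

# Hypothetical values: a second-stage candidate with two processed ancestors,
# and a first-stage candidate with one processed ancestor.
deep_node = approx_posterior(0.01, [posterior_ratio(0.4, 0.02), posterior_ratio(0.3, 0.1)])
shallow_node = approx_posterior(0.05, [posterior_ratio(0.2, 0.05)])

# Both values approximate probabilities, so nodes at different stages can be compared.
print(deep_node, shallow_node, deep_node > shallow_node)
```

Because the result is a probability rather than a raw classifier output, the same scale applies to every stage of the search tree.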

次に、住所情報の認識処理における事後確率比の計算方法について説明する。
住所情報の認識処理において、ｘｊには、単語候補内の複数の文字認識結果が含まれている。このため、式（３）の事後確率比Ｐ（ｘｊ｜ｎｊ）／Ｐ（ｘｊ）の計算が容易ではないことが多い。上記文献１、２、３では、事後確率比の計算式がいくつか提案されている。それらの事後確率比の分母Ｐ（ｘｊ）に注目して整理する。まず、文献１では、ｘｊに含まれる各文字候補の文字認識結果を全て独立と近似している。これは、大きな誤差を起こしてしまうことと、文字候補構造（説明は後述）あるいは経路の選択に関する項が残ってしまい計算が困難となることがある。文献２に記載されている正規化事後確率比は、分母をＰ（ｘｊ）＝Σ_ｋＰ（ｘｊ、ｗｋ）とする変形により、分母の計算を単語辞書内の全単語で展開し、大きな誤差を起こす近似変形を避けることに成功している。また、同時に、計算のしにくい項を分母と分子とでうまくキャンセルさせることに成功している。更に、文献３に記載されている拡張事後確率比では、注目している単語候補以外の場所にも何らかの文字が書かれていることを考慮することにより、近似誤差を低減している。何らかの文字が書かれていることを考慮することにより、計算利便性を落とす項が発生する。しかし、正規化事後確率比と同様に分母の計算を単語辞書内の全単語で展開することにより、計算し難い項が分母と分子とでキャンセルされている。 Next, a method for calculating the posterior probability ratio in the address information recognition process will be described.
In the address information recognition process, xj contains the recognition results of multiple characters within a word candidate. For this reason, the posterior probability ratio P(xj|nj)/P(xj) in Equation (3) is often not easy to calculate. Documents 1, 2, and 3 above propose several formulas for the posterior probability ratio, which are summarized here with attention to the denominator P(xj). Document 1 approximates the character recognition results of all character candidates in xj as mutually independent. This can cause a large error, and terms relating to the character candidate structure (explained later) or to path selection may remain, making the calculation difficult. The normalized posterior probability ratio of Document 2 rewrites the denominator as P(xj) = Σ_k P(xj, wk), expanding it over all words in the word dictionary, and thereby avoids approximations that cause large errors. At the same time, the terms that are hard to calculate cancel between the denominator and the numerator. The extended posterior probability ratio of Document 3 further reduces the approximation error by taking into account that some characters are also written in places other than the word candidate of interest. Taking this into account introduces terms that make the calculation less convenient, but, as with the normalized posterior probability ratio, expanding the denominator over all words in the word dictionary makes these terms cancel between the denominator and the numerator.

ただし、探索時にも事後確率比を用いることができるようにするためには、全単語の処理を必要とせず、注目する１つの単語のみの処理で計算できるのが望ましい。そこで、本実施の形態では、以下の式展開により、分母の計算を単語辞書内の全単語に展開せず、「何らかの文字が書かれている」とする。これにより、拡張事後確率比の利点、すなわち、文字認識結果を全て独立とするよりも近似精度が高まるという利点がある上に、計算が困難な項のキャンセルもできていることを示す。 However, for the posterior probability ratio to be usable during the search as well, it should be computable by processing only the single word of interest, without processing all words. In the present embodiment, therefore, the derivation below does not expand the denominator over all words in the word dictionary but instead conditions it on “some characters are written”. It is shown that this retains the advantage of the extended posterior probability ratio, namely higher approximation accuracy than treating all character recognition results as independent, while the terms that are difficult to calculate still cancel.

ノードｎｉに対応する単語候補をＬｉ、単語をｗｉとする。ノードｎｉを探索することが、単語候補Ｌｉに対し単語ｗｉをマッチングすることに相当する。単語候補Ｌｉ内の全文字候補の集合をＡｉ、各文字候補をａ∈Ａｉとする。 Let Li be the word candidate corresponding to node ni, and let wi be the word. Searching node ni corresponds to matching the word wi against the word candidate Li. Let Ai be the set of all character candidates in the word candidate Li, and let each character candidate be a ∈ Ai.

図７は、文字候補の例を示す図である。図７に示すような構造は、ラティス構造と呼ばれる。つまり、ここでは、文字候補が、図７のようなラティス構造を取っているものとする。また、以下の説明では、ａｉ、ｒｉ、Ｌｉ、Ｓｉ、ｘｉ、ｗｉなどを以下のように定義するものとする。文字候補ａｉの文字認識結果をｒｉとする。単語候補Ｌｉ内の全文字認識結果をｒｉ、文字候補構造をＳｉとする。文字候補の構造とは、文字候補同士の隣接情報、あるいは、文字候補数などの文字認識結果以外の情報を指す。上述した処理結果ｘｉを、ｘｉ＝（ｒｉ、Ｓｉ）と定義する。単語ｗｉのｊ番目の文字をｃｉｊ∈Ｃ（Ｃはアルファベットの集合）とする。Ｃ＊を任意文字列とする。 FIG. 7 is a diagram illustrating an example of character candidates. The structure shown in FIG. 7 is called a lattice structure. That is, here, it is assumed that the character candidates have a lattice structure as shown in FIG. In the following description, ai, ri, Li, Si, xi, wi, and the like are defined as follows. Let ri be the character recognition result of the character candidate ai. Assume that all character recognition results in the word candidate Li are ri and the character candidate structure is Si. The character candidate structure refers to information other than character recognition results such as adjacent information between character candidates or the number of character candidates. The processing result xi described above is defined as xi = (ri, Si). The jth character of the word wi is assumed to be cijεC (C is a set of alphabets). Let C * be an arbitrary character string.

単語候補Ｌｉ内の左端から右端に至る全経路の集合をＦｉ＝｛ｆｐ｝、ｐ＝１、２、…、経路をｆｐ＝（ａｆ１ｐ、ａｆ２ｐ、…）、ａｆｊｐ∈Ａｉとする。ａｆ（ｊ＋１）ｐは、ａｆｊｐの右側に隣接して位置するものとする。図７には、太線で経路ｆｐなどの例を示している。経路ｆｐ上の文字候補の集合をＥ´ｐ＝｛ａｆｊｐ｝、ｊ＝１、２、…、Ｅ´ｐに含まれないがＡｉに含まれる文字候補の集合をＥｐとする。Ｅｐ∩Ｅ´ｐ＝φ、Ｅｐ∪Ｅ´ｐ＝Ａｉである。事後確率比Ｐ（ｘｊ｜ｎｊ）／Ｐ（ｘｊ）を以下の通り変形する。

Let Fi = {fp}, p = 1, 2, ..., be the set of all paths from the left end to the right end of the word candidate Li, and write each path as fp = (af1p, af2p, ...), afjp ∈ Ai, where af(j+1)p is adjacent to afjp on its right. FIG. 7 shows an example of a path fp as a bold line. Let E′p = {afjp}, j = 1, 2, ..., be the set of character candidates on the path fp, and let Ep be the set of character candidates that are in Ai but not in E′p, so that Ep ∩ E′p = φ and Ep ∪ E′p = Ai. The posterior probability ratio P(xj|nj)/P(xj) is transformed as follows.
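To make the sets Fi, E′p, and Ep concrete, the following sketch enumerates the paths of a small lattice. The representation is an assumption made for illustration: each character candidate is stored with the indices of the cut points where it starts and ends, and two candidates are adjacent when one ends where the other starts. The candidate names and the lattice itself are hypothetical and are not taken from FIG. 7.

```python
def enumerate_paths(candidates, left, right):
    """Return all paths (tuples of candidate names) from cut point `left` to `right`.

    candidates: dict mapping a candidate name to its (start, end) cut points, start < end.
    """
    by_start = {}
    for name, (start, _end) in candidates.items():
        by_start.setdefault(start, []).append(name)

    paths = []

    def walk(position, path):
        if position == right:
            paths.append(tuple(path))
            return
        for name in by_start.get(position, []):
            walk(candidates[name][1], path + [name])

    walk(left, [])
    return paths

# Hypothetical lattice: a3 covers the same span as a1 followed by a2.
A_i = {"a1": (0, 1), "a2": (1, 2), "a3": (0, 2), "a4": (2, 3)}
F_i = enumerate_paths(A_i, 0, 3)

f_p = F_i[1]                      # one path through the lattice
E_prime_p = set(f_p)              # candidates on the path
E_p = set(A_i) - E_prime_p        # candidates in A_i that are off the path
print(F_i, E_prime_p, E_p)
```

By construction the two sets are disjoint and together cover Ai, matching the relations stated above.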

式（４）左辺の分子Ｐ（ｘｊ｜ｎｊ）は、その親ノードを辿って得られるノードをｎｐ１、ｎｐ２、…とすると、Ｐ（ｘｊ｜Ｌｊ、ｗｊ、Ｌｐ１、ｗｐ１、Ｌｐ２、ｗｐ２、…）という意味になる。ｘｊは単語候補Ｌｊに関する情報以外の影響を受けないとする近似を行うと、Ｐ（ｘｊ｜Ｌｊ、ｗｊ）となる。
式（４）の分母において、Ｐ（ｘｊ）≒Ｐ（ｘｊ｜Ｌｊ、Ｃ＊）としているのは、先に述べたとおり、どの単語候補にも何らかの文字が書かれているとする近似であるからである。
式（６）では、確率が最大となる経路の確率に比べ、その他の経路の確率は無視できるとする近似を用いている。続いて以下の近似を行う。

If np1, np2, ... denote the nodes obtained by tracing the parent nodes of nj, the numerator P(xj|nj) on the left side of Equation (4) means P(xj|Lj, wj, Lp1, wp1, Lp2, wp2, ...). Under the approximation that xj is affected only by information about the word candidate Lj, this becomes P(xj|Lj, wj).
In the denominator of Equation (4), P(xj) ≈ P(xj|Lj, C*) is used because, as stated above, this is the approximation that some characters are written in every word candidate.
Equation (6) uses the approximation that the probabilities of the other paths are negligible compared with that of the maximum-probability path. The following approximation is then applied.

ここで、Ｋｊはｐによらない定数とする。この近似は、どのパスが正解である確率も同様に確からしい、とする近似である。 Here, Kj is a constant not depending on p. This approximation is an approximation that the probability that any path is correct is equally likely.

式（７）、式（８）を用いることにより、式（６）は以下のように計算される。（どのｆｐでも単語ｗｊと長さの一致しない場合、式（６）は０となる。以後の計算はそれ以外の場合の計算とする。）

By using the equations (7) and (8), the equation (6) is calculated as follows. (If the length does not coincide with the word wj in any fp, the expression (6) becomes 0. The following calculation is performed in other cases.)

ただし、ｍａｔｃｈ（）は、以下の通りに定義している。

However, match () is defined as follows.

式（１０）では、各文字認識結果が互いに独立であるとする近似を用いている。式（１０）から式（１１）への変形では、ｆｐに依存しない値

Equation (10) uses the approximation that the character recognition results are mutually independent. In the transformation from Equation (10) to Equation (11), the numerator and the denominator are both divided by the following value, which does not depend on fp.

で分母分子を割っている。また、式（９）において、計算の困難な文字候補構造Ｓｊや経路ｆｐに関する項がキャンセルできていることに注意する。 Note also that in Equation (9) the hard-to-compute terms relating to the character candidate structure Sj and the path fp cancel out.

式（１１）が本実施の形態で提案する事後確率比の計算式である。式（１１）では、拡張事後確率比と同じアイデアによる近似精度の向上を行いつつ、余計な項のキャンセルもできている。更に、注目する一単語ｗｊのみの処理で計算可能であり、探索時にも用いることができる。 Equation (11) is a formula for calculating the posterior probability ratio proposed in the present embodiment. In equation (11), the approximation accuracy is improved by the same idea as the extended posterior probability ratio, and an extra term can be canceled. Furthermore, it can be calculated by processing only the one word wj of interest, and can also be used at the time of search.
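Equation (11) itself is not reproduced in this text, so the following is not an implementation of it. It is a simplified sketch of the ingredients the derivation names: only paths whose length matches the word are considered, character recognition results are treated as independent, and the best path stands in for the sum over paths. Each character contributes P(c|r)/P(c), which by Bayes' theorem equals P(r|c)/P(r). Any normalization over paths that Equation (11) may contain is omitted, and all names and numbers are hypothetical.

```python
from math import prod

def word_posterior_ratio(word, paths, char_posterior, char_prior):
    """Best-path sketch of a word-level posterior probability ratio.

    paths: iterable of tuples of character-candidate names (lattice paths).
    char_posterior: candidate name -> {character: P(character | recognition result)}.
    char_prior: character -> prior probability P(character).
    """
    best = 0.0
    for path in paths:
        if len(path) != len(word):
            continue  # this path cannot be matched against the word
        ratio = prod(
            char_posterior[cand].get(ch, 0.0) / char_prior[ch]
            for cand, ch in zip(path, word)
        )
        best = max(best, ratio)
    return best

# Hypothetical lattice paths and recognition scores.
paths = [("a1", "a2"), ("a3",)]
char_posterior = {"a1": {"A": 0.8}, "a2": {"B": 0.5}, "a3": {"A": 0.9}}
char_prior = {"A": 0.1, "B": 0.1, "C": 0.1}

print(word_posterior_ratio("AB", paths, char_posterior, char_prior))
print(word_posterior_ratio("ABC", paths, char_posterior, char_prior))
```

Only the single word passed in is processed, which is the property that makes a ratio of this kind usable during the search.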

上述したように、本実施の形態に係るパターン認識方法では、複数の段階からなるパターン認識処理の一部を探索問題と捉えている。上記パターン認識方法では、各段階で得られる各候補をそれぞれノードとし、得られたノードを事後確率に基づいて選択的に処理する。これにより、上記パターン認識方法では、複数段階からなるパターン認識処理を効率的かつ高速化に実行することができる。 As described above, in the pattern recognition method according to the present embodiment, a part of pattern recognition processing including a plurality of stages is regarded as a search problem. In the pattern recognition method, each candidate obtained at each stage is set as a node, and the obtained node is selectively processed based on the posterior probability. Thereby, in the said pattern recognition method, the pattern recognition process which consists of multiple steps can be performed efficiently and at high speed.

また、上記パターン認識方法では、処理対象とする各段のノードを選択する場合、最良優先探索などの手法が適用可能である。最良優先探索では、異なる段階のノードを比較する必要がある。そのための評価値として、上記パターン認識方法では、事後確率が用いられる。事後確率は、各ノードにおける事後確率比（事前確率と事後確率の比）の積により算出される。特に、住所情報の認識に特化した場合、探索時にも計算可能な形で各ノードの事後確率比が計算される。 In the pattern recognition method, a technique such as best-first search can be applied when selecting the nodes to be processed at each stage. Best-first search needs to compare nodes at different stages, and the pattern recognition method uses the posterior probability as the evaluation value for that purpose. The posterior probability is calculated from the product of the posterior probability ratios (the ratio of the posterior probability to the prior probability) of the nodes. In particular, when the method is specialized for address information recognition, the posterior probability ratio of each node is calculated in a form that can also be evaluated during the search.

以下、上述したようなパターン認識方法の適用例について説明する。
図８は、上述したようなパターン認識方法によるパターン認識機能を有する情報処理装置１１の構成例を示す図である。
図８に示す例では、情報処理装置１１には、画像入力装置１２が接続されている。画像入力装置１２は、スキャナあるいはカメラなどにより構成される。上記画像入力装置１２は、情報処理装置１１により処理すべき画像を取得するものである。たとえば、上記画像入力装置１２は、スキャナあるいはカメラなどにより構成される。上記画像入力装置１２は、取得した画像情報を情報処理装置１１へ供給するようになっている。なお、上記画像入力装置１２は、記録媒体に記憶された画像情報を読み出して、記録媒体から読み出した画像情報を上記情報処理装置１１へ供給するものであっても良い。 Hereinafter, application examples of the pattern recognition method as described above will be described.
FIG. 8 is a diagram illustrating a configuration example of the information processing apparatus 11 having a pattern recognition function according to the pattern recognition method as described above.
In the example illustrated in FIG. 8, an image input device 12 is connected to the information processing apparatus 11. The image input device 12 is configured by, for example, a scanner or a camera, and acquires an image to be processed by the information processing apparatus 11. The image input device 12 supplies the acquired image information to the information processing apparatus 11. The image input device 12 may instead read image information stored in a recording medium and supply it to the information processing apparatus 11.

上記情報処理装置１１は、パターン認識装置あるいは文字認識装置として機能するものである。上記情報処理装置１１は、上記画像入力装置１２から供給される画像に含まれる所望の情報（たとえば、住所情報あるいは顔画像など）をパターン認識処理により認識するものである。 The information processing apparatus 11 functions as a pattern recognition apparatus or a character recognition apparatus. The information processing apparatus 11 recognizes desired information (for example, address information or a face image) included in the image supplied from the image input apparatus 12 by pattern recognition processing.

上記情報処理装置１１は、たとえば、コンピュータとして実現される。また、図８に示す構成例では、上記情報処理装置１１は、画像インターフェース（Ｉ／Ｆ）２１、プロセッサ２２、ワーキングメモリ２３、プログラムメモリ２４、データメモリ２５、出力インターフェース（Ｉ／Ｆ）２６などを有している。すなわち、上記情報処理装置１１は、たとえば、画像インターフェース２１および出力インターフェース２６としてのデータ入出力部と、プロセッサ２２としての制御部と、ワーキングメモリ２３、プログラムメモリ２４およびデータメモリ２５などの各種の記憶部などを有するコンピュータにより実現される。 The information processing apparatus 11 is realized as a computer, for example. In the configuration example shown in FIG. 8, the information processing apparatus 11 includes an image interface (I / F) 21, a processor 22, a working memory 23, a program memory 24, a data memory 25, an output interface (I / F) 26, and the like. have. That is, the information processing apparatus 11 includes, for example, a data input / output unit as the image interface 21 and the output interface 26, a control unit as the processor 22, a variety of storage such as the working memory 23, the program memory 24, and the data memory 25. This is realized by a computer having a unit.

上記画像インターフェース２１は、上記画像入力装置１２から供給される画像を取り込むためのインターフェースである。つまり、上記画像インターフェース２１は、パターン認識処理の対象となる画像を取得するためのインターフェースである。
図９は、上記画像インターフェース２１により取り込まれるパターン認識処理の対象となる画像の例を示す図である。図９に示す例は、複数層の情報からなる住所情報が記載された紙葉類の読取画像の例であるものとする。図９示す例では、パターン認識処理としての複数層の情報からなる住所情報の認識処理の対象となる画像の例を示している。 The image interface 21 is an interface for capturing an image supplied from the image input device 12. That is, the image interface 21 is an interface for acquiring an image to be subjected to pattern recognition processing.
FIG. 9 is a diagram showing an example of an image to be subjected to pattern recognition processing that is captured by the image interface 21. The example illustrated in FIG. 9 is an example of a read image of a paper sheet in which address information including information of a plurality of layers is described. In the example illustrated in FIG. 9, an example of an image that is a target of address information recognition processing including information of a plurality of layers as pattern recognition processing is illustrated.

上記プロセッサ２２は、当該情報処理装置１１における種々の処理機能を実行するものである。上記プロセッサ２２は、たとえば、ＣＰＵなどの演算ユニットにより構成される。上記プロセッサ２２は、上記プログラムメモリ２４あるいはデータメモリ２５に記憶されているプログラムを実行することにより、種々の処理機能を実現している。たとえば、上記プロセッサ２２は、プログラムを実行することにより実現される機能の１つとしてパターン認識処理を行うパターン認識部２２ａを有している。上記パターン認識部２２ａの構成例については、後で詳細に説明するものとする。 The processor 22 executes various processing functions in the information processing apparatus 11. The processor 22 is constituted by an arithmetic unit such as a CPU, for example. The processor 22 implements various processing functions by executing programs stored in the program memory 24 or the data memory 25. For example, the processor 22 includes a pattern recognition unit 22a that performs pattern recognition processing as one of functions realized by executing a program. A configuration example of the pattern recognition unit 22a will be described in detail later.

上記ワーキングメモリ２３は、一時的にデータを格納するための記憶部である。上記ワーキングメモリ２３は、たとえば、ＲＡＭ（ランダムアクセスメモリ）などにより構成される。上記プログラムメモリ２４は、制御プログラムおよび制御データなどが記憶されている記憶部である。上記プログラムメモリ２４は、たとえば、ＲＯＭ（リードオンリーメモリ）などにより構成される。上記データメモリ２５は、データを記憶するための大容量の記憶部である。上記データメモリ２５は、たとえば、ハードディスクドライブ（ＨＤＤ）などにより構成される。 The working memory 23 is a storage unit for temporarily storing data. The working memory 23 is composed of, for example, a RAM (random access memory). The program memory 24 is a storage unit that stores a control program, control data, and the like. The program memory 24 is composed of, for example, a ROM (read only memory). The data memory 25 is a large-capacity storage unit for storing data. The data memory 25 is constituted by, for example, a hard disk drive (HDD).

上記データメモリ２５には、バターン認識処理に用いられる辞書データベース２５ａが設けられている。たとえば、当該情報処理装置１１がパターン認識処理として住所情報を認識するものである場合、上記辞書データベース２５ａは、住所情報が格納される住所データベースとして構成される。当該情報処理装置１１がパターン認識処理として顔画像などの生体情報による個人認証を行うものである場合、上記辞書データベース２５ａは、登録者の生体情報が格納される生体情報データベースとして構成される。なお、ここでは、情報処理装置１１が住所情報を認識するものであること想定する。このため、上記辞書データベース２５ａは、住所データベースであるものとする。 The data memory 25 is provided with a dictionary database 25a used for pattern recognition processing. For example, when the information processing apparatus 11 recognizes address information as a pattern recognition process, the dictionary database 25a is configured as an address database in which address information is stored. When the information processing apparatus 11 performs personal authentication using biometric information such as a face image as a pattern recognition process, the dictionary database 25a is configured as a biometric information database in which registrant's biometric information is stored. Here, it is assumed that the information processing apparatus 11 recognizes address information. Therefore, it is assumed that the dictionary database 25a is an address database.

図１０は、辞書データベース２５ａとしての住所データベースの構成例を示す図である。図１０に示す構成例では、辞書データベース（住所データベース）２５ａには、複数階層（ＣＩＴＹ階層、ＳＴＲＥＥＴ階層、ＤＩＲＥＣＴＩＯＮ階層）の各単語からなる住所情報が記憶されている。つまり、辞書データベース２５ａには、各階層の各情報には、次の下階層の情報が従属するように記憶されている。 FIG. 10 is a diagram illustrating a configuration example of an address database as the dictionary database 25a. In the configuration example shown in FIG. 10, the dictionary database (address database) 25a stores address information made up of words from a plurality of hierarchies (CITY hierarchy, STREET hierarchy, DIRECTION hierarchy). That is, the dictionary database 25a stores the information so that each item in one hierarchy has the items of the next lower hierarchy subordinate to it.

図１０に示す例では、ＣＩＴＹ階層には、「ＳＴＯＣＫＨＯＬＭ」（単語Ｄ１）、「ＧＯＴＥＢＯＲＧ」（単語Ｄ２）、「ＡＢＣＤＥ」（単語Ｄ３）、…などの単語が存在し、ＳＴＲＥＥＴ階層には、「ＡＧＡＴＡＮ」（単語Ｄ４）、「ＴＯＳＨＩＢＡ」（単語Ｄ５）、「ＢＧＡＴＡＮ」（単語Ｄ６）、…などの単語が存在し、ＤＩＲＥＣＴＩＯＮ階層には、「ＥＡＳＴ」、「ＷＥＳＴ」、「ＮＯＲＴＨ」などの単語が存在している。さらに、図１０に示す例では、ＣＩＴＹ階層の「ＳＴＯＣＫＨＯＬＭ」（単語Ｄ１）という１つの単語には、ＳＴＲＥＥＴ階層の「ＡＧＡＴＡＮ」（単語Ｄ４）と「ＴＯＳＨＩＢＡ」（単語Ｄ５）という２つの単語が従属している。 In the example illustrated in FIG. 10, the CITY hierarchy contains words such as “STOCKHOLM” (word D1), “GOTEBORG” (word D2), “ABCDE” (word D3), ..., the STREET hierarchy contains words such as “AGATAN” (word D4), “TOSHIBA” (word D5), “BGATAN” (word D6), ..., and the DIRECTION hierarchy contains words such as “EAST”, “WEST”, and “NORTH”. Further, in the example shown in FIG. 10, the two words “AGATAN” (word D4) and “TOSHIBA” (word D5) of the STREET hierarchy are subordinate to the single word “STOCKHOLM” (word D1) of the CITY hierarchy.
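One way to picture such a hierarchical dictionary is as a word table plus a subordination table, as in the sketch below. The layout and the names `WORDS`, `SUBORDINATE`, and `subordinate_words` are assumptions for illustration, not the storage format of the address database 25a. Only the subordination of D4 and D5 to D1 is stated in the text; the placement of D6 under D2 is invented for the example.

```python
# Word ID -> (hierarchy, spelling), following the words named for FIG. 10.
WORDS = {
    "D1": ("CITY", "STOCKHOLM"),
    "D2": ("CITY", "GOTEBORG"),
    "D3": ("CITY", "ABCDE"),
    "D4": ("STREET", "AGATAN"),
    "D5": ("STREET", "TOSHIBA"),
    "D6": ("STREET", "BGATAN"),
}

# Word ID -> IDs of the words of the next lower hierarchy that are subordinate to it.
SUBORDINATE = {
    "D1": ["D4", "D5"],   # stated in the text
    "D2": ["D6"],         # hypothetical, for illustration only
}

def subordinate_words(word_id):
    """Return the IDs of the words directly subordinate to `word_id`."""
    return SUBORDINATE.get(word_id, [])

for child in subordinate_words("D1"):
    print(child, WORDS[child])
```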

上記出力インターフェース２６は、上記プロセッサ２２により得られた情報などを外部へ出力するためのインターフェースである。たとえば、上記プロセッサ２２内のパターン認識処理により得られた認識結果は、上記出力インターフェース２６により外部へ出力されるようになっている。 The output interface 26 is an interface for outputting information obtained by the processor 22 to the outside. For example, the recognition result obtained by the pattern recognition process in the processor 22 is output to the outside by the output interface 26.

次に、上記パターン認識部２２ａの構成について説明する。
図１１は、上記パターン認識部２２ａの構成例を示す図である。なお、以下の説明では、主として、当該情報処理装置１１が住所情報などの複数階層の情報からなる文字情報を認識するものであることを想定している。
上記パターン認識部２２ａは、認識制御部３０、候補抽出部３１、ノード展開部３２、評価値算出部３３、ノード選別部３４、決定部３５などにより構成されている。上述したように、上記パターン認識部２２ａは、プロセッサ２２がプログラムを実行することにより実現される機能である。つまり、上記候補抽出部３１、ノード展開部３２、評価値算出部３３、ノード選別部３４、決定部３５も、プロセッサ２２がプログラムが実行することにより実現される機能である。 Next, the configuration of the pattern recognition unit 22a will be described.
FIG. 11 is a diagram illustrating a configuration example of the pattern recognition unit 22a. In the following description, it is mainly assumed that the information processing apparatus 11 recognizes character information including information of multiple layers such as address information.
The pattern recognition unit 22a includes a recognition control unit 30, a candidate extraction unit 31, a node expansion unit 32, an evaluation value calculation unit 33, a node selection unit 34, a determination unit 35, and the like. As described above, the pattern recognition unit 22a is a function realized by the processor 22 executing a program. That is, the candidate extraction unit 31, the node expansion unit 32, the evaluation value calculation unit 33, the node selection unit 34, and the determination unit 35 are also functions realized by the processor 22 executing the program.

上記認識制御部３０は、パターン認識部２２ａにおけるパターン認識処理全体を制御する機能を司るものである。上記候補抽出部３１は、上記画像インターフェース２１を介して上記画像入力装置１２から供給された入力画像から認識結果における各階層の候補となる情報を抽出するものである。たとえば、当該情報処理装置１１が住所情報などの複数階層の情報からなる文字情報を認識するものである場合、上記候補抽出部３１は、入力画像から各階層の単語候補を抽出する処理を行う。
図１２は、図９に示す入力画像から抽出される単語候補の例を示す図である。図１２に示す例では、アルファベットを認識対象の住所情報を構成する文字として想定している。このため、図１２に示す例では、７つの単語候補が抽出されている。なお、図１２に示す７つの単語候補は、当該画像における位置情報により示される位置Ｐ１〜Ｐ７に存在しているものとする。 The recognition control unit 30 controls a function of controlling the entire pattern recognition process in the pattern recognition unit 22a. The candidate extraction unit 31 extracts information that is a candidate for each layer in the recognition result from the input image supplied from the image input device 12 via the image interface 21. For example, when the information processing apparatus 11 recognizes character information including information of a plurality of layers such as address information, the candidate extraction unit 31 performs a process of extracting word candidates of each layer from the input image.
FIG. 12 is a diagram showing an example of word candidates extracted from the input image shown in FIG. 9. The example illustrated in FIG. 12 assumes that the address information to be recognized is written in alphabetic characters. In this example, seven word candidates are extracted. It is assumed that the seven word candidates shown in FIG. 12 exist at positions P1 to P7 indicated by position information in the image.

上記ノード展開部３２は、上記候補抽出部３１により抽出された候補に対する探索木を構成するための各ノードを生成するものである。上記ノード展開部３２は、各ノードに属する次の階層のノードを得る処理である。つまり、上記ノード展開部３２では、ある階層の各候補に対して次の階層の候補となり得る全ての候補を選出することにより、複数階層の各ノードからなる探索木を生成する。 The node expansion unit 32 generates the nodes that make up a search tree for the candidates extracted by the candidate extraction unit 31. For each node, the node expansion unit 32 obtains the nodes of the next hierarchy that belong to it. That is, the node expansion unit 32 generates a search tree composed of nodes in a plurality of hierarchies by selecting, for each candidate in a given hierarchy, all candidates that can follow it in the next hierarchy.

例えば、図１２に示す位置Ｐ６の単語候補が図１０に示す住所データベース２５ａの単語Ｄ１であることを示すノードを（Ｄ１、Ｐ６）と表すものとする。ここで、図１０に示す住所データベース２５ａでは、単語Ｄ１（「ＣＩＴＹ」階層の「ＳＴＯＣＫＨＯＬＭ」）には、単語Ｄ４（「ＳＴＲＥＥＴ」階層の「ＡＧＡＴＡＮ」）と単語Ｄ５（「ＳＴＲＥＥＴ」階層の「ＴＯＳＨＩＢＡ」）とが属している。また、位置Ｐ６がＣＩＴＹ階層の情報である場合、ＳＴＲＥＥＴ階層は、図１２に示す入力画像において、位置Ｐ７あるいは位置Ｐ３の何れかであることが、当該住所情報の記載順序（各階層の情報の表記上のルール）により判別可能であるものとする。これらの状況に従って、上記ノード展開部３２は、ノード（Ｄ１、Ｐ６）に属するノードとして、（Ｄ４、Ｐ７）、（Ｄ４、Ｐ３）、（Ｄ５、Ｐ７）、（Ｄ５、Ｐ３）の４つのノードを展開する。 For example, a node indicating that the word candidate at the position P6 shown in FIG. 12 is the word D1 in the address database 25a shown in FIG. 10 is represented as (D1, P6). Here, in the address database 25a shown in FIG. 10, the word D1 (“STOCKHOLM” in the “CITY” hierarchy) includes the word D4 (“AGATAN” in the “STREET” hierarchy) and the word D5 (“TOSHIBA” in the “STREET” hierarchy). ]) Belongs to. If the position P6 is information on the CITY hierarchy, the STREET hierarchy is either the position P7 or the position P3 in the input image shown in FIG. It can be discriminated by the rules of notation. According to these situations, the node expansion unit 32 has four nodes (D4, P7), (D4, P3), (D5, P7), and (D5, P3) as nodes belonging to the node (D1, P6). Expand.
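The expansion in this example is the Cartesian product of the subordinate words and the admissible positions, which can be sketched as follows. The function `expand_node` and the two lookup helpers are hypothetical stand-ins for the dictionary lookup and the notation rule described above.

```python
from itertools import product

def expand_node(node, subordinate_words, next_positions):
    """Return the child nodes (word, position) of a node (word, position)."""
    word_id, position = node
    return list(product(subordinate_words(word_id), next_positions(position)))

# Stand-ins for the address database and the rule on the order of notation.
def subordinate_words(word_id):
    return {"D1": ["D4", "D5"]}.get(word_id, [])

def next_positions(position):
    return {"P6": ["P7", "P3"]}.get(position, [])

children = expand_node(("D1", "P6"), subordinate_words, next_positions)
print(children)
```

For the node (D1, P6) this yields the four nodes listed in the text, in the same order.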

上記評価値算出部３３では、上記ノード展開部３２により生成された各ノードの評価値を算出するものである。たとえば、上記評価値算出部３３では、各ノードとしての各候補に対する認識処理を実行することにより、各ノードの評価値を算出する。本実施形態では、上記評価値算出部３３は、上述した手法により算出される事後確率を評価値として算出するものとする。 The evaluation value calculation unit 33 calculates the evaluation value of each node generated by the node expansion unit 32. For example, the evaluation value calculation unit 33 calculates the evaluation value of each node by executing recognition processing for each candidate as each node. In the present embodiment, the evaluation value calculation unit 33 calculates the posterior probability calculated by the above-described method as the evaluation value.

上記ノード選別部３４は、各ノードのうち最終的に評価すべきノードを選別するものである。上記ノード選別部３４では、上記評価値算出部３３により算出された評価値に基づいて各ノードを最終的に評価すべきノードとするか否かを判定する。たとえば、上記ノード選別部３４は、各階層ごとに上記評価値算出部３３により算出された評価値が高い順に所定数（つまり、上位Ｎ個）のノードを選出する。また、探索手法として最良優先探索が適用される場合、上記ノード選別部３４は、各階層に関わらずに、後述する複数の終端ノードから最も評価値の高いものを選択する。 The node selection unit 34 selects, from among the nodes, those to be finally evaluated. Based on the evaluation value calculated by the evaluation value calculation unit 33, the node selection unit 34 determines whether each node is to be finally evaluated. For example, the node selection unit 34 selects, for each hierarchy, a predetermined number of nodes (that is, the top N) in descending order of the evaluation value calculated by the evaluation value calculation unit 33. When best-first search is applied as the search method, the node selection unit 34 instead selects the node with the highest evaluation value from among a plurality of terminal nodes (described later), regardless of hierarchy.
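The two selection policies can be contrasted in a short sketch. The function names, node labels, and evaluation values are hypothetical; nodes are represented as (label, evaluation value) pairs.

```python
import heapq

def select_top_n(nodes_in_hierarchy, n):
    """Per-hierarchy selection: keep the N nodes with the highest evaluation values."""
    return heapq.nlargest(n, nodes_in_hierarchy, key=lambda item: item[1])

def select_best_terminal(terminal_nodes):
    """Best-first selection: pick the single best terminal node, whatever its hierarchy."""
    return max(terminal_nodes, key=lambda item: item[1])

# Hypothetical evaluation values (posterior probabilities).
street_nodes = [("D4@P7", 0.30), ("D4@P3", 0.05), ("D5@P7", 0.10), ("D5@P3", 0.45)]
terminals = [("D2@P6", 0.20), ("D5@P3", 0.45), ("D3@P1", 0.02)]

print(select_top_n(street_nodes, 2))
print(select_best_terminal(terminals))
```

The second function compares nodes from different hierarchies directly, which is meaningful only because the evaluation value is a posterior probability.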

上記決定部３５は、一連のパターン認識処理としての最終的な認識結果を決定するものである。上記決定部３５は、上記ノード選別部３４により最終階層のノードが選別された場合、それらの最終階層のノードに基づいて最終的な認識結果を決定する。たとえば、上記ノード選別部３４により最終階層のノードが複数得られた場合、上記決定部３５は、最大評価値となるノードに基づく認識結果を最終的な認識結果として出力する。また、上記決定部３５は、上記ノード選別部３４により得られた最終階層のノードのうち所定値以上の評価値となる各ノードに基づく各認識結果（１つ又は複数の認識結果）を最終的な認識結果として出力するようにしても良い。さらに、上記決定部３５は、上記ノード選別部３４により得られた最終階層のノードの評価値が所定値未満である場合、最終的な認識結果として当該パターンが認識不能であったことを出力するようにしても良い。 The determination unit 35 determines a final recognition result as a series of pattern recognition processes. When the node of the final hierarchy is selected by the node selection unit 34, the determination unit 35 determines the final recognition result based on the nodes of the final hierarchy. For example, when a plurality of nodes in the final hierarchy are obtained by the node selection unit 34, the determination unit 35 outputs a recognition result based on the node having the maximum evaluation value as a final recognition result. In addition, the determination unit 35 finally determines each recognition result (one or more recognition results) based on each node having an evaluation value equal to or higher than a predetermined value among the nodes in the final hierarchy obtained by the node selection unit 34. May be output as a recognition result. Furthermore, when the evaluation value of the node in the final hierarchy obtained by the node selection unit 34 is less than a predetermined value, the determination unit 35 outputs that the pattern cannot be recognized as a final recognition result. You may do it.
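The decision rules described for the determination unit 35 can be sketched as a threshold test over the final-hierarchy nodes. The function names, the use of `None` to signal an unrecognizable pattern, and the threshold value are assumptions made for illustration.

```python
def decide_all(final_nodes, threshold):
    """Return every final-hierarchy result whose evaluation value reaches the threshold."""
    return [label for label, value in final_nodes if value >= threshold]

def decide_best(final_nodes, threshold):
    """Return the best final-hierarchy result, or None if the pattern is unrecognizable."""
    accepted = [(label, value) for label, value in final_nodes if value >= threshold]
    if not accepted:
        return None  # no node reaches the threshold: report "unrecognizable"
    return max(accepted, key=lambda item: item[1])[0]

# Hypothetical final-hierarchy nodes with their evaluation values.
final_nodes = [("STOCKHOLM / AGATAN / EAST", 0.72), ("STOCKHOLM / TOSHIBA / WEST", 0.55)]

print(decide_best(final_nodes, 0.5))
print(decide_all(final_nodes, 0.5))
print(decide_best(final_nodes, 0.9))
```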

次に、上記のように構成される情報処理装置１１におけるパターン認識処理の第１、第２、第３の処理例について説明する。
まず、上記情報処理装置１１におけるパターン認識処理の第１の処理例について説明する。
図１３は、パターン認識処理の第１の処理例としての処理の流れを説明するためのフローチャートである。
まず、上記画像入力装置１２から供給されるパターン認識処理の対象となる画像は、画像インターフェース２１により情報処理装置１１に取り込まれる（ステップＳ１０）。画像インターフェース２１によりパターン認識処理の対象となる画像が取り込まれると、プロセッサ２２は、パターン認識部２２ａによるパターン認識処理を開始する。すなわち、上記パターン認識部２２ａの認識制御部３０は、まず、上記候補抽出部３１により入力画像から単語候補を抽出する処理を実行する（ステップＳ１１）。たとえば、図９に示すような画像が与えられた場合、上記候補抽出部３１は、図１２に示すような単語候補を抽出する。この際、上記候補抽出部３１は、抽出した各単語候補を識別するための識別情報を付与するとともに、各単語候補の位置を示す情報を特定する。 Next, first, second, and third processing examples of pattern recognition processing in the information processing apparatus 11 configured as described above will be described.
First, a first processing example of pattern recognition processing in the information processing apparatus 11 will be described.
FIG. 13 is a flowchart for explaining the flow of processing as a first processing example of pattern recognition processing.
First, an image to be subjected to pattern recognition processing supplied from the image input device 12 is taken into the information processing device 11 by the image interface 21 (step S10). When an image to be subjected to pattern recognition processing is captured by the image interface 21, the processor 22 starts pattern recognition processing by the pattern recognition unit 22a. That is, the recognition control unit 30 of the pattern recognition unit 22a first executes a process of extracting word candidates from the input image by the candidate extraction unit 31 (step S11). For example, when an image as shown in FIG. 9 is given, the candidate extraction unit 31 extracts word candidates as shown in FIG. At this time, the candidate extraction unit 31 provides identification information for identifying each extracted word candidate and specifies information indicating the position of each word candidate.

入力画像から単語候補が抽出されると、上記認識制御部３０は、抽出された各単語候補に対する探索処理を開始する。まず、上記認識制御部３０は、探索木のルートノードを設定する処理を行う。これは、バッファ上において探索木の生成を開始することに相当する。すなわち、上記認識制御部３０は、ワーキングメモリ２３などに設けられたバッファにルートノードを格納する（ステップＳ１２）。ルートノードを設定すると、上記認識制御部３０は、処理中の階層を示す変数Ｌに初期値としての「Ｌ＝１」を設定する（ステップＳ１３）。 When word candidates are extracted from the input image, the recognition control unit 30 starts a search process for each extracted word candidate. First, the recognition control unit 30 performs processing for setting a root node of a search tree. This corresponds to starting generation of a search tree on the buffer. That is, the recognition control unit 30 stores the root node in a buffer provided in the working memory 23 or the like (step S12). When the root node is set, the recognition control unit 30 sets “L = 1” as an initial value to the variable L indicating the hierarchy being processed (step S13).

変数Ｌが設定されると、上記認識制御部３０は、Ｌ階層の各ノードを生成し、各ノードの評価値を算出する処理を行う（ステップＳ１４〜Ｓ１７）。すなわち、上記認識制御部３０は、既にバッファに格納されているノードを１つ取り出す（ステップＳ１４）。たとえば、Ｌ＝１の場合、認識制御部３０は、バッファからルートノードを取り出す。また、Ｌ＝２の場合、認識制御部３０は、バッファに格納されている第１階層の各ノードを順次取り出す。 When the variable L is set, the recognition control unit 30 performs processing for generating each node of the L hierarchy and calculating an evaluation value of each node (steps S14 to S17). That is, the recognition control unit 30 takes out one node already stored in the buffer (step S14). For example, when L = 1, the recognition control unit 30 extracts the root node from the buffer. In addition, when L = 2, the recognition control unit 30 sequentially extracts each node of the first hierarchy stored in the buffer.

When one node has been extracted from the buffer (step S14), the recognition control unit 30 causes the node expansion unit 32 to perform node expansion processing, which obtains the nodes of the next lower hierarchy (the L hierarchy) belonging to the extracted node (step S15). For example, when L = 1, the node expansion unit 32 stores in the buffer, as nodes subordinate to the root node, the nodes corresponding to the first-hierarchy candidates among the candidates extracted by the candidate extraction unit 31. In the word candidate extraction example shown in FIG. 12, when the node extracted in step S14 is (D1, P6), the node expansion unit 32 expands four nodes, as described above: (D4, P7), (D4, P3), (D5, P7), and (D5, P3).
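The expansion of (D1, P6) into four nodes can be read as pairing each dictionary word under D1 with each word-candidate position that can follow P6. The sketch below assumes exactly that reading: that D4 and D5 are the dictionary children of D1 and that P7 and P3 are the positions following P6 is inferred from the four nodes listed, and the function name is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def expand_node(child_words: Sequence[str],
                next_positions: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair every child dictionary word with every candidate position."""
    return list(product(child_words, next_positions))


# Assumed inputs for the node (D1, P6) in the example of FIG. 12.
children_of_d1 = ["D4", "D5"]
positions_after_p6 = ["P7", "P3"]
expanded = expand_node(children_of_d1, positions_after_p6)
```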

上記ノード展開部３２により取り出したノードに属するＬ階層の各ノードが得られた場合、上記評価値算出部３３は、得られた各ノードに対する評価値を計算する（ステップＳ１６）。ここでは、上述した計算式により、各ノードに対する認識処理によって近似的に得られる事後確率が評価値として算出されるものとする。 When each node of the L layer belonging to the node extracted by the node expansion unit 32 is obtained, the evaluation value calculation unit 33 calculates an evaluation value for each obtained node (step S16). Here, it is assumed that the posterior probability approximately obtained by the recognition processing for each node is calculated as the evaluation value by the above-described calculation formula.

When the evaluation value calculation unit 33 has calculated the evaluation value of each expanded node, the recognition control unit 30 determines whether an unprocessed node remains in the buffer (step S17). That is, the recognition control unit 30 determines whether any node that can be a parent of L-hierarchy nodes is still unprocessed. When it determines that an unprocessed node remains in the buffer (step S17, NO), the recognition control unit 30 returns to step S14, takes the next node out of the buffer, and repeats the processing of steps S14 to S16.

When it determines that no unprocessed node remains in the buffer (step S17, YES), the recognition control unit 30 causes the node selection unit 34 to select the top N nodes in descending order of the evaluation values calculated in step S16 and stores them in the buffer (step S18). That is, in step S18, the nodes selected by the node selection unit 34 (the N nodes with the highest evaluation values) are obtained as the L-hierarchy nodes (that is, the L-hierarchy candidates). The number N of nodes selected by the node selection unit 34 is set to a value large enough that the correct candidate is not discarded. However, the larger the number of selected nodes, the slower the processing. The number of nodes to be selected should therefore be set as appropriate for the operating conditions, such as the nature of the patterns to be recognized and the required processing time.

After storing the top N nodes in the buffer as the L-hierarchy nodes, the recognition control unit 30 determines whether the L hierarchy is the final hierarchy in the pattern recognition (step S19). When it determines that the L hierarchy is not the final hierarchy (step S19, NO), the recognition control unit 30 increments the variable L by updating it to “L = L + 1” (step S20). After incrementing the variable L, the recognition control unit 30 returns to step S14 and processes the nodes of the updated L hierarchy.

When it determines that the L hierarchy is the final hierarchy (step S19, YES), the recognition control unit 30 causes the determination unit 35 to determine the final recognition result (step S21). In this case, the determination unit 35 determines the final recognition result of the pattern recognition processing based on the evaluation value of each node. For example, when the evaluation values are judged against a predetermined threshold, the determination unit 35 outputs, as recognition results, the candidates specified by the nodes whose evaluation values are equal to or greater than the threshold. When the recognition result is to be determined uniquely, the determination unit 35 outputs, as the recognition result, the candidate specified by the node having the maximum evaluation value (or by the node having the maximum evaluation value among those equal to or greater than the predetermined threshold).

As described above, in the first processing example, the search in the multi-stage pattern recognition processing is narrowed to the N nodes with the highest evaluation values at each hierarchy. As a result, according to the first processing example, the correct candidate is unlikely to be discarded by mistake, and the correct candidate can be selected at high speed.
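The level-by-level flow of the first processing example (steps S12 to S20) corresponds to what is commonly called beam search. The following Python sketch is one possible reading under stated assumptions: `expand` and `evaluate` are hypothetical callbacks standing in for the node expansion unit 32 and the evaluation value calculation unit 33, the root node receives a dummy evaluation value of 1.0, and the toy problem at the end is invented purely to make the block runnable.

```python
import heapq
from typing import Callable, List, Tuple, TypeVar

N = TypeVar("N")


def beam_search(root: N,
                expand: Callable[[N], List[N]],
                evaluate: Callable[[N], float],
                num_levels: int,
                beam_width: int) -> List[Tuple[float, N]]:
    """Keep only the `beam_width` best nodes at every level."""
    buffer: List[Tuple[float, N]] = [(1.0, root)]    # S12: store the root node
    for _level in range(1, num_levels + 1):          # S13, S20: L = 1, 2, ...
        expanded: List[Tuple[float, N]] = []
        for _value, parent in buffer:                # S14, S17: each stored node
            for child in expand(parent):             # S15: node expansion
                expanded.append((evaluate(child), child))  # S16: evaluation value
        # S18: keep the top-N nodes of this level, best first
        buffer = heapq.nlargest(beam_width, expanded, key=lambda item: item[0])
    return buffer                                    # S21 chooses among these


# Toy problem (an assumption for illustration): a node is a string and
# each level appends one letter; the evaluation value is a product of scores.
LETTER_SCORE = {"a": 0.6, "b": 0.3, "c": 0.1}


def toy_expand(path: str) -> List[str]:
    return [path + letter for letter in LETTER_SCORE]


def toy_evaluate(path: str) -> float:
    value = 1.0
    for letter in path:
        value *= LETTER_SCORE[letter]
    return value


survivors = beam_search("", toy_expand, toy_evaluate, num_levels=2, beam_width=2)
```

A larger `beam_width` lowers the risk of discarding the correct candidate at the cost of more expansions per level, which is the trade-off described for the number N.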

Next, a second processing example of the pattern recognition processing in the information processing apparatus 11 will be described.
The second processing example applies best-first search as the method of searching for the correct candidate among candidates in a plurality of hierarchies. As described above, best-first search preferentially processes the node with the highest evaluation value, comparing nodes even across different hierarchies. Here, the evaluation value of each node is assumed to be calculated by the calculation method described above.

図１４は、上記情報処理装置１１におけるパターン認識処理の第２の処理例としての処理の流れを説明するためのフローチャートである。 FIG. 14 is a flowchart for explaining the flow of processing as a second processing example of pattern recognition processing in the information processing apparatus 11.

まず、上記画像入力装置１２から供給されるパターン認識処理の対象となる画像は、画像インターフェース２１により情報処理装置１１に取り込まれる（ステップＳ３０）。画像インターフェース２１によりパターン認識処理の対象となる画像が取り込まれると、プロセッサ２２は、パターン認識部２２ａによるパターン認識処理を開始する。すなわち、上記パターン認識部２２ａの認識制御部３０は、まず、上記候補抽出部３１により入力画像から単語候補を抽出する処理を実行する（ステップＳ３１）。 First, an image to be subjected to pattern recognition processing supplied from the image input device 12 is taken into the information processing device 11 by the image interface 21 (step S30). When an image to be subjected to pattern recognition processing is captured by the image interface 21, the processor 22 starts pattern recognition processing by the pattern recognition unit 22a. That is, the recognition control unit 30 of the pattern recognition unit 22a first executes a process of extracting word candidates from the input image by the candidate extraction unit 31 (step S31).

入力画像から単語候補が抽出されると、上記認識制御部３０は、抽出された各単語候補に対する探索処理を開始する。まず、上記認識制御部３０は、探索木のルートノードを設定する処理を行う。これは、バッファ上において探索木の生成を開始することに相当する。すなわち、上記認識制御部３０は、ワーキングメモリ２３などのバッファにルートノードを格納する（ステップＳ３２）。ルートノードを設定すると、上記認識制御部３０は、順次評価値が最大となるノードを探索していく処理を行う。 When word candidates are extracted from the input image, the recognition control unit 30 starts a search process for each extracted word candidate. First, the recognition control unit 30 performs processing for setting a root node of a search tree. This corresponds to starting generation of a search tree on the buffer. That is, the recognition control unit 30 stores the root node in a buffer such as the working memory 23 (step S32). When the root node is set, the recognition control unit 30 performs a process of sequentially searching for a node having the maximum evaluation value.

すなわち、上記認識制御部３０は、既にバッファに格納されているノードから評価値が最大となっているノードを１つ取り出す（ステップＳ３３）。なお、ルートノードのみがバッファに格納されている状態では、上記認識制御部３０は、バッファからルートノードを取り出す。また、バッファに複数のノードが格納されている場合、上記認識制御部３０は、各ノードの階層に係らずに、バッファに格納されている各ノードから評価値が最大となるノードを取り出す。 That is, the recognition control unit 30 takes out one node having the maximum evaluation value from the nodes already stored in the buffer (step S33). In the state where only the root node is stored in the buffer, the recognition control unit 30 takes out the root node from the buffer. When a plurality of nodes are stored in the buffer, the recognition control unit 30 takes out the node having the maximum evaluation value from each node stored in the buffer regardless of the hierarchy of each node.

After taking one node out of the buffer, the recognition control unit 30 determines whether the extracted node is a terminal node (step S34). Here, a terminal node is a node that has no subordinate nodes. That is, a terminal node is a leaf of the search tree that starts from the root node. For example, FIG. 15 is a conceptual diagram showing a configuration example of a search tree. In the example shown in FIG. 15, each white circle and black circle is a node representing a state in the search. The black circles indicate terminal nodes, which have no lower-level (subordinate) nodes.

When it determines that the node is not a terminal node (step S34, NO), the recognition control unit 30 causes the node expansion unit 32 to perform node expansion processing on the extracted node (step S35). As described above, node expansion processing obtains the nodes of the next lower hierarchy (the L hierarchy) belonging to the extracted node. When the root node has been extracted, the node expansion unit 32 stores in the buffer, as nodes subordinate to the root node, the nodes corresponding to the first-hierarchy candidates among the candidates extracted by the candidate extraction unit 31.

上記ノード展開部３２により取り出したノードに属する各ノードが得られた場合、上記評価値算出部３３は、得られた各ノードに対する評価値を計算する（ステップＳ３６）。ここでは、上述した計算手法により、各ノードに対する認識処理によって近似的に得られる事後確率が評価値として算出されるものとする。 When each node belonging to the node extracted by the node expansion unit 32 is obtained, the evaluation value calculation unit 33 calculates an evaluation value for each obtained node (step S36). Here, it is assumed that the posterior probability approximately obtained by the recognition process for each node is calculated as the evaluation value by the above-described calculation method.

上記評価値算出部３３により展開された各ノードの評価値が算出されると、上記認識制御部３０は、これらの各ノードと各評価値とを対応づけてバッファに格納する（ステップＳ３７）。得られた各ノードと各評価値とをバッファに格納すると、上記認識制御部３０は、上記ステップＳ３３へ戻り、上述した処理を繰り返し実行する。上記ステップＳ３３〜Ｓ３７の処理は、上記ステップＳ３３で取り出したノードが終端ノードと判断されるまで繰り返し実行される。この結果として、最大評価値となるノードが終端ノードに到達するまでの探索木が得られる。 When the evaluation value of each node developed by the evaluation value calculation unit 33 is calculated, the recognition control unit 30 associates each node with each evaluation value and stores them in the buffer (step S37). When the obtained nodes and evaluation values are stored in the buffer, the recognition control unit 30 returns to step S33 and repeatedly executes the above-described processing. The processes in steps S33 to S37 are repeatedly executed until the node extracted in step S33 is determined to be a terminal node. As a result, a search tree is obtained until the node having the maximum evaluation value reaches the terminal node.

That is, when it is determined in step S34 that the extracted node is a terminal node (step S34, YES), the recognition control unit 30 causes the determination unit 35 to determine whether the evaluation value of the node is equal to or greater than a predetermined threshold (step S38). When the evaluation value of the node is determined to be less than the predetermined threshold (step S38, NO), the recognition control unit 30 determines whether any unprocessed node other than that node remains in the buffer (step S39).

When it determines that an unprocessed node remains in the buffer (step S39, NO), the recognition control unit 30 returns to step S33, takes out the node with the maximum evaluation value from among the unprocessed nodes other than that node, and executes the processing of steps S33 to S37. When it determines that no unprocessed node remains in the buffer (step S39, YES), the recognition control unit 30 ends the processing on the grounds that no candidate with an evaluation value equal to or greater than the predetermined threshold was obtained.

When the determination in step S38 finds that the evaluation value of the node is equal to or greater than the predetermined threshold (step S38, YES), the recognition control unit 30 causes the determination unit 35 to output, as the final recognition result, the pattern indicated by the candidates of each hierarchy specified by that node (step S40). In the second processing example, all nodes whose evaluation values are equal to or greater than the predetermined threshold may instead be output as the final recognition result. This can be realized by proceeding to step S39 even after a final recognition result has been obtained in step S40 and executing the processing from step S33 onward on the unprocessed nodes remaining in the buffer. In this way, a plurality of recognition results whose node evaluation values are equal to or greater than the predetermined threshold can be output as final recognition results.

As described above, in the second processing example, the multi-stage pattern recognition processing narrows down the recognition result candidates by best-first search and obtains the final recognition result from those narrowed-down candidates whose evaluation values are equal to or greater than a predetermined threshold. As a result, according to the second processing example, the recognition result candidates can be narrowed down efficiently, and the final recognition result can be obtained from the narrowed-down candidates.
The best-first search applied in the second processing example needs to compare nodes in different hierarchies. For this reason, the second processing example uses, as the evaluation value, the posterior probability approximately calculated by the calculation formula described above. As a result, according to the second processing example, nodes at different stages can be compared, which makes best-first search feasible.
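The loop of steps S32 to S40 can be sketched with a priority queue. The Python sketch below is an illustration under assumptions, not the implementation of the specification: `is_terminal`, `expand`, and `evaluate` are hypothetical callbacks, the root node is given a dummy evaluation value of 1.0, and the `find_all` flag models the variant that continues to step S39 after a result is output in step S40.

```python
import heapq
import itertools
from typing import Callable, List, Tuple, TypeVar

N = TypeVar("N")


def best_first_search(root: N,
                      is_terminal: Callable[[N], bool],
                      expand: Callable[[N], List[N]],
                      evaluate: Callable[[N], float],
                      threshold: float,
                      find_all: bool = False) -> List[Tuple[N, float]]:
    """Always process the stored node with the highest evaluation value."""
    order = itertools.count()  # tie-breaker, so nodes themselves are never compared
    # heapq is a min-heap, so evaluation values are stored negated.
    heap = [(-1.0, next(order), root)]                  # S32: store the root node
    results: List[Tuple[N, float]] = []
    while heap:                                         # S39: any node left?
        negated, _, node = heapq.heappop(heap)          # S33: maximum evaluation value
        if is_terminal(node):                           # S34
            if -negated >= threshold:                   # S38
                results.append((node, -negated))        # S40
                if not find_all:
                    break
            continue
        for child in expand(node):                      # S35: node expansion
            value = evaluate(child)                     # S36: evaluation value
            heapq.heappush(heap, (-value, next(order), child))  # S37
    return results
```

Because nodes of every hierarchy share one queue, the evaluation values must be comparable across hierarchies, which is why the text uses an approximate posterior probability rather than a per-level score.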

Next, a third processing example of the pattern recognition processing in the information processing apparatus 11 will be described.
Like the second processing example, the third processing example applies best-first search as the method of searching for the correct candidate among candidates in a plurality of hierarchies. The third processing example is a modification of the second processing example: the evaluation value of each node is the posterior probability of the node divided by the estimated processing time.

図１６は、上記情報処理装置１１におけるパターン認識処理の第３の処理例としての処理の流れを説明するためのフローチャートである。なお、図１６に示すステップＳ５０〜Ｓ６０は、それぞれ第２の処理例として説明した図１４に示すステップＳ３０〜Ｓ４０と同等な処理である。 FIG. 16 is a flowchart for explaining the flow of processing as a third processing example of the pattern recognition processing in the information processing apparatus 11. Note that steps S50 to S60 shown in FIG. 16 are equivalent to steps S30 to S40 shown in FIG. 14 described as the second processing example.

まず、上記画像入力装置１２から供給されるパターン認識処理の対象となる画像は、画像インターフェース２１により情報処理装置１１に取り込まれる（ステップＳ５０）。画像インターフェース２１によりパターン認識処理の対象となる画像が取り込まれると、プロセッサ２２は、パターン認識部２２ａによるパターン認識処理を開始する。すなわち、上記パターン認識部２２ａの認識制御部３０は、まず、上記候補抽出部３１により入力画像から単語候補を抽出する処理を実行する（ステップＳ５１）。 First, an image to be subjected to pattern recognition processing supplied from the image input device 12 is taken into the information processing device 11 by the image interface 21 (step S50). When an image to be subjected to pattern recognition processing is captured by the image interface 21, the processor 22 starts pattern recognition processing by the pattern recognition unit 22a. That is, the recognition control unit 30 of the pattern recognition unit 22a first executes a process of extracting word candidates from the input image by the candidate extraction unit 31 (step S51).

入力画像から単語候補が抽出されると、上記認識制御部３０は、抽出された各単語候補に対する探索処理を開始する。まず、上記認識制御部３０は、探索木のルートノードを設定する処理を行う。これは、バッファ上において探索木の生成を開始することに相当する。すなわち、上記認識制御部３０は、ワーキングメモリ２３などのバッファにルートノードを格納する（ステップＳ５２）。ルートノードを設定すると、上記認識制御部３０は、順次評価値が最大となるノードを探索していく処理を行う。 When word candidates are extracted from the input image, the recognition control unit 30 starts a search process for each extracted word candidate. First, the recognition control unit 30 performs processing for setting a root node of a search tree. This corresponds to starting generation of a search tree on the buffer. That is, the recognition control unit 30 stores the root node in a buffer such as the working memory 23 (step S52). When the root node is set, the recognition control unit 30 performs a process of sequentially searching for a node having the maximum evaluation value.

すなわち、上記認識制御部３０は、既にバッファに格納されているノードから評価値が最大となっているノードを１つ取り出す（ステップＳ５３）。バッファからノードを１つ取り出すと、上記認識制御部３０は、取り出したノードが終端ノードであるか否かを判断する（ステップＳ５４）。上記判断により終端ノードでないと判断した場合（ステップＳ５４、ＮＯ）、上記認識制御部３０は、上記ノード展開部３２により取り出したノードに対するノード展開処理を行う（ステップＳ５５）。 That is, the recognition control unit 30 takes out one node having the maximum evaluation value from the nodes already stored in the buffer (step S53). When one node is extracted from the buffer, the recognition control unit 30 determines whether or not the extracted node is a terminal node (step S54). If it is determined by the above determination that the node is not a terminal node (NO in step S54), the recognition control unit 30 performs node expansion processing on the node extracted by the node expansion unit 32 (step S55).

上記ノード展開部３２により取り出したノードに属する各ノードが得られた場合、上記評価値算出部３３は、得られた各ノードに対する評価値を計算する（ステップＳ５６）。この第３の処理例では、事後確率を推定処理時間で割った値を評価値として算出する。
すなわち、上記評価値算出部３３は、まず、上記ノード展開部３２により得られた各ノードに対して事後確率を算出する（ステップＳ６１）。各ノードの事後確率は、上述した計算手法により、各ノードに対する認識処理によって近似的に算出される。 When each node belonging to the node extracted by the node expansion unit 32 is obtained, the evaluation value calculation unit 33 calculates an evaluation value for each obtained node (step S56). In the third processing example, a value obtained by dividing the posterior probability by the estimated processing time is calculated as the evaluation value.
That is, the evaluation value calculation unit 33 first calculates a posteriori probability for each node obtained by the node expansion unit 32 (step S61). The posterior probability of each node is approximately calculated by the recognition process for each node by the calculation method described above.

When the posterior probability of each node has been calculated, the evaluation value calculation unit 33 estimates the time required for the processing related to each node (step S62). Here, the processing time required for the recognition processing of the lower hierarchies belonging to each node is estimated. Such a processing time can be estimated, for example, from the total number of characters of the words in the lower hierarchies.
When the posterior probability and the estimated processing time of each node have been obtained, the evaluation value calculation unit 33 calculates, for each node, the posterior probability calculated in step S61 divided by the estimated processing time obtained in step S62 as the evaluation value of that node (step S63).
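Steps S62 and S63 can be illustrated as follows. This is a minimal sketch under assumptions: the linear cost model (a fixed `seconds_per_char` multiplied by the total character count), the function names, and the guard against a zero character count are all hypothetical; the text only says the time can be estimated from the total number of characters of the lower-hierarchy words.

```python
from typing import Sequence


def estimate_processing_time(lower_words: Sequence[str],
                             seconds_per_char: float = 0.001) -> float:
    """S62: estimate recognition time from the total character count."""
    total_chars = sum(len(word) for word in lower_words)
    # Assumed guard: treat an empty lower hierarchy as one character
    # so that the division in S63 never divides by zero.
    return max(total_chars, 1) * seconds_per_char


def time_weighted_evaluation(posterior: float,
                             lower_words: Sequence[str],
                             seconds_per_char: float = 0.001) -> float:
    """S63: evaluation value = posterior probability / estimated time."""
    return posterior / estimate_processing_time(lower_words, seconds_per_char)
```

With equal posterior probabilities, a node whose lower hierarchies contain fewer characters receives the higher evaluation value and is therefore processed first.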

このような手法により上記評価値算出部３３で得られた各ノードの評価値が算出されると、上記認識制御部３０は、これらの各ノードと各評価値とを対応づけてバッファに格納する（ステップＳ５７）。得られた各ノードと各評価値とをバッファに格納すると、上記認識制御部３０は、上記ステップＳ５３へ戻り、上述した処理を繰り返し実行する。 When the evaluation value of each node obtained by the evaluation value calculation unit 33 is calculated by such a method, the recognition control unit 30 associates each node with each evaluation value and stores them in the buffer. (Step S57). When the obtained nodes and evaluation values are stored in the buffer, the recognition control unit 30 returns to step S53 and repeatedly executes the above-described processing.

That is, when it is determined in step S54 that the extracted node is a terminal node (step S54, YES), the recognition control unit 30 causes the determination unit 35 to determine whether the evaluation value of the node is equal to or greater than a predetermined threshold (step S58). When the evaluation value of the node is determined to be less than the predetermined threshold (step S58, NO), the recognition control unit 30 determines whether any unprocessed node other than that node remains in the buffer (step S59).

When it determines that an unprocessed node remains in the buffer (step S59, NO), the recognition control unit 30 returns to step S53, takes out the node with the maximum evaluation value from among the unprocessed nodes other than that node, and executes the processing of steps S53 to S57. When it determines that no unprocessed node remains in the buffer (step S59, YES), the recognition control unit 30 ends the processing on the grounds that no candidate with an evaluation value equal to or greater than the predetermined threshold was obtained.

When the determination in step S58 finds that the evaluation value of the node is equal to or greater than the predetermined threshold (step S58, YES), the recognition control unit 30 causes the determination unit 35 to output, as the final recognition result, the pattern indicated by the candidates of each hierarchy specified by that node (step S60). In the third processing example, all nodes whose evaluation values are equal to or greater than the predetermined threshold may instead be output as the final recognition result. This can be realized by proceeding to step S59 even after a final recognition result has been obtained in step S60 and executing the processing from step S53 onward on the unprocessed nodes remaining in the buffer. In this way, a plurality of recognition results whose node evaluation values are equal to or greater than the predetermined threshold can be output as final recognition results.

As described above, in the third processing example, the multi-stage pattern recognition processing uses the posterior probability of each candidate divided by the estimated processing time as the evaluation value, narrows down the recognition result candidates by best-first search, and obtains the final recognition result from those narrowed-down candidates whose evaluation values are equal to or greater than a predetermined threshold. As a result, according to the third processing example, the recognition result candidates can be narrowed down efficiently using an evaluation value that takes processing time into account, and the final recognition result can be obtained from the narrowed-down candidates.

FIG. 1 shows the processing of each stage in face recognition processing according to the pattern recognition method of the present embodiment, expressed as a search tree.
FIG. 2 shows an example of an address database.
FIG. 3 shows an example of how address information is written.
FIG. 4 shows an example of word candidates obtained from the image of the address information shown in FIG. 3.
FIG. 5 is a conceptual diagram showing matching processing as a search tree.
FIG. 6 shows the state of each node represented in the search tree.
FIG. 7 shows an example of character candidates.
FIG. 8 shows a configuration example of an information processing apparatus having a pattern recognition function.
FIG. 9 shows an example of an image to be subjected to pattern recognition processing.
FIG. 10 shows a configuration example of an address database serving as a dictionary database.
FIG. 11 shows a configuration example of the pattern recognition unit.
FIG. 12 shows an example of word candidates extracted from the input image shown in FIG. 9.
FIG. 13 is a flowchart for explaining the flow of processing in the first processing example of the pattern recognition processing.
FIG. 14 is a flowchart for explaining the flow of processing in the second processing example of the pattern recognition processing in the information processing apparatus.
FIG. 15 is a conceptual diagram showing a configuration example of a search tree.
FIG. 16 is a flowchart for explaining the flow of processing in the third processing example of the pattern recognition processing in the information processing apparatus.

Explanation of symbols

11 ... information processing apparatus, 21 ... image interface, 22 ... processor, 22a ... pattern recognition unit, 23 ... working memory, 24 ... program memory, 25 ... data memory, 25a ... dictionary database (address database), 26 ... output interface, 30 ... recognition control unit, 31 ... candidate extraction unit, 32 ... node expansion unit, 33 ... evaluation value calculation unit, 34 ... node selection unit, 35 ... determination unit.

Claims

A pattern recognition method used in an information processing apparatus that performs pattern recognition processing in a plurality of stages, the method comprising:
expanding the next-stage recognition candidates belonging to a recognition candidate;
estimating the time required for the recognition processing of the recognition candidates of the next and subsequent stages that belong to each recognition candidate;
calculating, for each of the expanded recognition candidates, an evaluation value based on the posterior probability conditioned on all recognition processing results for the recognition candidates already processed and on the estimated time required for the recognition processing;
selecting recognition candidates based on the evaluation value calculated for each recognition candidate; and
determining a pattern recognition result from the selected recognition candidates.

The pattern recognition method according to claim 1, wherein the selecting of the recognition candidates selects, at each stage, a predetermined number of recognition candidates in descending order of the evaluation value.
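
As an illustration only, the per-stage selection described in the claim above can be sketched as a top-k filter. The claims do not state how the posterior probability and the estimated time are combined into the evaluation value, so the ratio used here is an assumption, as are the names `Candidate`, `evaluation_value`, and `select_top`.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one recognition candidate at a stage."""
    label: str
    posterior: float  # posterior given all recognition results so far
    est_time: float   # estimated time for the remaining stages (seconds)


def evaluation_value(c: Candidate) -> float:
    # ASSUMPTION: the claims only say the value is "based on" both terms.
    # A ratio (probability gained per unit of time) is one plausible choice.
    return c.posterior / c.est_time


def select_top(candidates: list[Candidate], k: int) -> list[Candidate]:
    """Keep the k candidates with the largest evaluation value at this stage."""
    return heapq.nlargest(k, candidates, key=evaluation_value)
```

With this choice, a candidate with a moderate posterior but a cheap subtree can outrank a more probable candidate whose subtree is expensive to process.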

The pattern recognition method according to claim 1, wherein the selecting of the recognition candidates selects the recognition candidate having the largest evaluation value from among the recognition candidates for which the evaluation value has been calculated, and
the expanding of the recognition candidates expands the next-stage recognition candidates belonging to the selected recognition candidate having the largest evaluation value.
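
The selection in the claim above resembles a best-first search: of all candidates evaluated so far, the one with the largest evaluation value is expanded next. A minimal sketch follows, assuming a priority queue as the frontier and assuming that the search stops at the first final-stage candidate it selects (the claim does not specify a stopping rule). The function `best_first` and the toy data `TREE` and `SCORE` are hypothetical.

```python
import heapq
from itertools import count


def best_first(root, children, evaluate, is_leaf):
    """Repeatedly expand the evaluated candidate with the largest value."""
    tie = count()  # tie-breaker so the heap never compares nodes directly
    frontier = [(-evaluate(root), next(tie), root)]
    while frontier:
        _, _, node = heapq.heappop(frontier)  # largest evaluation value
        if is_leaf(node):
            return node  # ASSUMPTION: first final-stage candidate wins
        for child in children(node):
            heapq.heappush(frontier, (-evaluate(child), next(tie), child))
    return None


# Hypothetical two-stage search tree with made-up evaluation values.
TREE = {
    "root": ["Tokyo", "Osaka"],
    "Tokyo": ["Tokyo/Minato", "Tokyo/Shibuya"],
    "Osaka": ["Osaka/Kita"],
}
SCORE = {
    "root": 1.0, "Tokyo": 0.5, "Osaka": 0.4,
    "Tokyo/Minato": 0.1, "Tokyo/Shibuya": 0.3, "Osaka/Kita": 0.35,
}

result = best_first(
    "root",
    children=lambda n: TREE.get(n, []),
    evaluate=lambda n: SCORE[n],
    is_leaf=lambda n: n not in TREE,
)
```

In this toy run the search first expands "Tokyo", finds that its children score lower than the still-unexpanded "Osaka", and switches branches, which is the behavior that distinguishes a global best-first choice from a per-stage cut.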

The pattern recognition method according to claim 1, wherein the posterior probability of a recognition candidate is calculated based on the probability that the recognition processing result for the recognition candidate is output given that recognition candidate, the probability that that recognition processing result is output, and the posterior probability of the recognition candidate at the immediately preceding stage.
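
One way the three quantities named in the claim above could combine is a stage-by-stage Bayesian update. The formula below is a sketch under assumptions that the claim itself does not state: the recognition results at different stages are treated as independent of one another given the candidates, and any prior over child candidates given their parent is treated as constant and omitted. The symbols are introduced here for illustration: c_n is a candidate at stage n, c_{n-1} is the preceding-stage candidate it belongs to, and x_n is the recognition processing result obtained at stage n.

```latex
P(c_n \mid x_1, \dots, x_n) \;\approx\;
  \frac{P(x_n \mid c_n)}{P(x_n)} \;
  P(c_{n-1} \mid x_1, \dots, x_{n-1})
```

Read this way, the ratio P(x_n | c_n) / P(x_n) rescales the parent's posterior up when the stage-n result is more likely under the candidate than on average, and down otherwise.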

A character recognition method used in a character recognition device that performs processing for recognizing character information composed of information of a plurality of layers, the method comprising:
expanding the word candidates of the next layer that belong to a word candidate;
estimating the time required for the recognition processing of the recognition candidates in the next and subsequent layers that are subordinate to each word candidate;
calculating, for each expanded word candidate, an evaluation value based on the posterior probability conditioned on all character recognition processing results for the word candidates already subjected to character recognition processing and on the estimated time required for the recognition processing;
selecting word candidates based on the evaluation value calculated for each word candidate; and
determining a recognition result for the entire character information from the selected word candidates.
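
To make the expansion and time-estimation steps of the claim above concrete, the following sketch models a layered address dictionary as a nested mapping. Everything in it is an assumption made for illustration: the place names, the helper names `expand`, `count_descendants`, and `estimate_time`, and in particular the idea that the remaining time is proportional to the number of subordinate words. The claim does not say how the time is estimated.

```python
# Hypothetical three-layer address dictionary (names are illustrative only).
ADDRESS_DB = {
    "Tokyo": {
        "Minato": {"Shibaura": {}, "Konan": {}},
        "Shibuya": {"Ebisu": {}},
    },
    "Osaka": {
        "Kita": {"Umeda": {}},
    },
}


def expand(subtree: dict) -> list[str]:
    """Return the next-layer word candidates that belong to a candidate."""
    return sorted(subtree)


def count_descendants(subtree: dict) -> int:
    """Count every word in all layers below a candidate."""
    return sum(1 + count_descendants(child) for child in subtree.values())


def estimate_time(subtree: dict, seconds_per_word: float = 0.01) -> float:
    # ASSUMPTION: cost grows linearly with the number of subordinate words.
    return seconds_per_word * count_descendants(subtree)
```

Under this toy model "Tokyo" has five subordinate words and "Osaka" has two, so descending into "Tokyo" is estimated to take longer, which an evaluation value can weigh against its posterior probability.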

The character recognition method according to claim 5, wherein the posterior probability of a word candidate is calculated based on the probability that the character recognition processing result for the word candidate is output given that word candidate, the probability that that character recognition processing result is output, and the posterior probability of the word candidate in the immediately preceding layer.

A pattern recognition program for causing a computer that performs processing for recognizing a pattern in a plurality of stages to realize:
a function of expanding the next-stage recognition candidates belonging to a recognition candidate;
a function of estimating the time required for the recognition processing of the recognition candidates in the next and subsequent stages that are subordinate to each recognition candidate;
a function of calculating, for each of the expanded recognition candidates, an evaluation value based on the posterior probability conditioned on all recognition processing results for the recognition candidates already subjected to recognition processing and on the estimated time required for the recognition processing;
a function of selecting recognition candidates based on the evaluation value calculated for each recognition candidate; and
a function of determining a recognition result of the pattern from the selected recognition candidates.

A character recognition program for causing a computer that performs processing for recognizing character information composed of information of a plurality of layers to realize:
a function of expanding the word candidates of the next layer that belong to a word candidate;
a function of estimating the time required for the recognition processing of the recognition candidates in the next and subsequent layers that are subordinate to each word candidate;
a function of calculating, for each expanded word candidate, an evaluation value based on the posterior probability conditioned on all character recognition processing results for the word candidates already subjected to character recognition processing and on the estimated time required for the recognition processing;
a function of selecting word candidates based on the evaluation value calculated for each word candidate; and
a function of determining a recognition result for the entire character information from the selected word candidates.