JP2013246677A

JP2013246677A - Learning device for dictionary for pattern recognition, pattern recognition device, coding device, division device and learning method for dictionary for pattern recognition

Info

Publication number: JP2013246677A
Application number: JP2012120684A
Authority: JP
Inventors: Tomoyuki Hamamura; 倫行浜村; Hide Boku; 英朴; Bumpei Irie; 文平入江; Masaya Maeda; 匡哉前田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2012-05-28
Filing date: 2012-05-28
Publication date: 2013-12-09
Anticipated expiration: 2032-05-28
Also published as: JP5992206B2

Abstract

PROBLEM TO BE SOLVED: To provide a learning device for a dictionary for pattern recognition, a pattern recognition device, a coding device, a division device, and a learning method for a dictionary for pattern recognition that allow a pattern recognition dictionary to be learned easily without manual labor. SOLUTION: The learning device for a dictionary for pattern recognition includes a generation part, a feature extraction part, a setting part, and an update part. The generation part generates, in a recognition object image, a pattern area candidate in which a pattern input by an input person appears to be written. The feature extraction part calculates a feature value in the pattern area candidate generated by the generation part. The setting part sets, on the basis of the visual line position of the input person who has input the pattern, the probability that the pattern input by the input person is written in the pattern area candidate generated by the generation part. The update part updates the dictionary for pattern recognition on the basis of the probability set by the setting part and the feature value calculated by the feature extraction part.

Description

Embodiments of the present invention relate to a learning device for a pattern recognition dictionary, a pattern recognition device, a coding device, a sorting device, and a learning method for a pattern recognition dictionary.

A pattern recognition device can improve its recognition accuracy by updating its dictionary with learning data that associates pattern images with information indicating the correct patterns. For characters, as one example of patterns, a known way to train a recognition dictionary is to prepare learning data that associates each character image with information indicating the character it shows (the correct character) and to update the dictionary data from that learning data. One way to associate a character image with its correct character is for a person to specify the character information for each individual character image. However, having a person directly associate every character image with its correct character takes a great deal of time and effort.

Japanese Patent Laid-Open No. 7-191796
Japanese Patent No. 3051628

To solve the above problem, an object is to provide a learning device for a pattern recognition dictionary, a pattern recognition device, a coding device, a sorting device, and a learning method for a pattern recognition dictionary that allow a pattern recognition dictionary to be learned efficiently.

According to the embodiment, a learning device for a pattern recognition dictionary includes a generation unit, a feature extraction unit, a setting unit, and an update unit. The generation unit generates, in an image to be recognized, pattern region candidates in which a pattern input by an input person appears to be written. The feature extraction unit calculates a feature amount for each pattern region candidate generated by the generation unit. The setting unit sets, based on the line-of-sight position of the input person who input the pattern, the probability that the input pattern is written in each pattern region candidate generated by the generation unit. The update unit updates the pattern recognition dictionary based on the probability set by the setting unit and the feature amount calculated by the feature extraction unit.
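As a rough illustration of how the setting unit and the update unit could interact, the following Python sketch assigns each pattern region candidate a probability from its distance to the gaze position and then folds the candidates' feature vectors into a dictionary entry in proportion to those probabilities. The Gaussian falloff, the `sigma` parameter, the running weighted mean, and every name in the code are assumptions made for illustration only; the passage above does not prescribe them.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


def gaze_probabilities(centers: List[Tuple[float, float]],
                       gaze: Tuple[float, float],
                       sigma: float = 20.0) -> List[float]:
    """Assumed model: candidates nearer the gaze point are more probable."""
    weights = [
        math.exp(-((cx - gaze[0]) ** 2 + (cy - gaze[1]) ** 2) / (2 * sigma ** 2))
        for cx, cy in centers
    ]
    total = sum(weights)
    if total == 0.0:
        # Gaze is far from every candidate: fall back to a uniform guess.
        return [1.0 / len(centers)] * len(centers)
    return [w / total for w in weights]


@dataclass
class DictionaryEntry:
    """Hypothetical dictionary entry: a probability-weighted mean feature."""
    mean: List[float]
    weight: float = 0.0  # total probability mass absorbed so far


def update_entry(entry: DictionaryEntry,
                 features: List[List[float]],
                 probabilities: List[float]) -> None:
    """Fold each candidate's feature vector into the mean, weighted by the
    probability that the candidate really shows the typed character."""
    for vec, p in zip(features, probabilities):
        if p <= 0.0:
            continue
        entry.weight += p
        step = p / entry.weight
        entry.mean = [m + step * (x - m) for m, x in zip(entry.mean, vec)]


# Two candidate regions; the operator was looking close to the first one.
centers = [(10.0, 10.0), (60.0, 10.0)]
probs = gaze_probabilities(centers, gaze=(12.0, 10.0))
entry = DictionaryEntry(mean=[0.0, 0.0])
update_entry(entry, [[1.0, 0.0], [0.0, 1.0]], probs)
```

With this weighting, a candidate that the operator was probably not looking at contributes little to the entry, so a wrongly segmented region does not need to be removed by hand.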

FIG. 1 is a block diagram schematically showing a configuration example of a paper sheet processing apparatus as the sorting device according to the present embodiment.
FIG. 2 is a flowchart for explaining an operation example of a VCD as the video coding device according to the present embodiment.
FIG. 3 is a diagram showing an example of an image to be recognized that is supplied to the VCD as the video coding device according to the present embodiment.
FIG. 4 is a diagram showing line-of-sight positions and input timings on the display screen of an image to be recognized according to the present embodiment.
FIG. 5 is a diagram showing a configuration example of a learning data file created by the VCD according to the present embodiment.
FIG. 6 is a block diagram showing a configuration example of the learning unit according to the present embodiment.
FIG. 7 is a diagram showing an example of hypotheses in an image to be recognized according to the present embodiment.
FIG. 8 is a diagram showing the prior probability of each hypothesis according to the present embodiment.
FIG. 9 is a diagram showing the posterior probability of each hypothesis according to the present embodiment.
FIG. 10 is a flowchart for explaining an operation example of the learning processing performed by the learning unit according to the present embodiment.

Hereinafter, the present embodiment will be described with reference to the drawings.
The learning device according to the present embodiment updates a pattern recognition dictionary. The patterns to be recognized are information such as characters, symbols, or codes, and the pattern recognition dictionary is a storage unit that stores dictionary data for recognizing those patterns. The present embodiment describes a sorting device that performs character recognition processing using a character recognition dictionary as the pattern recognition dictionary.

The sorting device according to the present embodiment is a system that includes the functions of a learning device for a character recognition dictionary, a pattern recognition device, a coding device, and the like. The sorting device sorts objects such as paper sheets or articles based on sorting information written in characters on those objects. It has a character recognition function that recognizes the characters written on an object as sorting information. The sorting device sorts each object according to sorting information obtained either as the result of character recognition processing or as input from the video coding processing described later.

The sorting device according to the present embodiment has a video coding system (hereinafter abbreviated as VCS) in which a person inputs character information, as sorting information, that could not be recognized by the character recognition processing using the character recognition dictionary in the sorter main body. The VCS has a plurality of coding devices. Each coding device displays on its display screen an image containing sorting information that the character recognition processing of the sorter main body could not recognize, and a person viewing the screen inputs the sorting information as the character information contained in that image. The coding device according to the present embodiment does more than return the sorting information input by the person to the sorter main body. It also supplies data containing the image with the sorting information, the character information input by the person, and the positions the person was looking at when inputting that character information, as update data (learning data) for the character recognition dictionary, to a learning device for the character recognition dictionary such as a learning unit provided in the sorter main body.
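One way to picture the learning data that a coding device hands to the learning unit is as a record that ties an image to the typed characters and to the gaze position at each keystroke. The Python sketch below is only an assumed layout; the field names, the use of an image identifier instead of pixel data, and the timestamp field are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KeyEvent:
    character: str                # character typed by the operator
    gaze_xy: Tuple[float, float]  # screen position the operator was looking at
    time_ms: int                  # assumed: time of the keystroke


@dataclass
class LearningRecord:
    image_id: str                 # assumed: reference to the displayed image
    events: List[KeyEvent] = field(default_factory=list)

    def typed_text(self) -> str:
        """Sorting information as entered by the operator."""
        return "".join(e.character for e in self.events)


record = LearningRecord(image_id="sheet-0001")
record.events.append(KeyEvent("東", (120.0, 48.0), 1000))
record.events.append(KeyEvent("京", (152.0, 50.0), 1400))
```

Keeping the gaze position per keystroke, rather than per image, is what lets a learner later relate each typed character to a region of the image.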

The sorting device according to the present embodiment has a learning unit that updates the character recognition dictionary (learning processing) based on the learning data supplied from the VCS. The learning unit refers to the character information input by the person and the positions the person was looking at when inputting that character information, both contained in the learning data, and updates the dictionary data of each character stored in the character recognition dictionary. The learning unit only needs to update the dictionary used for character recognition processing in the sorter main body, so it may also be a learning device provided separately from the sorting device.

The sorting device according to the present embodiment may be, for example, a paper sheet processing apparatus that sorts paper sheets such as mail (for example, a mail sorting machine), an article sorting apparatus that sorts articles such as packages and freight (for example, parcels and courier deliveries), or an article sorting apparatus that sorts articles according to sorting information written on tags attached to them. The following description takes, as one example of the sorting device, a paper sheet processing apparatus that sorts paper sheets according to address information written in characters as the sorting information.

FIG. 1 is a block diagram schematically showing a configuration example of a paper sheet processing apparatus as a sorting device having the functions of a learning device for a pattern recognition dictionary, a pattern recognition device, a coding device, and the like according to the present embodiment.
The paper sheet processing apparatus 1 recognizes sorting information expressed in characters, such as an address written on a paper sheet such as a mail item or a form, and sorts the paper sheet based on the character recognition result. In the configuration example shown in FIG. 1, the paper sheet processing apparatus 1 includes a sorter main body 3, a video coding system (hereinafter abbreviated as VCS) 4, and the like. The sorter main body 3 and the VCS 4 are connected so that they can communicate with each other.

First, the sorter main body 3 will be described.
The sorter main body 3 of the paper sheet processing apparatus shown in FIG. 1 sorts paper sheets according to address information as the sorting information. The sorter main body 3 has a character recognition unit and an address determination unit. The character recognition unit refers to a dictionary for character recognition and recognizes each character that appears to be part of the address information in the paper sheet image read by the scanner. The address determination unit refers to the character recognition result and to the address information stored in an address database, and determines the address information written on the paper sheet. The sorter main body 3 sends the image of any paper sheet whose address information could not be recognized (identified) to the VCS 4. In the VCS 4, an operator inputs the address information shown in the paper sheet image, and the VCS 4 returns the input result to the sorter main body 3. The sorter main body 3 also has a function of sorting paper sheets whose address information could not be recognized, based on the address information input at the VCS 4.

The sorter main body 3 includes an operation panel 10, a supply unit 11, a main transport path 12, a barcode reader (hereinafter, BCR) 13, a scanner 14, a barcode writer (hereinafter, BCW) 15, a sorting unit 16, a control unit 17, a character recognition unit 18, a dictionary 18a, an address determination unit 19, an address database (hereinafter, address DB) 19a, a learning unit 20, and the like.

The control unit 17 controls the operation of each unit of the paper sheet processing apparatus 1 in an integrated manner. The control unit 17 includes a CPU, a buffer memory, a program memory, a nonvolatile memory, and the like. The CPU performs various arithmetic operations. The buffer memory temporarily stores the results of operations performed by the CPU. The program memory and the nonvolatile memory store the various programs executed by the CPU, control data, and the like. The control unit 17 performs various kinds of processing by having the CPU execute the programs stored in the program memory.

The operation panel 10 lets an operator specify a processing mode and the start of processing, and displays the operating state of the paper sheet processing apparatus 1.
The supply unit 11 stocks the paper sheets to be taken into the paper sheet processing apparatus 1. The supply unit 11 accepts a stack of paper sheets as a batch and supplies them one at a time to the main transport path 12. For example, the supply unit 11 includes a separation roller. When paper sheets are loaded into the supply unit 11, the separation roller contacts the lower end, in the stacking direction, of the loaded sheets. By rotating, the separation roller feeds the paper sheets set in the supply unit 11 one at a time, from the lower end in the stacking direction, to the main transport path 12 at a constant pitch.

The main transport path 12 is a transport unit that conveys paper sheets to each unit in the paper sheet processing apparatus 1. The main transport path 12 includes a conveyor belt, drive pulleys, and the like. A drive motor drives the drive pulleys, and the drive pulleys move the conveyor belt. The barcode reader 13, the scanner 14, the barcode writer 15, the sorting unit 16, and the like are arranged along the main transport path 12.

The BCR 13 reads barcodes, such as an ID barcode or a destination barcode, printed on the paper sheets conveyed on the main transport path 12. The BCR 13 has a reading unit that reads an image of a barcode and a recognition unit that recognizes the barcode in the read image. When the reading unit reads a barcode, it supplies the barcode image to the recognition unit. The recognition unit processes the supplied barcode image and recognizes the data contained in the barcode. The recognized data is supplied to the control unit 17.

The scanner 14 acquires images of the paper sheets conveyed along the main transport path 12. The scanner 14 includes, for example, an illumination unit and an optical sensor. The illumination unit irradiates the paper sheets conveyed along the main transport path 12 with light. The optical sensor includes a light receiving element such as a charge-coupled device (CCD) and an optical system (lens). The optical sensor receives the light reflected by a paper sheet through the optical system, focuses it on the CCD, and obtains an electrical signal (image). By acquiring images continuously from a paper sheet as it is conveyed along the main transport path 12, the scanner 14 obtains an image of the entire sheet. The scanner 14 supplies the acquired image to the character recognition unit 18. The scanner 14 may also be implemented with a video camera or the like.

The character recognition unit 18 functions as a pattern recognition unit that recognizes characters, as patterns, contained in the paper sheet image read by the scanner 14. From that image, the character recognition unit 18 extracts regions of character images (character candidate regions) that each appear to contain a character of the address information. The character recognition unit 18 recognizes the character in each extracted character candidate region by referring to the dictionary data stored in the character recognition dictionary 18a. For example, the character recognition unit 18 calculates the similarity between the feature parameter of the character (pattern) in the character candidate region and the feature parameter of each character stored as dictionary data (the dictionary pattern of each character), and determines the character recognition result from the calculated similarities.

The dictionary 18a stores dictionary data such as feature parameters that encode the features of each character. The dictionary 18a is used in the character recognition processing (OCR processing) performed by the character recognition unit 18. For example, the dictionary 18a stores a feature parameter for each character as the dictionary data describing that character. A character's feature parameter is a numerical representation of the features of the character's image. For example, the feature parameter stored as dictionary data in the dictionary 18a may be luminance gradient information extracted as a 128-dimensional vector from the character image after blurring. The feature parameter stored as dictionary data in the dictionary 18a may also be a Gaussian distribution with mean m and covariance matrix Σ.
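To make the Gaussian form of the dictionary data concrete, the following Python sketch estimates a mean vector m and a covariance matrix Σ from a set of feature vectors belonging to one character. It is a minimal illustration under stated assumptions: the function name is hypothetical, the population estimate (division by n) is a choice made here, and two-dimensional toy vectors stand in for the 128-dimensional features mentioned above.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def fit_gaussian(samples: List[Vector]) -> Tuple[Vector, Matrix]:
    """Return (mean, covariance) of the samples, dividing by n."""
    n = len(samples)
    d = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [
        [
            sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mean, cov


# Toy feature vectors for one character (real features would be longer).
m, sigma = fit_gaussian([[0.0, 0.0], [2.0, 2.0]])
```

In practice a full 128 x 128 covariance matrix needs many samples to estimate reliably, which is one reason a simplified (for example diagonal) covariance is often used instead.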

From the paper sheet image to be recognized, the character recognition unit 18 extracts, as a character candidate region, each region that appears to contain a single character. The method of extracting character candidate regions is not limited to any specific method. After extracting a character candidate region, the character recognition unit 18 extracts the feature parameter of the character-like pattern in that region. It then compares the extracted feature parameter with the feature parameter of each character stored in the dictionary 18a and calculates, for each such character, the probability (or similarity) that the character written in the candidate region is that character.

For example, the character recognition unit 18 compares the extracted feature parameter with the feature parameter of "あ" stored in the dictionary 18a and calculates the probability (or similarity) that the character written in the character candidate region is "あ". The character recognition unit 18 then calculates the similarity (probability) in turn for each of the other characters, such as "い", "う", and so on.
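The per-character comparison described above can be sketched in Python as follows. The sketch scores a feature vector against each dictionary entry with a Gaussian log-likelihood and normalizes the scores into probabilities. Several things here are assumptions rather than the embodiment's method: the diagonal covariance, the equal prior for every character, the softmax-style normalization, and all function names.

```python
import math
from typing import Dict, List, Tuple

# Assumed entry format: (mean vector, per-dimension variance).
Gaussian = Tuple[List[float], List[float]]


def log_likelihood(x: List[float], mean: List[float], var: List[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def character_probabilities(x: List[float],
                            dictionary: Dict[str, Gaussian]) -> Dict[str, float]:
    """Probability of each character, assuming equal priors."""
    scores = {ch: log_likelihood(x, m, v) for ch, (m, v) in dictionary.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {ch: math.exp(s - top) for ch, s in scores.items()}
    total = sum(exp_scores.values())
    return {ch: e / total for ch, e in exp_scores.items()}


# Toy dictionary with two characters and two-dimensional features.
dictionary = {
    "あ": ([0.0, 0.0], [1.0, 1.0]),
    "い": ([3.0, 3.0], [1.0, 1.0]),
}
probs = character_probabilities([0.2, -0.1], dictionary)
best = max(probs, key=probs.get)
```

Here the feature vector lies near the mean stored for "あ", so that character receives the larger probability.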

The character recognition unit 18 identifies the character contained in the character candidate region from the similarities (probabilities) calculated for each character. For example, the character recognition unit 18 takes as the recognition result any character whose calculated probability exceeds a predetermined threshold. When several characters exceed the threshold, the character recognition unit 18 outputs them to the address determination unit 19 as the character recognition result in descending order of probability. Alternatively, when several characters exceed the threshold, the character recognition unit 18 may output only a predetermined number of them, in descending order of probability, to the address determination unit 19. When a single character must be identified and several characters exceed the threshold, the character recognition unit 18 may output the one character with the highest probability to the address determination unit 19 as the character recognition result.

In the paper sheet processing apparatus, because the character recognition unit 18 performs processing that presupposes subsequent address determination, it may output character recognition results without comparing them with a threshold. For example, the character recognition unit 18 may take a predetermined number of characters, in descending order of the probability calculated over all characters, as the character recognition result and output them to the address determination unit 19. The character recognition unit 18 may also identify a single character for the character candidate region, in which case it outputs the character with the highest probability as the recognition result for that region. The character recognition unit 18 performs the same recognition processing on every character image in the extracted image and thereby recognizes all the characters written in it. The character recognition processing in the character recognition unit 18 may be any method that uses the dictionary data registered in the dictionary 18a and is not limited to the processing examples above.
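The output policies described above (threshold only, threshold plus a fixed count, a fixed count with no threshold, and a single best character) can be expressed with one small Python helper. This is a sketch under assumptions: the function name and the `threshold` and `top_n` parameters are hypothetical, and the probabilities are toy values.

```python
from typing import Dict, List, Optional, Tuple


def select_candidates(probs: Dict[str, float],
                      threshold: Optional[float] = None,
                      top_n: Optional[int] = None) -> List[Tuple[str, float]]:
    """Rank characters by probability, then apply the optional filters."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(ch, p) for ch, p in ranked if p > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


probs = {"あ": 0.6, "お": 0.3, "め": 0.1}

above_threshold = select_candidates(probs, threshold=0.2)
top_two_no_threshold = select_candidates(probs, top_n=2)
single_best = select_candidates(probs, top_n=1)
```

Passing several ranked candidates downstream, instead of a single character, leaves room for the address determination step to resolve ambiguous characters from context.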

The address determination unit 19 recognizes address information, which consists of a combination of characters, based on the recognition result for each character from the character recognition unit 18. The address determination unit 19 identifies (recognizes) the address information written on the paper sheet by comparing the character recognition result with the address information stored in the address DB 19a. For example, when the character recognition unit 18 returns several character candidates for each character candidate region, the address determination unit 19 compares the combinations of those candidates with the address information stored in the address DB 19a to identify the address information. When the address information is identified, the address determination unit 19 supplies it to the control unit 17. When the address information cannot be identified, the address determination unit 19 transmits video coding information (coding data), which includes the paper sheet image and the character recognition result, to the image storage/distribution device 21.
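A simple way to picture the comparison of candidate combinations with registered addresses is shown in the Python sketch below. It scores each registered address by the product of the probabilities of its characters at each position and returns nothing when no address is supported, which corresponds to the case that would be handed over to video coding. The scoring rule, the equal-length requirement, and all names are assumptions for illustration; the passage above does not specify the matching method.

```python
from typing import Dict, List, Optional


def determine_address(candidates: List[Dict[str, float]],
                      addresses: List[str]) -> Optional[str]:
    """Pick the registered address best supported by the per-position
    character candidates, or None if no address is supported at all."""
    best: Optional[str] = None
    best_score = 0.0
    for address in addresses:
        if len(address) != len(candidates):
            continue
        score = 1.0
        for ch, position in zip(address, candidates):
            score *= position.get(ch, 0.0)  # 0 if ch is not a candidate here
        if score > best_score:
            best, best_score = address, score
    return best


# Toy data: two character positions, each with ranked candidates.
candidates = [{"東": 0.9, "束": 0.1}, {"京": 0.6, "哀": 0.4}]
registered = ["東京", "大阪"]

found = determine_address(candidates, registered)
not_found = determine_address([{"x": 1.0}, {"y": 1.0}], registered)
```

In this toy case the lower-ranked candidates "束" and "哀" never form a registered address, so the database lookup removes them without any extra rule.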

The address DB 19a stores the address information to be recognized. In the present embodiment, the address information registered in the address DB 19a consists of data in multiple hierarchical levels. For example, the address DB 19a stores, in a tree (hierarchical) structure, all address information that exists in the region whose paper sheets (for example, mail) are to be processed. The address DB 19a may be updatable. The address DB 19a may be any storage device that the address determination unit 19 can access; for example, it may be provided outside the sorter main body 3.
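The tree structure of the address information can be illustrated with nested dictionaries in Python, where each level of the address is a key and its children are the next level. This is only a sketch: the three-level split, the sample place names, and the helper names are assumptions and do not reflect the actual layout of the address DB 19a.

```python
from typing import Dict, List

AddressTree = Dict[str, "AddressTree"]


def add_address(tree: AddressTree, levels: List[str]) -> None:
    """Insert one address given as a list of levels, top level first."""
    node = tree
    for level in levels:
        node = node.setdefault(level, {})


def children(tree: AddressTree, prefix: List[str]) -> List[str]:
    """List the entries registered directly under the given prefix."""
    node = tree
    for level in prefix:
        if level not in node:
            return []
        node = node[level]
    return sorted(node)


address_db: AddressTree = {}
add_address(address_db, ["神奈川県", "川崎市", "幸区"])
add_address(address_db, ["神奈川県", "川崎市", "中原区"])
add_address(address_db, ["神奈川県", "横浜市", "西区"])

cities = children(address_db, ["神奈川県"])
wards = children(address_db, ["神奈川県", "川崎市"])
```

With such a hierarchy, once an upper level has been determined, only its children need to be compared against the character candidates for the next level.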

The BCW 15 prints an ID barcode or a destination barcode on a paper sheet as required. For a paper sheet whose address information was recognized by the address determination unit 19, the BCW 15 prints a destination barcode that encodes the recognized address information. For a paper sheet whose address information could not be recognized, the BCW 15 prints an ID barcode that encodes the identification information (ID code) supplied by the control unit 17. In other words, the destination barcode represents the address information itself, whereas the ID barcode represents identification information for identifying the paper sheet. The identification information in the ID barcode is used to associate the paper sheet with the address information keyed in at the VCS 4. A paper sheet on which an ID barcode is printed is therefore a paper sheet to be processed by the VCS 4.
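The choice between the two barcode types reduces to a single branch on whether address determination succeeded. The Python sketch below shows that branch; the function name, the tuple return value, and the sample identifiers are hypothetical.

```python
from typing import Optional, Tuple


def barcode_to_print(address: Optional[str], sheet_id: str) -> Tuple[str, str]:
    """Return (barcode kind, payload) for one paper sheet."""
    if address is not None:
        # Address was recognized: encode the address itself.
        return ("destination", address)
    # Address unknown: encode the sheet's ID so VCS input can be matched later.
    return ("id", sheet_id)


recognized = barcode_to_print("東京", "ID0001")
unrecognized = barcode_to_print(None, "ID0002")
```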

A sorting unit 16, in which paper sheets are sorted according to their address information, is provided downstream of the BCW 15 in the sheet conveyance direction. The sorting unit 16 consists of a plurality of sorting pockets (not shown) partitioned into multiple tiers and multiple rows. Each pocket is assigned to a sorting destination, and paper sheets are stacked one after another in the pocket corresponding to their address information, based on the address information or the machine code.

The sorting unit 16 is also provided with a VCS exclusion pocket (not shown), in which paper sheets whose sorting destination could not be recognized are accumulated. The paper sheets accumulated in the VCS exclusion pocket are re-supplied to the supply unit 11 after their address information has been input at the VCS 4. The re-supplied paper sheets are re-sorted based on the ID code printed on each paper sheet and the address information input at the VCS 4. The control unit 17 sorts the paper sheets into the pockets of the sorting unit 16 based on the address information, which serves as the sorting information.

The learning unit 20 provides the learning function for the character recognition dictionary 18a or the address DB 19a used for address determination. The learning unit 20 updates the dictionary data stored in the dictionary 18a or the address data stored in the address DB 19a based on information that includes the character information and address information input by the operator at the VCS 4. The learning unit 20 will be described in detail later. The present embodiment mainly describes a processing example in which the learning unit 20 updates the character recognition dictionary 18a. However, by changing the recognition target from individual characters to character strings such as words, a dictionary for recognizing character strings can be learned by the same processing method.

Next, the VCS 4 will be described.
The VCS 4 includes an image storage/distribution device 21, a plurality of video coding desks (hereinafter abbreviated as VCDs) 22, and the like. The image storage/distribution device 21 is realized by a computer having a control unit, a storage unit, various interfaces, and the like. Each VCD 22 is realized by, for example, a computer having a display unit, an input unit, a control unit, a storage unit, various interfaces, and the like.

The sorter main body 3 and each VCD 22 are connected to the image storage/distribution device 21. The image storage/distribution device 21 receives, from the sorter main body 3, video coding information (coding data) that includes the images of paper sheets whose address information could not be recognized by the character recognition unit 18 and the address determination unit 19 in the sorter main body 3. The image storage/distribution device 21 monitors the operating status of each VCD 22 and, according to that status, distributes the coding data including the paper sheet images received from the sorter main body 3 to the VCDs 22.
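
The embodiment does not specify how the operating status is used for distribution. As a minimal sketch, assuming a simple "least pending work among online desks" policy (an assumption, not the method of the embodiment), the distribution could look like this. The function name `distribute` and the status fields `online` and `pending` are hypothetical.

```python
def distribute(coding_items, vcd_status):
    """Assign coding data items to VCDs.

    vcd_status maps a VCD identifier to a dict with the hypothetical keys
    "online" (bool) and "pending" (number of items already queued).
    Returns a dict mapping each VCD identifier to its newly assigned items.
    """
    assignments = {vcd_id: [] for vcd_id in vcd_status}
    # Only desks that are currently operating can receive work.
    pending = {v: s["pending"] for v, s in vcd_status.items() if s["online"]}
    if not pending:
        raise ValueError("no VCD is available")
    for item in coding_items:
        # Assumed policy: pick the desk with the least pending work,
        # breaking ties by identifier so the result is deterministic.
        target = min(pending, key=lambda v: (pending[v], v))
        assignments[target].append(item)
        pending[target] += 1
    return assignments
```

Any other policy (for example, round-robin or operator skill based assignment) would fit the description equally well; only the dependence on operating status is taken from the text.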

Each VCD 22 displays, on the display unit 27, the paper sheet image included in the coding data distributed from the image storage/distribution device 21, and prompts the operator to input the correct address information (character information). While the paper sheet image is displayed on the display unit 27, the VCD 22 returns to the image storage/distribution device 21 the input information, which includes the character information input by the operator as address information. The image storage/distribution device 21 returns the input information acquired from each VCD 22 to the sorter main body 3.

In the configuration example shown in FIG. 1, the VCD 22 includes a CPU 23, a nonvolatile memory 24, a RAM 25, a ROM 26, a display unit 27, an input unit 28, a line-of-sight detection unit 29, and the like.
The CPU 23 functions as a control unit that controls the entire VCD 22. The CPU 23 performs various processes based on the control programs and control data stored in the ROM 26 or the nonvolatile memory 24. For example, the CPU 23 performs basic operation control of the VCD 22 by executing an operating system program. Some of the various functions may be realized by hardware circuits.

The nonvolatile memory 24 is a writable and rewritable nonvolatile memory such as an EEPROM, a flash ROM, an HDD (hard disk drive), or an SSD (Solid State Disk). The nonvolatile memory 24 stores control programs, control data, and various other data according to the operational use of the VCD 22. For example, the nonvolatile memory 24 stores the coding data supplied from the image storage/distribution device 21, which includes an image for video coding (an image containing character images). The nonvolatile memory 24 may also store the input information entered by the operator, the line-of-sight position information described later, and the like.

The RAM 25 is a volatile memory. The RAM 25 temporarily stores data being processed by the CPU 23. For example, the RAM 25 stores image data for display, the input information entered by the operator, the line-of-sight position information, and the like. The ROM 26 is a non-rewritable nonvolatile memory in which control programs, control data, and the like are stored in advance.

The display unit 27 is a liquid crystal display or the like. For example, the display unit 27 displays an image for video coding (for example, a paper sheet image) supplied from the image storage/distribution device 21. In addition to the paper sheet image serving as the image for video coding (an image containing the characters to be recognized), the display unit 27 may also display the information that the sorter main body 3 was able to recognize. The input unit 28 is a device with which the operator inputs, as address information, the character information contained in the image displayed on the display unit 27. For example, the input unit 28 consists of a keyboard, a pointing device, and the like.

The line-of-sight detection unit 29 detects the operator's line-of-sight position. Specifically, it detects the position at which the operator is gazing (the line-of-sight position) on the display screen of the display unit 27 while the image of the paper sheet to be video coded is displayed. Any device capable of detecting the operator's line-of-sight position on the display screen of the display unit 27 may be used as the line-of-sight detection unit 29.

For example, the line-of-sight detection unit 29 consists of two cameras and a processing unit. The two cameras are installed a fixed distance apart, and each captures an image that includes the operator's pupil. Each camera transmits its captured image to the processing unit. The processing unit estimates the shape of the operator's eyeball from the difference between the two images captured by the two cameras, and estimates the plane tangent to the eyeball at the center of the iris. The processing unit calculates the normal to the estimated tangent plane and thereby detects the operator's line-of-sight direction. The processing unit also estimates the position of the operator's eyeball. From the eyeball position and the line-of-sight direction, the processing unit estimates where on the display screen of the display unit 27 the operator's line of sight falls.
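
The last step, mapping the eyeball position and line-of-sight direction to a position on the display screen, amounts to intersecting a ray with the screen plane. The following is a minimal sketch under an assumed coordinate system (not stated in the embodiment) in which the display screen lies in the plane z = 0 and the operator is on the side z > 0. The function name `gaze_point_on_screen` is hypothetical, and the eyeball shape and tangent plane estimation are not modeled.

```python
def gaze_point_on_screen(eye_pos, gaze_dir):
    """Intersect the line of sight with the screen plane z = 0.

    eye_pos:  (x, y, z) estimated eyeball position, with z > 0.
    gaze_dir: (dx, dy, dz) estimated line-of-sight direction.
    Returns (x, y) on the screen plane, or None if the line of sight
    does not hit the plane (parallel to it or pointing away from it).
    """
    ex, ey, ez = eye_pos
    dx, dy, dz = gaze_dir
    if dz == 0:
        return None  # line of sight is parallel to the screen
    t = -ez / dz  # ray parameter at which z becomes 0
    if t <= 0:
        return None  # the screen is behind the operator
    return (ex + t * dx, ey + t * dy)
```

A real system would additionally convert the resulting plane coordinates into pixel coordinates of the display screen and check that they fall within the displayed image.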

The line-of-sight detection unit 29 is not limited to the configuration described above. For example, the line-of-sight detection unit 29 may consist of a plurality of cameras only, with the arithmetic processing for detecting the line-of-sight position performed by another processing unit such as the CPU 23. In this case, the cameras serving as the line-of-sight detection unit 29 output their captured images to the processing unit, such as the CPU 23, that has the function of detecting the line-of-sight position from images. Furthermore, the line-of-sight detection unit 29 is not limited to a configuration that uses cameras, and may detect the line-of-sight position on the display screen by means of a device worn by the operator.

The line-of-sight detection unit 29 detects the operator's line-of-sight position at predetermined intervals. It transmits to the CPU 23, as line-of-sight position information, the information indicating each detected line-of-sight position associated with time information. For example, the line-of-sight detection unit 29 may detect the operator's line-of-sight position about 10 to 30 times per second and transmit to the CPU 23 line-of-sight position information indicating the detected position and the time at which it was detected.

Next, an operation example of the VCD 22 will be described.
FIG. 2 is a flowchart showing an operation example of the VCD 22 according to the present embodiment.
The image storage/distribution device 21 sequentially receives, from the sorter main body 3, coding data that includes the images of paper sheets whose addresses could not be read by the address determination unit 19 of the sorter main body 3. For example, the coding data includes the image of a paper sheet and the ID of that paper sheet. The coding data may further include the processing results of the character recognition unit 18 and the address determination unit 19 (for example, information indicating the characters that were recognized). The image storage/distribution device 21 stores the coding data supplied from the address determination unit 19 and distributes the stored coding data to the VCDs 22 according to the operating status of each VCD 22.

The VCD 22 acquires the coding data, including a paper sheet image, distributed from the image storage/distribution device 21 (step 11). That is, the VCD 22 acquires from the sorter main body 3, via the image storage/distribution device 21, coding data that includes the image of a paper sheet whose address could not be identified by the address determination unit 19 (an image containing the characters to be recognized).

FIG. 3 is a diagram showing an example of a paper sheet image included in the coding data supplied to the VCD 22. In the following description, the paper sheet image included in the coding data received by the VCD 22 is assumed, as an example, to be the image shown in FIG. 3.
Upon acquiring the coding data including the paper sheet image, the CPU 23 displays the paper sheet image included in the received coding data on the display unit 27 (step 12). The paper sheet image is an image containing the characters that are the target of character (pattern) recognition. While the paper sheet image is displayed on the display unit 27, the CPU 23 accepts input made by the operator on the input unit 28.

While the paper sheet image is displayed on the display unit 27, the line-of-sight detection unit 29 detects the operator's line-of-sight position on the display screen of the display unit 27 and outputs to the CPU 23, as line-of-sight position information, information that includes the detected line-of-sight position and the time at which it was detected (steps S13 and S14). That is, the line-of-sight detection unit 29 acquires information for identifying the operator's line-of-sight position (for example, camera images that include the operator's pupil) (step 13), and detects the operator's line-of-sight position from the acquired information (the camera images) (step S14).

For example, the line-of-sight detection unit 29 uses a plurality of (for example, two) cameras to capture images that include the pupil of the operator viewing the display screen of the display unit 27. Once the cameras have captured the operator's pupil, the processing unit of the line-of-sight detection unit 29 estimates the shape of the operator's eyeball from the differences between the captured images, and estimates the plane tangent to the eyeball at the center of the iris. Having estimated the tangent plane, the processing unit calculates its normal and thereby obtains the operator's line-of-sight direction. Based on the calculated line-of-sight direction, the processing unit detects where on the display screen of the display unit 27 the operator is looking, that is, the operator's line-of-sight position.

Steps 13 and 14 are not limited to the method described above; any processing that can detect the line-of-sight position, in the recognition target image, of the operator who is entering characters may be used. For example, the line-of-sight position on the display screen may be detected based on information from a device worn by the operator, or it may be detected from a line-of-sight direction identified by matching against patterns prepared for each pupil direction.

When the line-of-sight detection unit 29 detects a line-of-sight position, the CPU 23 stores in the nonvolatile memory 24 the information indicating the detected line-of-sight position in association with the time at which it was detected (step 15). The information indicating the line-of-sight position on the display screen of the display unit 27 and the information indicating the time at which the line of sight was at that position are held in the nonvolatile memory 24 as the line-of-sight position information obtained while the paper sheet image was being coded. As a result, time-series line-of-sight position information is stored in the nonvolatile memory 24. The CPU 23 may instead store the line-of-sight position information in the RAM 25.

In parallel with storing the line-of-sight position information in the nonvolatile memory 24, the CPU 23 stores in the nonvolatile memory 24, as input character information, information indicating the characters that the operator inputs with the input unit 28, such as a keyboard (steps S16 and S17). That is, the CPU 23 detects information input on the input unit 28, such as a keyboard or a pointing device (step 16). When a character (address) is input on the input unit 28 (step 16, YES), the CPU 23 stores in the nonvolatile memory 24, as input character information, information that associates the input character (or word) with its input time (step 17).

The CPU 23 stores the line-of-sight position information and the input character information in the nonvolatile memory 24 as learning data for the character recognition dictionary 18a obtained from one paper sheet image. That is, the nonvolatile memory 24 stores the line-of-sight position information as time-series position information, and stores the input character information in association with the input times. With learning data that contains the time-series line-of-sight position information and the time-stamped input character information stored in this way, it is easy to identify the position the operator was looking at (the line-of-sight position) when each character was input, or shortly before each character was input.
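
Identifying the line-of-sight position at, or shortly before, the input time of a character is a lookup in a time-sorted sequence. The following is a minimal sketch; the function name `gaze_at`, the tuple layout `(time, x, y)`, the `lead` parameter and the use of integer millisecond timestamps are all assumptions made for illustration.

```python
from bisect import bisect_right


def gaze_at(gaze_samples, t, lead=0):
    """Return the (x, y) line-of-sight position at time t - lead.

    gaze_samples: list of (time, x, y) tuples sorted by time.
    t:            input time of a character.
    lead:         how far before the input time to look (0 means "at the
                  input time"); same time unit as the samples.
    The most recent sample at or before t - lead is used. Returns None
    if no sample is that early.
    """
    times = [sample[0] for sample in gaze_samples]
    i = bisect_right(times, t - lead)  # number of samples with time <= t - lead
    if i == 0:
        return None
    _, x, y = gaze_samples[i - 1]
    return (x, y)
```

For example, with samples taken every 100 ms, `gaze_at(samples, 150)` returns the position sampled at 100 ms, and a positive `lead` shifts the lookup to an earlier sample.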

The CPU 23 also determines, according to the input on the input unit 28, whether the character input process has finished (step S18). For example, when an instruction indicating completion of character input is entered on the input unit 28, the CPU 23 may determine that the character (address) input process, which is the coding process for the paper sheet image displayed on the display unit 27, has finished.

If the character input process (coding process) for the image displayed on the display unit 27 has not finished (step S18, NO), the CPU 23 returns to step 13 and repeats the processing described above. For example, until the character input process finishes, the CPU 23 repeatedly executes the line-of-sight position detection of steps S13 and S14 so that the operator's line-of-sight position is detected at a predetermined rate (for example, about 10 to 30 times per second). Likewise, until the character input process finishes, the CPU 23 stores the input character information in step S17 each time a character is input.

When the character input process (coding process) for the paper sheet image displayed on the display unit 27 has finished (step S18, YES), the CPU 23 treats the input information consisting of the input characters as the address information for that paper sheet image. The CPU 23 transmits to the sorter main body 3, via the image storage/distribution device 21, information indicating the coding result, in which the address information consisting of the input characters is associated with information for identifying the paper sheet (the ID of the paper sheet) (step S19).

In addition, when the character input process has finished (step S18, YES), the CPU 23 supplies the line-of-sight position information and the input character information stored in the nonvolatile memory 24 to the learning unit 20 of the sorter main body 3, via the image storage/distribution device 21, as learning data for the character recognition dictionary 18a (step S20). The CPU 23 may transfer one item of learning data to the learning unit 20 each time the coding process (character input process) for one item of address information is completed, may transfer the learning data accumulated in the nonvolatile memory at a predetermined timing, or may transfer a predetermined number of items of learning data together.

Next, the relationship between the line-of-sight position and the input characters will be described.
FIG. 4 is a diagram showing an example of the line-of-sight position of an operator who inputs character (address) information while looking at the paper sheet image of FIG. 3 displayed on the display unit 27, together with the timing at which a character is input.
The line l in FIG. 4 shows the movement of the operator's line-of-sight position in time series. In the example shown in FIG. 4, the operator's line-of-sight position is concentrated on "s t o c k H o l m" in the paper sheet image, as indicated by the line l. The point a in FIG. 4 indicates the line-of-sight position at the time when the operator input the character "H" with the input unit 28. As described above, because the VCD 22 holds the time-series line-of-sight position information and the input character information associated with time information, it also has information indicating the line-of-sight position at the time each of the other characters (for example, "k" or "o") was input. The input characters are also stored as address information, so the VCD 22 can also obtain the address information written in the paper sheet image.

When it determines that the character input process (coding process) has finished, the CPU 23 reads the address information, that is, the character information entered by keystroke, stored in the nonvolatile memory 24 or the RAM 25. The CPU 23 transmits the read address information to the sorter main body 3 through the image storage/distribution device 21 in association with the identification information (ID code) of the paper sheet. The sorter main body 3 can thereby acquire, as the address information of each paper sheet, the address information entered by keystroke at the VCS that corresponds to the ID barcode printed on the paper sheet.

The CPU 23 also transmits to the sorter main body 3, through the image storage/distribution device 21, the following items stored in the nonvolatile memory 24 as learning data: the paper sheet image, the line-of-sight position information indicating the line-of-sight position at each time, and the input character information that associates the input characters with their input times. Because the learning data is information for updating the character recognition dictionary 18a, it only needs to be supplied before the learning unit 20 executes the learning process. In other words, the learning data does not have to be transmitted to the sorter main body 3 each time a character input process finishes, and may be transmitted at any timing.

Next, the learning data created by the VCS 4 will be described.
FIG. 5 shows a configuration example of the learning data created by the VCD 22.
The learning data contains the image data of the paper sheet (XXX.JPG in the example shown in FIG. 5), line-of-sight position information indicating the line-of-sight position at each time, and input character information that associates each input character with its input time. The example in FIG. 5 shows that the operator's line-of-sight position at time a is at the coordinates (x100, y100) on the display screen. It also shows that the character information "H" was input at time b (in other words, that the time at which the character "H" was input is "b").

The example in FIG. 5 shows that no character was input at time a and that the character "H" was input at time b. Furthermore, from the line-of-sight position information, the line-of-sight position at time b, when the character "H" was input, can be identified as the coordinates (x200, y200) on the display screen. That is, learning data such as that shown in FIG. 5 contains the time-series line-of-sight positions and the information associating the input characters with their input times. Such learning data therefore makes it possible to identify which position the operator was looking at when each character was input.
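
One way to picture the learning data of FIG. 5 is as a small record type. The sketch below is illustrative only: the class name `LearningData`, its field names, and the numeric stand-ins (times 1 and 2 for "a" and "b", coordinates 100 and 200 for x100/y100 and x200/y200) are assumptions, since the figure gives only symbolic values.

```python
from dataclasses import dataclass, field


@dataclass
class LearningData:
    image_file: str                             # e.g. "XXX.JPG"
    gaze: dict = field(default_factory=dict)    # time -> (x, y) on the display screen
    inputs: dict = field(default_factory=dict)  # input time -> input character

    def chars_with_gaze(self):
        """List (character, line-of-sight position at its input time) pairs.

        The position is None when no line-of-sight sample exists for
        exactly that time.
        """
        return [(ch, self.gaze.get(t)) for t, ch in sorted(self.inputs.items())]


# Stand-in for FIG. 5: time a = 1, time b = 2 (hypothetical numeric values).
record = LearningData(
    image_file="XXX.JPG",
    gaze={1: (100, 100), 2: (200, 200)},
    inputs={2: "H"},  # nothing was input at time a; "H" was input at time b
)
pairs = record.chars_with_gaze()
```

Here `pairs` pairs the character "H" with the position recorded at time b, which mirrors how the text identifies where the operator was looking when "H" was input.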

Next, the learning unit 20 will be described.
Here, the process (learning function) by which the learning unit 20 updates the character recognition dictionary 18a using the learning data supplied from the VCS (VCD) is described. The learning process described below is, however, applicable to dictionary learning for pattern recognition in general. For example, because it can be applied to words as well as to single characters, it may be applied to updating the address information in the address DB 19a. It may also be applied to learning the dictionaries used for other kinds of pattern recognition, such as biometric authentication or detection of specific objects.

That is, the learning unit 20 updates the dictionary 18a using the learning data supplied from the VCS 4. The learning unit 20 identifies partial images (character region candidates) in the paper sheet image in which a character is estimated to be written. For each identified partial image, the learning unit 20 calculates the probability that the character input by the operator is written in that partial image (the probability that it is present). In other words, the learning unit 20 calculates in which parts of the image to be recognized the input character is likely, or unlikely, to be written. The learning unit 20 updates the dictionary 18a based on the feature values extracted from the partial images and the probability that the input character is present in each partial image.

FIG. 6 is a diagram showing a configuration example of the learning unit 20 serving as a learning device.
In the configuration example shown in FIG. 6, the learning unit 20 includes a data storage unit 31, a hypothesis generation unit 32, a hypothesis prior probability setting unit 33, a hypothesis posterior probability setting unit 34, a feature extraction unit 36, an update unit 37, and the like. The learning unit 20 realizes various processing functions, including the processing of each of these units, by having a processor execute a control program. For example, the learning unit 20 can be realized by a computer having a processor, a memory, and an interface.

The data storage unit 31 stores the learning data supplied from the VCS 4 and is configured to save it in a nonvolatile memory. As learning data, the data storage unit 31 stores, for example, information including the image data of the image to be recognized, line-of-sight position information, and input character information. The learning data stored in the data storage unit 31 can be read out as needed and is supplied to the hypothesis generation unit 32, the hypothesis prior probability setting unit 33, and the like.

The hypothesis generation unit 32 generates, as hypotheses, regions of the image subject to character recognition (the paper sheet image) in which a character input at the VCS 4 (correct character) appears to be written. For one input character (correct character), the hypothesis generation unit 32 generates from the recognition target image partial images (character region candidates) that are likely to be the region where that character is written. A character region candidate generated as a partial image is also referred to as a hypothesis. For example, the hypothesis generation unit 32 extracts, as a hypothesis, the circumscribed rectangle of a connected pixel component that appears to be a character in the recognition target image. The method of extracting partial images (character region candidates) as hypotheses is not limited to a specific method.
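The circumscribed-rectangle extraction mentioned above can be illustrated with a minimal sketch. It assumes a binarized image given as a list of rows (1 = ink, 0 = background) and 4-connectivity; the function name `bounding_boxes` and the sample grid are hypothetical and are not taken from the embodiment, which leaves the extraction method open.

```python
from collections import deque


def bounding_boxes(grid):
    """Return (top, left, bottom, right) boxes of 4-connected foreground components.

    Assumption: grid is a list of equal-length rows containing 0 or 1.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one connected component,
            # growing its circumscribed rectangle as pixels are visited.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 3x5 image containing two separate strokes.
image = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(bounding_boxes(image))
```

Each returned rectangle would correspond to one character region candidate; combining adjacent rectangles into a single candidate is not covered by this sketch.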

The hypothesis generation unit 32 may also narrow down the character region candidates used as hypotheses by referring to the line-of-sight position information. That is, it may refer to the line-of-sight position information included in the learning data and narrow down the hypotheses (character region candidates) to those likely to contain the character to be learned (the input correct character). In this case, the hypothesis generation unit 32 selects as hypotheses the character region candidates located around the line-of-sight position at the time the character was input (or at a time before the character was input). For example, when generating hypotheses in which “H” is written, the hypothesis generation unit 32 may select (generate) as hypotheses those character region candidates located around the line-of-sight position at the time the character “H” was input. The hypothesis generation unit 32 may generate hypotheses for each character based on character features or the like, or may generate hypotheses independently of the character. The method of generating hypotheses is not limited to a specific method.

The hypothesis prior probability setting unit 33 calculates the probability that the character is written in each character region candidate generated as a hypothesis by the hypothesis generation unit 32. This probability is also referred to as the prior probability, and a hypothesis in which the character is actually written is also referred to as the correct hypothesis. The hypothesis prior probability setting unit 33 calculates this probability without using the feature amount of each hypothesis (character region candidate).
The hypothesis prior probability setting unit 33 calculates the probability that each hypothesis is the correct hypothesis (the prior probability) based on, for example, the operator's line-of-sight position at the time the operator input a character. The operator inputs characters while looking at the recognition target image (paper sheet image) displayed on the display unit 27. A hypothesis located around the position of the line of sight at the time the operator input the character (the line-of-sight position at character input) is therefore likely to be the correct hypothesis, whereas a hypothesis far from that line-of-sight position is unlikely to be the correct hypothesis.

The operator may also input a character a short time after looking at the image displayed on the display unit 27. In other words, there may be a time lag between the operator seeing a displayed character and actually inputting it. To allow for such cases, hypotheses around the position where the line of sight was shortly before the character input time (the line-of-sight position immediately before character input) may be treated as having a high probability of being the correct hypothesis. The method of determining the prior probability is not limited to a specific method. For example, the method of determining the prior probability of a hypothesis may differ from operator to operator. Adopting a method that reflects each operator's habits (the tendency of that operator's line-of-sight position when inputting characters) allows more accurate prior probabilities to be calculated.
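As one way to picture a gaze-based prior, the sketch below gives each candidate a score that decays with its distance from the line-of-sight position and reserves a fixed share for the case where no candidate is correct. The Gaussian falloff, the `sigma` value, the 5% reserve, and the function name `gaze_priors` are all illustrative assumptions; the embodiment does not prescribe a particular function.

```python
import math


def gaze_priors(centers, gaze, sigma=20.0, p_none=0.05):
    """Return (priors, p_none) where priors decay with distance from the gaze point.

    Assumptions: centers are (x, y) centers of candidate regions in pixels,
    gaze is the (x, y) gaze position at (or just before) the key input, and
    p_none is the probability reserved for "no candidate is correct".
    """
    scores = [
        math.exp(-((cx - gaze[0]) ** 2 + (cy - gaze[1]) ** 2) / (2.0 * sigma ** 2))
        for cx, cy in centers
    ]
    total = sum(scores)
    if total == 0.0:
        # No usable candidate: all probability goes to "no candidate is correct".
        return [0.0] * len(centers), 1.0
    scale = (1.0 - p_none) / total
    return [s * scale for s in scores], p_none


# Hypothetical candidate centers and gaze position.
priors, p_none = gaze_priors([(100, 50), (120, 50), (300, 50)], (118, 52))
print(priors, p_none)
```

A per-operator time lag could be modeled by passing the gaze position sampled slightly before the input time instead of at the input time.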

For each hypothesis generated by the hypothesis generation unit 32 for a given character, the hypothesis posterior probability setting unit 34 calculates the probability that the character is written in that hypothesis (the probability that the hypothesis is the correct hypothesis), using the prior probability and the feature parameter (feature amount) of the character region candidate. This probability is also referred to as the posterior probability.

The hypothesis posterior probability setting unit 34 acquires the feature amount (feature parameter) of the character in each character region candidate from the feature extraction unit 36, described later. For example, the hypothesis posterior probability setting unit 34 compares the feature parameter of the character in each hypothesis with the feature parameter of the input character stored in the dictionary 18a, and thereby calculates the similarity between the character in each hypothesis and the input character (correct character), that is, the probability that the hypothesis is correct under the current dictionary 18a. Using this similarity and the prior probability calculated by the hypothesis prior probability setting unit 33, the hypothesis posterior probability setting unit 34 calculates the probability that each hypothesis is the correct hypothesis (the posterior probability).

For example, the hypothesis posterior probability setting unit 34 calculates the probability that each hypothesis is the correct hypothesis (the posterior probability) from the feature amounts of the character in each hypothesis and of the input character, and from the prior probability based on the operator's line-of-sight position. The method of calculating the posterior probability is not limited to a specific method. By using the posterior probability calculated from the prior probability and the similarity, the learning unit 20 can estimate the correct hypothesis more accurately than with the prior probability calculated from the line-of-sight position alone.

The feature extraction unit 36 extracts feature parameters from each partial image (character region candidate) serving as a hypothesis. The feature parameters extracted by the feature extraction unit 36 are data to be compared with the feature parameters used in the dictionary 18a. For example, the feature extraction unit 36 may extract, as the feature parameter, a 128-dimensional vector of luminance gradient information computed after blurring the partial image. The feature extraction unit 36 may instead use pixel density values as feature parameters, or may use higher-order information as features. The method of extracting feature parameters is not limited to a specific method.

The update unit 37 updates (learns) the dictionary data stored in the dictionary 18a based on the posterior probabilities and the feature parameters of the hypotheses (the feature parameters of the characters in the character region candidates). The update unit 37 updates the dictionary 18a so that the feature parameters of hypotheses with a large posterior probability are strongly reflected and those of hypotheses with a small posterior probability are not. The update unit 37 may also weight information obtained from newer learning data more heavily than information obtained from older learning data. The method of updating the dictionary 18a is not limited to a specific method. A specific example of the update method is described later.

Next, an example of an algorithm used for the learning process of the dictionary 18a in the learning unit 20 will be described.
First, the learning unit 20 reads learning data from the data storage unit 31. As a specific example, the recognition target image included in the learning data is assumed here to be the paper sheet image shown in FIG. 3, on which “Taro Toshibe TOSHIBE vagen 151 stockHolm SWEDEN” is written. The learning data including the image shown in FIG. 3 contains “s”, “t”, “o”, “c”, “k”, “H”, “o”, “l”, and “m” as input characters. It also contains, as line-of-sight position information, information indicating line-of-sight positions such as those shown in FIG. 4. The line-of-sight positions shown in FIG. 4 are concentrated on the position where “stockHolm” is written. Point a in FIG. 4 is the operator's line-of-sight position at the time the character “H” was input.

The hypothesis generation unit 32 acquires the learning data from the data storage unit 31. Having acquired the learning data, the hypothesis generation unit 32 generates from the image data, for one input character (correct character), one or more regions (partial images) as hypotheses in which that character is estimated to be written. As an example, assume that the hypothesis generation unit 32 generates hypotheses in which “H” is estimated to be written. The hypothesis generation unit 32 selects as hypotheses the character region candidates around the line-of-sight position at the time the character “H” was input (the “×” mark shown in FIG. 4).

FIG. 7 shows an example of the hypotheses generated by the hypothesis generation unit 32.
Hypotheses 41 to 51 are examples of hypotheses, that is, character region candidates estimated to contain the character “H”. Each of hypotheses 41 to 51 is an example of a hypothesis (character region candidate) selected by referring to the line-of-sight position from among the character region candidates generated on the basis of the circumscribed rectangles of connected pixel components. The hypothesis generation unit 32 may also generate a single character region candidate (hypothesis) by combining adjacent rectangles among the circumscribed rectangles of the connected pixel components. For example, hypothesis 45 and hypothesis 48 each include a plurality of hypotheses. Hypothesis 45 and hypothesis 48 take into account the possibility that a plurality of circumscribed rectangles together form a region containing one character. Hypotheses composed of a plurality of circumscribed rectangles can be formed according to, for example, the distance between adjacent rectangles and the relative sizes of the adjacent rectangles. The generated hypotheses (character region candidates) are not necessarily limited to rectangles.
The hypothesis generation unit 32 transmits the generated hypotheses to the hypothesis prior probability setting unit 33 and the feature extraction unit 36.

The hypothesis prior probability setting unit 33 receives the hypotheses for each input character from the hypothesis generation unit 32 and calculates the prior probability of each hypothesis from the line-of-sight position information. That is, it calculates how likely each hypothesis is to be the correct hypothesis based on the line-of-sight position at the time (or immediately before) the operator input the character.

Here, let c denote the input character and h_i denote a hypothesis (i = 1 to N, where N is the number of hypotheses for c). The prior probability is then expressed as P(h_i | c) and satisfies the following formula (Formula 1).
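Formula 1 is not reproduced in this text. A form consistent with the surrounding description, in which the prior probabilities of all hypotheses for c sum to one, would be the following. Writing h_0 for the additional “does not exist” hypothesis is a labeling assumption, not notation taken from the source.

```latex
\sum_{i=0}^{N} P(h_i \mid c) = 1
```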

Here, to account for the case where no correct hypothesis exists among the hypotheses, a hypothesis that the correct hypothesis “does not exist” is assumed to be included. Formula 1 is therefore always satisfied.

FIG. 8 is a diagram showing the prior probabilities of hypotheses 41 to 51 when the input character is “H”. In the example shown in FIG. 8, hypotheses 41 to 51 are assigned, in order, 0.1%, 0.2%, 0.5%, 1%, 3%, 5%, 15%, 30%, 40%, 0.1%, and 0.1%. Hypotheses around the line-of-sight position at the time the character “H” was input (the “×” mark shown in FIG. 8), such as hypothesis 48 and hypothesis 49 in this example, are thus assigned high prior probabilities. Hypotheses located slightly before that line-of-sight position (such as hypothesis 47 and hypothesis 46 in the example shown in FIG. 8) are also assigned relatively large probabilities compared with the other hypotheses.

Conversely, hypotheses located after that line-of-sight position (hypothesis 50 and hypothesis 51 in the example shown in FIG. 8) are assigned low probabilities even though they are close to the line-of-sight position. This is because the operator is unlikely to be looking at a character located after the line-of-sight position at the time of character input (that is, the operator is considered to input a character either while looking at it in the image or after having already looked at it). The probability that none of hypotheses 41 to 51 is the correct hypothesis (that is, the probability of the hypothesis that the correct hypothesis “does not exist”) is set to 5%. The example hypotheses shown in FIG. 8 therefore satisfy Formula 1.
The hypothesis prior probability setting unit 33 transmits the prior probability calculated for each hypothesis to the hypothesis posterior probability setting unit 34.
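The statement that the FIG. 8 values satisfy Formula 1 can be checked by simple arithmetic. The short sketch below uses the percentages listed in the text and exact rational arithmetic to avoid floating-point rounding; the variable names are hypothetical.

```python
from fractions import Fraction

# Prior probabilities (in percent) of hypotheses 41 to 51, as listed for FIG. 8.
priors_percent = {
    41: "0.1", 42: "0.2", 43: "0.5", 44: "1", 45: "3", 46: "5",
    47: "15", 48: "30", 49: "40", 50: "0.1", 51: "0.1",
}
# Probability (in percent) that none of hypotheses 41 to 51 is correct.
none_percent = Fraction(5)

total = sum(Fraction(v) for v in priors_percent.values()) + none_percent
print(total)  # 100
```

The eleven listed hypotheses account for 95%, and the “does not exist” hypothesis supplies the remaining 5%.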

The feature extraction unit 36 extracts a feature parameter (feature amount) X for the hypotheses received from the hypothesis generation unit 32. Here, X denotes all N features extracted from the hypotheses for the input character c. That is, letting x_i denote the character feature extracted from hypothesis h_i, X = (x_1, x_2, ..., x_N). As described above, each feature x_i is data to be compared with the feature parameters stored as dictionary data in the dictionary 18a.
The feature extraction unit 36 transmits the features X extracted from the character region candidates to the hypothesis posterior probability setting unit 34 and the update unit 37.

Having received the prior probability of each hypothesis and the features X, the hypothesis posterior probability setting unit 34 calculates the posterior probabilities from them. The posterior probability is expressed as P(h_i | X, c) and satisfies the following formula (Formula 2).
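Formula 2 is not reproduced in this text. By analogy with the prior probabilities, a consistent form is that the posterior probabilities over all hypotheses for c (including the “does not exist” hypothesis) sum to one. The summation range is left generic because the source does not show it.

```latex
\sum_{i} P(h_i \mid X, c) = 1
```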

The posterior probability can be rewritten as follows using Bayes' theorem (Formula 3).
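Formula 3 is not reproduced in this text. Applying Bayes' theorem to P(h_i | X, c) in the standard way gives the form below, which is presumably what is meant. It matches the later references to the “first factor of the numerator” and the “first factor inside the Σ of the denominator”.

```latex
P(h_i \mid X, c)
  = \frac{P(X \mid h_i, c)\, P(h_i \mid c)}
         {\sum_{j} P(X \mid h_j, c)\, P(h_j \mid c)}
```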

Here, the features x_1, x_2, ..., x_N of the hypotheses are approximated as being mutually independent. In addition, the probability that hypothesis h_i is the correct hypothesis is assumed to be affected by the i-th feature x_i. Therefore, for k ≠ i, the following approximation holds (Formula 4).
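Formula 4 is not reproduced in this text. One reading consistent with the stated assumptions is that, for k ≠ i, the feature x_k carries no information about whether h_i is the correct hypothesis, so conditioning on h_i can be dropped. This is an inferred form, not a quotation of the source.

```latex
P(x_k \mid h_i, c) \approx P(x_k \mid c) \qquad (k \neq i)
```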

Therefore, the first factor of the numerator on the right-hand side of Formula 3 can be approximated as follows (Formula 5).
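Formula 5 is not reproduced in this text. Taking the first factor of the numerator to be the likelihood P(X | h_i, c), and assuming that the features are independent and that P(x_k | h_i, c) ≈ P(x_k | c) for k ≠ i, an inferred form is:

```latex
P(X \mid h_i, c)
  = \prod_{k} P(x_k \mid h_i, c)
  \approx P(x_i \mid h_i, c) \prod_{k \neq i} P(x_k \mid c)
  = \frac{P(x_i \mid h_i, c)}{P(x_i \mid c)} \prod_{k} P(x_k \mid c)
```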

Substituting Formula 5 into the first factor of the numerator of Formula 3 and into the first factor inside the Σ of the denominator, and then cancelling the common factor between numerator and denominator, yields the following formula (Formula 6).
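Formula 6 is not reproduced in this text. If Formula 3 is the usual Bayes form and Formula 5 factorizes the likelihood as P(x_i | h_i, c) multiplied by the product of P(x_k | c) over k ≠ i, then the product of P(x_k | c) over all k is common to the numerator and to every term of the denominator and cancels. Under those assumptions the result would be:

```latex
P(h_i \mid X, c)
  = \frac{\dfrac{P(x_i \mid h_i, c)}{P(x_i \mid c)}\, P(h_i \mid c)}
         {\sum_{j} \dfrac{P(x_j \mid h_j, c)}{P(x_j \mid c)}\, P(h_j \mid c)}
```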

Here, the term of Formula 6 given by Formula 7 or Formula 8 can be calculated using the dictionary data stored in the dictionary 18a. The hypothesis posterior probability setting unit 34 can therefore calculate the posterior probability using Formula 6. The method of calculating the posterior probability is not, however, limited to a specific method such as the one described above.
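The calculation can be pictured with a minimal sketch in which each hypothesis has a prior probability and a dictionary-derived score, and the posterior is their normalized product. Treating the score as a likelihood ratio, the function name `posteriors`, and the numbers used are illustrative assumptions, not values from the embodiment.

```python
def posteriors(priors, likelihood_ratios):
    """Return normalized products of prior and likelihood ratio.

    Assumptions: priors[i] is P(h_i | c) and likelihood_ratios[i] is a
    non-negative score for how well feature x_i matches the dictionary
    entry for c relative to a character-independent baseline.
    """
    weighted = [p * r for p, r in zip(priors, likelihood_ratios)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("all hypotheses have zero weight")
    return [w / total for w in weighted]


# Hypothetical values: the first hypothesis has a modest prior but a
# strong dictionary match; the second is near the gaze but matches poorly.
example_priors = [0.2, 0.5, 0.3]
example_ratios = [9.0, 0.5, 1.0]
example_post = posteriors(example_priors, example_ratios)
print(example_post)
```

In this toy case the first hypothesis ends up with the largest posterior even though its prior was the smallest, which is the effect the combination of gaze information and dictionary comparison is meant to achieve.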

FIG. 9 is a diagram showing the calculated posterior probabilities for the input character “H”. In the example shown in FIG. 9, hypothesis 47, which is the correct hypothesis, is assigned a probability of 90%. Comparing FIG. 8 and FIG. 9, the posterior probability of hypothesis 47 shown in FIG. 9 is higher than its prior probability shown in FIG. 8. By comparing the feature parameters extracted from the hypotheses with the dictionary data stored in the dictionary 18a, and by referring to the line-of-sight position at the time the character was input, the hypothesis posterior probability setting unit 34 assigns the correct hypothesis a posterior probability higher than its prior probability.

In the example shown in FIG. 9, each hypothesis other than hypothesis 47, the correct hypothesis, is assigned a posterior probability lower than its prior probability shown in FIG. 8. That is, for the hypotheses to which prior probabilities have been assigned, the hypothesis posterior probability setting unit 34 gives the correct hypothesis a higher degree of confidence (a higher posterior probability) and gives the other hypotheses a lower degree of confidence (a lower posterior probability).
The hypothesis posterior probability setting unit 34 transmits the posterior probability calculated for each hypothesis of each input character to the update unit 37, which serves as a learning unit.

The update unit 37 acquires the posterior probabilities calculated from the learning data from the hypothesis posterior probability setting unit 34, and acquires the feature parameters of each hypothesis from the feature extraction unit 36. On receiving the posterior probabilities and the feature parameters X, the update unit 37 uses them to update the dictionary data stored in the dictionary 18a for the input character. As an example, assume here that the dictionary 18a stores, as dictionary data, a Gaussian distribution of the feature parameters for each character, and that the parameters of this Gaussian distribution are the mean m and the covariance matrix Σ. In this case, the update unit 37 calculates the mean m and the covariance matrix Σ by the following formulas (Formula 9 and Formula 10).
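Formula 9 and Formula 10 are not reproduced in this text. A weighted mean and a weighted covariance matrix with the posterior probabilities w_i as weights would commonly be written as below. Normalizing by the sum of the weights is an assumption, since the source does not show the normalization it uses.

```latex
m = \frac{\sum_{x_i \in X_c} w_i\, x_i}{\sum_{x_i \in X_c} w_i},
\qquad
\Sigma = \frac{\sum_{x_i \in X_c} w_i\, (x_i - m)(x_i - m)^{\mathsf{T}}}{\sum_{x_i \in X_c} w_i}
```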

Here, X_c is the set of features extracted from all hypotheses corresponding to the input character c, x_i is the feature extracted from hypothesis h_i, and w_i is the posterior probability that hypothesis h_i is the correct hypothesis. That is, Formula 9 computes a weighted mean using w_i as the weight of each feature, and Formula 10 computes a weighted covariance matrix using w_i as the weight of each feature. After calculating the mean m and the covariance matrix Σ, the update unit 37 updates the Gaussian distribution parameters stored in the dictionary 18a as the feature parameters for the input character c to the calculated mean m and covariance matrix Σ.

For each input character for which the posterior probabilities and feature parameters of the hypotheses have been acquired, the update unit 37 calculates the mean m and the covariance matrix Σ using Formula 9 and Formula 10, and updates the Gaussian distribution parameters of the feature parameters for that input character c to the calculated values.
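A weighted mean and covariance of this kind can be sketched in a few lines of plain Python. The function name `weighted_mean_cov`, the normalization by the sum of the weights, and the two-dimensional sample features are assumptions made for illustration; the embodiment mentions, for example, 128-dimensional features.

```python
def weighted_mean_cov(features, weights):
    """Return (mean, covariance) of feature vectors weighted by posterior probability.

    Assumptions: features is a non-empty list of equal-length lists of floats,
    weights[i] is the posterior probability of hypothesis i, and both
    statistics are normalized by the sum of the weights.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(features[0])
    mean = [
        sum(w * x[d] for x, w in zip(features, weights)) / total
        for d in range(dim)
    ]
    cov = [
        [
            sum(w * (x[a] - mean[a]) * (x[b] - mean[b])
                for x, w in zip(features, weights)) / total
            for b in range(dim)
        ]
        for a in range(dim)
    ]
    return mean, cov


# Hypothetical features: the third hypothesis has posterior 0 and is ignored.
sample_features = [[0.0, 0.0], [2.0, 2.0], [10.0, 0.0]]
sample_weights = [0.5, 0.5, 0.0]
mean, cov = weighted_mean_cov(sample_features, sample_weights)
print(mean, cov)
```

Because the weights are posterior probabilities, a hypothesis that is almost certainly wrong contributes almost nothing to the dictionary entry, while a confident correct hypothesis dominates it.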

The update unit 37 may update the dictionary 18a each time it acquires the posterior probabilities and feature parameters of the hypotheses for one input character, or each time it acquires them for all input characters included in one item of learning data. Alternatively, the update unit 37 may update the dictionary 18a, periodically or at an arbitrary timing, once it has acquired the posterior probabilities and feature parameters of the hypotheses for all input characters included in all the learning data stored in the data storage unit 31. In that case, the learning unit 20 repeats the processing described above until the posterior probabilities and feature parameters of the hypotheses have been acquired for all input characters included in all the learning data stored in the data storage unit 31, after which the update unit 37 updates the dictionary 18a. The update unit 37 updates the dictionary 18a using learning data created from information (input character information and line-of-sight position information) input in the past at the VCS 4.

For example, the update unit 37 updates the dictionary data stored in the dictionary 18a by the following update formulas.

Here, m_old and Σ_old are the Gaussian distribution parameters stored in the dictionary 18a as the feature amount for the input character c, and m_new and Σ_new are the Gaussian distribution parameters after the update. w_old is the “weight” of the pre-update parameters and may take any value. For example, when the weight is large, the current parameters are emphasized and the dictionary 18a changes gradually. When the weight is small, the current parameters are given little emphasis and the dictionary 18a changes rapidly.
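The update formulas themselves are not reproduced in this text, so the sketch below does not claim to reproduce them. It only illustrates the described role of w_old using one plausible form, a convex blend of the stored mean and the newly calculated mean, where the function name `blend_mean` and the weight `w_new` given to the new estimate are hypothetical.

```python
def blend_mean(m_old, m_batch, w_old, w_new=1.0):
    """Blend the stored mean with a newly calculated mean.

    Assumed form (not taken from the source):
        m_new = (w_old * m_old + w_new * m_batch) / (w_old + w_new)
    A larger w_old keeps the result closer to the stored parameters.
    """
    if w_old < 0 or w_new <= 0:
        raise ValueError("w_old must be non-negative and w_new positive")
    scale = w_old + w_new
    return [(w_old * a + w_new * b) / scale for a, b in zip(m_old, m_batch)]


# With a large w_old the dictionary moves only slightly toward the new data;
# with a small w_old it moves much further.
slow = blend_mean([0.0], [10.0], w_old=9.0)
fast = blend_mean([0.0], [10.0], w_old=1.0)
print(slow, fast)
```

Under this assumed form, choosing w_old amounts to choosing how quickly older learning data is forgotten in favor of newer learning data.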

When the character feature parameters stored as dictionary data in the dictionary 18a are updated, the accuracy of the posterior probabilities calculated by the hypothesis posterior probability setting unit 34 increases. The learning unit 20 can therefore further improve the accuracy of the dictionary 18a by performing the learning process again on the same learning data using the updated dictionary 18a. The learning process is not limited to a specific method such as the processing method described above.
Although the hypothesis generation unit 32, the hypothesis prior probability setting unit 33, the hypothesis posterior probability setting unit 34, the feature extraction unit 36, and the update unit 37 are assumed to be realized by a processor executing a program, some or all of them may be realized by hardware.

Next, the flow of the learning process performed by the learning unit 20 will be described.
FIG. 10 is a flowchart for explaining the flow of the learning process of the dictionary 18a by the learning unit 20.
First, the learning unit 20 reads learning data from the data storage unit 31 (step 21). The read learning data is transmitted to the hypothesis generation unit 32.
On receiving the learning data, the hypothesis generation unit 32 generates hypotheses for a given input character c from the line-of-sight information (step 22). For example, the hypothesis generation unit 32 extracts regions that appear to be character regions based on the circumscribed rectangles of connected pixel components in the recognition target image. The hypothesis generation unit 32 may also select, from among those regions, the character regions to be used as hypotheses based on the line-of-sight position at the time the character was input. The hypothesis generation unit 32 transmits the generated hypotheses to the hypothesis prior probability setting unit 33 and the feature extraction unit 36.

On receiving the hypotheses generated by the hypothesis generation unit 32, the hypothesis prior probability setting unit 33 calculates, based on the line-of-sight position information, the probability that each hypothesis is the correct hypothesis, that is, the prior probability (step 23). For example, the hypothesis prior probability setting unit 33 sets the prior probability of each hypothesis so that it is larger the closer the hypothesis is to the line-of-sight position at the time the character was input (or immediately before the character was input). The hypothesis prior probability setting unit 33 transmits the prior probability set for each hypothesis to the hypothesis posterior probability setting unit 34.

The feature extraction unit 36, on receiving the hypotheses generated by the hypothesis generation unit 32, extracts the feature parameters of each hypothesis (step 24). The feature parameters here are those to be compared with the feature parameters of each character stored as dictionary data in the dictionary 18a. After extracting the feature parameters of each hypothesis, the feature extraction unit 36 transmits them to the hypothesis posterior probability setting unit 34 and the update unit 37. Steps 23 and 24 may be performed in the reverse order or in parallel.

On acquiring, for each hypothesis for a character, the prior probability from the hypothesis prior probability setting unit 33 and the feature parameters from the feature extraction unit 36, the hypothesis posterior probability setting unit 34 calculates the posterior probability of each hypothesis for the character from the acquired prior probability and feature parameters (step 25). The hypothesis posterior probability setting unit 34 transmits the calculated posterior probability of each hypothesis to the update unit 37.
On acquiring the posterior probability and the feature parameters of each hypothesis for the character, the update unit 37 updates the dictionary data of that character stored in the dictionary 18a (step 26). For example, the update unit 37 updates the dictionary data (feature parameters) of the character stored in the dictionary 18a using the feature parameters of the hypotheses weighted by their posterior probabilities.
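Steps 25 and 26 can be sketched in Python as follows. This is only one possible reading: it assumes that the dictionary entry for a character is a single feature vector (`template`), that similarity is modelled as `exp(-squared distance)`, and that the update is a blend controlled by a hypothetical learning rate `rate`. None of these specifics are stated in the description.

```python
import math


def posteriors(priors, features, template):
    """Posterior of each hypothesis, proportional to prior times similarity
    between the hypothesis features and the dictionary template.
    The exp(-squared distance) similarity is an assumption."""
    likelihoods = [
        math.exp(-sum((f - t) ** 2 for f, t in zip(feat, template)))
        for feat in features
    ]
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0.0:
        # No hypothesis resembles the template: keep the priors unchanged.
        return list(priors)
    return [j / total for j in joint]


def update_template(template, features, post, rate=0.1):
    """Move the dictionary template toward the posterior-weighted mean of the
    hypothesis features. The learning rate is a hypothetical parameter."""
    dim = len(template)
    target = [
        sum(w * feat[i] for w, feat in zip(post, features))
        for i in range(dim)
    ]
    return [(1 - rate) * t + rate * g for t, g in zip(template, target)]
```

With this weighting, a hypothesis that is both close to the line-of-sight position and similar to the current dictionary entry dominates the update, while unlikely hypotheses contribute little.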

Note that the update unit 37 may update the dictionary 18a each time it acquires the posterior probabilities and feature parameters of the hypotheses for one character; it may update the dictionary 18a after acquiring the posterior probabilities and feature parameters of the hypotheses for every character in one unit of learning data; or it may update the dictionary 18a after acquiring the posterior probabilities and feature parameters of the hypotheses for every character included in all the learning data stored in the data storage unit 31.

Through the above flow, the learning unit identifies, with high probability, the region in which a character is written and its feature parameters, based on the character input by the operator and the operator's line-of-sight position during the character input operation. For each input character, the learning unit updates the dictionary data of that character stored in the dictionary based on the probability that a region is the region in which the character is written and on the feature parameters obtained from that region. Thus, the learning unit according to the present embodiment can train the dictionary efficiently from the correct characters in the image to be recognized and the probabilities that those characters are written there, without a person directly specifying the correct characters and those probabilities.

Although several embodiments of the present invention have been described, these embodiments are presented by way of example and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS 1 ... paper sheet processing apparatus, 3 ... sorting machine main body, 4 ... VCS, 10 ... operation panel, 11 ... supply unit, 12 ... main conveyance path, 13 ... BCR, 14 ... scanner, 15 ... BCW, 16 ... sorting machine, 17 ... control unit, 18 ... character recognition unit, 18a ... dictionary, 19 ... address determination unit, 19a ... address DB, 20 ... learning unit, 21 ... image storage/distribution device, 22 ... VCD, 23 ... CPU, 24 ... non-volatile memory, 25 ... RAM, 26 ... ROM, 27 ... display unit, 28 ... input unit, 29 ... line-of-sight detection unit, 31 ... data storage unit, 32 ... hypothesis generation unit, 33 ... hypothesis prior probability setting unit, 34 ... hypothesis posterior probability setting unit, 36 ... feature extraction unit, 37 ... update unit.

Claims

A generation unit that generates, in an image to be recognized, pattern region candidates in which a pattern input by an input person seems to be described;
A feature extraction unit that calculates a feature amount in the pattern area candidate generated by the generation unit;
Based on the line-of-sight position of the input person who input the pattern, a setting unit that sets a probability that the pattern input by the input person is described in the pattern area candidate generated by the generation unit;
An update unit that updates the pattern recognition dictionary based on the probability set by the setting unit and the feature amount calculated by the feature extraction unit;
A device for learning a pattern recognition dictionary.

The setting unit
A prior probability setting unit for setting a prior probability that a pattern input by the input person is described in a pattern region candidate generated by the generation unit based on the line-of-sight position of the input person;
A posterior probability setting unit that sets a posterior probability for the pattern region candidate based on the prior probability and the feature amount in the pattern region candidate extracted by the feature extraction unit;
The update unit updates a pattern recognition dictionary based on the posterior probability and the feature amount for the pattern region candidate.
The learning apparatus for the pattern recognition dictionary according to claim 1.

The generation unit selects a pattern region candidate from a plurality of pattern region candidates based on the line-of-sight position of the input person who has input the pattern.
The learning apparatus for the pattern recognition dictionary according to claim 1 or 2.

The prior probability setting unit sets a prior probability for each character region candidate based on the line of sight of the input person at the time when the input person inputs the pattern,
The learning apparatus for a dictionary for pattern recognition according to any one of claims 1 to 3.

The prior probability setting unit sets a prior probability for a pattern region candidate based on the line-of-sight position of the input person at a time before the input person inputs the pattern,
5. The pattern recognition dictionary learning device according to claim 1.

The posterior probability setting unit sets the posterior probability for the pattern region candidate based on the prior probability and on the similarity between the feature amount in the pattern region candidate and the feature amount of the pattern stored in the pattern recognition dictionary,
The pattern recognition dictionary learning device according to claim 1.

The pattern is a character.
The pattern recognition dictionary learning device according to any one of claims 1 to 6.

A display unit for displaying an image including a pattern to be recognized;
An input unit for an input person to input a pattern included in the image displayed on the display unit;
A line-of-sight detection unit that detects the line-of-sight position of the input person on the display screen of the display unit;
A coding device.

Further comprising a data generation unit that generates learning data for a pattern recognition dictionary including information indicating a pattern input by the input unit and information indicating a line-of-sight position of the input person who has input the pattern.
The coding apparatus according to claim 8.

A storage unit storing dictionary data for pattern recognition;
An image acquisition unit for acquiring an image including a pattern to be recognized;
A pattern recognition unit that recognizes a pattern included in the image acquired by the image acquisition unit using dictionary data stored in the storage unit;
A display unit for displaying an image in which a pattern could not be recognized by the pattern recognition unit;
An input unit for an input person to input a pattern included in the image displayed on the display unit;
A line-of-sight detection unit that detects the line-of-sight position of the input person on the display screen of the display unit;
A generation unit that generates pattern region candidates in the image in which the pattern input by the input unit seems to be described;
A feature extraction unit that calculates a feature amount in the pattern area candidate generated by the generation unit;
A setting unit that sets the probability that the pattern input by the input unit is described in the pattern area candidate generated by the generation unit based on the line-of-sight position detected by the line-of-sight detection unit;
An update unit that updates the dictionary data stored in the storage unit based on the probability set by the setting unit and the feature amount calculated by the feature extraction unit;
A pattern recognition apparatus.

A sorting device for sorting sorting objects based on sorting information,
An image reading unit for reading an image of a description area of the classification information in the classification object;
A storage unit storing dictionary data for recognizing a pattern that can be classification information of a classification target;
A pattern recognition unit that recognizes a pattern included in an image read by the image reading unit using dictionary data stored in the storage unit;
A classification unit that classifies a classification target object based on classification information including a pattern obtained as a recognition result by the pattern recognition unit;
A display unit for displaying a read image of a classification object whose pattern constituting the classification information could not be recognized by the pattern recognition unit;
An input unit for an input person to input a pattern constituting the classification information included in the read image of the classification object displayed on the display unit;
A line-of-sight detection unit that detects the line-of-sight position of the input person on the display screen of the display unit;
A generation unit that generates, in the image of the classification object, pattern region candidates in which the pattern input by the input unit seems to be described;
A feature extraction unit that calculates a feature amount in the pattern area candidate generated by the generation unit;
A setting unit that sets the probability that the pattern input by the input unit is described in the pattern area candidate generated by the generation unit based on the line-of-sight position detected by the line-of-sight detection unit;
An update unit that updates the dictionary data stored in the storage unit based on the probability set by the setting unit and the feature amount calculated by the feature extraction unit;
Sorting device having.

Generating, in an image to be recognized, pattern region candidates in which a pattern input by an input person seems to be described;
Calculating a feature amount in each generated pattern region candidate;
Setting, using the line-of-sight position of the input person, a probability that the pattern input by the input person is described in each generated pattern region candidate;
Updating a pattern recognition dictionary based on the set probability and the calculated feature amount;
A learning method for a pattern recognition dictionary.