JP2010237909A

JP2010237909A - Knowledge correction program, knowledge correcting device and knowledge correction method

Info

Publication number: JP2010237909A
Application number: JP2009084228A
Authority: JP
Inventors: Osamu Sato; 佐藤　　修; Yutaka Katsumata; 裕勝又; Chika Kobayashi; 智香小林
Original assignee: Fujitsu Frontech Ltd
Current assignee: Fujitsu Frontech Ltd
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2010-10-21

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of identifying the correct character string even when a single character is recognized as several separate characters. SOLUTION: A misreading candidate information storage means 1a stores misreading candidate information that defines, in association with each of a plurality of corrected character strings, misreading candidate character strings that include one or more separated character strings, each obtained by separating a character of the corrected character string into a plurality of characters. A recognition result character string generation means 1b generates a recognition result character string 3 as a candidate for the character string included in image information 2. A character string comparison means 1c refers to the misreading candidate information stored in the misreading candidate information storage means 1a, identifies the misreading candidate character string that best matches the recognition result character string 3, and identifies the corrected character string 4 corresponding to that misreading candidate character string. An output means 1d outputs the corrected character string 4 identified by the character string comparison means 1c. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a knowledge correction program, a knowledge correction apparatus, and a knowledge correction method for recognizing a character string included in image information.

Conventionally, character recognition devices have been used to identify character strings included in an image acquired by imaging a form. A character recognition device makes it possible to efficiently acquire the character code data of the character strings entered in the form. For example, at the teller window of a financial institution, information that a customer has entered in a form, such as an account number, name, and address, is read by a character recognition device. The information that has been read can be used as input data for financial transactions.

A character recognition device identifies a character string included in an image as follows, for example.
The character recognition device stores in advance templates of the characters to be read. When it acquires an image in which a character string is to be recognized, it extracts the characters included in the image one character at a time. It then obtains the features of each extracted character (for example, line inclination, shape, curvature, and area) and collates these features with the features of the template characters to identify the extracted character. The device performs this processing in turn for each character in the image to identify the character string entered by the writer.

In character recognition, a misrecognized character means that an operator has to check the result visually, for example, in order to obtain the correct information, which reduces the efficiency of information acquisition. It is therefore desirable to reduce the misrecognition of characters.

To address this, a method called knowledge correction is known (see, for example, Patent Document 1). In this method, the character strings that can be entered in a given item (for example, an address) are registered in advance, and these character strings (hereinafter, registered character strings) are collated with the candidate character strings read from the image (hereinafter, recognition result character strings) to identify the character string entered in the form. As a specific example, a method is known that identifies a character string by collating the recognition result character string with registered character strings generated by combining the name of a financial institution with the names of its branches (see, for example, Patent Document 2).

Patent Document 1: JP 04-188383 A
Patent Document 2: JP 09-097312 A

However, even with the methods described in Patent Documents 1 and 2, the accuracy of identifying the character string decreases as the recognition result character string diverges from what the writer intended. For example, a writer may intend the character string “読取”, yet “言売耳又” may be obtained as the recognition result character string, for instance because the components of each character were written far apart. In this case, even if “読取” is among the registered character strings, the difference in the number of characters (four in the recognition result character string versus two in the registered character string) and the difference in the character features themselves make it difficult to identify the correct character string.

The present invention has been made in view of the above, and its object is to provide a knowledge correction program, a knowledge correction apparatus, and a knowledge correction method that can improve the accuracy of identifying the correct character string even when a character is recognized as several separate characters.

To solve the above problem, a knowledge correction program is provided. A computer that executes this knowledge correction program functions as character string comparison means and output means. The character string comparison means refers to misreading candidate information stored in misreading candidate information storage means. The misreading candidate information defines, in association with each of a plurality of corrected character strings, misreading candidate character strings that include one or more separated character strings, each obtained by separating a character of the corrected character string into a plurality of characters. From among the misreading candidate character strings, the character string comparison means identifies the one that best matches a recognition result character string, which recognition result character string generation means has generated as a candidate for the character string included in image information, and identifies the corrected character string corresponding to that misreading candidate character string. The output means outputs the corrected character string identified by the character string comparison means.

To solve the above problem, a knowledge correction apparatus is also provided that has the same functions as the computer executing the knowledge correction program. In addition, a knowledge correction method is provided that performs the same processing as the computer executing the knowledge correction program.

According to the knowledge correction program, knowledge correction apparatus, and knowledge correction method described above, the accuracy of identifying the correct character string can be improved even when a character is read as several separate characters.

A diagram showing an outline of the present embodiment.
A diagram showing the hardware configuration of the computer.
A diagram showing the functional configuration of the computer.
A diagram illustrating a form image.
A diagram showing an example of the tables stored in the misreading candidate information storage unit.
A diagram showing an example data structure of the prefecture table.
A diagram showing an example data structure of the recognition result character string table.
A diagram showing a first example data structure of the comparison character string table.
A diagram showing an example data structure of the alternative recognition result character string table.
A diagram showing a second example data structure of the comparison character string table.
A diagram showing an example data structure of the adjustment constant table.
A diagram showing an example data structure of the accuracy definition table.
A flowchart showing the procedure of the character recognition processing.
A first flowchart showing the procedure of the comparison character string generation processing.
A second flowchart showing the procedure of the comparison character string generation processing.
A flowchart showing the procedure for calculating the alternative dissimilarity total.
A diagram showing a first specific example of evaluation results.
A diagram showing a second specific example of evaluation results.
A diagram showing a display example of the character recognition result confirmation window.

Hereinafter, the present embodiment will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing an outline of the present embodiment. The computer 1 executes a knowledge correction program and recognizes a character string included in image information 2. The image information 2 is generated, for example, when an imaging device images a form. The computer 1 includes misreading candidate information storage means 1a, recognition result character string generation means 1b, character string comparison means 1c, and output means 1d.

The misreading candidate information storage means 1a stores misreading candidate information that defines, in association with each of a plurality of corrected character strings, misreading candidate character strings that include one or more separated character strings, each obtained by separating a character of the corrected character string into a plurality of characters.

The recognition result character string generation means 1b generates a recognition result character string 3 as a candidate for the character string included in the image information 2.
The character string comparison means 1c refers to the misreading candidate information stored in the misreading candidate information storage means 1a, and compares and evaluates the recognition result character string 3 generated by the recognition result character string generation means 1b against each misreading candidate character string registered in the misreading candidate information. The character string comparison means 1c then identifies the misreading candidate character string that best matches the recognition result character string 3, and identifies the corrected character string 4 corresponding to that misreading candidate character string.

The output means 1d outputs the corrected character string 4 identified by the character string comparison means 1c.
In the computer 1, the character string comparison means 1c refers to the misreading candidate information stored in the misreading candidate information storage means 1a and identifies, from among the misreading candidate character strings, the one that best matches the recognition result character string 3 generated by the recognition result character string generation means 1b. Based on the misreading candidate information, the character string comparison means 1c identifies the corrected character string 4 corresponding to the identified misreading candidate character string. The output means 1d outputs the corrected character string 4 identified by the character string comparison means 1c.

As a result, the accuracy of identifying the character string can be improved even when a character is read as several separate characters. Specifically, the misreading candidate information stored in the misreading candidate information storage means 1a defines misreading candidate character strings such as “神奈ノ１１” and “神奈１リ” in association with, for example, the corrected character string 4 “神奈川” (Kanagawa). In this example, “ノ１１” and “１リ” are separated character strings for “川”. Suppose that the recognition result character string generation means 1b generates, for example, “神奈１１１” as the recognition result character string 3. The character string comparison means 1c then refers to the misreading candidate information stored in the misreading candidate information storage means 1a and identifies, for example, “神奈ノ１１” as the misreading candidate character string that best matches the recognition result character string 3 (“神奈１１１”). Next, the character string comparison means 1c refers to the misreading candidate information and identifies “神奈川”, which is associated with the misreading candidate character string “神奈ノ１１”, as the corrected character string 4. By identifying the corrected character string with reference to the misreading candidate information in this way, the character string comparison means 1c can improve the accuracy of identifying the correct corrected character string even when the recognition result character string contains a separated character string. The misreading candidate character strings registered in advance in the misreading candidate information storage means 1a are obtained by taking the characters of a corrected character string that are likely to be recognized as separated character strings and replacing them with those separated character strings.
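The lookup in the “神奈川” example can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the table entry for “東京”, the function name `correct`, and the use of `difflib.SequenceMatcher` as the matching score are not taken from the source, and this overview does not say how the best match is scored.

```python
from difflib import SequenceMatcher

# Hypothetical misreading candidate information: each misreading candidate
# character string maps to the corrected character string it stands for.
# The first two entries follow the example in the text; the last is assumed.
MISREAD_CANDIDATES = {
    "神奈ノ１１": "神奈川",
    "神奈１リ": "神奈川",
    "東京": "東京",
}


def correct(recognized: str) -> str:
    """Return the corrected string of the best-matching misreading candidate.

    SequenceMatcher.ratio() is a stand-in similarity score (assumption),
    not the evaluation used in the embodiment.
    """
    best = max(
        MISREAD_CANDIDATES,
        key=lambda cand: SequenceMatcher(None, recognized, cand).ratio(),
    )
    return MISREAD_CANDIDATES[best]
```

With this table, `correct("神奈１１１")` selects the candidate “神奈ノ１１” and therefore returns “神奈川”, mirroring the example above.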

The computer 1 is useful, for example, in teller window operations at a financial institution, where characters entered by a customer in a form are recognized and used as input data for financial transactions. The following description takes this use of the computer 1 as an example and explains it in more detail.

FIG. 2 is a diagram showing the hardware configuration of the computer. The computer 100 is installed at, for example, the teller window of a financial institution and is used by an operator to enter information from forms. The computer 100 includes a CPU (Central Processing Unit) 101, a RAM (Random Access Memory) 102, an HDD 103, a graphic processing device 104, input interfaces 105 and 106, and a communication interface 107.

The CPU 101 controls the operation of the entire computer 100.
The RAM 102 temporarily stores at least part of the OS (Operating System) program and the application software (hereinafter, application) programs to be executed by the CPU 101. The RAM 102 also stores various data necessary for processing by the CPU 101.

The HDD 103 stores the OS program and the application programs. The HDD 103 also stores various data necessary for processing by the CPU 101. Another nonvolatile storage device, such as an SSD (Solid State Drive), can be used instead of the HDD 103.

The graphic processing device 104 is connected to the monitor 11. The graphic processing device 104 displays images on the screen of the monitor 11 in accordance with commands from the CPU 101.
The input interface 105 is connected to the keyboard 12 and the mouse 13. The input interface 105 transmits signals sent from the keyboard 12 and the mouse 13 to the CPU 101.

The input interface 106 is connected to the imaging device 14. The input interface 106 acquires the form image that the imaging device 14 generates by imaging a form. Under the control of the CPU 101, the input interface 106 stores the acquired form image in the RAM 102 or the HDD 103.

The communication interface 107 is connected to the network 20 and exchanges data with other information processing apparatuses.
FIG. 3 is a diagram showing the functional configuration of the computer. The computer 100 includes an image acquisition unit 110, a recognition result character string generation unit 120, a misreading candidate information storage unit 130, a control information storage unit 140, a comparison character string generation unit 150, an alternative dissimilarity calculation unit 160, a corrected character string specification unit 170, and an output unit 180.

The image acquisition unit 110 acquires the form image that the imaging device 14 generates by imaging a form. The image acquisition unit 110 outputs the acquired form image to the recognition result character string generation unit 120.
The recognition result character string generation unit 120 reads the characters included in the form image acquired from the image acquisition unit 110 item by item (prefecture name, full name, and so on), and generates a plurality of recognition result character strings based on the characters it has read. The recognition result character string generation unit 120 generates the recognition result character strings by, for example, the following procedure.

(1) Within the character string of a given item in the form image, the recognition result character string generation unit 120 reads a region considered to contain one character and obtains the features of that character. Here, the features are, for example, evaluations of the positions of the lines in the image and their degree of inclination.

(2) The recognition result character string generation unit 120 refers to templates of the characters to be read (hereinafter, template characters), which are stored in advance in a read character template storage unit (not shown), and compares the features of the character extracted from the form image with the features of each template character. From this comparison, the recognition result character string generation unit 120 calculates a dissimilarity, which indicates the degree of discrepancy between the features extracted from the form image and the features of each template character. The dissimilarity can be calculated using, for example, a Bayesian method. A smaller dissimilarity value indicates a smaller discrepancy. That is, the smaller the dissimilarity of a template character, the higher its priority as a reading candidate.

(3) For the characters extracted from the form image, the recognition result character string generation unit 120 generates recognition result character strings by combining, in order, the template characters with the smallest dissimilarities. For example, the string that combines the template character with the smallest dissimilarity for each character is the first candidate recognition result character string, which has the highest priority.

In this way, the recognition result character string generation unit 120 generates a plurality of recognition result character strings from the features of each character read from the form image. The recognition result character string generation unit 120 stores each generated recognition result character string in the control information storage unit 140.
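Steps (1) to (3) can be illustrated with a small Python sketch. The dissimilarity values, the function name, and the choice to rank combinations by the sum of the per-character dissimilarities are assumptions made for illustration; the text only states that the template characters with small dissimilarities are combined in order.

```python
from itertools import product


def recognition_result_strings(per_char, limit=3):
    """Combine per-character candidates into ranked candidate strings.

    per_char: one list per character position, each holding
              (template_char, dissimilarity) pairs.
    Returns up to `limit` (string, total_dissimilarity) pairs, lowest total
    first. Summing the dissimilarities is an assumed ranking rule.
    """
    scored = [
        ("".join(ch for ch, _ in combo), sum(d for _, d in combo))
        for combo in product(*per_char)
    ]
    scored.sort(key=lambda item: item[1])
    return scored[:limit]


# Hypothetical dissimilarities for three character regions of a form image.
per_char = [
    [("神", 10), ("袖", 40)],
    [("奈", 5)],
    [("ノ", 20), ("川", 25)],
]
candidates = recognition_result_strings(per_char)
```

Here the first candidate combines the lowest-dissimilarity template character at every position, which corresponds to the highest-priority recognition result character string in step (3).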

The misreading candidate information storage unit 130 stores misreading candidate information that defines a plurality of misreading candidate character strings, taking into account the separated character strings that may appear in a recognition result character string. Here, a separated character string is a string produced when a character that should be recognized as one character is instead recognized as two or more characters, for example because the components of the character entered in the form are far apart. For the character “神”, for example, the separated character string “ネ申” is conceivable. In the misreading candidate information, each misreading candidate character string is also registered in association with a corrected character string that can actually be entered in the form item in which the recognition result character string was written. For the item "prefecture name", for example, the corrected character strings that can actually be entered are the names of prefectures that actually exist.
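One way to picture how such misreading candidate information could be populated is sketched below in Python. The table `SPLIT_FORMS`, the function name, and the decision to include the unsplit string itself among the candidates are all assumptions for illustration; the source does not describe how the stored table is built.

```python
from itertools import product

# Hypothetical table of characters that tend to be read as several characters,
# using the separated character strings mentioned in the text.
SPLIT_FORMS = {
    "神": ["ネ申"],
    "川": ["ノ１１", "１リ"],
}


def misreading_candidates(corrected):
    """Map every candidate derived from `corrected` back to `corrected`.

    Each character is either kept as is or replaced by one of its
    separated forms; all combinations are enumerated.
    """
    options = [[ch] + SPLIT_FORMS.get(ch, []) for ch in corrected]
    return {"".join(parts): corrected for parts in product(*options)}


table = misreading_candidates("神奈川")
```

For “神奈川” this yields six entries (two options for “神” times three for “川”), including “神奈ノ１１” and “ネ申奈１リ”, all mapped back to “神奈川”.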

The control information storage unit 140 stores the recognition result character strings generated by the recognition result character string generation unit 120. It also stores the comparison character strings that the comparison character string generation unit 150 generates for comparison with the recognition result character strings. In addition, it stores the control information used when the misreading candidate character strings extracted by the comparison character string generation unit 150 are compared with the corresponding recognition result character string. The control information includes, for example, the various conditions used to evaluate the comparison results numerically.

When the recognition result character string generation unit 120 has finished generating the recognition result character strings for each item, the comparison character string generation unit 150 acquires one of them from the control information storage unit 140 (for example, “神奈ノ１１” in the prefecture item). The comparison character string generation unit 150 refers to the misreading candidate information storage unit 130 and acquires the misreading candidate information corresponding to the acquired recognition result character string. From that misreading candidate information, it then extracts the misreading candidate character strings whose number of characters equals that of the recognition result character string or lies within ±n of it (for example, n = 1). Finally, it generates, from each extracted misreading candidate character string, the comparison character strings to be compared with the recognition result character string.
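The length-based extraction can be sketched as follows. The function name and the sample candidate list are hypothetical, and plain string length stands in for the character count used by the embodiment.

```python
def select_by_length(recognized, candidates, n=1):
    """Keep candidates whose length is within ±n of the recognized string."""
    return [c for c in candidates if abs(len(c) - len(recognized)) <= n]


# Hypothetical misreading candidates of various lengths (2 to 6 characters).
pool = ["東京", "神奈川", "神奈１リ", "神奈ノ１１", "ネ申奈１１１"]
selected = select_by_length("神奈ノ１１", pool, n=1)
```

With n = 1 and a five-character recognition result, only the candidates with four, five, or six characters remain.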

For each misreading candidate character string, the comparison character string generation unit 150 generates comparison character strings according to which of the following cases applies.
(First case) The number of valid characters in the recognition result character string equals the number of valid characters in the misreading candidate character string. In this case, the comparison character string generation unit 150 uses the misreading candidate character string as the comparison character string.

Here, the number of valid characters is the number of characters in a character string excluding the character or characters that indicate the item (such as “都” or “県”). If a recognition result character string or other string contains no such character, its total number of characters is its number of valid characters. The number of valid characters corresponds to the “number of characters” in the claims.
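A minimal sketch of counting valid characters is shown below. The tuple `ITEM_MARKERS` contains only the two markers named in the text, and treating a marker as relevant only when it is the final character is an assumption; the source does not say how item-indicating characters are located.

```python
# Item-indicating characters named in the text; a real table would likely
# hold more entries (assumption).
ITEM_MARKERS = ("都", "県")


def valid_length(text):
    """Number of characters, not counting a trailing item marker."""
    for marker in ITEM_MARKERS:
        if text.endswith(marker):
            return len(text) - len(marker)
    return len(text)
```

Under these assumptions, “神奈川県” has three valid characters, while “神奈ノ１１”, which contains no marker, has five.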

(Second case) The number of valid characters in the recognition result character string is smaller than the number of valid characters in the misreading candidate character string. In this case, the comparison character string generation unit 150 generates comparison character strings by deleting from the misreading candidate character string as many characters as the difference in length between the two strings. For example, if the misreading candidate character string “ネ申奈１１１” (six valid characters) has been extracted for the recognition result character string “神奈ノ１１” (five valid characters), the following strings are generated by removing 6 - 5 = 1 character from “ネ申奈１１１”: “申奈１１１”, with the first character removed; “ネ申奈１１”, with the last character removed; and “ネ奈１１１”, “ネ申１１１”, and “ネ申奈１１”, each with a middle character removed. In this way, when the misreading candidate character string has more valid characters than the recognition result character string, the number of valid characters in each comparison character string is made equal to that of the recognition result character string. This makes it easy to compare the characters at corresponding positions in the two strings (for example, the second character of each).

（第３の場合）比較文字列生成部１５０は、認識結果文字列の有効文字数が該当の誤読候補文字列の有効文字数よりも大きい場合である。この場合、該当の誤読候補文字列と認識結果文字列との文字差の分だけ、認識結果文字列から文字を削除した文字列を代替認識結果文字列として生成する。そして、比較文字列生成部１５０は、生成した各代替認識結果文字列を制御情報記憶部１４０に格納する。例えば、認識結果文字列“神奈ノ１１”（有効文字数が５）に対して、誤読候補文字列“神奈１１”（有効文字数が４）が抽出されている場合、“神奈ノ１１”から（５−４＝）１文字を除いた次の文字列が生成される。すなわち、先頭の文字を外した文字列である“奈ノ１１”、最後尾の文字を外した文字列である“神奈ノ１”および中間の文字を外した文字列である“神ノ１１”、“神奈１１”が代替認識結果文字列として生成される。この場合、比較文字列生成部１５０は、該当の誤読候補文字列を比較文字列とする。このようにして、認識結果文字列の有効文字数が誤読候補文字列の有効文字数よりも大きい場合には、認識結果文字列の有効文字数と比較文字列の有効文字数とが一致するようにする。これにより、上記（第２の場合）と同様に、両文字列の対応する位置にある文字同士を容易に比較できるようになる。なお、代替認識結果文字列は、該当の文字差に関して一度だけ生成されればよい。例えば、“神奈ノ１１”に対して有効文字数差１の分の代替認識結果文字列が１度生成されれば、その情報を保持することで、次に誤読候補文字列として有効文字数が４の文字列を抽出した際に、代替認識文字列の生成を行う必要はなくなる。 (Third case) The number of valid characters in the recognition result character string is larger than the number of valid characters in the misreading candidate character string in question. In this case, the comparison character string generation unit 150 generates, as alternative recognition result character strings, the character strings obtained by deleting from the recognition result character string as many characters as the difference in the number of characters between it and the misreading candidate character string. The comparison character string generation unit 150 then stores each generated alternative recognition result character string in the control information storage unit 140. For example, when the misreading candidate character string “神奈１１” (4 valid characters) has been extracted for the recognition result character string “神奈ノ１１” (5 valid characters), the character strings obtained by removing (5 - 4 =) 1 character from “神奈ノ１１” are generated. That is, “奈ノ１１” (first character removed), “神奈ノ１” (last character removed), and “神ノ１１” and “神奈１１” (a middle character removed) are generated as alternative recognition result character strings. In this case, the comparison character string generation unit 150 uses the misreading candidate character string itself as the comparison character string. In this way, when the recognition result character string has more valid characters than the misreading candidate character string, the number of valid characters in the recognition result character string is made equal to that of the comparison character string. As in the (second case) above, this makes it easy to compare the characters at corresponding positions in the two character strings. The alternative recognition result character strings need to be generated only once for a given character difference. For example, once the alternative recognition result character strings for a difference of 1 valid character have been generated for “神奈ノ１１” and that information is retained, they do not have to be generated again the next time a character string with 4 valid characters is extracted as a misreading candidate character string.

（第４の場合）比較文字列生成部１５０は、認識結果文字列の有効文字数が該当の誤読候補文字列に対応する補正文字列の有効文字数よりも小さい場合である。この場合、該当の補正文字列と認識結果文字列との文字差の分だけ、補正文字列から文字を除外した文字列を比較文字列として生成する。例えば、認識結果文字列の有効文字部分が“神剛”（有効文字数が２）に対して、誤読候補文字列に対応する補正文字列が“神奈川”（有効文字数が３）である場合、比較文字列として、“神奈川”から（３−２＝）１文字を除いた次の文字列が生成される。すなわち、先頭の文字を外した文字列である“奈川”、最後尾の文字を外した文字列である“神奈”および中間の文字を外した文字列である“神川”が比較文字列として生成される。統合文字の例としては、“神奈川”という認識対象文字に対して、先に例示した認識結果文字列“神剛”に含まれる“剛”（“奈川”が統合された統合文字）が考えられる。このようにして、補正文字列の有効文字数の方が、認識結果文字列の有効文字数よりも大きい場合に、比較文字列の有効文字数と認識結果文字列の有効文字数とが一致するものを生成する。このため、統合文字が含まれている場合にも、認識結果文字列と比較文字列との適合確率が向上する。これにより、認識結果文字列が、２文字が１文字に統合された統合文字を含む場合にも、この統合文字に対する補正を行うことができる。 (Fourth case) The number of valid characters in the recognition result character string is smaller than the number of valid characters in the corrected character string corresponding to the misreading candidate character string in question. In this case, the comparison character string generation unit 150 generates, as comparison character strings, the character strings obtained by excluding from the corrected character string as many characters as the difference in the number of characters between it and the recognition result character string. For example, when the valid-character part of the recognition result character string is “神剛” (2 valid characters) and the corrected character string corresponding to the misreading candidate character string is “神奈川” (3 valid characters), the character strings obtained by removing (3 - 2 =) 1 character from “神奈川” are generated as comparison character strings. That is, “奈川” (first character removed), “神奈” (last character removed), and “神川” (middle character removed) are generated as comparison character strings. An example of an integrated character is “剛” in the recognition result character string “神剛” given above, for the recognition target “神奈川”: “剛” is an integrated character into which “奈川” has been merged. In this way, when the corrected character string has more valid characters than the recognition result character string, comparison character strings whose number of valid characters matches that of the recognition result character string are generated. This raises the probability that the recognition result character string matches a comparison character string even when an integrated character is present. As a result, correction can be performed even when the recognition result character string contains an integrated character in which two characters have been merged into one.
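The deletion step shared by the second, third, and fourth cases can be sketched as follows. This is a hedged illustration under stated assumptions: the function name `deletion_variants` is hypothetical, and removing duplicate results (for example, deleting any one of several identical “１” characters) is an assumption about how repeated strings are handled.

```python
from itertools import combinations


def deletion_variants(text, n_delete):
    """Return the distinct strings obtained by deleting n_delete characters.

    Characters are removed at every combination of positions; duplicates
    (assumption) are kept only once, in order of first appearance.
    """
    variants = []
    for drop in combinations(range(len(text)), n_delete):
        variant = "".join(ch for i, ch in enumerate(text) if i not in drop)
        if variant not in variants:
            variants.append(variant)
    return variants


# Second case: shorten the misreading candidate by 6 - 5 = 1 character.
second_case = deletion_variants("ネ申奈１１１", 1)
# Third case: shorten the recognition result by 5 - 4 = 1 character.
third_case = deletion_variants("神奈ノ１１", 1)
# Fourth case: shorten the corrected string by 3 - 2 = 1 character.
fourth_case = deletion_variants("神奈川", 1)
```

Each call yields strings whose length equals that of the string they will be compared with, which is what allows the position-by-position comparison described above.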

比較文字列生成部１５０は、生成した各比較文字列と、各比較文字列の比較文字数（有効文字数から削った分の文字数を減算したもの）と、該当の比較文字列に対応する補正文字列と、を制御情報記憶部１４０に格納する。 The comparison character string generation unit 150 generates each comparison character string, the number of comparison characters of each comparison character string (subtracting the number of characters deleted from the number of valid characters), and a correction character string corresponding to the corresponding comparison character string Are stored in the control information storage unit 140.

代替相違度算出部１６０は、比較文字列生成部１５０による比較文字列の生成が完了すると、制御情報記憶部１４０に記憶された各比較文字列と対応する比較文字数とを取得する。そして、代替相違度算出部１６０は、認識結果文字列（または代替認識結果文字列）と各比較文字列とを比較する。代替相違度算出部１６０は、比較の結果、認識結果文字列生成部１２０が認識結果文字列に含まれる各文字について算出した相違度と、制御情報記憶部１４０に記憶された制御情報と、に基づいて、認識結果文字列（または代替認識結果文字列）の各文字に対する各比較文字列の各文字の代替相違度を決定する。ここで、代替相違度とは、認識結果文字列生成部１２０が算出した相違度と制御情報記憶部１４０に記憶された制御情報とに基づいて算出されるものであり、認識結果文字列の各文字と各比較文字列の各文字との食い違いの度合いを示す値である。すなわち、代替相違度は、両文字列の食い違いの度合いを“文字”単位で示す値である。相違度は、値が小さいほど、食い違いの度合いも小さいものとして求められる。 When the comparison character string generation unit 150 has finished generating the comparison character strings, the alternative dissimilarity calculation unit 160 acquires each comparison character string stored in the control information storage unit 140 together with its number of comparison characters. The alternative dissimilarity calculation unit 160 then compares the recognition result character string (or an alternative recognition result character string) with each comparison character string. Based on the result of this comparison, on the dissimilarity that the recognition result character string generation unit 120 calculated for each character in the recognition result character string, and on the control information stored in the control information storage unit 140, it determines the alternative dissimilarity of each character of each comparison character string with respect to each character of the recognition result character string (or alternative recognition result character string). Here, the alternative dissimilarity is calculated from the dissimilarity calculated by the recognition result character string generation unit 120 and the control information stored in the control information storage unit 140, and indicates the degree of discrepancy between each character of the recognition result character string and each character of each comparison character string. That is, the alternative dissimilarity expresses the discrepancy between the two character strings in units of characters. The smaller the dissimilarity value, the smaller the degree of discrepancy.

また、代替相違度算出部１６０は、算出した各文字の代替相違度に基づいて、認識結果文字列（または代替認識結果文字列）に対する各比較文字列の食い違いの度合いを示す値（評価値）を算出する。すなわち、評価値は食い違いの度合いを“文字列”単位で示す値である。評価値は、値が小さいほど、食い違いの度合いも小さいものとして求められる。具体的には、まず、代替相違度算出部１６０は、比較文字列に含まれる各文字の代替相違度の和を求める。そして、代替相違度算出部１６０は、代替相違度の和を該当の比較文字列に対応する比較文字数で割った商を、その比較文字列の評価値とする。代替相違度算出部１６０は、算出した各比較文字列の評価値を補正文字列特定部１７０に出力する。 The alternative dissimilarity calculation unit 160 also calculates, from the alternative dissimilarity of each character, a value (evaluation value) indicating the degree of discrepancy of each comparison character string with respect to the recognition result character string (or alternative recognition result character string). That is, the evaluation value expresses the discrepancy in units of character strings. The smaller the evaluation value, the smaller the degree of discrepancy. Specifically, the alternative dissimilarity calculation unit 160 first obtains the sum of the alternative dissimilarities of the characters in the comparison character string. It then divides that sum by the number of comparison characters of the comparison character string and uses the quotient as the evaluation value of that comparison character string. The alternative dissimilarity calculation unit 160 outputs the calculated evaluation value of each comparison character string to the corrected character string specifying unit 170.

補正文字列特定部１７０は、代替相違度算出部１６０から取得した評価値に基づいて、最も優先度の高い（すなわち、最も評価値の小さい）第１候補の誤読候補文字列を特定する。補正文字列特定部１７０は、誤読候補情報記憶部１３０に記憶された誤読候補情報を参照して、特定した第１候補の誤読候補文字列に対応する補正文字列を取得する。補正文字列特定部１７０は、取得した補正文字列を出力部１８０に出力する。 Based on the evaluation value acquired from the alternative dissimilarity calculation unit 160, the corrected character string specifying unit 170 specifies the misreading candidate character string of the first candidate having the highest priority (that is, the lowest evaluation value). The corrected character string specifying unit 170 refers to the misreading candidate information stored in the misreading candidate information storage unit 130 and acquires a corrected character string corresponding to the specified misreading candidate character string of the first candidate. The corrected character string specifying unit 170 outputs the acquired corrected character string to the output unit 180.
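The evaluation and selection steps described above (sum of per-character alternative dissimilarities divided by the number of comparison characters, then the smallest evaluation value wins) might be sketched like this. The function names and the numeric dissimilarities in the example are illustrative assumptions, not values from the patent.

```python
def evaluation_value(char_dissimilarities, comparison_count):
    """Sum of per-character alternative dissimilarities / comparison count."""
    return sum(char_dissimilarities) / comparison_count


def select_corrected(scored):
    """Pick the corrected string whose comparison string scores lowest.

    `scored` is a list of (comparison_string, corrected_string, value).
    """
    best = min(scored, key=lambda row: row[2])
    return best[1]


# Illustrative numbers only (assumption): two comparison strings scored
# against the same recognition result.
scored = [
    ("申奈１１１", "神奈川県", evaluation_value([1000, 16, 1000, 90, 90], 5)),
    ("香１１１", "香川県", evaluation_value([1000, 1000, 1000, 90], 4)),
]
best = select_corrected(scored)
```

Dividing by the number of comparison characters keeps longer comparison strings from being penalized merely for containing more characters.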

出力部１８０は、補正文字列特定部１７０から取得した補正文字列を示す情報をモニタ１１に表示させる。
なお、比較文字列生成部１５０、代替相違度算出部１６０、補正文字列特定部１７０の処理は、各項目単位で順次実行される。例えば、まず、都道府県名を示す項目について上記各部により補正文字列が特定される。そして、次に、市区町村名を示す項目について上記各部により補正文字列が特定される。更に、次に、地区・番地を示す項目について上記各部により補正文字列が特定される。 The output unit 180 causes the monitor 11 to display information indicating the corrected character string acquired from the corrected character string specifying unit 170.
Note that the processing of the comparison character string generation unit 150, the alternative dissimilarity calculation unit 160, and the corrected character string specification unit 170 is sequentially executed for each item. For example, first, the correction character string is specified by the above-described units for the item indicating the prefecture name. Next, a correction character string is specified by the above-described units for the item indicating the city name. Further, next, the correction character string is specified by each of the above sections for the item indicating the district / address.

このようにして、コンピュータ１００は、全ての項目について補正文字列を特定し、その結果をオペレータに通知する。
図４は、帳票画像を例示する図である。帳票画像２００には、領域２１１，２１２，２２１，２２２，２３１，２３２，・・・が設けられている。帳票画像２００は、金融機関に設置された帳票に顧客が文字列を記入し、その帳票を撮像装置１４が撮像することで生成される。そして、画像取得部１１０は、撮像装置１４から帳票画像２００を取得する。画像取得部１１０は、取得した帳票画像２００を認識結果文字列生成部１２０に出力する。 In this way, the computer 100 identifies the correction character string for all items and notifies the operator of the result.
FIG. 4 is a diagram illustrating a form image. The form image 200 is provided with areas 211, 212, 221, 222, 231, 232, and so on. The form image 200 is generated when a customer writes character strings on a form provided at a financial institution and the imaging device 14 captures an image of that form. The image acquisition unit 110 then acquires the form image 200 from the imaging device 14 and outputs it to the recognition result character string generation unit 120.

領域２１１は、都道府県名を示す文字列が記入された領域である。
領域２１２は、市区町村名を示す文字列が記入された領域である。
領域２２１は、姓を示す文字列が記入された領域である。 The area 211 is an area in which a character string indicating a prefecture name is entered.
The area 212 is an area in which a character string indicating a city name is entered.
The area 221 is an area where a character string indicating a surname is entered.

領域２２２は、名を示す文字列が記入された領域である。
領域２３１は、銀行名を示す文字列が記入された領域である。
領域２３２は、支店名を示す文字列が記入された領域である。 The area 222 is an area in which a character string indicating a name is entered.
The area 231 is an area in which a character string indicating a bank name is entered.
The area 232 is an area in which a character string indicating a branch name is entered.

認識結果文字列生成部１２０は、生成した各認識結果文字列につき領域２１１，２１２，２２１，２２２，２３１，２３２，・・・の帳票上における読み取り位置によって、各認識結果文字列が何れの項目に該当するものであるかを特定することができる。認識結果文字列生成部１２０は、領域２１１に記入された文字列を都道府県の項目に該当するものと特定する。また、認識結果文字列生成部１２０は、領域２１２に記入された文字列を市区町村の項目に該当するものと特定する。以下、同様にして、認識結果文字列生成部１２０は、姓名や銀行名、支店名などの項目を特定する。 For each recognition result character string it generates, the recognition result character string generation unit 120 can identify which item the character string corresponds to from the position on the form at which it was read, that is, from the areas 211, 212, 221, 222, 231, 232, and so on. The recognition result character string generation unit 120 identifies the character string entered in the area 211 as corresponding to the prefecture item, and the character string entered in the area 212 as corresponding to the municipality item. In the same way, it identifies items such as the surname and given name, the bank name, and the branch name.

図５は、誤読候補情報記憶部が記憶するテーブルの例を示す図である。誤読候補情報記憶部１３０には、都道府県テーブル１３１、市区町村テーブル１３２、地区テーブル１３３、姓テーブル１３４、・・・が予め格納される。都道府県テーブル１３１、市区町村テーブル１３２、地区テーブル１３３、姓テーブル１３４、・・・は、誤読候補情報に対応するものである。 FIG. 5 is a diagram illustrating an example of the tables stored in the misreading candidate information storage unit. The misreading candidate information storage unit 130 stores in advance a prefecture table 131, a municipality table 132, a district table 133, a surname table 134, and so on. These tables correspond to the misreading candidate information.

都道府県テーブル１３１は、都道府県名として記入されうる補正文字列と、その都道府県名に対する分離文字列を含む誤読候補文字列と、を対応付けて定義したものである。
市区町村テーブル１３２は、市区町村名として記入されうる補正文字列と、その市区町村名に対する分離文字列を含む誤読候補文字列と、を対応付けて定義したものである。 The prefecture table 131 is defined by associating a corrected character string that can be entered as a prefecture name with a misreading candidate character string that includes a separated character string for the prefecture name.
The municipality table 132 is defined by associating a corrected character string that can be entered as a municipality name with a misreading candidate character string that includes a separation character string for the municipality name.

地区テーブル１３３は、地区名として記入されうる補正文字列と、その地区名に対する分離文字列を含む誤読候補文字列と、を対応付けて定義したものである。
姓テーブル１３４は、姓として記入されうる補正文字列と、その姓に対する分離文字列を含む誤読候補文字列と、を対応付けて定義したものである。 The district table 133 defines a correction character string that can be entered as a district name and a misreading candidate character string that includes a separation character string for the district name in association with each other.
The surname table 134 is defined by associating a corrected character string that can be entered as a surname with a misreading candidate character string that includes a separated character string for the surname.

都道府県テーブル１３１、市区町村テーブル１３２、地区テーブル１３３、姓テーブル１３４、・・・には、分離文字列を含む誤読候補文字列が予め登録される。
図６は、都道府県テーブルのデータ構造例を示す図である。都道府県テーブル１３１には、誤読候補文字列を示す項目、補正文字列を示す項目および有効文字数を示す項目が設けられている。各項目の横方向に並べられた情報同士が互いに関連付けられて、１つ誤読候補文字列に関する情報を示す。なお、市区町村テーブル１３２、地区テーブル１３３、姓テーブル１３４、・・・に関しても同様のデータ構成となる。 In the prefecture table 131, the municipality table 132, the district table 133, the surname table 134,..., Misreading candidate character strings including separated character strings are registered in advance.
FIG. 6 is a diagram illustrating an example of the data structure of the prefecture table. The prefecture table 131 has an item indicating a misreading candidate character string, an item indicating a corrected character string, and an item indicating the number of valid characters. The pieces of information arranged horizontally across the items are associated with each other and represent the information about one misreading candidate character string. The municipality table 132, the district table 133, the surname table 134, and so on have the same data structure.

誤読候補文字列を示す項目には、該当の補正文字列に対する誤読候補文字列が設定される。補正文字列を示す項目には、現実に存在する都道府県名を示す文字列が設定される。有効文字数を示す項目には、誤読候補文字列のうち、区分を示す文字または文字列（“都”や“県”など）を除いた部分の文字数を示す値が設定される。 In the item indicating the misreading candidate character string, the misreading candidate character string for the corresponding correction character string is set. In the item indicating the correction character string, a character string indicating the name of a prefecture that actually exists is set. In the item indicating the number of valid characters, a value indicating the number of characters in a portion of the misreading candidate character string excluding the character or character string (such as “city” or “prefecture”) indicating the classification is set.

都道府県テーブル１３１には、例えば、誤読候補文字列が“ネ申奈１１１県”、補正文字列が“神奈川県”、有効文字数が“６”という情報が設定される。これは、補正文字列“神奈川県”に分離文字列を含めた誤読候補として“ネ申奈１１１県”が認識結果文字列に含まれうることを示しており、“ネ申奈１１１県”の“県”を除いた部分の文字数が“６”であることを示している。 In the prefecture table 131, for example, the misreading candidate character string “ネ申奈１１１県”, the corrected character string “神奈川県”, and the number of valid characters “6” are set. This indicates that “ネ申奈１１１県” may appear in a recognition result character string as a misreading candidate, containing separated character strings, for the corrected character string “神奈川県”, and that the part of “ネ申奈１１１県” excluding “県” has 6 characters.
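One possible in-memory shape for a row of the prefecture table is sketched below. The class name `MisreadEntry`, the helper `candidates_near`, and the `max_diff=1` extraction window are hypothetical; only the single example row comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MisreadEntry:
    misread: str        # misreading candidate character string
    corrected: str      # corrected character string
    effective_len: int  # valid characters, item marker excluded


# The one row given as an example in the description.
PREFECTURE_TABLE = [MisreadEntry("ネ申奈１１１県", "神奈川県", 6)]


def candidates_near(table, effective_len, max_diff=1):
    """Return rows whose valid-character count is within max_diff.

    The window size is an assumption made for this sketch.
    """
    return [e for e in table if abs(e.effective_len - effective_len) <= max_diff]


# A recognition result with 5 valid characters, such as "神奈ノ１１".
hits = candidates_near(PREFECTURE_TABLE, 5)
```

Storing the valid-character count alongside each row avoids recomputing it every time a recognition result is compared against the table.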

図７は、認識結果文字列テーブルのデータ構造例を示す図である。認識結果文字列テーブル１４１ａ，１４１ｂ，１４１ｃ，・・・は、認識結果文字列生成部１２０により、帳票画像２００に含まれる項目ごとに生成されて制御情報記憶部１４０に格納される。なお、以下では、認識結果文字列テーブル１４１ａに関してのみ説明するが、認識結果文字列テーブル１４１ｂ，１４１ｃ，・・・に関しても同様の構成である。 FIG. 7 is a diagram illustrating an example of the data structure of the recognition result character string table. The recognition result character string tables 141a, 141b, 141c, and so on are generated by the recognition result character string generation unit 120 for each item included in the form image 200 and are stored in the control information storage unit 140. Only the recognition result character string table 141a is described below; the recognition result character string tables 141b, 141c, and so on have the same configuration.

認識結果文字列テーブル１４１ａは、都道府県を示す項目に対応付けられている。認識結果文字列テーブル１４１ａには、優先順位を示す項目および認識結果文字列を示す項目が設けられている。各項目の横方向に並べられた情報同士が互いに関連付けられて、１つの認識結果文字列に関する情報を示す。 The recognition result character string table 141a is associated with an item indicating a prefecture. The recognition result character string table 141a is provided with items indicating priority and items indicating recognition result character strings. Information arranged in the horizontal direction of each item is associated with each other to indicate information related to one recognition result character string.

優先順位を示す項目には、該当の認識結果文字列の優先度を示す情報が設定される。認識結果文字列を示す項目には、帳票画像２００から読み取られた認識結果文字列が設定される。 In the item indicating the priority order, information indicating the priority of the corresponding recognition result character string is set. In the item indicating the recognition result character string, a recognition result character string read from the form image 200 is set.

認識結果文字列テーブル１４１ａには、例えば、優先順位が“第１候補”、認識結果文字列が“神奈ノ１１”という情報が設定される。また、認識結果文字列に含まれる各文字のテンプレート文字との相違を示す相違度が設定される。例えば、“神”という文字に対して、相違度“１３５”が設定される。また、“奈”という文字に対して、相違度“１５７”が設定される。このように、認識結果文字列に含まれる全ての文字に対して相違度が設定される。 In the recognition result character string table 141a, for example, the priority “first candidate” and the recognition result character string “神奈ノ１１” are set. A dissimilarity indicating how much each character in the recognition result character string differs from its template character is also set. For example, the dissimilarity “135” is set for the character “神”, and the dissimilarity “157” is set for the character “奈”. In this way, a dissimilarity is set for every character in the recognition result character string.

なお、認識結果文字列生成部１２０は、各認識結果文字列に含まれる全ての文字について、相違度が最も小さいもの同士を組み合わせた認識結果文字列の優先順位を高く設定する。 Note that the recognition result character string generation unit 120 sets a high priority for the recognition result character string in which all the characters included in each recognition result character string are combined with those having the smallest difference.

図８は、比較文字列テーブルの第１のデータ構造例を示す図である。比較文字列テーブル１４２ａは、比較文字列生成部１５０によって生成され、制御情報記憶部１４０に格納される。比較文字列テーブル１４２ａは、認識結果文字列“神奈ノ１１”（有効文字数５）が取得された場合に生成されたものを例示している。 FIG. 8 is a diagram illustrating a first data structure example of the comparison character string table. The comparison character string table 142 a is generated by the comparison character string generation unit 150 and stored in the control information storage unit 140. The comparison character string table 142a exemplifies what is generated when the recognition result character string “Kanano 11” (number of valid characters 5) is acquired.

比較文字列テーブル１４２ａには、比較文字列を示す項目、補正文字列を示す項目および比較文字数を示す項目が設けられている。各項目の横方向に並べられた情報同士が互いに関連付けられて、１つの比較文字列の情報を示す。 The comparison character string table 142a includes an item indicating a comparison character string, an item indicating a correction character string, and an item indicating the number of comparison characters. Information arranged in the horizontal direction of each item is associated with each other to indicate information of one comparison character string.

比較文字列を示す項目には、認識結果文字列と比較するための比較文字列が設定される。補正文字列を示す項目には、該当の比較文字列に対応する補正文字列が設定される。比較文字数を示す項目には、該当の比較文字列に対応する比較文字数が設定される。 In the item indicating the comparison character string, a comparison character string for comparison with the recognition result character string is set. In the item indicating the correction character string, a correction character string corresponding to the corresponding comparison character string is set. In the item indicating the number of comparison characters, the number of comparison characters corresponding to the corresponding comparison character string is set.

比較文字列テーブル１４２ａには、例えば、比較文字列が“（ネ）申奈１１１県”、補正文字列が“神奈川県”、比較文字数が“６”という情報が設定される。これは、比較文字列が、図３で説明した（第２の場合）に該当して生成され、誤読候補文字列の先頭文字が削除されて生成されたものである。括弧内の文字“ネ”は、削除された文字であり、比較対象としては用いられない。そして、この比較文字列の比較文字数が除外した１文字を有効文字数から減算した“６−１＝５”であることを示している。 In the comparison character string table 142a, for example, the comparison character string “（ネ）申奈１１１県”, the corrected character string “神奈川県”, and the number of comparison characters “6” are set. This comparison character string was generated under the (second case) described with reference to FIG. 3, by deleting the first character of the misreading candidate character string. The character “ネ” in parentheses is the deleted character and is not used in the comparison. The entry indicates that the number of comparison characters of this comparison character string is “6 - 1 = 5”, obtained by subtracting the one excluded character from the number of valid characters.

比較文字列テーブル１４２ａに例示したその他の比較文字列“神奈１１１県”や“香１１１県”などは、（第１の場合）に該当して誤読候補文字列がそのまま比較文字列として採用されたものである。この場合、有効文字数の値がそのまま比較文字数として設定される。 The other comparison character strings illustrated in the comparison character string table 142a, such as “神奈１１１県” and “香１１１県”, fall under the (first case): the misreading candidate character string was adopted as the comparison character string as is. In this case, the number of valid characters is set unchanged as the number of comparison characters.

図９は、代替認識結果文字列テーブルのデータ構造例を示す図である。代替認識結果テーブル１４２ｂは、比較文字列生成部１５０によって生成され、制御情報記憶部１４０に格納される。代替認識結果文字列テーブル１４２ｂは、認識結果文字列として“神奈ノ１１”が取得された場合に生成されたものを例示している。 FIG. 9 is a diagram illustrating a data structure example of the alternative recognition result character string table. The substitution recognition result table 142b is generated by the comparison character string generation unit 150 and stored in the control information storage unit 140. The alternative recognition result character string table 142b illustrates an example generated when “Kanano 11” is acquired as the recognition result character string.

代替認識結果文字列テーブル１４２ｂには、代替認識結果文字列を示す項目および対象文字数を示す項目が設けられている。各項目の横方向に並べられた情報同士が互いに関連付けられて、１つの代替認識結果文字列の情報を示す。 The substitution recognition result character string table 142b includes an item indicating the substitution recognition result character string and an item indicating the number of target characters. Information arranged in the horizontal direction of each item is associated with each other to indicate information of one alternative recognition result character string.

代替認識結果文字列を示す項目には、認識結果文字列に対して生成された代替認識結果文字列が設定される。対象文字数を示す項目には、該当の代替認識結果文字列の文字数が設定される。 In the item indicating the alternative recognition result character string, an alternative recognition result character string generated for the recognition result character string is set. In the item indicating the number of target characters, the number of characters of the corresponding alternative recognition result character string is set.

代替認識結果文字列テーブル１４２ｂには、例えば、代替認識結果文字列が“奈ノ１１”、対象文字数が“４”という情報が設定される。これは、図３で説明した（第３の場合）に該当して、認識結果文字列“神奈ノ１１”の先頭文字“神”を除いた“奈ノ１１”が代替認識結果文字列として生成されたものである。そして、その代替認識結果文字列の文字数が“４”であることを示している。なお、“神奈ノ１１”という認識結果文字列には“１”の文字列が連続して２つ含まれている。このため、比較文字列生成部１５０は、“神奈ノ１”という代替認識結果文字列が２つ生成されることになるが、そのうちの１つのみを優先して代替認識結果文字列テーブル１４２ｂに登録する。 In the alternative recognition result character string table 142b, for example, the alternative recognition result character string “奈ノ１１” and the number of target characters “4” are set. This falls under the (third case) described with reference to FIG. 3: “奈ノ１１”, obtained by removing the first character “神” from the recognition result character string “神奈ノ１１”, was generated as an alternative recognition result character string, and its number of characters is “4”. Note that the recognition result character string “神奈ノ１１” contains two consecutive “１” characters. The comparison character string generation unit 150 therefore produces the alternative recognition result character string “神奈ノ１” twice, but registers only one of them in the alternative recognition result character string table 142b.

図１０は、比較文字列テーブルの第２のデータ構造例を示す図である。比較文字列テーブル１４２ｃは、比較文字列生成部１５０によって生成され、制御情報記憶部１４０に格納される。比較文字列テーブル１４２ｃは、認識結果文字列として“神剛”が取得された場合に生成されたものを例示している。 FIG. 10 is a diagram illustrating a second example of the data structure of the comparison character string table. The comparison character string table 142c is generated by the comparison character string generation unit 150 and stored in the control information storage unit 140. The comparison character string table 142c illustrates the table generated when “神剛” is acquired as the recognition result character string.

比較文字列テーブル１４２ｃの構成は、比較文字列テーブル１４２ａの構成と同一であるため、説明を省略する。
比較文字列テーブル１４２ｃには、例えば、比較文字列が“奈川県”、補正文字列が“神奈川県”、比較文字数が“２”という情報が設定される。これは、比較文字列が図３で説明した（第４の場合）に該当して生成され、補正文字列の先頭の文字が除外されて生成されたものである。このとき、補正文字列の文字数から除外した分の文字数を減算した値が比較文字数となる。また、比較文字列“神奈県”も同様にして生成されたものである。 Since the configuration of the comparison character string table 142c is the same as that of the comparison character string table 142a, the description thereof is omitted.
In the comparison character string table 142c, for example, the comparison character string “奈川県”, the corrected character string “神奈川県”, and the number of comparison characters “2” are set. This comparison character string was generated under the (fourth case) described with reference to FIG. 3, by excluding the first character of the corrected character string. The number of comparison characters is the number of characters in the corrected character string minus the number of excluded characters. The comparison character string “神奈県” was generated in the same way.

比較文字列テーブル１４２ｃに例示したその他の比較文字列“香１県”は、（第１の場合）に該当して誤読候補文字列がそのまま比較文字列として採用されたものである。
図１１は、調整定数テーブルのデータ構造例を示す図である。調整定数テーブル１４３は、制御情報記憶部１４０に予め格納される。調整定数テーブル１４３には、項目名を示す項目および調整定数を示す項目が設けられている。各項目の横方向に並べられた情報同士が互いに関連付けられて、１つの調整定数に関する情報を示す。 The other comparison character string “香１県” illustrated in the comparison character string table 142c falls under the (first case): the misreading candidate character string was adopted as the comparison character string as is.
FIG. 11 is a diagram illustrating a data structure example of the adjustment constant table. The adjustment constant table 143 is stored in the control information storage unit 140 in advance. The adjustment constant table 143 includes an item indicating an item name and an item indicating an adjustment constant. Information arranged in the horizontal direction of each item is associated with each other to indicate information regarding one adjustment constant.

項目名を示す項目には、調整定数の項目名を示す情報が設定される。調整定数を示す項目には、該当の調整定数の値を示す情報が設定される。
調整定数テーブル１４３には、例えば、次のような項目名および調整定数が設定される。 In the item indicating the item name, information indicating the item name of the adjustment constant is set. In the item indicating the adjustment constant, information indicating the value of the corresponding adjustment constant is set.
In the adjustment constant table 143, for example, the following item names and adjustment constants are set.

（ａ）“文字不一致相違度（Ｖ）”は、誤読候補文字列に含まれる該当の文字が、認識結果文字列（または代替認識結果文字列）に含まれない場合に、該当の文字の代替相違度とする値Ｖを示している。Ｖの値としては、例えば、Ｖ＝１０００が設定される。 (A) “Character mismatch difference degree (V)” is a substitution of a corresponding character when the corresponding character included in the misreading candidate character string is not included in the recognition result character string (or alternative recognition result character string). A difference value V is shown. As the value of V, for example, V = 1000 is set.

（ｂ）“１文字分離相違度（Ｗ）”は、誤読候補文字列の有効文字数と認識結果文字列の有効文字数との差が１である場合に、誤読候補文字列に含まれる各文字の代替相違度の合計に加算する値Ｗを示している。Ｗの値としては、例えば、Ｗ＝１１００が設定される。なお、比較文字列生成部１５０は、誤読候補文字列として、認識結果文字列との有効文字数の差が２文字以上のものを抽出してもよい。この場合、代替相違度に対して２文字の相違があることを反映した値を代替相違度に加算する。この場合には、各文字の代替相違度の合計にＷよりも更に大きな値として、例えば、１００００を加算することも考えられる。 (b) The “one-character separation dissimilarity (W)” is a value W that is added to the total of the alternative dissimilarities of the characters in the misreading candidate character string when the number of valid characters in the misreading candidate character string and that in the recognition result character string differ by 1. For example, W = 1100 is set. The comparison character string generation unit 150 may also extract, as misreading candidate character strings, character strings whose number of valid characters differs from that of the recognition result character string by two or more. In that case, a value reflecting the two-character difference is added to the alternative dissimilarity; for example, a value larger than W, such as 10000, may be added to the total of the per-character alternative dissimilarities.

(c) The “character unit adjustment dissimilarity threshold (X)” and the “character unit adjustment dissimilarity difference threshold (Y)” are the thresholds X and Y used to decide whether a character of the misreading candidate character string that also appears in the recognition result character string should receive a favorable alternative dissimilarity. Evaluating the alternative dissimilarity of such a character as a smaller value lets the character be treated as having high accuracy. The threshold X expresses the condition that the dissimilarity calculated by the recognition result character string generation unit 120 for the corresponding character of the recognition result character string (or the alternative recognition result character string) is X or less. For example, X = 200 is set. The threshold Y expresses the condition that, for the corresponding character of the recognition result character string, the difference in dissimilarity from the character at the same position in the second-candidate recognition result character string generated by the recognition result character string generation unit 120 is Y or more. In this example, when both conditions are satisfied, the alternative dissimilarity calculation unit 160 uses one tenth of the dissimilarity calculated by the recognition result character string generation unit 120 as the alternative dissimilarity of the character. In this way, a character whose dissimilarity differed greatly from that of the characters at the same position in the other recognition result character strings, at the time the recognition result character string generation unit 120 generated the first-candidate string, is given preferential treatment.
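The two conditions in (c) can be sketched as follows. This is a minimal illustration in Python, not the implementation of the embodiment: the function name `favored_dissimilarity` is hypothetical, the dissimilarity difference is assumed to be the second-candidate value minus the first-candidate value, and Y = 100 is taken from the FIG. 11 example value quoted in the description of step S71.

```python
X = 200  # character unit adjustment dissimilarity threshold (example value)
Y = 100  # character unit adjustment dissimilarity difference threshold (FIG. 11 example)


def favored_dissimilarity(first: float, second: float, x: float = X, y: float = Y) -> float:
    """Return the alternative dissimilarity of a matching character.

    first  -- dissimilarity of the character in the first-candidate string
    second -- dissimilarity of the character at the same position in the
              second-candidate string (assumed to be the larger value)
    """
    if first <= x and (second - first) >= y:
        return first / 10  # both conditions hold: evaluate the character favorably
    return first           # otherwise keep the recognizer's dissimilarity as-is
```

With these example thresholds, a character with dissimilarities 157 (first candidate) and 241 (second candidate) keeps the value 157, because the difference of 84 is below Y.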

FIG. 12 is a diagram illustrating an example of the data structure of the accuracy definition table. The accuracy definition table 144 is stored in advance in the control information storage unit 140. It has an item for the evaluation value, an item for the evaluation value difference, and an item for the accuracy. The pieces of information in one row are associated with one another and together define one accuracy.

The evaluation value item holds a range of evaluation values. The evaluation value difference item holds a range for the difference between the evaluation values of the first-candidate and second-candidate misreading candidate character strings. The accuracy item holds the accuracy assigned to the first-candidate misreading candidate character string when its evaluation value and evaluation value difference fall within those ranges.

In the accuracy definition table 144, for example, the evaluation value “600 or less”, the evaluation value difference “50 or more”, and the accuracy “high” are set. This means that when the evaluation value is 600 or less and the evaluation value difference is 50 or more, the accuracy of the first-candidate misreading candidate character string is “high”.
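A lookup against such a table might be sketched as below. Only the single row quoted above is known from the description; the names `ACCURACY_ROWS` and `lookup_accuracy` and the fallback label “undetermined” are assumptions introduced for illustration.

```python
# Each row: (maximum evaluation value, minimum evaluation value difference, accuracy).
# Only the row quoted in the description is included; other rows are not known here.
ACCURACY_ROWS = [(600, 50, "high")]


def lookup_accuracy(evaluation, gap, rows=ACCURACY_ROWS, default="undetermined"):
    """Return the accuracy of the first candidate.

    evaluation -- evaluation value of the first candidate
    gap        -- evaluation value difference between first and second candidates
    default    -- hypothetical label used when no row matches
    """
    for max_eval, min_gap, label in rows:
        if evaluation <= max_eval and gap >= min_gap:
            return label
    return default
```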

The alternative dissimilarity calculation unit 160 can determine the accuracy of the first-candidate misreading candidate character string based on the accuracy definition table 144.
The parameters in the adjustment constant table 143 and the accuracy definition table 144 are set in advance to values suited to the usage environment.

Next, the processing of the computer 100 having the above configuration will be described.
FIG. 13 is a flowchart illustrating the procedure of the character recognition processing. The processing illustrated in FIG. 13 is described below in order of step number.

[Step S11] The image acquisition unit 110 acquires the form image 200 generated by the imaging device 14 capturing an image of a form. The image acquisition unit 110 outputs the acquired form image 200 to the recognition result character string generation unit 120.

[Step S12] The recognition result character string generation unit 120 reads the character strings written in the areas 211, 212, 221, 222, 231, 232, ... of the form image 200 acquired from the image acquisition unit 110, generates the recognition result character string tables 141a, 141b, 141c, ..., and stores them in the control information storage unit 140.

[Step S13] The comparison character string generation unit 150 identifies the item to be processed next. The order in which items are processed is, for example, predetermined. Specifically, among the items in the form image 200, an address may be processed in the order of prefecture name, municipality name, district name, street number, building name, and so on. A personal name may be processed in the order of family name and given name, and a financial institution may be processed in the order of institution name and branch name.

[Step S14] The comparison character string generation unit 150 refers to the recognition result character string table 141a stored in the control information storage unit 140 and acquires the recognition result character string of the item being processed (for example, “神奈ノ１１”). It then acquires the number of valid characters of that string (for example, “5” for “神奈ノ１１”). The comparison character string generation unit 150 can identify the item to which each character string belongs from, for example, the position at which the string was recognized in the form image 200. When a prefecture name and a municipality name are recognized consecutively, as in “神奈川県相模原市” (Sagamihara City, Kanagawa Prefecture), the unit may identify the characters “県” (prefecture) and “市” (city) in the string to separate the prefecture name from the municipality name and acquire the corresponding character strings. Likewise, when a family name and a given name are written consecutively, the unit may identify a delimiter such as a space between them to separate the two.

[Step S15] The comparison character string generation unit 150 refers to the misreading candidate information (such as the prefecture table 131) stored in the misreading candidate information storage unit 130 and, based on the number of valid characters acquired in step S14, extracts the misreading candidate character strings for the item being processed. Specifically, it extracts from the misreading candidate information the misreading candidate character strings whose number of valid characters lies within ±n of the acquired number. The value of n is chosen in view of the accuracy and speed of the subsequent processing; here, n = 1 is assumed. In this case, for a recognition result character string with “5” valid characters, the comparison character string generation unit 150 acquires the group of misreading candidate character strings registered with “6”, “5”, or “4” valid characters. To compare against more candidates and further improve matching accuracy, n may be an integer of 2 or more.
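The ±n filter of step S15 can be sketched as follows. The dictionary layout, the placeholder candidate names, and the function name `select_candidates` are hypothetical; the sketch assumes only that each misreading candidate is registered together with its number of valid characters.

```python
def select_candidates(registered: dict[str, int], valid_count: int, n: int = 1) -> list[str]:
    """Return the candidates whose registered valid-character count is within ±n."""
    return [s for s, count in registered.items() if abs(count - valid_count) <= n]


# Hypothetical table: candidate string -> registered number of valid characters.
registered = {"cand_a": 3, "cand_b": 4, "cand_c": 5, "cand_d": 6, "cand_e": 7}

# A recognition result with 5 valid characters and n = 1 keeps counts 4, 5 and 6.
selected = select_candidates(registered, valid_count=5)
```

Raising n widens the comparison set, which trades processing time for coverage, as the step itself notes.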

[Step S16] The comparison character string generation unit 150 acquires the group of correction character strings corresponding to the acquired group of misreading candidate character strings.
[Step S17] The comparison character string generation unit 150 generates a comparison character string table based on the acquired recognition result character string, the group of misreading candidate character strings, and the group of correction character strings. The comparison character string generation unit 150 stores the generated comparison character string table in the control information storage unit 140.

[Step S18] For each character of each comparison character string in the comparison character string table stored in the control information storage unit 140, the alternative dissimilarity calculation unit 160 calculates the alternative dissimilarity with respect to the corresponding character of the recognition result character string. It then obtains the total alternative dissimilarity T of each comparison character string from the calculated values.

[Step S19] The alternative dissimilarity calculation unit 160 calculates the evaluation value of each comparison character string as the quotient of its total alternative dissimilarity T divided by its number of comparison characters.

[Step S20] The alternative dissimilarity calculation unit 160 determines the priority of each comparison character string based on the calculated evaluation values. A smaller evaluation value ranks higher. The alternative dissimilarity calculation unit 160 outputs the evaluation value calculated for each comparison character string to the corrected character string specifying unit 170.
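Steps S19 and S20 amount to a per-character average followed by an ascending sort. The sketch below assumes a hypothetical mapping from each comparison character string to its pair (T, number of comparison characters); the function name and the sample figures are illustrative only.

```python
def rank_by_evaluation(totals: dict[str, tuple[float, int]]) -> list[tuple[str, float]]:
    """Return (comparison string, evaluation value) pairs, best candidate first.

    totals maps each comparison string to (total alternative dissimilarity T,
    number of comparison characters).
    """
    scores = {s: t / count for s, (t, count) in totals.items()}  # step S19
    return sorted(scores.items(), key=lambda item: item[1])      # step S20: smaller is better


# Hypothetical totals for three comparison strings.
ranking = rank_by_evaluation({"a": (1500, 5), "b": (800, 4), "c": (3000, 5)})
```

The first element of the returned list corresponds to the first candidate, and the gap between the first two evaluation values is the evaluation value difference used by the accuracy definition table 144.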

[Step S21] The corrected character string specifying unit 170 takes the first-candidate comparison character string, that is, the one ranked highest by the evaluation values calculated by the alternative dissimilarity calculation unit 160, and specifies the correction character string corresponding to it. It then refers to the accuracy definition table 144 stored in the control information storage unit 140 to determine the accuracy of the specified correction character string, and outputs the correction character string and its accuracy to the output unit 180.

[Step S22] The comparison character string generation unit 150 determines whether there is a next item to be processed among the items currently being handled. If there is, the processing returns to step S13. If not, the processing proceeds to step S23. The unit can make this determination by, for example, detecting whether processing has been completed up to the final item in the predetermined order (for example, the last item in the sequence prefecture name, municipality name, district name, ...).

[Step S23] The output unit 180 generates a screen that lists the correction character strings and accuracies acquired from the corrected character string specifying unit 170, and displays the generated screen on the monitor 11.

In this way, the computer 100 performs knowledge correction that takes into account the possibility that the recognition result character string read from the form image 200 contains separated character strings.
FIG. 14 is a first flowchart illustrating the procedure of the comparison character string generation processing. The processing illustrated in FIG. 14 is described below in order of step number. It shows the processing of step S17 in FIG. 13 in detail.

[Step S31] From the group of misreading candidate character strings acquired in step S15 of FIG. 13, the comparison character string generation unit 150 extracts one misreading candidate character string for which no comparison character string has yet been generated.
[Step S32] The comparison character string generation unit 150 compares the recognition result character string with the extracted misreading candidate character string.

[Step S33] The comparison character string generation unit 150 determines whether the number of valid characters in the recognition result character string matches the number of valid characters in the misreading candidate character string. If they match, the processing proceeds to step S34. If they do not match, the processing proceeds to step S35.

[Step S34] The comparison character string generation unit 150 registers the extracted misreading candidate character string as a comparison character string in the comparison character string table 142a stored in the control information storage unit 140. Along with the comparison character string, it registers the corresponding correction character string and the number of comparison characters of the comparison character string. The processing then proceeds to step S44.

[Step S35] The comparison character string generation unit 150 determines whether the misreading candidate character string has more valid characters than the recognition result character string. If it does, the processing proceeds to step S36. If it does not, the processing proceeds to step S39.

[Step S36] The comparison character string generation unit 150 acquires the difference between the number of valid characters in the recognition result character string and that in the misreading candidate character string. Here, 1 is acquired as the difference.

[Step S37] The comparison character string generation unit 150 generates a character string by removing from the misreading candidate character string as many characters as the acquired difference.
[Step S38] The comparison character string generation unit 150 registers the character string generated in step S37 as a comparison character string in the comparison character string table 142a. Along with the comparison character string, it registers the corresponding correction character string and the number of comparison characters of the comparison character string.

[Step S39] The comparison character string generation unit 150 acquires the difference in the number of valid characters between the recognition result character string and the misreading candidate character string.
[Step S40] The comparison character string generation unit 150 determines whether an alternative recognition result character string for that difference in the number of valid characters has not yet been generated. If it has not been generated, the processing proceeds to step S41. If it has already been generated, the processing proceeds to step S43. For example, when the recognition result character string has 5 valid characters and the misreading candidate character string has 4, the comparison character string generation unit 150 can make this determination by referring to the alternative recognition result character string table 142b stored in the control information storage unit 140 and checking whether an alternative recognition result character string with a target character count of “4” exists.

[Step S41] The comparison character string generation unit 150 generates a character string by removing from the recognition result character string as many characters as the acquired difference.
[Step S42] The comparison character string generation unit 150 registers the generated character string as an alternative recognition result character string in the alternative recognition result character string table 142b stored in the control information storage unit 140.

[Step S43] The comparison character string generation unit 150 registers the extracted misreading candidate character string as a comparison character string in the comparison character string table 142a stored in the control information storage unit 140. Along with the comparison character string, it registers the corresponding correction character string and the number of comparison characters of the comparison character string.

[Step S44] The comparison character string generation unit 150 determines whether comparison character strings have been generated for all the misreading candidate character strings in the group. If they have, the processing proceeds to step S45. If not, the processing returns to step S31.
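The branching of steps S31 to S44 can be sketched as below. Two simplifications are assumptions of this sketch and are not stated in the description: every character counts as a valid character (so the count is simply the string length), and the characters removed in steps S37 and S41 are taken from the end of the string. The function name and the two containers standing in for tables 142a and 142b are hypothetical.

```python
def build_comparison_strings(recognized: str, candidates: list[str]):
    """Return (comparison strings, alternative recognition results by length)."""
    comparison = []    # stands in for comparison character string table 142a
    alternatives = {}  # stands in for table 142b, keyed by target character count
    for cand in candidates:                            # S31, S44: loop over candidates
        diff = len(cand) - len(recognized)             # S32
        if diff == 0:                                  # S33 -> S34
            comparison.append(cand)
        elif diff > 0:                                 # S35 -> S36 to S38
            # Assumption: drop the surplus characters from the end of the candidate.
            comparison.append(cand[:len(recognized)])
        else:                                          # S39 to S43
            if len(cand) not in alternatives:          # S40: not generated yet
                # Assumption: drop the surplus characters from the end (S41, S42).
                alternatives[len(cand)] = recognized[:len(cand)]
            comparison.append(cand)                    # S43
    return comparison, alternatives


comparison, alternatives = build_comparison_strings("ABCDE", ["ABCDE", "ABXDEF", "ABCD"])
```

The sketch omits the correction character string and the number of comparison characters that the steps register alongside each comparison character string.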

FIG. 15 is a second flowchart illustrating the procedure of the comparison character string generation processing. The processing illustrated in FIG. 15 is described below in order of step number. It is executed following step S44 of FIG. 14.

[Step S45] From the group of correction character strings acquired in step S16 of FIG. 13, the comparison character string generation unit 150 extracts one correction character string that has not yet been extracted.
[Step S46] The comparison character string generation unit 150 compares the recognition result character string with the extracted correction character string.

[Step S47] The comparison character string generation unit 150 determines whether the number of valid characters in the correction character string is smaller than that in the recognition result character string. If it is smaller (or the two numbers are equal), the processing proceeds to step S51. Otherwise, that is, if the correction character string has more valid characters than the recognition result character string, the processing proceeds to step S48.

[Step S48] The comparison character string generation unit 150 acquires the difference between the number of valid characters in the recognition result character string and that in the correction character string. For example, if the recognition result character string is “神剛”, its number of valid characters is “2”, and if the correction character string is “神奈川県”, its number of valid characters is “3”, so the difference is “3 − 2 = 1”.

[Step S49] The comparison character string generation unit 150 generates a character string by removing from the correction character string as many characters as the acquired difference.
[Step S50] The comparison character string generation unit 150 registers the character string generated in step S49 as a comparison character string in the comparison character string table 142a. Along with the comparison character string, it registers the corresponding correction character string and the number of comparison characters of the comparison character string.

[Step S51] The comparison character string generation unit 150 determines whether all the correction character strings in the group have been extracted. If all have been extracted, the processing ends. If not, the processing returns to step S45.

In this way, the computer 100 generates comparison character strings according to the difference in the number of valid characters between the recognition result character string and each misreading candidate character string, and between the recognition result character string and each correction character string. It then compares the characters of the comparison character string and the recognition result character string that are at the same character position (for example, the second character from the beginning).

When the comparison character string generation unit 150 is about to register a comparison character string in the comparison character string table 142a in step S34 or S38 of FIG. 14, an identical comparison character string may already be registered. In that case, only the already registered string is kept as the comparison character string, and the string about to be registered is discarded. For example, if “ネ申奈１１” is already registered as a comparison character string, a new “ネ申奈１１” is discarded. This prevents the same comparison character string from being compared more than once.
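The duplicate check described above can be sketched as a guarded append. The list standing in for the comparison character string table 142a and the function name `register` are hypothetical.

```python
def register(table: list[str], comparison: str) -> bool:
    """Append `comparison` unless an identical string is already registered.

    Returns True when the string was registered and False when it was discarded.
    """
    if comparison in table:
        return False  # an identical comparison string exists: discard the new one
    table.append(comparison)
    return True


table: list[str] = []
register(table, "ネ申奈１１")  # registered
register(table, "ネ申奈１１")  # discarded as a duplicate
```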

FIG. 16 is a flowchart illustrating the procedure for calculating the total alternative dissimilarity. The processing illustrated in FIG. 16 is described below in order of step number. It shows the processing of step S18 in FIG. 13 in detail.

[Step S61] From the comparison character strings in the comparison character string table 142a stored in the control information storage unit 140, the alternative dissimilarity calculation unit 160 extracts one comparison character string whose alternative dissimilarity has not yet been calculated.

[Step S62] The alternative dissimilarity calculation unit 160 acquires the number of comparison characters of the extracted comparison character string.
[Step S63] The alternative dissimilarity calculation unit 160 determines whether the number of valid characters in the recognition result character string equals the number of comparison characters. If they are equal, the processing proceeds to step S64. If they are not equal, the processing proceeds to step S65.

[Step S64] The alternative dissimilarity calculation unit 160 selects the recognition result character string as the source string to be compared with the comparison character string. The processing then proceeds to step S66. That is, the subsequent processing is performed on the recognition result character string.

[Step S65] As the source string to be compared with the comparison character string, the alternative dissimilarity calculation unit 160 extracts from the alternative recognition result character string table 142b stored in the control information storage unit 140 one alternative recognition result character string whose target character count equals the number of comparison characters.

[Step S66] The alternative dissimilarity calculation unit 160 sets the counter i = 0.
[Step S67] The alternative dissimilarity calculation unit 160 extracts the i-th character from the beginning of the recognition result character string and of the comparison character string, and determines whether the two extracted characters match. If they match, the processing proceeds to step S68. If they do not match, the processing proceeds to step S69.

[Step S68] The alternative dissimilarity calculation unit 160 acquires, as the alternative dissimilarity, the dissimilarity that the recognition result character string generation unit 120 calculated for the character. It can obtain this dissimilarity by referring to the recognition result character string table of the item being processed, which is stored in the control information storage unit 140.

[Step S69] The alternative dissimilarity calculation unit 160 refers to the adjustment constant table 143 stored in the control information storage unit 140 and sets the alternative dissimilarity of the character to the character mismatch dissimilarity (V). In the example of FIG. 11, V = 1000.

[Step S70] The alternative dissimilarity calculation unit 160 determines whether the acquired alternative dissimilarity is equal to or less than the character unit adjustment dissimilarity threshold (X) set in the adjustment constant table 143. If it is X or less, the processing proceeds to step S71. If it is greater than X, the processing proceeds to step S73. In the example of FIG. 11, X = 200.

[Step S71] The alternative dissimilarity calculation unit 160 refers to the recognition result character string table of the item being processed and calculates the difference in dissimilarity between the character and the i-th character from the beginning of the second-candidate recognition result character string. For example, when the second character from the beginning is compared in the recognition result character string table 141a, the difference in dissimilarity between the character “奈” in the first-candidate string and the character “合” in the second-candidate string is calculated as “241 − 157 = 84”. The alternative dissimilarity calculation unit 160 then determines whether the calculated difference is equal to or greater than the character unit adjustment dissimilarity difference threshold (Y). If it is Y or more, the processing proceeds to step S72. If it is less than Y, the processing proceeds to step S73. In the example of FIG. 11, Y = 100.

[Step S72] The alternative dissimilarity calculation unit 160 takes one tenth of the alternative dissimilarity acquired for the character in question as the new alternative dissimilarity.

[Step S73] The alternative dissimilarity calculation unit 160 determines whether any character of the comparison character string extracted in step S61 still lacks an alternative dissimilarity. If such a character remains, the process proceeds to step S74. If none remains, the process proceeds to step S75. The alternative dissimilarity calculation unit 160 can make this determination by, for example, checking whether the value of the counter i equals “(number of comparison characters) − 1” for the comparison character string.

[Step S74] The alternative dissimilarity calculation unit 160 increments the counter i. The process then returns to step S67.

[Step S75] The alternative dissimilarity calculation unit 160 calculates the sum K of the alternative dissimilarities acquired for the characters of the comparison character string.
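Steps S69 to S75 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent. The names `char_dissimilarity` and `sum_k` and the tuple layout are hypothetical. The rule that a matching character keeps the dissimilarity computed at recognition time, and the truncation of the one-tenth value to an integer, are taken from the worked example for FIG. 17. The constants V, X and Y are the FIG. 11 values.

```python
V = 1000  # character mismatch dissimilarity (FIG. 11)
X = 200   # per-character adjustment dissimilarity threshold (FIG. 11)
Y = 100   # per-character adjustment dissimilarity-difference threshold (FIG. 11)


def char_dissimilarity(matches, first_score, second_score):
    """Return the alternative dissimilarity for one character position.

    matches      -- True if the two compared characters are identical
    first_score  -- recognition dissimilarity of the first-candidate character
    second_score -- recognition dissimilarity of the second-candidate character
    """
    # A mismatch receives the fixed penalty V (step S69).
    score = first_score if matches else V
    # Steps S70 to S72: a low score (<= X) that beats the second candidate
    # by a clear margin (>= Y) is reduced to one tenth, fraction discarded.
    if score <= X and second_score - first_score >= Y:
        score //= 10
    return score


def sum_k(positions):
    """Step S75: sum the values over (matches, first_score, second_score) tuples."""
    return sum(char_dissimilarity(*p) for p in positions)
```

For the second character in the step S71 example, `char_dissimilarity(True, 157, 241)` leaves the value at 157 because the difference 84 is below Y. A character with scores 102 and 215 is reduced to 10.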

[Step S76] The alternative dissimilarity calculation unit 160 acquires the number of valid characters of the recognition result character string. It also refers to the comparison character string table 142a and acquires the number of valid characters of the comparison character string. Among the comparison character strings in the comparison character string table 142a, for example, “神奈１１１県” has “5” valid characters and “（ネ）申奈１１１県” has “6”. The alternative dissimilarity calculation unit 160 determines whether the two acquired numbers of valid characters differ. If they differ, the process proceeds to step S77. If they do not differ, the alternative dissimilarity calculation unit 160 sets the alternative dissimilarity total T = K, and the process proceeds to step S78.

[Step S77] The alternative dissimilarity calculation unit 160 adds the one-character separation dissimilarity (W) to the sum K of the alternative dissimilarities. In the example of FIG. 11, W = 1100. The alternative dissimilarity calculation unit 160 takes the result as the alternative dissimilarity total T (= K + W). In this example, when the numbers of valid characters of the recognition result character string and the comparison character string differ, they differ by 1. The value W, which reflects a difference of one character, is therefore included in T. If comparison character strings whose number of valid characters differs by 2 or more can be extracted, a value corresponding to that difference may be added instead.
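Steps S76 and S77 amount to a small adjustment of K, sketched below. The function name `total_dissimilarity` is hypothetical. Scaling W linearly with the difference in valid characters is only one possible reading of "a value corresponding to the difference"; the text fixes the behaviour only for a difference of 0 or 1.

```python
W = 1100  # one-character separation dissimilarity (FIG. 11)


def total_dissimilarity(k, recognized_valid_chars, comparison_valid_chars):
    """Return the alternative dissimilarity total T for one comparison string."""
    difference = abs(recognized_valid_chars - comparison_valid_chars)
    # difference == 0 gives T = K; difference == 1 gives T = K + W.
    # For larger differences the linear scaling is an assumption.
    return k + W * difference
```

With K = 302 and valid character counts of 5 and 4, this yields T = 1402, the total that appears in the worked example for the comparison character string “神奈１１県”.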

[Step S78] The alternative dissimilarity calculation unit 160 determines whether the alternative dissimilarity total T was obtained for an alternative recognition result character string and, if so, whether T has been calculated for all alternative recognition result character strings with the relevant target number of characters registered in the alternative recognition result character string table 142b. If T was calculated for the recognition result character string itself, or if T has been calculated for all target alternative recognition result character strings, the process proceeds to step S79. If T was obtained for an alternative recognition result character string and some target alternative recognition result character string has not yet been processed, the process proceeds to step S65.

[Step S79] The alternative dissimilarity calculation unit 160 determines whether the alternative dissimilarity total T has been calculated for all comparison character strings in the comparison character string table 142a. If it has, the process ends. If it has not, the process returns to step S61.

In this way, the computer 100 calculates the alternative dissimilarity total T for each comparison character string. In doing so, the computer 100 compares the characters at the same character position in the recognition result character string and the comparison character string, determines whether they match, and determines the alternative dissimilarity by referring to the adjustment constant table 143 stored in the control information storage unit 140. It then adds, to the sum of the per-character alternative dissimilarities, an adjustment value corresponding to the difference between the number of valid characters of the recognition result character string and that of the comparison character string, and thereby obtains the alternative dissimilarity total.

If the alternative dissimilarity determined in step S69 is clearly larger than the per-character adjustment dissimilarity threshold (X), step S73 may be executed immediately after step S69.

FIG. 17 is a diagram illustrating a first specific example of an evaluation result. The evaluation result table 310 is generated by the alternative dissimilarity calculation unit 160 and output to the corrected character string specifying unit 170. The evaluation result table 310 illustrates the case where the recognition result character string is “神奈ノ１１”.

The evaluation result table 310 has items for the rank, the comparison source character string, the comparison character string, the alternative dissimilarity total, the evaluation value, and the corrected character string. The pieces of information in one row are associated with one another and together describe the evaluation result of one comparison character string.

The rank item holds the rank that expresses the priority obtained from the comparison. The smaller the corresponding evaluation value, the higher the priority. The comparison source character string item holds the character string that serves as the basis of the comparison, which is either the recognition result character string or an alternative recognition result character string. The comparison character string item holds the comparison character string that is compared against the comparison source character string. The alternative dissimilarity total item holds the value of the alternative dissimilarity total. The evaluation value item holds the evaluation value. The corrected character string item holds the corrected character string corresponding to the comparison character string.

In the evaluation result table 310, for example, “神奈１１１県” is shown as the comparison character string of rank “1”. The comparison source character string in this case is “神奈ノ１１”, which is identical to the recognition result character string. The alternative dissimilarity calculation unit 160 obtains the evaluation value of “神奈１１１県” (where “県” is excluded from the comparison) against the comparison source character string “神奈ノ１１” as follows.

(a1) The alternative dissimilarity calculation unit 160 compares the first characters. The character “神” of the comparison source character string matches the character “神” of the comparison character string. The dissimilarity “135” that the recognition result character string generation unit 120 calculated for “神” of the comparison source character string is therefore used as the alternative dissimilarity for “神” of the comparison character string. Whether two characters match can be determined, for example, by whether their character codes match.

(a2) The alternative dissimilarity calculation unit 160 compares the second characters. The character “奈” of the comparison source character string matches the character “奈” of the comparison character string. The dissimilarity “157” that the recognition result character string generation unit 120 calculated for “奈” of the comparison source character string is therefore used as the alternative dissimilarity for “奈” of the comparison character string.

(a3) The alternative dissimilarity calculation unit 160 compares the third characters. The character “ノ” of the comparison source character string does not match the character “１” of the comparison character string. The alternative dissimilarity for the third character “１” of the comparison character string is therefore set to the character mismatch dissimilarity V = “1000”.

(a4) The alternative dissimilarity calculation unit 160 compares the fourth characters. The fourth character “１” of the comparison source character string matches the fourth character “１” of the comparison character string. The dissimilarity “102” that the recognition result character string generation unit 120 calculated for the fourth character “１” of the comparison source character string is therefore used as the alternative dissimilarity for the fourth character “１” of the comparison character string. Here, the recognition result character string table 141a shows that the dissimilarity difference between the fourth characters of the first candidate “神奈ノ１１” and the second candidate “伸合１、１” is 215 − 102 = 113, which is at least the per-character adjustment dissimilarity-difference threshold (Y) = 100. The alternative dissimilarity “102” is also at most the per-character adjustment dissimilarity threshold (X) = 200. The alternative dissimilarity calculation unit 160 therefore takes one tenth of the alternative dissimilarity 102 as the new alternative dissimilarity. That is, the alternative dissimilarity for the fourth character “１” becomes 102 / 10 = 10 (fraction discarded).

(a5) The alternative dissimilarity calculation unit 160 compares the fifth characters. The fifth character “１” of the comparison source character string matches the fifth character “１” of the comparison character string. The dissimilarity “108” that the recognition result character string generation unit 120 calculated for the fifth character “１” of the comparison source character string is therefore used as the alternative dissimilarity for the fifth character “１” of the comparison character string.

(a6) The alternative dissimilarity calculation unit 160 sums the per-character alternative dissimilarities obtained in (a1) to (a5) and obtains the alternative dissimilarity total T = “1410”.

(a7) The alternative dissimilarity calculation unit 160 divides the alternative dissimilarity total T by the number of comparison characters “5” of the comparison character string “神奈１１１県” and takes the quotient “282” as the evaluation value.
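The arithmetic of (a1) to (a7) can be checked with a few lines of Python. The helper name `evaluation_value` and its parameters are hypothetical, and integer (floor) division is an assumption based on the text describing the evaluation value as a quotient.

```python
def evaluation_value(per_char, n_compare, separation_penalty=0):
    """Return (alternative dissimilarity total T, evaluation value).

    per_char           -- alternative dissimilarity of each compared character
    n_compare          -- number of comparison characters
    separation_penalty -- W when the valid character counts differ, else 0
    """
    total = sum(per_char) + separation_penalty
    return total, total // n_compare


# Per-character values from (a1) to (a5): 135, 157, 1000, 10, 108.
total, value = evaluation_value([135, 157, 1000, 10, 108], 5)
print(total, value)  # 1410 282
```

Passing `separation_penalty=1100` covers the case in which the numbers of valid characters differ and W is added to the sum.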

The evaluation result table 310 also shows, for example, “神奈１１県” as the comparison character string of rank “2”. The alternative dissimilarity calculation unit 160 obtains the evaluation value of “神奈１１県” (where “県” is excluded from the comparison) against the comparison source character string “神奈１１” as follows.

(b1) The alternative dissimilarity calculation unit 160 compares the first characters. The character “神” of the comparison source character string matches the character “神” of the comparison character string. The dissimilarity “135” that the recognition result character string generation unit 120 calculated for “神” of the comparison source character string is therefore used as the alternative dissimilarity for “神” of the comparison character string.

(b2) The alternative dissimilarity calculation unit 160 compares the second characters. The character “奈” of the comparison source character string matches the character “奈” of the comparison character string. The dissimilarity “157” that the recognition result character string generation unit 120 calculated for “奈” of the comparison source character string is therefore used as the alternative dissimilarity for “奈” of the comparison character string.

(b3) The alternative dissimilarity calculation unit 160 compares the third characters. The third character “１” of the comparison source character string matches the third character “１” of the comparison character string. The dissimilarity “102” that the recognition result character string generation unit 120 calculated for the third character “１” of the comparison source character string is therefore used as the alternative dissimilarity for the third character “１” of the comparison character string. Here, the recognition result character string table 141a shows that the dissimilarity difference between the fourth characters of the first candidate “神奈ノ１１” and the second candidate “伸合１、１” is 215 − 102 = 113, which is at least the per-character adjustment dissimilarity-difference threshold (Y) = 100. This fourth character of the first candidate corresponds to the third character “１” of the comparison source character string “神奈１１”. The alternative dissimilarity “102” is also at most the per-character adjustment dissimilarity threshold (X) = 200. The alternative dissimilarity calculation unit 160 therefore takes one tenth of the alternative dissimilarity 102 as the new alternative dissimilarity. That is, the alternative dissimilarity for the third character “１” becomes 102 / 10 = 10 (fraction discarded).

(b4) The alternative dissimilarity calculation unit 160 adds the one-character separation dissimilarity W = 1100, which reflects the difference in the number of valid characters, to the sum of the per-character alternative dissimilarities obtained in (b1) to (b3) and obtains the alternative dissimilarity total T = “1402”.

(b5) The alternative dissimilarity calculation unit 160 divides the alternative dissimilarity total T by the number of comparison characters “4” of the comparison character string “神奈１１県” and takes the quotient “350” as the evaluation value.

The evaluation result table 310 also shows, for example, “（ネ）申奈１１１県” as the comparison character string of rank “10”. The comparison source character string in this case is “神奈ノ１１”, which is identical to the recognition result character string. The alternative dissimilarity calculation unit 160 obtains the evaluation value of “（ネ）申奈１１１県” (where “県” is excluded from the comparison) against the comparison source character string “神奈ノ１１” as follows. Note that “（ネ）申奈１１１県” was generated from the misreading candidate character string “ネ申奈１１１県”, and the parentheses in “（ネ）” indicate that this character is excluded from the comparison.

(c1) The alternative dissimilarity calculation unit 160 compares the first characters. The character “神” of the comparison source character string does not match the character “申” of the comparison character string. The alternative dissimilarity for “申” of the comparison character string is therefore set to the character mismatch dissimilarity V = “1000”.

(c2) The alternative dissimilarity calculation unit 160 compares the second characters. The character “奈” of the comparison source character string matches the character “奈” of the comparison character string. The dissimilarity “157” that the recognition result character string generation unit 120 calculated for “奈” of the comparison source character string is therefore used as the alternative dissimilarity for “奈” of the comparison character string.

(c3) The alternative dissimilarity calculation unit 160 compares the third characters. The character “ノ” of the comparison source character string does not match the character “１” of the comparison character string. The alternative dissimilarity for the third character “１” of the comparison character string is therefore set to the character mismatch dissimilarity V = “1000”.

(c4) The alternative dissimilarity calculation unit 160 compares the fourth characters. The fourth character “１” of the comparison source character string matches the fourth character “１” of the comparison character string. The dissimilarity “102” that the recognition result character string generation unit 120 calculated for the fourth character “１” of the comparison source character string is therefore used as the alternative dissimilarity for the fourth character “１” of the comparison character string. Here, the recognition result character string table 141a shows that the dissimilarity difference between the fourth characters of the first candidate “神奈ノ１１” and the second candidate “伸合１、１” is 215 − 102 = 113, which is at least the per-character adjustment dissimilarity-difference threshold (Y) = 100. The alternative dissimilarity “102” is also at most the per-character adjustment dissimilarity threshold (X) = 200. The alternative dissimilarity calculation unit 160 therefore takes one tenth of the alternative dissimilarity 102 as the new alternative dissimilarity. That is, the alternative dissimilarity for the fourth character “１” becomes 102 / 10 = 10 (fraction discarded).

(c5) The alternative dissimilarity calculation unit 160 compares the fifth characters. The fifth character “１” of the comparison source character string matches the fifth character “１” of the comparison character string. The dissimilarity “108” that the recognition result character string generation unit 120 calculated for the fifth character “１” of the comparison source character string is therefore used as the alternative dissimilarity for the fifth character “１” of the comparison character string.

(c6) The alternative dissimilarity calculation unit 160 adds the one-character separation dissimilarity W = 1100, which reflects the difference in the number of valid characters, to the sum of the per-character alternative dissimilarities obtained in (c1) to (c5) and obtains the alternative dissimilarity total T = “3375”.

(c7) The alternative dissimilarity calculation unit 160 divides the alternative dissimilarity total T by the number of comparison characters “5” of the comparison character string “（ネ）申奈１１１県” and takes the quotient “562” as the evaluation value.

The alternative dissimilarity calculation unit 160 calculates the evaluation value of each comparison character string as described above and enters the comparison character strings in the evaluation result table 310 so that a smaller evaluation value means a higher priority.
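The ordering rule can be sketched as a sort on the evaluation value. The function name `rank_by_evaluation` and the pair layout are hypothetical, and only the three rows discussed in the text are used, so the resulting positions are relative to this subset rather than to the full table of FIG. 17.

```python
def rank_by_evaluation(rows):
    """Order (comparison string, evaluation value) pairs, best first.

    A smaller evaluation value means a higher priority. Python's sort is
    stable, so rows with equal values keep their input order.
    """
    return sorted(rows, key=lambda row: row[1])


rows = [
    ("（ネ）申奈１１１県", 562),
    ("神奈１１県", 350),
    ("神奈１１１県", 282),
]
ordered = rank_by_evaluation(rows)
# ordered[0] is the pair for "神奈１１１県", the highest-priority candidate.
```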

The evaluation result table 310 makes it possible to identify the highest-priority comparison character string for a recognition result character string that contains a separated character string.

FIG. 18 is a diagram illustrating a second specific example of an evaluation result. The evaluation result table 320 is generated by the alternative dissimilarity calculation unit 160 and output to the corrected character string specifying unit 170. The evaluation result table 320 illustrates the case where the recognition result character string is “神剛”.

The structure of the evaluation result table 320 is the same as that of the evaluation result table 310 shown in FIG. 17, so its description is omitted.

In the evaluation result table 320, for example, “神奈県” is shown as the comparison character string of rank “1”. The alternative dissimilarity calculation unit 160 obtains the evaluation value of “神奈県” (where “県” is excluded from the comparison) against the recognition result character string “神剛” as follows. Note that “神奈県” was generated by the comparison character string generation unit 150 because the number of valid characters “2” of the recognition result character string “神剛” is smaller than the number of valid characters “3” of the corrected character string “神奈川県”.

(d1) The alternative dissimilarity calculation unit 160 compares the first characters. The character “神” of the recognition result character string matches the character “神” of the comparison character string. The dissimilarity “135” that the recognition result character string generation unit 120 calculated for “神” of the recognition result character string is therefore used as the alternative dissimilarity for “神” of the comparison character string.

(d2) The alternative dissimilarity calculation unit 160 compares the second characters. The character “剛” of the recognition result character string does not match the character “奈” of the comparison character string. The alternative dissimilarity for “奈” of the comparison character string is therefore set to the character mismatch dissimilarity V = “1000”.

(d3) The alternative dissimilarity calculation unit 160 sums the per-character alternative dissimilarities obtained in (d1) and (d2) and obtains the alternative dissimilarity total T = “1135”.

(d4) The alternative dissimilarity calculation unit 160 divides the alternative dissimilarity total T by the number of comparison characters “2” of the comparison character string “神奈県” and takes the quotient “567” as the evaluation value.

Evaluation values are calculated in the same way for the comparison character strings “奈川県” and “香１県”. Because neither comparison character string contains a character that matches the recognition result character string, the evaluation value of each is calculated as “1000”. Both comparison character strings therefore have the same priority.
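The FIG. 18 values can be reproduced with a position-by-position comparison, sketched below. The function name `evaluate` is hypothetical. The sketch deliberately omits the one-tenth reduction and the separation dissimilarity W, neither of which applies in this example, and it expects the excluded trailing “県” to have been removed already. The recognition dissimilarity of “剛” is not given in the text, so the value 150 is a placeholder; it never affects the result because “剛” matches no comparison character.

```python
V = 1000  # character mismatch dissimilarity (FIG. 11)


def evaluate(recognized, comparison, scores):
    """Return the evaluation value of one comparison string.

    recognized -- recognition result string, e.g. "神剛"
    comparison -- comparison string without excluded characters
    scores     -- recognition dissimilarity of each recognized character
    """
    if not (len(recognized) == len(comparison) == len(scores)):
        raise ValueError("all inputs must have the same length")
    # A matching position keeps its recognition dissimilarity; a mismatch gets V.
    per_char = [s if r == c else V
                for r, c, s in zip(recognized, comparison, scores)]
    return sum(per_char) // len(comparison)


scores = [135, 150]  # 135 is from (d1); 150 is a placeholder for "剛"
print(evaluate("神剛", "神奈", scores))  # 567
print(evaluate("神剛", "奈川", scores))  # 1000
print(evaluate("神剛", "香１", scores))  # 1000
```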

FIG. 19 is a diagram illustrating a display example of a character recognition result confirmation window. The character recognition result confirmation window 400 is generated by the output unit 180 and displayed on the monitor 11.

The character recognition result confirmation window 400 has a candidate display area 410, a confirmation button display area 420, and an other-candidate selection button display area 430.

The candidate display area 410 displays the corrected character string with the highest accuracy obtained by knowledge correction. Information indicating the accuracy of the knowledge correction for that corrected character string is displayed with it. The information indicating the accuracy is generated by the corrected character string specifying unit 170.

For example, on acquiring the evaluation result table 310 from the alternative dissimilarity calculation unit 160, the corrected character string specifying unit 170 acquires the corrected character string “神奈川県” corresponding to the comparison character string with the highest priority. The corrected character string specifying unit 170 then refers to the accuracy definition table 144 stored in the control information storage unit 140 and specifies the accuracy for the acquired corrected character string “神奈川県”. Specifically, the evaluation value of the first candidate “神奈１１１県” is “282”, which is at most “350”, and its difference from the evaluation value “350” of the second candidate “神奈１１県” is 350 − 282 = 68, which is at least “50”, so the accuracy is specified as “high”.
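The "high" accuracy condition in this example can be sketched as two threshold checks. The function name `accuracy_level` and the parameter names are hypothetical. The limits 350 and 50 are the values quoted in the example; the other levels defined in the accuracy definition table 144 are not described in this passage, so the sketch returns `None` for them.

```python
def accuracy_level(best_value, second_value, max_evaluation=350, min_gap=50):
    """Return "high" when the top candidate is both good and well separated.

    best_value   -- evaluation value of the first candidate
    second_value -- evaluation value of the second candidate
    """
    if best_value <= max_evaluation and second_value - best_value >= min_gap:
        return "high"
    # Other accuracy levels are not specified in this passage.
    return None


print(accuracy_level(282, 350))  # high
```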

In this way, the corrected character string specifying unit 170 specifies the accuracy of the corrected character string to be displayed for each item and notifies the output unit 180 of it. The output unit 180 places the accuracy information in the candidate display area 410 together with the corrected character string of each item acquired from the corrected character string specifying unit 170.

The confirmation button display area 420 provides a button for confirming the candidate of each item displayed in the candidate display area 410 as input data.

The other-candidate selection button display area 430 provides a button for instructing the output unit 180 to display a character string other than the one displayed in the candidate display area 410. On receiving this instruction, the output unit 180 acquires another corrected character string from the corrected character string specifying unit 170 and displays it on the monitor 11.

By viewing the character recognition result confirmation window 400 and operating each button with the keyboard 12 or the mouse 13, the operator can check the character recognition result and correct it again if necessary.

Note that the recognition result character string generation unit 120 may, for example, generate “神奈ノ１１” as the recognition result character string when this string is not yet registered in the prefecture table 131 stored in the misreading candidate information storage unit 130. In this case, the misreading candidate character string “神奈ノ１１” may be newly registered in the prefecture table 131, in association with the correction character string “神奈川”, at the timing when the confirmation button is pressed. “神奈ノ１１” then becomes a comparison target in subsequent knowledge correction, which improves the correction accuracy.
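A minimal sketch of this registration step is shown below. It assumes the prefecture table 131 can be modeled as a dictionary that maps each correction character string to a set of misreading candidate character strings. The function name and the initial table contents are hypothetical and serve only as an illustration.

```python
def register_misreading(table, correction, recognized):
    """Register `recognized` under `correction` unless it is already there.

    Returns True when a new misreading candidate was added.
    """
    candidates = table.setdefault(correction, set())
    if recognized in candidates:
        return False
    candidates.add(recognized)
    return True


# Illustrative table contents (not taken from the actual prefecture table 131).
prefecture_table = {"神奈川": {"神奈１１１"}}

# Called when the operator presses the confirmation button.
added = register_misreading(prefecture_table, "神奈川", "神奈ノ１１")
```

After the call, “神奈ノ１１” is among the candidates compared for “神奈川”, and a second call with the same arguments leaves the table unchanged.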

As described above, the computer 100 performs knowledge correction using misreading candidate information that takes into account separated character strings, in which a single character is recognized as a plurality of separate characters. In the misreading candidate information, misreading candidate character strings that include separated character strings are registered in association with the correction character strings. By comparing these misreading candidate character strings with the recognition result character string read from the form image 200, the computer 100 can identify the correction character string more accurately.

At this time, the computer 100 generates the comparison character string used for comparison with the recognition result character string according to the difference between the number of valid characters in the recognition result character string and the number of valid characters in the misreading candidate character string. As a result, the recognition result character string can easily be compared with the comparison character string by comparing the characters at the same character position (for example, the second position from the beginning).
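The generation of comparison character strings can be sketched as follows. This is an illustration only. It assumes that the number of valid characters equals the plain string length and that every way of removing the surplus characters is tried. The passage fixes neither point, and both function names are hypothetical.

```python
from itertools import combinations


def drop_characters(text, count):
    """Return each distinct string made by deleting `count` characters from `text`."""
    if count <= 0:
        return [text]
    keep = len(text) - count
    if keep < 0:
        return []
    results = []
    for positions in combinations(range(len(text)), keep):
        shortened = "".join(text[i] for i in positions)
        if shortened not in results:
            results.append(shortened)
    return results


def comparison_pairs(recognized, candidate):
    """Pair equal-length strings so they can be compared position by position."""
    diff = len(candidate) - len(recognized)
    if diff > 0:
        # Candidate is longer: shorten the candidate.
        return [(recognized, c) for c in drop_characters(candidate, diff)]
    if diff < 0:
        # Recognition result is longer: shorten the recognition result.
        return [(r, candidate) for r in drop_characters(recognized, -diff)]
    return [(recognized, candidate)]


pairs = comparison_pairs("神奈１１県", "神奈１１１県")
```

Each pair holds two strings of equal length, so the character at a given position in one string can be compared directly with the character at the same position in the other.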

In addition, the degree of discrepancy between the recognition result character string and the comparison character string (the alternative dissimilarity) is evaluated based on the match or mismatch of each character, the difference in the number of valid characters, the superiority over other candidates when the recognition result character string was generated, and the like. Evaluating the alternative dissimilarity in this level of detail makes it possible to identify the most probable comparison character string accurately.
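A simplified scoring function in the spirit of this evaluation is sketched below. The cost values (100 per mismatched position, 50 per character of length difference) are hypothetical placeholders, not the weights used in the embodiment, and the third factor (superiority over other recognition candidates) is omitted for brevity.

```python
def dissimilarity(recognized, comparison, length_gap=0,
                  mismatch_cost=100, gap_cost=50):
    """Score how far `comparison` is from `recognized`; smaller is better.

    Both strings must already have the same length. `length_gap` is the
    number of characters that were removed to equalize the lengths.
    The cost values are illustrative assumptions.
    """
    if len(recognized) != len(comparison):
        raise ValueError("strings must have the same length")
    mismatches = sum(a != b for a, b in zip(recognized, comparison))
    return mismatches * mismatch_cost + length_gap * gap_cost


def best_match(recognized, comparisons):
    """Return the (comparison, length_gap) entry with the lowest score."""
    return min(comparisons,
               key=lambda item: dissimilarity(recognized, item[0], item[1]))


choice = best_match("神奈１１県", [("神奈ノ１県", 0), ("神奈１１県", 1)])
```

In this sketch a perfect position-by-position match that required one character to be removed scores 50, which beats a same-length string with one mismatched character, scoring 100.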

In addition, when the number of valid characters in the correction character string corresponding to a misreading candidate character string is larger than the number of valid characters in the recognition result character string, the computer 100 generates, as a comparison character string, a character string obtained by removing from the correction character string as many characters as the difference in the number of characters. This makes it possible to correct integrated characters, in which a plurality of characters have been merged into a single character.

The functions that the computer 100 should have are realized by executing the knowledge correction program on the computer 100. The program describing the processing contents can be recorded on a recording medium readable by the computer 100. Examples of such a recording medium include a magnetic recording device, an optical disc, a magneto-optical recording medium, and a semiconductor memory.

When the program is distributed, for example, a portable recording medium such as an optical disc on which the program is recorded is sold. The program can also be stored in a storage device of a server computer and transferred from the server computer to another computer via a network.

The computer 100 stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. The computer 100 then reads the program from its own storage device and executes processing according to the program. The computer 100 can also read the program directly from the portable recording medium and execute processing according to it. In addition, each time a program is transferred from the server computer, the computer 100 can execute processing according to the received program.

The knowledge correction program, knowledge correction device, and knowledge correction method of the present invention have been described above based on the illustrated embodiment. However, the present invention is not limited to this embodiment, and the configuration of each unit can be replaced with any configuration having the same function. Any other components or processes may also be added. Further, the present invention may combine any two or more configurations (features) of the embodiment described above.

Description of Symbols
1 Computer
1a Misreading candidate information storage means
1b Recognition result character string generation means
1c Character string comparison means
1d Output means
2 Image information
3 Recognition result character string
4 Correction character string

Claims

A knowledge correction program causing a computer to function as:
character string comparison means for referring to misreading candidate information stored in misreading candidate information storage means, the misreading candidate information defining, in association with each of a plurality of correction character strings, a misreading candidate character string including one or more separated character strings obtained by separating a character included in that correction character string into a plurality of characters, for identifying, from among the misreading candidate character strings, the misreading candidate character string that most closely matches a recognition result character string generated by recognition result character string generation means as a candidate for a character string included in image information, and for identifying the correction character string corresponding to that misreading candidate character string; and
output means for outputting the correction character string identified by the character string comparison means.

The knowledge correction program according to claim 1, wherein the character string comparison means identifies the misreading candidate character string that most closely matches the recognition result character string from among those misreading candidate character strings whose number of characters falls within a range obtained by increasing or decreasing, by a predetermined value, the number of characters of the recognition result character string.

The knowledge correction program according to claim 2, wherein the character string comparison means compares the characters included in the recognition result character string with the characters included in each of the misreading candidate character strings by checking whether the characters at the same character position match or mismatch, calculates a predetermined evaluation value indicating the degree of difference between the recognition result character string and each of the misreading candidate character strings, and identifies, based on the evaluation value, the misreading candidate character string that most closely matches the recognition result character string.

The knowledge correction program according to claim 3, wherein the character string comparison means calculates the evaluation value such that the value becomes larger as the degree of difference increases, and, when the number of characters of the recognition result character string differs from the number of characters of the misreading candidate character string, adds a value corresponding to the difference in the number of characters to the evaluation value.

The knowledge correction program according to claim 3, wherein, when the number of characters of the misreading candidate character string is larger than the number of characters of the recognition result character string, the character string comparison means obtains the difference in the number of characters, generates, in association with the correction character string corresponding to the misreading candidate character string, a plurality of first comparison character strings each obtained by removing from the misreading candidate character string the same number of characters as the difference, and calculates the evaluation value between the recognition result character string and each of the first comparison character strings.

The knowledge correction program according to any one of claims 3 to 5, wherein, when the number of characters of the misreading candidate character string is smaller than the number of characters of the recognition result character string, the character string comparison means obtains the difference in the number of characters, generates, in association with the recognition result character string, a plurality of comparison source character strings each obtained by removing from the recognition result character string the same number of characters as the difference, and calculates the evaluation value between each of the comparison source character strings and the misreading candidate character string.

The knowledge correction program according to any one of claims 3 to 6, wherein, when the number of characters of the correction character string corresponding to the misreading candidate character string is larger than the number of characters of the recognition result character string, the character string comparison means obtains the difference in the number of characters, generates, in association with the correction character string, a plurality of second comparison character strings each obtained by removing from the correction character string the same number of characters as the difference, and calculates the evaluation value between the recognition result character string and each of the second comparison character strings.

The knowledge correction program according to any one of claims 3 to 7, wherein, on acquiring the correction character string, the output means refers to accuracy definition information storage means that stores accuracy information according to the evaluation value, acquires the accuracy information corresponding to the evaluation value calculated by the character string comparison means for the correction character string, and causes a display device to display the accuracy information together with the correction character string.

A knowledge correction device for recognizing a character string included in image information, the device comprising:
character string comparison means for referring to misreading candidate information stored in misreading candidate information storage means, the misreading candidate information defining, in association with each of a plurality of correction character strings, a misreading candidate character string including one or more separated character strings obtained by separating a character included in that correction character string into a plurality of characters, for identifying, from among the misreading candidate character strings, the misreading candidate character string that most closely matches a recognition result character string generated by recognition result character string generation means as a candidate for a character string included in the image information, and for identifying the correction character string corresponding to that misreading candidate character string; and
output means for outputting the correction character string identified by the character string comparison means.

A knowledge correction method performed by a knowledge correction device for recognizing a character string included in image information, wherein:
character string comparison means refers to misreading candidate information stored in misreading candidate information storage means, the misreading candidate information defining, in association with each of a plurality of correction character strings, a misreading candidate character string including one or more separated character strings obtained by separating a character included in that correction character string into a plurality of characters, identifies, from among the misreading candidate character strings, the misreading candidate character string that most closely matches a recognition result character string generated by recognition result character string generation means as a candidate for a character string included in the image information, and identifies the correction character string corresponding to that misreading candidate character string; and
output means outputs the correction character string identified by the character string comparison means.