JP5471254B2

JP5471254B2 - Verification device, verification method, verification program, and creation device

Info

Publication number: JP5471254B2
Application number: JP2009228837A
Authority: JP
Inventors: 浩明武部; 悦伸堀田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-09-30
Filing date: 2009-09-30
Publication date: 2014-04-16
Anticipated expiration: 2029-09-30
Also published as: JP2011076481A

Description

The present invention relates to a verification device, a verification method, a verification program, and a creation device.

Conventionally, there are character recognition devices that optically capture a document containing characters as image data and recognize the characters contained in that image data. There are also verification devices that, after character recognition processing by such a device, use a distance value to verify whether the characters obtained as the recognition result are correct. Hereinafter, such image data is referred to as a character image.

The verification device is now described in more detail. It has a dictionary in which a feature amount is registered for each character. The verification device calculates a distance value indicating the distance between the feature amount computed for a character contained in the character image and the feature amount registered in advance in the dictionary for the character obtained as the recognition result. It then judges that the smaller the distance value, the higher the probability that the recognition result is correct. Here, a feature amount is a parameter relating to the shape of a character, such as the inclination of its strokes or its area.
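The conventional distance-value check described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and made-up two-dimensional feature vectors; the source specifies neither the metric nor the features, and `DICTIONARY`, `distance_value`, and `verify_by_distance` are hypothetical names.

```python
import math

# Hypothetical dictionary: one reference feature vector per character.
# The two components (e.g. stroke slope, ink area) are illustrative only;
# real recognizers use much richer shape features.
DICTIONARY = {
    "l": (0.95, 0.10),
    "1": (0.90, 0.12),
    "A": (0.40, 0.35),
}


def distance_value(features, recognized_char, dictionary=DICTIONARY):
    """Euclidean distance (an assumption) between the input character's
    features and the dictionary entry of the character it was recognized as."""
    return math.dist(features, dictionary[recognized_char])


def verify_by_distance(features, recognized_char, threshold, dictionary=DICTIONARY):
    """Conventional check: accept the recognition result when the distance is small."""
    return distance_value(features, recognized_char, dictionary) <= threshold
```

With a measured "l" at (0.94, 0.11), both "l" and "1" fall under a threshold of 0.1, so a wrongly recognized "1" would still be accepted. This is the weakness with similar characters that the following paragraphs describe.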

Japanese Patent Laid-Open No. 10-63784

However, the verification device described above has the problem that its verification results are inaccurate. Accuracy is particularly poor when verifying similar characters, that is, characters for which another character of similar shape exists. Examples of similar characters are "l" (lowercase L) and "1" (one), or the katakana "イ" and its small form "ィ".

Consider, for example, a case in which the character in the character image is "l" (lowercase L) but the character obtained as the recognition result is "1" (one). Because "l" and "1" are similar in shape, the distance between the feature amount calculated for "l" and the feature amount registered in the dictionary for "1" can be small. In that case, the verification device described above judges that the result is likely to be correct. In other words, it may judge a recognition result to be correct with high probability even when the character obtained is wrong.

The disclosed technology has been made in view of the above, and an object thereof is to provide a verification device, a verification method, a verification program, and a creation device capable of accurately verifying whether a character obtained as a character recognition result is correct.

In one aspect, the disclosed verification device includes a character recognition unit that, when a character image is input, executes character recognition processing on the input character image. The verification device also includes a creation unit that creates a verification formula for verifying, when the result of character recognition processing on a character contained in a character image is a first character, whether that result is correct. The creation unit creates the formula using (a) a condition that distinguishes the first character from a second character that may be obtained when the first character is misrecognized, and (b) attribute values which, for each of the first and second characters, include at least one of: information indicating the size of the character within the character image, information indicating the relationship between the character and other nearby characters, and information indicating the likelihood of the character recognition result for the character. The verification device further includes a verification unit that identifies whether the result of the character recognition processing by the character recognition unit includes the first character and, if it does, performs verification using the verification formula created by the creation unit.

According to one aspect of the disclosed verification device, it is possible to accurately verify whether a character obtained as a character recognition result is correct.

FIG. 1 is a block diagram illustrating an example configuration of the verification device according to the first embodiment.
FIG. 2 is a block diagram illustrating an example configuration of the verification device according to the second embodiment.
FIG. 3 is a diagram illustrating an example of information stored in the learning data table in the second embodiment.
FIG. 4 is a diagram illustrating an example of information stored in the best integrated logical expression table in the second embodiment.
FIG. 5 is a diagram illustrating an example of a display screen in the second embodiment.
FIG. 6-1 is a diagram illustrating an overview of the processing performed by the best logical expression creation unit in the second embodiment.
FIG. 6-2 is a diagram illustrating an overview of the processing performed by the best logical expression creation unit in the second embodiment.
FIG. 6-3 is a diagram illustrating an overview of the processing performed by the best logical expression creation unit in the second embodiment.
FIG. 6-4 is a diagram illustrating an overview of the processing performed by the best logical expression creation unit in the second embodiment.
FIG. 7-1 is a diagram illustrating generalization processing in the second embodiment.
FIG. 7-2 is a diagram illustrating generalization processing in the second embodiment.
FIG. 8 is a diagram illustrating processing for selecting the logical expression with the highest evaluation value in the second embodiment.
FIG. 9-1 is a diagram illustrating an overview of the processing performed by the best integrated logical expression creation unit in the second embodiment.
FIG. 9-2 is a diagram illustrating an overview of the processing performed by the best integrated logical expression creation unit in the second embodiment.
FIG. 9-3 is a diagram illustrating an overview of the processing performed by the best integrated logical expression creation unit in the second embodiment.
FIG. 9-4 is a diagram illustrating an overview of the processing performed by the best integrated logical expression creation unit in the second embodiment.
FIG. 10 is a flowchart illustrating an example flow of the best integrated logical expression creation processing in the second embodiment.
FIG. 11 is a flowchart illustrating an example flow of the best logical expression creation processing in the second embodiment.
FIG. 12 is a flowchart illustrating an example flow of the generalization processing in the second embodiment.
FIG. 13 is a flowchart illustrating an example flow of the processing for selecting the logical expression with the highest evaluation value in the second embodiment.
FIG. 14 is a flowchart illustrating an example flow of the verification processing in the second embodiment.
FIG. 15 is a diagram illustrating an example of a computer that executes the verification program according to the second embodiment.

Embodiments of the disclosed verification device, verification method, verification program, and creation device are described in detail below with reference to the drawings. The disclosed invention is not limited by these embodiments. The embodiments may be combined as appropriate as long as their processing does not conflict.

An example configuration of a verification device 100 according to the first embodiment is described with reference to FIG. 1, which is a block diagram illustrating that configuration. In the example illustrated in FIG. 1, the verification device 100 includes a creation unit 101, a character recognition unit 102, and a verification unit 103.

The creation unit 101 creates a verification formula for verifying, when the result of character recognition processing on a character contained in a character image is a first character, whether that result is correct. It does so using a condition that distinguishes the first character from a second character that may be obtained when the first character is misrecognized, together with attribute values. For each of the first and second characters, the attribute values include at least one of: information indicating the size of the character within the character image, information indicating the relationship between the character and other nearby characters, and information indicating the likelihood of the character recognition result for the character.

When a character image is input, the character recognition unit 102 executes character recognition processing on it. The verification unit 103 identifies whether the result of the character recognition processing by the character recognition unit 102 includes the first character and, if it does, performs verification using the verification formula created by the creation unit 101.
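The division of work among the three units can be illustrated with a short sketch. It assumes, purely for illustration, that a verification formula is a Python callable over a dictionary of attribute values and that formulas are looked up by the first (target) character; `verify_results` and the rule used in the example are hypothetical, not the patent's formula.

```python
from typing import Callable, Dict, List, Mapping, Optional, Tuple

# Hypothetical interface: a verification formula receives the attribute values
# of one recognized character and returns True if the result is judged correct.
VerificationFormula = Callable[[Mapping[str, float]], bool]


def verify_results(
    results: List[Tuple[str, Mapping[str, float]]],  # (recognized char, attributes)
    formulas: Dict[str, VerificationFormula],        # keyed by the first character
) -> List[Tuple[str, Optional[bool]]]:
    """Verify only those recognized characters that have a verification formula."""
    verdicts: List[Tuple[str, Optional[bool]]] = []
    for char, attrs in results:
        formula = formulas.get(char)
        if formula is None:
            # Not a first character: the verification unit does nothing here.
            verdicts.append((char, None))
        else:
            verdicts.append((char, formula(attrs)))
    return verdicts
```

For example, with a made-up rule `{"イ": lambda a: a["correct_reading_probability"] >= 900}`, a recognized "ア" is passed through unverified, while each recognized "イ" is accepted or rejected by the rule.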

That is, when a recognition result includes a character that is easily misrecognized, the verification device 100 verifies whether that character is correct using a verification formula created in advance that takes a variety of information into account. As a result, the verification device 100 according to the first embodiment can accurately verify whether a character obtained as a character recognition result is correct.

[Configuration of Verification Device According to Second Embodiment]
A verification device 200 according to the second embodiment is described next. First, an example configuration of the verification device 200 is described with reference to FIG. 2, which is a block diagram illustrating that configuration. In the example illustrated in FIG. 2, the verification device 200 includes an input unit 201, a display unit 202, a storage unit 300, and a control unit 400.

The input unit 201 is connected to the control unit 400. It receives information input by the user and passes the received information to the control unit 400. The input unit 201 is, for example, a keyboard, a mouse, a microphone, or an image scanner or camera that captures a character image of a document. The display unit 202 is also connected to the control unit 400; it receives information from the control unit 400 and displays it to the user. The display unit 202 is, for example, a monitor, display, or touch panel.

Details of the information received by the input unit 201 and displayed by the display unit 202 are omitted here and are described together with the related units.

The storage unit 300 is connected to the control unit 400 and stores data used in the various processes performed by the control unit 400. The storage unit 300 is, for example, a semiconductor memory device such as RAM (Random Access Memory), ROM (Read Only Memory), or flash memory, or a storage device such as a hard disk or an optical disc. In the example illustrated in FIG. 2, the storage unit 300 has a learning data table 301 and a best integrated logical expression table 302.

The learning data table 301 stores in advance, in association with character information indicating a character contained in a character image, attribute values relating to that character. Specifically, the attribute values include information indicating the size of the character within the character image, information indicating its relationship with other nearby characters, and information indicating the likelihood of the character recognition result for the character.

In the following description, it is assumed as an example that the verification device 200 has the learning data table 301 and that this table stores information in advance. However, the present invention is not limited to this. For example, the verification device 200 may omit the learning data table 301, and the user may instead input attribute values into the verification device 200 each time the control unit 400 executes processing.

An example of information stored in the learning data table 301 is now described in more detail with reference to FIG. 3, which illustrates such information in the second embodiment. In the example in FIG. 3, the table also stores a "recognition result", which is the character obtained by character recognition processing on the character image. The "correct answer" in FIG. 3 is the character actually contained in the character image. For convenience, the association of a "correct answer", a "recognition result", and "attribute values" is hereinafter referred to as "case data".

In the example in FIG. 3, the stored "attribute values" are "coordinates", "distance value", "correct reading probability", "morpheme", "bigram probability", and "line coordinates". "Coordinates" and "line coordinates" indicate the size of a character within the character image. "Morpheme" and "bigram probability" indicate the relationship with other nearby characters (characters adjacent to the character). "Distance value" and "correct reading probability" indicate the likelihood of the character recognition result for the character.
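A row of case data with its three groups of attribute values could be modeled as below. The field names and integer types are assumptions chosen to match the example values in FIG. 3, not a schema given in the source; only the morpheme codes (1, 2, 3) come from the figure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CaseData:
    """One row of the learning data table (hypothetical field names)."""

    correct: str        # character actually contained in the character image
    recognized: str     # character returned by character recognition
    # Size within the character image
    box: Tuple[int, int, int, int]       # (xs, ys, xe, ye)
    line_box: Tuple[int, int, int, int]  # (xs0, ys0, xe0, ye0)
    # Likelihood of the recognition result
    distance: int
    correct_reading_prob: int
    # Relationship with neighbouring characters
    morpheme: int       # 1 = symbol, 2 = registered word, 3 = unregistered word
    bigram: int

    @property
    def is_misrecognized(self) -> bool:
        return self.correct != self.recognized

    def attribute_groups(self) -> dict:
        """Group the attribute values by the kind of information they carry."""
        return {
            "size": {"box": self.box, "line_box": self.line_box},
            "neighbour_relation": {"morpheme": self.morpheme, "bigram": self.bigram},
            "likelihood": {
                "distance": self.distance,
                "correct_reading_prob": self.correct_reading_prob,
            },
        }
```

For instance, the FIG. 3 row for "カ" would be `CaseData("カ", "カ", (711, 2061, 747, 2101), (706, 68, 774, 2266), 666, 984, 2, -548112)`.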

The "distance value" and "correct reading probability" are obtained, for example, by executing character recognition processing on the character image. The "coordinates" and "line coordinates" are obtained, for example, by identifying the positions of characters and lines within the character image. The "morpheme" and "bigram probability" are obtained, for example, by performing morphological analysis and calculating bigram probabilities for the characters or character strings obtained by character recognition processing.

"Coordinates" indicate the position of a character in the character image. In the example in FIG. 3, an x-axis and a y-axis are set on the image represented by the character image, and the coordinates consist of "xs" and "ys", the upper-left point of the character, and "xe" and "ye", the lower-right point of the character.
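The two corner points directly yield the character's size. This sketch assumes y grows downward, as is usual for image coordinates, and treats xe and ye as exclusive, so width = xe - xs; the source does not say whether the end coordinates are inclusive (which would add 1). `Box` is a hypothetical name.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box: (xs, ys) is the upper-left point, (xe, ye) the lower-right.
    Assumes y grows downward and end coordinates are exclusive."""

    xs: int
    ys: int
    xe: int
    ye: int

    @property
    def width(self) -> int:
        return self.xe - self.xs

    @property
    def height(self) -> int:
        return self.ye - self.ys

    @property
    def area(self) -> int:
        return self.width * self.height
```

Using the FIG. 3 coordinates of "カ", `Box(711, 2061, 747, 2101)` has a width of 36, a height of 40, and an area of 1440.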

The "distance value" indicates the distance between the feature amount calculated for a character in the character image and the feature amount, stored in a dictionary table not shown in FIG. 2, of the character obtained as the recognition result. The smaller the distance between the two feature amounts in feature space, the more reliable the recognition result. The dictionary table holds a registered feature amount for each character.

The "correct reading probability" is a value indicating the likelihood of the character recognition result. The higher the value, the higher the probability that the recognition result is correct; the lower the value, the lower that probability.

"Morpheme" is information about the part of speech obtained for each morpheme by morphological analysis. Morphological analysis divides a sentence into meaningful words and determines their parts of speech using a dictionary. For example, morphological analysis of the sentence "私は走った。" ("I ran.") yields "私 = noun", "は = particle", "走っ = verb", "た = auxiliary verb", and "。 = punctuation". In the example in FIG. 3, the morpheme is one of "unregistered word", a morpheme whose part of speech could not be determined using the dictionary; "registered word", a morpheme whose part of speech could be determined using the dictionary; or "symbol". In FIG. 3, the morpheme value "1" indicates a symbol, "2" a registered word, and "3" an unregistered word.

The part of speech cannot be determined using the dictionary when a word delimited by morphological analysis is not registered in the dictionary. For example, if the noun "フィンランド" (Finland) is not registered in the dictionary, the morpheme value of each character forming "フィンランド" is "unregistered word".
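How per-character morpheme codes might be derived is sketched below with a toy longest-match lookup against a tiny lexicon. A real system would use a full morphological analyzer; `morpheme_codes`, the lexicon, and the symbol set are illustrative assumptions, while the codes 1, 2, and 3 follow FIG. 3.

```python
# Morpheme codes as used in FIG. 3
SYMBOL, REGISTERED, UNREGISTERED = 1, 2, 3


def morpheme_codes(text, lexicon, symbols=frozenset("（）()。、")):
    """Assign a morpheme code to every character of `text`.

    Toy stand-in for morphological analysis: characters in `symbols` are
    symbols; the longest lexicon word starting at the current position is a
    registered word; anything else is marked unregistered one character at a time.
    """
    codes = []
    max_len = max((len(w) for w in lexicon), default=0)
    i = 0
    while i < len(text):
        if text[i] in symbols:
            codes.append(SYMBOL)
            i += 1
            continue
        # Length of the longest dictionary word beginning at position i (0 if none)
        match = next(
            (n for n in range(min(max_len, len(text) - i), 0, -1)
             if text[i:i + n] in lexicon),
            0,
        )
        if match:
            codes.extend([REGISTERED] * match)
            i += match
        else:
            codes.append(UNREGISTERED)
            i += 1
    return codes
```

With a lexicon lacking "フィンランド", every one of its six characters receives code 3, matching the example above.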

The "bigram probability" is a linguistic probability; specifically, it is a value relating to the probability that a character X2 appears immediately after a character X1. Taking "フィンランド" as an example, the bigram probability of "ィ" reflects the probability that "フ" appears before "ィ". Likewise, if the word "大統領" (president) occurs frequently, the bigram probability that "大" appears before "統" is higher than the bigram probability that "大" appears before some other character, such as "武". In the example in FIG. 3, the bigram probability is the value log(p) * C, calculated from a frequency ratio p and a constant C, where p is the probability that X2 appears after X1. With this definition, the closer the bigram probability is to 0, the higher the probability.
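The log(p) * C score can be computed from character-pair counts as below. This is a sketch under stated assumptions: p is estimated as count(X1 X2) divided by the number of times X1 is followed by any character, the logarithm is natural, and C = 100000 is a placeholder; the source gives none of these details, and `bigram_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bigram_scores(corpus, C=100_000.0):
    """Build a scorer returning log(p) * C, where p estimates the probability
    that x2 follows x1 (natural log and C are assumptions)."""
    pair_counts = Counter()
    left_counts = Counter()
    for text in corpus:
        for x1, x2 in zip(text, text[1:]):
            pair_counts[(x1, x2)] += 1
            left_counts[x1] += 1

    def score(x1, x2):
        if pair_counts[(x1, x2)] == 0:
            return -math.inf  # unseen pair; a real system would apply smoothing
        p = pair_counts[(x1, x2)] / left_counts[x1]
        return math.log(p) * C  # p <= 1, so the score is <= 0; nearer 0 = more likely

    return score
```

In a corpus where "大統領" occurs twice and "大武" once, `score("大", "統")` is closer to 0 than `score("大", "武")`, and `score("統", "領")` is exactly 0 because "領" always follows "統".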

"Line coordinates" indicate the position of the line to which the character actually contained in the character image belongs. In the example in FIG. 3, the line coordinates consist of "xs0" and "ys0", the upper-left point of that line, and "xe0" and "ye0", the lower-right point of the line.

A concrete example of information stored in the learning data table 301 is now described with reference to FIG. 3. In the example in FIG. 3, the character image contains "カーディオバイク（運動" ("cardio bike (exercise"). That is, as shown in the "correct answer" column of FIG. 3, the learning data table 301 stores, for each character and symbol in "カーディオバイク（運動", the information obtained by character recognition processing. More specifically, as shown in FIG. 3, for the correct answer "カ" the table stores the recognition result "カ" and, as coordinates, xs "711", ys "2061", xe "747", ye "2101". In other words, the table records that the character image contains "カ", that character recognition produced "カ", and that the upper-left point of "カ" in the character image is at (xs, ys) = (711, 2061) and its lower-right point is at (xe, ye) = (747, 2101).

For the correct answer "カ", the learning data table 301 also stores, for example, a distance value of "666", a correct reading probability of "984", a morpheme value of "2", and a bigram probability of "-548112". It also stores, as line coordinates, xs0 "706", ys0 "68", xe0 "774", ye0 "2266"; that is, the upper-left point of the line containing "カ" is at (xs0, ys0) = (706, 68) and its lower-right point is at (xe0, ye0) = (774, 2266). The table likewise stores attribute values for each correct answer after "カ".
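Using the stored values for "カ", its size relative to its line can be computed as follows. This is an illustrative sketch: `relative_size` is a hypothetical helper, and the source does not say here how the size information enters the verification formula. Note that the line box is 68 wide and 2198 tall, which suggests a vertically written line.

```python
def relative_size(box, line_box):
    """Ratio of a character's width and height to those of its line.

    Both boxes are (x_start, y_start, x_end, y_end) in the same pixel grid:
    (xs, ys, xe, ye) for the character and (xs0, ys0, xe0, ye0) for the line.
    """
    xs, ys, xe, ye = box
    xs0, ys0, xe0, ye0 = line_box
    return (xe - xs) / (xe0 - xs0), (ye - ys) / (ye0 - ys0)


# FIG. 3 values for "カ"
width_ratio, height_ratio = relative_size((711, 2061, 747, 2101), (706, 68, 774, 2266))
# width_ratio = 36 / 68 (about 0.53), height_ratio = 40 / 2198 (about 0.018)
```

A hypothetical use would be comparing such ratios for "イ" and the smaller "ィ", which differ mainly in size.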

As shown in FIG. 4, the best integrated logical expression table 302 stores a best integrated logical expression for each "combination of similar characters". FIG. 4 illustrates an example of information stored in this table in the second embodiment. A "combination of similar characters" is a combination of characters that tend to be confused in character recognition processing and, as shown in FIG. 4, consists of a "target character" and a "character easily misrecognized". The "target character" is the character subject to character recognition processing, and the "character easily misrecognized" is a character that may be obtained when character recognition processing misrecognizes the target character; it is a character similar in shape to the target character. The target character is also called the "first character", and the character easily misrecognized is also called the "second character".

For example, consider the case where the character "イ" actually contained in a character image tends to be misrecognized as "ィ". In this case, the combination of similar characters consists of the target character "イ" and the easily misrecognized character "ィ". Another example of a combination of similar characters is "l" (lowercase L) and "1" (one).

In the example in FIG. 4, the best integrated logical expression table 302 stores, in association with the character "イ", the best integrated logical expression "(F5(U=0.9) or F6(U=0.64)) and (F5(U=0.76) or F6(U=0.64) or F2(U=2)) and (F5(U=0.9) or F7(U=0.23)) and (F5(U=0.76) or F7(U=0.23) or F2(U=2))". The best integrated logical expression is described in detail later, so its explanation is omitted here. The best integrated logical expression is also called the "verification formula".
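The expression above has the shape of a conjunctive normal form: every parenthesized group must contain at least one atom that holds. The sketch below parses that string and evaluates it. The meaning of an atom Fk(U=u) is only defined later in the patent, so the predicate passed to `evaluate_cnf` (in the example, "attribute k is at least u") is a hypothetical placeholder; `parse_cnf` and `evaluate_cnf` are illustrative names.

```python
import re
from typing import Callable, List, Tuple

Atom = Tuple[int, float]   # (k, u) for an atom written "Fk(U=u)"
Clause = List[Atom]        # atoms joined by "or"

ATOM_RE = re.compile(r"F(\d+)\(U=([0-9.]+)\)")

# Expression stored for "イ" in FIG. 4
FORMULA = ("(F5(U=0.9)orF6(U=0.64))and(F5(U=0.76)orF6(U=0.64)orF2(U=2))"
           "and(F5(U=0.9)orF7(U=0.23))and(F5(U=0.76)orF7(U=0.23)orF2(U=2))")


def parse_cnf(expr: str) -> List[Clause]:
    """Split on 'and' (which never occurs inside an atom), then extract atoms."""
    clauses = []
    for part in expr.split("and"):
        atoms = [(int(k), float(u)) for k, u in ATOM_RE.findall(part)]
        if atoms:
            clauses.append(atoms)
    return clauses


def evaluate_cnf(clauses: List[Clause], atom_holds: Callable[[int, float], bool]) -> bool:
    """True when every clause contains at least one atom that holds."""
    return all(any(atom_holds(k, u) for k, u in clause) for clause in clauses)
```

Example with made-up attribute values: `features = {5: 0.95}` and `evaluate_cnf(parse_cnf(FORMULA), lambda k, u: features.get(k, 0.0) >= u)` returns True, because F5 satisfies every clause.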

As described later, the combinations of similar characters and the best integrated logical expressions are stored in the best integrated logical expression table 302 by the best integrated logical expression creation unit 404 of the control unit 400. The best integrated logical expressions stored in the table are used by the verification unit 406 of the control unit 400.

The control unit 400 is connected to the input unit 201, the display unit 202, and the storage unit 300. The control unit 400 has an internal memory that stores programs defining various control procedures, and it executes various control processes. The control unit 400 is, for example, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), a CPU (Central Processing Unit), or an MPU (Micro Processing Unit).

In the example illustrated in FIG. 2, the control unit 400 includes the following functional units: a reception control unit 401, a learning data acquisition unit 402, a best logical expression creation unit 403, a best integrated logical expression creation unit 404, a character recognition unit 405, and a verification unit 406.

The reception control unit 401 receives combinations of similar characters and conditions from the user via the input unit 201. A "condition" distinguishes the target character from the easily misrecognized character, that is, from a character that may be output when character recognition processing misrecognizes the target character. For example, the reception control unit 401 displays a screen such as the one shown in FIG. 5 on the display unit 202. It then receives combinations of similar characters and conditions from the user through that screen. FIG. 5 is a diagram illustrating an example of the display screen in the second embodiment.

In FIG. 5, field 501 receives input of the target character, and field 502 receives input of the easily misrecognized character. In the example shown, イ has been entered as the target character in field 501 and ィ as the easily misrecognized character in field 502. Field 503 receives the definitions used when entering conditions. In the example shown, the symbol "c" is defined to denote the correct reading probability. Field 504 receives input of the conditions themselves, which in the example shown are conditions F1 to F7.

Conditions F1 to F7 shown in FIG. 5 are briefly described here. They are examples of conditions entered by the user when the target character is イ and the easily misrecognized character is ィ. The following description uses conditions F1 to F7, but the present invention is not limited to them, and the user may freely set other conditions.

Condition F1 is "c > U". It states that, if the recognition result is correct, the correct reading probability obtained from the character image to be verified is at least a predetermined value. The condition is based on the fact that the more likely a recognition result is to be correct, the higher its correct reading probability. The correct reading probability obtained from the character image to be verified is substituted for "c". "U" is an undetermined value. The best logical expression creation unit 403, described later, determines it from the attribute values.

Condition F2 is "m = U". It states that, if the recognition result is correct, the morpheme obtained from the character image to be verified is a predetermined morpheme. The condition is based on the fact that, when the recognition result is correct, the morphological analysis result for the recognized character is a predetermined one of "registered word", "unregistered word", and "symbol". A value indicating one of these three categories is used for "U". A value indicating the morpheme obtained from the character image to be verified is substituted for "m".

Condition F3 is "dist(Y) - dist(X) > U". It states that, if the recognition result is correct, the distance value obtained from the character image to be verified is smaller for the target character (X) than for the easily misrecognized character (Y). The condition is based on the fact that, when the recognition result is correct, the distance value for the target character is smaller than that for the easily misrecognized character.

Conditions F1 to F3 are not limited to the pair イ and ィ and can be used for other combinations of similar characters as well. Conditions F4 to F7 are effective when the two characters differ in size, as they do when the target character is イ and the easily misrecognized character is ィ.

Condition F4 is "b(X) - b(Y) > U". It states that, if the recognition result is correct, the bigram probability obtained from the character image to be verified is larger than the bigram probability of ィ. The condition is based on the fact that イ is used more frequently than ィ, so the bigram probability of イ is larger than that of ィ.

Condition F5 is "(ye - ys + 1) / (ye0 - ys0 + 1) > U". It concerns a ratio obtained from the character image to be verified: the numerator is the y-axis extent of the character coordinates, and the denominator is the y-axis extent of the line coordinates. If the recognition result is correct, this ratio is at least a predetermined value. The condition is based on the fact that イ is taller than ィ. As a result, the height of イ divided by the line height is larger than the height of ィ divided by the line height.

Condition F6 is "(xe - xs + 1) / (xe0 - xs0 + 1) > U". It concerns a ratio obtained from the character image to be verified: the numerator is the x-axis extent of the character coordinates, and the denominator is the x-axis extent of the line coordinates. If the recognition result is correct, this ratio is at least a predetermined value. The condition is based on the fact that イ is wider than ィ. As a result, the width of イ divided by the line width is larger than the width of ィ divided by the line width.

Condition F7 is "(ys - ys0 + 1) / (ye0 - ys0 + 1) > U". It concerns a ratio obtained from the character image to be verified. The numerator is the y-axis distance from the upper-right point of the character coordinates to the upper-right point of the line coordinates, and the denominator is the y-axis extent of the line coordinates. If the recognition result is correct, this ratio is at least a predetermined value. The condition is based on the fact that イ is taller than ィ, and that the distance from the top edge of the line to イ is larger than the distance from the top edge of the line to ィ.
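The left-hand sides of conditions F1 to F7 are simple arithmetic on one item of case data, which is then compared with U. The sketch below computes them for horizontal writing, exactly as the formulas above are written. The `Case` field names are hypothetical stand-ins for the symbols c, m, dist(X), dist(Y), b(X), b(Y), (xs, ys), (xe, ye), (xs0, ys0), and (xe0, ye0).

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One item of case data; field names are illustrative, not from the source."""
    c: float        # correct reading probability
    m: int          # morpheme category code
    dist_x: float   # distance value for the target character X
    dist_y: float   # distance value for the easily misrecognized character Y
    b_x: float      # bigram probability of X
    b_y: float      # bigram probability of Y
    xs: int         # character box: start x
    ys: int         # character box: start y
    xe: int         # character box: end x
    ye: int         # character box: end y
    xs0: int        # line box: start x
    ys0: int        # line box: start y
    xe0: int        # line box: end x
    ye0: int        # line box: end y


def condition_values(k: Case) -> dict:
    """Left-hand sides of conditions F1-F7 (horizontal writing), each compared with U."""
    line_h = k.ye0 - k.ys0 + 1   # line height
    line_w = k.xe0 - k.xs0 + 1   # line width
    return {
        "F1": k.c,
        "F2": k.m,
        "F3": k.dist_y - k.dist_x,
        "F4": k.b_x - k.b_y,
        "F5": (k.ye - k.ys + 1) / line_h,
        "F6": (k.xe - k.xs + 1) / line_w,
        "F7": (k.ys - k.ys0 + 1) / line_h,
    }
```

For instance, with the case data used later in this section (character box (633, 178) to (666, 223), line box (614, 64) to (688, 2278)), F6 evaluates to 34/75 and F3 to 789 - 538 = 251.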

The learning data acquisition unit 402 acquires case data related to a combination of similar characters from the learning data table 301. Consider, for example, a case where the target character is "X" and the easily misrecognized character is "Y". Here, the learning data acquisition unit 402 acquires, from the case data stored in the learning data table 301, the case data corresponding to (A) to (D) below.
(A) Case data whose correct answer is "X" and whose recognition result is "X"
(B) Case data whose correct answer is "X" and whose recognition result is "Y"
(C) Case data whose correct answer is "Y" and whose recognition result is "Y"
(D) Case data whose correct answer is "Y" and whose recognition result is "X"

Continuing the example in which the target character is イ and the easily misrecognized character is ィ, the learning data acquisition unit 402 acquires four groups of case data:
(A) correct answer イ, recognition result イ;
(B) correct answer イ, recognition result ィ;
(C) correct answer ィ, recognition result ィ;
(D) correct answer ィ, recognition result イ.
In other words, the learning data acquisition unit 402 acquires every item of case data whose correct answer is イ or ィ and whose recognition result is イ or ィ.

Hereinafter, for convenience of explanation, the set of case data (A) is called "set A" and the set of case data (B) "set B". The union of set A and set B is called "set SX". Likewise, the set of case data (C) is called "set C" and the set of case data (D) "set D". The union of set C and set D is called "set SY".
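The selection of cases (A) to (D) and the sets SX and SY can be sketched as a simple partition on the pair (correct answer, recognition result). The function name and the dictionary layout of a case ("correct" and "result" keys) are assumptions for illustration only.

```python
def partition_cases(cases, x, y):
    """Split case data into sets A-D, SX and SY for the similar-character pair (x, y).

    Each case is assumed to be a dict with "correct" and "result" keys.
    """
    label_of = {(x, x): "A", (x, y): "B", (y, y): "C", (y, x): "D"}
    sets = {"A": [], "B": [], "C": [], "D": []}
    for case in cases:
        label = label_of.get((case["correct"], case["result"]))
        if label is not None:          # cases involving other characters are ignored
            sets[label].append(case)
    sets["SX"] = sets["A"] + sets["B"]   # correct answer is x
    sets["SY"] = sets["C"] + sets["D"]   # correct answer is y
    return sets


# Example with the pair "l" (lowercase L) and "1" (digit one).
cases = [{"correct": "l", "result": "l"}, {"correct": "l", "result": "1"},
         {"correct": "1", "result": "1"}, {"correct": "0", "result": "O"}]
print({name: len(s) for name, s in partition_cases(cases, "l", "1").items()})
```

Cases whose correct answer and recognition result are not both drawn from the pair, such as the "0"/"O" case above, fall into none of the sets.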

The best logical expression creation unit 403 uses the conditions entered by the user and the case data to create logical expressions. For each item of case data whose correct answer is the target character, it creates a logical expression that distinguishes the target character from the easily misrecognized character. The overall flow of processing by the best logical expression creation unit 403 is described here with reference to FIGS. 6-1 to 6-4. These figures illustrate that overall flow in the second embodiment.

FIGS. 6-1 to 6-4 plot the attribute values contained in each item of case data. Specifically, "C" in FIGS. 6-1 to 6-4 denotes case data whose correct answer is the target character, for example case data whose correct answer is "X", i.e., each item of case data in set SX. "E" denotes case data whose correct answer is the easily misrecognized character, for example case data whose correct answer is "Y", i.e., each item of case data in set SY.

FIG. 6-1 shows an example of the case data contained in set SX and set SY. The best logical expression creation unit 403 processes one item of case data in set SX at a time, in four steps:
1. As indicated by the arrow in FIG. 6-1, it selects one item of case data in set SX, that is, one "C" in FIG. 6-1.
2. As shown by group 10 in FIG. 6-2, it creates a logical expression satisfied by the selected case data.
3. As shown by groups 11 to 13 in FIG. 6-3, it applies generalization processing to that expression. This produces a plurality of logical expressions whose conditions are looser than those of the group 10 expression in FIG. 6-2.
4. As shown by group 14 in FIG. 6-4, it selects, from the logical expressions created by generalization, the one with the highest evaluation value.
The best logical expression creation unit 403 repeats this processing for each item of case data in set SX. A logical expression created by generalization processing is also called an "exclusion conditional expression", and the selected logical expression a "high-evaluation exclusion conditional expression". The evaluation value is described in detail later.

The following describes three parts of the processing performed by the best logical expression creation unit 403 in more detail:
- creating the logical expression shown in FIG. 6-2;
- creating a plurality of logical expressions by the generalization processing shown in FIG. 6-3;
- selecting the logical expression with the highest evaluation value, shown in FIG. 6-4.

The processing for creating the logical expression shown in FIG. 6-2 is described first, using conditions F1 to F7 as examples. Suppose the selected case data has the following attribute values:
- coordinates "(633, 178), (666, 223)";
- distance value "538";
- correct reading probability "778";
- morpheme "2";
- bigram probability "-1489414";
- line coordinates "(614, 64), (688, 2278)".
Suppose also that the distance value of ィ is "789" and its bigram probability is "-2509827". These values for ィ are taken from case data whose correct answer is ィ. Conditions F5 to F7 are used after conversion for vertical writing, as follows.
Condition F5 for vertical writing: "(ye - ys + 1) / (xe0 - xs0 + 1) > U"
Condition F6 for vertical writing: "(xe - xs + 1) / (xe0 - xs0 + 1) > U"
Condition F7 for vertical writing: "(xs - xs0 + 1) / (xe0 - xs0 + 1) < U"

The best logical expression creation unit 403 then uses the attribute values in the case data and the conditions to create equations and inequalities over the attribute values. Take condition F2 as an example. The morpheme in the case data is "2", and the morpheme is often "2" when the character is イ. The best logical expression creation unit 403 therefore sets "U" in "m = U" of condition F2 to "2", creating the equation "m = 2". It creates equations and inequalities for the other conditions in the same way, by setting their "U".

The best logical expression creation unit 403 then combines the created equations and inequalities with AND to create the logical expression for the selected case data. For example, it joins the equation "m = 2" for condition F2 with the equations and inequalities created for the other conditions, producing the following (logical expression 1).

(Logical expression 1) "(c > 778) and (m = 2) and (dist(Y) - dist(X) > 789 - 538) and (b(X) - b(Y) > -1489414 + 2509827) and ((ye - ys + 1) / (xe0 - xs0 + 1) > 46/75) and ((xe - xs + 1) / (xe0 - xs0 + 1) > 34/75) and ((xs - xs0 + 1) / (xe0 - xs0 + 1) < 20/75)"

(Logical expression 1) may also be written as (logical expression 2) below.
(Logical expression 2) "F1(U=778) and F2(U=2) and F3(U=789-538) and F4(U=-1489414+2509827) and F5(U=46/75) and F6(U=34/75) and F7(U=20/75)"

Next, the processing for creating a plurality of logical expressions by the generalization processing shown in FIG. 6-3 is described. Generalization processing loosens the conditions by gradually removing components from a logical expression, which produces more general logical expressions. Specifically, the best logical expression creation unit 403 removes combinations of components from the logical expression to create more general ones. A component is one of the equations or inequalities contained in a logical expression.

Removing components without any rule would cause a combinatorial explosion in the number of logical expressions. The following therefore describes, as an example, a method that removes components step by step while keeping only the three logical expressions with the highest evaluation values at each step. This method is called "beam search". Beam search is only an example; the present invention is not limited to it, and other known methods may be used. Likewise, the limit of three logical expressions is only an example, and the user may set any value.

Generalization processing is first described for the case of removing one component from the logical expression created in FIG. 6-2. FIG. 7-1 is a diagram illustrating generalization processing in the second embodiment. In FIG. 7-1, "601" denotes the logical expression created in FIG. 6-2. "1" to "7" denote the equations and inequalities created by the best logical expression creation unit 403.

In the example indicated by "602" in FIG. 7-1, the best logical expression creation unit 403 creates seven logical expressions. Each one is the expression indicated by "601" with one of its components removed. An "×" in "602" marks the component removed by generalization processing. Then, as indicated by "603" in FIG. 7-1, the best logical expression creation unit 403 selects the three expressions in "602" with the highest evaluation values. This selection works in the same way as the processing described with reference to FIG. 6-4 and is explained later.

Next, FIG. 7-2 is used to describe how one more component is removed after one component has already been removed from the logical expression created in FIG. 6-2. FIG. 7-2 is a diagram illustrating generalization processing in the second embodiment. In FIG. 7-2, "604" denotes the logical expressions selected in "603" of FIG. 7-1. "1" to "7" denote the equations and inequalities created by the best logical expression creation unit 403.

As indicated by "604" in FIG. 7-2, the best logical expression creation unit 403 takes the three logical expressions indicated by "603" in FIG. 7-1 as the targets of generalization processing. Specifically, it removes one component at a time from each of the three expressions indicated by "604". In the example indicated by "605" in FIG. 7-2, this creates 18 logical expressions. Then, as indicated by "606" in FIG. 7-2, the best logical expression creation unit 403 selects the three expressions in "605" with the highest evaluation values. This selection works in the same way as the processing described with reference to FIG. 6-4 and is explained later.

By repeating the same processing, the best logical expression creation unit 403 removes components step by step from the logical expression created in FIG. 6-2 and creates a plurality of logical expressions. An example of the detailed flow of generalization processing is described later with reference to FIG. 12.
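The step-by-step removal described for FIGS. 7-1 and 7-2 can be sketched as a generic beam search over tuples of components. This is a minimal illustration, not the flow of FIG. 12. The `score` callable stands in for the evaluation value, and the function name is hypothetical. Unlike the count of 18 in the text, this sketch drops duplicate children. When three parents each lack one component of a common seed, every pair of parents shares one child, so the second step yields 15 distinct expressions instead of 18.

```python
def generalize(seed, score, beam_width=3):
    """Beam-search generalization: drop one component per step.

    seed: tuple of components (the AND-ed equations/inequalities).
    score: callable giving a sortable evaluation value (higher is better).
    Returns all expressions generated, seed first.
    """
    generated = [seed]
    beam = [seed]
    while beam and len(beam[0]) > 1:
        children, seen = [], set()
        for expr in beam:
            for i in range(len(expr)):
                child = expr[:i] + expr[i + 1:]      # remove the i-th component
                if child not in seen:                # skip duplicates
                    seen.add(child)
                    children.append(child)
        generated.extend(children)
        # keep only the beam_width best children for the next step
        beam = sorted(children, key=score, reverse=True)[:beam_width]
    return generated


# Components 1-7 stand for the seven equations/inequalities of FIG. 7-1.
candidates = generalize((1, 2, 3, 4, 5, 6, 7), score=lambda e: -sum(e))
print(len([e for e in candidates if len(e) == 6]))   # 7 expressions after step 1
```

The beam width bounds the work per step to beam_width × (number of components) candidates, which is what keeps the search from exploding combinatorially.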

Next, FIG. 8 is used to describe how the logical expression with the highest evaluation value, shown in FIG. 6-4, is selected. FIG. 8 is a diagram illustrating this selection processing in the second embodiment. "701" in FIG. 8 denotes logical expressions created by the best logical expression creation unit 403. In the example indicated by "701", "S0" denotes the logical expression from which no component has been removed. "S1" denotes logical expressions from which one component has been removed.

As indicated by "702" in FIG. 8, the best logical expression creation unit 403 selects the logical expression with the highest evaluation value from those indicated by "701". It determines the evaluation value from three counts:
- the number of case data the logical expression should satisfy (the positive case count), i.e., how many items of case data in set SX the expression explains. In the example of FIG. 6-4, this count is 3.
- the number of case data it should not satisfy (the negative case count), i.e., how many items of case data in set SY the expression satisfies. In the example of FIG. 6-4, this count is 0.
- the number of components, i.e., how many equations and inequalities over attributes of the recognition result are joined by AND or OR conditions. For the logical expression indicated by "702" in FIG. 8, this count is 6.

The best logical expression creation unit 403 ranks logical expressions by these counts. An expression with fewer negative cases is judged to have a higher evaluation value than one with more. An expression with more positive cases is judged to have a higher evaluation value than one with fewer. An expression with more components is judged to have a higher evaluation value than one with fewer. The best logical expression creation unit 403 then selects the expression judged to have the highest evaluation value. Hereinafter, this expression is called the "best logical expression".
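The text states how each of the three counts moves the evaluation value but not how the counts are combined. The sketch below assumes a lexicographic order in the sequence the criteria are listed: negative cases first, then positive cases, then number of components. This ordering, the function names, and the toy threshold expressions are all assumptions for illustration.

```python
def evaluation_key(expr, positives, negatives, satisfies):
    """Sortable evaluation value for a logical expression (larger is better).

    Assumed ordering: fewer negative cases first, then more positive cases,
    then more components.
    """
    pos = sum(1 for case in positives if satisfies(expr, case))
    neg = sum(1 for case in negatives if satisfies(expr, case))
    return (-neg, pos, len(expr))


def best_expression(candidates, positives, negatives, satisfies):
    """Pick the candidate with the highest evaluation value."""
    return max(candidates,
               key=lambda e: evaluation_key(e, positives, negatives, satisfies))


# Toy example: cases are numbers, an expression is a tuple of lower bounds.
def above_all(expr, x):
    return all(x > t for t in expr)


best = best_expression([(0,), (3,), (3, 4), (6,)], [5, 6, 7], [1, 2], above_all)
print(best)  # (3, 4)
```

Here (3,) and (3, 4) both cover all three positives and no negatives, so the tie is broken by the larger number of components, as the text prescribes.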

As described above, the best logical expression creation unit 403 performs the processing of FIGS. 6-1 to 6-4 for each item of case data in set SX. As a result, it creates a best logical expression for each item of case data in set SX.

Returning to the description of FIG. 2, the best integrated logical expression creation unit 404 uses the best logical expressions to create a best integrated logical expression. This is a logical expression that explains all case data whose correct answer is the target character. The overall flow of processing by the best integrated logical expression creation unit 404 is described here with reference to FIGS. 9-1 to 9-4. These figures illustrate that overall flow in the second embodiment.

FIGS. 9-1 to 9-4 plot the attribute values contained in each item of case data. Specifically, "C" in FIGS. 9-1 to 9-4 denotes case data whose correct answer is the target character, for example case data whose correct answer is "X", i.e., each item of case data in set SX. "E" denotes case data whose correct answer is the easily misrecognized character, for example case data whose correct answer is "Y", i.e., each item of case data in set SY.

First, the best integrated logical expression creation unit 404 selects the best logical expression with the highest evaluation value, for example the best logical expression represented by group 21 in FIG. 9-1. The best integrated logical expression creation unit 404 then performs a covering check: it determines whether any case data is not explained (not covered) by the selected best logical expression. Only case data in which the target character is the correct answer is subject to the covering check. If uncovered case data exists, the best integrated logical expression creation unit 404 selects one piece of uncovered case data, as indicated by the arrow in FIG. 9-2, and selects the best logical expression for that case data, as indicated by group 22 in FIG. 9-2. It then combines the best logical expressions selected in FIGS. 9-1 and 9-2 with an OR condition. Next, as indicated by groups 23 and 24 in FIG. 9-3, it creates a plurality of logical expressions by applying generalization processing to the combined logical expression. Finally, as indicated by group 25 in FIG. 9-4, the best integrated logical expression creation unit 404 selects, from among the logical expressions created by the generalization processing, the one with the highest evaluation value. The best logical expression with the highest evaluation value selected by the best integrated logical expression creation unit 404 is also called a “high-evaluation conditional expression”.

The best integrated logical expression creation unit 404 then determines whether any case data is not explained by the selected logical expression and, if so, repeats the processing described with reference to FIGS. 9-2 to 9-4. When it determines that no such case data remains, it adopts the selected logical expression as the “best integrated logical expression”. As a result, the best integrated logical expression creation unit 404 creates exactly one best integrated logical expression that explains all case data in which the target character is the correct answer.

The generalization processing performed by the best integrated logical expression creation unit 404 is the same as that performed by the best logical expression creation unit 403, so a detailed description is omitted. Likewise, the processing by which the best integrated logical expression creation unit 404 selects the logical expression with the highest evaluation value is the same as that performed by the best logical expression creation unit 403, so a detailed description is omitted.

As described above, the best logical expression creation unit 403 and the best integrated logical expression creation unit 404 cooperate to create, from the conditions and the attribute values of the first and second characters, a verification formula that verifies whether the result of character recognition processing is correct when that result is the first character.

When the character recognition unit 405 receives a character image from the input unit 201, it executes character recognition processing on the image and recognizes the characters contained in it as character codes. Specifically, the character recognition unit 405 calculates a feature amount from the portion of the character image that contains a character, reads from the dictionary table the character whose feature amount is most similar to the calculated one, and takes that character as the character recognition result.
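The dictionary lookup described above can be sketched as a nearest-neighbor search. This is a minimal illustration, not the recognizer of the embodiment: it assumes that feature amounts are fixed-length numeric vectors, that the dictionary table can be modeled as a mapping from character to vector, and that “most similar” means “smallest Euclidean distance”. The names `recognize` and `DictionaryTable` are hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical model of the dictionary table: character -> registered feature vector.
DictionaryTable = Dict[str, Sequence[float]]


def recognize(feature: Sequence[float], table: DictionaryTable) -> str:
    """Return the dictionary character whose feature vector is closest.

    Similarity is modeled as the smallest Euclidean distance (math.dist);
    this metric is an assumption made for illustration only.
    """
    return min(table, key=lambda ch: math.dist(feature, table[ch]))


table = {"イ": [0.9, 0.8], "ィ": [0.4, 0.3]}
print(recognize([0.85, 0.75], table))  # イ
```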

The verification unit 406 identifies whether the character recognition result produced by the character recognition unit 405 contains a target character stored in the best integrated logical expression table 302. For example, it identifies whether the recognition result contains “イ”. If it does, the verification unit 406 reads the best integrated logical expression from the best integrated logical expression table 302, using the target character found in the recognition result as the search key, and performs verification with that expression.

The verification processing by the verification unit 406 is described in more detail. The verification unit 406 determines whether the attribute values for the target character obtained as the character recognition result satisfy the best integrated logical expression read from the best integrated logical expression table 302. If they do, it judges the result likely to be correct; if they do not, it judges the result likely to be incorrect.
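The satisfaction test above can be illustrated with a small evaluator. This sketch assumes a representation that the text does not prescribe: a best integrated logical expression is held as a list of conjunctions joined by OR, and each conjunction is a list of (attribute, operator, threshold) literals joined by AND. The attribute names `distance`, `correct_prob`, and `height` are hypothetical.

```python
import operator
from typing import Dict, List, Tuple

# Assumed representation: literal = (attribute, op, threshold);
# a conjunction is an AND of literals; the expression is an OR of conjunctions.
Literal = Tuple[str, str, float]
Conjunction = List[Literal]
Expression = List[Conjunction]

OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}


def satisfies(attrs: Dict[str, float], expr: Expression) -> bool:
    """True if at least one conjunction has all of its literals satisfied."""
    return any(
        all(OPS[op](attrs[name], value) for name, op, value in conj)
        for conj in expr
    )


def verify(attrs: Dict[str, float], expr: Expression) -> str:
    """Map the satisfaction test to the two verification outcomes."""
    return "likely correct" if satisfies(attrs, expr) else "likely incorrect"


expr = [[("distance", "<=", 120.0), ("correct_prob", ">=", 0.6)],
        [("height", ">=", 30.0)]]
print(verify({"distance": 100.0, "correct_prob": 0.7, "height": 20.0}, expr))
# likely correct
```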

As the attribute values for the target character, the verification unit 406 uses, for example, attribute values calculated by the character recognition unit 405 during character recognition processing, or it calculates the attribute values itself. An example of how the verification unit 406 can calculate attribute values is briefly described below. The method is not limited to the one described below; other known methods or methods set arbitrarily by the user may be used.

For example, to calculate “coordinates” and “line coordinates”, the verification unit 406 identifies the coordinates of a character and of a line in the character image: specifically, the coordinates of the upper-left and lower-right corners of the character, and the coordinates of the upper-left and lower-right corners of the line.
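One simple way to obtain the upper-left and lower-right corners is a bounding box over the ink pixels of a region. The sketch below assumes a binarized image given as rows of 0/1 values with the origin at the top-left; how the image is segmented into character or line regions is outside its scope. `bounding_box` is a hypothetical helper, not part of the described device.

```python
from typing import List, Optional, Tuple

Box = Tuple[Tuple[int, int], Tuple[int, int]]  # ((x1, y1), (x2, y2))


def bounding_box(image: List[List[int]]) -> Optional[Box]:
    """Upper-left and lower-right corners of the ink pixels (value 1).

    Applied to a character region this gives character coordinates;
    applied to a line region it gives line coordinates. Returns None
    when the region contains no ink.
    """
    xs = [x for row in image for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(image) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys)), (max(xs), max(ys))


img = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 1, 0]]
print(bounding_box(img))  # ((1, 1), (2, 2))
```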

For example, to calculate the “distance value”, the verification unit 406 calculates a feature amount for the character contained in the character image and acquires, from among the feature amounts in the dictionary table, the feature amount of the character obtained as the character recognition result. It then calculates the distance between the calculated feature amount and the acquired feature amount; this distance is the distance value.
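A minimal sketch of the distance value follows. The text does not specify the distance measure, so Euclidean distance via `math.dist` is an assumption, as is modeling the dictionary table as a mapping from character to feature vector.

```python
import math
from typing import Dict, Sequence


def distance_value(feature: Sequence[float],
                   table: Dict[str, Sequence[float]],
                   recognized: str) -> float:
    """Distance between the input feature vector and the dictionary
    feature vector of the recognized character.

    Euclidean distance is assumed; math.dist raises ValueError if the
    two vectors have different lengths.
    """
    return math.dist(feature, table[recognized])


table = {"イ": [3.0, 4.0], "ィ": [1.0, 1.0]}
print(distance_value([0.0, 0.0], table, "イ"))  # 5.0
```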

For example, to calculate the “correct reading probability”, the verification unit 406 calculates a feature amount for the character contained in the character image and acquires, from among the feature amounts in the dictionary table, the feature amount closest to the calculated one and the feature amount second closest to it. The closest feature amount is that of the character obtained as the character recognition result. The verification unit 406 then calculates the distance value “d1” between the calculated feature amount and the closest feature amount, and the distance value “d2” between the calculated feature amount and the second-closest feature amount. It calculates the correct reading probability so that the value is higher the smaller “d1” is relative to “d2”, and lower the larger “d1” is relative to “d2”. In other words, the smaller “d1” is compared with “d2”, the farther the feature amount of the input character lies from characters other than the recognized one, and therefore the higher the probability that the character was read correctly.
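The text fixes only the direction of the relationship (higher when d1 is small compared with d2), not a formula. The sketch below uses the hypothetical mapping 1 − d1/d2, which lies in [0, 1] because d1 ≤ d2, and assumes Euclidean distances over a dictionary modeled as a character-to-vector mapping.

```python
import math
from typing import Dict, Sequence


def correct_reading_probability(feature: Sequence[float],
                                table: Dict[str, Sequence[float]]) -> float:
    """Score in [0, 1] that rises as d1 shrinks relative to d2.

    d1 and d2 are the distances to the nearest and second-nearest
    dictionary entries. The mapping 1 - d1/d2 is a hypothetical choice.
    """
    dists = sorted(math.dist(feature, ref) for ref in table.values())
    if len(dists) < 2:
        raise ValueError("need at least two dictionary entries")
    d1, d2 = dists[0], dists[1]
    if d2 == 0.0:  # two entries coincide with the input: fully ambiguous
        return 0.0
    return 1.0 - d1 / d2


table = {"イ": [1.0, 0.0], "ィ": [4.0, 0.0], "ア": [10.0, 0.0]}
print(correct_reading_probability([0.0, 0.0], table))  # 0.75
```

A ratio-based score was chosen because it depends only on the relative size of d1 and d2, which is exactly what the text says should drive the probability.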

For example, the verification unit 406 calculates the “bigram probability” as “log(p) * C” using the “frequency ratio p” and the “constant C”. To calculate “morphemes”, for example, the verification unit 406 performs morphological analysis on the character string obtained by character recognition processing, dividing the text into meaningful words, and determines the part of speech of each word using a dictionary.
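The bigram score log(p) * C can be sketched as follows. How the frequency ratio p is obtained is an assumption here: it is estimated as count(previous + current) / count(previous) over a reference corpus, with a small floor for unseen bigrams so that log(p) stays finite. The corpus, the floor value, and `bigram_score` are all hypothetical.

```python
import math
from collections import Counter


def bigram_score(text: str, i: int, corpus: str, C: float = 1.0) -> float:
    """Score log(p) * C for the character at index i and its predecessor.

    Assumptions: p = count(prev + cur) / count(prev) over a reference
    corpus, floored at 1e-6 for unseen bigrams.
    """
    if i < 1:
        raise ValueError("the character needs a preceding character")
    unigrams = Counter(corpus)
    bigrams = Counter(corpus[j:j + 2] for j in range(len(corpus) - 1))
    prev, pair = text[i - 1], text[i - 1:i + 1]
    p = bigrams[pair] / unigrams[prev] if unigrams[prev] else 0.0
    p = max(p, 1e-6)  # floor for unseen bigrams (assumption)
    return math.log(p) * C


corpus = "イカイカイヌ"
print(round(bigram_score("イカ", 1, corpus), 4))  # -0.4055
```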

The verification unit 406 displays the character recognition result on the display unit 202. Characters for which the verification result indicates a high likelihood of error are displayed in a manner different from other characters; for example, the verification unit 406 displays such characters in a color different from that of the other recognized characters.

The verification device 200 may be implemented using an information processing device such as a known personal computer, workstation, server, mobile phone, PHS (Personal Handyphone System) terminal, mobile communication terminal, or PDA (Personal Digital Assistant). For example, it may be implemented by installing in an information processing device such as a PDA the functions of the learning data table 301 and the best integrated logical expression table 302 shown in FIG. 2, together with the functions of the reception control unit 401, the learning data acquisition unit 402, the best logical expression creation unit 403, the best integrated logical expression creation unit 404, the character recognition unit 405, and the verification unit 406.

For example, the verification device 200 acting as a server may, on receiving a combination of conditions and similar characters from a client, create a best integrated logical expression and return it to the client. Likewise, on receiving a character recognition result from a client, the server may perform verification using the verification formula and return the verification result to the client.

[Processing by Verification Device According to Second Embodiment]
Next, processing by the verification device 200 according to the second embodiment is described. Unless otherwise noted, the following description uses as an example the case where the target character is “イ” (full-size katakana i) and the character easily misrecognized as it is “ィ” (small katakana i).

[Best Integrated Logical Expression Creation Process]
First, an example of the flow of the best integrated logical expression creation process in the second embodiment is described with reference to FIG. 10, a flowchart illustrating that flow.

As illustrated in FIG. 10, when the reception control unit 401 receives a combination of similar characters and conditions (Yes at step S101), the learning data acquisition unit 402 acquires case data related to the combination of similar characters from the learning data table 301 (step S102). For example, the learning data acquisition unit 402 acquires every piece of case data whose correct answer is “イ” or “ィ” and whose recognition result is “イ” or “ィ”.
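Step S102 amounts to a filter over the learning data. In the sketch below the record layout (`Case` with `correct`, `recognized`, and `attrs` fields) is hypothetical; the text only requires that both the correct answer and the recognition result belong to the combination of similar characters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Case:
    """Hypothetical learning-data record (field names are illustrative)."""
    correct: str                 # correct answer character
    recognized: str              # character recognition result
    attrs: Dict[str, float] = field(default_factory=dict)


def acquire_cases(table: List[Case], similar: Set[str]) -> List[Case]:
    """Step S102: keep cases whose correct answer and recognition result
    both belong to the combination of similar characters."""
    return [c for c in table
            if c.correct in similar and c.recognized in similar]


table = [Case("イ", "イ"), Case("ィ", "イ"), Case("イ", "ア"), Case("ア", "ア")]
print(len(acquire_cases(table, {"イ", "ィ"})))  # 2
```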

Next, the best logical expression creation unit 403 creates a best logical expression for each piece of case data in which the target character is the correct answer (step S103). A detailed example of this process is described later with reference to FIG. 11, so it is not described here.

The best integrated logical expression creation unit 404 then selects, from among the best logical expressions created for the individual pieces of case data in which the target character is the correct answer, the one with the highest evaluation value (step S104), and performs a covering check (step S105). Specifically, it determines whether any case data is not explained (not covered) by the selected best logical expression.

If the best integrated logical expression creation unit 404 determines that there is case data not explained (not covered) by the selected best logical expression (Yes at step S106), it integrates the best logical expression for the uncovered case data using an OR condition (step S107). That is, it selects a piece of uncovered case data and combines, with an OR condition, the best logical expression for that case data and the logical expression that was the subject of the covering check in step S105.

The best integrated logical expression creation unit 404 then executes generalization processing on the logical expression created by the OR combination (step S108), thereby creating a plurality of logical expressions. A detailed example of the generalization processing is described later with reference to FIG. 12, so it is not described here.

Next, the best integrated logical expression creation unit 404 selects the logical expression with the highest evaluation value (step S109), that is, the one with the highest evaluation value among the logical expressions created by the generalization processing. A detailed example of this selection is described later with reference to FIG. 13, so it is not described here.

The best integrated logical expression creation unit 404 then performs a covering check on the selected logical expression (step S105), and repeats steps S105 to S109 above until it determines that no case data remains unexplained (not covered) by the selected logical expression (No at step S106).

When, in step S106, the best integrated logical expression creation unit 404 determines that no case data remains unexplained (not covered) by the selected logical expression (No at step S106), it adopts the logical expression that was the subject of the covering check as the “best integrated logical expression” and stores it in the best integrated logical expression table 302 in association with the target character (step S110).
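The covering loop of FIG. 10 (steps S104 to S110) can be sketched as a greedy OR-combination. Assumptions: an expression is a tuple of conjunctions joined by OR; the evaluation (FIG. 13) and generalization (FIG. 12) are passed in as `score` and `generalize`; keeping the ungeneralized OR as a fallback candidate and bounding the number of rounds are choices made here for safety, not steps taken from the text.

```python
import operator
from typing import Callable, Dict, List, Tuple

Literal = Tuple[str, str, float]   # (attribute, op, threshold)
Conj = Tuple[Literal, ...]         # literals joined by AND
Expr = Tuple[Conj, ...]            # conjunctions joined by OR
Attrs = Dict[str, float]
OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}


def covers(expr: Expr, attrs: Attrs) -> bool:
    """True if the case satisfies at least one conjunction of expr."""
    return any(all(OPS[op](attrs[a], v) for a, op, v in conj) for conj in expr)


def integrate(best: Dict[int, Expr], positives: Dict[int, Attrs],
              score: Callable[[Expr], float],
              generalize: Callable[[Expr], List[Expr]]) -> Expr:
    """Steps S104-S110 of FIG. 10 (sketch).

    best maps each target-correct case id to its best logical expression;
    positives maps the same ids to their attribute values.
    """
    current = max(best.values(), key=score)                        # S104
    for _ in range(len(positives) + 1):   # guard against non-termination
        uncovered = [i for i, a in positives.items()
                     if not covers(current, a)]                    # S105
        if not uncovered:                                          # S106: No
            return current                                         # S110
        combined = current + best[uncovered[0]]                   # S107 (OR)
        # Fallback: keep the plain OR as a candidate (assumption).
        candidates = generalize(combined) + [combined]             # S108
        current = max(candidates, key=score)                       # S109
    raise RuntimeError("covering did not converge")


best = {1: ((("d", "<=", 1.0),),), 2: ((("d", ">=", 5.0),),)}
positives = {1: {"d": 1.0}, 2: {"d": 5.0}}
print(integrate(best, positives, lambda e: -len(e), lambda e: []))
# ((('d', '<=', 1.0),), (('d', '>=', 5.0),))
```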

[Best Logical Expression Creation Process]
Next, an example of the flow of the best logical expression creation process in the second embodiment is described with reference to FIG. 11, a flowchart illustrating that flow. The flow described with reference to FIG. 11 corresponds to step S103 in FIG. 10.

As illustrated in FIG. 11, the best logical expression creation unit 403 selects one piece of case data in which the target character is the correct answer (step S201), for example one piece of case data whose correct answer is “イ”. It then creates a logical expression satisfied by the selected case data (step S202). For example, for each condition input by the user, it creates an equation or inequality using the corresponding attribute value of the selected case data, and combines the equations and inequalities created for the individual conditions with an AND condition to form a logical expression.

The best logical expression creation unit 403 then executes generalization processing on the logical expression created by the AND combination (step S203), thereby creating a plurality of logical expressions. A detailed example of the generalization processing is described later with reference to FIG. 12, so it is not described here.

Next, the best logical expression creation unit 403 selects the logical expression with the highest evaluation value from among the logical expressions created by the generalization processing (step S204). The logical expression selected here becomes the “best logical expression”. A detailed example of this selection is described later with reference to FIG. 13, so it is not described here.

The best logical expression creation unit 403 then determines whether any unprocessed case data remains (step S205), that is, whether a best logical expression has been created for every piece of case data in which the target character is the correct answer. If unprocessed case data remains (Yes at step S205), it returns to step S201 and repeats the processing. If not (No at step S205), it ends the best logical expression creation process, and step S104 of the process in FIG. 10 starts.

Consequently, when the best logical expression creation process by the best logical expression creation unit 403 ends, a best logical expression has been created for each piece of case data in which the target character is the correct answer.
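The per-case loop of FIG. 11 can be sketched as below, with step S202 made concrete. Assumptions: each user condition is given as an (attribute, operator) pair and the selected case's own attribute value becomes the threshold; generalization (FIG. 12) and evaluation (FIG. 13) are supplied by the caller as `generalize` and `score`, and the initial conjunction is used as a fallback if generalization yields nothing. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Literal = Tuple[str, str, float]   # (attribute, op, threshold)
Conj = Tuple[Literal, ...]         # literals joined by AND


def initial_conjunction(attrs: Dict[str, float],
                        conditions: List[Tuple[str, str]]) -> Conj:
    """Step S202: one equation/inequality per user condition, using the
    selected case's own attribute value as the threshold, joined by AND."""
    return tuple((name, op, attrs[name]) for name, op in conditions)


def best_per_case(positives: Dict[int, Dict[str, float]],
                  conditions: List[Tuple[str, str]],
                  generalize: Callable[[Conj], List[Conj]],
                  score: Callable[[Conj], float]) -> Dict[int, Conj]:
    """Steps S201-S205 of FIG. 11 (sketch)."""
    best = {}
    for case_id, attrs in positives.items():           # S201 / S205 loop
        conj = initial_conjunction(attrs, conditions)   # S202
        candidates = generalize(conj) or [conj]         # S203
        best[case_id] = max(candidates, key=score)      # S204
    return best


conds = [("distance", "<="), ("correct_prob", ">=")]
print(initial_conjunction({"distance": 80.0, "correct_prob": 0.9}, conds))
# (('distance', '<=', 80.0), ('correct_prob', '>=', 0.9))
```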

[Generalization Processing]
Next, an example of the flow of the generalization processing in the second embodiment is described with reference to FIG. 12, a flowchart illustrating that flow. The flow described with reference to FIG. 12 corresponds to step S108 in FIG. 10 and step S203 in FIG. 11.

The following describes, as an example, the case where the best logical expression creation unit 403 executes the generalization processing in step S203 of FIG. 11. For convenience, it is also assumed that the logical expression to be generalized, that is, the logical expression created in step S202 of FIG. 11, consists of “n” components.

As illustrated in FIG. 12, the best logical expression creation unit 403 sets the parameter “i” to “1” (step S301) and creates every logical expression obtainable by removing one component from the logical expression to be generalized (step S302). Each logical expression created in step S302 therefore has “n−1” components. For example, when there are “7” components, the best logical expression creation unit 403 removes each of the seven equations or inequalities in turn, creating “7” logical expressions with “6” components each.

The best logical expression creation unit 403 then selects, from the created logical expressions, those ranking in the top “k” by evaluation value (step S303). For example, it selects the top “3” of the “7” logical expressions created in step S302.

The best logical expression creation unit 403 then adds “1” to the parameter “i” (step S304); for example, if “i” is “1”, it becomes “2”.

The best logical expression creation unit 403 then compares the parameter “i” with the number of components “n” and determines whether “i” is smaller than “n” (step S305). For example, when “i” is “2” and “n” is “7”, it determines that “i” is smaller; when “i” is “7” and “n” is “7”, it determines that “i” is not smaller.

If the best logical expression creation unit 403 determines that “i” is smaller than “n” (Yes at step S305), it creates, for each logical expression selected as one of the top “k”, every logical expression obtainable by removing one component (step S306). For example, suppose “3” logical expressions were selected in step S303 and each has “6” components. For each of the “3” logical expressions, the best logical expression creation unit 403 removes each of the six equations or inequalities in turn, creating “6” logical expressions with “5” components each, for a total of “6 × 3 = 18” logical expressions.

The best logical expression creation unit 403 then selects, from the created logical expressions, those ranking in the top “k” by evaluation value (step S303); for example, it selects the top “3” of the “18” logical expressions created. The best logical expression creation unit 403 repeats steps S303 to S306 until it determines in step S305 that “i” is not smaller than “n”.

If, on the other hand, it is determined in step S305 that “i” is not smaller than “n” (No at step S305), the generalization processing ends.
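FIG. 12 describes a beam-style search that drops one component per level and keeps the top “k” at each level. The sketch below follows steps S301 to S306 with the evaluation function passed in as `score`. Two points are assumptions: the returned candidate set is every expression kept in some level's top “k”, and duplicate expressions (which can arise when different parents lose different components) are removed before ranking, so the count can be lower than the “6 × 3 = 18” of the text.

```python
from typing import Callable, List, Tuple

Literal = Tuple[str, str, float]   # (attribute, op, threshold)
Conj = Tuple[Literal, ...]         # components joined by AND


def drop_one(expr: Conj) -> List[Conj]:
    """All expressions obtained by removing exactly one component."""
    return [expr[:j] + expr[j + 1:] for j in range(len(expr))]


def generalize(expr: Conj, score: Callable[[Conj], float],
               k: int = 3) -> List[Conj]:
    """Generalization following FIG. 12 (sketch)."""
    n = len(expr)
    i = 1                                                      # S301
    created = list(dict.fromkeys(drop_one(expr)))              # S302 (deduped)
    kept: List[Conj] = []
    while True:
        beam = sorted(created, key=score, reverse=True)[:k]    # S303
        kept.extend(beam)
        i += 1                                                 # S304
        if not i < n:                                          # S305: No
            return kept
        created = list(dict.fromkeys(                          # S306
            e for parent in beam for e in drop_one(parent)))


expr = (("a", "<=", 1.0), ("b", ">=", 2.0), ("c", "==", 3.0))
print(len(generalize(expr, score=lambda e: -len(e), k=2)))  # 4
```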

[Processing to Select the Logical Expression with the Highest Evaluation Value]
Next, an example of the flow of processing for selecting the logical expression with the highest evaluation value in the second embodiment is described with reference to FIG. 13, a flowchart illustrating that flow. The flow described with reference to FIG. 13 corresponds to step S109 in FIG. 10 and step S204 in FIG. 11. The following describes, as an example, the case where the best logical expression creation unit 403 selects the best logical expression in step S204 of FIG. 11.

As illustrated in FIG. 13, the best logical expression creation unit 403 identifies, for each logical expression, the number of positive cases, the number of negative cases, and the number of components (step S401). It then determines whether any logical expression has zero negative cases (step S402). If so (Yes at step S402), it calculates (number of positive cases / number of components) for each logical expression with zero negative cases (step S403) and selects the logical expression with the highest value as the best logical expression (step S404).

On the other hand, if the best logical expression creation unit 403 determines that there is no logical expression whose number of negative cases is 0 (No in step S402), it calculates (number of positive cases × number of components) / (number of negative cases) for each logical expression (step S405). It then selects the logical expression with the highest (number of positive cases × number of components) / (number of negative cases) as the best logical expression (step S406).
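The following is a minimal Python sketch of the selection rule in steps S401 to S406. The `Candidate` record and its field names are hypothetical. The flowchart does not specify tie-breaking, so this sketch keeps whichever maximum `max` finds first. It also assumes every logical expression has at least one component.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str        # hypothetical label for a logical expression
    positives: int   # number of positive cases the expression satisfies
    negatives: int   # number of negative cases the expression satisfies
    components: int  # number of conditions in the expression (assumed >= 1)


def select_best(candidates: list[Candidate]) -> Candidate:
    """Select the best logical expression following steps S401-S406 of FIG. 13."""
    zero_negative = [c for c in candidates if c.negatives == 0]  # S402
    if zero_negative:
        # S403-S404: maximize positives / components among zero-negative expressions.
        return max(zero_negative, key=lambda c: c.positives / c.components)
    # S405-S406: every expression has negatives > 0 here, so the division is safe.
    return max(candidates, key=lambda c: c.positives * c.components / c.negatives)


cands = [Candidate("A", 8, 0, 4), Candidate("B", 6, 0, 2), Candidate("C", 10, 1, 3)]
print(select_best(cands).name)  # B: 6/2 = 3 beats 8/4 = 2; C is skipped (1 negative)
```

When a zero-negative expression exists, the rule prefers many positives achieved with few conditions. Otherwise it multiplies by the component count, which favours more specific expressions.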

[Verification processing]
Next, an example of the flow of the verification processing in the second embodiment will be described with reference to FIG. 14. FIG. 14 is a flowchart illustrating an example of the flow of verification processing in the second embodiment.

As shown in FIG. 14, when the character recognition unit 405 receives a character image (Yes in step S501), it executes character recognition processing (step S502). Specifically, the character recognition unit 405 first calculates a feature amount from the portion of the character image that contains a character. It then reads from the dictionary table the character whose feature amount is most similar to the calculated one, and takes that character as the character recognition result.
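As a rough illustration of step S502, the dictionary lookup can be sketched as a nearest-neighbour search. The text only says that the most similar dictionary entry is chosen. Two things below are assumptions: using Euclidean distance between feature vectors as the (inverse) similarity measure, and the two-dimensional toy feature values.

```python
import math


def recognize(feature: list[float], dictionary: dict[str, list[float]]) -> tuple[str, float]:
    """Return the dictionary character nearest to `feature` and its distance.

    Euclidean distance stands in for (inverse) similarity: the smaller the
    distance, the more similar the dictionary entry.
    """
    best_char, best_dist = "", math.inf
    for char, reference in dictionary.items():
        dist = math.dist(feature, reference)
        if dist < best_dist:
            best_char, best_dist = char, dist
    return best_char, best_dist


# Toy two-dimensional feature vectors (hypothetical values).
dictionary = {"イ": [0.9, 0.8], "ィ": [0.4, 0.3]}
print(recognize([0.85, 0.75], dictionary)[0])  # イ
```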

The verification unit 406 then determines whether the character recognition result from the character recognition unit 405 includes a target character stored in the best integrated logical expression table 302 (step S503). If it determines that no target character is included (No in step S503), the verification unit 406 displays the character recognition result on the display unit 202 as it is (step S504).

Next, consider the case where the character recognition result from the character recognition unit 405 is determined to include a target character (Yes in step S503). In this case, the verification unit 406 reads the best integrated logical expression from the best integrated logical expression table 302, using the target character found in the recognition result as the search key (step S505). For example, when the character recognition result includes "イ", the best integrated logical expression associated with the target character "イ" is read. The verification unit 406 then executes verification processing using the best integrated logical expression it has read (step S506).

The verification unit 406 then displays on the display unit 202 any character whose verification result indicates a high likelihood of error, in a manner different from the other characters (step S507). For example, among the characters obtained as the character recognition result, the verification unit 406 displays those judged likely to be wrong in a color different from that of the other recognized characters.
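Steps S503 to S507 can be sketched as follows. Several choices here are illustrative assumptions:

* a best integrated logical expression is represented as a Python predicate over a dictionary of attribute values;
* the attribute name `height_ratio` and its threshold are invented for the example;
* suspect characters are marked with brackets instead of a different colour.

```python
from typing import Callable

Attributes = dict[str, float]
Formula = Callable[[Attributes], bool]


def verify_result(recognized: list[tuple[str, Attributes]],
                  table: dict[str, Formula]) -> str:
    """Render a recognition result, bracketing characters that fail verification."""
    rendered = []
    for char, attrs in recognized:
        formula = table.get(char)  # S503/S505: is this a target character?
        if formula is not None and not formula(attrs):  # S506: verify
            rendered.append(f"[{char}]")  # S507: stand-in for a different colour
        else:
            rendered.append(char)  # S504: display as recognized
    return "".join(rendered)


# Hypothetical best integrated logical expression for the target "イ".
table = {"イ": lambda a: a["height_ratio"] >= 0.7}
print(verify_result([("ア", {}), ("イ", {"height_ratio": 0.5}), ("ウ", {})], table))
# ア[イ]ウ
```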

[Effects of the second embodiment]
As described above, according to the second embodiment, the verification apparatus 200 creates a verification formula using conditions and attribute values. When a character image is input, the verification apparatus 200 performs character recognition processing on it. It then identifies whether the result of the character recognition processing includes a target character and, if so, performs verification using the verification formula. As a result, the second embodiment can accurately verify whether the recognized character is correct, even when it is a similar character, that is, one for which another character of similar shape exists.

Further, according to the second embodiment, the verification apparatus 200 receives, for each attribute value of the target character, input of a plurality of conditions for distinguishing the second character from the first (target) character, and creates a logical expression from them. For each attribute value of the target character, the verification apparatus 200 then creates a plurality of logical expressions by generalizing the logical expression it created.

For each of these logical expressions, the verification apparatus 200 calculates an evaluation value. The value increases with the number of the target character's attribute values that the logical expression satisfies, and decreases with the number of the second character's attribute values that it satisfies. The verification apparatus 200 then selects the logical expression with the highest evaluation value; the expression selected in this way is the best logical expression.

Next, from the logical expressions selected for the respective attribute values, the verification apparatus 200 selects the one with the highest evaluation value and determines whether it satisfies all the attribute values of the target character. If it does, the verification apparatus 200 determines the selected logical expression as the best integrated logical expression. If it does not, the verification apparatus 200 integrates it with the logical expressions selected for the other, unsatisfied attribute values, and determines the result as the best integrated logical expression. As a result, it can be verified accurately whether the character obtained as the character recognition result is correct.
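The generalize, select, and integrate procedure can be sketched as below. Several details are assumptions that the text does not fix:

* each condition is modelled as a hypothetical `(attribute, minimum)` threshold, and a logical expression is a conjunction (AND) of such conditions;
* generalization keeps any non-empty subset of the conditions, including the original conjunction;
* the evaluation value reuses the FIG. 13 ranking;
* "integration" is read as a greedy disjunction (OR) of the per-attribute-value expressions.

```python
from itertools import combinations

Sample = dict[str, float]           # one attribute value (hypothetical attribute names)
Condition = tuple[str, float]       # hypothetical: (attribute name, minimum value)
Conjunction = frozenset[Condition]  # a logical expression: AND of conditions


def holds(conj: Conjunction, sample: Sample) -> bool:
    return all(sample[attr] >= low for attr, low in conj)


def score(conj: Conjunction, pos: list[Sample], neg: list[Sample]) -> tuple[bool, float]:
    """Evaluation value modelled on FIG. 13: zero-negative expressions rank first."""
    p = sum(holds(conj, s) for s in pos)
    n = sum(holds(conj, s) for s in neg)
    k = len(conj)
    return (n == 0, p / k if n == 0 else p * k / n)


def generalize(conj: Conjunction) -> list[Conjunction]:
    """All non-empty subsets of the conditions (dropping conditions generalizes)."""
    items = sorted(conj)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def best_integrated(user_conds: list[Conjunction], pos: list[Sample],
                    neg: list[Sample]) -> list[Conjunction]:
    """Return the integrated expression as a list of conjunctions joined by OR."""
    # Best generalized expression for each positive attribute value.
    best = [max(generalize(c), key=lambda g: score(g, pos, neg)) for c in user_conds]
    uncovered = set(range(len(pos)))
    disjunction: list[Conjunction] = []
    while uncovered:
        # Greedy step: highest-scoring expression among still-unsatisfied values.
        i = max(uncovered, key=lambda j: score(best[j], pos, neg))
        disjunction.append(best[i])
        uncovered -= {j for j in uncovered if holds(best[i], pos[j])}
        uncovered.discard(i)  # guarantees termination even if best[i] misses pos[i]
    return disjunction


pos = [{"h": 0.9, "c": 0.8}, {"h": 0.8, "c": 0.95}, {"h": 0.5, "c": 0.95}]
neg = [{"h": 0.5, "c": 0.6}, {"h": 0.4, "c": 0.7}]
user_conds = [frozenset({("h", 0.7), ("c", 0.7)}),
              frozenset({("h", 0.7), ("c", 0.9)}),
              frozenset({("h", 0.45), ("c", 0.9)})]
print(best_integrated(user_conds, pos, neg))  # two conjunctions: h >= 0.7, c >= 0.9
```

In this toy data, no single expression separates all three positive attribute values from the negatives. The greedy loop therefore ORs a second expression onto the first.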

Further, according to the second embodiment, suppose the verification shows that the attribute values of a character included in the character image do not satisfy the verification formula. In that case, the verification apparatus 200 outputs that character to the display unit 202 in a manner different from the other characters obtained by the character recognition processing. As a result, the user can easily identify similar characters that are likely to have been misrecognized.

Embodiments of the present invention have been described so far, but the present invention may also be implemented in other embodiments besides those described above. Other embodiments are therefore described below.

[Learning data acquisition unit]
For example, the above embodiment described the case where the target character is "イ" and the character easily misrecognized for it is the small kana "ィ". In that case, the learning data acquisition unit 402 acquires each item of case data whose correct answer is "イ" or "ィ" and whose recognition result is "イ" or "ィ". However, the present invention is not limited to this. For example, the learning data acquisition unit 402 may acquire case data whose correct answer is "イ" or "ィ" regardless of the recognition result.
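The two acquisition policies differ only in their filter. The sketch below assumes a hypothetical `Case` record holding the correct answer and the recognition result. This section does not specify the actual layout of the learning data table.

```python
from typing import NamedTuple


class Case(NamedTuple):
    correct: str     # correct (ground-truth) character
    recognized: str  # character output by recognition


def select_cases(cases: list[Case], pair: set[str], use_recognized: bool = True) -> list[Case]:
    """Pick learning cases for a confusable pair such as {"イ", "ィ"}.

    use_recognized=True : correct answer AND recognition result both in the pair
                          (the policy of the main embodiment).
    use_recognized=False: correct answer in the pair, any recognition result
                          (the variant described here).
    """
    return [c for c in cases
            if c.correct in pair and (not use_recognized or c.recognized in pair)]


cases = [Case("イ", "イ"), Case("ィ", "イ"), Case("イ", "ト"), Case("ア", "イ")]
print(len(select_cases(cases, {"イ", "ィ"})))         # 2
print(len(select_cases(cases, {"イ", "ィ"}, False)))  # 3
```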

[Characters that are easily misrecognized]
Also, for example, the above embodiment described the case where each target character has exactly one corresponding character that is easily misrecognized. However, the present invention is not limited to this. For example, one target character may have two or more corresponding characters that are easily misrecognized.

[Verification processing]
Also, for example, the above embodiment described verification using the verification formula created for the target character, but the present invention is not limited to this. For example, when the result of the character recognition processing includes "イ", the verification device may perform verification using not only the verification formula created for "イ" but also the one created for "ィ". The verification device may then judge the result of the character recognition processing to be correct when the verification formula for "イ" is satisfied and the verification formula for "ィ" is not.
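This variant's decision rule fits in one line. The two formulas below are hypothetical thresholds on an assumed `height_ratio` attribute (character height relative to line height). They were chosen only so that their ranges overlap, which shows the case where the rival formula vetoes an otherwise accepted result.

```python
from typing import Callable

Attributes = dict[str, float]
Formula = Callable[[Attributes], bool]


def verify_against_rival(attrs: Attributes, own: Formula, rival: Formula) -> bool:
    """Judge the recognition correct only if the recognized character's own
    verification formula holds and the confusable character's formula does not."""
    return own(attrs) and not rival(attrs)


def formula_i(a: Attributes) -> bool:        # hypothetical formula for "イ"
    return a["height_ratio"] >= 0.6


def formula_small_i(a: Attributes) -> bool:  # hypothetical formula for "ィ"
    return a["height_ratio"] <= 0.65


for h in (0.8, 0.62, 0.5):
    print(h, verify_against_rival({"height_ratio": h}, formula_i, formula_small_i))
# 0.8 True
# 0.62 False
# 0.5 False
```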

[System configuration]
Of the processes described in these embodiments, all or part of those described as performed automatically may instead be performed manually. Conversely, all or part of those described as performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified.

Each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution or integration of each device is not limited to that illustrated. All or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

For example, in the configuration shown in FIG. 2, the units may be split between two separate devices:

* a first device having the best integrated logical expression table 302, the character recognition unit 405, and the verification unit 406;
* a second device having the learning data table 301, the reception control unit 401, the learning data acquisition unit 402, the best logical expression creation unit 403, and the best integrated logical expression creation unit 404.

In this case, the second device creates the best integrated logical expression and transmits it to the first device. On receiving the best integrated logical expression from the second device, the first device stores it in the best integrated logical expression table 302. The first device then executes verification processing using the best integrated logical expression.
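In this split, the two devices share only the best integrated logical expression table, so the hand-off can be sketched as a serialization round trip. Both JSON and the nested-list layout are assumptions: a character maps to a list of conjunctions, and each conjunction is a list of `(attribute, threshold)` conditions. The text does not specify the transport or the format.

```python
import json

# Hypothetical layout: character -> list of conjunctions (joined by OR),
# each conjunction a list of (attribute, threshold) conditions.
Table = dict[str, list[list[tuple[str, float]]]]


def export_table(table: Table) -> str:
    """Second device: serialize the best integrated logical expressions."""
    return json.dumps(table, ensure_ascii=False)


def import_table(payload: str) -> Table:
    """First device: rebuild the table; JSON arrays come back as lists,
    so conditions are converted back to tuples."""
    raw = json.loads(payload)
    return {char: [[(attr, low) for attr, low in conj] for conj in disj]
            for char, disj in raw.items()}


table = {"イ": [[("h", 0.7)], [("c", 0.9), ("h", 0.45)]]}
assert import_table(export_table(table)) == table
```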

[Computer]
The various processes described in the above embodiments can be realized by executing a prepared program on a computer such as a personal computer or a workstation. An example of a computer that executes a verification program having the same functions as the above embodiments is therefore described below with reference to FIG. 15. FIG. 15 is a diagram illustrating an example of a computer that executes the verification program according to the second embodiment.

As shown in FIG. 15, the computer 3000 in the second embodiment has a keyboard 3001, a microphone 3002, a speaker 3003, and a display 3004. The computer 3000 further has a communication unit 3006, a CPU 3010, a ROM 3011, an HDD (Hard Disk Drive) 3012, and a RAM (Random Access Memory) 3013. These units are connected by a bus 3009 or the like.

As shown in FIG. 15, the ROM 3011 stores the following programs in advance. Each is a control program that provides the same function as the corresponding unit shown in the second embodiment:

* the reception control program 3011a, corresponding to the reception control unit 401;
* the learning data acquisition program 3011b, corresponding to the learning data acquisition unit 402;
* the best logical expression creation program 3011c, corresponding to the best logical expression creation unit 403;
* the best integrated logical expression creation program 3011d, corresponding to the best integrated logical expression creation unit 404;
* the character recognition program 3011e, corresponding to the character recognition unit 405;
* the verification program 3011f, corresponding to the verification unit 406.

These programs 3011a to 3011f may be integrated or separated as appropriate, like the components of the verification apparatus 200 shown in FIG. 2.

The CPU 3010 reads these programs 3011a to 3011f from the ROM 3011 and executes them. As shown in FIG. 15, the programs 3011a to 3011f then function, respectively, as a reception control process 3010a, a learning data acquisition process 3010b, a best logical expression creation process 3010c, a best integrated logical expression creation process 3010d, a character recognition process 3010e, and a verification process 3010f. The processes 3010a to 3010f correspond, respectively, to the units shown in FIG. 2: the reception control unit 401, the learning data acquisition unit 402, the best logical expression creation unit 403, the best integrated logical expression creation unit 404, the character recognition unit 405, and the verification unit 406.

The HDD 3012 contains a learning data table 3012a and a best integrated logical expression table 3012b. These correspond, respectively, to the learning data table 301 and the best integrated logical expression table 302 shown in FIG. 2.

The CPU 3010 reads the learning data table 3012a and the best integrated logical expression table 3012b and stores them in the RAM 3013. It then executes the verification program using the data stored in the RAM 3013: the learning data 3013a, the best integrated logical expression data 3013b, the condition data 3013c, and the best logical expression data 3013d.

[Others]
The verification program described in these embodiments can be distributed via a network such as the Internet. The verification program can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, an MO, or a DVD, and executed by a computer that reads it from the recording medium.

DESCRIPTION OF REFERENCE NUMERALS
100 Verification apparatus
200 Verification apparatus
201 Input unit
202 Display unit
300 Storage unit
301 Learning data table
302 Best integrated logical expression table
400 Control unit
401 Reception control unit
402 Learning data acquisition unit
403 Best logical expression creation unit
404 Best integrated logical expression creation unit
405 Character recognition unit
406 Verification unit

Claims

A verification apparatus comprising:
a conditional expression creation unit that receives input of a plurality of conditions for distinguishing a first character from a second character that may be obtained through misrecognition in character recognition processing of the first character, and creates a conditional expression for each attribute value, each attribute value comprising a plurality of pieces of information including information indicating the size, within a character image, of a character included in the character image, information indicating the relationship between the character and other nearby characters, and information indicating the likelihood of the result of character recognition processing for the character;
an exclusion conditional expression creation unit that creates, for each conditional expression created by the conditional expression creation unit, a plurality of exclusion conditional expressions, each excluding at least one of the conditions included in that conditional expression;
a selection unit that calculates, for each exclusion conditional expression created for each conditional expression by the exclusion conditional expression creation unit, an evaluation value that is higher the more attribute values for the first character the exclusion conditional expression satisfies and the fewer attribute values for the second character it satisfies, and selects, for each conditional expression, the exclusion conditional expression with the highest calculated evaluation value as a high-evaluation exclusion conditional expression;
a determination unit that selects, from the high-evaluation exclusion conditional expressions selected for the respective conditional expressions by the selection unit, the one with the highest evaluation value and determines whether it satisfies all the attribute values for the first character; when it does, the determination unit determines the selected expression as a verification formula for verifying, when the result of character recognition processing of a character included in a character image is the first character, whether that result is correct, and when it does not, the determination unit integrates the selected expression with the high-evaluation exclusion conditional expressions selected for the other, unsatisfied attribute values and determines the integrated expression as the verification formula; and
a verification unit that identifies whether a result of character recognition processing includes the first character and, when it does, performs verification using the verification formula determined by the determination unit.

The verification apparatus according to claim 1, further comprising an output unit that, when the verification by the verification unit determines that the result is incorrect, outputs the character to a display unit in a manner different from the other characters obtained by the character recognition processing.

A verification method in which a computer executes:
a conditional expression creation step of receiving input of a plurality of conditions for distinguishing a first character from a second character that may be obtained through misrecognition in character recognition processing of the first character, and creating a conditional expression for each attribute value, each attribute value comprising a plurality of pieces of information including information indicating the size, within a character image, of a character included in the character image, information indicating the relationship between the character and other nearby characters, and information indicating the likelihood of the result of character recognition processing for the character;
an exclusion conditional expression creation step of creating, for each conditional expression created in the conditional expression creation step, a plurality of exclusion conditional expressions, each excluding at least one of the conditions included in that conditional expression;
a selection step of calculating, for each exclusion conditional expression created for each conditional expression in the exclusion conditional expression creation step, an evaluation value that is higher the more attribute values for the first character the exclusion conditional expression satisfies and the fewer attribute values for the second character it satisfies, and selecting, for each conditional expression, the exclusion conditional expression with the highest calculated evaluation value as a high-evaluation exclusion conditional expression;
a determination step of selecting, from the high-evaluation exclusion conditional expressions selected for the respective conditional expressions in the selection step, the one with the highest evaluation value and determining whether it satisfies all the attribute values for the first character; when it does, determining the selected expression as a verification formula for verifying, when the result of character recognition processing of a character included in a character image is the first character, whether that result is correct, and when it does not, integrating the selected expression with the high-evaluation exclusion conditional expressions selected for the other, unsatisfied attribute values and determining the integrated expression as the verification formula; and
a verification step of identifying whether a result of character recognition processing includes the first character and, when it does, performing verification using the verification formula determined in the determination step.

A verification program causing a computer to execute:
a conditional expression creation procedure of receiving input of a plurality of conditions for distinguishing a first character from a second character that may be obtained through misrecognition in character recognition processing of the first character, and creating a conditional expression for each attribute value, each attribute value comprising a plurality of pieces of information including information indicating the size, within a character image, of a character included in the character image, information indicating the relationship between the character and other nearby characters, and information indicating the likelihood of the result of character recognition processing for the character;
an exclusion conditional expression creation procedure of creating, for each conditional expression created by the conditional expression creation procedure, a plurality of exclusion conditional expressions, each excluding at least one of the conditions included in that conditional expression;
a selection procedure of calculating, for each exclusion conditional expression created for each conditional expression by the exclusion conditional expression creation procedure, an evaluation value that is higher the more attribute values for the first character the exclusion conditional expression satisfies and the fewer attribute values for the second character it satisfies, and selecting, for each conditional expression, the exclusion conditional expression with the highest calculated evaluation value as a high-evaluation exclusion conditional expression;
a determination procedure of selecting, from the high-evaluation exclusion conditional expressions selected for the respective conditional expressions by the selection procedure, the one with the highest evaluation value and determining whether it satisfies all the attribute values for the first character; when it does, determining the selected expression as a verification formula for verifying, when the result of character recognition processing of a character included in a character image is the first character, whether that result is correct, and when it does not, integrating the selected expression with the high-evaluation exclusion conditional expressions selected for the other, unsatisfied attribute values and determining the integrated expression as the verification formula; and
a verification procedure of identifying whether a result of character recognition processing includes the first character and, when it does, performing verification using the verification formula determined by the determination procedure.

A creation device comprising:
an attribute value storage unit that stores an attribute value, including at least one of information indicating the size, within a character image, of a character included in the character image, information indicating the relationship between the character and other nearby characters, and information indicating the likelihood of the result of character recognition processing for the character, in association with character information indicating the character included in the character image;
a reception unit that receives conditions for distinguishing a first character from a second character that may be obtained through misrecognition in character recognition processing of the first character;
a conditional expression creation unit that creates a conditional expression for each attribute value stored in the attribute value storage unit in association with character information indicating the first character;
an exclusion conditional expression creation unit that creates, for each conditional expression created by the conditional expression creation unit, a plurality of exclusion conditional expressions, each excluding at least one of the conditions included in that conditional expression;
a selection unit that calculates, for each exclusion conditional expression created for each conditional expression by the exclusion conditional expression creation unit, an evaluation value that is higher the more attribute values for the first character the exclusion conditional expression satisfies and the fewer attribute values, among those stored in the attribute value storage unit in association with character information indicating the second character, it satisfies, and selects, for each conditional expression, the exclusion conditional expression with the highest calculated evaluation value as a high-evaluation exclusion conditional expression; and
a determination unit that selects, from the high-evaluation exclusion conditional expressions selected for the respective conditional expressions by the selection unit, the one with the highest evaluation value and determines whether it satisfies all the attribute values for the first character; when it does, the determination unit determines the selected expression as a verification formula for verifying, when the result of character recognition processing of a character included in a character image is the first character, whether that result is correct, and when it does not, the determination unit integrates the selected expression with the high-evaluation exclusion conditional expressions selected for the other, unsatisfied attribute values and determines the integrated expression as the verification formula.