JP7404625B2

JP7404625B2 - Information processing device and program

Info

Publication number: JP7404625B2
Application number: JP2019009325A
Authority: JP
Inventors: ベイリ任; 俊一木村
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2019-01-23
Filing date: 2019-01-23
Publication date: 2023-12-26
Anticipated expiration: 2039-01-23
Also published as: JP2020119206A

Description

The present invention relates to an information processing device and a program.

Techniques for improving the accuracy of character recognition have been studied. Patent Document 1 describes a document recognition device that divides the text area of an input image into body-line areas and interline areas, and extracts interline character strings, that is, character strings located in the interline areas. For each interline character string, the device determines a tentative parent character string from among the character strings in the body-line areas and recognizes the characters in both the body-line and interline areas. Using the recognition result of the tentative parent string as a lookup key, it consults a ruby dictionary that lists ruby (reading) candidates for parent characters, and determines whether the recognition result of the interline string matches at least one of the ruby candidates obtained. Based on this determination, it adopts as the final parent character string either the tentative parent string itself or the string that remains after some of its characters are removed.

Patent Document 2 describes a character recognition device that recognizes image data of a first character string and converts it into character string codes, and likewise recognizes image data of a second character string that has the same reading as the first but is written in a different character type. The device converts the recognized first string into the same character type as the second string, compares the converted first string with the recognized second string, and, if the two differ, corrects the second string based on the first.

Patent Document 3 describes a character recognition method in which a corresponding kanji or Roman character is selected, based on shape features, for each piece of character image information extracted from the image information of a manuscript. When multiple kanji are selected for a particular piece of kanji image information, the method searches among them for the kanji that corresponds to that image information, based on the Roman characters selected for Roman-character image information that has a predetermined relationship with the kanji image information within the image.

Patent Document 4 describes an information processing device that, in machine learning for character recognition, does not require information about the boundaries between individual characters as a teacher signal.

Patent Document 1: Japanese Patent Application Publication No. 2012-212293
Patent Document 2: Japanese Patent Application Publication No. H9-138835
Patent Document 3: Japanese Patent Application Publication No. 2010-282272
Patent Document 4: Japanese Patent Application Publication No. 2016-212473

One object of the present invention is to obtain an indicator of which of two character strings, each recognized from one of two related images, should be trusted.

An information processing device according to claim 1 of the present invention includes: a first recognition unit that recognizes a first character string from a first image; a second recognition unit that recognizes a second character string from a second image related to the first image; an extraction unit that extracts one or more third character strings related to the first character string by referring to a dictionary in which different character strings are associated in advance; a calculation unit that calculates, for each third character string, a first similarity indicating its degree of similarity to the second character string; and an output unit. The output unit evaluates the reliability of the third character string relative to the second character string using a first reliability, which indicates the reliability of character recognition of the first character string, a second reliability, which indicates the reliability of character recognition of the second character string, and the first similarity. When that reliability is less than a threshold, the output unit outputs the second character string as first information; when it is equal to or greater than the threshold, the output unit outputs the third character string as the first information in place of the second character string.
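The output rule of claim 1 can be sketched as follows. The claim does not say how the three inputs are combined into a reliability, so the scorer here is a hypothetical function passed in by the caller, and keeping only the highest-scoring third string when there are several is also an assumption. The function name `choose_first_information` is illustrative, not taken from the patent.

```python
from typing import Callable, Sequence

# Hypothetical scorer: (first reliability, second reliability, first similarity)
# -> reliability of a third string relative to the second string.
Scorer = Callable[[float, float, float], float]


def choose_first_information(second: str, thirds: Sequence[str],
                             similarities: Sequence[float], r1: float, r2: float,
                             score: Scorer, threshold: float) -> str:
    """Return the string output as the 'first information' of claim 1."""
    best_string, best_value = second, float("-inf")
    for third, sim in zip(thirds, similarities):
        value = score(r1, r2, sim)
        if value > best_value:  # keep the most reliable candidate (an assumption)
            best_string, best_value = third, value
    # Below the threshold the recognized second string is kept; otherwise it is replaced.
    return best_string if best_value >= threshold else second


# Usage with a made-up scorer and made-up similarity values.
score = lambda r1, r2, sim: r1 * sim * (1.0 - r2)
print(choose_first_information("トモリ", ["トモキ", "トモノリ", "ユウキ"],
                               [2 / 3, 0.75, 0.0], 0.9, 0.4, score, 0.3))  # -> トモノリ
```

Raising the threshold to 0.5 in the same call leaves the recognized string "トモリ" unchanged, which is the "less than a threshold" branch of the claim.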

In the information processing device according to claim 2 of the present invention, in the aspect according to claim 1, the extraction unit extracts the third character string when the first reliability and the second reliability satisfy a predetermined condition.

In the information processing device according to claim 3 of the present invention, in the aspect according to claim 1 or 2, the extraction unit extracts one or more fourth character strings related to the second character string by referring to a dictionary in which different character strings are associated in advance; the calculation unit calculates, for each fourth character string, a second similarity indicating its degree of similarity to the first character string; and the output unit outputs, according to at least one of the first reliability and the second reliability and to the second similarity, second information based on at least one of the second similarity and the fourth character string.

In the information processing device according to claim 4 of the present invention, in the aspect according to claim 3, the output unit evaluates the reliability of the fourth character string relative to the first character string using the first reliability, the second reliability, and the second similarity. When that reliability is less than a threshold, the output unit outputs the first character string as the second information; when it is equal to or greater than the threshold, the output unit outputs the fourth character string as the second information in place of the first character string.

In the information processing device according to claim 5 of the present invention, in the aspect according to claim 3 or 4, the extraction unit extracts the fourth character string when the first reliability and the second reliability satisfy a predetermined condition.
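Claims 2 to 5 add a gate on dictionary extraction and a mirrored check in the other direction. The sketch below assumes a concrete "predetermined condition" (consult the dictionary only when at least one reliability is below a cutoff) and a hypothetical scorer whose first two arguments are swapped for fourth strings, because a fourth string comes from the second string and would replace the first. All names, the cutoff, and the scorer are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer: (reliability of the string the candidate came from,
# reliability of the string it would replace, similarity) -> reliability score.
Scorer = Callable[[float, float, float], float]
Candidate = Tuple[str, float]  # (candidate string, similarity to the string it may replace)


def should_extract(r1: float, r2: float, cutoff: float = 0.95) -> bool:
    # Assumed "predetermined condition" of claims 2 and 5: skip the dictionary
    # when both recognition results are already highly reliable.
    return min(r1, r2) < cutoff


def pick(current: str, candidates: List[Candidate], source_r: float,
         current_r: float, score: Scorer, threshold: float) -> str:
    scored = [(score(source_r, current_r, sim), cand) for cand, sim in candidates]
    if not scored:
        return current
    value, cand = max(scored)
    return cand if value >= threshold else current


def cross_check(first: str, second: str, r1: float, r2: float,
                thirds: List[Candidate], fourths: List[Candidate],
                score: Scorer, threshold: float) -> Tuple[str, str]:
    """Return (output for the first string, output for the second string)."""
    if not should_extract(r1, r2):
        return first, second
    # Third strings come from the first string and may replace the second (claim 1).
    new_second = pick(second, thirds, r1, r2, score, threshold)
    # Fourth strings come from the second string and may replace the first (claim 4),
    # so the two reliabilities swap roles.
    new_first = pick(first, fourths, r2, r1, score, threshold)
    return new_first, new_second
```

With a scorer such as `lambda src, cur, sim: src * sim * (1 - cur)`, a confident kanji result and a weak furigana result lead to the furigana being replaced, while the reverse situation replaces the kanji instead.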

In the information processing device according to claim 6 of the present invention, in the aspect according to claim 1, the second recognition unit recognizes the second character string based on a quantity obtained by weighting each of one or more feature values calculated from the second image with a predetermined weight and summing the results. When the output unit outputs the third character string in place of the second character string, the second recognition unit corrects the weights so that the third character string is recognized from the second image.
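Claims 6 and 7 leave the weight-correction rule open. As one possible reading, the sketch below treats recognition as picking the candidate with the largest weighted feature sum and applies a perceptron-style update (a technique swapped in here for illustration, not named in the patent) until the replacement character is recognized. Working per character rather than per string, and all names and numbers, are assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical model: one weight vector per candidate character.
Weights = Dict[str, List[float]]


def recognize(weights: Weights, features: Sequence[float]) -> str:
    # Weighted sum of the feature values for each candidate; the highest total wins.
    return max(weights, key=lambda c: sum(w * f for w, f in zip(weights[c], features)))


def correct_weights(weights: Weights, features: Sequence[float], target: str,
                    lr: float = 0.1, max_steps: int = 100) -> bool:
    """Perceptron-style correction (assumed update rule).

    Nudges the weights in place until `target` is recognized from `features`.
    Returns True if that succeeds within `max_steps` updates.
    """
    for _ in range(max_steps):
        predicted = recognize(weights, features)
        if predicted == target:
            return True
        # Pull the target's weights toward the features and push the winner's away.
        weights[target] = [w + lr * f for w, f in zip(weights[target], features)]
        weights[predicted] = [w - lr * f for w, f in zip(weights[predicted], features)]
    return recognize(weights, features) == target


weights = {"キ": [1.0, 0.0], "リ": [0.0, 1.0]}
features = [0.2, 0.8]
print(recognize(weights, features))  # -> リ
correct_weights(weights, features, "キ")
print(recognize(weights, features))  # -> キ
```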

In the information processing device according to claim 7 of the present invention, in the aspect according to claim 4, the first recognition unit recognizes the first character string based on a quantity obtained by weighting each of one or more feature values calculated from the first image with a predetermined weight and summing the results. When the output unit outputs the fourth character string in place of the first character string, the first recognition unit corrects the weights so that the fourth character string is recognized from the first image.
An information processing device according to claim 8 of the present invention includes: a first recognition unit that recognizes a first character string from a first image; a second recognition unit that recognizes a second character string from a second image related to the first image; an extraction unit that, by referring to a dictionary in which different character strings are associated in advance, extracts one or more third character strings related to the first character string and one or more fourth character strings related to the second character string; a calculation unit that calculates, for each third character string, a first similarity indicating its degree of similarity to the second character string, and, for each fourth character string, a second similarity indicating its degree of similarity to the first character string; and an output unit. The output unit outputs first information based on at least one of the first similarity and the third character string, according to at least one of a first reliability, which indicates the reliability of character recognition of the first character string, and a second reliability, which indicates the reliability of character recognition of the second character string, and to the first similarity. The output unit also evaluates the reliability of the fourth character string relative to the first character string using the first reliability, the second reliability, and the second similarity; when that reliability is less than a threshold, it outputs the first character string as second information, and when it is equal to or greater than the threshold, it outputs the fourth character string as the second information in place of the first character string.

In the information processing device according to claim 9 of the present invention, in the aspect according to any one of claims 1 to 8, the second image is an image containing a character string that indicates the pronunciation of a character string contained in the first image.

A program according to claim 10 of the present invention causes a computer to function as: a first recognition unit that recognizes a first character string from a first image; a second recognition unit that recognizes a second character string from a second image related to the first image; an extraction unit that extracts one or more third character strings related to the first character string by referring to a dictionary in which different character strings are associated in advance; a calculation unit that calculates, for each third character string, a first similarity indicating its degree of similarity to the second character string; and an output unit that evaluates the reliability of the third character string relative to the second character string using a first reliability, which indicates the reliability of character recognition of the first character string, a second reliability, which indicates the reliability of character recognition of the second character string, and the first similarity, and that outputs the second character string as first information when that reliability is less than a threshold and outputs the third character string as the first information in place of the second character string when it is equal to or greater than the threshold.
A program according to claim 11 of the present invention causes a computer to function as: a first recognition unit that recognizes a first character string from a first image; a second recognition unit that recognizes a second character string from a second image related to the first image; an extraction unit that, by referring to a dictionary in which different character strings are associated in advance, extracts one or more third character strings related to the first character string and one or more fourth character strings related to the second character string; a calculation unit that calculates, for each third character string, a first similarity indicating its degree of similarity to the second character string, and, for each fourth character string, a second similarity indicating its degree of similarity to the first character string; and an output unit. The output unit outputs first information based on at least one of the third character string and the first similarity, according to at least one of a first reliability, which indicates the reliability of character recognition of the first character string, and a second reliability, which indicates the reliability of character recognition of the second character string, and to the first similarity. The output unit also evaluates the reliability of the fourth character string relative to the first character string using the first reliability, the second reliability, and the second similarity; when that reliability is less than a threshold, it outputs the first character string as second information, and when it is equal to or greater than the threshold, it outputs the fourth character string as the second information in place of the first character string.

According to the inventions of claims 1 and 10, an indicator is obtained of which of the character strings respectively recognized from two related images should be trusted.
Also according to the inventions of claims 1 and 10, the third character string, which is related to the first character string and extracted by referring to the dictionary, is output when its reliability relative to the second character string is equal to or greater than a threshold.
According to the invention of claim 2, the condition for extracting the third character string related to the first character string by referring to the dictionary can be set based on the respective reliabilities of the first and second character strings.
According to the invention of claim 3, an indicator of whether to trust the first character string or the second character string is obtained using the first character string recognized from the first image, the second character string recognized from the second image related to the first image, the third character string related to the first character string and extracted by referring to the dictionary, and the fourth character string related to the second character string and extracted by referring to the dictionary.
According to the invention of claim 4, the fourth character string, which is related to the second character string and extracted by referring to the dictionary, is output when its reliability relative to the first character string is equal to or greater than the threshold.
According to the invention of claim 5, the condition for extracting the fourth character string related to the second character string by referring to the dictionary can be set based on the respective reliabilities of the first and second character strings.
According to the invention of claim 6, the accuracy of recognizing the second character string improves compared with the case where the weights given to the feature values calculated from the second image are not corrected.
According to the inventions of claims 7, 8, and 11, the accuracy of recognizing the first character string improves compared with the case where the weights given to the feature values calculated from the first image are not corrected.
According to the invention of claim 9, a second character string containing a character string that indicates the pronunciation of a character string included in the first image is recognized from the second image.

FIG. 1 is a diagram showing the configuration of an information processing device 1.
FIG. 2 is a diagram showing an example of an area correspondence table 121 stored in the storage unit 12.
FIG. 3 is a diagram showing an example of handwritten areas.
FIG. 4 is a diagram showing an example of a character recognition model 122 stored in the storage unit 12.
FIG. 5 is a diagram showing an example of a dictionary DB 123 stored in the storage unit 12.
FIG. 6 is a conceptual diagram for explaining a classification model 124 stored in the storage unit 12.
FIG. 7 is a diagram showing the functional configuration of the information processing device 1.
FIG. 8 is a diagram for explaining an example of character recognition.
FIG. 9 is a flow diagram showing the flow of operations of the information processing device 1.

<Embodiment>
<Configuration of information processing device>
FIG. 1 is a diagram showing the configuration of the information processing device 1. As shown in FIG. 1, the information processing device 1 includes a control unit 11, a storage unit 12, a communication unit 13, an operation unit 14, a display unit 15, and an image reading unit 16.

The control unit 11 has a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). It controls each part of the information processing device 1 by having the CPU read and execute computer programs (hereinafter simply "programs") stored in the ROM and the storage unit 12.

The communication unit 13 is a communication circuit that connects to a communication line (not shown) by wire or wirelessly. Through the communication unit 13, the information processing device 1 exchanges information with other devices (that is, external devices) connected to the communication line.

The operation unit 14 includes controls such as operation buttons, a keyboard, and a touch panel for issuing various instructions. It accepts operations by the user and sends the control unit 11 signals corresponding to those operations.

The display unit 15 has a display screen such as a liquid crystal display and displays images under the control of the control unit 11. A transparent touch panel of the operation unit 14 may be overlaid on the display screen.

The image reading unit 16 includes a platen glass, an irradiation device that irradiates the medium with light, an optical system that collects the reflected light, and an image sensor such as a CMOS (Complementary Metal Oxide Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. Under the control of the control unit 11, the image reading unit 16 reads an image formed on a medium such as paper placed on the platen glass, generates image data representing the read image, and supplies the image data to the control unit 11.

The storage unit 12 is a storage means such as a solid state drive or a hard disk drive, and stores various programs, data, and the like that are read by the CPU of the control unit 11. The storage unit 12 also stores an area correspondence table 121, a character recognition model 122, a dictionary DB 123, and a classification model 124.

<Configuration of area correspondence table>
FIG. 2 is a diagram showing an example of the area correspondence table 121 stored in the storage unit 12. The area correspondence table 121 describes the layout of areas, such as entry fields, on a handwriting sheet such as a form. It stores, in association with each other, an area name that identifies each area and area information indicating that area's specific extent and position. For example, in the area correspondence table 121 shown in FIG. 2, the area information corresponding to "Name Furigana" is "A2". The area information is expressed, for example, as coordinates in a corrected image obtained by applying skew correction and scaling correction to an image read from the handwriting sheet. If the area indicated by the area information is a rectangle, for example, the area information is expressed by the coordinates of the rectangle's upper-left and lower-right vertices.
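A minimal sketch of the area correspondence table as a lookup from area name to rectangle, used to crop the corresponding region from a corrected image. The pixel coordinates are hypothetical (the patent only labels the areas "A1" and "A2"), the image is modeled as a list of pixel rows, and treating the right and bottom edges as exclusive is an assumed convention.

```python
from typing import Dict, List, Tuple

# (left, top, right, bottom) in corrected-image pixels; right/bottom exclusive (assumed).
Rect = Tuple[int, int, int, int]

# Hypothetical coordinates standing in for area information "A1" and "A2".
AREA_TABLE: Dict[str, Rect] = {
    "Name Kanji": (10, 40, 60, 50),      # A1
    "Name Furigana": (10, 20, 60, 30),   # A2
}


def crop_area(image: List[List[int]], area_name: str) -> List[List[int]]:
    """Return the sub-image covered by the named area."""
    left, top, right, bottom = AREA_TABLE[area_name]
    return [row[left:right] for row in image[top:bottom]]


page = [[255] * 100 for _ in range(100)]  # blank 100 x 100 grayscale page
furigana = crop_area(page, "Name Furigana")
print(len(furigana), len(furigana[0]))  # -> 10 50
```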

FIG. 3 is a diagram showing an example of handwritten areas. For example, the area "A2" enclosed by the two-dot chain line in FIG. 3 is identified by the area name "Name Furigana" shown in FIG. 2, and the enclosed area "A1" is identified by the area name "Name Kanji" shown in FIG. 2.

<Configuration of character recognition model>
FIG. 4 is a diagram showing an example of the character recognition model 122 stored in the storage unit 12. For each area identified by an area name in the area correspondence table 121, the character recognition model 122 stores model data used to recognize the characters handwritten in that area.

This model data is, for example, a trained model generated by loading image data of handwritten character images that have been associated in advance with their correct character codes, and having the machine learn the correspondence between character codes and handwritten characters. The control unit 11 of the information processing device 1, for example, decomposes the image data generated by reading handwritten characters written on paper into individual pixels and inputs the gradation value of each pixel into a multilayer neural network. The control unit 11 then applies the model data read from the character recognition model 122 to the multilayer neural network and recognizes the character code corresponding to the handwritten character based on the computed output.
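The recognition step above can be sketched as a forward pass through a small multilayer network in pure Python. The single hidden layer, ReLU activation, softmax output, hand-set weights, and the use of the top softmax probability as the recognition reliability are all assumptions made for illustration; the patent does not specify the network architecture or how the reliability is computed.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    # Fully connected layer: y_j = sum_i x_i * w[i][j] + b_j
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def recognize_char(pixels: Sequence[int], w1: Matrix, b1: Sequence[float],
                   w2: Matrix, b2: Sequence[float],
                   codes: Sequence[str]) -> Tuple[str, float]:
    """Return (recognized character, assumed reliability) for one character image."""
    x = [p / 255.0 for p in pixels]                   # gradation values scaled to [0, 1]
    h = [max(0.0, v) for v in dense(x, w1, b1)]       # hidden layer with ReLU
    probs = softmax(dense(h, w2, b2))
    k = max(range(len(probs)), key=probs.__getitem__)
    return codes[k], probs[k]


# Toy 4-pixel "image" and hand-set weights (hypothetical model data).
w1 = [[1, 0], [1, 0], [0, 1], [0, 1]]
code, reliability = recognize_char([255, 255, 0, 0], w1, [0, 0],
                                   [[1, 0], [0, 1]], [0, 0], ["ア", "イ"])
print(code)  # -> ア
```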

<Configuration of dictionary DB>
FIG. 5 is a diagram showing an example of the dictionary DB 123 stored in the storage unit 12. The dictionary DB 123 is a database in which different character strings are associated with each other in advance. The dictionary DB 123 shown in FIG. 5 includes a dictionary name list 1231 and dictionary data 1232. The dictionary name list 1231 lists dictionary names, which are identification information for identifying the dictionary data 1232. Each piece of dictionary data 1232 is associated with a dictionary name in the dictionary name list 1231 and stores the character strings related to each target character string indicated by that dictionary name.

For example, the dictionary name "Name Dictionary" is entered in the dictionary name list 1231 of the dictionary DB 123 shown in FIG. 5, and one piece of dictionary data 1232 is associated with it. This dictionary data 1232 associates character strings of kanji used in personal names with character strings in katakana that indicate the pronunciation of those kanji (that is, furigana). The dictionary data 1232 associated with the name dictionary is used to identify the furigana of a kanji string. If a kanji string has more than one pronunciation, multiple furigana may be identified from it. For example, as shown in FIG. 5, the kanji string "友規" is associated with several pronunciations, such as "トモキ" (Tomoki), "トモノリ" (Tomonori), and "ユウキ" (Yuki).
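A minimal in-memory stand-in for the dictionary DB, keyed first by dictionary name and then by kanji string. Only the "友規" entry comes from the text; the nested-dict layout is an assumption. The reverse lookup (reading to kanji), which could supply the fourth character strings of claim 3, is likewise an assumed way of using the same data, since the patent does not say how that direction is indexed.

```python
from typing import Dict, List

# dictionary name -> {kanji string: [furigana readings]} (assumed layout)
DICTIONARY_DB: Dict[str, Dict[str, List[str]]] = {
    "Name Dictionary": {
        "友規": ["トモキ", "トモノリ", "ユウキ"],
    },
}


def readings_for(dictionary_name: str, kanji: str) -> List[str]:
    """Readings registered for a kanji string (candidate third strings)."""
    return list(DICTIONARY_DB.get(dictionary_name, {}).get(kanji, []))


def kanji_for(dictionary_name: str, reading: str) -> List[str]:
    """Kanji strings that list the given reading (candidate fourth strings, assumed)."""
    entries = DICTIONARY_DB.get(dictionary_name, {})
    return [kanji for kanji, readings in entries.items() if reading in readings]


print(readings_for("Name Dictionary", "友規"))  # -> ['トモキ', 'トモノリ', 'ユウキ']
print(kanji_for("Name Dictionary", "ユウキ"))   # -> ['友規']
```

A linear scan is enough for a sketch; a real name dictionary would more likely keep a precomputed reverse index.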

<Configuration of classification model>
The classification model 124 stored in the storage unit 12 is used to decide which of the character strings respectively recognized from two related images should be trusted. The information processing device 1 makes this decision by using the classification model 124 to classify a feature vector made up of, for example, three numerical values: the first reliability, the second reliability, and the first similarity.

Here, the first reliability is the reliability of the first character string recognized from an image included in the read image (hereinafter, the first image). The second reliability is the reliability of the second character string recognized from an image related to the first image (hereinafter, the second image). The first similarity is the similarity, to the second character string, of a character string that is related to the first character string and extracted from the dictionary DB 123 (hereinafter, a third character string). A similarity is a numerical value indicating how closely two character strings resemble each other, and is expressed, for example, by an edit distance such as the Levenshtein distance or the Jaro-Winkler distance.
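The Levenshtein distance mentioned above counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below computes it with the standard two-row dynamic program and converts it to a similarity in [0, 1] by dividing by the longer length; that normalization is an assumption, since the patent does not say how a distance is turned into a similarity score.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 when nothing matches."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# First similarity of each third string (readings of 友規) to a recognized furigana.
second = "トモリ"
for third in ["トモキ", "トモノリ", "ユウキ"]:
    print(third, round(similarity(third, second), 3))
```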

FIG. 6 is a conceptual diagram for explaining the classification model 124 stored in the storage unit 12. For ease of explanation, the feature in FIG. 6 is two-dimensional: each feature consists of two numerical values x and y and is plotted on the xy plane as shown in FIG. 6. Each feature has been labeled in advance, that is, associated with a label indicating the class to which it belongs. In other words, these features are recognition data with known correct answers, i.e., training data.

In the example shown in FIG. 6, each point corresponding to a feature is drawn either as a square or as a circle. The classification model 124 is model data generated from these pre-labeled features. For example, the straight line L shown in FIG. 6 separates the points by type, and the parameters defining the line L are an example of model data.

The classification model 124 is generated from training data by a machine-learning classification method.
The training data associates features with classes, of which there are two or more. Each feature consists of, for example, three numerical values: the first reliability, the second reliability, and the first similarity. Each feature is assigned to one of the classes. Each class carries a label such as "Trust the second character string" or "Trust the third character string". Examples of the machine-learning classification method include support vector machines, linear regression, and ensemble learning. AdaBoost, for example, may be applied as the learning algorithm. The labels are not limited to these two. They may also include, for example, labels indicating what information to output and whether to output it at all.
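The description lists SVMs, linear regression, ensemble learning, and AdaBoost but gives no training details. As a dependency-free stand-in, the sketch below trains a plain perceptron instead, which is a different linear classifier from those listed. The training samples are invented (first reliability, second reliability, first similarity) triples. Labels +1 and -1 stand for "Trust the third character string" and "Trust the second character string". The learned weights and bias define a separating plane, the three-dimensional counterpart of line L in FIG. 6.

```python
# Made-up training samples: (first reliability, second reliability,
# first similarity) -> +1 "trust the third string", -1 "trust the second string".
TRAINING_DATA = [
    ((0.99, 0.20, 1.0), +1),
    ((0.95, 0.10, 1.0), +1),
    ((0.90, 0.30, 0.0), +1),
    ((0.50, 0.90, 3.0), -1),
    ((0.30, 0.95, 4.0), -1),
    ((0.60, 0.80, 4.0), -1),
]


def train_perceptron(samples, epochs: int = 1000, lr: float = 0.1):
    """Learn weights w and bias b so that sign(w . x + b) matches the labels."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every sample is on the correct side of the plane
            break
    return w, b


def predict(w, b, x) -> int:
    """Return +1 or -1 depending on which side of the plane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


w, b = train_perceptron(TRAINING_DATA)
print(w, b)  # the "model data": parameters of the separating plane
```

Because these toy samples are linearly separable, the perceptron is guaranteed to stop with every training sample classified correctly. An SVM would instead pick the separating plane with the largest margin.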

<Functional configuration of information processing device>
FIG. 7 is a diagram showing the functional configuration of the information processing device 1. The communication unit 13 and the operation unit 14 of the information processing device 1 are omitted from FIG. 7.

The control unit 11 of the information processing device 1 reads and executes a program stored in the storage unit 12. It thereby functions as an analysis unit 111, a recognition unit 112, an extraction unit 113, a calculation unit 114, and an output unit 115.

The analysis unit 111 acquires image data representing the image read by the image reading unit 16 and analyzes the layout of the entry fields that make up the image. On acquiring the image data, the analysis unit 111 applies various corrections, such as skew correction, enlargement/reduction correction, and offset correction. These corrections are based on lines, marks, and the like drawn in the image represented by the image data. The analysis unit 111 then cuts out the first image and the second image from the corrected image by referring to the area correspondence table 121.

The first image is, for example, the image drawn in the area indicated by the area information "A1", showing kanji handwritten by the user. The second image is, for example, the image drawn in the area indicated by the area information "A2", showing furigana handwritten by the user. The furigana in the second image are the furigana of the kanji in the first image, so the two images are related. These kanji and furigana represent, for example, the user's name.

The recognition unit 112 performs character recognition on each of the first image and the second image cut out by the analysis unit 111. It uses the model data stored in the character recognition model 122 and recognizes the character string handwritten in each area. The recognition unit 112 recognizes the first character string from the first image, and in doing so it functions as a first recognition unit. It also recognizes the second character string from the second image, which is related to the first image, and in doing so it functions as a second recognition unit.

For each of the first image and the second image, the recognition unit 112 performs edge detection and similar processing based on gradation values and divides the image into single-character images. The recognition unit 112 then reads the model data associated with each area from the character recognition model 122 and recognizes the characters one at a time.

When recognizing each character, the recognition unit 112 evaluates how much the recognized handwritten character differs from the character images in the training data and other data used to generate the model data. This evaluation is based on, for example, the number of matching pixels and the arrangement and size of clusters of pixels sharing a gradation value. Based on the result, the recognition unit 112 calculates a per-character recognition reliability (hereinafter referred to as character reliability).

After recognizing all the divided images, the recognition unit 112 arranges the recognized characters into a character string and calculates the reliability of that string. The reliability of a character string is calculated using, for example, Equation 4 described in Patent Document 4. The recognition unit 112 sums local energy functions computed from the character reliabilities of adjacent characters and derives the string reliability from that sum.

When recognition yields several candidates for a character, the recognition unit 112 calculates a reliability for each character string obtained by combining the candidates. It then selects the string with the highest reliability as the string recognized from the image. For example, when choosing one string from several candidate strings, the recognition unit 112 uses the Viterbi algorithm. It searches for the string that minimizes the energy function given by the sum of the local energy functions described above, so lower energy corresponds to higher reliability.
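Equation 4 of Patent Document 4 is not reproduced in this description, so the local energy below is an assumption. It is the negative log of each character reliability plus a hypothetical penalty for unlikely adjacent pairs (`PAIR_PENALTY`). Under that assumption, a Viterbi search over per-position candidates can be sketched as follows.

```python
import math

# Hypothetical extra energy for adjacent character pairs judged unlikely.
PAIR_PENALTY = {("マ", "ウ"): 1.0}


def local_energy(a: str, ra: float, b: str, rb: float) -> float:
    """Assumed local energy of adjacent characters a, b (lower is better)."""
    return -math.log(ra) - math.log(rb) + PAIR_PENALTY.get((a, b), 0.0)


def viterbi(candidates):
    """Choose one candidate per position, minimizing the summed local energy.

    candidates: list of positions, each a list of (char, character reliability).
    Returns (string, total energy).
    """
    if len(candidates) == 1:
        # No adjacent pairs: fall back to the most reliable single candidate.
        char, _ = max(candidates[0], key=lambda c: c[1])
        return char, 0.0
    cost = [0.0] * len(candidates[0])  # best energy ending at each candidate
    back = []  # back[t - 1][j]: best predecessor of candidate j at position t
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for b, rb in candidates[t]:
            options = [cost[i] + local_energy(a, ra, b, rb)
                       for i, (a, ra) in enumerate(candidates[t - 1])]
            best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    j = min(range(len(cost)), key=cost.__getitem__)
    energy = cost[j]
    path = [j]
    for pointers in reversed(back):  # trace the best path back to position 0
        j = pointers[j]
        path.append(j)
    path.reverse()
    return "".join(candidates[t][k][0] for t, k in enumerate(path)), energy


CANDIDATES = [[("フ", 0.9)], [("ジ", 0.9)], [("マ", 0.5), ("ユ", 0.45)],
              [("ウ", 0.8)], [("キ", 0.9)]]
best, energy = viterbi(CANDIDATES)
print(best)  # フジユウキ
```

In this made-up input, "マ" alone is slightly more reliable than "ユ". The assumed penalty on the pair "マウ" nevertheless makes "フジユウキ" the minimum-energy string, which shows why the search considers whole strings rather than picking each character independently.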

FIG. 8 is a diagram for explaining an example of character recognition. For the handwritten characters shown in FIG. 3, for example, the recognition unit 112 recognizes the string "富士友規" (Fuji Tomoki) as the first character string and the string "フジマウキ" (Fujimauki) as the second character string. The recognition unit 112 also calculates the reliability of each string, expressed as a value from 0 to 1 inclusive. As shown in FIG. 8, the reliability of the first character string is 0.998 and that of the second character string is 0.19. The reliability of the first character string is closer to 1 than to 0, so its recognition is presumed to have succeeded. The reliability of the second character string is closer to 0 than to 1, so its recognition is presumed to have failed.

The extraction unit 113 refers to a dictionary in which different character strings are associated in advance and extracts one or more third character strings related to the first character string. Here, it extracts from the dictionary DB 123 the third character strings related to the first character string "富士友規". For the part "富士", it obtains one furigana, "フジ". For the part "友規", it obtains three furigana: "トモキ", "トモノリ", and "ユウキ". The extraction unit 113 therefore extracts three third character strings: "フジトモキ" (Fujitomoki), "フジトモノリ" (Fujitomonori), and "フジユウキ" (Fujiyuki).
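The description does not say how "富士友規" is split into "富士" and "友規". The sketch below assumes a greedy longest-match segmentation against the dictionary keys, then combines the readings of the pieces with a Cartesian product. `NAME_DICTIONARY`, `segment`, and `extract_third_strings` are hypothetical names.

```python
from itertools import product
from typing import Optional

# Hypothetical excerpt of the name dictionary (kanji -> furigana readings).
NAME_DICTIONARY = {
    "富士": ["フジ"],
    "友規": ["トモキ", "トモノリ", "ユウキ"],
}


def segment(text: str, dictionary: dict[str, list[str]]) -> Optional[list[str]]:
    """Greedy longest-match split of `text` into dictionary keys (an assumption)."""
    longest = max(map(len, dictionary), default=0)
    pieces, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                pieces.append(text[i:i + size])
                i += size
                break
        else:
            return None  # part of the text is not covered by the dictionary
    return pieces


def extract_third_strings(first: str, dictionary: dict[str, list[str]]) -> list[str]:
    """Every combination of readings of the segmented first character string."""
    pieces = segment(first, dictionary)
    if pieces is None:
        return []
    readings = [dictionary[p] for p in pieces]
    return ["".join(combo) for combo in product(*readings)]


print(extract_third_strings("富士友規", NAME_DICTIONARY))
# ['フジトモキ', 'フジトモノリ', 'フジユウキ']
```

The number of third character strings is the product of the reading counts of the pieces: 1 × 3 = 3 here.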

For each third character string, the calculation unit 114 calculates a first similarity indicating how similar it is to the second character string. This similarity is calculated from the edit distance between the second and third character strings. The edit distance is a number calculated from the types and counts of editing operations needed to transform an initial character string (hereinafter, the initial string) into a target character string (hereinafter, the target string). Editing operations are, for example, "add", "delete", and "replace".

The calculation unit 114 assigns an edit distance of 1 to each of the three operations "add", "delete", and "replace". It then calculates the edit distance from the initial string to the target string as the sum of the edit distances of the operations required. When this edit distance is used as the first similarity, a value closer to 0 means the second and third character strings are more similar, and a larger value means they are less similar.

Note that one "add" combined with one "delete" has the same effect as one "replace". That combination costs 2 rather than 1, so the calculation unit 114 uses "replace" to keep the total edit distance as small as possible.
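The unit-cost edit distance described above can be sketched with the standard Wagner-Fischer dynamic-programming recurrence. The description does not name the algorithm, so this choice is an assumption. Taking the minimum at each cell automatically prefers one "replace" (cost 1) over an "add" plus a "delete" (cost 2).

```python
def levenshtein(initial: str, target: str) -> int:
    """Edit distance with cost 1 for each add, delete, and replace."""
    prev = list(range(len(target) + 1))  # distances from "" to target prefixes
    for i, a in enumerate(initial, start=1):
        curr = [i]  # delete all i characters of the initial prefix
        for j, b in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # add b
                prev[j - 1] + (a != b),  # replace a with b (free if equal)
            ))
        prev = curr
    return prev[-1]


for third in ["フジトモキ", "フジトモノリ", "フジユウキ"]:
    print(third, levenshtein(third, "フジマウキ"))
# フジトモキ 2
# フジトモノリ 4
# フジユウキ 1
```

These values match the worked examples for FIG. 8 that follow.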

In the example shown in FIG. 8, the third character string "フジトモキ" is transformed into the second character string "フジマウキ" by replacing "ト" and "モ" with "マ" and "ウ", respectively. Two "replace" operations turn this third character string into the second character string. Its edit distance to the second character string, that is, its first similarity, is therefore 2.

The third character string "フジトモノリ" is transformed into "フジマウキ" by replacing "ト", "モ", and "ノ" with "マ", "ウ", and "キ", respectively, and deleting "リ". Three "replace" operations and one "delete" operation, four operations in total, turn it into the second character string. Its edit distance to the second character string, that is, its first similarity, is therefore 4.

By contrast, the third character string "フジユウキ" is transformed into "フジマウキ" by replacing "ユ" with "マ". A single "replace" operation suffices, so its edit distance to the second character string, that is, its first similarity, is 1.

Furigana recognized from the second image are generally a string of several katakana characters. Even when they are misrecognized, the error therefore usually affects only part of the string. The first similarity expresses how much the furigana extracted from the dictionary DB 123 differ from the recognized furigana. The more the first similarity indicates similarity, the more the furigana extracted from the dictionary DB 123 can generally be trusted over the recognized furigana. In other words, the closer the first similarity is to 0 (indicating similarity), the more the kanji recognition can be trusted relative to the furigana recognition.

Conversely, the larger the first similarity (indicating dissimilarity), the more places there are where the furigana extracted from the dictionary DB 123 differ from the recognized furigana. It can then no longer be said that the furigana recognition is less trustworthy than the kanji recognition.

Accordingly, when there are several third character strings, the information processing device 1 compares the first similarities calculated for them. It selects the third character string indicating the greatest similarity, here the one whose first similarity is closest to 0.
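The selection step amounts to taking the minimum edit distance. The description does not say what happens when several third character strings tie. The sketch below returns all tied strings, which is an assumption, and `select_closest` is a hypothetical name.

```python
def select_closest(similarities: dict[str, int]) -> list[str]:
    """Return the third string(s) whose first similarity (edit distance) is smallest."""
    if not similarities:
        return []
    best = min(similarities.values())
    return [s for s, d in similarities.items() if d == best]


# First similarities from the worked example (third string -> distance to フジマウキ).
first_similarity = {"フジトモキ": 2, "フジトモノリ": 4, "フジユウキ": 1}
print(select_closest(first_similarity))  # ['フジユウキ']
```

A caller that needs exactly one string could, for instance, break ties using the order in which the dictionary returned the readings.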

According to the first similarity, the output unit 115 outputs first information, which is based on at least one of the first similarity and the third character string. Suppose, for example, that as shown in FIG. 8 the string "富士友規" is recognized as the first character string and the string "フジマウキ" as the second character string. Suppose also that the string "フジユウキ" is extracted as the third character string. In this case, the first reliability (that of the first character string) is 0.998 and the second reliability (that of the second character string) is 0.19. The first similarity of the third character string to the second is 1.

The output unit 115 then takes the feature (first reliability, second reliability, first similarity) = (0.998, 0.19, 1) and refers to the classification model 124. From this it determines whether the second or the third character string should be trusted. According to the result, the output unit 115 outputs first information based on at least one of the first similarity and the third character string.

In this case, the output unit 115 outputs the first information according to the first similarity and also to at least one of two further values. These are the first reliability, indicating the reliability of the first character string, and the second reliability, indicating the reliability of the second character string. A multi-dimensional feature may include the first similarity, such as the three-dimensional feature (first reliability, second reliability, first similarity) above. Applying, for example, a statistical classification method to such a feature lets complex criteria decide the content of the first information and whether to output it at all.

The output unit 115 may, for example, output as the first information a control signal representing the string "フジマウキ（もしかしてフジユウキ？）" ("Fujimauki (did you mean Fujiyuki?)"), and cause the display unit 15 to display it. The third character string follows "もしかして" ("did you mean") in the parentheses, so this first information is based on at least one of the first similarity and the third character string.

Alternatively, the output unit 115 may output as the first information a control signal representing the string "フジマウキ（類似度が１の他の候補があります）" ("Fujimauki (there is another candidate with a similarity of 1)"), and cause the display unit 15 to display it. The parentheses show the similarity of another candidate, so this first information is likewise based on at least one of the first similarity and the third character string.
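The description gives neither the learned parameters of the classification model 124 nor the exact decision rule. The sketch below therefore uses an invented linear model (`WEIGHTS`, `BIAS`) to classify the feature (first reliability, second reliability, first similarity). It then builds the first information in either of the two message styles quoted above. All names are hypothetical.

```python
# Invented model data: a linear classifier over
# (first reliability, second reliability, first similarity).
WEIGHTS = (1.0, -1.0, -0.2)
BIAS = 0.0


def trust_third(features: tuple[float, float, float]) -> bool:
    """True if the feature falls in the "trust the third string" class."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return score > 0


def first_information(second: str, third: str, similarity: int,
                      features: tuple[float, float, float],
                      style: str = "suggest") -> str:
    """Return the display string; just the second string if the third is not trusted."""
    if not trust_third(features):
        return second
    if style == "suggest":
        return f"{second}（もしかして{third}？）"
    return f"{second}（類似度が{similarity}の他の候補があります）"


print(first_information("フジマウキ", "フジユウキ", 1, (0.998, 0.19, 1)))
# フジマウキ（もしかしてフジユウキ？）
```

With these invented weights the FIG. 8 feature scores 0.998 - 0.19 - 0.2 = 0.608 > 0, so the suggestion is shown. A feature with low first reliability and a large first similarity would leave the recognized furigana unchanged.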

<Operation of information processing device>
FIG. 9 is a flowchart showing the flow of operations of the information processing device 1. As shown in FIG. 9, the control unit 11 of the information processing device 1 controls the image reading unit 16 to read an image formed on a medium (step S101). The control unit 11 corrects the read image and cuts out the first image and the second image from it based on the area correspondence table 121 (step S102). Correction of the read image may be omitted.

The control unit 11 recognizes the first character string from the first image (step S103) and calculates the first reliability of the first character string (step S104).

The control unit 11 also recognizes the second character string from the second image (step S105) and calculates the second reliability of the second character string (step S106). Step S105 may be performed before step S103.

The control unit 11 refers to the dictionary DB 123 to extract one or more third character strings related to the first character string (step S107). For each third character string, it calculates the first similarity to the second character string (step S108).

The control unit 11 evaluates the reliability of the third character string relative to the second character string according to the first reliability, the second reliability, and the first similarity (step S109). For this evaluation, the control unit 11 refers to the classification model 124 to classify the feature (first reliability, second reliability, first similarity). The outcome is determined by which labeled class the feature falls into.

If the evaluated reliability satisfies a condition, the control unit 11 then outputs the third character string instead of the second character string (step S110). For example, the feature (first reliability, second reliability, first similarity) may be classified into the class labeled "Trust the third character string". In that case, the control unit 11 outputs the third character string in place of the second.

As described above, the information processing device 1 outputs the first information according to the first similarity of the third character string to the second character string.

The first reliability is an index of how accurately the first character string was recognized, and the second reliability is the corresponding index for the second character string. Both, however, are values calculated from the respective character recognition processes themselves. Evaluating recognition accuracy from the first or second reliability alone may therefore lead to a wrong judgment.

The first similarity, by contrast, is calculated from a comparison, such as the edit distance, between the second character string and a third character string. The third character string is related to the first character string and extracted from the dictionary DB 123. The first similarity therefore draws on the character recognition of the first and second character strings and also on the dictionary DB 123, which stores relationships between character strings. Moreover, several third character strings may be stored in relation to one first character string. A first similarity is then calculated for each pair of a third character string and the second character string, so the strings extracted from the dictionary can always be narrowed down to a determinate choice.

By outputting the first information according to the first similarity, the information processing device 1 gives the user an indicator for deciding which of the two character recognition results to trust. That indicator is grounded both in the character recognition process and in information outside it.

<Modifications>
The embodiment has been described above, but its content may be modified as follows. The following modifications may also be combined.

<1>
In the embodiment described above, the first image and the second image are cut out from a single read image, but this is not a limitation. For example, the control unit 11 of the information processing device 1 may cause the image reading unit 16 to read the first image from the front of a business card and the second image from its back, separately. The first image and the second image need not be contained in a common image, as long as they are related to each other.

<2>
In the embodiment described above, the information processing device 1 outputs the first information according to the first similarity and also to at least one of the first reliability and the second reliability, but this is not a limitation. The first reliability indicates the reliability of the first character string, and the second reliability indicates the reliability of the second character string. The information processing device 1 may, for example, output the first information according to the first similarity alone, regardless of the first and second reliabilities. In that case, the information processing device 1 need not calculate one or both of the first and second reliabilities.

<3>
In the embodiment described above, the control unit 11 refers to the classification model 124 to classify the feature (first reliability, second reliability, first similarity). It evaluates the reliability of the third character string relative to the second character string according to the class into which the feature is classified. The reliability may, however, be evaluated using a numerical value calculated from the feature rather than from the class.

For example, a function may be defined that takes the feature (first reliability, second reliability, first similarity) as its independent variables. The control unit 11 may then use the value of that function as the reliability of the third character string relative to the second. When this value is equal to or greater than a threshold, the information processing device 1 may output the third character string instead of the second character string. In other words, the information processing device 1 of this modification evaluates the reliability of the third character string relative to the second using the first reliability, the second reliability, and the first similarity. When that reliability is equal to or greater than the threshold, it outputs the third character string in place of the second.
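The description only says that the function takes the feature as its independent variables. The logistic form, the coefficients, and the threshold below are therefore assumptions made for illustration. They show how a numerical reliability could replace the class decision.

```python
import math

# Hypothetical coefficients and threshold; none of these values come from the source.
COEFFS = (3.0, -3.0, -1.0)  # weights for (r1, r2, sim)
INTERCEPT = 0.0
THRESHOLD = 0.5


def third_string_reliability(r1: float, r2: float, sim: float) -> float:
    """Logistic function of the feature (r1, r2, sim), mapped into (0, 1)."""
    z = COEFFS[0] * r1 + COEFFS[1] * r2 + COEFFS[2] * sim + INTERCEPT
    return 1.0 / (1.0 + math.exp(-z))


def choose_output(second: str, third: str, r1: float, r2: float, sim: float) -> str:
    """Output the third string only when its reliability reaches the threshold."""
    if third_string_reliability(r1, r2, sim) >= THRESHOLD:
        return third
    return second


print(choose_output("フジマウキ", "フジユウキ", 0.998, 0.19, 1))  # フジユウキ
```

For the FIG. 8 values, z = 2.994 - 0.57 - 1 = 1.424, giving a reliability of about 0.81, which is above the assumed threshold of 0.5.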

<4>
In the embodiment described above, once the information processing device 1 has recognized the first and second character strings, it extracts the third character strings related to the first character string from the dictionary DB 123. Alternatively, it may extract the third character strings only when the first and second reliabilities satisfy a predetermined condition. For example, when the first reliability and the second reliability are each equal to or greater than a predetermined threshold, it is unlikely that either character string is wrong. In that case the information processing device 1 need not extract the third character strings. When both character recognition results can be trusted, the information processing device 1 of this modification skips this extraction, which reduces unnecessary processing load.
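The gating step can be sketched as a wrapper that calls the extraction only when at least one reliability falls below the threshold. The threshold value of 0.9 and the function name are hypothetical. The extraction routine is passed in as a callable, so the sketch does not depend on any particular dictionary implementation.

```python
from typing import Callable


def third_strings_if_needed(first: str, r1: float, r2: float,
                            extract: Callable[[str], list[str]],
                            threshold: float = 0.9) -> list[str]:
    """Call `extract` only when at least one reliability is below `threshold`."""
    if r1 >= threshold and r2 >= threshold:
        return []  # both recognitions trusted: skip the dictionary lookup
    return extract(first)


# With the FIG. 8 reliabilities (0.998, 0.19) the lookup still runs.
print(third_strings_if_needed("富士友規", 0.998, 0.19, lambda s: ["フジユウキ"]))
```

Skipping the call entirely, rather than discarding its result, is what saves the processing load described above.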

<5>
In the embodiment described above, the information processing device 1 refers to the dictionary DB 123 to extract one or more third character strings related to the first character string. It may also extract one or more fourth character strings related to the second character string. For example, the information processing device 1 may refer to the dictionary DB 123 and take the furigana forming the second character string as a reading. It then extracts kanji pronounced with that reading as fourth character strings. The dictionary DB 123 used to extract the fourth character strings may or may not be the same as the one used to extract the third character strings.

In this case, the information processing device 1 preferably calculates, for each extracted fourth character string, a second similarity indicating its similarity to the first character string. According to the second similarity, it outputs second information, which is based on at least one of the second similarity and the fourth character string.

For example, for the handwritten characters shown in FIG. 3, the information processing device 1 recognizes "富士反規" as the first character string and "フジユウキ" (read "Fuji Yūki") as the second character string. The information processing device 1 calculates a reliability of 0.1 for the first character string and 0.9 for the second character string. Because the reliability of the first character string is closer to 0 than to 1, its recognition is presumed to have failed; because the reliability of the second character string is closer to 1 than to 0, its recognition is presumed to have succeeded.

The extraction unit 113, implemented by the control unit 11 of the information processing device 1, extracts fourth character strings related to the second character string "フジユウキ" from the dictionary DB 123. Referring to the dictionary DB 123, the extraction unit 113 extracts, for example, one kanji rendering, "富士", for the part "フジ", and three kanji renderings, "祐樹", "優希", and "友規", for the part "ユウキ". The extraction unit 113 therefore extracts three fourth character strings: "富士祐樹", "富士優希", and "富士友規".
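The candidate expansion can be sketched as a Cartesian product over the kanji candidates of each reading segment. This assumes the reading has already been split into segments and that a plain Python dict (`READING_DICT`, hypothetical) stands in for the dictionary DB 123; the source describes neither the segmentation step nor the dictionary format.

```python
from itertools import product

# Hypothetical reading-to-kanji dictionary standing in for dictionary DB 123.
# The split of the reading into "フジ" and "ユウキ" is assumed, not derived.
READING_DICT: dict[str, list[str]] = {
    "フジ": ["富士"],
    "ユウキ": ["祐樹", "優希", "友規"],
}


def expand_reading(segments: list[str],
                   dictionary: dict[str, list[str]]) -> list[str]:
    """Combine the kanji candidates of each reading segment into full strings."""
    # A segment missing from the dictionary is kept as its raw reading.
    options = [dictionary.get(seg, [seg]) for seg in segments]
    return ["".join(parts) for parts in product(*options)]


fourth_strings = expand_reading(["フジ", "ユウキ"], READING_DICT)
print(fourth_strings)  # ['富士祐樹', '富士優希', '富士友規']
```

The number of candidates is the product of the per-segment counts (here 1 × 3), so a real dictionary lookup would likely need pruning for long readings.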

The information processing device 1 then calculates, for each of the three extracted fourth character strings, the second similarity with respect to the first character string. "富士祐樹" and "富士優希" each have an edit distance of 2 from the first character string "富士反規", whereas "富士友規" has an edit distance of 1, so the information processing device 1 selects "富士友規" from among the three fourth character strings.
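The edit distances in this example can be checked with a standard Levenshtein dynamic program that charges 1 for each insertion, deletion, or substitution. This is a sketch: `edit_distance` is an illustrative name, and picking the candidate with the smallest distance follows the worked example rather than a stated algorithm.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(target) + 1))  # distances from "" to each prefix of target
    for i, s_ch in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to ""
        for j, t_ch in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,                   # delete s_ch
                curr[j - 1] + 1,               # insert t_ch
                prev[j - 1] + (s_ch != t_ch),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


first = "富士反規"  # misrecognized first character string
candidates = ["富士祐樹", "富士優希", "富士友規"]  # fourth character strings
distances = {c: edit_distance(c, first) for c in candidates}
best = min(candidates, key=distances.get)
print(distances)  # {'富士祐樹': 2, '富士優希': 2, '富士友規': 1}
print(best)       # 富士友規
```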

The output unit 115, implemented by the control unit 11 of the information processing device 1, refers to the classification model 124 for the case where the feature (first reliability, second reliability, second similarity) is (0.1, 0.9, 1) and determines which of the first character string and the fourth character string should be trusted. Based on the result, the information processing device 1 outputs second information based on at least one of the second similarity and the fourth character string.

The output unit 115 may, for example, output as the second information a control signal representing the character string "富士反規（もしかして富士友規？）" ("富士反規 (did you mean 富士友規?)") and cause the display unit 15 to display it. Because the fourth character string follows "did you mean" in the parentheses, this second information is based on at least one of the second similarity and the fourth character string.

Alternatively, the output unit 115 may output as the second information a control signal representing the character string "富士反規（類似度が１の他の候補があります）" ("富士反規 (there is another candidate with a similarity of 1)") and cause the display unit 15 to display it. Because the parentheses show the similarity of another candidate, this second information is also based on at least one of the second similarity and the fourth character string.
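A sketch of the two display forms. The English wording ("did you mean", "there is another candidate") is a rendering of the Japanese messages, and `format_second_information` is a hypothetical helper rather than part of the described device.

```python
def format_second_information(first: str, candidate: str, similarity: int,
                              show_candidate: bool = True) -> str:
    """Build the display text for either of the two forms described above."""
    if show_candidate:
        # Form 1: append the suggested fourth character string.
        return f"{first} (did you mean {candidate}?)"
    # Form 2: only report that another candidate exists, with its similarity.
    return f"{first} (there is another candidate with a similarity of {similarity})"


print(format_second_information("富士反規", "富士友規", 1))
print(format_second_information("富士反規", "富士友規", 1, show_candidate=False))
```

Both forms keep the recognized first character string visible, so the user can still see what was read before deciding on the correction.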

As described above, the information processing device 1 extracts third character strings using the first character string and the dictionary DB 123, and thereby corrects a misrecognized second character string or notifies the user that it may have been misrecognized.

However, when, for example, the first character string is less reliable than the second character string, extracting third character strings alone does not allow the information processing device 1 to correct the first character string. The information processing device 1 of this modification therefore extracts third character strings using the first character string and the dictionary DB 123, and also extracts fourth character strings using the second character string and the dictionary DB 123. This allows it to correct a misrecognized first character string or to notify the user that it may have been misrecognized.

<6>
Further, when extracting fourth character strings from the dictionary DB 123, the information processing device 1 should output the second information in accordance with at least one of the first reliability, which indicates the reliability of the first character string, and the second reliability, which indicates the reliability of the second character string.

In particular, when a multi-dimensional feature that includes the second similarity is used, such as the three-dimensional feature (first reliability, second reliability, second similarity), applying a statistical classification method, for example, allows both the content of the second information and whether it is output at all to be decided under complex criteria.

<7>
Further, when extracting fourth character strings from the dictionary DB 123, the information processing device 1 should output the fourth character string in place of the first character string when the reliability of the fourth character string with respect to the first character string, evaluated using the first reliability, the second reliability, and the second similarity, is equal to or greater than a threshold.
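The source leaves the classification model 124 unspecified. As one possibility, the sketch below scores the feature (first reliability, second reliability, second similarity) with a logistic function and then applies the threshold test of this modification. The weights, bias, and 0.5 threshold are invented for illustration and do not come from the patent; in practice they would be learned by training a statistical classifier.

```python
import math

# Hypothetical logistic scorer standing in for classification model 124.
# Feature order: (first reliability, second reliability, second similarity),
# where the second similarity is an edit distance (smaller = closer match).
WEIGHTS = (-4.0, 4.0, -1.5)  # illustrative values only
BIAS = 0.5


def replacement_reliability(r1: float, r2: float, similarity: float) -> float:
    """Score in (0, 1): how strongly the fourth string should replace the first."""
    z = BIAS + WEIGHTS[0] * r1 + WEIGHTS[1] * r2 + WEIGHTS[2] * similarity
    return 1.0 / (1.0 + math.exp(-z))


def choose_output(first: str, fourth: str, r1: float, r2: float,
                  similarity: float, threshold: float = 0.5) -> str:
    """Output the fourth string only if its reliability reaches the threshold."""
    if replacement_reliability(r1, r2, similarity) >= threshold:
        return fourth
    return first


print(choose_output("富士反規", "富士友規", 0.1, 0.9, 1))  # 富士友規
print(choose_output("富士反規", "富士友規", 0.9, 0.1, 1))  # 富士反規
```

For the example feature (0.1, 0.9, 1) the score is about 0.90, so the fourth character string wins; swapping the two reliabilities drops the score to about 0.015 and the first character string is kept.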

<8>
Further, when extracting fourth character strings from the dictionary DB 123, the information processing device 1 may extract them only when the first reliability and the second reliability satisfy a predetermined condition. Because the information processing device 1 in this modification skips extracting fourth character strings when the recognition of both the first character string and the second character string is reliable, unnecessary processing load is reduced.

<9>
In the embodiment described above, the control unit 11 applied the model data read from the character recognition model 122 to a multilayer neural network to recognize the character codes corresponding to the handwritten characters, but the character recognition method is not limited to this.

Further, the control unit 11 may not only read the character recognition model 122 but also rewrite it according to the result of processing.

For example, the information processing device 1 may input the gray-level value of each pixel of the second image into a multilayer neural network and perform character recognition by applying weight coefficients, obtained from the character recognition model 122, to the respective inputs. That is, the control unit 11 in this case recognizes the second character string based on a quantity obtained by weighting each of one or more feature quantities calculated from the second image with a predetermined weight and summing them.

The information processing device 1 then calculates, for each third character string, a first similarity indicating its similarity to the second character string. If, in accordance with the first similarity, the information processing device 1 outputs the third character string in place of the second character string, it has trusted the third character string (and the first character string used to extract it) over the second character string. In that case, the information processing device 1 in this modification corrects the character recognition model 122 used to recognize the second character string according to the result of the processing. Specifically, it corrects the weight coefficients described above so that the third character string is recognized from the second image. In other words, when the information processing device 1 outputs the third character string in place of the second character string, it should correct the weights so that this third character string is recognized from the second image. With this modification, the processing results of the information processing device 1 are fed back into the character recognition model 122, which is a trained model, so the accuracy of character recognition improves.
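The feedback idea can be illustrated with a linear per-character classifier instead of the multilayer neural network of the embodiment. The update is a perceptron-style step, a swapped-in technique chosen for brevity; the two-class model, the feature values, and the names `recognize` and `feedback_update` are all hypothetical.

```python
# Each class scores a feature vector (e.g. pixel gray levels) by a weighted sum.
Weights = dict[str, list[float]]


def score(weights: list[float], features: list[float]) -> float:
    return sum(w * x for w, x in zip(weights, features))


def recognize(model: Weights, features: list[float]) -> str:
    """Return the label whose weighted sum of features is largest."""
    return max(model, key=lambda label: score(model[label], features))


def feedback_update(model: Weights, features: list[float], corrected: str,
                    lr: float = 0.1, max_steps: int = 100) -> None:
    """Perceptron-style steps until `corrected` is recognized from `features`."""
    for _ in range(max_steps):
        predicted = recognize(model, features)
        if predicted == corrected:
            return
        for i, x in enumerate(features):
            model[corrected][i] += lr * x  # raise the score of the corrected class
            model[predicted][i] -= lr * x  # lower the score of the wrong class


model = {"マ": [1.0, 0.0], "ユ": [0.0, 1.0]}  # hypothetical two-class model
features = [0.9, 0.6]                         # hypothetical feature vector
print(recognize(model, features))  # マ
feedback_update(model, features, corrected="ユ")
print(recognize(model, features))  # ユ
```

A production system would more likely fine-tune the trained network on the corrected sample; the perceptron step only shows the direction of the correction.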

<10>
Further, when extracting fourth character strings from the dictionary DB 123, the information processing device 1 should recognize the first character string based on a quantity obtained by weighting each of one or more feature quantities calculated from the first image with a predetermined weight and summing them. Then, when the information processing device 1 outputs the fourth character string in place of the first character string, it should correct the weights so that this fourth character string is recognized from the first image.

<11>
In the embodiment described above, the second image was an image of a character string indicating the pronunciation of the kanji handwritten by the user, that is, furigana, but the second image is not limited to this. For example, the second image may contain a translation of a sentence or other text handwritten in the first image. In this case, the dictionary DB 123 may be a bilingual dictionary, such as a Japanese-English or English-Japanese dictionary.

For example, the user handwrites the character string "自動車" (automobile) in the area of the first image and the character string "car" in the area of the second image. On acquiring the first image and the second image, the information processing device 1 performs character recognition on each. As a result, it recognizes "自動車" from the first image and "dar" from the second image. In this case, character recognition of the first image has succeeded, but character recognition of the second image has failed.

Based on the first character string "自動車" recognized from the first image, the information processing device 1 extracts third character strings related to it from the dictionary DB 123. The extracted third character strings include "car", "automobile", "auto", and "motorcar", and the information processing device 1 calculates, for each of them, the first similarity to the second character string "dar". The information processing device 1 then selects the most similar one, "car", as the third character string and, based on the first reliability of the first character string, the second reliability of the second character string, and the first similarity of the selected third character string, determines whether the third character string should be output in place of the second character string.

<12>
In the embodiments described above, image data representing an image was used as the input to recognition, but that data is not limited to a scanned image. The information processing device 1 may, for example, recognize characters from information describing how the writing changes over time, such as the stroke order and strokes of handwritten characters.

<13>
In the embodiment described above, the reliability of a character string was calculated using Equation 4 described in Patent Document 4, but the calculation is not limited to this. The information processing device 1 may calculate the reliability of a character string from the character reliabilities of the individual characters that make up the string, for example as the average of those character reliabilities. The average may be, for example, an arithmetic mean, a geometric mean, or a harmonic mean.

The information processing device 1 may also use, for example, the minimum of the character reliabilities of the characters in a character string, or their product, as the reliability of that string. In this case, each character reliability is normalized, for example to the range 0 to 1 inclusive.
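The aggregation options in this modification map directly onto Python's standard library. The sketch assumes the character reliabilities are already normalized to [0, 1]; `string_reliability` and its `method` keys are illustrative names.

```python
import math
import statistics


def string_reliability(char_reliabilities: list[float],
                       method: str = "arithmetic") -> float:
    """Aggregate per-character reliabilities (assumed in [0, 1]) into one score."""
    if not char_reliabilities:
        raise ValueError("a character string needs at least one character")
    aggregators = {
        "arithmetic": statistics.fmean,
        "geometric": statistics.geometric_mean,  # may raise if any value is 0
        "harmonic": statistics.harmonic_mean,
        "min": min,
        "product": math.prod,
    }
    return aggregators[method](char_reliabilities)


scores = [0.9, 0.5, 0.8]  # hypothetical character reliabilities
for m in ("arithmetic", "geometric", "harmonic", "min", "product"):
    print(m, round(string_reliability(scores, m), 4))
```

The minimum and the product are the most conservative choices: a single poorly recognized character pulls the whole string's reliability down, whereas the arithmetic mean lets well-recognized characters compensate for it.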

<14>
In the embodiment described above, each of the three editing operations "add", "delete", and "replace" was counted as 1 in the edit distance, but a different weight may be assigned to each kind of operation.
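A sketch of an edit distance with a separate cost per operation type, assuming the weights are per-operation constants (the source does not say how they are chosen). With all three costs equal to 1 it reduces to the ordinary Levenshtein distance.

```python
def weighted_edit_distance(source: str, target: str, ins: float = 1.0,
                           dele: float = 1.0, sub: float = 1.0) -> float:
    """Levenshtein distance with separate insertion, deletion, substitution costs."""
    prev = [j * ins for j in range(len(target) + 1)]  # build target from ""
    for i, s_ch in enumerate(source, start=1):
        curr = [i * dele]  # delete all of source[:i]
        for j, t_ch in enumerate(target, start=1):
            curr.append(min(
                prev[j] + dele,                                 # delete s_ch
                curr[j - 1] + ins,                              # insert t_ch
                prev[j - 1] + (0.0 if s_ch == t_ch else sub),   # substitute or match
            ))
        prev = curr
    return prev[-1]


print(weighted_edit_distance("フジマウキ", "フジユウキ"))           # 1.0
print(weighted_edit_distance("フジマウキ", "フジユウキ", sub=1.5))  # 1.5
print(weighted_edit_distance("フジマウキ", "フジユウキ", sub=3.0))  # 2.0
```

Once the substitution cost exceeds the combined cost of one deletion and one insertion, the algorithm prefers delete-then-insert, which is why `sub=3.0` gives 2.0 rather than 3.0.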

The similarity may also be calculated by dividing the edit distance by the length of the initial character string or the target character string. For example, when the second character string is "フジマウキ" and the third character string is "フジユウキ", the length of the target character string is 5 and the edit distance of the third character string with respect to the second character string is 1. In this case, the first similarity is 1/5, that is, 0.2.

Alternatively, the similarity may be expressed as the length of the initial character string or the target character string minus the edit distance. For example, when the length of the target character string is 5 and the edit distance of the third character string with respect to the second character string is 1, the first similarity is 5 - 1, that is, 4. In short, the similarity between an initial character string and a target character string should be calculated using the edit distance from the initial character string to the target character string, and may additionally use the length of either string.
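The two length-based variants are simple arithmetic on the edit distance; this sketch reproduces the worked values from the text. The function names are hypothetical. Note that the ratio decreases as the strings become more alike while the margin increases, so any downstream threshold has to match whichever form is used.

```python
def similarity_ratio(distance: int, length: int) -> float:
    """Edit distance divided by string length (smaller means more similar)."""
    return distance / length


def similarity_margin(distance: int, length: int) -> int:
    """String length minus edit distance (larger means more similar)."""
    return length - distance


# "フジマウキ" vs "フジユウキ": target length 5, edit distance 1.
target_length = len("フジユウキ")
print(similarity_ratio(1, target_length))   # 0.2
print(similarity_margin(1, target_length))  # 4
```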

<15>
In the embodiment described above, the information processing device 1 referred to a character recognition model 122 that is not associated with any particular user, but it may instead refer to, for example, a trained character recognition model associated with each writer. That is, the information processing device 1 may use a different trained model for each user to recognize character strings from the images that user specifies. With this modification, a trained model specialized for, for example, the handwriting and writing habits of each writer is used for character recognition, which improves recognition accuracy.

<16>
In the embodiment described above, the information processing device 1 was an image reading device having the image reading unit 16, but it need not have the image reading unit 16. The information processing device 1 may, for example, control an image reading device that reads images from a medium via the communication unit 13 and a communication line, and acquire images from that device. The information processing device 1 may also recognize characters that the user handwrites by operating the touch panel of the operation unit 14. In this case, the information processing device 1 may acquire, as the image data representing the image, information such as the stroke order and strokes derived from the operations accepted by the touch panel.

<17>
In the embodiment described above, the information processing device 1 recognized one first character string and one second character string, but it may recognize multiple first character strings and multiple second character strings. In this case, the information processing device 1 may perform the processing described above for each combination of a first character string and a second character string.

<18>
In the embodiment described above, the first similarity was calculated from the edit distance of the third character string with respect to the second character string, but it may instead be information that also includes the positions edited when the third character string is transformed into the second character string (hereinafter, "edit locations"). In this case, the first similarity may be expressed as a vector rather than a scalar value.

For example, when the second character string is "フジマウキ" and the third character string is "フジユウキ", the edit distance is 1 and the position at which the two strings differ, that is, the edit location, is the third character. In this case, the information processing device 1 may calculate, as the first similarity, the vector (edit location, edit distance) = (3, 1). The information processing device 1 may also decide what to output using the edit-location information together with the character reliability calculated for the character at that location in the second character string. Because the first similarity then carries more information than when it is derived from the edit distance alone, this configuration improves the accuracy of decisions such as whether the second character string should be corrected to the third character string.
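One way to obtain an (edit location, edit distance) vector is from the opcodes of Python's `difflib.SequenceMatcher`. This is a swapped-in technique, not the patent's stated method: the opcodes form a valid edit script, so the count is an upper bound on the Levenshtein distance (equal to it in this example). The function name and the per-character reliabilities are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_vector(second: str, third: str) -> tuple[int, int]:
    """Return (first edit location, edit count); locations are 1-based in `second`.

    Identical strings return (0, 0), meaning "no edit location".
    """
    first_location = 0
    edits = 0
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, second, third).get_opcodes():
        if tag == "equal":
            continue
        # A replace/insert/delete block of sizes (p, q) costs max(p, q) edits.
        edits += max(i2 - i1, j2 - j1)
        if first_location == 0:
            first_location = i1 + 1
    return (first_location, edits)


second = "フジマウキ"
char_reliability = [0.95, 0.93, 0.31, 0.90, 0.92]  # hypothetical per-character scores
loc, dist = similarity_vector(second, "フジユウキ")
print((loc, dist))                # (3, 1)
print(char_reliability[loc - 1])  # 0.31: the edited character was read weakly
```

Pairing the edit location with a low character reliability at that position is the kind of extra evidence the text describes: here it suggests that "マ" was the doubtful character, making the correction to "ユ" more plausible.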

<19>
The program executed by the control unit 11 of the information processing device 1 may be provided stored on a computer-readable recording medium, such as a magnetic recording medium (for example, magnetic tape or a magnetic disk), an optical recording medium (for example, an optical disc), a magneto-optical recording medium, or semiconductor memory. The program may also be downloaded via a communication line such as the Internet. Various devices other than a CPU may serve as the control means exemplified by the control unit 11 described above; for example, a dedicated processor may be used.

Description of reference signs: 1... information processing device, 11... control unit, 111... analysis unit, 112... recognition unit, 113... extraction unit, 114... calculation unit, 115... output unit, 12... storage unit, 121... area correspondence table, 122... character recognition model, 123... dictionary DB, 1231... dictionary name list, 1232... dictionary data, 124... classification model, 13... communication unit, 14... operation unit, 15... display unit, 16... image reading unit.

Claims

An information processing device comprising:
a first recognition unit that recognizes a first character string from a first image;
a second recognition unit that recognizes a second character string from a second image related to the first image;
an extraction unit that refers to a dictionary in which different character strings are associated in advance and extracts one or more third character strings related to the first character string;
a calculation unit that calculates, for each of the third character strings, a first similarity indicating the degree of similarity to the second character string; and
an output unit that outputs the second character string as first information when the reliability of the third character string with respect to the second character string, evaluated using a first reliability indicating the reliability of character recognition of the first character string, a second reliability indicating the reliability of character recognition of the second character string, and the first similarity, is less than a threshold, and outputs the third character string as the first information in place of the second character string when that reliability is equal to or greater than the threshold.

The information processing device according to claim 1, wherein the extraction unit extracts the third character strings when the first reliability and the second reliability satisfy a predetermined condition.

The information processing device according to claim 1 or 2, wherein:
the extraction unit refers to a dictionary in which different character strings are associated in advance and extracts one or more fourth character strings related to the second character string;
the calculation unit calculates, for each of the fourth character strings, a second similarity indicating the degree of similarity to the first character string; and
the output unit outputs, in accordance with at least one of the first reliability and the second reliability, and with the second similarity, second information based on at least one of the second similarity and the fourth character string.

The information processing device according to claim 3, wherein the output unit outputs the first character string as the second information when the reliability of the fourth character string with respect to the first character string, evaluated using the first reliability, the second reliability, and the second similarity, is less than a threshold, and outputs the fourth character string as the second information in place of the first character string when that reliability is equal to or greater than the threshold.

The information processing device according to claim 3 or 4, wherein the extraction unit extracts the fourth character strings when the first reliability and the second reliability satisfy a predetermined condition.

The information processing device according to claim 1, wherein the second recognition unit:
recognizes the second character string based on a quantity obtained by weighting each of one or more feature quantities calculated from the second image with a predetermined weight and summing the weighted quantities; and
when the output unit outputs the third character string in place of the second character string, corrects the weight so that the third character string is recognized from the second image.

The information processing device according to claim 4, wherein the first recognition unit:
recognizes the first character string based on a quantity obtained by weighting each of one or more feature quantities calculated from the first image with a predetermined weight and summing the weighted quantities; and
when the output unit outputs the fourth character string in place of the first character string, corrects the weight so that the fourth character string is recognized from the first image.

An information processing device comprising:
a first recognition unit that recognizes a first character string from a first image;
a second recognition unit that recognizes a second character string from a second image related to the first image;
an extraction unit that refers to a dictionary in which different character strings are associated in advance, extracts one or more third character strings related to the first character string, and extracts one or more fourth character strings related to the second character string;
a calculation unit that calculates, for each of the third character strings, a first similarity indicating the degree of similarity to the second character string, and calculates, for each of the fourth character strings, a second similarity indicating the degree of similarity to the first character string; and
an output unit that outputs, in accordance with at least one of a first reliability indicating the reliability of character recognition of the first character string and a second reliability indicating the reliability of character recognition of the second character string, and with the first similarity, first information based on at least one of the first similarity and the third character string, and that outputs the first character string as second information when the reliability of the fourth character string with respect to the first character string, evaluated using the first reliability, the second reliability, and the second similarity, is less than a threshold, and outputs the fourth character string as the second information in place of the first character string when that reliability is equal to or greater than the threshold.

The information processing device according to any one of claims 1 to 8, wherein the second image is an image including a character string indicating the pronunciation of a character string included in the first image.

A program causing a computer to function as:
a first recognition unit that recognizes a first character string from a first image;
a second recognition unit that recognizes a second character string from a second image related to the first image;
an extraction unit that refers to a dictionary in which different character strings are associated in advance and extracts one or more third character strings related to the first character string;
a calculation unit that calculates, for each of the third character strings, a first similarity indicating the degree of similarity to the second character string; and
an output unit that outputs the second character string as first information when the reliability of the third character string with respect to the second character string, evaluated using a first reliability indicating the reliability of character recognition of the first character string, a second reliability indicating the reliability of character recognition of the second character string, and the first similarity, is less than a threshold, and outputs the third character string as the first information in place of the second character string when that reliability is equal to or greater than the threshold.

A program causing a computer to function as:
a first recognition unit that recognizes a first character string from a first image;
a second recognition unit that recognizes a second character string from a second image related to the first image;
an extraction unit that refers to a dictionary in which different character strings are associated in advance, extracts one or more third character strings related to the first character string, and extracts one or more fourth character strings related to the second character string;
a calculation unit that calculates, for each of the third character strings, a first similarity indicating the degree of similarity to the second character string, and calculates, for each of the fourth character strings, a second similarity indicating the degree of similarity to the first character string; and
an output unit that outputs, in accordance with at least one of a first reliability indicating the reliability of character recognition of the first character string and a second reliability indicating the reliability of character recognition of the second character string, and with the first similarity, first information based on at least one of the third character string and the first similarity, and that outputs the first character string as second information when the reliability of the fourth character string with respect to the first character string, evaluated using the first reliability, the second reliability, and the second similarity, is less than a threshold, and outputs the fourth character string as the second information in place of the first character string when that reliability is equal to or greater than the threshold.