JP2020149288A

JP2020149288A - Image processing apparatus, image processing method, and image processing program

Info

Publication number: JP2020149288A
Application number: JP2019045632A
Authority: JP
Inventors: 紋宏中島; Ayahiro Nakajima
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2019-03-13
Filing date: 2019-03-13
Publication date: 2020-09-17

Abstract

To improve the accuracy of character recognition for a voucher image.

SOLUTION: An image processing apparatus comprises an extraction unit and a processing unit, and performs character recognition on a voucher image showing a voucher. The extraction unit extracts, from the voucher image, a plurality of provisional character areas arranged side by side in a row direction. The processing unit sets a plurality of character interval candidates as intervals for partitioning, in the row direction, a row area including the plurality of provisional character areas; calculates, for each character interval candidate, the degree of overlap between the partitioning positions given by that candidate and the character-present portion; determines a character interval from the candidates based on the calculated degrees of overlap; and recognizes characters in the partitioned areas obtained by partitioning the row area according to the determined character interval.

SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing device, an image processing method, and an image processing program that perform character recognition on a voucher image representing a voucher.

OCR is used to recognize the characters contained in an image obtained by scanning a voucher, such as a register receipt or a receipt, with a scanner. Here, as a device, OCR stands for Optical Character Reader; as a technique, in the sense of optical character recognition, it stands for Optical Character Recognition. In addition to the recognized characters, an OCR device also outputs information such as the area occupied by each character in the image.

Patent Document 1 discloses a check processing device that recognizes characters printed on the face of a check. This check processing device performs character recognition on a first cutout area cut out from the image data output by a scanner and outputs a first candidate, and performs character recognition on a second cutout area cut out from the same image data and outputs a second candidate. The second cutout area is the same size as the first cutout area.

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-127375

OCR may cut out several characters from an image as a single area, or cut out an area smaller than one character as one character. Because the size of the cutout area in the check processing device described above does not change, that device cannot recognize several characters that were cut out as one area, or a character that was cut out only partially, as an area smaller than one character.

The image processing device of the present invention is an image processing device that performs character recognition on a voucher image representing a voucher, and includes:
an extraction unit that extracts, from the voucher image, a plurality of provisional character areas arranged in a line direction; and
a processing unit.
The processing unit:
sets a plurality of character spacing candidates as intervals for partitioning, in the line direction, a line area including the plurality of provisional character areas;
calculates, for each of the character spacing candidates, the degree of overlap between the partition positions according to that candidate and the character-present portion;
determines a character spacing from the plurality of character spacing candidates based on the plurality of calculated degrees of overlap; and
recognizes characters in partition areas obtained by partitioning the line area according to the determined character spacing.

Moreover, the image processing method of the present invention is an image processing method that performs character recognition on a voucher image representing a voucher, and includes:
an extraction step of extracting, from the voucher image, a plurality of provisional character areas arranged in a line direction; and
a processing step.
In the processing step:
a plurality of character spacing candidates are set as intervals for partitioning, in the line direction, a line area including the plurality of provisional character areas;
for each of the character spacing candidates, the degree of overlap between the partition positions according to that candidate and the character-present portion is calculated;
a character spacing is determined from the plurality of character spacing candidates based on the plurality of calculated degrees of overlap; and
characters are recognized in partition areas obtained by partitioning the line area according to the determined character spacing.

Further, the image processing program of the present invention is an image processing program for performing character recognition on a voucher image representing a voucher, and causes a computer to implement:
an extraction function that extracts, from the voucher image, a plurality of provisional character areas arranged in a line direction; and
a processing function.
The processing function:
sets a plurality of character spacing candidates as intervals for partitioning, in the line direction, a line area including the plurality of provisional character areas;
calculates, for each of the character spacing candidates, the degree of overlap between the partition positions according to that candidate and the character-present portion;
determines a character spacing from the plurality of character spacing candidates based on the plurality of calculated degrees of overlap; and
recognizes characters in partition areas obtained by partitioning the line area according to the determined character spacing.

FIG. 1 is a block diagram schematically showing a configuration example of a system including an image processing device.
FIG. 2 is a flowchart showing an example of character recognition processing.
FIG. 3 is a flowchart showing an example of character spacing determination processing.
FIG. 4 is a diagram schematically showing an example of a recognition result produced by an OCR engine.
FIG. 5 is a diagram schematically showing the components of a typeface.
FIG. 6 is a diagram schematically showing an example of determining the character spacing.
FIG. 7 is a diagram schematically showing an example of the degree of overlap between partition positions and the character-present portion.
FIG. 8 is a diagram schematically showing an example of changing part of a half-width character spacing to a full-width character spacing.
FIG. 9 is a diagram schematically showing an example of character recognition performed by OCR on a voucher image.

Embodiments of the present invention are described below. Of course, the following embodiments merely exemplify the present invention, and not all of the features shown in the embodiments are necessarily essential to the solution provided by the invention.

(1) Overview of the technology included in the present invention:
First, an overview of the technology included in the present invention is given with reference to the examples shown in FIGS. 1 to 9. The figures of this application schematically show examples; the magnification may differ between directions, and the figures may not be consistent with one another. Of course, each element of the present technology is not limited to the specific examples indicated by reference signs. In this overview, text in parentheses is a supplementary explanation of the immediately preceding term.

Aspect 1:
As illustrated in FIG. 1 and elsewhere, an image processing device according to one aspect of the present technology (for example, the main server 30) is an image processing device (30) that performs character recognition on a voucher image IM1 representing a voucher, and includes an extraction unit U1 and a processing unit U2. The extraction unit U1 extracts, from the voucher image IM1, a plurality of provisional character areas A1 arranged in a line direction D1. The processing unit U2 sets a plurality of character spacing candidates i as intervals for partitioning, in the line direction D1, a line area A10 including the plurality of provisional character areas A1. For each character spacing candidate i, it calculates the degree of overlap (for example, the ratio Ri of character-absent pixels) between the partition positions P1 according to that candidate and the character-present portion A11. Based on the plurality of calculated degrees of overlap (Ri), it determines a character spacing (for example, the widths W3 and W4) from the plurality of character spacing candidates i, and then recognizes characters in the partition areas A2 obtained by partitioning the line area A10 according to the determined character spacing (W3, W4).
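The selection step of Aspect 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the line area A10 has already been binarized into a bitmap (1 = character pixel), that a candidate pitch i cuts the line at x = i, 2i, 3i, ... measured from the left edge of the line area, and that the degree of overlap is the ratio Ri of character-absent pixels on the cut columns, so a larger Ri means less overlap. The function names and the tie-break rule are hypothetical.

```python
from typing import Sequence

# A line area as a binary bitmap: rows of 0/1, where 1 marks a character (ink) pixel.
Bitmap = Sequence[Sequence[int]]


def partition_positions(width: int, pitch: int) -> list[int]:
    """Cut columns for a uniform pitch, measured from the left edge of the line area."""
    return list(range(pitch, width, pitch))


def non_char_ratio(line: Bitmap, pitch: int) -> float:
    """R_i: share of pixels on the cut columns that are character-absent (higher = less overlap)."""
    cols = partition_positions(len(line[0]), pitch)
    if not cols:
        return 0.0  # pitch wider than the line: no cuts, so treat it as uninformative
    blank = sum(1 for row in line for x in cols if row[x] == 0)
    return blank / (len(cols) * len(line))


def choose_pitch(line: Bitmap, candidates: Sequence[int]) -> int:
    """Return the candidate pitch with the highest R_i; ties go to the smaller pitch."""
    # max() keeps the first maximum it meets, so sorting ascending prefers smaller pitches.
    return max(sorted(candidates), key=lambda p: non_char_ratio(line, p))


# Usage: three 4-px character cells, ink in the two middle columns of each cell.
line = [[1 if 0 < x % 4 < 3 else 0 for x in range(12)] for _ in range(5)]
print(choose_pitch(line, [2, 3, 4, 6, 8]))  # 4
```

The tie-break matters because an integer multiple of the true pitch (8 px here) cuts only at a subset of the true boundaries and therefore scores as well as the true pitch; preferring the smaller pitch avoids merging neighbouring characters into one partition area.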

In Aspect 1 described above, the character spacing (W3, W4) for partitioning the line area A10 is determined based on the degree of overlap (Ri) between the partition positions P1 according to each character spacing candidate i and the character-present portion A11, where the characters exist in the line area A10, and characters are recognized in the partition areas A2 obtained by partitioning the line area A10 according to that spacing. Because the character spacings used on vouchers are usually limited to a few types, faint printing and similar defects have less influence on how each character is cut out. This aspect can therefore improve the accuracy of character recognition for voucher images.

Here, a voucher means an accounting document obtained from an outside third party, and includes register receipts, receipts, invoices, delivery notes, and the like.
Because the character-absent portion of the line area is everything other than the character-present portion, the degree of overlap between the partition positions and the character-present portion may be expressed as the ratio of character-present pixels overlapping the partition positions, the ratio of character-absent pixels overlapping the partition positions, and so on.
The character spacing is not limited to a uniform spacing; several spacings may be mixed, for example half-width and full-width character spacings. Likewise, the character spacing candidates are not limited to uniform spacings and may mix several spacings, such as half-width and full-width character spacings.
These supplementary notes also apply to the aspects below.
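The two notes above, that the overlap can be measured with either character-present or character-absent pixels and that a spacing may mix half-width and full-width cells, can be illustrated with a short Python sketch. The cell widths (4 px half-width, 8 px full-width) and the function names are assumptions chosen for illustration.

```python
from itertools import accumulate
from typing import Sequence


def mixed_positions(cell_widths: Sequence[int], line_width: int) -> list[int]:
    """Cut columns for a mixed spacing given as successive cell widths (e.g. half, full, half)."""
    return [x for x in accumulate(cell_widths) if x < line_width]


def overlap_ratios(line: Sequence[Sequence[int]], cols: Sequence[int]) -> tuple[float, float]:
    """(character-present ratio, character-absent ratio) over the pixels on the cut columns."""
    if not cols:
        raise ValueError("at least one cut column is required")
    total = len(cols) * len(line)
    ink = sum(row[x] for row in line for x in cols)
    return ink / total, (total - ink) / total


# Half-width 4 px, full-width 8 px: a half / full / half layout in a 16-px-wide line.
cols = mixed_positions([4, 8, 4], 16)            # [4, 12]
line = [[1 if x == 12 else 0 for x in range(16)], [0] * 16]
print(overlap_ratios(line, cols))                # (0.25, 0.75)
```

Because the two ratios always sum to 1, either one carries the same information; only the direction of "better" flips (lower character-present ratio, or higher character-absent ratio).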

Aspect 2:
As illustrated in FIG. 4, the extraction unit U1 may associate arrangement information IA1, which represents the arrangement of each provisional character area A1, with that provisional character area A1. The processing unit U2 may determine the height H0 of the line area A10 based on the arrangement information IA1 associated with each provisional character area A1, and may set the plurality of character spacing candidates i based on the determined height H0. Because characters on vouchers are usually spaced according to the line height, this aspect can speed up processing.
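A sketch of Aspect 2, assuming the arrangement information IA1 consists of start coordinates, a width, and a height for each provisional area, as in the OCR output described in section (2). The rule that turns H0 into candidates is not given here, so the ratio list below (a half-width pitch of roughly 0.4 to 0.6 times the line height) is a purely hypothetical placeholder, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Arrangement information for one provisional character area (start coordinates, size)."""
    x: int
    y: int
    w: int
    h: int


def line_area(boxes: list[Box]) -> tuple[int, int, int, int]:
    """(x0, y0, x1, y1) of the smallest rectangle enclosing every provisional area in the line."""
    return (min(b.x for b in boxes), min(b.y for b in boxes),
            max(b.x + b.w for b in boxes), max(b.y + b.h for b in boxes))


def line_height(boxes: list[Box]) -> int:
    """H0: height of the line area."""
    _, y0, _, y1 = line_area(boxes)
    return y1 - y0


def pitch_candidates(h0: int, ratios=(0.40, 0.45, 0.50, 0.55, 0.60)) -> list[int]:
    """Hypothetical rule: half-width pitch taken as a fraction of H0 (the ratios are assumptions)."""
    return sorted({max(1, round(h0 * r)) for r in ratios})


boxes = [Box(10, 5, 12, 20), Box(24, 3, 11, 24), Box(40, 6, 10, 19)]
h0 = line_height(boxes)          # 24
print(pitch_candidates(h0))      # [10, 11, 12, 13, 14]
```

Tying the candidates to H0 keeps the search to a handful of pitches instead of every integer width, which is where the speed-up described in Aspect 2 would come from in a sketch like this.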

Aspect 3:
As illustrated in FIGS. 3, 6, and elsewhere, the processing unit U2 may set, based on the degree of overlap (Ri), a half-width character spacing (for example, the width W3) that provisionally partitions the line area A10 in half-width character units. As illustrated in FIG. 8 and elsewhere, the processing unit U2 may then change the half-width character spacing (W3) into a full-width character spacing (for example, the width W4) by removing from the partitioning of the line area A10 those half-width partition positions, among the plurality of half-width partition positions P2 according to the set half-width spacing (W3), whose degree of overlap with the character-present portion A11 (for example, the number Nj of character-present pixels) is greater than a reference degree (for example, the threshold Tn). The processing unit U2 may also recognize characters in each partition area A2 according to the character spacing (W3, W4), which is either the half-width character spacing (W3) or the full-width character spacing (W4). Because vouchers often use both half-width and full-width characters, this aspect can further improve the accuracy of character recognition for voucher images.
Here, since the character-absent portion of the line area is everything other than the character-present portion, the degree to which a half-width partition position overlaps the character-present portion may be expressed as the number of character-present pixels overlapping that position, the number of character-absent pixels overlapping it, and so on. This note also applies to the following aspects.
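The half-width to full-width conversion of Aspect 3 might look like the following, assuming a binarized line area, half-width cut columns P2 measured from its left edge, and Nj counted as the character pixels on each cut column. The threshold value Tn = 1 and the helper names are assumptions for illustration.

```python
from typing import Sequence

Bitmap = Sequence[Sequence[int]]  # 1 = character pixel


def char_pixel_count(line: Bitmap, x: int) -> int:
    """N_j: number of character pixels on cut column x."""
    return sum(row[x] for row in line)


def to_full_width(line: Bitmap, half_cuts: Sequence[int], tn: int) -> list[int]:
    """Drop half-width cuts whose N_j exceeds the threshold T_n (they run through a character)."""
    return [x for x in half_cuts if char_pixel_count(line, x) <= tn]


def cell_widths(cuts: Sequence[int], line_width: int) -> list[int]:
    """Widths of the partition areas defined by the surviving cuts."""
    edges = [0, *cuts, line_width]
    return [b - a for a, b in zip(edges, edges[1:])]


# Two half-width characters (4 px) followed by one full-width character (8 px).
ink = {1, 2, 5, 6, 9, 10, 11, 12, 13, 14}
line = [[1 if x in ink else 0 for x in range(16)] for _ in range(3)]
cuts = to_full_width(line, [4, 8, 12], tn=1)
print(cuts, cell_widths(cuts, 16))   # [4, 8] [4, 4, 8]
```

The cut at column 12 runs through the full-width character, so it is removed and the last two half-width cells merge into one full-width cell, while the cuts between the half-width characters survive.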

Aspect 4:
The processing unit U2 may recognize characters in each partition area A2 using a character recognition model (for example, the DL engine 32 shown in FIG. 1) generated by machine learning to recognize characters contained in images representing vouchers. This aspect provides a suitable example of improving the accuracy of character recognition for voucher images.

Aspect 5:
As illustrated in FIG. 2 and elsewhere, the extraction unit U1 may extract the plurality of provisional character areas A1 from the voucher image IM1 by performing a first character recognition on the voucher image IM1. The processing unit U2 may perform, on each partition area A2, a second character recognition that is more accurate than the first. This aspect also provides a suitable example of improving the accuracy of character recognition for voucher images.
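Aspects 4 and 5 describe a two-stage flow: a first, general-purpose recognition only supplies the provisional areas, and a second, more accurate recognizer (for example, a learned model such as the DL engine 32) reads each partition area. The sketch below wires that flow with plain callables; the engine interfaces, the rectangle format (x0, y0, x1, y1) with exclusive ends, and the stub engines in the usage lines are all hypothetical stand-ins, not a real OCR or deep-learning API.

```python
from typing import Callable, Sequence

Image = Sequence[Sequence[int]]
Rect = tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def crop(image: Image, rect: Rect) -> list[list[int]]:
    """Copy the pixels inside rect."""
    x0, y0, x1, y1 = rect
    return [list(row[x0:x1]) for row in image[y0:y1]]


def partition_areas(line_rect: Rect, cuts: Sequence[int]) -> list[Rect]:
    """Absolute rectangles of the partition areas; cuts are offsets from the line's left edge."""
    x0, y0, x1, y1 = line_rect
    edges = [x0, *(x0 + c for c in cuts), x1]
    return [(a, y0, b, y1) for a, b in zip(edges, edges[1:])]


def two_stage(image: Image,
              first_ocr: Callable[[Image], list[Rect]],
              choose_cuts: Callable[[list[list[int]]], list[int]],
              recognizer: Callable[[list[list[int]]], str]) -> str:
    """Stage 1 only supplies provisional areas; stage 2 reads each partition area."""
    boxes = first_ocr(image)
    line_rect = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                 max(b[2] for b in boxes), max(b[3] for b in boxes))
    cuts = choose_cuts(crop(image, line_rect))
    return "".join(recognizer(crop(image, r)) for r in partition_areas(line_rect, cuts))


# Stub engines for illustration: the "recognizer" just reports each area's width.
image = [[0] * 12 for _ in range(2)]
text = two_stage(image,
                 first_ocr=lambda img: [(2, 0, 5, 2), (7, 0, 12, 2)],
                 choose_cuts=lambda line: [4, 6],
                 recognizer=lambda area: str(len(area[0])))
print(text)  # 424
```

Note that the first stage's own character boundaries are discarded: only the union of its areas (the line area) is kept, and the partitioning comes from the chosen spacing.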

Aspect 6:
Further, as illustrated in FIG. 2 and elsewhere, an image processing method according to one aspect of the present technology includes an extraction step ST1 corresponding to the extraction unit U1 and a processing step ST2 corresponding to the processing unit U2. This aspect can also improve the accuracy of character recognition for voucher images.

Aspect 7:
Furthermore, as illustrated in FIG. 1, an image processing program PR1 according to one aspect of the present technology causes a computer (for example, the main server 30) to implement an extraction function FU1 corresponding to the extraction unit U1 and a processing function FU2 corresponding to the processing unit U2. This aspect can also improve the accuracy of character recognition for voucher images.

Furthermore, the present technology is applicable to a composite device including the image processing device described above, an information processing method including the image processing method described above, an information processing program including the image processing program described above, a computer-readable medium storing any of these programs, and so on. Any of the devices described above may be composed of a plurality of distributed parts.

(2) Background leading to the present technology:
FIG. 9 schematically shows an example in which OCR was performed on a voucher image IM1 obtained by scanning a register receipt, which is a voucher, with a scanner. Here, a register receipt means a receipt without an addressee that is issued mechanically by a cash register. The OCR extracts, from the voucher image IM1, a plurality of recognized character areas A5 each containing recognized characters.

For example, the character "¥" in the upper line of the voucher image IM1 is correctly recognized as one character. In this case, a recognized character area A5 containing only that character is extracted from the voucher image IM1, and the OCR outputs the correctly recognized character together with information representing the extent of the recognized character area A5. As described in detail later, this information includes the start coordinates of the recognized character area A5, its width, its height, and the like. In contrast, in FIG. 9 the characters "0" and "6", which contain a faint streak PA1, are each recognized as two characters and thus are not recognized correctly: the "0" with a linear faint streak PA1 running in the height direction D2 is recognized as the two characters "(" and ")", and the "6" with such a streak is recognized as the two characters "E" and "j". Particularly when a voucher is printed on thermal paper, parts of characters fade easily, which can prevent correct recognition. Receipts issued by cash registers are often printed on thermal paper, so faint characters are common.

Receipts may also be stained. In FIG. 9, the two characters "20", the two characters "8年" (the digit "8" followed by the kanji "年", meaning "year"), and the two characters "23", which lie near a stain SP1, are each recognized as a single character and thus are not recognized correctly. In particular, the width of the recognized character area A5 for "8年" equals the combined width of a half-width character and a full-width character, which does not match any character width used on the receipt.
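The width problem in the "8年" example can be made concrete with a small check, assuming a known full-width cell width and that a half-width cell is half of it (the relationship typical of cash-register receipts). The 15% tolerance and the function names are assumptions. The check also shows a limit: two merged half-width characters such as "20" occupy exactly one full-width cell, so a width test alone cannot detect that merge.

```python
def width_is_plausible(w: float, full_w: float, tol: float = 0.15) -> bool:
    """True if w is within tol (relative) of a half-width or full-width cell."""
    half_w = full_w / 2
    return any(abs(w - ref) <= tol * ref for ref in (half_w, full_w))


def suspicious_boxes(widths: list[float], full_w: float, tol: float = 0.15) -> list[int]:
    """Indices of OCR boxes whose width matches neither a half-width nor a full-width cell."""
    return [i for i, w in enumerate(widths) if not width_is_plausible(w, full_w, tol)]


# Full-width cell 24 px: "¥" (12), "8年" merged (36), "20" merged (24), "(" fragment (6).
print(suspicious_boxes([12, 36, 24, 6], full_w=24))  # [1, 3]
```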

In receipts issued by cash registers, full-width characters usually have a constant width, and half-width characters are usually half as wide as full-width characters. The present technology therefore selects, from a plurality of candidates, the character spacing for partitioning, in the line direction D1, a line area A10 that encloses a plurality of characters arranged in the line direction D1, and recognizes the character in each character area according to the selected spacing, thereby improving the accuracy of character recognition. A specific example of this technology is described below.

(3) Specific example of the configuration of a system including the image processing device:
FIG. 1 schematically shows a specific example of the configuration of a system including the image processing device. The system SY1 shown in FIG. 1 includes a scanner 10, a client 20, a main server 30, which is an example of the image processing device, and a storage server 40. Here, "client" is short for client computer, "main server" for main server computer, and "storage server" for storage server computer. The client 20, the main server 30, and the storage server 40 are connected to a network NE1 that includes the Internet. The connection to the network NE1 may be wired, wireless, or both. The network NE1 may include a LAN (Local Area Network). The main server 30 and the storage server 40 can provide cloud services to users via the network NE1.

The scanner 10, for example, illuminates a document with light from a light source to read the document image, and outputs the document image as data. The scanner 10 shown in FIG. 1 is connected to the communication I/F 20h of the client 20 so that it can communicate by wire or wirelessly. Here, I/F is short for interface. When the scanner 10 optically reads a voucher as the document, it outputs the corresponding voucher image IM1 to the client 20. The scanner 10 may be a flatbed scanner, in which the document is placed between the platen glass and the document cover, a scanner with a document feeder, or the like. The scanner 10 may also be a multifunction device that combines the scanner function with other functions such as printing and facsimile communication.

The client 20 may be a personal computer (including a tablet terminal), a smartphone, or the like. The client 20 shown in FIG. 1 includes a CPU 20a as a processor, a ROM 20b and a RAM 20c as semiconductor memories, a storage device 20d, a client network I/F 20e, an input device 20f, a display device 20g, a communication I/F 20h, and so on. These elements 20a to 20h are electrically connected so that they can exchange information with one another. Here, CPU stands for Central Processing Unit, ROM for Read Only Memory, and RAM for Random Access Memory.

Although not shown, the storage device 20d stores an operating system, application programs, and so on. The application programs include a driver program that controls the scanner 10. This driver program causes the client 20 to implement a function of receiving from the scanner 10 the voucher image IM1 generated when the scanner 10 reads a voucher, and a function of uploading the voucher image IM1 to the main server 30. The storage device 20d may be a magnetic storage device such as a hard disk, a non-volatile semiconductor memory such as a flash memory, or the like. The CPU 20a reads information stored in the storage device 20d into the RAM 20c as needed and performs various processes by executing the loaded programs. The network I/F 20e is connected to the network NE1 and communicates with partner devices on the network NE1 according to a predetermined communication standard. The input device 20f may be a pointing device, hard keys including a keyboard, a touch panel attached to the surface of a display panel, or the like. The display device 20g may be a liquid crystal display panel or the like. The communication I/F 20h is an interface for connecting peripheral devices such as the scanner 10, and may be a USB (Universal Serial Bus) interface, a wireless communication interface, or the like.

The scanner 10 may include the configuration of the client 20. In that case, the scanner 10 functions as a device that also has the function of communicating with external devices via the network NE1.

The main server 30 is a server computer that provides a character recognition function on the network NE1, and may be a single computer or a plurality of computers. The main server 30 shown in FIG. 1 includes a CPU 30a as a processor, a ROM 30b and a RAM 30c as semiconductor memories, a storage device 30d, a main-server network I/F 30e, and so on. These elements 30a to 30e are electrically connected so that they can exchange information with one another. The processor is not limited to a single CPU; it may be a plurality of CPUs, a combination of a CPU and a hardware circuit such as an ASIC (Application Specific Integrated Circuit), or the like. Although not shown, the main server 30 may include an input device for receiving operations from an operator, a display device for presenting information to the operator, and so on.

The storage device 30d stores an operating system (not shown), the image processing program PR1, and so on. The storage device 30d may be a magnetic storage device such as a hard disk, a non-volatile semiconductor memory such as a flash memory, or the like. The network I/F 30e is connected to the network NE1 and communicates with partner devices on the network NE1 according to a predetermined communication standard.

The CPU 30a shown in FIG. 1 includes a general-purpose OCR engine 31, which is an example of the extraction unit U1, and a DL engine 32, which is an example of a character recognition unit that uses AI. Here, AI is an abbreviation for Artificial Intelligence, and DL is an abbreviation for Deep Learning. The character recognition unit is part of the processing unit U2. The OCR engine 31 may be an extraction function FU1 that the image processing program PR1 realizes on the main server 30. The DL engine 32 may be a character recognition function that the image processing program PR1 realizes on the main server 30. The character recognition function is part of the processing function FU2.
The image processing program PR1 shown in FIG. 1 includes a control program 33 that realizes a control function on the main server 30. The control function is part of the processing function FU2.

The CPU 30a of the main server 30 performs various processes by reading information stored in the storage device 30d into the RAM 30c as needed and executing the loaded programs. By executing the image processing program PR1 loaded into the RAM 30c, the CPU 30a performs the processing corresponding to the functions described above. When the image processing program PR1 causes a computer to realize the functions FU1 and FU2, it causes the main server 30, which is a computer, to function as the extraction unit U1 corresponding to the extraction function FU1 and as the processing unit U2 corresponding to the processing function FU2. The main server 30 executing the image processing program PR1 also carries out the extraction step ST1 corresponding to the extraction unit U1 and the processing step ST2 corresponding to the processing unit U2. The computer-readable medium storing the image processing program PR1, which causes a computer to realize the functions FU1 and FU2, is not limited to a storage device inside the main server 30 and may be a recording medium external to the main server 30.

The main server 30 shown in FIG. 1 is communicably connected to a storage server 40. The storage server 40 is a server computer that provides a storage function on the network NE1; it may consist of one computer or of a plurality of computers. Although not shown, the storage server 40 includes a CPU as a processor, a ROM and a RAM as semiconductor memories, an internal storage device, a storage-server network I/F, and the like. The storage server 40 can store data received from the main server 30 in its internal storage device and can transmit data stored in the internal storage device via the network NE1.
Of course, dividing the server computers into the main server 30 and the storage server 40 is only an example; the main server 30 may include the configuration of the storage server 40.

(4) Specific example of character recognition processing:
FIG. 2 schematically illustrates the character recognition process performed by the main server 30. FIG. 3 schematically illustrates the character spacing determination process performed in step S110 of FIG. 2. Steps S102 to S104 of FIG. 2 correspond to the extraction unit U1, the extraction step ST1, and the extraction function FU1. Steps S106 to S112 of FIG. 2 and steps S202 to S212 of FIG. 3 correspond to the processing step ST2. Hereinafter, the word "step" is omitted. In addition, processing performed by the extraction unit U1 is described as being performed by the OCR engine 31, and processing performed by the AI-based character recognition unit is described as being performed by the DL engine 32.

When the user has the scanner 10 read a voucher such as a receipt, the scanner 10 generates a voucher image IM1 representing the read voucher and transmits it to the client 20. The client 20 receives the voucher image IM1 and transmits it to the main server 30 via the network NE1. The main server 30 thus receives the voucher image IM1 via the network NE1 in S102. An example of the voucher image IM1 is shown in FIG. 9.

Having acquired the voucher image IM1, the main server 30 causes the general-purpose OCR engine 31 to execute, in S104, a first character recognition process on the voucher image IM1. As illustrated in FIG. 9, the OCR engine 31 performs the first character recognition on the voucher image IM1 and thereby extracts from it a plurality of recognized character areas A5 together with their arrangement information. The recognized character areas A5 include a plurality of provisional character areas A1 arranged in the line direction D1. This specific example focuses on such a row of provisional character areas A1.

The OCR engine 31 analyzes the layout of the voucher image IM1, extracts lines based on the analysis result, and, for each image in an extracted line that appears to be a single character, sets a provisional character area A1 surrounding that image. FIG. 9 shows provisional character areas A1, each treated as a single character, set on "¥", on each of the two sides of a "0" that has a faded portion PA1, on each of the two sides of a "6" that has a faded portion PA1, and on "20", "8年" ("年" means "year"), and "23", each of which lies near a smudge. Thus a faded character may be split into two areas, while two adjacent characters near a smudge may be enclosed in a single area. Next, the OCR engine 31 estimates the character in each provisional character area A1 from its image according to a predetermined algorithm. For a provisional character area A1 that does not correspond to exactly one character, the correct character is often not recognized. The OCR engine 31 outputs the recognized characters together with arrangement information indicating the arrangement of each provisional character area A1.

FIG. 4 schematically illustrates the recognition result RE9 output from the OCR engine 31. The lower part of FIG. 4 shows a schematic provisional character area A1 used to explain each item of the arrangement information IA1, which represents the arrangement of the provisional character area A1. The recognition result RE9 includes the recognized character and the arrangement information IA1 associated with each provisional character area A1. The arrangement information IA1 represents the arrangement of the provisional character area A1 by, for example, numerical values in pixel units of the voucher image IM1. The arrangement information IA1 shown in FIG. 4 includes a character left end Xs indicating the position of the left end of the provisional character area A1, a character top end Ys indicating the position of its top end, a character width Wc representing its width, a character height Hc representing its height, and a baseline BL of the provisional character area A1. The character left end Xs and the character top end Ys are the start coordinates of the provisional character area A1 and are represented by, for example, the coordinates of the pixel at its upper-left corner. The character width Wc is represented by, for example, the number of pixels of the provisional character area A1 in the line direction D1, and the character height Hc by, for example, its number of pixels in the height direction D2.
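To make the structure of the arrangement information IA1 concrete, here is a minimal Python sketch of one recognition-result entry. The class name `ProvisionalArea`, its field names, and the exclusive-bound helpers are hypothetical choices for illustration, not a format defined by the OCR engine 31; the coordinate values are made up, except that the baseline 103 echoes the FIG. 4 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisionalArea:
    """One entry of a recognition result such as RE9 (hypothetical field names)."""

    char: str  # character recognized by the first (general-purpose OCR) pass
    xs: int    # character left end Xs (x of the upper-left pixel)
    ys: int    # character top end Ys (y of the upper-left pixel)
    wc: int    # character width Wc, in pixels along the line direction D1
    hc: int    # character height Hc, in pixels along the height direction D2
    bl: int    # baseline BL, as a y coordinate in the voucher image

    @property
    def x_end(self) -> int:
        """First column to the right of the area (exclusive bound)."""
        return self.xs + self.wc

    @property
    def y_end(self) -> int:
        """First row below the area (exclusive bound)."""
        return self.ys + self.hc


# Illustrative values only.
area = ProvisionalArea(char="2", xs=120, ys=80, wc=11, hc=23, bl=103)
print(area.x_end, area.y_end)  # 131 103
```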

FIG. 5 schematically shows the components of a typeface. The upper part of FIG. 5 shows lowercase Latin letters, and the lower part shows uppercase Latin letters.

The height of lowercase letters can be divided into the x-height, the ascender, and the descender. The virtual line between the x-height and the descender is the baseline BL, the virtual line at the bottom of the descender is the descender line DL, the virtual line between the x-height and the ascender is the mean line ML, and the virtual line at the top of the ascender is the ascender line AL. Lowercase letters include letters placed between the baseline BL and the mean line ML, such as "x", "a", and "c"; letters placed between the baseline BL and the ascender line AL, such as "b" and "d"; and letters placed between the descender line DL and the mean line ML, such as "g" and "j".

The height of uppercase letters is set to the cap height. The virtual line at the bottom of the cap height is the baseline BL, and the virtual line at its top is the cap line CL. The cap line CL may be designed lower or higher than the ascender line AL.

As described above, in Latin typefaces a plurality of characters are arranged with their baselines BL aligned. In Japanese as well, a plurality of characters are arranged with reference to the baseline BL.
Therefore, when extracting each provisional character area A1, the OCR engine 31 acquires the baseline BL together with the character left end Xs, the character top end Ys, the character width Wc, and the character height Hc. The OCR engine 31 associates this arrangement information IA1 and the recognized character with the provisional character area A1 and outputs the resulting recognition result RE9.

After the first character recognition, in S106 the main server 30 acquires the recognition result RE9 output from the OCR engine 31, that is, the recognized character and the arrangement information IA1 associated with each provisional character area A1.

Then, in S108, the main server 30 sets a line area A10 containing the provisional character areas A1 arranged in one line, based on the arrangement information IA1 of each provisional character area A1 of that line in the line direction D1. As described with reference to FIG. 5, characters on the same line share a common baseline BL, so provisional character areas A1 with substantially the same baseline BL lie on the same line. Let Yb be the baseline position of the provisional character area A1 examined first, and let Tb be a small threshold for the baseline BL. The main server 30 may set, as the line area A10, the smallest rectangle containing all provisional character areas A1 whose baseline position lies in the range from Yb - Tb to Yb + Tb. In the example shown in FIG. 4, the provisional character areas A1 whose baseline BL is "103" lie on the same line, so the smallest rectangle containing them is set as the line area A10.
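The baseline grouping of S108 can be sketched as follows, assuming the areas arrive as a flat list and that the first unassigned area in turn serves as the reference Yb for a new line. The names `Area` and `split_into_lines` are hypothetical, only the fields needed for grouping are kept, and repeating the reference selection until every area is assigned is one plausible way to cover all lines rather than a procedure stated in the text.

```python
from typing import List, NamedTuple


class Area(NamedTuple):
    char: str  # character from the first OCR pass
    xs: int    # character left end Xs
    bl: int    # baseline BL (y coordinate)


def split_into_lines(areas: List[Area], tb: int) -> List[List[Area]]:
    """Group provisional areas into lines by baseline.

    The first remaining area supplies the reference baseline Yb; every area
    whose baseline lies in [Yb - Tb, Yb + Tb] joins that line. This repeats
    until all areas are assigned.
    """
    remaining = list(areas)
    lines: List[List[Area]] = []
    while remaining:
        yb = remaining[0].bl
        on_line = [a for a in remaining if yb - tb <= a.bl <= yb + tb]
        remaining = [a for a in remaining if not (yb - tb <= a.bl <= yb + tb)]
        lines.append(sorted(on_line, key=lambda a: a.xs))  # left to right along D1
    return lines


areas = [Area("0", 30, 104), Area("¥", 10, 150), Area("2", 10, 103), Area("1", 50, 102)]
for line in split_into_lines(areas, tb=2):
    print("".join(a.char for a in line))  # prints "201", then "¥"
```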

The middle part of FIG. 9 shows a line area A10 set for each line, each containing a plurality of provisional character areas A1. For example, the main server 30 can search the provisional character areas A1 of one line for the leftmost coordinate XL, the rightmost coordinate XR, the topmost coordinate YU, and the bottommost coordinate YD, and set the rectangle from the upper-left coordinates (XL, YU) to the lower-right coordinates (XR, YD) as the line area A10. Because the distance between the bottommost coordinate YD and the topmost coordinate YU corresponds to the height of the line area A10, S108 amounts to determining the height of the line area A10. When the coordinates YU and YD are in pixel units, the height of the line area A10 is |YD - YU| - 1.
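A sketch of the bounding-rectangle search for one line, under the assumption that (XL, YU) and (XR, YD) are inclusive pixel indices derived from each area's Xs, Ys, Wc, and Hc. The `Box` type and the `line_area` function are hypothetical names, and the box values are made up.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    xs: int  # character left end Xs
    ys: int  # character top end Ys
    wc: int  # character width Wc
    hc: int  # character height Hc


def line_area(boxes: List[Box]) -> Tuple[int, int, int, int]:
    """Return (XL, YU, XR, YD): the smallest rectangle containing all boxes.

    Coordinates are inclusive pixel indices, so XR and YD are the rightmost
    and bottommost pixels actually covered by some box.
    """
    xl = min(b.xs for b in boxes)
    yu = min(b.ys for b in boxes)
    xr = max(b.xs + b.wc - 1 for b in boxes)
    yd = max(b.ys + b.hc - 1 for b in boxes)
    return xl, yu, xr, yd


print(line_area([Box(10, 82, 9, 21), Box(22, 80, 10, 23), Box(35, 84, 8, 19)]))
# (10, 80, 42, 102)
```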

FIG. 6 schematically illustrates how the character spacings of widths W3 and W4 are determined from the set line area A10. The top part of FIG. 6 corresponds to S108 and shows the state in which the height H0 of the line area A10, indicated by virtual lines, has been determined. The following description also refers to FIG. 6.

After determining the height H0 of the line area A10, in S110 the main server 30 determines the character spacing of each line area A10. FIG. 3 shows the character spacing determination process performed in S110.

First, in S202, the main server 30 sets a width W1 equal to half the height H0 of the line area A10 as a provisional half-width character spacing. FIG. 6 shows the line area A10 partitioned in the line direction D1 at this provisional half-width character spacing of width W1 = H0/2.

After setting the provisional half-width character spacing, in S204 the main server 30 sets, based on the provisional half-width character spacing of width W1, a plurality of character spacing candidates i for partitioning the line area A10 in the line direction D1. Here, i is a variable identifying each character spacing candidate. Let W2(i) be the width of character spacing candidate i, and let ΔW be the amount by which the spacing is varied around the provisional half-width spacing W1. The main server 30 then sets the width W2(i) in multiple steps from W1 - ΔW to W1 + ΔW.
Thus, S202 to S204 set the plurality of character spacing candidates i based on the height H0 of the line area A10.
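S202 to S204 reduce to a few lines of arithmetic. This sketch assumes the candidates are evenly spaced by a fixed `step` (the text only says "multiple steps"), and `spacing_candidates` is a hypothetical name. With H0 = 10 and ΔW = 1 it produces the 4-, 5-, and 6-pixel candidates used in the FIG. 7 example.

```python
from typing import List


def spacing_candidates(h0: float, delta_w: float, step: float = 1.0) -> List[float]:
    """Candidate widths W2(i) from W1 - dW to W1 + dW, where W1 = H0 / 2.

    `step` (the increment between candidates) is an assumption; the text only
    says the width is set "in multiple steps".
    """
    w1 = h0 / 2  # provisional half-width spacing (S202)
    count = int(round(2 * delta_w / step)) + 1
    return [w1 - delta_w + k * step for k in range(count)]


print(spacing_candidates(h0=10, delta_w=1))  # [4.0, 5.0, 6.0]
```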

After setting the character spacing candidates i, in S206 the main server 30 acquires, for each character spacing candidate i, the ratio Ri of non-character pixels at the partition positions defined by that candidate.

FIG. 7 schematically illustrates the ratio Ri of non-character pixels at the partition positions P1 defined by character spacing candidate i. The voucher image IM1 in FIG. 7 is shown at a low resolution for clarity; the actual voucher image IM1 is assumed to have a higher resolution.
The voucher image IM1 contains pixels of a character-absent portion A12, whose pixel values represent a background color such as white, and pixels of a character-present portion A11, whose pixel values represent a print color such as black. Whether each pixel of the voucher image IM1 belongs to the character-present portion A11 or to the character-absent portion A12 can be determined by checking whether its pixel value lies on the background-color side or the print-color side of a threshold. In the voucher image IM1 shown in FIG. 7, black or gray pixels belong to the character-present portion A11 and white pixels belong to the character-absent portion A12. The ratio Ri of non-character pixels thus represents the degree of overlap between the partition positions P1 and the character-present portion A11: the higher Ri is, the less the partition positions overlap characters.
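The pixel classification can be sketched as a single threshold test. The sketch assumes an 8-bit grayscale image with dark print on a light background and a fixed threshold of 128; neither the bit depth nor the threshold value comes from the text, and `character_mask` is a hypothetical name.

```python
from typing import List


def character_mask(gray: List[List[int]], threshold: int = 128) -> List[List[bool]]:
    """True where a pixel belongs to the character-present portion A11.

    Assumes 8-bit grayscale with dark print (low values) on a light
    background (high values); pixels darker than `threshold` count as print.
    """
    return [[value < threshold for value in row] for row in gray]


print(character_mask([[255, 30], [128, 127]]))  # [[False, True], [False, True]]
```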

FIG. 7 shows the partition positions P1 for character spacing candidates i of 4 to 6 pixels; the pixels at the partition positions P1 are marked with a cross. A candidate spacing of 4 pixels means that there are 3 pixels between adjacent partition positions P1 in the line direction D1, and a spacing of 6 pixels means that there are 5 pixels between them. In the example of FIG. 7, with a 4-pixel spacing, the partition positions P1 contain 80 pixels in total, of which 63 belong to the character-absent portion A12, so Ri = 63/80 ≈ 0.788. With a 5-pixel spacing, the partition positions P1 contain 64 pixels, of which 54 belong to the character-absent portion A12, so Ri = 54/64 ≈ 0.844. With a 6-pixel spacing, the partition positions P1 contain 56 pixels, of which 35 belong to the character-absent portion A12, so Ri = 35/56 = 0.625.
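The ratio Ri of S206 can be computed directly from such a mask. This minimal sketch assumes the partition positions start at the left edge of the line area (x = 0) and then repeat every `spacing` pixels; the helper names are hypothetical, and the toy mask is synthetic rather than the FIG. 7 image.

```python
from typing import List


def partition_columns(width: int, spacing: int, start: int = 0) -> List[int]:
    """x coordinates of the partition positions P1 for one candidate spacing."""
    return list(range(start, width, spacing))


def non_character_ratio(mask: List[List[bool]], spacing: int, start: int = 0) -> float:
    """Ratio Ri of non-character pixels over all pixels at the partition positions.

    `mask[y][x]` is True for character-present pixels (A11).
    """
    height, width = len(mask), len(mask[0])
    columns = partition_columns(width, spacing, start)
    total = len(columns) * height
    blank = sum(1 for x in columns for y in range(height) if not mask[y][x])
    return blank / total


# Toy line area: 2 rows, 3-pixel-wide glyphs repeating every 5 pixels.
row = [x % 5 in (1, 2, 3) for x in range(10)]
mask = [row, row]
for s in (3, 4, 5):
    print(s, round(non_character_ratio(mask, s), 3))  # 3 0.5 / 4 0.667 / 5 1.0
```

On this toy mask the 5-pixel spacing matches the glyph pitch, so every partition column falls on background and Ri reaches 1.0.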

After acquiring the ratios Ri, in S208 the main server 30 selects the width W2(i) of the candidate i with the largest ratio Ri as the half-width character spacing of width W3. The width W3 is the determined width of a half-width character. FIG. 6 shows the half-width character spacing of width W3 being selected from the character spacings of the widths W2(i). In the example of FIG. 7, the ratio Ri is largest for the 5-pixel candidate, so the half-width character spacing is determined to be 5 pixels.
S206 to S208 described above thus set, based on the ratio Ri of non-character pixels, a half-width character spacing of width W3 that provisionally partitions the line area A10 into half-width character units.
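Selecting W3 in S208 is an argmax over the candidates. The sketch below feeds in the three ratios from the FIG. 7 example; the function name is hypothetical, and the tie-breaking rule (the first candidate wins) is an assumption because the text does not address ties.

```python
from typing import Dict


def choose_half_width(ratios: Dict[int, float]) -> int:
    """Return the candidate spacing with the largest ratio Ri (S208).

    On a tie, max() keeps the first candidate in iteration order; the text
    does not say how ties are broken, so this is an assumption.
    """
    return max(ratios, key=ratios.__getitem__)


# Ratios from the FIG. 7 example.
fig7 = {4: 63 / 80, 5: 54 / 64, 6: 35 / 56}
print(choose_half_width(fig7))  # 5
```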

In the above example, the character spacing candidates were determined in one-pixel steps (spacings of 4, 5, and 6 pixels) in the voucher image. Alternatively, the voucher image may be converted to a higher resolution and the same processing performed on the higher-resolution image, which further improves the precision with which the character spacing is determined.
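One way to realize the higher-resolution variant is to upscale the binary mask and search spacings in whole upscaled pixels, which corresponds to steps of 1/factor pixel in the original image. Nearest-neighbour upscaling is an assumption here (the text does not name a method, and an interpolating upscaler applied to the grayscale image may preserve more detail); all function names are hypothetical, and partition positions are again assumed to start at x = 0.

```python
from typing import Iterable, List


def upscale_mask(mask: List[List[bool]], factor: int) -> List[List[bool]]:
    """Nearest-neighbour upscaling: repeat every pixel `factor` times in x and y."""
    out: List[List[bool]] = []
    for row in mask:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def non_character_ratio(mask: List[List[bool]], spacing: int) -> float:
    """Ratio of non-character pixels at columns 0, spacing, 2 * spacing, ..."""
    height, width = len(mask), len(mask[0])
    columns = range(0, width, spacing)
    blank = sum(1 for x in columns for y in range(height) if not mask[y][x])
    return blank / (len(columns) * height)


def best_spacing(mask: List[List[bool]], lo: int, hi: int, factor: int = 1) -> float:
    """Search spacings lo..hi (original pixels) in steps of 1/factor pixel."""
    big = upscale_mask(mask, factor) if factor > 1 else mask
    candidates: Iterable[int] = range(lo * factor, hi * factor + 1)
    best = max(candidates, key=lambda s: non_character_ratio(big, s))
    return best / factor  # convert back to original-pixel units


row = [x % 5 in (1, 2, 3) for x in range(20)]  # synthetic glyphs with a 5-pixel pitch
print(best_spacing([row, row], lo=4, hi=6, factor=2))  # 5.0
```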

After setting the half-width character spacing, in S210 the main server 30 acquires the number of character pixels at each half-width partition position j defined by the half-width character spacing of width W3.

FIG. 8 schematically illustrates how part of the half-width character spacing is changed to full-width character spacing. Let j be a variable identifying each of the half-width partition positions P2, and let Nj be the number of pixels of the character-present portion A11 at half-width partition position j. The pixel count Nj represents the degree to which the half-width partition position P2 overlaps the character-present portion A11. In the example shown in FIG. 8, the pixel counts Nj at the half-width partition positions are, from left to right, 0, 0, 0, 0, 7, 0, 0, and 3.
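Counting Nj in S210 is a column sum over the mask. The sketch below builds a synthetic 8-row mask that reproduces the FIG. 8 counts; `overlap_counts` is a hypothetical name, and the mask is not the actual figure.

```python
from typing import Iterable, List


def overlap_counts(mask: List[List[bool]], columns: Iterable[int]) -> List[int]:
    """Nj: number of character-present pixels in each half-width partition column."""
    return [sum(1 for row in mask if row[x]) for x in columns]


# Synthetic 8-row line area with half-width spacing W3 = 5: one glyph crosses
# x = 20 over 7 rows and another crosses x = 35 over 3 rows.
mask = [[False] * 36 for _ in range(8)]
for y in range(7):
    mask[y][20] = True
for y in range(3):
    mask[y][35] = True
print(overlap_counts(mask, range(0, 36, 5)))  # [0, 0, 0, 0, 7, 0, 0, 3]
```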

After acquiring the pixel counts Nj, in S212 the main server 30 removes, from the partitions of the line area A10, each half-width partition position j whose pixel count Nj exceeds a threshold Tn, and then ends the character spacing determination process. The threshold Tn represents a reference degree for the pixel count Nj. For example, Tn can be the number of pixels contained in one half-width partition position j multiplied by a ratio greater than 0 and less than 1 (for example, 0.25). In the example of FIG. 8, one half-width partition position j contains 8 pixels, so Tn is 2 pixels, and the half-width partition positions with Nj = 7 and Nj = 3 are removed from the partitions of the line area A10. As a result, the two half-width character spacings on either side of the Nj = 7 position are merged into one full-width character spacing, and likewise for the Nj = 3 position. The lower part of FIG. 8 shows the partition positions P3 after this change. FIG. 6 shows the half-width character spacing of width W3 being changed to a full-width character spacing of width W4 = 2 × W3. The width W4 is the determined width of a full-width character.
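The removal in S212 and the resulting partitioned areas can be sketched as follows. The ratio 0.25 and the 8-pixel column height follow the FIG. 8 example; treating the right edge of the line area (`line_end`, assumed here to be 40) as a final boundary is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple


def merge_full_width(columns: List[int], counts: List[int],
                     pixels_per_position: int, ratio: float = 0.25) -> List[int]:
    """Drop half-width partition positions whose overlap Nj exceeds Tn (S212)."""
    tn = pixels_per_position * ratio  # threshold Tn
    return [x for x, n in zip(columns, counts) if n <= tn]


def partition_spans(positions: List[int], line_end: int) -> List[Tuple[int, int]]:
    """Partitioned areas A2 as [x0, x1) spans between consecutive kept positions."""
    bounds = positions + [line_end]
    return list(zip(bounds, bounds[1:]))


columns = list(range(0, 40, 5))    # half-width spacing W3 = 5
counts = [0, 0, 0, 0, 7, 0, 0, 3]  # Nj from the FIG. 8 example
kept = merge_full_width(columns, counts, pixels_per_position=8)
print(kept)                                # [0, 5, 10, 15, 25, 30]
print(partition_spans(kept, line_end=40))  # spans of width 5 (W3) or 10 (W4)
```

Each span is either 5 pixels (W3) or 10 pixels (W4 = 2 × W3): removing a position merges the two half-width spans on either side of it.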

S210 to S212 described above thus change the half-width character spacing of width W3 to the full-width character spacing of width W4 by removing from the partitions of the line area A10 those half-width partition positions P2, among the positions defined by the spacing W3, whose degree of overlap with the character-present portion A11 exceeds the reference degree. Accordingly, S206 to S212 determine the character spacings of widths W3 and W4 for partitioning the line area A10 based on the degree of overlap between the partition positions P1 of each character spacing candidate i and the character-present portion A11.

After the character spacing determination process of FIG. 3, in S112 of FIG. 2 the main server 30 causes the DL engine 32 to perform a second character recognition in which each partitioned area A2, obtained by partitioning the line area A10 at a character spacing of either width W3 or width W4, is used as a set area A3. The main server 30 also causes the DL engine 32 to output a confidence C indicating the likelihood that each recognized character is correct. The DL engine 32 thus executes a second character recognition process that recognizes a character in the set area A3 and a process that outputs the confidence C of the recognized character. As shown in FIG. 8, the second character recognition process is performed on each partitioned area A2, and a confidence C is output for each partitioned area A2. For example, FIG. 8 shows that, in order along the line direction D1, the partitioned areas A2 yield the recognized character "2" with C = 1.0, "0" with C = 0.9, "1" with C = 1.0, "8" with C = 0.8, "年" (year) with C = 1.0, "1" with C = 1.0, and "月" (month) with C = 0.8.
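The loop of S112 can be outlined as cropping each partitioned area A2 and passing it to a recognizer that returns a character and its confidence C. The `Recognizer` callable is a hypothetical stand-in for the DL engine 32, whose actual interface the text does not describe; `fake_engine` exists only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]
# Hypothetical interface standing in for the DL engine 32: takes a cropped
# image and returns (recognized character, confidence C).
Recognizer = Callable[[Image], Tuple[str, float]]


def crop_columns(image: Image, x0: int, x1: int) -> Image:
    """Cut the columns [x0, x1) out of a line-area image."""
    return [row[x0:x1] for row in image]


def recognize_line(line_image: Image, spans: Sequence[Tuple[int, int]],
                   recognize: Recognizer) -> List[Tuple[str, float]]:
    """Run the second recognition on every partitioned area A2, left to right."""
    return [recognize(crop_columns(line_image, x0, x1)) for x0, x1 in spans]


def fake_engine(crop: Image) -> Tuple[str, float]:
    """Demonstration stand-in that just reports the crop width."""
    return str(len(crop[0])), 1.0


line = [[255] * 40 for _ in range(8)]
print(recognize_line(line, [(0, 5), (15, 25)], fake_engine))  # [('5', 1.0), ('10', 1.0)]
```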

The DL engine 32 is also a type of OCR engine for executing character recognition processing. However, the DL engine 32 includes a character recognition model generated by Deep Learning, which is an example of machine learning, and performs second character recognition with higher accuracy than the first character recognition performed by the OCR engine 31. The DL engine 32 can be generated using a known machine learning algorithm such as a neural network. For supervised machine learning, the teacher data can consist of input data, namely a large number of training images representing voucher characters, and output data, namely the characters contained in each of those training images. For example, when the relationship between the input data and the output data is fed as teacher data to a multi-layer neural network, the DL engine 32 automatically learns the features of the large number of training images and builds a character recognition model that recognizes the characters contained in an image representing a voucher and estimates how likely each recognized character is to be correct. The confidence C, which indicates this likelihood, corresponds, for example, to the proportion of training images containing a character that matches the recognized character among a plurality of training images identical to the input image, and means the probability that the recognized character matches the character contained in the input image.
In summary, the DL engine 32 includes a character recognition model, generated by machine learning, for recognizing the characters contained in an image representing a voucher and estimating their certainty. Using this model, the DL engine 32 recognizes characters in the setting area A3 and obtains from the model the confidence C indicating the certainty of each recognized character.
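To make the example definition of the confidence C concrete, the toy sketch below computes it literally as a frequency: among training samples whose image is identical to the input, it returns the most frequent label and the share of those samples carrying that label. The function name `frequency_confidence` and the use of a string `image_key` to stand for "identical image" are hypothetical. This only illustrates the stated interpretation; it is not how the neural network in the DL engine 32 actually produces C.

```python
from collections import Counter
from typing import Iterable, Tuple


def frequency_confidence(samples: Iterable[Tuple[str, str]], image_key: str) -> Tuple[str, float]:
    """Toy reading of confidence C (hypothetical helper).

    samples: (image_key, label) pairs; equal keys stand for identical training images.
    Returns the most frequent label among images identical to the input, and the
    fraction of those images whose label matches it.
    """
    labels = Counter(label for key, label in samples if key == image_key)
    if not labels:
        raise KeyError(f"no training image identical to {image_key!r}")
    char, count = labels.most_common(1)[0]
    return char, count / sum(labels.values())
```

For example, if three training images identical to the input are labeled "8", "8", and "3", the sketch recognizes "8" with C = 2/3.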

Here, the types of characters that appear on vouchers are limited: the digits "0" to "9", characters denoting amounts of money such as "¥" and "円" (yen), kanji denoting dates such as "月" (month) and "日" (day), and so on. The DL engine 32 can therefore recognize characters with higher accuracy than the OCR engine 31.

After the second character recognition, in S114 the main server 30 stores the recognized characters of each partition area A2 in the storage server 40 together with at least the voucher image IM1. Along with the recognized characters of each partition area A2 and the voucher image IM1, the main server 30 may also store in the storage server 40 information such as the confidence C, the character left edge, the character top edge, the character width, the character height, and the baseline of each partition area A2. Here, the plurality of recognized characters contained in the voucher image IM1 is referred to as recognized character data. For example, an operator of the main server 30 can check whether the recognized character data is correct by having information such as the recognized character data stored in the storage server 40 shown on a display device. In this case, the operator may perform an operation on the main server 30 to correct recognized characters in the recognized character data.
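The stored per-area information can be pictured as a simple record. The sketch below is an assumption about layout only: the class and field names (`RecognizedChar`, `VoucherRecord`, `image_path`, and so on) are hypothetical, the source does not specify a storage format, and JSON is used purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RecognizedChar:
    """Result for one partition area A2 (hypothetical field names)."""
    char: str          # recognized character
    confidence: float  # confidence C reported by the recognition model
    left: int          # character left edge, in pixels
    top: int           # character top edge, in pixels
    width: int         # character width, in pixels
    height: int        # character height, in pixels
    baseline: int      # baseline position, in pixels


@dataclass
class VoucherRecord:
    """Recognized character data stored alongside a reference to voucher image IM1."""
    image_path: str
    chars: List[RecognizedChar] = field(default_factory=list)

    def text(self) -> str:
        # Concatenate the recognized characters in partition-area order.
        return "".join(c.char for c in self.chars)

    def to_json(self) -> str:
        # asdict() also converts the nested RecognizedChar entries to dicts.
        return json.dumps(asdict(self), ensure_ascii=False)
```

Keeping the geometry next to each character would let a review screen highlight the exact area an operator is correcting.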
After the storage process of S114, the main server 30 ends the character recognition process shown in FIG. 2.

The storage server 40 can transmit information such as the recognized character data to external devices via the network NE1. The recognized character data stored in the storage server 40 includes character strings indicating the content written on vouchers such as receipts and invoices, for example the transaction partner, the transaction date, and the transaction amount. The storage server 40 may therefore transmit information such as the recognized character data to a terminal used at an accounting office for accounting, tax, and similar processing. Information such as the recognized character data stored in the storage server 40 may also be transmitted to the client 20 at the user's request, or transmitted to a printer connected to the network NE1 and printed by that printer.

As described above, the character spacing used to partition the line area A10 is determined based on the degree of overlap between the partition positions P1 according to each character spacing candidate i and the character-present portion A11 of the line area A10, and characters are recognized in each partition area A2 obtained by partitioning the line area A10 according to that spacing. The character spacing used on vouchers is often limited to a few kinds, for example either a fixed half-width spacing or a fixed full-width spacing. Consequently, even if characters on a printed voucher are faint or smudged, each character is cut out appropriately according to the breaks between the characters, and the characters in each partition area A2 are recognized with high accuracy. This specific example can therefore improve the accuracy of character recognition for voucher images.
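The spacing selection summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the line area is a binarized bitmap (1 = character pixel), partition positions are single pixel columns at multiples of the candidate width measured from the left edge of the line area, and overlap is scored through the ratio Ri of character-absent pixels on those columns (as in S206 to S208), so the candidate with the largest Ri wins. The names `blank_ratio` and `choose_spacing` are hypothetical.

```python
from typing import Sequence

Bitmap = Sequence[Sequence[int]]  # bitmap[y][x] == 1 where a character pixel is present


def blank_ratio(line: Bitmap, width: int) -> float:
    """R_i: share of pixels on the partition columns x = width, 2*width, ... that are blank."""
    length = len(line[0])
    total = blank = 0
    for x in range(width, length, width):  # interior partition positions P1
        for row in line:
            total += 1
            blank += int(row[x] == 0)
    return blank / total if total else 0.0


def choose_spacing(line: Bitmap, candidates: Sequence[int]) -> int:
    """Pick the candidate width whose partition columns overlap the ink least (max R_i)."""
    return max(candidates, key=lambda w: blank_ratio(line, w))


# Four characters with a 4-pixel pitch: one blank column, then three ink columns.
pattern = [0, 1, 1, 1] * 4
line = [pattern, pattern]
print(choose_spacing(line, [3, 4, 5]))  # 4
```

Note that multiples of the true pitch (8 in this example) also score Ri = 1.0, so the candidate set has to be bounded; the claims derive the candidates from the height of the line area. On ties, `max` keeps the first candidate listed.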

(5) Modifications:
Various modifications of the present invention are conceivable.
In the specific example described above, the main server 30 executes the image processing program PR1, but the image processing program PR1 may instead be executed by at least one of the client 20 and the scanner 10. For example, when the client 20 executes the image processing program PR1 stored in the storage device 20d, the functions FU1 to FU3 described above are implemented in the client 20, and the client 20 is an example of the image processing apparatus. The main server 30 and the client 20 can also cooperate to execute the image processing program PR1. For example, the main server 30 may execute the OCR engine 31 and the DL engine 32 while the client 20 executes the control program 33. Alternatively, the main server 30 may execute the DL engine 32 while the client 20 executes the OCR engine 31 and the control program 33.

The second character recognition, which is more accurate than the first character recognition, need not be implemented by a DL engine containing a character recognition model generated by Deep Learning; it may also be implemented by a program created with a machine learning method other than Deep Learning.

In S206 to S208 of FIG. 3, the half-width character spacing is determined based on the ratio Ri of character-absent pixels at the partition positions, but it can also be determined based on the ratio of character-present pixels at the partition positions. For example, in S206 the main server 30 may obtain the ratio 1-Ri of character-present pixels at the partition positions according to each character spacing candidate i, and in S208 it may set the width W2(i) of the character spacing candidate i with the smallest ratio 1-Ri as the half-width character spacing of width W3. In this case, the ratio 1-Ri of character-present pixels represents the degree of overlap between the partition positions P1 and the character-present portion A11.
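Because 1-Ri decreases strictly as Ri increases, the two formulations always select the same candidate. Written out, with the counting notation below introduced only for illustration:

```latex
R_i = \frac{\#\{\text{character-absent pixels on the partition positions } P_1 \text{ of candidate } i\}}
           {\#\{\text{pixels on the partition positions } P_1 \text{ of candidate } i\}},
\qquad
i^{*} = \arg\max_{i} R_i = \arg\min_{i}\,(1 - R_i),
\qquad
W_3 = W_2(i^{*}).
```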

In S210 to S212 of FIG. 3, the half-width partition positions to be removed from the partitions of the line area A10 are determined based on the number Nj of character-present pixels at each half-width partition position j, but they can also be determined based on the number of character-absent pixels at each half-width partition position j. For example, in S210 the main server 30 may obtain the number of character-absent pixels at each half-width partition position j according to the half-width character spacing, and in S212 it may remove from the partitions of the line area A10 the half-width partition positions whose number of character-absent pixels is less than a threshold. In this case, the number of character-absent pixels represents the degree to which the half-width partition position P2 overlaps the character-present portion A11 (fewer absent pixels meaning greater overlap), and the threshold represents the reference degree.
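The half-width to full-width conversion, in both its Nj form and the character-absent-pixel variant just described, can be sketched as follows. These are assumptions, not the patented implementation: the line area is a binarized bitmap (1 = character pixel), each half-width partition position j is a single pixel column at a multiple of the half-width pitch, and the thresholds `max_ink` and `min_blank` as well as the exact comparisons (`<=`, `>=`) are hypothetical. With H pixel rows, the two rules agree when `min_blank = H - max_ink`, because the blank count equals H - Nj.

```python
from typing import List, Sequence, Tuple

Bitmap = Sequence[Sequence[int]]  # bitmap[y][x] == 1 where a character pixel is present


def keep_positions_by_ink(line: Bitmap, pitch: int, max_ink: int) -> List[int]:
    """Keep half-width position j unless N_j (character-present pixels) exceeds max_ink."""
    kept = []
    for x in range(pitch, len(line[0]), pitch):
        n_j = sum(row[x] for row in line)
        if n_j <= max_ink:
            kept.append(x)
    return kept


def keep_positions_by_blank(line: Bitmap, pitch: int, min_blank: int) -> List[int]:
    """Variant: remove position j when its character-absent pixel count is below min_blank."""
    kept = []
    for x in range(pitch, len(line[0]), pitch):
        blank = sum(1 for row in line if row[x] == 0)
        if blank >= min_blank:
            kept.append(x)
    return kept


def partition_areas(length: int, positions: Sequence[int]) -> List[Tuple[int, int]]:
    """Split [0, length) at the kept positions into (start, end) partition areas."""
    bounds = [0, *positions, length]
    return list(zip(bounds, bounds[1:]))


# Half-width pitch 4: a half-width character, a full-width character spanning
# two cells (ink through column 8), then another half-width character.
pattern = [0, 1, 1, 1] + [0] + [1] * 7 + [0, 1, 1, 1]
line = [pattern, pattern]
kept = keep_positions_by_ink(line, 4, max_ink=0)
print(kept, partition_areas(16, kept))  # [4, 12] [(0, 4), (4, 12), (12, 16)]
```

The position at column 8 cuts through the full-width character, so it is dropped and the middle partition area becomes one full-width cell.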

(6) Conclusion:
As described above, the present invention can provide, in various aspects, techniques and the like for improving the accuracy of character recognition for voucher images. Naturally, the basic operation and effects described above can also be obtained with a technique consisting only of the elements recited in the independent claims.
Configurations in which the configurations disclosed in the above examples are substituted for one another or recombined, and configurations in which known techniques and the configurations disclosed in the above examples are substituted for one another or recombined, can also be implemented. The present invention includes these configurations as well.

10 ... scanner, 20 ... client, 30 ... main server, 30d ... storage device, 31 ... OCR engine, 32 ... DL engine, 33 ... control program, 40 ... storage server, A1 ... provisional character area, A2 ... partition area, A3 ... setting area, A5 ... recognized character area, A10 ... line area, A11 ... character-present portion, A12 ... character-absent portion, C ... confidence, D1 ... line direction, D2 ... height direction, H0 ... height of line area, i ... character spacing candidate, IA1 ... arrangement information, IM1 ... voucher image, NE1 ... network, P1 ... partition position, P2 ... half-width partition position, P3 ... partition position, PR1 ... image processing program, RE9 ... recognition result, U1 ... extraction unit, U2 ... processing unit, W1 to W4 ... width.

Claims

An image processing apparatus that performs character recognition on a voucher image representing a voucher, comprising:
an extraction unit that extracts, from the voucher image, a plurality of provisional character areas arranged side by side in a line direction; and
a processing unit, wherein
the processing unit
sets a plurality of character spacing candidates as intervals for partitioning, in the line direction, a line area including the plurality of provisional character areas,
calculates, for each of the character spacing candidates, a degree of overlap between partition positions according to that character spacing candidate and a character-present portion,
determines a character spacing from the plurality of character spacing candidates based on the plurality of calculated degrees of overlap, and
recognizes characters in partition areas obtained by partitioning the line area according to the determined character spacing.

The image processing apparatus according to claim 1, wherein the extraction unit associates, with each provisional character area, arrangement information representing the arrangement of that provisional character area, and
the processing unit determines the height of the line area based on the arrangement information associated with each provisional character area, and sets the plurality of character spacing candidates based on the determined height.

The image processing apparatus according to claim 1 or 2, wherein the processing unit sets, based on the degree of overlap, a half-width character spacing that provisionally partitions the line area in half-width character units; changes the half-width character spacing to a full-width character spacing by removing from the partitions of the line area, among a plurality of half-width partition positions according to the set half-width character spacing, each half-width partition position whose degree of overlap with the character-present portion is greater than a reference degree; and recognizes characters in each of the partition areas according to either the half-width character spacing or the full-width character spacing.

The image processing apparatus according to any one of claims 1 to 3, wherein the processing unit recognizes characters in each of the partition areas using a character recognition model generated by machine learning for recognizing characters included in an image representing a voucher.

The image processing apparatus according to any one of claims 1 to 4, wherein the extraction unit extracts the plurality of provisional character areas from the voucher image by performing first character recognition on the voucher image, and
the processing unit performs, on each of the partition areas, second character recognition having higher accuracy than the first character recognition.

An image processing method for performing character recognition on a voucher image representing a voucher, the method comprising:
an extraction step of extracting, from the voucher image, a plurality of provisional character areas arranged side by side in a line direction; and
a processing step, wherein
in the processing step,
a plurality of character spacing candidates are set as intervals for partitioning, in the line direction, a line area including the plurality of provisional character areas,
for each of the character spacing candidates, a degree of overlap between partition positions according to that character spacing candidate and a character-present portion is calculated,
a character spacing is determined from the plurality of character spacing candidates based on the plurality of calculated degrees of overlap, and
characters are recognized in partition areas obtained by partitioning the line area according to the determined character spacing.

An image processing program for performing character recognition on a voucher image representing a voucher, the program causing a computer to implement:
an extraction function of extracting, from the voucher image, a plurality of provisional character areas arranged side by side in a line direction; and
a processing function, wherein
the processing function
sets a plurality of character spacing candidates as intervals for partitioning, in the line direction, a line area including the plurality of provisional character areas,
calculates, for each of the character spacing candidates, a degree of overlap between partition positions according to that character spacing candidate and a character-present portion,
determines a character spacing from the plurality of character spacing candidates based on the plurality of calculated degrees of overlap, and
recognizes characters in partition areas obtained by partitioning the line area according to the determined character spacing.