JP2023043910A

JP2023043910A - Character string extraction device, character string extraction method and character string extraction program

Info

Publication number: JP2023043910A
Application number: JP2021151644A
Authority: JP
Inventors: 遼平田中; Ryohei Tanaka
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2021-09-17
Filing date: 2021-09-17
Publication date: 2023-03-30
Anticipated expiration: 2041-09-17
Also published as: JP7043670B1

Abstract

PROBLEM TO BE SOLVED: To extract the line image region of a character string for each line with high accuracy. SOLUTION: A character string extraction device 10 comprises a derivation unit 22 and an extraction unit 24. The derivation unit 22 uses an NNW 23 to derive a character string center region likelihood 62, a boundary region likelihood 64, and a background region likelihood 66 for each pixel region of an image 50 of a recording medium on which characters are written. The extraction unit 24 extracts a line image region 60 of a character string 52 for each line included in the image 50 on the basis of the character string center region likelihood 62, the boundary region likelihood 64, and the background region likelihood 66. SELECTED DRAWING: Figure 2

Description

Embodiments of the present invention relate to a character string extraction device, a character string extraction method, and a character string extraction program.

Techniques for recognizing characters written on a recording medium are known. For example, a technique has been disclosed that extracts, line by line, the character strings contained in an image of a recording medium on which characters are written, and performs character recognition on each extracted line. In one such technique, an image is input to a learning model, which derives, as the center region of a character string, a region obtained by shrinking the line region of that character string by a predetermined reduction ratio. The derived center region is then enlarged by a predetermined ratio, and the enlarged region is extracted as the line image region of one line of the character string (see, for example, Non-Patent Document 1).

In the prior art, however, when multiple lines of character strings are written close to one another or overlap, those lines may be erroneously identified as the line image region of a single line. That is, with the prior art it has been difficult to extract the line image region of each line's character string from an image with high accuracy.

Non-Patent Document 1: Wenhai Wang, et al., "Shape Robust Text Detection with Progressive Scale Expansion Network," 2019.

The present invention has been made in view of the above, and its object is to provide a character string extraction device, a character string extraction method, and a character string extraction program capable of extracting the line image region of a character string for each line with high accuracy.

A character string extraction device according to an embodiment includes a derivation unit and an extraction unit. Using a neural network, the derivation unit derives three values for each pixel region of an image of a recording medium on which characters are written: a character string center region likeness (for a center region within a character string region), a boundary region likeness (for the region between the character string region and the character string center region), and a background region likeness. The extraction unit extracts the line image region of the character string for each line included in the image on the basis of the character string center region likeness, the boundary region likeness, and the background region likeness.

FIG. 1 is a block diagram showing the configuration of the character string extraction device according to the embodiment.
FIG. 2 is a schematic diagram showing the flow of processing by the derivation unit and the extraction unit.
FIG. 3 is an explanatory diagram of the character string center region, the boundary region, and the background region.
FIGS. 4A and 4B are schematic diagrams of the line image region.
FIGS. 5A to 5D are explanatory diagrams of the training of the NNW.
FIG. 6 is an explanatory diagram of training that minimizes the loss function.
FIG. 7 is a schematic diagram of the boundary region.
FIG. 8 is a flowchart showing the flow of information processing.
FIG. 9 is an explanatory diagram of conventional identification of the line image region.
FIG. 10 is an explanatory diagram of extraction of the line image region according to the embodiment.
FIG. 11 is an explanatory diagram of a character string region shrunk by a predetermined reduction ratio.
FIG. 12 is an explanatory diagram of a character string region shrunk by the second number of pixels.
FIG. 13 is a hardware configuration diagram.

A character string extraction device, a character string extraction method, and a character string extraction program are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram showing an example of the configuration of the character string extraction device 10 of this embodiment.

The character string extraction device 10 is an information processing device that extracts line image regions from an image of a recording medium on which characters are written. The line image region is described in detail later.

The character string extraction device 10 includes a storage unit 12, a communication unit 14, a UI (user interface) unit 16, and a control unit 20. The storage unit 12, the communication unit 14, the UI unit 16, and the control unit 20 are communicably connected via a bus 18 or the like.

The storage unit 12 stores various kinds of data. The storage unit 12 is, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, a hard disk, or an optical disk. The storage unit 12 may also be a storage device provided outside the character string extraction device 10.

The communication unit 14 is a communication interface that communicates with external information processing devices via a network or the like.

The UI unit 16 has a reception function that accepts operation input from the user and a display function that displays various kinds of information. The reception function is implemented by, for example, a pointing device such as a mouse, or a keyboard. The display function is implemented by, for example, a display. The UI unit 16 may be a touch panel that integrates the reception function and the display function.

The control unit 20 executes various kinds of information processing in the character string extraction device 10.

The control unit 20 includes a derivation unit 22, an extraction unit 24, and a character string recognition unit 26.

The derivation unit 22, the extraction unit 24, and the character string recognition unit 26 are implemented by, for example, one or more processors. For example, each of these units may be implemented in software, that is, by having a processor such as a CPU (Central Processing Unit) execute a program. Each unit may instead be implemented in hardware, that is, by a processor such as a dedicated IC. The units may also be implemented by a combination of software and hardware. When multiple processors are used, each processor may implement one of the units or two or more of them.

The control unit 20 of the character string extraction device 10 need only include at least the derivation unit 22 and the extraction unit 24, and may omit the character string recognition unit 26. For example, the character string recognition unit 26 may be installed in an external information processing device communicably connected to the character string extraction device 10.

FIG. 2 is a schematic diagram showing an example of the flow of processing by the derivation unit 22 and the extraction unit 24.

The derivation unit 22 derives a character string center region likeness 62, a boundary region likeness 64, and a background region likeness 66 for each pixel region of the image 50.

The image 50 is an image of a recording medium on which characters are written. FIG. 2 shows an image 50A as an example of the image 50.

The image 50A includes character strings 52 written on the recording medium by handwriting or the like. A character string 52 is a group of one or more characters written along the writing direction. FIG. 2 shows, as examples, a character string 52A consisting of the characters "1234" and a character string 52B consisting of the characters "567". The character string 52A and the character string 52B are examples of the character string 52.

A pixel region is a region consisting of one pixel or of multiple contiguous (adjacent) pixels.

The character string center region likeness 62 is the degree to which a pixel region is a character string center region 80. It is expressed by, for example, a score 61 representing the character string center region likeness 62. The boundary region likeness 64 is the degree to which a pixel region is a boundary region 82. It is expressed by, for example, a score 61 representing the boundary region likeness 64. The background region likeness 66 is the degree to which a pixel region is a background region 84. It is expressed by, for example, a score 61 representing the background region likeness 66.

FIG. 3 is an explanatory diagram of an example of the character string center region 80, the boundary region 82, and the background region 84. FIG. 3 is a schematic diagram of the image 50 in which the region containing one line of the character string 52 (for example, the character string 52A) in the image 50A of FIG. 2 is enlarged.

The character string region 86 is the region occupied by one line of the character string 52 in the image 50. The character string center region 80 is a region within the character string region 86, that is, a region that lies inside the character string region 86 and is no larger than the character string region 86.

For example, the character string center region 80 is the region obtained by shrinking the character string region 86 by a second number of pixels toward a predetermined position within the character string region 86. The predetermined position may be any position within the character string region 86, whether at its center or elsewhere.

The second number of pixels may be any number of pixels equal to or greater than one, and may be determined in advance. The second number of pixels may also be made changeable in response to, for example, a user's operation of the UI unit 16.

The second number of pixels is set so that the character string center region 80, obtained by shrinking the character string region 86 by that number of pixels, does not vanish as a result of the shrinking. For example, the second number of pixels may be chosen so that the width Y of the shortest side of the resulting character string center region 80 is at least the number of pixels at which the character string center region 80 does not vanish. The minimum such number of pixels is, for example, one pixel.
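The constraint on the second number of pixels can be illustrated with a minimal sketch. It assumes, purely for illustration, that the character string region is an axis-aligned rectangle shrunk equally on every side toward its center. The names `Rect`, `max_shrink`, and `shrink` are hypothetical, and clamping the requested count is one possible policy rather than something the embodiment prescribes.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixels; right and bottom are exclusive (assumption)."""
    left: int
    top: int
    right: int
    bottom: int


def max_shrink(region: Rect, min_side: int = 1) -> int:
    """Largest per-side shrink that keeps the shortest side at least min_side pixels."""
    shortest = min(region.right - region.left, region.bottom - region.top)
    return max((shortest - min_side) // 2, 0)


def shrink(region: Rect, second_pixel_count: int) -> Rect:
    """Shrink the region toward its center, clamping so that it never vanishes."""
    n = min(second_pixel_count, max_shrink(region))
    return Rect(region.left + n, region.top + n, region.right - n, region.bottom - n)


string_region = Rect(10, 20, 110, 40)   # 100 pixels wide, 20 pixels high
center_region = shrink(string_region, 3)
clamped_region = shrink(string_region, 50)  # request too large, so it is clamped
```

In this sketch a request of 50 pixels is clamped to 9, which leaves a center region two pixels high instead of an empty one.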

The boundary region 82 is the region of the image 50 between the character string region 86 and the character string center region 80. In other words, the boundary region 82 is the entire region between the character string region 86 and the character string center region 80. That is, the boundary region 82 represents the boundary between the character string center region 80 of the character string region 86 of one character string 52 and either the character string center region 80 of the character string region 86 of another character string 52 or the background region 84. The background region 84 is the region of the image 50 other than the character string center region 80, the boundary region 82, and the character string region 86.

The boundary region 82 need only include the region between the character string region 86 and the character string center region 80, and may also include a region extending a predetermined number of pixels beyond the outside of the character string region 86. In that case, the background region 84 may be defined as the entire image region of the image 50 minus the regions obtained by enlarging every character string region 86 by, for example, two pixels. The boundary region 82 is then the region that is neither the character string center region 80 nor the background region 84.
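The three region definitions above can be sketched as a per-pixel label map. This is an illustrative sketch only: it assumes axis-aligned, non-overlapping rectangular character string regions that lie inside the image, and the names `label_map`, `BACKGROUND`, `BOUNDARY`, `CENTER`, and the `margin` parameter (the optional overhang of the boundary region beyond the character string region) are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive
BACKGROUND, BOUNDARY, CENTER = 0, 1, 2


def label_map(height: int, width: int, string_regions: List[Rect],
              second_pixel_count: int, margin: int = 0) -> List[List[int]]:
    """Label each pixel as background, boundary, or character string center."""
    labels = [[BACKGROUND] * width for _ in range(height)]
    # Everything inside a string region (optionally grown by margin) starts as boundary.
    for left, top, right, bottom in string_regions:
        for y in range(max(top - margin, 0), min(bottom + margin, height)):
            for x in range(max(left - margin, 0), min(right + margin, width)):
                labels[y][x] = BOUNDARY
    # The string region shrunk by the second pixel count becomes the center region.
    n = second_pixel_count
    for left, top, right, bottom in string_regions:
        for y in range(top + n, bottom - n):
            for x in range(left + n, right - n):
                labels[y][x] = CENTER
    return labels


labels = label_map(6, 8, [(1, 1, 7, 5)], second_pixel_count=1)
labels_with_margin = label_map(6, 8, [(1, 1, 7, 5)], second_pixel_count=1, margin=1)
```

With `margin=0` the boundary region is exactly the ring between the character string region and the center region. With `margin=1` the ring also extends one pixel outside the character string region, as in the variant described above.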

The description continues with reference to FIG. 2 again.

The derivation unit 22 derives the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66 for each pixel region of the image 50. Using the NNW 23, the derivation unit 22 derives from the image 50 a score 61 representing each of the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66. The derivation unit 22 is described in detail later.

The extraction unit 24 extracts the line image regions 60 included in the image 50 on the basis of the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66 derived for each pixel region.

A line image region 60 is the region of one line of the character string 52 included in the image 50.

Using the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66, the extraction unit 24 calculates the likelihood of the character string center region likeness 62 for each pixel region. Specifically, the extraction unit 24 normalizes the scores 61 representing the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66 derived for each pixel region so that the scores 61 sum to 1, and thereby obtains the likelihood of each.
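The normalization step can be sketched as follows. The text only requires that the three scores sum to 1 after normalization; dividing by the sum is one simple way to achieve that and assumes non-negative scores (a softmax would be another option). The function name `normalize_scores` and the fallback for an all-zero pixel are assumptions made for illustration.

```python
from typing import Tuple


def normalize_scores(center: float, boundary: float,
                     background: float) -> Tuple[float, float, float]:
    """Scale the three scores of one pixel region so that they sum to 1."""
    if min(center, boundary, background) < 0.0:
        raise ValueError("this sketch assumes non-negative scores")
    total = center + boundary + background
    if total == 0.0:
        # Assumption: with no evidence at all, treat the three classes as equally likely.
        return (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)
    return (center / total, boundary / total, background / total)


center_likelihood, boundary_likelihood, background_likelihood = normalize_scores(2.0, 1.0, 1.0)
```

For the scores (2.0, 1.0, 1.0) the resulting likelihoods are (0.5, 0.25, 0.25).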

The extraction unit 24 then identifies, as the character string center region 80, each region of the image 50 in which the likelihood of the character string center region likeness 62 is equal to or greater than a threshold.

FIG. 2 shows a region 63A, in which the likelihood of the character string center region likeness 62 is equal to or greater than the threshold; a region 65A, in which the likelihood of the boundary region likeness 64 is equal to or greater than the threshold; and a region 67A, in which the likelihood of the background region likeness 66 is equal to or greater than the threshold. When the image 50A is used, the extraction unit 24 identifies a region 63A1 and a region 63A2 as regions in which the likelihood of the character string center region likeness 62 is equal to or greater than the threshold. The region 63A1 and the region 63A2 are examples of the region 63A.

The extraction unit 24 identifies the region 63A of the image 50, in which the likelihood of the character string center region likeness 62 is equal to or greater than the threshold, as the character string center region 80. In the example of FIG. 2, the extraction unit 24 identifies the region 63A1 as a character string center region 80A and the region 63A2 as a character string center region 80B. The character string center region 80A and the character string center region 80B are examples of the character string center region 80. The character string center region 80A is the character string center region 80 of the character string region 86A corresponding to the character string 52A. The character string center region 80B is the character string center region 80 of the character string region 86B corresponding to the character string 52B.
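The identification of separate center regions such as 63A1 and 63A2 can be sketched as thresholding followed by grouping of connected pixels. The text does not state how above-threshold pixels are grouped; the sketch below assumes 4-connectivity and a threshold of 0.5, and the name `center_regions` is hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def center_regions(center_likelihood: List[List[float]],
                   threshold: float = 0.5) -> List[Set[Pixel]]:
    """Group pixels whose center likelihood is >= threshold into 4-connected regions."""
    height = len(center_likelihood)
    width = len(center_likelihood[0]) if height else 0
    seen: Set[Pixel] = set()
    regions: List[Set[Pixel]] = []
    for r in range(height):
        for c in range(width):
            if (r, c) in seen or center_likelihood[r][c] < threshold:
                continue
            # Breadth-first search collects one connected center region.
            region: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and (ny, nx) not in seen
                            and center_likelihood[ny][nx] >= threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


likelihood = [
    [0.9, 0.9, 0.1, 0.0],
    [0.1, 0.1, 0.1, 0.0],
    [0.0, 0.8, 0.8, 0.7],
]
regions = center_regions(likelihood)
```

Because the middle row stays below the threshold, the two lines yield two separate regions, which correspond to one center region per line.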

The extraction unit 24 extracts, as the line image region 60, either the identified character string center region 80 or the region obtained by enlarging the identified character string center region 80 by a first number of pixels. In the example of FIG. 2, the extraction unit 24 therefore extracts, as a line image region 60A, the character string center region 80A of the character string region 86A corresponding to the character string 52A, or the region obtained by enlarging the character string center region 80A by the first number of pixels. Likewise, the extraction unit 24 extracts, as a line image region 60B, the character string center region 80B of the character string region 86B corresponding to the character string 52B, or the region obtained by enlarging the character string center region 80B by the first number of pixels. The line image region 60A and the line image region 60B are examples of the line image region 60.

The first number of pixels may be any value representing one or more pixels. It is adjusted in advance so that the region obtained by enlarging the character string center region 80 by the first number of pixels does not extend beyond the outer frame of the character string region 86 that contains that character string center region 80. The first number of pixels may be made changeable, within the range that satisfies these conditions, in response to, for example, a user's operation of the UI unit 16.

FIGS. 4A and 4B are schematic diagrams of examples of the line image region 60.

As shown in FIG. 4A, for example, the extraction unit 24 enlarges the character string center region 80 by the first number of pixels until it coincides with the outer frame of the character string region 86, and extracts the enlarged region as the line image region 60.

Alternatively, as shown in FIG. 4B, the extraction unit 24 may enlarge the character string center region 80 by the first number of pixels while staying within the character string region 86, and extract the enlarged region as the line image region 60.

The extraction unit 24 may also extract the character string center region 80 itself as the line image region 60.
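The enlargement options above can be sketched as a dilation of the identified center region. This sketch assumes the center region is given as a set of (row, column) pixels and grows it by the same number of pixels in every direction, clipped to the image. Keeping the result inside the character string region is left to the choice of `first_pixel_count`, as in the text. The name `expand_region` is hypothetical, and passing 0 corresponds to using the center region as is.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def expand_region(region: Set[Pixel], first_pixel_count: int,
                  height: int, width: int) -> Set[Pixel]:
    """Grow a pixel region by first_pixel_count pixels in all directions, inside the image."""
    n = first_pixel_count
    expanded: Set[Pixel] = set()
    for y, x in region:
        for dy in range(-n, n + 1):
            for dx in range(-n, n + 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    expanded.add((ny, nx))
    return expanded


center_region = {(2, 2), (2, 3)}
line_image_region = expand_region(center_region, 1, height=5, width=6)
```

If the center region was produced by shrinking a rectangular character string region by the second number of pixels, choosing the first number of pixels equal to the second restores the outer frame (the FIG. 4A case), and a smaller value gives the FIG. 4B case.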

The description continues with reference to FIG. 2 again. The derivation unit 22 is now described in detail.

The derivation unit 22 includes an NNW (neural network) 23. Using the NNW 23, the derivation unit 22 derives the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66 for each pixel region.

The NNW 23 is a learning model that takes the image 50 as input and outputs the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66 for each pixel region of the image 50. The NNW 23 is, for example, a deep learning model (DNN) composed of a neural network such as a CNN (Convolutional Neural Network), a GCN (Graph Convolutional Network), or an RNN (Recurrent Neural Network).

The derivation unit 22 trains the NNW 23 in advance using multiple items of training data 70. Each item of training data 70 is a pair of an image 50 and scores 61. The scores 61 represent, for each pixel region, the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66. The scores 61 included in the training data 70 serve as the ground truth for the corresponding image 50. At least some of the images 50 in the training data 70 may be images 50 containing obliquely written character strings 52, images 50 in which multiple lines of character strings 52 are written close together or overlapping, and the like.

The derivation unit 22 preferably trains the NNW 23 so that it derives a character string center region likeness 62, a boundary region likeness 64, and a background region likeness 66 for which the character string center regions 80 identified by the extraction unit 24 for character strings 52 on different lines are not connected to one another.

FIGS. 5A to 5D are explanatory diagrams of an example of the training of the NNW 23.

For example, character strings may be written on the recording medium such that their character string regions 86 partly overlap. Specifically, as shown in FIG. 5A, assume that the character string region 86A and the character string region 86B included in the image 50A overlap. The character string region 86A and the character string region 86B are examples of the character string region 86. The character string region 86A is the character string region 86 corresponding to the character string 52A. The character string region 86B is the character string region 86 corresponding to the character string 52B.

In this case, the derivation unit 22 preferably trains the NNW 23 so that it derives a character string center region likeness 62, a boundary region likeness 64, and a background region likeness 66 for which the character string center regions 80 identified by the extraction unit 24 for character strings 52 on different lines are not connected to one another.

That is, the NNW 23 is trained in advance to derive a character string center region likeness 62, a boundary region likeness 64, and a background region likeness 66 for which the character string center regions 80 of character strings 52 on different lines are not connected to one another.

Specifically, as shown in FIGS. 5B to 5D, the NNW 23 is preferably trained in advance to calculate scores 61 for which the character string center region 80A and the character string center region 80B identified by the extraction unit 24 are not connected to each other.

With such training, when the derivation unit 22 inputs the image 50A to the NNW 23, the NNW 23 outputs scores 61 for the character string center region likeness 62, the boundary region likeness 64, and the background region likeness 66 from which the extraction unit 24 identifies a character string center region 80A and a character string center region 80B that are not connected to each other.

The NNW 23 need only output scores 61 for which the character string center regions 80 corresponding to character strings 52 on different lines are not connected. The boundary region 82A of the character string center region 80A and the boundary region 82B of the character string center region 80B may therefore have, for example, any of the following shapes.

That is, as shown in FIGS. 5B to 5D, where the character string region 86A and the character string region 86B do not overlap, the boundary region 82A and the boundary region 82B follow the contour of the respective character string region 86. Where the character string region 86A and the character string region 86B overlap, the boundary region 82A and the boundary region 82B may be a region consisting of one or more straight lines passing through the overlapping region (see FIGS. 5B and 5C). Alternatively, where the character string region 86A and the character string region 86B overlap, the boundary region 82A and the boundary region 82B may follow the contour of either one of the character string regions 86 (see FIG. 5D).

The NNW 23 is preferably trained in advance to minimize a loss function in which the weighting coefficient of the loss for the boundary region likeness 64 is larger than the weighting coefficients of the losses for the character string center region likeness 62 and the background region likeness 66.

FIG. 6 is an explanatory diagram of an example of training that minimizes the loss function.

For example, suppose that the character string 52A and the character string 52B included in the image 50A partly overlap in an overlap region D. In this case, it may be difficult to identify the overlap region D as the boundary region 82.

そこで、ＮＮＷ２３は、境界領域らしさ６４の損失の重み係数が、文字列中心領域らしさ６２および背景領域らしさ６６の損失の重み係数より大きい損失関数を最小化するように、予め学習されてなることが好ましい。詳細には、ＮＮＷ２３は、下記式（１）によって表される損失関数を最小化するように学習されてなることが好ましい。 Therefore, the NNW 23 is preferably trained in advance so as to minimize a loss function in which the loss weighting factor of the boundary area likeness 64 is greater than the loss weighting factors of the character string central area likeness 62 and the background area likeness 66. Specifically, the NNW 23 is preferably trained to minimize the loss function represented by Equation (1) below.

Ｌ＝ｗ_ｆＬ_ｆ＋ｗ_ｂＬ_ｂ＋ｗ_ｅＬ_ｅ・・・式（１） $L = w_f L_f + w_b L_b + w_e L_e$ ... Equation (1)

式（１）中、Ｌは損失関数を表す。Ｌ_ｆは文字列中心領域らしさ６２の損失を表す。ｗ_ｆは、文字列中心領域らしさ６２の損失に対する重み係数を表す。Ｌ_ｂは背景領域らしさ６６の損失を表す。ｗ_ｂは、背景領域らしさ６６の損失に対する重み係数を表す。Ｌ_ｅは境界領域らしさ６４の損失を表す。ｗ_ｅは、境界領域らしさ６４の損失に対する重み係数を表す。 In Equation (1), L represents the loss function. L_f represents the loss of the character string central area likeness 62. w_f represents the weighting factor for the loss of the character string central area likeness 62. L_b represents the loss of the background area likeness 66. w_b represents the weighting factor for the loss of the background area likeness 66. L_e represents the loss of the boundary area likeness 64. w_e represents the weighting factor for the loss of the boundary area likeness 64.

式（１）中、境界領域らしさ６４の損失に対する重み係数ｗ_ｅは、文字列中心領域らしさ６２の損失に対する重み係数ｗ_ｆ、および、背景領域らしさ６６の損失に対する重み係数ｗ_ｂより大きい値であればよい。 In Equation (1), the weighting factor w_e for the loss of the boundary area likeness 64 need only be larger than both the weighting factor w_f for the loss of the character string central area likeness 62 and the weighting factor w_b for the loss of the background area likeness 66.

境界領域らしさ６４の損失に対する重み係数ｗ_ｅを、文字列中心領域らしさ６２の損失に対する重み係数ｗ_ｆ、および、背景領域らしさ６６の損失に対する重み係数ｗ_ｂより大きい値とした損失関数Ｌを最小化するようにＮＮＷ２３を学習する。この学習により、導出部２２は、より高精度な境界領域らしさ６４を導出することが可能となる。すなわち、導出部２２は、抽出部２４でより高精度な行画像領域６０を抽出可能な、境界領域らしさ６４を導出することができる。 The NNW 23 is trained so as to minimize the loss function L in which the weighting factor w_e for the loss of the boundary area likeness 64 is set larger than the weighting factor w_f for the loss of the character string central area likeness 62 and the weighting factor w_b for the loss of the background area likeness 66. This training enables the derivation unit 22 to derive the boundary area likeness 64 with higher accuracy. In other words, the derivation unit 22 can derive a boundary area likeness 64 that enables the extraction unit 24 to extract the line image area 60 with higher accuracy.
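The weighted loss of Equation (1) can be illustrated with a minimal Python sketch. The description does not specify how the per-class losses L_f, L_b, and L_e are computed, so this sketch assumes each is a cross-entropy term accumulated over the pixel regions whose ground-truth class is the corresponding one. The function name `weighted_loss`, the class indices, and the default weight values are hypothetical and chosen only so that w_e exceeds w_f and w_b.

```python
import math


def weighted_loss(probs, targets, w_f=1.0, w_b=1.0, w_e=2.0):
    """Return L = w_f*L_f + w_b*L_b + w_e*L_e for a set of pixel regions.

    probs:   list of (p_center, p_boundary, p_background) per pixel region.
    targets: list of ground-truth class indices
             (0 = central area, 1 = boundary area, 2 = background area).
    Assumption: each per-class loss is a cross-entropy sum over the pixel
    regions of that class, divided by the total number of pixel regions.
    """
    # Equation (1) requires the boundary weight to be the largest one.
    if not (w_e > w_f and w_e > w_b):
        raise ValueError("w_e must be larger than both w_f and w_b")

    eps = 1e-12  # guards against log(0)
    per_class = [0.0, 0.0, 0.0]
    for p, t in zip(probs, targets):
        per_class[t] += -math.log(max(p[t], eps))

    n = max(len(targets), 1)
    loss_f, loss_e, loss_b = (value / n for value in per_class)
    return w_f * loss_f + w_b * loss_b + w_e * loss_e
```

With these hypothetical defaults, a misclassified boundary pixel region contributes twice as much to L as an equally misclassified central or background pixel region, which is the effect the weighting is intended to have.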

なお、導出部２２は、画像５０の画素領域ごとに、１または複数種類の文字列中心領域らしさ６２、１または複数種類の境界領域らしさ６４、および１または複数種類の背景領域らしさ６６を導出してもよい。 Note that the derivation unit 22 may derive, for each pixel area of the image 50, one or more types of character string central area likeness 62, one or more types of boundary area likeness 64, and one or more types of background area likeness 66.

文字列中心領域らしさ６２の種類は、文字列中心領域８０を予め定めた分類条件に応じて複数グループに分類した各グループのラベルを表す。例えば、文字列中心領域らしさ６２の種類は、含まれる文字列５２の属性、含まれる文字列５２の書字方向、含まれる文字列５２によって表される文の種類、含まれる文字列５２によって表される文字形状、などである。 The type of character string central region likelihood 62 represents the label of each group obtained by classifying the character string central region 80 into a plurality of groups according to predetermined classification conditions. For example, the type of character string center region likeness 62 is represented by the attribute of the included character string 52, the writing direction of the included character string 52, the type of sentence represented by the included character string 52, and the included character string 52. character shapes, etc.

文字列５２の属性は、例えば、英語、漢字、カタカナ、数字、などである。文字列５２の書字方向は、例えば、縦書き、横書き、などである。文字列５２によって表される文の種類は、例えば、住所、電話番号、郵便番号、氏名などである。文の種類は、フィールドタイプと称される場合がある。文字形状は、例えば、手書き、活字、などである。 The attributes of the character string 52 are, for example, English, kanji, katakana, numerals, and the like. The writing direction of the character string 52 is, for example, vertical writing, horizontal writing, or the like. Types of sentences represented by the character string 52 are, for example, addresses, telephone numbers, postal codes, names, and the like. Sentence types are sometimes referred to as field types. Character shapes are, for example, handwriting, printed characters, and the like.

境界領域らしさ６４の種類は、境界領域８２を予め定めた分類条件に応じて複数グループに分類した各グループのラベルを表す。例えば、境界領域らしさ６４の種類は、他の文字列中心領域８０の境界領域８２と非重複の領域と、他の文字列中心領域８０の境界領域８２と重複する領域と、に分類される。 The type of boundary area-likeness 64 represents the label of each group in which the boundary area 82 is classified into a plurality of groups according to predetermined classification conditions. For example, the types of boundary area likeness 64 are classified into areas that do not overlap with the boundary area 82 of another character string central area 80 and areas that overlap with the boundary area 82 of another character string central area 80 .

図７は、境界領域８２の一例の模式図である。例えば、境界領域らしさ６４の種類は、他の文字列中心領域８０に非重複の境界領域８２Ａの境界領域らしさ６４と、他の文字列中心領域８０に重複する境界領域８２Ｂの境界領域らしさ６４と、に分類される。 FIG. 7 is a schematic diagram of an example of the boundary area 82. For example, the types of boundary area likeness 64 are classified into the boundary area likeness 64 of a boundary area 82A that does not overlap another character string central area 80 and the boundary area likeness 64 of a boundary area 82B that overlaps another character string central area 80.

背景領域らしさ６６の種類は、背景領域８４を予め定めた分類条件に応じて複数グループに分類した各グループのラベルを表す。例えば、背景領域らしさ６６の種類は、表を表す表領域、図を表す図領域、表および図以外のその他の領域、などである。 The type of background area-likeness 66 represents the label of each group in which the background area 84 is classified into a plurality of groups according to predetermined classification conditions. For example, the types of background area-likeness 66 include a table area representing a table, a diagram area representing a diagram, and other areas other than tables and diagrams.

導出部２２が、画像５０の画素領域ごとに、より複数の種類の、文字列中心領域らしさ６２、境界領域らしさ６４、および背景領域らしさ６６を導出する。この導出処理により、抽出部２４では、より高精度に文字列中心領域８０を特定することができる。このため、抽出部２４は、より高精度に行画像領域６０を抽出することができる。 The derivation unit 22 derives a larger number of types of character string central area likeness 62, boundary area likeness 64, and background area likeness 66 for each pixel area of the image 50. This derivation process allows the extraction unit 24 to identify the character string central area 80 with higher accuracy. Therefore, the extraction unit 24 can extract the line image area 60 with higher accuracy.

図１に戻り説明を続ける。 Returning to FIG. 1, the description continues.

文字列認識部２６は、抽出部２４で抽出された行画像領域６０ごとに、行画像領域６０に含まれる文字を認識し、文字認識結果を出力する。文字列認識部２６による文字認識には、公知の方法を用いればよい。 The character string recognition unit 26 recognizes characters included in each line image region 60 extracted by the extraction unit 24, and outputs a character recognition result. A known method may be used for character recognition by the character string recognition unit 26 .

次に、本実施形態の文字列抽出装置１０で実行する情報処理の流れの一例を説明する。 Next, an example of the flow of information processing executed by the character string extraction device 10 of this embodiment will be described.

図８は、文字列抽出装置１０で実行される情報処理の流れの一例を示すフローチャートである。なお、図８には、文字列抽出装置１０が文字列認識部２６を備える構成である場合の情報処理の流れの一例を示す。 FIG. 8 is a flowchart showing an example of the flow of information processing executed by the character string extraction device 10. Note that FIG. 8 shows an example of the flow of information processing in the case where the character string extraction device 10 is configured to include the character string recognition unit 26.

導出部２２は、画像５０から、画素領域ごとに文字列中心領域らしさ６２、境界領域らしさ６４、および背景領域らしさ６６の各々を表すスコア６１を導出する（ステップＳ１００）。 The deriving unit 22 derives a score 61 representing each of the character string central region likelihood 62, boundary region likelihood 64, and background region likelihood 66 for each pixel region from the image 50 (step S100).

抽出部２４は、ステップＳ１００で導出されたスコア６１に基づいて、画像５０に含まれる行画像領域６０を抽出する（ステップＳ１０２）。 The extraction unit 24 extracts the line image area 60 included in the image 50 based on the score 61 derived in step S100 (step S102).

文字列認識部２６は、ステップＳ１０２で抽出された行画像領域６０の文字認識結果を出力する（ステップＳ１０４）。 The character string recognition unit 26 outputs the character recognition result of the line image area 60 extracted in step S102 (step S104).

そして、本ルーチンを終了する。 Then, the routine ends.
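The flow of steps S100 to S104 can be summarized in a short Python sketch. The function `run_pipeline` and its three callable parameters are hypothetical stand-ins for the derivation unit 22, the extraction unit 24, and the character string recognition unit 26. Their signatures are assumptions made for illustration and are not taken from the description.

```python
def run_pipeline(image, derive, extract, recognize):
    """Run the three stages of FIG. 8 in order and return the recognition results.

    derive(image)           -> per-pixel-region scores        (step S100)
    extract(image, scores)  -> list of line image areas       (step S102)
    recognize(line_area)    -> recognition result for a line  (step S104)
    All three callables are hypothetical placeholders supplied by the caller.
    """
    scores = derive(image)                # step S100: derive score 61
    line_areas = extract(image, scores)   # step S102: extract line image areas 60
    # Step S104: recognize the characters in each extracted line image area.
    return [recognize(area) for area in line_areas]
```

Passing the stages in as callables keeps the sketch independent of any particular network or recognizer; any known character recognition method could be supplied as `recognize`.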

以上説明したように、本実施形態の文字列抽出装置１０は、導出部２２と、抽出部２４と、を備える。導出部２２は、ＮＮＷ２３を用いて、文字を記入された記録媒体の画像５０から、画素領域ごとに、文字列中心領域らしさ６２、境界領域らしさ６４、および背景領域らしさ６６を導出する。抽出部２４は、文字列中心領域らしさ６２、境界領域らしさ６４、および背景領域らしさ６６に基づいて、画像５０に含まれる行ごとの文字列５２の行画像領域６０を抽出する。 As described above, the character string extraction device 10 of this embodiment includes the derivation unit 22 and the extraction unit 24 . Using the NNW 23, the derivation unit 22 derives a character string central region likelihood 62, a boundary region likelihood 64, and a background region likelihood 66 for each pixel region from the image 50 of the recording medium on which characters are written. The extraction unit 24 extracts a line image area 60 of the character string 52 for each line included in the image 50 based on the character string center area likeness 62 , the boundary area likeness 64 , and the background area likeness 66 .

ここで、従来技術では、複数の文字列５２の行が接近または重複して記入されている場合、複数の文字列５２の行を同一の行の行画像領域として誤特定する場合があった。 Here, in the prior art, when lines of a plurality of character strings 52 are written close to each other or overlapped, the lines of a plurality of character strings 52 may be erroneously specified as the line image area of the same line.

図９Ａは、従来の行画像領域の特定の一例の説明図である。例えば、非特許文献１に開示されている方法では、複数の文字列５２の行が接近または重複して記入されている場合、重複または接近する領域Ｑを背景として分類することが困難である。このため、従来技術では、図９Ａに示すように、異なる行の文字列５２である文字列５２Ａと文字列５２Ｂとを、同一の行の行画像領域６００として誤特定する場合があった。すなわち、従来技術では、画像５０から行ごとの文字列５２の行画像領域６０を高精度に抽出することは困難であった。 FIG. 9A is an explanatory diagram of an example of conventional line image area identification. For example, in the method disclosed in Non-Patent Document 1, when lines of a plurality of character strings 52 are written closely or overlappingly, it is difficult to classify overlapping or close areas Q as background. Therefore, in the prior art, as shown in FIG. 9A, character strings 52A and 52B, which are character strings 52 on different lines, may be erroneously specified as the line image area 600 on the same line. That is, in the conventional technique, it is difficult to extract the line image area 60 of the character string 52 for each line from the image 50 with high precision.

一方、本実施形態の文字列抽出装置１０では、抽出部２４が、文字列中心領域らしさ６２、境界領域らしさ６４、および背景領域らしさ６６に基づいて、画像５０に含まれる行ごとの文字列５２の行画像領域６０を抽出する。すなわち、本実施形態の文字列抽出装置１０では、文字列中心領域らしさ６２および背景領域らしさ６６のみではなく、境界領域らしさ６４を更に用いて、行画像領域６０を抽出する。 On the other hand, in the character string extraction device 10 of the present embodiment, the extraction unit 24 extracts the line image area 60 of the character string 52 for each line included in the image 50 based on the character string central area likeness 62, the boundary area likeness 64, and the background area likeness 66. That is, the character string extraction device 10 of the present embodiment extracts the line image area 60 by using not only the character string central area likeness 62 and the background area likeness 66 but also the boundary area likeness 64.

このため、図９Ｂに示すように、本実施形態の文字列抽出装置１０では、異なる行の文字列５２である文字列５２Ａと文字列５２Ｂとを、別の行の行画像領域６０Ａおよび行画像領域６０Ｂの各々として抽出することができる。 For this reason, as shown in FIG. 9B, the character string extraction device 10 of the present embodiment can extract the character string 52A and the character string 52B, which are character strings 52 on different lines, as the line image area 60A and the line image area 60B of separate lines, respectively.

すなわち、本実施形態の文字列抽出装置１０では、文字列中心領域らしさ６２および背景領域らしさ６６に加えて、境界領域らしさ６４を更に用いることで、画素領域ごとに算出される文字列中心領域８０の尤度を高精度に算出することができる。そして、本実施形態の文字列抽出装置１０は、算出した尤度に基づいて特定した文字列中心領域８０を用いることで、高精度に行画像領域６０を抽出することができる。 That is, by using the boundary area likeness 64 in addition to the character string central area likeness 62 and the background area likeness 66, the character string extraction device 10 of the present embodiment can calculate, with high accuracy, the likelihood of the character string central area 80 for each pixel area. The character string extraction device 10 of the present embodiment can then extract the line image area 60 with high accuracy by using the character string central area 80 identified based on the calculated likelihood.

従って、本実施形態の文字列抽出装置１０は、行ごとの文字列５２の行画像領域６０を高精度に抽出することができる。 Therefore, the character string extracting device 10 of the present embodiment can extract the line image area 60 of the character string 52 for each line with high accuracy.
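As an illustration of why a boundary class helps keep adjacent lines apart, the following Python sketch extracts line areas from per-pixel scores. It is a simplified sketch under stated assumptions, not the extraction procedure of the embodiment: a pixel is treated as belonging to a character string central area when its central score is the largest of the three, central pixels are grouped by 4-connectivity, and each group's bounding box enlarged by `expand` pixels (standing in for the first number of pixels) is returned as a line image area. The function name `extract_line_areas` and all of these rules are hypothetical.

```python
from collections import deque


def extract_line_areas(scores, expand=1):
    """Return one bounding box (top, left, bottom, right), inclusive, per line.

    scores[y][x] is a tuple (central, boundary, background) of likeness values.
    Assumptions: argmax classification, 4-connected grouping, and bounding
    boxes enlarged by `expand` pixels and clipped to the image.
    """
    height, width = len(scores), len(scores[0])
    # A pixel is "central" when the central likeness beats both other classes.
    is_central = [[s[0] > s[1] and s[0] > s[2] for s in row] for row in scores]
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not is_central[y][x] or seen[y][x]:
                continue
            # Breadth-first search over one connected central area.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and is_central[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((max(top - expand, 0), max(left - expand, 0),
                          min(bottom + expand, height - 1),
                          min(right + expand, width - 1)))
    return boxes
```

In this sketch, two central areas separated only by a row of boundary-class pixels yield two boxes, whereas the same two areas would merge into one box if the separating pixels were also classified as central.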

また、本実施形態の文字列抽出装置１０では、文字列中心領域８０は、文字列領域８６内の所定位置に向かって文字列領域８６を第２画素数縮小した領域である。 Further, in the character string extracting device 10 of the present embodiment, the character string center region 80 is a region obtained by reducing the character string region 86 by the second number of pixels toward a predetermined position within the character string region 86 .

図１０Ａは、文字列領域８６を所定の縮小比率で縮小した場合の説明図である。図１０Ａに示すように、文字列領域８６を所定の縮小比率で縮小した領域を文字列中心領域８０とした場合、文字列中心領域８０の一部が消失する場合がある。 FIG. 10A is an explanatory diagram when the character string area 86 is reduced at a predetermined reduction ratio. As shown in FIG. 10A, when a character string central region 80 is formed by reducing a character string region 86 at a predetermined reduction ratio, part of the character string central region 80 may disappear.

図１０Ｂは、文字列領域８６を第２画素数縮小した場合の説明図である。図１０Ｂに示すように、文字列領域８６を第２画素数縮小した領域を文字列中心領域８０とすると、文字列中心領域８０の一部が消失することを抑制することができる。 FIG. 10B is an explanatory diagram when the character string area 86 is reduced by the second number of pixels. As shown in FIG. 10B , if the character string region 86 is reduced by the second number of pixels to be the character string central region 80 , it is possible to prevent part of the character string central region 80 from disappearing.

このため、本実施形態の文字列抽出装置１０は、上記効果に加えて、文字列領域８６が複雑な形状である場合であっても、高精度に行画像領域６０を抽出することができる。 Therefore, in addition to the effects described above, the character string extraction device 10 of the present embodiment can extract the line image region 60 with high accuracy even when the character string region 86 has a complicated shape.

また、本実施形態の文字列抽出装置１０では、高精度に抽出された行画像領域６０の文字認識が行われることで、上記効果に加えて、画像５０に含まれる文字を高精度に認識することができる。 Further, in the character string extraction device 10 of the present embodiment, character recognition of the line image region 60 extracted with high accuracy is performed, so that in addition to the above effects, characters included in the image 50 can be recognized with high accuracy. be able to.

次に、本実施形態の文字列抽出装置１０のハードウェア構成を説明する。 Next, the hardware configuration of the character string extraction device 10 of this embodiment will be described.

図１１は、本実施形態の文字列抽出装置１０の一例のハードウェア構成図である。 FIG. 11 is a hardware configuration diagram of an example of the character string extraction device 10 of this embodiment.

本実施形態の文字列抽出装置１０は、ＣＰＵ９１などの制御装置と、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）９２やＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）９３などの記憶装置と、ネットワークに接続して通信を行う通信Ｉ／Ｆ９４と、各部を接続するバス９５と、を備える。 The character string extraction device 10 of the present embodiment includes a control device such as a CPU 91, storage devices such as a ROM (Read Only Memory) 92 and a RAM (Random Access Memory) 93, a communication I/F 94 that connects to a network and performs communication, and a bus 95 that connects these units.

本実施形態の文字列抽出装置１０で実行されるプログラムは、ＲＯＭ９２等に予め組み込まれて提供される。 A program executed by the character string extracting device 10 of the present embodiment is preinstalled in the ROM 92 or the like and provided.

本実施形態の文字列抽出装置１０で実行されるプログラムは、インストール可能な形式又は実行可能な形式のファイルでＣＤ－ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｋＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、フレキシブルディスク（ＦＤ）、ＣＤ－Ｒ（ＣｏｍｐａｃｔＤｉｓｋＲｅｃｏｒｄａｂｌｅ）、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｋ）等のコンピュータで読み取り可能な記録媒体に記録してコンピュータプログラムプロダクトとして提供されるように構成してもよい。 The program to be executed by the character string extraction device 10 of the present embodiment can be stored as an installable or executable file on a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), DVD (Digital Versatile Disk), or other computer-readable recording medium, and provided as a computer program product.

さらに、本実施形態の文字列抽出装置１０で実行されるプログラムを、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成してもよい。また、本実施形態の文字列抽出装置１０で実行されるプログラムをインターネット等のネットワーク経由で提供または配布するように構成してもよい。 Furthermore, the program executed by the character string extraction device 10 of this embodiment may be stored on a computer connected to a network such as the Internet, and provided by being downloaded via the network. Also, the program executed by the character string extracting device 10 of this embodiment may be provided or distributed via a network such as the Internet.

本実施形態の文字列抽出装置１０で実行されるプログラムは、コンピュータを、本実施形態の文字列抽出装置１０の各部として機能させうる。このコンピュータは、ＣＰＵ９１がコンピュータ読取可能な記憶媒体からプログラムを主記憶装置上に読み出して実行することができる。 A program executed by the character string extraction device 10 of this embodiment can cause a computer to function as each part of the character string extraction device 10 of this embodiment. In this computer, the CPU 91 can read a program from a computer-readable storage medium onto the main storage device and execute it.

上記には、本発明の実施形態を説明したが、本実施形態は、例として提示したものであり、発明の範囲を限定することは意図していない。上記新規な実施形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。本実施形態やその変形は、発明の範囲や要旨に含まれるとともに、特許請求の範囲に記載された発明とその均等の範囲に含まれる。 Although embodiments of the present invention have been described above, the embodiments are presented as examples and are not intended to limit the scope of the invention. The novel embodiments described above can be embodied in various other forms, and various omissions, replacements, and modifications can be made without departing from the scope of the invention. This embodiment and its modifications are included in the scope and gist of the invention, and are included in the scope of the invention described in the claims and equivalents thereof.

１０文字列抽出装置 10 Character string extraction device
２２導出部 22 Derivation unit
２３ＮＮＷ 23 NNW
２４抽出部 24 Extraction unit

実施形態の文字列抽出装置は、導出部と、抽出部と、を備える。導出部は、ニューラルネットワークを用いて、文字を記入された記録媒体の画像の画素領域ごとに、文字列領域内の文字列中心領域らしさ、前記文字列領域と文字列中心領域との間の境界領域らしさ、および、背景領域らしさ、を導出する。抽出部は、前記文字列中心領域らしさ、前記境界領域らしさ、および前記背景領域らしさに基づいて、前記画像に含まれる行ごとの文字列の行画像領域を抽出する。前記境界領域らしさは、境界領域である度合いを示し、前記境界領域は、前記文字列領域と前記文字列中心領域との間の全領域である。 A character string extraction device according to an embodiment includes a derivation unit and an extraction unit. The deriving unit uses a neural network to determine, for each pixel area of an image of a recording medium in which characters are written, the likeness of a character string central area within a character string area, the boundary between the character string area and the character string central area. Region-likeness and background-likeness are derived. The extraction unit extracts a line image area of the character string for each line included in the image based on the likeness of the character string center area, the likeness of the boundary area, and the likeness of the background area. The boundary area-likeness indicates the degree of being a boundary area, and the boundary area is the entire area between the character string area and the character string center area.

Claims

A character string extraction device comprising:
a derivation unit that uses a neural network to derive, for each pixel area of an image of a recording medium on which characters are written, a character string central area likeness within a character string area, a boundary area likeness of a boundary area between the character string area and the character string central area, and a background area likeness; and
an extraction unit that extracts a line image area of a character string for each line included in the image based on the character string central area likeness, the boundary area likeness, and the background area likeness.

The character string extraction device according to claim 1, wherein the extraction unit
identifies the character string central area included in the image based on a likelihood of the character string central area calculated for each pixel area using the character string central area likeness, the boundary area likeness, and the background area likeness, and
extracts, as the line image area, the character string central area or an area obtained by enlarging the character string central area by a first number of pixels.

The character string extraction device according to claim 1 or 2, wherein the character string central area is
an area obtained by reducing the character string area by a second number of pixels toward a predetermined position within the character string area.

The character string extraction device according to any one of claims 1 to 3, wherein the neural network is
trained in advance so as to derive the character string central area likeness, the boundary area likeness, and the background area likeness such that the character string central areas of character strings on different lines are not connected.

The character string extraction device according to any one of claims 1 to 4, wherein the neural network is
trained in advance so as to minimize a loss function in which the weighting factor of the loss of the boundary area likeness is greater than the weighting factors of the losses of the character string central area likeness and the background area likeness.

The character string extraction device according to any one of claims 1 to 5, wherein the derivation unit
derives, for each pixel area of the image,
one or more types of the character string central area likeness, one or more types of the boundary area likeness, and one or more types of the background area likeness.

A character string extraction method comprising:
deriving, using a neural network, for each pixel area of an image of a recording medium on which characters are written, a character string central area likeness within a character string area, a boundary area likeness of a boundary area between the character string area and the character string central area, and a background area likeness; and
extracting a line image area of a character string for each line included in the image based on the character string central area likeness, the boundary area likeness, and the background area likeness.

A character string extraction program for causing a computer to execute:
deriving, using a neural network, for each pixel area of an image of a recording medium on which characters are written, a character string central area likeness within a character string area, a boundary area likeness of a boundary area between the character string area and the character string central area, and a background area likeness; and
extracting a line image area of a character string for each line included in the image based on the character string central area likeness, the boundary area likeness, and the background area likeness.