JP7417116B2 - Information processing system, information processing method, program

Info

Publication number: JP7417116B2
Application number: JP2021090955A
Authority: JP
Inventors: 唯仁八尾
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2020-12-28
Filing date: 2021-05-31
Publication date: 2024-01-18
Anticipated expiration: 2041-05-31
Also published as: JP2022104498A

Description

本発明は、情報処理システム、情報処理方法、プログラムに関する。 The present invention relates to an information processing system, an information processing method, and a program.

印刷された帳票から情報を読み取ってシステムに入力する業務を補助するものとしてＯＣＲ（光学文字認識）が存在する。ＯＣＲでは文字を認識する前に、帳票内に文字が印刷された領域を検出する文字検出という処理が存在する。 OCR (optical character recognition) exists as a tool that assists in the task of reading information from printed forms and inputting it into a system. In OCR, before recognizing characters, there is a process called character detection that detects areas in which characters are printed within a form.

ＯＣＲで読み取り対象とされる文書は刊行物、ビジネス文書など多岐にわたり、用途によってＯＣＲに対する要求にも差がある。この中でも、帳票をＯＣＲする際の文字検出においては、「短い文字列でも見落とさないこと」、「文字間隔が開いた見出しを１つの文字列として認識できること」、「互いに無関係な文字列同士が結合されないこと」といった点が要求される。 Documents read by OCR range widely, from publications to business documents, and the requirements placed on OCR differ by use. In particular, character detection for OCR of forms is required not to overlook even short character strings, to recognize a heading with widely spaced characters as a single character string, and not to merge unrelated character strings with each other.

Non-Patent Document 1: Hybrid Page Layout Analysis via Tab-Stop Detection, https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/35094.pdf

非特許文献１において、雑誌や新聞、論文などの段組みの文章に対するＯＣＲ技術について記載されている。 Non-Patent Document 1 describes an OCR technique for text in columns such as magazines, newspapers, and papers.

これに対して、請求書や領収書といった帳票は、段組みの文章として扱いＯＣＲ処理を実行してしまうと、互いに関係ない近接文字列同士を１つの段落として結合してしまうという課題が生じてしまう。 In contrast, if a form such as an invoice or a receipt is treated as multi-column text and OCR processing is executed on it, adjacent character strings that are unrelated to each other end up being merged into a single paragraph.

非特許文献１以外でも、機械学習による物体検出手法を応用した文字検出手法が提案されているが、物体検出ベースの手法は帳票中の見出しや値のような単一行の短い文字列（ただし帳票内では重要な意味を持つ文字列）を見逃す傾向があり、前述の帳票ＯＣＲに対する要求を満たさない。 Besides Non-Patent Document 1, character detection methods that apply machine-learning object detection have been proposed. However, object-detection-based methods tend to miss short single-line character strings such as headings and values in a form (strings that nevertheless carry important meaning within the form), and therefore do not satisfy the requirements for form OCR described above.

そこで本発明は、より適切な文字認識結果が得られる技術を提供することを目的とする。 Therefore, an object of the present invention is to provide a technique that can obtain more appropriate character recognition results.

本発明の情報処理システムは、文字認識の対象の画像から、連続して存在する画素を取得する連続画素取得手段と、前記連続画素取得手段により取得された画素に基づき、文字領域を推定する推定手段と、前記推定手段により推定された文字領域の単位で文字認識処理を実行する文字認識手段と、を備えることを特徴とする。 The information processing system of the present invention comprises: continuous pixel acquisition means for acquiring continuously existing pixels from an image subject to character recognition; estimation means for estimating a character area based on the pixels acquired by the continuous pixel acquisition means; and character recognition means for executing character recognition processing in units of the character areas estimated by the estimation means.

本発明によれば、より適切な文字認識結果を得ることが可能となる。 According to the present invention, it is possible to obtain more appropriate character recognition results.

本発明の実施形態における、表抽出システムのシステム構成の一例を示す図である。 FIG. 1 is a diagram showing an example of a system configuration of a table extraction system in an embodiment of the present invention.
本発明の実施形態における、ＰＣのハードウェア構成の一例を示すブロック図である。 FIG. 2 is a block diagram showing an example of the hardware configuration of a PC in an embodiment of the present invention.
本発明の実施形態における、機能構成の一例を示す図である。 FIG. 3 is a diagram showing an example of a functional configuration in an embodiment of the present invention.
本発明の実施形態における、画像前処理部の処理結果の一例を示す図である。 FIG. 4 is a diagram showing an example of a processing result of the image preprocessing unit in an embodiment of the present invention.
本発明の実施形態における、連続画素検出部の検出結果の一例を示す図である。 FIG. 5 is a diagram showing an example of a detection result of the continuous pixel detection unit in an embodiment of the present invention.
本発明の実施形態における、画像片分類部の処理結果の一例を示す図である。 FIG. 6 is a diagram showing an example of a processing result of the image piece classification unit in an embodiment of the present invention.
本発明の実施形態における、文字領域推定部の処理の流れを示すフローチャートである。 FIG. 7 is a flowchart showing the flow of processing of the character area estimation unit in an embodiment of the present invention.
本発明の実施形態における、文字と分類された画像片の座標をプロットしたワーク画像の一例を示す図である。 FIG. 8 is a diagram showing an example of a work image on which the coordinates of image pieces classified as characters are plotted in an embodiment of the present invention.
本発明の実施形態における、文字のまとまりを推定する処理の一例を示す図である。 FIG. 9 is a diagram showing an example of processing for estimating groups of characters in an embodiment of the present invention.
本発明の実施形態における、孤立文字の結合処理の一例を示す図である。 FIG. 10 is a diagram showing an example of processing for combining isolated characters in an embodiment of the present invention.
本発明の実施形態における、文字検出結果の一例を示す図である。 FIG. 11 is a diagram showing an example of a character detection result in an embodiment of the present invention.
本発明の実施形態における、出力結果の一例を示す図である。 FIG. 12 is a diagram showing an example of an output result in an embodiment of the present invention.
本発明の実施形態における、セル分割部の処理の一例を示す図である。 FIG. 13 is a diagram showing an example of processing of the cell division unit in an embodiment of the present invention.
本発明の実施形態における、文字列画像の活字、手書き分類の処理の一例を示す図である。 FIG. 14 is a diagram showing an example of processing for classifying character string images as printed or handwritten in an embodiment of the present invention.

以下、図面を参照して、本発明の実施形態を詳細に説明する。 Embodiments of the present invention will be described in detail below with reference to the drawings.

図１は、本発明の実施形態における文字認識システムのシステム構成の一例を示す図である。 FIG. 1 is a diagram showing an example of the system configuration of a character recognition system according to an embodiment of the present invention.

文字認識の主要な処理を行うためのクライアントＰＣ１０１および、帳票をスキャンして画像ファイル化するスキャナ１０２が通信経路１００を介して接続される構成となっている。 A client PC 101 for performing main processing of character recognition and a scanner 102 for scanning a form and converting it into an image file are connected via a communication path 100.

通信経路１００はスキャナ１０２の有する物理インターフェースに応じて、有線ＬＡＮ，無線ＬＡＮ，ＵＳＢなどの形態をとることができる。 The communication path 100 can take the form of a wired LAN, wireless LAN, USB, etc. depending on the physical interface that the scanner 102 has.

通信経路１００上にはファイルサーバー１０３を置いてもよい。スキャナ１０２でスキャンした画像をクライアントＰＣ１０１に取り込む方法として、スキャナ１０２からクライアントＰＣ１０１に直接画像を送信する方法、スキャナ１０２で取り込んだ画像ファイルをいったんファイルサーバー１０３に保管し、クライアントＰＣ１０１がファイルサーバー１０３から画像ファイルを取り出す方法などがあるが、いずれの方法であっても良い。 A file server 103 may be placed on the communication path 100. As a method of importing an image scanned by the scanner 102 to the client PC 101, there is a method in which the image is directly sent from the scanner 102 to the client PC 101, and a method in which the image file imported by the scanner 102 is temporarily stored in the file server 103, and then the client PC 101 is sent from the file server 103. There are methods for extracting image files, and any method may be used.

図２は、本発明のクライアントＰＣ１０１、スキャナ１０２、ファイルサーバー１０３に適用可能な情報処理装置のハードウェア構成の一例を示すブロック図である。 FIG. 2 is a block diagram showing an example of the hardware configuration of an information processing apparatus applicable to the client PC 101, scanner 102, and file server 103 of the present invention.

図２に示すように、情報処理装置は、システムバス２００を介してＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）２０１、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）２０２、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）２０３、記憶装置２０４、入力コントローラ２０５、音声コントローラ２０６、ビデオコントローラ２０７、メモリコントローラ２０８、および通信Ｉ／Ｆコントローラ２０９が接続される。 As shown in FIG. 2, in the information processing apparatus, a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a storage device 204, an input controller 205, an audio controller 206, a video controller 207, a memory controller 208, and a communication I/F controller 209 are connected via a system bus 200.

ＣＰＵ２０１は、システムバス２００に接続される各デバイスやコントローラを統括的に制御する。 The CPU 201 centrally controls each device and controller connected to the system bus 200.

ＲＯＭ２０２あるいは外部メモリ２１３は、ＣＰＵ２０１が実行する制御プログラムであるＢＩＯＳ（ＢａｓｉｃＩｎｐｕｔ／ＯｕｔｐｕｔＳｙｓｔｅｍ）やＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）や、本情報処理方法を実現するためのコンピュータ読み取り実行可能なプログラムおよび必要な各種データ（データテーブルを含む）を保持している。 The ROM 202 or the external memory 213 holds the BIOS (Basic Input/Output System) and the OS (Operating System), which are control programs executed by the CPU 201, as well as the computer-readable and executable programs for realizing this information processing method and the various data they require (including data tables).

ＲＡＭ２０３は、ＣＰＵ２０１の主メモリ、ワークエリア等として機能する。ＣＰＵ２０１は、処理の実行に際して必要なプログラム等をＲＯＭ２０２あるいは外部メモリ２１３からＲＡＭ２０３にロードし、ロードしたプログラムを実行することで各種動作を実現する。 The RAM 203 functions as the main memory, work area, etc. of the CPU 201. The CPU 201 loads necessary programs and the like from the ROM 202 or the external memory 213 into the RAM 203 when executing processing, and executes the loaded programs to realize various operations.

入力コントローラ２０５は、キーボード２１０や不図示のマウス等のポインティングデバイス等の入力装置からの入力を制御する。入力装置がタッチパネルの場合、ユーザがタッチパネルに表示されたアイコンやカーソルやボタンに合わせて押下（指等でタッチ）することにより、各種の指示を行うことができることとする。 The input controller 205 controls input from input devices such as a keyboard 210 and a pointing device such as a mouse (not shown). If the input device is a touch panel, the user can issue various instructions by pressing (touching with a finger or the like) an icon, cursor, or button displayed on the touch panel.

また、タッチパネルは、マルチタッチスクリーンなどの、複数の指でタッチされた位置を検出することが可能なタッチパネルであってもよい。 Further, the touch panel may be a touch panel capable of detecting positions touched by multiple fingers, such as a multi-touch screen.

ビデオコントローラ２０７は、ディスプレイ２１２などの外部出力装置への表示を制御する。ディスプレイは本体と一体になったノート型パソコンのディスプレイも含まれるものとする。なお、外部出力装置はディスプレイに限ったものではなく、例えばプロジェクタであってもよい。また、前述のタッチ操作を受け付け可能な装置については、入力装置も提供する。 The video controller 207 controls display on an external output device such as the display 212. The display includes the display of a notebook computer that is integrated with the main body. Note that the external output device is not limited to a display and may be, for example, a projector. A device capable of accepting the touch operations described above also serves as an input device.

なおビデオコントローラ２０７は、表示制御を行うためのビデオメモリ（ＶＲＡＭ）を制御することが可能で、ビデオメモリ領域としてＲＡＭ２０３の一部を利用することもできるし、別途専用のビデオメモリを設けることも可能である。 Note that the video controller 207 can control a video memory (VRAM) for display control, and can use a part of the RAM 203 as a video memory area, or can provide a separate dedicated video memory. It is possible.

メモリコントローラ２０８は、外部メモリ２１３へのアクセスを制御する。外部メモリとしては、ブートプログラム、各種アプリケーション、フォントデータ、ユーザファイル、編集ファイル、および各種データ等を記憶する外部記憶装置（ハードディスク）、フレキシブルディスク（ＦＤ）、或いはＰＣＭＣＩＡカードスロットにアダプタを介して接続されるコンパクトフラッシュ（登録商標）メモリ等を利用可能である。 The memory controller 208 controls access to the external memory 213. As the external memory, an external storage device (hard disk) that stores a boot program, various applications, font data, user files, editing files, various data, and the like, a flexible disk (FD), a CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter, or the like can be used.

通信Ｉ／Ｆコントローラ２０９は、ネットワークを介して外部機器と接続・通信するものであり、ネットワークでの通信制御処理を実行する。例えば、ＴＣＰ／ＩＰを用いた通信やＩＳＤＮなどの電話回線、および携帯電話の４Ｇ回線、５Ｇ回線等を用いた通信が可能である。 The communication I/F controller 209 connects to and communicates with external devices via a network, and executes communication control processing on the network. For example, communication using TCP/IP, and communication over telephone lines such as ISDN or over mobile phone 4G and 5G lines, is possible.

尚、ＣＰＵ２０１は、例えばＲＡＭ２０３内の表示情報用領域へアウトラインフォントの展開（ラスタライズ）処理を実行することにより、ディスプレイ２１２上での表示を可能としている。また、ＣＰＵ２０１は、ディスプレイ２１２上の不図示のマウスカーソル等でのユーザ指示を可能とする。 Note that the CPU 201 enables display on the display 212 by, for example, executing an outline font development (rasterization) process in a display information area in the RAM 203. Further, the CPU 201 allows the user to give instructions using a mouse cursor (not shown) on the display 212.

図３は、クライアントＰＣ１０１の機能構成の一例を示す図である。 FIG. 3 is a diagram showing an example of the functional configuration of the client PC 101.

入力受付部２５１は、スキャナ１０２やファイルサーバー１０３を介して画像の入力を受け付ける。 The input receiving unit 251 receives image input via the scanner 102 or the file server 103.

画像前処理部２５２は、入力受付部２５１で受け付けた入力画像のノイズ除去や二値化処理を行う。 The image preprocessing unit 252 performs noise removal and binarization processing on the input image received by the input receiving unit 251.

連続画素検出部２５３は、画像前処理部２５２による処理で得られた二値化画像の中から連続した画素（隣り合った画素）を検出し、画像片として切り出す。なお、完全に隣り合っておらず、数画素離れている程度で、かすれに起因するものであると判断できる部分は同一の画像片として切り出してもよい。すなわち、連続画素検出部２５３は、ひと続き（一連）の繋がった線であると判定（評価）される部分を１つの画像片として切り出す。 The continuous pixel detection unit 253 detects continuous pixels (adjacent pixels) from the binarized image obtained by the processing by the image preprocessing unit 252, and cuts them out as image pieces. Note that portions that are not completely adjacent to each other and are separated by a few pixels and can be determined to be caused by blur may be cut out as the same image piece. That is, the continuous pixel detection unit 253 cuts out a portion that is determined (evaluated) to be a continuous (series) of connected lines as one image piece.

画像片分類部２５４は、連続画素検出部２５３により切り出された画像片が、文字由来のものかそれ以外のものかを判定し分類する機能を持つ。分類の際には、分類モデル２５５に格納されたパラメータを用いる。 The image piece classification unit 254 has a function of determining and classifying whether the image piece cut out by the continuous pixel detection unit 253 is derived from characters or something else. During classification, parameters stored in the classification model 255 are used.

文字領域推定部２５６は、画像片分類部２５４によって文字由来と分類された画像片の領域情報から、帳票上の文字列の塊の領域を推定する。 The character area estimating unit 256 estimates the area of the chunk of character strings on the form from the area information of the image piece classified as being of character origin by the image piece classifying unit 254.

セル分割部２５７は、帳票上の表やセルの座標情報が与えられている場合に、セルの座標情報を用いて文字列領域を分割する。 The cell dividing unit 257 divides the character string area using the cell coordinate information when coordinate information of a table or a cell on a form is given.

活字手書き判定部２６０は、文字領域内に書かれている文字が活字（第１種別の文字）であるか手書き文字（第２種別の文字）であるかを分類（特定）する機能を持つ。分類の際には、分類モデル２６１に格納されたパラメータを用いる。 The printed and handwritten character determination unit 260 has a function of classifying (specifying) whether the characters written in the character area are printed characters (first type of characters) or handwritten characters (second type of characters). During classification, parameters stored in the classification model 261 are used.

文字認識部２５８は、文字領域推定部２５６により推定された各文字領域に対して、ＯＣＲ処理（文字認識処理）を行い、当該領域に書かれた文字を認識する。 The character recognition unit 258 performs OCR processing (character recognition processing) on each character area estimated by the character area estimation unit 256, and recognizes the characters written in the area.

結果出力部２５９は、検出された文字領域とそこに書かれた文字をセットにしてファイルとして出力する。 The result output unit 259 outputs a set of the detected character area and the characters written therein as a file.

図４は、画像前処理部２５２による処理の一例を示す図である。 FIG. 4 is a diagram showing an example of processing by the image preprocessing section 252.

入力画像４０１は、スキャナ１０２などを通して取り込まれた画像である。 An input image 401 is an image captured through a scanner 102 or the like.

画像前処理部２５２は入力画像４０１を二値化、白黒反転したのち、ノイズ除去の処理を行い、前処理画像４０２を生成する。 The image preprocessing unit 252 binarizes and inverts the input image 401 into black and white, and then performs noise removal processing to generate a preprocessed image 402.
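
The preprocessing order described here (binarize, invert, then remove noise) can be sketched in plain Python as follows. This is only an illustration under assumptions that are not in the source: the list-of-rows grayscale representation, the fixed threshold of 128, the function name `preprocess`, and the choice of "drop white pixels that have no white neighbour" as the noise filter are all hypothetical.

```python
def preprocess(gray, threshold=128):
    """Binarize and invert a grayscale image, then drop isolated pixels.

    gray: list of rows of 0-255 intensities (hypothetical representation).
    Returns rows of 0/1 where 1 (white) marks ink and 0 (black) marks paper.
    """
    height, width = len(gray), len(gray[0])
    # Binarize and invert in one pass: dark ink -> 1, light paper -> 0.
    binary = [[1 if gray[y][x] < threshold else 0 for x in range(width)]
              for y in range(height)]

    def has_white_neighbour(y, x):
        # Look at the 8 surrounding pixels, staying inside the image.
        for ny in range(max(0, y - 1), min(height, y + 2)):
            for nx in range(max(0, x - 1), min(width, x + 2)):
                if (ny, nx) != (y, x) and binary[ny][nx]:
                    return True
        return False

    # Assumed noise filter: a white pixel with no white neighbour is noise.
    return [[1 if binary[y][x] and has_white_neighbour(y, x) else 0
             for x in range(width)] for y in range(height)]


page = [
    [255, 255, 255, 255],
    [255,  10,  20, 255],   # two adjacent ink pixels: kept
    [255, 255, 255, 255],
    [  0, 255, 255, 255],   # one isolated dark pixel: treated as noise
]
cleaned = preprocess(page)
```

A production pipeline would more likely use an adaptive threshold and a morphological filter from an image library; the point here is only the order of the three operations.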

図５は、連続画素検出部２５３の検出結果の一例を示す図である。 FIG. 5 is a diagram showing an example of the detection results of the continuous pixel detection section 253.

連続画素検出部２５３は、前処理画像４０２から、白い画素が連続した領域を検出し、画像片として切り出す。画像片は文字の偏、旁、ロゴ、罫線、印鑑の断片などからなる。後述の分類器の精度向上のため、画像片は連続画素の周辺領域をある程度含めて切り出すようにする。 The continuous pixel detection unit 253 detects areas of continuous white pixels in the preprocessed image 402 and cuts them out as image pieces. The image pieces consist of left-hand radicals (hen) and right-hand components (tsukuri) of characters, logos, ruled lines, fragments of seal impressions, and the like. To improve the accuracy of the classifier described later, each image piece is cut out so as to include some of the area surrounding the continuous pixels.

画像片５０１－５０６は連続画素検出部２５３によって切り出された画像片の例である。 Image pieces 501 to 506 are examples of image pieces cut out by the continuous pixel detection unit 253.
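
As a rough illustration of how image pieces such as 501 to 506 might be obtained, the sketch below labels 8-connected runs of white pixels with a breadth-first search and returns one padded bounding box per run. The function name `cut_image_pieces`, the `margin` parameter, and the inclusive `(x0, y0, x1, y1)` box convention are assumptions made for this example; the tolerance for small gaps caused by faint printing that the description mentions is not modelled.

```python
from collections import deque


def cut_image_pieces(binary, margin=1):
    """Return padded bounding boxes (x0, y0, x1, y1), inclusive, one per
    8-connected run of white (1) pixels, in row-major discovery order."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first search over the component that starts here.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny in range(max(0, cy - 1), min(height, cy + 2)):
                    for nx in range(max(0, cx - 1), min(width, cx + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Pad the box so the piece keeps some surrounding context,
            # clipped to the image bounds.
            boxes.append((max(0, x0 - margin), max(0, y0 - margin),
                          min(width - 1, x1 + margin),
                          min(height - 1, y1 + margin)))
    return boxes


image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
pieces = cut_image_pieces(image, margin=1)
```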

図６は、画像片分類部２５４による処理結果の一例を示す図である。 FIG. 6 is a diagram illustrating an example of a processing result by the image segment classification unit 254.

画像片５０１－５０６はその由来が文字であるか否かによって、文字と非文字に分類される。図６の例では、５０１、５０２、５０５が文字に分類され、５０３、５０４、５０６が非文字に分類されている。非文字として分類されたものは、ロゴや印鑑や罫線などである。 Image pieces 501-506 are classified into text and non-text depending on whether or not their origin is text. In the example of FIG. 6, 501, 502, and 505 are classified as characters, and 503, 504, and 506 are classified as non-characters. Items classified as non-text include logos, seals, and ruled lines.

文字、非文字の分類の手がかりとしては、分類モデル２５５が使われる。分類モデル２５５は機械学習によって文字、非文字の特徴を記憶した学習モデルである。機械学習による画像の分類モデルとしてはＶＧＧ、ＲｅｓＮｅｔ等が知られている。 A classification model 255 is used as a clue for classifying characters and non-characters. The classification model 255 is a learning model that stores character and non-character features through machine learning. VGG, ResNet, and the like are known as image classification models based on machine learning.

図７は、文字領域推定部２５６の処理の流れを示すフローチャートである。 FIG. 7 is a flowchart showing the process flow of the character area estimation unit 256.

ステップＳ７０１では、文字領域の推定に使用するワーク画像８０１を生成する。ワーク画像８０１は前処理画像４０２と同サイズで画像全体が黒で塗りつぶされている画像である。 In step S701, a work image 801 used for character area estimation is generated. The work image 801 has the same size as the preprocessed image 402, and the entire image is filled with black.

ステップＳ７０２－Ｓ７０５では連続画素検出部２５３によって検出され、画像片分類部２５４によって分類された各画像片に対して処理を行う。 In steps S702 to S705, each image piece detected by the continuous pixel detection unit 253 and classified by the image piece classification unit 254 is processed.

ステップＳ７０３では、処理対象の画像片が文字として分類されているかどうかを参照し、文字と分類されていた場合は、処理をステップＳ７０４に移行する。 In step S703, it is checked whether the image piece to be processed is classified as a character. If it is classified as a character, the process moves to step S704.

文字として分類されていない場合は、次の画像片に対する処理に移行する。 If it is not classified as a character, the processing moves on to the next image fragment.

ステップＳ７０４では、ワーク画像８０１に、処理対象の画像片が検出された領域を描画する。 In step S704, an area where an image piece to be processed is detected is drawn on the workpiece image 801.

図８は、ステップＳ７０４で描画された画像片が検出された領域（矩形で示した領域）の一例である。前処理画像４０２から検出された連続画素のうち文字と分類されたもののバウンディングボックス（矩形領域）をワーク画像８０１上に白い領域（矩形）として描画している。描画された矩形の集合は矩形群８０２となる。ここでは文字以外の要素（下線、罫線など）に対応する矩形は描画されず、これにより帳票内から文字が書かれた領域だけを抽出するという目的を実現する。 FIG. 8 is an example of an area (area indicated by a rectangle) in which the image piece drawn in step S704 is detected. Bounding boxes (rectangular areas) of continuous pixels detected from the preprocessed image 402 that are classified as characters are drawn on the work image 801 as white areas (rectangles). A set of drawn rectangles becomes a rectangle group 802. Here, rectangles corresponding to elements other than characters (underlines, ruled lines, etc.) are not drawn, thereby achieving the purpose of extracting only the area where characters are written from within the form.
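
Steps S701 to S705 amount to painting only the character-classified boxes onto an all-black canvas. A minimal sketch follows, assuming each image piece is represented as a pair of an inclusive box `(x0, y0, x1, y1)` and a boolean classification result; that representation and the name `build_work_image` are hypothetical.

```python
def build_work_image(width, height, pieces):
    """Draw the boxes of character-classified pieces in white (1) on black (0)."""
    work = [[0] * width for _ in range(height)]      # S701: all-black work image
    for (x0, y0, x1, y1), is_character in pieces:    # S702-S705: loop over pieces
        if not is_character:                         # S703: skip non-character pieces
            continue
        for y in range(y0, y1 + 1):                  # S704: fill the bounding box
            for x in range(x0, x1 + 1):
                work[y][x] = 1
    return work


pieces = [
    ((0, 0, 1, 1), True),    # part of a character: drawn
    ((3, 0, 5, 0), False),   # ruled line: not drawn
]
work = build_work_image(6, 2, pieces)
```

Because non-character pieces are never drawn, ruled lines and underlines cannot bridge two strings in the later merging steps.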

ステップＳ７０６では、ステップＳ７０４で描画されたそれぞれの矩形の領域を拡張する。具体的には、ワーク画像８０１内の白い画素領域を拡張し矩形間の隙間を埋めて結合することにより、偏や旁などに分割された文字内の要素を文字列のレベルまでまとめていく。 In step S706, each rectangular area drawn in step S704 is expanded. Specifically, the white pixel areas in the work image 801 are expanded so that the gaps between rectangles are filled and the rectangles merge, which gathers the elements within a character that were split apart, such as left-hand radicals (hen) and right-hand components (tsukuri), up to the level of a character string.

矩形領域を拡張について、具体的には、例えば、あらかじめ決まった画素数分だけ各矩形を広げるという方法や、矩形のサイズに応じた割合（２０％など）で広げるといった方法がある。どちらの場合も、上下の行の文字列と結合されてしまうことを低減させるため、主に横方向に広げ、縦方向には少しだけ広げるのが望ましい。 Specifically, for expanding a rectangular area, for example, there is a method of expanding each rectangle by a predetermined number of pixels, or a method of expanding each rectangle by a proportion (such as 20%) depending on the size of the rectangle. In either case, in order to reduce the possibility of being combined with character strings in the upper and lower rows, it is desirable to spread it mainly in the horizontal direction and only slightly in the vertical direction.
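
The asymmetric expansion can be illustrated as below. This sketch combines the two options mentioned in the text in one hypothetical way: a proportional growth horizontally (20% of the box width, at least one pixel) and a small fixed growth vertically. The parameter names and default values are assumptions, not values taken from the source.

```python
def expand_box(box, image_width, image_height, x_ratio=0.2, y_pixels=1):
    """Grow an inclusive box (x0, y0, x1, y1) mostly sideways, clipped to the image."""
    x0, y0, x1, y1 = box
    box_width = x1 - x0 + 1
    dx = max(1, round(box_width * x_ratio))   # horizontal growth, proportional
    return (max(0, x0 - dx),
            max(0, y0 - y_pixels),            # vertical growth kept small
            min(image_width - 1, x1 + dx),
            min(image_height - 1, y1 + y_pixels))


expanded = expand_box((10, 10, 19, 19), image_width=100, image_height=50)
clipped = expand_box((0, 0, 4, 4), image_width=100, image_height=50)
```

Keeping `y_pixels` small relative to the line spacing is what limits merging with the lines above and below.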

ステップＳ７０７では、拡張された矩形群に対して再度連続画素のまとまりを抽出する。 In step S707, a group of continuous pixels is extracted again from the expanded rectangular group.

ステップＳ７０８では、ステップＳ７０７で抽出したまとまりを内包するバウンディングボックスでワーク画像８０１を塗りつぶす。 In step S708, the work image 801 is filled with a bounding box that includes the group extracted in step S707.

ステップＳ７０９では、ワーク画像内の孤立した矩形を連結する。 In step S709, isolated rectangles in the workpiece image are connected.

ステップＳ７１０では、ステップＳ７０９で連結した矩形を内包するバウンディングボックスでワーク画像８０１を塗りつぶす。 In step S710, the work image 801 is filled with a bounding box that includes the rectangles connected in step S709.

以上のように、文字の部品単位や文字単位で検出された領域を拡張し結合していくことで、文字列単位の領域を特定することが可能となる。 As described above, by expanding and combining areas detected in character parts or character units, it is possible to specify areas in character string units.

図９は、ステップＳ７０６、Ｓ７０７、Ｓ７０８によって文字のまとまりを推定する処理の一例を示す図である。 FIG. 9 is a diagram illustrating an example of a process for estimating a group of characters in steps S706, S707, and S708.

矩形で示した各領域を拡張することにより領域群９０１が得られ、領域群９０１内の各領域のバウンディングボックスを塗りつぶすことで文字列候補群９０２を得る。 A region group 901 is obtained by expanding each region shown by a rectangle, and a character string candidate group 902 is obtained by filling in the bounding box of each region within the region group 901.
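
Steps S707 and S708 can be approximated at the rectangle level instead of the pixel level: expanded rectangles that share at least one pixel belong to the same connected region, and each region is replaced by its bounding box. The sketch below does this with a small union-find. It is a stand-in for the pixel-based procedure in the description, and the function name `merge_overlapping` and the inclusive box convention are assumptions.

```python
def merge_overlapping(boxes):
    """Group inclusive boxes (x0, y0, x1, y1) that overlap, directly or through
    a chain of overlaps, and return one bounding box per group, sorted
    top-to-bottom then left-to-right."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def overlaps(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, b in enumerate(boxes):
        root = find(i)
        g = groups.get(root, b)
        groups[root] = (min(g[0], b[0]), min(g[1], b[1]),
                        max(g[2], b[2]), max(g[3], b[3]))
    return sorted(groups.values(), key=lambda b: (b[1], b[0]))


expanded_boxes = [(0, 0, 4, 4), (3, 0, 8, 4), (20, 0, 24, 4), (0, 10, 4, 14)]
candidates = merge_overlapping(expanded_boxes)
```

In this example the first two boxes overlap and become one string candidate, while the distant box on the same line and the box on the next line stay separate.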

ステップＳ７０９、Ｓ７１０では、ワーク画像８０１中の孤立した矩形を他の矩形に連結して１つの文字列としてまとめる。 In steps S709 and S710, isolated rectangles in the work image 801 are connected to other rectangles and combined into one character string.

帳票中の見出しの中には文字間が大きく開いたものがあり、そうした見出しの中にはステップＳ７０６では結合できずに見出し中の１文字が孤立してしまう場合が多い。ここではそうした孤立文字同士を連結して本来の文字列のまとまりに統合することが可能となる。 Some headings in a form have large spaces between characters, and in many cases such headings cannot be combined in step S706, resulting in one character in the heading becoming isolated. Here, it is possible to concatenate such isolated characters and integrate them into the original string.

図１０は、ステップＳ７０９、Ｓ７１０による孤立文字の結合処理の一例を示す図である。ここで、ワーク画像８０１には、孤立した文字列候補１０１０、１０１１、１０１２が存在しているものとする。これらは図４の入力画像４０１上では本来「納品書０１」という１つの文字列を形成しているものである。 FIG. 10 is a diagram showing an example of the process of combining isolated characters in steps S709 and S710. Here, it is assumed that isolated character string candidates 1010, 1011, and 1012 exist in the work image 801. These characters originally form one character string "Delivery Note 01" on the input image 401 in FIG. 4.

ステップＳ７０９では、各文字列候補領域に対して、矩形のアスペクト比が所定の閾値よりも１に近い（すなわち、１文字だけ孤立していると推定される）、水平方向の一定以内の距離に同じ高さの文字列候補領域が存在している、という２つの条件を満たす領域を直線で結び、連続画素となるよう加工する。矩形のアスペクト比が所定の閾値よりも１に近いとは、具体的には例えば以下のような条件のいずれかとなる。 In step S709, each character string candidate area that satisfies the following two conditions is joined to the neighbouring area by a straight line so that they become continuous pixels: (1) the aspect ratio of its rectangle is closer to 1 than a predetermined threshold (that is, it is presumed to be a single isolated character), and (2) a character string candidate area at the same height exists within a certain horizontal distance. "The aspect ratio is closer to 1 than a predetermined threshold" means, for example, that one of the following conditions holds.
・Ｔｈ１＞（矩形の横サイズ／縦サイズ）＞Ｔｈ２（Ｔｈ１＞１、Ｔｈ２＜１）という条件。 Th1 > (rectangle width / rectangle height) > Th2, where Th1 > 1 and Th2 < 1.
・（矩形の長辺サイズ／短辺サイズ）＜Ｔｈ３（＞１）という条件。 (rectangle long side / rectangle short side) < Th3, where Th3 > 1.
・（矩形の短辺サイズ／長辺サイズ）＞Ｔｈ４（＜１）という条件。 (rectangle short side / rectangle long side) > Th4, where Th4 < 1.

図１０では、文字候補矩形１０１０から同１０１１、同１０１１から１０１０、同１０１１から同１０１２の組み合わせが上記の条件に該当する。これらの文字候補矩形を直線で連結すると連続画素領域１０１３が得られる。文字候補矩形１０１０と１０１１の一つ下にある２行目先頭の矩形は、アスペクト比が所定の閾値よりも１に近いという条件は満たすが、水平方向に一定以内の距離に同じ高さの文字列候補が存在するという条件を満たさないため、非連結対象となっている。 In FIG. 10, the pairs from character candidate rectangle 1010 to 1011, from 1011 to 1010, and from 1011 to 1012 satisfy the above conditions. Connecting these character candidate rectangles with straight lines yields continuous pixel area 1013. The rectangle at the start of the second line, directly below character candidate rectangles 1010 and 1011, satisfies the condition that its aspect ratio is closer to 1 than the predetermined threshold, but it does not satisfy the condition that a character string candidate at the same height exists within a certain horizontal distance, so it is not connected.
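
The linking rule of steps S709 and S710 can be sketched on rectangles as follows. Several points are assumptions made for this example rather than statements from the source: the near-square test uses the long side / short side < Th3 variant, "same height" is read as "top and bottom edges agree within `tol` pixels", a near-square box links to every qualifying neighbour rather than only the nearest one, and the values of `th3`, `max_gap`, and `tol` are arbitrary.

```python
def is_near_square(box, th3=1.5):
    """Aspect-ratio test: long side / short side < Th3 (Th3 > 1)."""
    w = box[2] - box[0] + 1
    h = box[3] - box[1] + 1
    return max(w, h) / min(w, h) < th3


def horizontal_gap(a, b):
    """Number of empty pixel columns between two boxes (negative if they overlap)."""
    return max(a[0], b[0]) - min(a[2], b[2]) - 1


def same_row(a, b, tol=2):
    """Assumed reading of 'same height': top and bottom edges roughly agree."""
    return abs(a[1] - b[1]) <= tol and abs(a[3] - b[3]) <= tol


def link_isolated(boxes, th3=1.5, max_gap=25, tol=2):
    """Merge each near-square box with same-row neighbours within max_gap."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(boxes):
        if not is_near_square(a, th3):      # condition 1: looks like one character
            continue
        for j, b in enumerate(boxes):       # condition 2: neighbour on the same row
            if i != j and same_row(a, b, tol) and horizontal_gap(a, b) <= max_gap:
                parent[find(i)] = find(j)

    merged = {}
    for i, b in enumerate(boxes):
        root = find(i)
        m = merged.get(root, b)
        merged[root] = (min(m[0], b[0]), min(m[1], b[1]),
                        max(m[2], b[2]), max(m[3], b[3]))
    return sorted(merged.values(), key=lambda b: (b[1], b[0]))


boxes = [
    (0, 0, 9, 9),        # isolated character
    (30, 0, 39, 9),      # isolated character, 20 px to the right
    (60, 0, 85, 9),      # wider string on the same row
    (0, 20, 9, 29),      # isolated character on the next line
    (100, 20, 140, 29),  # same row as the previous box but 90 px away
]
strings = link_isolated(boxes)
```

Here the first three boxes become one string, mirroring how 1010, 1011, and 1012 are joined in FIG. 10, while the isolated box on the second line stays unconnected because its only same-row neighbour is too far away.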

ステップＳ７１０では、この状態のワーク画像８０１に対してステップＳ７０８と同様に連続画素領域のバウンディングボックスを抽出して塗りつぶす。これにより文字列領域群１００１が得られる。 In step S710, a bounding box of a continuous pixel area is extracted and filled in for the work image 801 in this state, as in step S708. As a result, a character string area group 1001 is obtained.

ステップＳ７１１では、ステップＳ７１０で抽出されたバウンディングボックスに対応する位置にある文字列画像を入力画像から取得する。 In step S711, a character string image located at a position corresponding to the bounding box extracted in step S710 is acquired from the input image.

図１１は、文字検出処理の出力結果の一例を示す図である。 FIG. 11 is a diagram showing an example of the output result of the character detection process.

この例は入力画像４０１に対して文字列領域群１００１を当てはめたものである。入力画像４０１からバウンディングボックスに対応する領域をそれぞれ切り出すことで、文字列画像１１０１－１１０６を取得する。 In this example, a character string area group 1001 is applied to an input image 401. By cutting out regions corresponding to bounding boxes from the input image 401, character string images 1101-1106 are obtained.

ステップＳ７１２では、ステップＳ７１１で取得された文字列画像に係る文字が活字（第１種別の文字）であるか手書き文字（第２種別の文字）であるかを分類する。分類にあたっては、活字手書き分類モデル２６１に格納されたパラメータを用いて行う。活字手書き分類モデル２６１としては、活字と手書き文字とを学習（機械学習）させることで生成された学習済みモデルが好適な例である。すなわち、ステップＳ７１２では、ステップＳ７１１で取得された文字列画像のそれぞれについて、手書きと活字のいずれであるかを判定する。あるいは、活字であるか否かまたは手書きであるか否かを判定する。そして判定結果に基づいて分類を行う。 In step S712, it is classified whether the characters related to the character string image acquired in step S711 are printed characters (characters of the first type) or handwritten characters (characters of the second type). The classification is performed using parameters stored in the printed and handwritten classification model 261. A suitable example of the printed and handwritten classification model 261 is a trained model generated by learning (machine learning) the printed and handwritten characters. That is, in step S712, it is determined whether each character string image acquired in step S711 is handwritten or printed. Alternatively, it is determined whether the text is printed or handwritten. Then, classification is performed based on the determination results.

図１４は、活字手書き分類部２６０が文字列画像を活字と手書き文字とに分類した様子を示す図である。 FIG. 14 is a diagram showing how the printed and handwritten character classification unit 260 classifies a character string image into printed characters and handwritten characters.

文字列画像１４０１－１４０６のうち、１４０２、１４０３、１４０４が活字に分類され、１４０１、１４０５、１４０６が手書きに分類されている様子を示している。 Among character string images 1401-1406, characters 1402, 1403, and 1404 are classified as printed characters, and characters 1401, 1405, and 1406 are classified as handwritten characters.

そして、文字認識部２５８によって、分類された各文字列画像に対して文字認識処理を行う。文字認識処理においては、ステップＳ７１２の分類結果に応じて、活字と分類された文字列については活字に適した文字認識エンジンを用いて文字認識を行い、手書き文字と分類された文字列については、手書き文字に適した文字認識エンジンを用いて文字認識を行うといったように、文字認識エンジンを使い分けることで、より適切な文字認識結果を得ることが可能となる。 Then, the character recognition unit 258 performs character recognition processing on each classified character string image. In the character recognition process, according to the classification result in step S712, character strings classified as printed characters are recognized using a character recognition engine suitable for printed characters, and character strings classified as handwritten characters are recognized. By using different character recognition engines, such as performing character recognition using a character recognition engine suitable for handwritten characters, it is possible to obtain more appropriate character recognition results.

また、活字として分類された文字列については文字認識処理を行わず、手書き文字として分類された文字列について文字認識処理を行うようにすることで、手書きされる前の帳票（活字文字列が記載された帳票）を予め登録しなくても手書き後の帳票から手書き文字列を抽出することが可能となる。この場合、ステップＳ７１２では、活字であるか否かの判定、分類は行わず、手書き文字であるか否かの判定に応じて、手書き文字であると判定された文字列画像に対して文字認識処理を行い、手書き文字であると判定されなかった文字列画像には文字認識処理を行わないようにすればよい。 Alternatively, by skipping character recognition for character strings classified as printed and performing it only for character strings classified as handwritten, handwritten character strings can be extracted from a filled-in form without registering in advance the form as it was before being filled in (the form containing only the printed character strings). In this case, step S712 need not determine or classify whether a string is printed; it only determines whether a string is handwritten, and character recognition is performed on character string images determined to be handwritten and not on those that were not.
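
Both strategies above (one engine per character type, or recognizing handwritten strings only) reduce to a dispatch on the classification result. The following sketch shows that control flow with stub callables; `classify`, the `engines` mapping, the labels "printed" and "handwritten", and the dictionary stand-ins for images are all hypothetical placeholders for the classification model 261 and real OCR engines.

```python
def recognize_strings(images, classify, engines, handwritten_only=False):
    """Run the engine that matches each image's class.

    classify: callable returning "printed" or "handwritten" (stand-in for S712).
    engines:  mapping from class label to a recognition callable.
    handwritten_only: if True, printed strings are skipped entirely.
    """
    results = []
    for image in images:
        kind = classify(image)
        if handwritten_only and kind != "handwritten":
            continue
        results.append((kind, engines[kind](image)))
    return results


# Stub data and stub engines, for illustration only.
images = [
    {"kind": "printed", "text": "Invoice"},
    {"kind": "handwritten", "text": "1,200"},
]
classify = lambda image: image["kind"]
engines = {
    "printed": lambda image: "P:" + image["text"],
    "handwritten": lambda image: "H:" + image["text"],
}

both = recognize_strings(images, classify, engines)
entries_only = recognize_strings(images, classify, engines, handwritten_only=True)
```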

図１２は、結果出力部２５９による出力結果の一例を示す図である。本実施例では出力結果１２０１はＪＳＯＮ形式のテキストファイルとして文字領域のＩＤ、矩形座標、読み取ったテキスト内容を含んでいる。 FIG. 12 is a diagram showing an example of an output result by the result output unit 259. In this embodiment, the output result 1201 is a JSON format text file that includes the ID of the character area, the rectangle coordinates, and the read text contents.
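
The source states only that the output contains an ID, rectangle coordinates, and the recognized text in JSON form; the exact schema is not given. A possible serialization is sketched below, where the key names `id`, `rect`, `x`, `y`, `width`, `height`, and `text` are assumptions chosen for the example.

```python
import json


def to_json(regions):
    """Serialize (box, text) pairs, with inclusive boxes (x0, y0, x1, y1)."""
    records = [
        {
            "id": index,
            "rect": {"x": x0, "y": y0,
                     "width": x1 - x0 + 1, "height": y1 - y0 + 1},
            "text": text,
        }
        for index, ((x0, y0, x1, y1), text) in enumerate(regions, start=1)
    ]
    # ensure_ascii=False keeps Japanese text readable in the output file.
    return json.dumps(records, ensure_ascii=False, indent=2)


output = to_json([((10, 20, 109, 39), "納品書")])
```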

セル分割部２５７は、帳票内の表に関して、表領域とセル領域の情報が外部から与えられている場合に、セルの情報を用いて文字検出結果を分割する処理を行う。 The cell dividing unit 257 performs a process of dividing a character detection result using cell information when information on a table area and a cell area is provided from the outside regarding a table in a form.

文字領域推定部２５６は画像片分類部２５４によって文字と分類された領域のみを対象にして処理を行うため、この時点で罫線の情報が失われており、文字領域推定部２５６の出力結果は複数のセル内の文字列が結合されている場合がある。この結果を補正するため、セル矩形の情報を用いて複数のセルにまたがった文字列領域を分割する。 Because the character area estimation unit 256 processes only the areas classified as characters by the image piece classification unit 254, ruled-line information has been lost by this point, and its output may contain character strings from multiple cells merged together. To correct this, character string areas that span multiple cells are divided using the cell rectangle information.

図１３は、セル分割部の処理の一例を示す図である。 FIG. 13 is a diagram illustrating an example of processing by the cell division section.

入力画像１３０１を入力として本手法で文字検出を行った場合、文字列画像１３１１－１３１３が得られるが、このうち文字列画像１３１２は右詰と左詰のテキストのセルが隣接しているため、文字列がセルをまたいで結合されている。 When character detection is performed by this method on input image 1301, character string images 1311 to 1313 are obtained. Of these, character string image 1312 spans two cells: a cell with right-aligned text is adjacent to a cell with left-aligned text, so the strings were merged across the cell boundary.

表、セルの情報が外部から表１３２０、セル１３２１－１３２４としてそれぞれの矩形情報が与えられた場合、その情報を用いて文字列画像１３１２を文字列画像１３１４と１３１５に分割する。 When rectangular information is given from the outside as table 1320 and cells 1321-1324, character string image 1312 is divided into character string images 1314 and 1315 using that information.

このように、表形式の領域については、複数のセルに記載された文字を一つの領域と特定することなく、それぞれのセル毎に文字列領域を特定し、ＯＣＲ処理を実行することが可能となる。 In this way, for a table-format area, the characters written in multiple cells are not identified as a single area; instead, a character string area is identified for each cell and OCR processing can be executed on it.
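
The cell-based division can be illustrated by clipping a string box against each cell rectangle it overlaps. In this sketch the names `intersect` and `split_by_cells`, the inclusive `(x0, y0, x1, y1)` convention, and the decision to leave a box unchanged when it touches at most one cell are assumptions for the example.

```python
def intersect(a, b):
    """Intersection of two inclusive boxes, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 <= x1 and y0 <= y1 else None


def split_by_cells(box, cells):
    """Split a string box at cell borders; keep it whole if it spans one cell or none."""
    parts = [p for p in (intersect(box, cell) for cell in cells) if p is not None]
    return parts if len(parts) > 1 else [box]


cells = [(0, 0, 49, 20), (50, 0, 99, 20)]             # two adjacent cells in one row
spanning = split_by_cells((10, 5, 90, 15), cells)     # crosses the cell border
contained = split_by_cells((10, 5, 40, 15), cells)    # stays inside the first cell
```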

As described above, the present invention acquires runs of contiguous pixels from the image subject to OCR and estimates character areas based on the acquired pixels. OCR processing is then performed on each estimated character area. Detecting characters from pixel pieces in this way makes it possible to prevent characters from being overlooked, compared with a method that first restricts the areas where characters may exist and then detects characters within them.
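
As an illustration of acquiring pixel pieces and their enclosing rectangles, the sketch below labels connected foreground pixels in a small binarized image with a breadth-first search. It assumes foreground pixels are encoded as 1 and uses 8-connectivity; neither detail is stated in this section, and the function name `pixel_piece_boxes` is hypothetical.

```python
from collections import deque


def pixel_piece_boxes(image):
    """Return one bounding box per connected piece of foreground pixels.

    `image` is a non-empty list of rows of 0/1 values. Each box is
    (left, top, right, bottom) with inclusive pixel coordinates.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Start a new pixel piece and grow it over its 8-neighbours.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
boxes = pixel_piece_boxes(image)
```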

Furthermore, combining the estimated character areas reduces the chance that a single character is judged to be two characters (for example, that the left-hand radical (偏) and the right-hand component (旁) of a kanji are recognized as separate characters).

Furthermore, combining estimated character areas with one another makes it possible to recognize a character string with wide spacing between its characters, as is often seen in form titles, as a single character string.
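
The combining of widely spaced characters can be sketched as a greedy merge of near-square character boxes that lie on the same row. The thresholds `max_gap`, `height_tol`, and `ratio_tol`, their default values, and both function names are hypothetical; the claims refer only to a predetermined aspect-ratio threshold, a certain horizontal distance, and the same height.

```python
def is_square_like(box, ratio_tol=0.3):
    """True if the box's width/height ratio is within ratio_tol of 1."""
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    return height > 0 and abs(width / height - 1.0) <= ratio_tol


def merge_spaced_characters(boxes, max_gap=40, height_tol=3, ratio_tol=0.3):
    """Merge near-square boxes that sit on the same row within max_gap.

    Boxes are (left, top, right, bottom). Non-square boxes are passed
    through unchanged and never joined to a neighbour.
    """
    groups = []  # each group: {"box": union box, "last": last member or None}
    for box in sorted(boxes):
        target = None
        if is_square_like(box, ratio_tol):
            for group in groups:
                last = group["last"]
                if last is None:
                    continue
                gap = box[0] - last[2]
                same_row = (abs(box[1] - last[1]) <= height_tol
                            and abs(box[3] - last[3]) <= height_tol)
                if 0 <= gap <= max_gap and same_row:
                    target = group
                    break
        if target is None:
            last = box if is_square_like(box, ratio_tol) else None
            groups.append({"box": box, "last": last})
        else:
            u = target["box"]
            target["box"] = (min(u[0], box[0]), min(u[1], box[1]),
                             max(u[2], box[2]), max(u[3], box[3]))
            target["last"] = box
    return [group["box"] for group in groups]


# Three spaced-out title characters followed by an ordinary wide text box.
title = [(0, 0, 20, 20), (50, 0, 70, 20), (100, 1, 120, 21), (300, 0, 400, 20)]
merged_title = merge_spaced_characters(title)
```

Restricting the merge to near-square boxes is what keeps an ordinary, already wide text box from being chained onto an unrelated neighbour.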

In addition, as in this embodiment, classifying characters as printed or handwritten after the character areas have been combined and divided reduces the chance that, within a single character, the left-hand radical is classified as handwritten and the right-hand component as printed.

In addition, dividing the estimated character areas based on table area information (such as the position of the table area and the shape and position of its cells) reduces the chance that character strings written in multiple cells are recognized as a single character string.

The present invention can be embodied as, for example, a system, an apparatus, a method, a program, or a recording medium. Specifically, it may be applied to a system made up of a plurality of devices or to an apparatus consisting of a single device.

The program of the present invention is a program that enables a computer to execute the processing method of the flowchart shown in FIG. 7, and the storage medium of the present invention stores a program that enables a computer to execute the processing method of FIG. 7. The program of the present invention may be a separate program for each processing method of each device in FIG. 7.

As described above, it goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a recording medium on which a program realizing the functions of the above-described embodiments is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program stored on the recording medium.

In this case, the program itself read from the recording medium realizes the novel functions of the present invention, and the recording medium on which the program is recorded constitutes the present invention.

Examples of recording media that can be used to supply the program include flexible disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, DVD-ROMs, magnetic tape, non-volatile memory cards, ROMs, EEPROMs, and silicon disks.

It also goes without saying that the present invention covers not only the case where the functions of the above-described embodiments are realized by a computer executing the program it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiments are realized by that processing.

Furthermore, it goes without saying that the present invention also covers the case where the program read from the recording medium is written to a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided in the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

The present invention may be applied to a system made up of a plurality of devices or to an apparatus consisting of a single device. It goes without saying that the present invention is also applicable where it is achieved by supplying a program to a system or apparatus. In that case, by reading a recording medium that stores a program for achieving the present invention into the system or apparatus, the system or apparatus can enjoy the effects of the present invention.

Furthermore, by downloading and reading a program for achieving the present invention from a server, database, or the like on a network by means of a communication program, the system or apparatus can enjoy the effects of the present invention. All configurations that combine the above-described embodiments and their modifications are also included in the present invention.

100 LAN
101 Client PC
102 Scanner
103 File server

Claims

1. An information processing system comprising:
continuous pixel acquisition means for acquiring a plurality of pixel pieces, each formed by consecutive pixels of the same pixel value, in a binarized image that is a target of character recognition;
first specifying means for specifying, for each of the plurality of pixel pieces acquired by the continuous pixel acquisition means, a rectangular area including the area from which the pixel piece was acquired;
second specifying means for specifying a character area indicating the area of each character by expanding the rectangular area specified by the first specifying means and combining it with another rectangular area;
third specifying means for specifying a multi-character area including a plurality of characters by combining a specific character area, which is a character area specified by the second specifying means whose aspect ratio is closer to 1 than a predetermined threshold, with another specific character area that is within a certain distance in the horizontal direction and positioned at the same height; and
character recognition means for executing, for each multi-character area specified by the third specifying means, recognition processing of the characters included in that multi-character area.

2. The information processing system according to claim 1, further comprising dividing means for dividing the multi-character area at the boundaries of a plurality of cells when the multi-character area extends across the plurality of cells in a table area,
wherein, when the multi-character area is divided by the dividing means, the character recognition means executes the recognition processing for each divided area.

3. An information processing method comprising:
a continuous pixel acquisition step in which continuous pixel acquisition means of an information processing system acquires a plurality of pixel pieces, each formed by consecutive pixels of the same pixel value, in a binarized image that is a target of character recognition;
a first specifying step in which first specifying means of the information processing system specifies, for each of the plurality of pixel pieces acquired in the continuous pixel acquisition step, a rectangular area including the area from which the pixel piece was acquired;
a second specifying step in which second specifying means of the information processing system specifies a character area indicating the area of each character by expanding the rectangular area specified in the first specifying step and combining it with another rectangular area;
a third specifying step in which third specifying means of the information processing system specifies a multi-character area including a plurality of characters by combining a specific character area, which is a character area specified in the second specifying step whose aspect ratio is closer to 1 than a predetermined threshold, with another specific character area that is within a certain distance in the horizontal direction and positioned at the same height; and
a character recognition step in which character recognition means of the information processing system executes, for each multi-character area specified in the third specifying step, recognition processing of the characters included in that multi-character area.

4. A program for causing a computer to function as each means according to claim 1 or 2.