JP2022019257A

JP2022019257A - Information processing device, information processing method, and program

Info

Publication number: JP2022019257A
Application number: JP2020122994A
Authority: JP
Inventors: 妙子山▲崎▼; Taeko Yamazaki
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-07-17
Filing date: 2020-07-17
Publication date: 2022-01-27
Anticipated expiration: 2040-07-17
Also published as: JP7532124B2

Abstract

To enable highly accurate character recognition processing on document images even when handling documents of various formats. SOLUTION: An information processing device performs character string detection processing suited to identifying horizontal character string areas. For a document image whose line direction is horizontal, the device applies the character string detection processing as is. For a document image whose line direction is vertical, the device first rotates the image by 90 degrees so that the line direction becomes horizontal, and then applies the character string detection processing. SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for identifying the areas of a document image in which character strings are written.

Conventionally, there is a technique for reading and recognizing characters in a document image obtained by scanning a document. This technique is generally called OCR (Optical Character Recognition). OCR processing usually consists of image preprocessing, which corrects the document image into an image suitable for OCR; character string detection processing, which identifies the areas of the document image in which characters are written (character string areas); and character recognition processing, which identifies each character contained in the detected character string areas. Regarding such OCR processing, Patent Document 1 discloses a technique that, for a document image obtained by scanning a general horizontally written document, detects character string areas from the vertical and horizontal projections of the entire image and performs character recognition. Patent Document 2 discloses a technique that, for a document image obtained by scanning a business card, detects the circumscribed frames of character strings from the vertical and horizontal projections of the entire image, identifies the name portion based on the number of connected pixels within each circumscribed frame, and performs character recognition.

Patent Document 1: Japanese Unexamined Patent Publication No. 7-200733
Patent Document 2: Japanese Unexamined Patent Publication No. 6-96270

The results of the above OCR processing have conventionally been used for indexing document images, and one use case of such indexing is scanning and storing business cards. Unlike general documents, business cards usually have a small paper size and contain few characters. There are also vertical business cards on which the name and other items are written vertically.

In the above indexing, even for a document that is written vertically and contains few characters (low character density), such as the vertical business card described above, it is necessary to detect the character string areas appropriately from the document image and to perform character recognition processing with high accuracy. However, the technique of Patent Document 1 assumes only general documents, that is, documents that are written horizontally and contain many characters (high character density), and it cannot process documents such as vertical business cards accurately. The technique of Patent Document 2 is specialized for business cards, and it does not anticipate documents with complicated layouts in which projections cannot be taken reliably, for example when a pattern such as a company logo lies near the name.

The technique of the present disclosure has been made in view of the above problems. Its purpose is to detect character string areas appropriately and to perform character recognition processing with high accuracy even when the documents to be processed include both horizontally written and vertically written documents.

The information processing device according to the present disclosure comprises: determination means for determining whether a document image, in which the characters written in the document are upright, is written vertically or horizontally; rotation means for rotating the document image; and detection means for performing character string detection processing suited to identifying horizontally written character string areas. When the result of the determination is horizontal writing, the detection means performs the character string detection processing on the document image that has not been rotated by the rotation means. When the result of the determination is vertical writing, the detection means performs the character string detection processing on the document image rotated by 90 degrees by the rotation means.

According to the technique of the present disclosure, character string areas can be detected appropriately even when the documents to be processed include both horizontally written and vertically written documents. As a result, character recognition processing can be performed with high accuracy.

FIG. 1: A diagram showing the hardware configuration of the information processing system.
FIG. 2: A flowchart showing the processing flow of the entire system.
FIG. 3: An example of a vertically written document image.
FIG. 4: An example of a horizontally written document image.
FIG. 5: A diagram explaining how character string areas are detected from a vertically written document image.
FIG. 6: A flowchart showing the details of the post-processing.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The embodiments do not limit the present invention, and not all of the configurations described in the embodiments are necessarily essential to solving the problems addressed by the present invention.

[Embodiment 1]
[Hardware configuration]
FIG. 1 is a diagram showing the hardware configuration of the information processing system according to the present embodiment. The information processing system includes a copying machine 100 and an information processing device 110.
The copying machine 100 includes a scanner 101 and a copying machine side communication unit 102. The scanner 101 scans a document and generates a document image. The copying machine side communication unit 102 communicates over a network with external devices, including the information processing device 110.

The information processing device 110 includes a system control unit 111, a ROM 112, a RAM 113, an HDD 114, a display unit 115, an input unit 116, and an information processing device side communication unit 117. The system control unit 111 is composed of an arithmetic unit such as a CPU; it reads the control program stored in the ROM 112 and executes various processes. The RAM 113 is used as temporary storage, such as the main memory and work area of the system control unit 111. The HDD 114 stores various data, programs, and the like. The functions and processes of the information processing device 110 described later are realized by the system control unit 111 reading a program stored in the ROM 112 or the HDD 114 and executing it. The information processing device side communication unit 117 communicates over the network with external devices, including the copying machine 100. The display unit 115 displays various information. The input unit 116 has a keyboard and a mouse and accepts various operations by the user. The display unit 115 and the input unit 116 may be provided as a single unit, such as a touch panel. The display unit 115 may also project images with a projector, and the input unit 116 may use a camera to recognize the position of a fingertip relative to the projected image.

In the present embodiment, the scanner 101 of the copying machine 100 scans a paper document such as a business card and generates a document image. The copying machine side communication unit 102 transmits the document image to the information processing device 110. In the information processing device 110, the information processing device side communication unit 117 receives the document image and stores it in a storage device such as the HDD 114.

The hardware configuration of FIG. 1 is one example of a configuration that realizes the present embodiment. For example, some functions of the display unit 115 and the input unit 116 may reside in the copying machine 100. The copying machine 100 and the information processing device 110 may also be integrated into a single device.

<Overall processing flow>
Next, the operation flow of the software that realizes OCR processing on a document image in the information processing system according to the present embodiment will be described with reference to FIG. 2. The series of processes shown in the flowchart of FIG. 2 is realized by the system control unit 111 reading a predetermined program from the ROM 112 or the like and executing it. In the following description, the symbol "S" denotes a step.

First, in S201, the document image data is acquired from the HDD 114. Next, in S202, a process of determining the document type is executed on the acquired document image. In the present embodiment, the process determines whether the document is a business card or a non-business card. The determination is based on the scan resolution and the document image size. For example, when the scan resolution is 300 DPI, the document is determined to be a business card if the long side of the scanned document image is within 1040 pixels ±5% and the short side is within 615 pixels ±5%; otherwise it is determined to be a non-business card. The method of determining the document type is not limited to one based on image size. For example, information specifying the document type may be received from the input unit 116, or features may be computed from the document image and classified with a discriminative model trained in advance.
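The size-based check of S202 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `is_business_card` is hypothetical, and the linear scaling of the reference sizes to resolutions other than 300 DPI is an assumption, since the text gives only the 300 DPI figures (1040 and 615 pixels, each ±5%).

```python
def is_business_card(width_px, height_px, dpi=300, tolerance=0.05):
    """Return True if the image size matches a business card (S202 sketch)."""
    # Reference sizes from the text are for 300 DPI; scaling them linearly
    # to other resolutions is an assumption of this sketch.
    scale = dpi / 300
    long_ref = 1040 * scale
    short_ref = 615 * scale

    # Compare by long/short side so that portrait and landscape scans match.
    long_side = max(width_px, height_px)
    short_side = min(width_px, height_px)

    long_ok = abs(long_side - long_ref) <= tolerance * long_ref
    short_ok = abs(short_side - short_ref) <= tolerance * short_ref
    return long_ok and short_ok
```

Comparing the longer and shorter sides, rather than width and height, keeps the check independent of how the card was placed on the platen.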

In S203, the processing flow branches based on the determination result of S202. If the document is determined to be a non-business card, the flow proceeds to S204; if it is determined to be a business card, the flow proceeds to S205.
In S204, general-purpose character recognition processing that does not depend on the document type is executed. When the character recognition processing is completed, this processing ends.

In S205, a process of correcting the skew of the document image is executed. A document image generated by the scanner 101 may be skewed depending on how the document was placed on the platen. Skew correction is therefore performed to obtain a document image without skew. A known method, such as the one disclosed in Japanese Patent No. 4114959, may be applied for the skew correction.

Next, in S206, a process of determining the character direction in the document image is executed. Here, the character direction is defined as the orientation of the characters, where the orientation in which the characters in the document image are upright is 0 degrees. Even after the skew correction of S205, the document may still be rotated in units of 90 degrees. A known method, such as the one disclosed in Japanese Patent No. 3727971, is therefore applied to obtain the character direction of the document image.

Next, in S207, a process of rotating the document image based on the character direction determined in S206 is executed. This yields a document image in which the characters are upright.

Next, in S208, a process of extracting character pixels from the document image is executed. In the present embodiment, character pixels are extracted by binarization, which applies a threshold to the luminance value of each pixel. A known method such as Otsu's binarization may be used. Because characters in a document are usually printed in a darker color than the background, pixels whose luminance is below the threshold are treated as character pixels. The method of extracting character pixels is not limited to thresholding. For example, image features may be extracted from the neighborhood of each pixel of interest, and a discriminative model trained in advance may infer from those features whether the pixel of interest is a character pixel.
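The thresholding step of S208 can be illustrated with a small pure-Python version of Otsu's method. This is a sketch under stated assumptions: the image is taken to be a list of rows of 8-bit luminance values, and the names `otsu_threshold` and `extract_character_pixels` are hypothetical. A production system would more likely rely on an image processing library.

```python
def otsu_threshold(values):
    """Return t such that splitting into {v < t} and {v >= t}
    maximizes the between-class variance (Otsu's method)."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_score = 0, -1.0
    count_dark, sum_dark = 0, 0
    for t in range(1, 256):
        # Move luminance level t-1 into the dark class.
        count_dark += hist[t - 1]
        sum_dark += (t - 1) * hist[t - 1]
        count_light = total - count_dark
        if count_dark == 0 or count_light == 0:
            continue
        mean_dark = sum_dark / count_dark
        mean_light = (sum_all - sum_dark) / count_light
        # Proportional to the between-class variance.
        score = count_dark * count_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def extract_character_pixels(image):
    """Return a mask that is True where luminance is below the threshold."""
    flat = [v for row in image for v in row]
    t = otsu_threshold(flat)
    return [[v < t for v in row] for row in image]
```

For a uniform image no split exists, so the threshold stays at 0 and no pixel is marked as a character pixel.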

Next, in S209, a process of determining the line direction is executed based on the character pixels extracted in S208. Here, the line direction is the direction in which the characters in the document image are arranged; in the present embodiment it is either horizontal or vertical. One possible method is to generate projection histograms of the entire document image in both the vertical and horizontal directions and to take the direction with the smaller variance as the line direction. With this line direction determination, for example, the line direction of the business card image 301 shown in FIG. 3 is determined to be vertical, and that of the business card image 401 shown in FIG. 4 is determined to be horizontal. The determination result is stored in the RAM 113.
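One way to read the projection-histogram rule of S209 is sketched below. The interpretation is an assumption: the histogram "in the horizontal direction" is taken to be the profile indexed along the horizontal axis (one bin per column). Dividing each bin by its maximum possible count is also an addition of this sketch, made so that non-square images do not bias the comparison. The function name `determine_line_direction` is hypothetical.

```python
from statistics import pvariance


def determine_line_direction(mask):
    """Return "horizontal" or "vertical" for a binary character-pixel mask."""
    height, width = len(mask), len(mask[0])

    # Profile along the horizontal axis: fill ratio of each column.
    column_profile = [
        sum(1 for row in mask if row[x]) / height for x in range(width)
    ]
    # Profile along the vertical axis: fill ratio of each row.
    row_profile = [sum(1 for v in row if v) / width for row in mask]

    # Horizontal text lines make the row profile alternate between text
    # and gaps (large variance) while the column profile stays flat, so
    # the axis whose profile has the smaller variance is the line direction.
    if pvariance(column_profile) <= pvariance(row_profile):
        return "horizontal"
    return "vertical"
```

Ties fall back to "horizontal" here; the text does not say how ties are resolved.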

In the next step, S210, the processing flow branches based on the determination result of S209. If the line direction is horizontal, the flow proceeds to S211; if the line direction is vertical, the flow proceeds to S212.

In S211, which applies when the line direction is horizontal, character string detection processing suited to identifying horizontally written character lines is executed. Specifically, the document image is scanned in the horizontal direction for black pixel clusters, clusters whose mutual spacing falls within a certain range are identified, and the circumscribed rectangle enclosing the identified clusters is detected as one character string area. When the characters in a document image are arranged horizontally, the gaps between characters (the spacing between black pixel clusters, that is, the number of white pixels) are usually narrower in the horizontal direction than in the vertical direction. Character string areas are therefore detected from the document image by an area analysis whose processing parameters are tuned to identify groups of horizontally written characters. As a concrete detection method, a known method such as the one disclosed in Japanese Unexamined Patent Publication No. 7-200733 may be applied. For example, when this character string detection processing is applied to the business card image 401 of FIG. 4, whose line direction is horizontal, five horizontally long rectangular character string areas 402 to 406 are detected, as shown in the lower part of the figure. The information on the detected character string areas is stored in the RAM 113.
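The grouping idea in S211 can be sketched as follows. This is not the method of Japanese Unexamined Patent Publication No. 7-200733; it is a simplified greedy merge. It assumes that the bounding boxes of the black pixel clusters are already available as `(x0, y0, x1, y1)` tuples with exclusive right and bottom edges, and the name `detect_text_lines` and the `max_gap` parameter are hypothetical.

```python
def detect_text_lines(boxes, max_gap):
    """Merge cluster boxes into horizontal character string areas.

    A box joins an existing area when the two overlap vertically and the
    horizontal gap between them is at most max_gap pixels.
    """
    lines = []
    # Sorting by x0 means each area only ever grows to the right.
    for x0, y0, x1, y1 in sorted(boxes):
        for line in lines:
            lx0, ly0, lx1, ly1 = line
            overlaps_vertically = y0 < ly1 and ly0 < y1
            if overlaps_vertically and x0 - lx1 <= max_gap:
                # Grow the circumscribed rectangle of the area.
                line[:] = [min(lx0, x0), min(ly0, y0),
                           max(lx1, x1), max(ly1, y1)]
                break
        else:
            lines.append([x0, y0, x1, y1])
    return [tuple(line) for line in lines]
```

With a small `max_gap`, widely spaced characters such as those of a name end up in separate areas; a larger value merges them. This is the kind of parameter that would be tuned for horizontal writing.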

In S212, which applies when the line direction is vertical, a rotation process that rotates the document image by 90 degrees is executed before the character string detection processing. For example, the business card image 301 of FIG. 3, whose line direction is vertical, is rotated by 90 degrees to generate the business card image 501 shown in FIG. 5, in which the top and bottom of the card become its left and right. The rotated business card image is stored in the RAM 113. Characters that are arranged vertically on the document (business card) are thus arranged horizontally in the rotated image. A document image whose line direction is vertical can therefore be treated as if its line direction were horizontal, and the same character string detection processing as in S211 can be applied.
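The rotation of S212 can be sketched on an image stored as a list of rows. The text does not state whether the rotation is clockwise or counterclockwise; this sketch assumes counterclockwise, which turns top-to-bottom columns read from right to left into left-to-right lines read from top to bottom. The function names are hypothetical.

```python
def rotate_90_ccw(image):
    """Rotate a list-of-rows image by 90 degrees counterclockwise."""
    # Transpose, then reverse the row order: the rightmost column of the
    # input becomes the top row of the output.
    return [list(row) for row in zip(*image)][::-1]


def rotate_90_cw(image):
    """Rotate by 90 degrees clockwise (the inverse of rotate_90_ccw)."""
    return [list(row) for row in zip(*image[::-1])]
```

Applying `rotate_90_cw` to the result of `rotate_90_ccw` restores the original image.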

In the following step, S213, character string detection processing suited to identifying horizontally written character lines is executed, as in S211. For example, for the business card image 501 of FIG. 5, obtained by rotating the business card image 301 by 90 degrees, five character string areas 502 to 506 are detected, as shown in the center of the figure. The information on the detected character string areas is stored in the RAM 113. If the character string detection processing were instead applied directly to the unrotated business card image 301 of FIG. 3, six character string areas 302 to 307 would be detected, for example, as shown on the right side of that figure. In that case the two characters of the surname "城野" (Jono) are split into separate character string areas. When the target is a portrait, vertically written business card, rotating the image by 90 degrees before the character string detection processing, as in the present embodiment, allows the character string areas of widely and discretely spaced text, such as a name, to be detected more appropriately.

Next, in S214, a process of rotating the character string areas detected in S213 by -90 degrees is executed, that is, a rotation by the same angle in the direction opposite to that of S212. This returns the characters in each character string area to the upright state. Here, the partial image corresponding to each detected character string area may be rotated by -90 degrees. For example, the partial images of the character string areas 502, 503, 504, 505, and 506 may be cut out of the rotated business card image 501 shown in FIG. 5, and each partial image may be rotated by -90 degrees. Alternatively, the coordinate information of each detected character string area may be rotated by -90 degrees and applied to the upright document image obtained in S207. For example, the coordinate information rotated by -90 degrees is computed for each of the character string areas 502', 503', 504', 505', and 506' in the business card image 501' of FIG. 5 and applied to the business card image 301, which is the document image with upright characters. Character string areas in which the characters are upright are obtained in this way.
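The coordinate-only variant of S214 can be sketched as a box transformation. The sketch assumes that S212 rotated the image counterclockwise (the text does not state the direction), that boxes are `(x0, y0, x1, y1)` tuples with exclusive right and bottom edges, and that `original_width` is the width of the upright image, which equals the height of the rotated image. The function name is hypothetical.

```python
def box_to_upright_coords(box, original_width):
    """Map a box found in the rotated image back onto the upright image.

    A counterclockwise rotation sends pixel (x, y) to
    (y, original_width - 1 - x), so the inverse sends (xr, yr) to
    (original_width - 1 - yr, xr). With exclusive upper bounds this gives
    the expression below.
    """
    x0, y0, x1, y1 = box
    return (original_width - y1, x0, original_width - y0, x1)
```

For example, with an upright image 3 pixels wide, the second row of the rotated image, box `(0, 1, 2, 2)`, maps back to the middle column of the upright image, box `(1, 0, 2, 2)`.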

In S215, character recognition processing is executed on each character string area obtained in S214. One possible character recognition method is to use a trained model to infer the character code of each character contained in the partial image corresponding to a character string area. The trained model is a discriminative model trained to take a character image as input and output its character code. Similarly, in S216, character recognition processing is executed on each character string area detected in S211. When the character recognition processing of S216 is completed, this processing ends.

In S217, post-processing is executed on the character recognition results obtained in S215. The details of this post-processing are described later. When the post-processing is completed, this processing ends.

The above is the operation flow of the software that realizes OCR processing on a document image.

<Details of post-processing>
Next, the post-processing of S217 will be described with reference to the flowchart of FIG. 6.

In S601, the recognition results (character codes) obtained by the character recognition processing of S215 are divided into lines. If a recognition result contains a line feed code, it is divided at that point. In addition, the position information of each character in the divided recognition results is used to generate the coordinate information of the circumscribed rectangle of each resulting line-by-line character string area. For example, the character string area 505 in the business card image 501 of FIG. 5, which consists of multiple lines, is divided into two character string areas 508 and 509, as indicated by reference numeral 507, and the coordinate information of the circumscribed rectangle of each is generated. The coordinate information of the line-by-line character string areas is stored in the RAM 113 together with the line-by-line character codes.
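The line splitting of S601 can be sketched as follows. The data layout is an assumption made for illustration: each recognition result is a list of `(character, box)` pairs in reading order, a line feed appears as `("\n", None)`, and boxes are `(x0, y0, x1, y1)` tuples. The name `split_into_lines` is hypothetical.

```python
def split_into_lines(recognized):
    """Split recognized characters at line feeds and return, for each
    line, its text and the circumscribed rectangle of its characters."""
    lines, text, boxes = [], [], []
    # A trailing sentinel line feed flushes the last line.
    for char, box in list(recognized) + [("\n", None)]:
        if char != "\n":
            text.append(char)
            boxes.append(box)
            continue
        if text:
            rect = (
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            )
            lines.append(("".join(text), rect))
        text, boxes = [], []
    return lines
```

Empty lines, such as those produced by consecutive line feeds, are skipped because they have no characters from which to build a rectangle.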

Next, in S602, each line-by-line character string area obtained in S601 is examined, based on its character codes, to determine whether alphanumeric characters are dominant. On a portrait, vertically written business card such as the business card image 301, an e-mail address or website URL is often printed as horizontal text rotated by 90 degrees. General-purpose character recognition usually builds its discriminative model on the premise that characters are upright, but an e-mail address or the like can still be recognized if the image features of rotated characters are learned separately. On the other hand, this can also introduce misrecognition of similar-looking vertically written characters. For this reason, the recognition result is used to determine whether alphanumeric characters are dominant in the character line, that is, whether the line is likely to be an e-mail address or a URL. For example, alphanumeric characters may be judged dominant when at least five of the characters recognized in a line are alphanumeric characters or symbols and these make up the majority of the characters in the line. The minimum of five characters is based on the standard technical specifications for URL domains and e-mail addresses. If alphanumeric characters are determined to be dominant, the flow proceeds to S603; otherwise, this processing ends.
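The dominance rule of S602 can be sketched as below. Two details are assumptions of this sketch, not stated in the text: "alphanumeric characters and symbols" is taken to mean ASCII letters, digits, and punctuation (full-width forms are not counted), and whitespace is ignored when counting. The name `is_alphanumeric_dominant` is hypothetical.

```python
import string

# ASCII letters, digits, and punctuation; an assumed definition of the
# "alphanumeric characters and symbols" mentioned in the text.
ALNUM_SYMBOLS = set(string.ascii_letters + string.digits + string.punctuation)


def is_alphanumeric_dominant(line_text, min_count=5):
    """Return True when a recognized line is probably a URL or e-mail address."""
    chars = [c for c in line_text if not c.isspace()]
    alnum_count = sum(1 for c in chars if c in ALNUM_SYMBOLS)
    # At least min_count alphanumeric characters AND a strict majority.
    return alnum_count >= min_count and alnum_count * 2 > len(chars)
```

A line such as a postal address that mixes kanji with a few digits fails the majority condition, while a short Latin abbreviation fails the minimum count.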

In S603, the coordinate information of each line-by-line character string area in which alphanumeric characters were determined to be dominant is rotated by 90 degrees, so that the vertical character string area becomes horizontal. Here, a partial image of the line-by-line character string area may be generated and rotated by 90 degrees. Alternatively, only the coordinate information may be rotated by 90 degrees and applied to the document image that was already rotated for character string detection (see the business card image 501 in FIG. 5). This yields a character string area such as the character string area 510 in FIG. 5, in which an originally vertical area has been laid horizontally and the characters inside it are upright. The character string area after the 90-degree rotation is stored in the RAM 113.

Next, in S604, character recognition processing is executed on the rotated character string area obtained in S603. This may be the same character recognition processing as in S215 and S216, or it may be dedicated character recognition processing that uses a trained model whose target character set is limited to letters, digits, and the symbols allowed in e-mail addresses and URLs. The recognition result is stored in the RAM 113.

Next, in S605, the recognition result obtained in S604 is integrated with the recognition result obtained by the character recognition processing in S215. This integration can be restated as processing that converts the coordinate information of the S604 recognition result to match the coordinate system of the original document image and incorporates it into the S215 recognition result. For example, in the character recognition processing of S215 executed earlier, the coordinate system follows the business card image 501' in FIG. 5. In contrast, the character recognition processing of S604 in the post-processing targets the character string area 510, which is obtained by rotating the character string area 505' in the business card image 501' (character string areas 508 and 509 after division into line units) by 90 degrees to lay it horizontally, so the coordinate systems do not match. Therefore, the coordinate information of the recognition result for the character string area 510 obtained in S604 is converted into the coordinate system of 507, that is, so that it corresponds to the position of 505'. With the coordinate systems matched in this way, the recognition result obtained in the post-processing and the recognition result obtained earlier are combined into one. The integration result is stored in the RAM 113.
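The coordinate conversion and integration of S605 could be sketched as follows. This is only an illustration under stated assumptions: boxes are (left, top, width, height) tuples, the second-pass boxes are expressed in the frame of the document image rotated 90 degrees clockwise, and first-pass entries lying inside the re-recognized area are replaced by the second-pass entries. The function names and the sample data are hypothetical.

```python
def to_original_coords(box, original_height):
    """Undo a 90-degree clockwise image rotation for one (left, top, width, height) box.

    original_height is the height of the image before it was rotated.
    """
    left, top, width, height = box
    return (top, original_height - left - width, height, width)


def contains(outer, inner):
    """True if box `inner` lies entirely within box `outer`."""
    ol, ot, ow, oh = outer
    il, it, iw, ih = inner
    return il >= ol and it >= ot and il + iw <= ol + ow and it + ih <= ot + oh


def integrate(first_pass, second_pass, region, original_height):
    """Merge second-pass results into first-pass results in one coordinate system.

    first_pass:  results already in the original coordinate system.
    second_pass: results in the rotated coordinate system.
    region:      the re-recognized area, in the original coordinate system.
    Replacing (rather than appending to) the first-pass entries is an assumption.
    """
    kept = [r for r in first_pass if not contains(region, r["box"])]
    converted = [
        {"text": r["text"], "box": to_original_coords(r["box"], original_height)}
        for r in second_pass
    ]
    return kept + converted


# Hypothetical data for a 100 x 200 pixel card image.
first = [
    {"text": "山田", "box": (150, 20, 30, 60)},
    {"text": "?", "box": (10, 20, 5, 40)},  # sideways text misread in the first pass
]
second = [{"text": "a@b.jp", "box": (140, 10, 40, 5)}]
merged = integrate(first, second, region=(10, 20, 5, 40), original_height=200)
print(merged)
```

Keeping every box in the original coordinate system means downstream consumers of the merged result do not need to know that a second pass took place.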

The above is the content of the post-processing according to the present embodiment.

According to the present embodiment, when the target document is, for example, a vertical-type business card with vertical writing, the image is rotated so that the line direction becomes horizontal, character string detection processing for the horizontal direction is applied, and each detected character string area is returned to the orientation in which its characters are upright before character recognition processing is executed. This makes it possible to obtain highly accurate character recognition results. Furthermore, even when horizontally written alphanumeric characters are placed in a vertically written business card in a state rotated by 90 degrees, the post-processing rotates them so that the characters are upright, performs character recognition processing again, and integrates that recognition result with the earlier one. As a result, character recognition can be performed with high accuracy even on horizontally written e-mail addresses and URLs contained in a vertically written business card.
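The benefit of rotating before detection can be seen in a toy Python sketch. It assumes a binary image stored as a list of rows (1 = black pixel) and uses a simple row-projection detector as a stand-in for the embodiment's character string detection processing; all function names, the clockwise rotation direction, and the sample image are assumptions made for this example.

```python
def rotate_cw(image):
    """Rotate a 2D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def detect_horizontal_lines(image):
    """Stand-in horizontal detector: one box per run of consecutive non-empty rows.

    Returns (left, top, width, height) boxes.
    """
    boxes, start = [], None
    for y, row in enumerate(image + [[0] * len(image[0])]):  # sentinel empty row
        if any(row) and start is None:
            start = y
        elif not any(row) and start is not None:
            cols = [x for r in image[start:y] for x, px in enumerate(r) if px]
            boxes.append((min(cols), start, max(cols) - min(cols) + 1, y - start))
            start = None
    return boxes


def detect_lines(image, vertical_writing):
    """Apply the horizontal detector directly, or after rotating vertical text."""
    if not vertical_writing:
        return detect_horizontal_lines(image)
    height = len(image)
    rotated = rotate_cw(image)
    # Map each box found in the rotated image back to the original orientation.
    return [(t, height - l - w, h, w) for (l, t, w, h) in detect_horizontal_lines(rotated)]


# Two vertical "lines" of text: column 1 (rows 1-4) and column 3 (rows 0-2).
card = [
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(detect_lines(card, vertical_writing=False))  # both columns merged into one box
print(detect_lines(card, vertical_writing=True))   # one box per vertical line
```

In this toy case the horizontal detector applied directly merges the two vertical lines into a single box, whereas rotating first separates them.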

(Other Embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Claims

An information processing apparatus comprising:
determination means for determining whether a document image, in which the characters written in the document are upright, is written vertically or horizontally;
rotation means for performing a process of rotating the document image; and
detection means for performing a character string detection process suited to identifying horizontal character string areas,
wherein the detection means
performs the character string detection process on the document image that has not been rotated by the rotation means when the result of the determination is horizontal writing, and
performs the character string detection process on the document image rotated by 90 degrees by the rotation means when the result of the determination is vertical writing.

The information processing apparatus according to claim 1, wherein the detection means scans the document image horizontally for black pixel clusters, identifies a plurality of black pixel clusters whose mutual distances are within a certain range, and detects a circumscribed rectangular area surrounding the identified plurality of black pixel clusters as the character string area.

The information processing apparatus according to claim 1 or 2, further comprising character recognition means for performing a character recognition process on the character string area detected from the document image,
wherein, when the result of the determination is vertical writing, the character recognition means rotates the detected character string area by −90 degrees and performs a first character recognition process.

The information processing apparatus according to claim 3, wherein, when the character string area subjected to the first character recognition process is dominated by alphanumeric characters, the character recognition means rotates that character string area by 90 degrees and performs a second character recognition process.

The information processing apparatus according to claim 4, wherein the character recognition means
has means for generating line-by-line character string areas based on the result of the first character recognition process, and
when alphanumeric characters are dominant in a generated line-by-line character string area, rotates that character string area by 90 degrees and performs the second character recognition process.

The information processing apparatus according to claim 4 or 5, wherein the character recognition means has a means for integrating the result of the first character recognition process and the result of the second character recognition process.

The information processing apparatus according to any one of claims 4 to 6, wherein the second character recognition process is a character recognition process for recognizing a character type used for an e-mail address or a URL.

The information processing apparatus according to any one of claims 1 to 7, wherein the document is a business card.

An information processing method comprising:
a determination step of determining whether a document image, in which the characters written in the document are upright, is written vertically or horizontally;
a rotation step of performing a process of rotating the document image; and
a detection step of performing a character string detection process suited to identifying horizontal character string areas,
wherein, in the detection step,
the character string detection process is performed on the document image that has not been rotated when the result of the determination is horizontal writing, and
the character string detection process is performed on the document image rotated by 90 degrees in the rotation step when the result of the determination is vertical writing.

A program for causing a computer to function as the information processing apparatus according to any one of claims 1 to 8.