JP2006011966A

JP2006011966A - Character recognition device and character recognition program

Info

Publication number: JP2006011966A
Application number: JP2004190094A
Authority: JP
Inventors: Yutaka Koshi; 裕越; Shunichi Kimura; 俊一木村; Masanori Sekino; 雅則関野; Kazunori So; 一憲宋
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2004-06-28
Filing date: 2004-06-28
Publication date: 2006-01-12

Abstract

PROBLEM TO BE SOLVED: To provide a device that performs OCR processing at very high speed and with high precision on a document in which characters of varied size and shape are mixed.
SOLUTION: This character recognition device is provided with a storage means 320 storing hierarchical code data acquired by applying hierarchical coding processing to raster data of a document, a decoding means 340 performing decoding processing at a first bit rate on the whole or a part of a code string included in the hierarchical code data, and a recognition means 350 recognizing a character from an image obtained by the decoding processing. When the recognition accuracy of the recognition means 350 falls below a predetermined value, the bit rate in the decoding processing is switched from the first bit rate to a second bit rate higher than the first bit rate.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a technique for acquiring text data from an image of a document on which characters are written.

The process of acquiring, from raster data of a document, text data of the characters written in that document is generally called OCR (Optical Character Recognition) processing. In OCR processing, the images depicting individual characters are cut out from the raster data obtained by scanning the document, and each character in the document is recognized by matching the cut-out image against patterns prepared in advance as a dictionary. From the viewpoint of improving recognition accuracy, it is therefore preferable that the raster data obtained from the document have as high a resolution as possible (see Patent Documents 1 and 2). However, raising the resolution of the raster data requires a larger memory into which the data is expanded and increases the load on the processor that processes the expanded data. In other words, from the viewpoint of processing cost, high-resolution raster data is not preferable.
Various techniques have been proposed to satisfy these two conflicting requirements. For example, Patent Documents 3 to 5 propose character recognition techniques in which a document is first scanned at a relatively low resolution for a recognition attempt and, when the recognition rate is low, scanned again at a higher resolution. Patent Document 6 proposes a similar technique: the data processing apparatus it discloses varies the zoom magnification of the scanner according to the state of character recognition. The character recognition device disclosed in Patent Document 7 accepts input of an attribute (for example, character size) of the characters written on the document and determines, based on the input attribute, what scanning resolution is needed to secure sufficient recognition accuracy. The form classification processing system disclosed in Patent Document 8 performs character recognition on portions containing complex characters such as kanji using an image scanned at high resolution, while performing recognition on the other portions using an image scanned at low resolution.
Patent Document 1: JP-A-6-237379
Patent Document 2: JP-A-9-289624
Patent Document 3: JP 2000-29987 A
Patent Document 4: JP 2000-293633 A
Patent Document 5: JP 2001-103311 A
Patent Document 6: JP-A-6-20089
Patent Document 7: JP 2002-24766 A
Patent Document 8: JP-A-8-272884

As described above, various techniques have been proposed for improving the recognition rate of OCR processing with as small a circuit scale as possible, and most of them are based on the idea of choosing between character recognition using an image scanned at low resolution and character recognition using an image scanned at high resolution. In these conventional techniques, however, the choice between the high-resolution image and the low-resolution image is made per document, which makes them unsuitable when the document to be scanned mixes characters that are difficult to recognize (for example, kanji) with characters that are relatively easy to recognize (for example, kana and alphabetic characters). Although Patent Document 8 does mention scanning only a part at high resolution for character recognition, the form classification processing system it discloses assumes forms in which the locations of complex characters are fixed in advance; it does not assume general documents in which kanji, kana, alphabetic characters, and the like are mixed.
The present invention has been devised against this background, and its object is to provide an apparatus that performs OCR processing at extremely high speed and with high accuracy on a document in which characters of various sizes and shapes are mixed.

A character recognition device according to a preferred aspect of the present invention comprises: input means for inputting an image obtained by scanning a document on which one or more characters are written; encoding means for applying hierarchical encoding processing to the input image to acquire hierarchical code data containing code strings layered according to bit rate; storage means for storing the acquired hierarchical code data; decoding means for applying decoding processing at a specified bit rate to all or part of the code strings contained in the hierarchical code data stored in the storage means; recognition means for recognizing characters from the image obtained by the decoding processing; and control means for controlling the bit rate of the decoding processing based on the success or failure of recognition by the recognition means.

A character recognition device according to another preferred aspect of the present invention comprises: input means for inputting an image obtained by scanning a document on which one or more characters are written; encoding means for applying hierarchical encoding processing to the input image to acquire hierarchical code data containing code strings layered according to bit rate; storage means for storing the acquired hierarchical code data; decoding means for applying decoding processing at a first bit rate to all or part of the code strings contained in the hierarchical code data stored in the storage means; recognition means for recognizing characters from the image obtained by the decoding processing; and control means for switching the bit rate of the decoding processing to a second bit rate higher than the first bit rate when the accuracy of recognition by the recognition means falls below a predetermined value.
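
The control behavior described in these aspects can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `recognize_with_escalation`, the `decode` and `recognize` callables, and the generalization from two bit rates to an ascending list are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple


def recognize_with_escalation(
    code_data: bytes,
    bit_rates: Sequence[float],
    decode: Callable[[bytes, float], object],
    recognize: Callable[[object], Tuple[str, float]],
    min_accuracy: float,
) -> Tuple[str, float, float]:
    """Decode at the lowest bit rate first and move to a higher one
    only while the recognition accuracy stays below min_accuracy.

    Returns (text, accuracy, bit rate that was finally used).
    """
    if not bit_rates:
        raise ValueError("at least one bit rate is required")
    text, accuracy, used = "", 0.0, 0.0
    for rate in sorted(bit_rates):
        image = decode(code_data, rate)      # hypothetical decoder callable
        text, accuracy = recognize(image)    # hypothetical recognizer callable
        used = rate
        if accuracy >= min_accuracy:
            break                            # good enough, stop escalating
    return text, accuracy, used
```

Because the stored hierarchical code data is reused for every attempt, raising the bit rate in this sketch requires another decode but no second scan of the document.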

A character recognition device according to another preferred aspect of the present invention comprises: input means for inputting an image obtained by scanning a document on which one or more characters are written; encoding means for applying hierarchical encoding processing to the input image to acquire hierarchical code data containing code strings layered according to bit rate; storage means for storing the acquired hierarchical code data; decoding means for identifying, from the hierarchical code data stored in the storage means, the code string corresponding to an attention area in which character recognition is to be performed, and applying decoding processing at a first bit rate to the identified code string; recognition means for recognizing characters from the image obtained by the decoding processing; and control means for switching the bit rate of the decoding processing to a second bit rate higher than the first bit rate when the accuracy of recognition by the recognition means falls below a predetermined value.

In this aspect, the encoding means divides the input image into a plurality of mutually non-overlapping drawing areas and applies hierarchical encoding processing to each of the divided drawing areas, thereby acquiring hierarchical code data for each drawing area, and the decoding means sequentially specifies each of the divided drawing areas as the attention area.

A character recognition device according to another preferred aspect of the present invention comprises: input means for inputting an image obtained by scanning a document on which one or more characters are written; encoding means for applying hierarchical encoding processing to the input image to acquire hierarchical code data containing code strings layered according to bit rate; storage means for storing the acquired hierarchical code data; decoding means for identifying, from the hierarchical code data stored in the storage means, the code string corresponding to an attention area in which character recognition is to be performed, and applying decoding processing at a first bit rate to the identified code string; recognition means for recognizing characters from the image obtained by the decoding processing; area narrowing means for specifying, as a new attention area, an area within the attention area for which the accuracy of recognition by the recognition means fell below a predetermined value; and control means for switching the bit rate used when the decoding means decodes the new attention area to a second bit rate higher than the first bit rate.

In this aspect, the encoding means may divide the input image into a plurality of mutually non-overlapping drawing areas and apply hierarchical encoding processing to each of the divided drawing areas, thereby acquiring hierarchical code data for each drawing area, and the area specifying means may judge, for each of the divided drawing areas, whether the area is one in which the character recognition accuracy falls below the predetermined value.
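
The area narrowing described above can be illustrated with a short two-pass sketch. The names `narrow_regions`, `two_pass_recognition`, and `recognize_region` are hypothetical, and the recognizer is assumed to return a text and an accuracy value for a given region and bit rate.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical recognizer: (region id, bit rate) -> (text, accuracy)
Recognizer = Callable[[int, float], Tuple[str, float]]


def narrow_regions(accuracy: Dict[int, float], min_accuracy: float) -> List[int]:
    """Return the regions whose accuracy fell below the threshold."""
    return sorted(r for r, a in accuracy.items() if a < min_accuracy)


def two_pass_recognition(
    regions: Iterable[int],
    recognize_region: Recognizer,
    first_rate: float,
    second_rate: float,
    min_accuracy: float,
) -> Dict[int, str]:
    """Recognize every region at first_rate, then redo only the
    low-accuracy regions at the higher second_rate."""
    texts: Dict[int, str] = {}
    accuracy: Dict[int, float] = {}
    for region in regions:
        texts[region], accuracy[region] = recognize_region(region, first_rate)
    for region in narrow_regions(accuracy, min_accuracy):
        texts[region], accuracy[region] = recognize_region(region, second_rate)
    return texts
```

In this sketch only the regions that failed the first pass incur the cost of higher-rate decoding, which is the point of narrowing the attention area.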

A program according to another preferred aspect of the present invention causes a computer device equipped with input means for inputting an image obtained by scanning a document on which one or more characters are written, and with information storage means, to realize: an encoding function of applying hierarchical encoding processing to the input image to acquire hierarchical code data containing code strings layered according to bit rate; a data storage function of storing the acquired hierarchical code data in the storage means; a decoding function of applying decoding processing at a specified bit rate to all or part of the code strings contained in the hierarchical code data stored in the storage means; a recognition function of recognizing characters from the image obtained by the decoding processing; and a control function of controlling the bit rate of the decoding processing based on the success or failure of recognition by the recognition function.

A program according to another preferred aspect of the present invention causes a computer device equipped with input means for inputting an image obtained by scanning a document on which one or more characters are written, and with information storage means, to realize: an encoding function of applying hierarchical encoding processing to the input image to acquire hierarchical code data containing code strings layered according to bit rate; a data storage function of storing the acquired hierarchical code data in the storage means; a decoding function of applying decoding processing at a first bit rate to all or part of the code strings contained in the hierarchical code data stored in the storage means; a recognition function of recognizing characters from the image obtained by the decoding processing; and a control function of switching the bit rate of the decoding processing to a second bit rate higher than the first bit rate when the accuracy of recognition by the recognition function falls below a predetermined value.

A program according to another preferred aspect of the present invention causes a computer device equipped with input means for inputting an image obtained by scanning a document on which one or more characters are written, and with information storage means, to realize: an encoding function of applying hierarchical encoding processing to the input image to acquire hierarchical code data containing code strings layered according to bit rate; a data storage function of storing the acquired hierarchical code data in the storage means; a decoding function of identifying, from the hierarchical code data stored in the storage means, the code string corresponding to an attention area in which character recognition is to be performed, and applying decoding processing at a first bit rate to the identified code string; a recognition function of recognizing characters from the image obtained by the decoding processing; and a control function of switching the bit rate of the decoding processing to a second bit rate higher than the first bit rate when the accuracy of recognition by the recognition function falls below a predetermined value.

A program according to another preferred aspect of the present invention causes a computer device equipped with input means for inputting an image obtained by scanning a document on which one or more characters are written, and with information storage means, to realize: an encoding function of applying hierarchical encoding processing to the input image to acquire hierarchical code data containing code strings layered according to bit rate; a data storage function of storing the acquired hierarchical code data in the storage means; a decoding function of identifying, from the hierarchical code data stored in the storage means, the code string corresponding to an attention area in which character recognition is to be performed, and applying decoding processing at a first bit rate to the identified code string; a recognition function of recognizing characters from the image obtained by the decoding processing; an area narrowing function of specifying, as a new attention area, an area within the attention area for which the accuracy of recognition by the recognition function fell below a predetermined value; and a control function of switching the bit rate used when the decoding function decodes the new attention area to a second bit rate higher than the first bit rate.

According to the present invention, OCR processing on a document in which characters of various sizes and shapes are mixed can be executed at extremely high speed and with high accuracy while keeping the circuit scale small.

(First embodiment)
A character recognition device according to a first embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the schematic hardware configuration of the character recognition device according to this embodiment. As shown in the figure, the device is formed by connecting a scanner 100, an operator 200, a controller 300, and a communication interface 400.
The scanner 100 optically scans the document to be scanned, generates grayscale raster data, and supplies it to the controller 300. The operator 200 handles various input operations such as the instruction to start character recognition processing. The controller 300 acquires text data of the characters written in the document by applying OCR processing to the raster data. The communication interface 400 transmits the text data acquired by the controller 300 to an external computer device.

FIG. 2 is a block diagram showing the internal configuration of the controller 300. The controller 300 includes an encoding unit 310, a code data storage unit 320, an attention area specifying unit 330, a decoding unit 340, a character recognition unit 350, and a text data storage unit 360.
The encoding unit 310 applies hierarchical encoding processing to the raster data to acquire hierarchical code data, which is code data in a format conforming to the JPEG (Joint Photographic Experts Group) 2000 coding standard. Specifically, the encoding unit 310 contains a DC level shift unit, a tiling unit, a wavelet transform unit, a quantization unit, a coefficient bit modeling unit, a code string generation unit, and so on (none shown); when raster data is input, these units carry out the series of processes of DC level shift, tiling, wavelet transform, quantization, coefficient bit modeling, and code string generation to acquire the hierarchical code data.

Since the behavior of each unit contained in the encoding unit 310 belongs to the prior art, the details are omitted; here, the data structure of the hierarchical code data acquired by the encoding unit 310 is described.
FIG. 3 is a data structure diagram of the hierarchical code data. As shown in the figure, the code data consists of a main header MH and a plurality of segments called “tiles”. The main header MH contains attribute information such as the image size and the offset of the image position. Here, the “tile area”, a concept specific to the JPEG2000 coding standard, is explained. A “tile area” is one of the rectangular drawing areas of equal size, with no mutual overlap, obtained by dividing the entire drawing area of the input raster data. In JPEG2000, the various signal processing steps such as wavelet transform and encoding never refer to pixels across a tile-area boundary, and each tile area is handled as an independent unit. Removing the dependency between tile areas in this way saves memory during encoding and decoding. In the hierarchical code data, each of the segments called tiles corresponds to one tile area.
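
As a rough illustration of the tile-area division, the following sketch splits an image into non-overlapping rectangles and numbers them in raster order. The function name `tile_regions` and the tuple layout are hypothetical; clipping the last row and column when the image size is not a multiple of the tile size is an assumption of this example, since the text describes tiles of equal size.

```python
from typing import List, Tuple

# (tile number, left, top, right, bottom), right/bottom exclusive
Tile = Tuple[int, int, int, int, int]


def tile_regions(width: int, height: int, tile_w: int, tile_h: int) -> List[Tile]:
    """Divide a width x height image into non-overlapping tiles,
    numbered in raster order (left to right, then top to bottom)."""
    if min(width, height, tile_w, tile_h) <= 0:
        raise ValueError("all dimensions must be positive")
    tiles: List[Tile] = []
    number = 0
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            right = min(left + tile_w, width)    # clip at the image edge
            bottom = min(top + tile_h, height)
            tiles.append((number, left, top, right, bottom))
            number += 1
    return tiles
```

Each tuple produced here would correspond to one tile segment in the code data, which is what lets a single tile be decoded without touching its neighbors.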

As shown in FIG. 3, the segment of a tile T is divided into a tile header TH and tile data TD. The tile header TH contains attribute information such as the tile number assigned to each tile area in raster order. The tile data TD, in turn, is divided into a plurality of packets P, each of which contains a code string embedded for a given bit rate. “Embedded” means that code strings encoded at a plurality of bit rates are nested hierarchically within the code string representing one pixel. For example, packet P1 in FIG. 3 contains the code string obtained by encoding the raster data at the lowest bit rate, and packet P2 contains the difference between the code string obtained by encoding at a bit rate one step higher and the code string of packet P1, that is, the code string that is discarded in the quantization leading to the encoding of packet P1. Likewise, each packet from P3 onward recursively contains the difference from the code string obtained by encoding at a bit rate higher than that of the immediately preceding packet.

When the code string is embedded in this way, a proportional relationship holds between the bit rate of the bits that are decoded and the quality of the restored image. For example, suppose that hierarchical encoding processing is applied to the image of a certain tile area and code data containing an a-byte code string is generated. If this a-byte code string is embedded, an image can be obtained by decoding only b bytes (b < a) or c bytes (c < b) of it, and the image obtained has the same quality as if encoding and decoding had been performed with the bit rate set to b bytes or c bytes from the start.
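
The property of an embedded code string can be illustrated with a deliberately simplified toy scheme. The sketch below is not JPEG 2000 coding: it assumes 8-bit pixel values and treats each bit plane, from the most significant bit downward, as one layer. The names `encode_layers` and `decode_layers` are hypothetical.

```python
from typing import List


def encode_layers(pixels: List[int], bits: int = 8) -> List[List[int]]:
    """Layer i holds bit (bits - 1 - i) of every pixel, so layer 0 is
    the coarsest and each later layer refines the previous ones."""
    return [[(p >> (bits - 1 - i)) & 1 for p in pixels] for i in range(bits)]


def decode_layers(layers: List[List[int]], bits: int = 8) -> List[int]:
    """Rebuild pixel values from however many leading layers are given."""
    values = [0] * len(layers[0])
    for i, layer in enumerate(layers):
        shift = bits - 1 - i
        values = [v | (bit << shift) for v, bit in zip(values, layer)]
    return values


pixels = [200, 37, 255, 0]
layers = encode_layers(pixels)
full = decode_layers(layers)        # all layers: exact reconstruction
coarse = decode_layers(layers[:3])  # first three layers only
```

In this toy scheme, decoding only the first three layers yields the same values as quantizing each pixel to its top three bits up front, which mirrors the claim that a truncated embedded code string decodes like one produced at the lower rate.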

We now return to the description of FIG. 2.
The code data storage unit 320 is a buffer that temporarily stores the hierarchical code data supplied from the encoding unit 310.
The attention area specifying unit 330 identifies, from the hierarchical code data in the code data storage unit 320, the segment corresponding to the one tile area that serves as the attention area, extracts the code string of that segment, and supplies it to the decoding unit 340. The attention area specifying unit 330 has an image quality level counter (not shown) and increments it by one each time character recognition fails within the same area. When extracting a code string, it selects the code strings to extract according to the value of the image quality level counter. For example, if the counter value is “1”, it extracts and supplies only the code string contained in the top-layer packet; if the counter value is “2”, it extracts the code strings up to and including the second-layer packet and supplies the code string obtained by concatenating them. In other words, the attention area specifying unit 330 also functions as control means that switches the bit rate of the decoding processing performed by the decoding unit 340.
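
The packet selection driven by the image quality level counter might look like the following sketch. It assumes, purely for illustration, that the packets of one tile are available as a list of byte strings ordered from the top layer down; the function name `extract_code_string` is hypothetical, and real JPEG 2000 packet headers are ignored.

```python
from typing import Sequence


def extract_code_string(packets: Sequence[bytes], quality_level: int) -> bytes:
    """Concatenate the packets from the top layer down to the layer
    indicated by the image quality level counter (1-based)."""
    if not 1 <= quality_level <= len(packets):
        raise ValueError("quality level is outside the available layers")
    return b"".join(packets[:quality_level])


# Level 1 supplies only the top-layer packet; level 2 supplies the
# first two packets joined together, and so on.
tile_packets = [b"P1", b"P2", b"P3"]
level_1 = extract_code_string(tile_packets, 1)
level_2 = extract_code_string(tile_packets, 2)
```

Incrementing the counter after a failed recognition therefore lengthens the code string handed to the decoder, which is how the bit rate is raised in this sketch.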

The decoding unit 340 applies decoding processing to the code string that the attention area specifying unit 330 extracted from the hierarchical code data, and restores the image of the attention area as a pixel information group. Pixel information is information containing the address and pixel value of a pixel that renders the image. Specifically, the decoding unit 340 contains an arithmetic decoding unit, a coefficient bit modeling unit, an inverse quantization unit, an inverse wavelet transform unit, a DC level shift unit, and so on (none shown); when a code string is input, these units carry out the series of processes of arithmetic decoding, coefficient bit modeling, inverse quantization, inverse wavelet transform, and DC level shift to acquire the pixel information group of the attention area.

The character recognition unit 350 recognizes characters from the pixel information group of the attention area, and determines and outputs the accuracy of the character recognition based on the success or failure of that recognition. As detailed later in the description of the operation, when this recognition accuracy falls below a predetermined value, a pixel information group decoded at a higher bit rate by the decoding unit 340 is supplied to the character recognition unit 350 again.
The text data storage unit 360 is a buffer that temporarily stores the text data of the characters recognized by the character recognition unit 350.

Next, the character recognition processing, which is the operation characteristic of this embodiment, is described.
FIG. 4 is a flowchart showing the character recognition processing.
This processing starts when the user places a document on a document table (not shown) and instructs the start of character recognition processing through the operator 200.
When the start of character recognition processing is instructed, the scanner 100 optically scans the document, generates grayscale raster data, and supplies it to the controller 300 (S10). The supplied raster data is input to the encoding unit 310 of the controller 300.

The encoding unit 310 of the controller 300 applies hierarchical encoding processing to the input raster data and acquires hierarchical code data (S11). As described above, the hierarchical code data contains an embedded code string for each tile area.
Next, the attention area specifying unit 330 specifies, as the attention area, one of the tile areas obtained by dividing the raster data during the hierarchical encoding processing (S12).

The attention area specifying unit 330 refers to the tile headers contained in the hierarchical code data and specifies the tile data corresponding to the tile area specified in step 12 (S13).
The attention area specifying unit 330 then extracts, from the tile data specified in step 13, the code string of the layers corresponding to the image quality level indicated by its image quality level counter, and outputs it to the decoding unit 340 (S14). For example, if the value of the image quality level counter is “1”, only the code string contained in the packet of the highest layer of the tile data is extracted; if the value is “2”, the code strings contained in the packets down to the second layer are extracted and concatenated, and the resulting code string is supplied.
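
The layer extraction of step S14 can be pictured with a minimal Python sketch. The `TileData` class, its `layer_packets` field, and the `extract_code_string` function are hypothetical names introduced only for illustration. The sketch assumes each tile's packets are held as byte strings ordered from the highest layer downward, which the description implies but does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TileData:
    """Hypothetical container: one tile's packets, highest layer first."""
    tile_number: int
    layer_packets: List[bytes]


def extract_code_string(tile: TileData, quality_level: int) -> bytes:
    """Concatenate the packets of layers 1..quality_level.

    A level beyond the number of stored layers simply yields all layers
    (an assumption; the description does not cover this case).
    """
    if quality_level < 1:
        raise ValueError("quality_level must be at least 1")
    return b"".join(tile.layer_packets[:quality_level])


tile = TileData(tile_number=0, layer_packets=[b"L1", b"L2", b"L3"])
print(extract_code_string(tile, 1))  # b'L1'
print(extract_code_string(tile, 2))  # b'L1L2'
```

Because the code string is embedded, a higher image quality level only appends further packets to what was already extracted at the lower level.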

The decoding unit 340 decodes the code string input from the attention area specifying unit 330 to obtain a pixel information group that renders the image of the tile area, and outputs the obtained pixel information group to the character recognition unit 350 (S15).
The character recognition unit 350 attempts character recognition based on the pixel information group input from the decoding unit 340 (S16). Specifically, it cuts out the image of each character from the bitmap rendered by the pixel information group, and measures the logical distance between the feature amount extracted from each cut-out image and the feature amounts of characters prepared in advance as a dictionary (hereinafter referred to as “reference characters”). It then determines that the reference character with the smallest feature distance is the character drawn in the image.

The character recognition unit 350 determines whether the recognition accuracy in step 16 is below a predetermined value (S17). The recognition accuracy in this step corresponds to the feature distance: a larger distance means a lower accuracy. That is, if the feature distance to the reference character determined to be drawn is smaller than the predetermined value, the determination result of this step is “NO”; if it is larger than the predetermined value, the result is “YES”. When the attention area contains images of a plurality of characters, the result is “NO” if the average of the values obtained for the individual characters is smaller than the predetermined value, and “YES” if it is larger.
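
The matching of step S16 and the decision of step S17 can be sketched as follows. This is only an illustration under stated assumptions: the description speaks of a "logical distance" without naming a metric, so Euclidean distance is used here as a stand-in, and the function names `nearest_reference` and `recognition_failed` as well as the toy dictionary are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def nearest_reference(features: Sequence[float],
                      dictionary: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """S16: return the reference character with the smallest feature distance.

    Euclidean distance is an assumption made for this sketch.
    """
    best_char, best_dist = "", math.inf
    for char, ref_features in dictionary.items():
        dist = math.dist(features, ref_features)
        if dist < best_dist:
            best_char, best_dist = char, dist
    return best_char, best_dist


def recognition_failed(distances: Sequence[float], threshold: float) -> bool:
    """S17: True ("YES") when the mean feature distance exceeds the threshold."""
    return sum(distances) / len(distances) > threshold


# Toy dictionary of reference characters with two-dimensional features.
dictionary = {"A": [0.0, 0.0], "B": [3.0, 4.0]}
print(nearest_reference([0.0, 1.0], dictionary))  # ('A', 1.0)
print(recognition_failed([1.0, 5.0], 2.0))        # True
```

Averaging the per-character distances gives one decision per attention area, which matches the handling of areas that contain several characters.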

When the determination result in step 17 is “YES”, the character recognition unit 350 supplies a signal indicating that character recognition has failed to the attention area specifying unit 330 (S18). On receiving the signal, the attention area specifying unit 330 adds “1” to the value of its image quality level counter (S19). The process then returns to step 14, where the attention area specifying unit 330 extracts the code string corresponding to the new image quality level, and the processing from step 15 onward is executed in sequence.

When the determination result in step 17 is “NO”, the character recognition unit 350 outputs the text data obtained by the recognition in step 16 to the text data storage unit 360 (S20). The text data storage unit 360 stores the output text data in sequence. Next, the character recognition unit 350 supplies a signal indicating that character recognition has succeeded to the attention area specifying unit 330 (S21). On receiving the signal, the attention area specifying unit 330 resets the value of its image quality level counter to “0” (S22). The process then returns to step 12, where the attention area specifying unit 330 specifies another tile area as the attention area, and the processing from step 13 onward is repeated for this new attention area.
When this series of processing has been executed with every tile area as the attention area, the text data stored in the text data storage unit 360 is read out and transmitted to an external computer device via the communication interface 400.
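
The control flow of steps S12 to S22 can be summarized in a short Python sketch. It is a sketch under stated assumptions, not the patented implementation: `decode` and `recognize` are caller-supplied stand-ins for the decoding unit 340 and the character recognition unit 350, each tile is modeled as a list of layer packets, the counter starts at 1 for every tile, and the loop gives up raising the level once a tile has no further layers (the description does not say what happens in that case).

```python
from typing import Any, Callable, List, Sequence, Tuple


def recognize_tiles(
    tiles: Sequence[Sequence[bytes]],
    decode: Callable[[bytes], Any],
    recognize: Callable[[Any], Tuple[str, float]],
    threshold: float,
) -> List[Tuple[str, int]]:
    """Recognize each tile, raising the image quality level only on failure.

    Returns one (text, level_used) pair per tile.
    """
    results = []
    for layer_packets in tiles:                      # S12, S13: next attention area
        level = 1
        while True:
            code = b"".join(layer_packets[:level])   # S14: layers 1..level
            image = decode(code)                     # S15
            text, distance = recognize(image)        # S16
            no_more_layers = level >= len(layer_packets)  # assumption: stop here
            if distance <= threshold or no_more_layers:   # S17 "NO"
                results.append((text, level))        # S20
                break                                # S21, S22: move to next tile
            level += 1                               # S18, S19: retry at higher level
    return results


# Toy stand-ins: the "image" is the code length, and distance shrinks with it.
tiles = [[b"a", b"b", b"c"], [b"aaaa"]]
out = recognize_tiles(tiles, decode=len,
                      recognize=lambda n: (f"t{n}", 1.0 / n), threshold=0.4)
print(out)  # [('t3', 3), ('t4', 1)]
```

In the toy run the first tile needs all three layers, while the second is accepted at the lowest level and is never decoded at a higher bit rate.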

In the present embodiment described above, the encoding unit 310 performs hierarchical encoding processing on the raster data of a document on which characters are written, and acquires hierarchical code data. The decoding unit 340, which supplies the character recognition unit 350 with the pixel information group obtained by decoding this hierarchical code data, supplies a pixel information group decoded at a higher bit rate each time the character recognition unit 350 feeds back that character recognition has failed. With this functional configuration, the present embodiment can obtain highly accurate character recognition results while keeping the decoding load as low as possible.

(Second Embodiment)
In the above embodiment, the decoding processing and the character recognition processing are executed for each tile area obtained by dividing the raster data, and the processing target moves to another tile area only after character recognition has been completed for one tile area.
In contrast, the present embodiment first decodes the code string of the entire drawing area of the raster data at the lowest bit rate and attempts character recognition, then narrows the target down to the tile areas whose character recognition accuracy fell below a predetermined value. Thereafter, it repeats the process of decoding the code strings of the extracted tile areas at a higher bit rate, attempting character recognition, and further narrowing down the tile areas whose accuracy is still below the predetermined value.

Since the hardware configuration of this embodiment is the same as that of the first embodiment, its description is omitted.
Next, the character recognition process, which is the characteristic operation of the present embodiment, will be described.
FIG. 5 is a flowchart showing the character recognition process.

In this figure, the processing in steps 20 and 21 is the same as in steps 10 and 11 shown in FIG. 4. In the present embodiment, processing corresponding to steps 12 and 13 of FIG. 4 is not performed.
When the hierarchical code data is acquired in step 21, the attention area specifying unit 330 extracts, from all the tile data contained in the hierarchical code data, the code strings of the layers corresponding to the image quality level indicated by its image quality level counter, and outputs them to the decoding unit 340 (S24).

The decoding unit 340 then decodes the code strings input from the attention area specifying unit 330 to obtain the pixel information groups that render the images of the respective tile areas, and outputs them to the character recognition unit 350 (S25). That is, in the present embodiment, the set of pixel information groups for the tile areas is supplied to the character recognition unit 350 together.
The character recognition unit 350 attempts character recognition based on the pixel information groups input from the decoding unit 340 (S26). It then determines whether there is any tile area for which the recognition accuracy in step 26 fell below a predetermined value (S27).

When the determination result in step 27 is “YES”, the character recognition unit 350 supplies the tile numbers of the tile areas whose character recognition accuracy fell below the predetermined value to the attention area specifying unit 330 (S28). At this time, for each tile area whose character recognition accuracy exceeded the predetermined value, the character recognition unit 350 stores the text data output as the recognition result of that area in its own memory (not shown).

On acquiring the tile numbers, the attention area specifying unit 330 adds “1” to the value of its image quality level counter (S29). It then refers to the tile headers contained in the hierarchical code data and specifies the tile data of the one or more tile areas indicated by the acquired tile numbers (S30). Further, from the tile data specified in step 30, it extracts the code strings of the layers corresponding to the image quality level indicated by its image quality level counter and outputs them to the decoding unit 340 (S31). In other words, the attention area specifying unit 330 of the present embodiment also functions as an area narrowing unit that specifies, as a new attention area, an area whose recognition accuracy by the character recognition unit 350 fell below the predetermined value. After this processing, the flow returns to step 26, where the character recognition unit 350 attempts character recognition, and the processing from step 27 onward is executed in sequence. The series of processing from step 26 to step 31 is repeated until an accuracy exceeding the predetermined value has been output for every tile area.

When the determination result in step 27 is “NO”, the character recognition unit 350 outputs the series of text data obtained by the character recognition in step 26 to the text data storage unit 360 (S32). The text data storage unit 360 stores the output text data in sequence, and the process ends.
According to the present embodiment described above, each time character recognition is performed, the tile areas whose recognition accuracy fell below the predetermined value are narrowed down while the bit rate at which the code strings of the hierarchical code data are decoded is raised step by step. The area on which character recognition is performed using high-quality images can therefore be narrowed progressively, reducing the processing load of character recognition.
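
The narrowing procedure of the second embodiment can be sketched in Python as follows. As before, this is an illustration under assumptions rather than the patented implementation: `decode` and `recognize` are caller-supplied stand-ins, each tile is modeled as a list of layer packets, and a tile that has run out of layers is accepted as-is (the description does not cover that case). The name `recognize_by_narrowing` is hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def recognize_by_narrowing(
    tiles: Sequence[Sequence[bytes]],
    decode: Callable[[bytes], Any],
    recognize: Callable[[Any], Tuple[str, float]],
    threshold: float,
) -> List[str]:
    """Process all tiles at the lowest level, then revisit only the failures."""
    results: Dict[int, str] = {}
    pending = list(range(len(tiles)))     # S24: every tile in the first pass
    level = 1
    while pending:
        still_failing = []
        for n in pending:
            code = b"".join(tiles[n][:level])
            text, distance = recognize(decode(code))    # S25, S26
            no_more_layers = level >= len(tiles[n])     # assumption: accept as-is
            if distance <= threshold or no_more_layers:
                results[n] = text         # kept until the final output
            else:
                still_failing.append(n)   # S28: report the tile number
        pending = still_failing           # S30: narrowed attention area
        level += 1                        # S29: next image quality level
    return [results[n] for n in range(len(tiles))]      # S32


tiles = [[b"a", b"b", b"c"], [b"aaaa"], [b"aa", b"b"]]
texts = recognize_by_narrowing(tiles, decode=len,
                               recognize=lambda n: (f"t{n}", 1.0 / n),
                               threshold=0.4)
print(texts)  # ['t3', 't4', 't3']
```

In the toy run three tiles are decoded in the first pass, two in the second, and one in the third, which is the progressive shrinking of the high-quality decoding area that the embodiment aims at.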

(Other Embodiments)
The present invention can be modified in various ways.
In the above embodiments, encoding and decoding are performed in accordance with the JPEG2000 coding standard, but they may instead be performed in accordance with another coding standard such as JBIG (Joint Bi-level Image Experts Group).
A program that realizes functions equivalent to those of the encoding unit 310, the attention area specifying unit 330, the decoding unit 340, and the character recognition unit 350 of the above embodiments may be installed on a general-purpose computer device so that the processor of the computer device executes processing equivalent to that of each unit. Such a program may be distributed on a storage medium such as a CD-ROM, or may be delivered from a server device on a network in response to a request from a client device.
In the above embodiments, the character recognition unit 350 is responsible both for recognizing characters from the pixel information group of the attention area and for obtaining and outputting the character recognition accuracy based on the success or failure of that recognition, but these functions may be realized by separate units. In the second embodiment, the attention area specifying unit 330 is responsible for specifying, as a new attention area, an area whose recognition accuracy by the character recognition unit 350 fell below the predetermined value, but this function may be assigned to a unit independent of the attention area specifying unit 330.
The code string obtained by encoding the raster data need not be embedded. For example, the raster data may be encoded separately at a plurality of bit rates to obtain code data containing each code string, and character recognition may be performed using these code data in order, starting from the lowest image quality. Expressed conceptually, the configuration and operation of this modification are: “A character recognition device comprising: input means for inputting an image obtained by scanning a document on which one or more characters are written; encoding means for separately applying encoding processing at different bit rates to the input image and acquiring the code data obtained by encoding the image at each bit rate; storage means for storing the acquired code data; extraction means for extracting code data of a first bit rate from the storage means; decoding means for performing decoding processing on the extracted code data; recognition means for recognizing characters from the image obtained by the decoding processing; and control means for switching, when the accuracy of recognition by the recognition means falls below a predetermined value, the bit rate used by the extraction means as the key for extracting code data to a second bit rate higher than the first bit rate.”

Schematic hardware configuration diagram of the character recognition device.
Block diagram showing the internal configuration of the controller.
Data structure diagram of the hierarchical code data.
Flowchart showing the character recognition process (first embodiment).
Flowchart showing the character recognition process (second embodiment).

Explanation of symbols

100 ... Scanner, 200 ... Operator, 300 ... Controller, 310 ... Encoding unit, 320 ... Code data storage unit, 330 ... Attention area specifying unit, 340 ... Decoding unit, 350 ... Character recognition unit, 360 ... Text data storage unit.

Claims

A character recognition device comprising:
input means for inputting an image obtained by scanning a document on which one or more characters are written;
encoding means for performing hierarchical encoding processing on the input image and acquiring hierarchical code data including a code string hierarchized according to bit rate;
storage means for storing the acquired hierarchical code data;
decoding means for performing decoding processing at a specified bit rate on all or part of the code string included in the hierarchical code data stored in the storage means;
recognition means for recognizing characters from the image obtained by the decoding processing; and
control means for controlling the bit rate in the decoding processing based on the success or failure of recognition by the recognition means.

A character recognition device comprising:
input means for inputting an image obtained by scanning a document on which one or more characters are written;
encoding means for performing hierarchical encoding processing on the input image and acquiring hierarchical code data including a code string hierarchized according to bit rate;
storage means for storing the acquired hierarchical code data;
decoding means for performing decoding processing at a first bit rate on all or part of the code string included in the hierarchical code data stored in the storage means;
recognition means for recognizing characters from the image obtained by the decoding processing; and
control means for switching the bit rate in the decoding processing to a second bit rate higher than the first bit rate when the accuracy of recognition by the recognition means falls below a predetermined value.

A character recognition device comprising:
input means for inputting an image obtained by scanning a document on which one or more characters are written;
encoding means for performing hierarchical encoding processing on the input image and acquiring hierarchical code data including a code string hierarchized according to bit rate;
storage means for storing the acquired hierarchical code data;
decoding means for identifying, from the hierarchical code data stored in the storage means, the code string corresponding to an attention area that is the target of character recognition, and performing decoding processing at a first bit rate on the identified code string;
recognition means for recognizing characters from the image obtained by the decoding processing; and
control means for switching the bit rate in the decoding processing to a second bit rate higher than the first bit rate when the accuracy of recognition by the recognition means falls below a predetermined value.

The character recognition device according to claim 3, wherein
the encoding means divides the input image into a plurality of non-overlapping drawing areas, performs hierarchical encoding processing on each divided drawing area, and acquires hierarchical code data for each drawing area, and
the decoding means sequentially specifies each of the divided drawing areas as the attention area.

A character recognition device comprising:
input means for inputting an image obtained by scanning a document on which one or more characters are written;
encoding means for performing hierarchical encoding processing on the input image and acquiring hierarchical code data including a code string hierarchized according to bit rate;
storage means for storing the acquired hierarchical code data;
decoding means for identifying, from the hierarchical code data stored in the storage means, the code string corresponding to an attention area that is the target of character recognition, and performing decoding processing at a first bit rate on the identified code string;
recognition means for recognizing characters from the image obtained by the decoding processing;
area narrowing means for specifying, as a new attention area, an area within the attention area for which the accuracy of recognition by the recognition means fell below a predetermined value; and
control means for switching the bit rate at which the decoding means performs decoding processing for the new attention area to a second bit rate higher than the first bit rate.

The character recognition device according to claim 5, wherein
the encoding means divides the input image into a plurality of non-overlapping drawing areas, performs hierarchical encoding processing on each divided drawing area, and acquires hierarchical code data for each drawing area, and
the area narrowing means determines, for each of the divided drawing areas, whether the area is one whose character recognition accuracy fell below the predetermined value.

A program for causing a computer device, which comprises input means for inputting an image obtained by scanning a document on which one or more characters are written and information storage means, to realize:
an encoding function for performing hierarchical encoding processing on the input image and acquiring hierarchical code data including a code string hierarchized according to bit rate;
a data storage function for storing the acquired hierarchical code data in the storage means;
a decoding function for performing decoding processing at a specified bit rate on all or part of the code string included in the hierarchical code data stored in the storage means;
a recognition function for recognizing characters from the image obtained by the decoding processing; and
a control function for controlling the bit rate in the decoding processing based on the success or failure of recognition by the recognition function.

A program for causing a computer device, which comprises input means for inputting an image obtained by scanning a document on which one or more characters are written and information storage means, to realize:
an encoding function for performing hierarchical encoding processing on the input image and acquiring hierarchical code data including a code string hierarchized according to bit rate;
a data storage function for storing the acquired hierarchical code data in the storage means;
a decoding function for performing decoding processing at a first bit rate on all or part of the code string included in the hierarchical code data stored in the storage means;
a recognition function for recognizing characters from the image obtained by the decoding processing; and
a control function for switching the bit rate in the decoding processing to a second bit rate higher than the first bit rate when the accuracy of recognition by the recognition function falls below a predetermined value.

A program for causing a computer device, which comprises input means for inputting an image obtained by scanning a document on which one or more characters are written and information storage means, to realize:
an encoding function for performing hierarchical encoding processing on the input image and acquiring hierarchical code data including a code string hierarchized according to bit rate;
a data storage function for storing the acquired hierarchical code data in the storage means;
a decoding function for identifying, from the hierarchical code data stored in the storage means, the code string corresponding to an attention area that is the target of character recognition, and performing decoding processing at a first bit rate on the identified code string;
a recognition function for recognizing characters from the image obtained by the decoding processing; and
a control function for switching the bit rate in the decoding processing to a second bit rate higher than the first bit rate when the accuracy of recognition by the recognition function falls below a predetermined value.

A program for causing a computer device, which comprises input means for inputting an image obtained by scanning a document on which one or more characters are written and information storage means, to realize:
an encoding function for performing hierarchical encoding processing on the input image and acquiring hierarchical code data including a code string hierarchized according to bit rate;
a data storage function for storing the acquired hierarchical code data in the storage means;
a decoding function for identifying, from the hierarchical code data stored in the storage means, the code string corresponding to an attention area that is the target of character recognition, and performing decoding processing at a first bit rate on the identified code string;
a recognition function for recognizing characters from the image obtained by the decoding processing;
an area narrowing function for specifying, as a new attention area, an area within the attention area for which the accuracy of recognition by the recognition function fell below a predetermined value; and
a control function for switching the bit rate at which the decoding function performs decoding processing for the new attention area to a second bit rate higher than the first bit rate.