JP2009188995A - Image processing apparatus and image processing method

Info

Publication number: JP2009188995A
Application number: JP2009002820A
Authority: JP
Inventors: Masaaki Yasunaga; 真明安永; Atsushi Tabata; 淳田畑
Original assignee: Toshiba Corp; Toshiba TEC Corp
Current assignee: Toshiba Corp; Toshiba TEC Corp
Priority date: 2008-02-08
Filing date: 2009-01-08
Publication date: 2009-08-20

Abstract

PROBLEM TO BE SOLVED: To compress data appropriately while both suppressing growth in the buffer capacity used and reducing the time required for matching against a dictionary.

SOLUTION: In an image processing apparatus, a registration unit 42 registers extracted dictionary target data as Symbol information, assigns identification information to the Symbol information, and registers, in association with the identification information, Symbol position information indicating where in the image data the Symbol information exists. A dictionary buffer 43 holds the registered Symbol information and Symbol position information. When a setting unit 45 sets the output presence/absence information to a first state, an output unit 44 outputs the Symbol information and Symbol position information held in the dictionary buffer 43 to a compression-encoding unit 46.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an image processing apparatus and an image processing method, and more particularly to an image processing apparatus and an image processing method capable of compressing data.

Many data compression methods are conventionally known. One of them is entropy coding, typified by Huffman coding and arithmetic coding. As preprocessing for entropy coding, a modeling process extracts features of the data and transforms the data. Known modeling techniques include LZ77 and LZ78, data compression algorithms developed by Jacob Ziv and Abraham Lempel. LZ77 and LZ78 compress data by storing the position and length of characters that have already appeared. Besides LZ77 and LZ78, a modeling technique called universal coding (dictionary-based coding) is also known (hereinafter "dictionary coding"). As with LZ77, dictionary coding can be expected to compress more effectively as the amount of data grows.

One example of applying dictionary coding to image compression is Symbol Dictionary Coding in JBIG2, the bi-level image compression technique internationally standardized as ISO/IEC 14492. In this kind of compression, image bitmaps (for example, the bitmap of a single character) are registered in a dictionary, identical bitmaps are treated as the same dictionary entry, and the image is compressed using the dictionary together with position information. The method is therefore effective not only for character codes: once image bitmaps are registered in the dictionary, it is also effective for images with specific patterns, such as character images and halftone images. For example, when the character image "ABCBAD" is compressed without dictionary coding, the whole of "ABCBAD" is conventionally compressed as an image as it is. With dictionary coding, by contrast, because the image "A" appears twice, only one copy of the image "A" is held and the other occurrence of "A" is represented by position information alone, which reduces the amount of data. Specifically, in this case dictionary coding produces the following data.

Dictionary entries (Symbols), 4 kinds: "A", "B", "C", "D"
Dictionary entry (Symbol) position information, 6 items: (image A: position (0,0)), (image B: position (6,0)),
(image C: position (12,0)), (image B: position (18,0)),
(image A: position (24,0)), (image D: position (30,0))

Creating data by dictionary coding involves the following processing. An image to be dictionary-coded is extracted, and it is determined whether the extracted image is identical to an existing dictionary entry (Symbol). If it is identical to an existing entry, only the position information of that entry is registered; if it differs from every existing entry, both the Symbol information and its position information are registered. The Symbol information and Symbol position information are then compressed, which achieves a high compression ratio.
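The registration procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the JBIG2 algorithm: each glyph bitmap is stood in for by its character label, matching is plain equality, and the 6-pixel glyph pitch is read off the position list in the "ABCBAD" example. The function name `dictionary_encode` is hypothetical.

```python
def dictionary_encode(glyphs, pitch=6):
    """Split a row of glyphs into unique Symbols and per-occurrence position records."""
    symbols = []    # unique Symbols; the list index serves as the Symbol ID
    positions = []  # one (symbol_id, (x, y)) record per glyph occurrence
    for index, glyph in enumerate(glyphs):
        if glyph in symbols:
            # Already in the dictionary: reuse its ID, register the position only.
            symbol_id = symbols.index(glyph)
        else:
            # Not in the dictionary: register a new Symbol, then its position.
            symbol_id = len(symbols)
            symbols.append(glyph)
        positions.append((symbol_id, (index * pitch, 0)))
    return symbols, positions


symbols, positions = dictionary_encode("ABCBAD")
```

The search through `symbols` is linear in this sketch, so each lookup gets slower as the dictionary grows.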

Regarding faster compression, techniques have been proposed that aim at higher speed and higher compression by treating one unit of data (such as a character string) as a single dictionary candidate, or by bringing the positions to be compared closer together. These techniques are disclosed in Japanese Patent Laid-Open No. 10-150366 and Japanese Patent Laid-Open No. 2003-264703, respectively.

Japanese Patent Laid-Open No. 10-150366
Japanese Patent Laid-Open No. 2003-264703

However, with this dictionary-coding compression method, although the compression effect improves as the amount of data to be compressed grows, a correspondingly large buffer is needed to hold the dictionary entry (Symbol) data temporarily. In addition, as the number of dictionary entries (Symbols) grows, the time required to match an image to be dictionary-coded against the entries also increases.

The present invention has been made in view of this situation. Its object is to provide an image processing apparatus and an image processing method that can compress data suitably while both suppressing growth in buffer usage and reducing the time required for matching against the dictionary.

To solve the above problem, an image processing apparatus of the present invention includes: a scanning unit that reads image data of a document; an extraction unit that extracts dictionary target data from the image data; a registration unit that registers the dictionary target data extracted by the extraction unit as Symbol information, assigns identification information to the Symbol information, and registers Symbol position information, which indicates where in the image data the Symbol information exists, in association with the identification information; a dictionary buffer that holds the Symbol information and the Symbol position information registered by the registration unit; a compression encoding unit that compression-encodes the Symbol information and the Symbol position information held in the dictionary buffer using a predetermined encoding method; a setting unit that sets output presence/absence information, which concerns whether to output the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit, to a first state or a second state based on whether the number of dictionary-coded document sheets has reached a predetermined unit number; and an output unit that outputs the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit when the setting unit has set the output presence/absence information to the first state. When the dictionary target data extracted by the extraction unit matches any Symbol information held in the dictionary buffer, the registration unit does not register the dictionary target data as new Symbol information, and instead registers the Symbol position information in association with the identification information assigned to the matching Symbol information. When the extracted dictionary target data matches none of the Symbol information held in the dictionary buffer, the registration unit registers the dictionary target data as new Symbol information, assigns identification information to that Symbol information, and registers the Symbol position information in association with that identification information.

To solve the above problem, an image processing method of the present invention includes: reading image data of a document; extracting dictionary target data from the image data; registering the extracted dictionary target data as Symbol information, assigning identification information to the Symbol information, and registering Symbol position information, which indicates where in the image data the Symbol information exists, in association with the identification information; holding the registered Symbol information and Symbol position information; compression-encoding the held Symbol information and Symbol position information using a predetermined encoding method; setting output presence/absence information, which concerns whether to output the held Symbol information and Symbol position information, to a first state or a second state based on whether the number of dictionary-coded document sheets has reached a predetermined unit number; and outputting the Symbol information and the Symbol position information held in the dictionary buffer when the output presence/absence information is set to the first state. When the extracted dictionary target data matches any Symbol information held in the dictionary buffer, the dictionary target data is not registered as new Symbol information, and the Symbol position information is instead registered in association with the identification information assigned to the matching Symbol information. When the extracted dictionary target data matches none of the Symbol information held in the dictionary buffer, the dictionary target data is registered as new Symbol information, identification information is assigned to that Symbol information, and the Symbol position information is registered in association with that identification information.

To solve the above problem, another image processing apparatus of the present invention includes: a scanning unit that reads image data of a document; an extraction unit that extracts dictionary target data from the image data; a registration unit that registers the dictionary target data extracted by the extraction unit as Symbol information, assigns identification information to the Symbol information, and registers Symbol position information, which indicates where in the image data the Symbol information exists, in association with the identification information; a dictionary buffer that holds the Symbol information and the Symbol position information registered by the registration unit; a compression encoding unit that compression-encodes the Symbol information and the Symbol position information held in the dictionary buffer using a predetermined encoding method; a generation unit that generates, based on whether the number of dictionary-coded document sheets has reached a predetermined unit number, an output control signal for outputting the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit; and an output unit that outputs the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit in accordance with the output control signal generated by the generation unit. When the dictionary target data extracted by the extraction unit matches any Symbol information held in the dictionary buffer, the registration unit does not register the dictionary target data as new Symbol information, and instead registers the Symbol position information in association with the identification information assigned to the matching Symbol information. When the extracted dictionary target data matches none of the Symbol information held in the dictionary buffer, the registration unit registers the dictionary target data as new Symbol information, assigns identification information to that Symbol information, and registers the Symbol position information in association with that identification information.

According to the present invention, data can be compressed suitably while both suppressing growth in buffer usage and reducing the time required for matching against the dictionary.

A block diagram showing the configuration of the image processing apparatus according to the present embodiment.
A block diagram showing a functional configuration executable by the control unit in the first embodiment.
A diagram showing an example of an image to be dictionary-coded.
A diagram showing the kinds of dictionary entries (Symbols).
A flowchart explaining the dictionary coding process in the image processing apparatus of FIG. 2.
A diagram showing how the scanning unit scans a two-page document and dictionary-codes the image of the first page and the image of the second page.
A diagram showing state 1 of the dictionary buffer.
A diagram showing state 2 of the dictionary buffer.
A diagram showing state 3 of the dictionary buffer.
A diagram showing state 4 of the dictionary buffer.
A diagram showing state 5 of the dictionary buffer.
A diagram showing state 6 of the dictionary buffer.
A block diagram showing a functional configuration executable by the control unit in the second embodiment.
A diagram showing a table of the correspondence among the number of processed pages, the dictionary coding unit page count, the dictionary information output flag based on the dictionary coding unit page count, the final processing determination flag, and the dictionary information output flag.
A block diagram showing a functional configuration executable by the control unit in the third embodiment.
A diagram showing a table of the correspondence among the number of processed pages, the dictionary buffer usage, the dictionary buffer allowance, the dictionary information output flag based on the dictionary buffer usage, the final processing determination flag, and the dictionary information output flag.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the configuration of an image processing apparatus 1 according to the present embodiment. As shown in FIG. 1, the image processing apparatus 1 includes a control unit 11, a printer driving unit 12, an image data interface 13, a page memory 14, an image processing unit 15, a scanner unit 16, and an operation panel 17. The control unit 11 includes a CPU (Central Processing Unit) 21, a ROM (Read Only Memory) 22, a RAM (Random Access Memory) 23, a bus 24, an HDD (Hard Disk Drive) 25, and an external communication unit 26. The CPU 21 executes various processes in accordance with programs stored in the ROM 22 or application programs loaded from the HDD 25 into the RAM 23, and controls the image processing apparatus 1 as a whole by generating various control signals and supplying them to the respective units. The RAM 23 stores, as appropriate, data that the CPU 21 needs to execute the various processes. The CPU 21, the ROM 22, the RAM 23, and the HDD 25 are connected to one another via the bus 24. The external communication unit 26, which consists of a modem, a terminal adapter, a network interface, and the like, is also connected to the bus 24 and performs communication processing via a network 18.

The printer driving unit 12, the operation panel 17, and the image data interface 13 are connected to the control unit 11. The operation panel 17 includes a panel control unit 31, a display unit 32, and operation keys 33. The display unit 32 is, for example, an LCD (Liquid Crystal Display). The image processing unit 15 and the page memory 14 are connected to the image data interface 13, and the scanner unit 16 is connected to the image processing unit 15. The flow of image data during image formation is as follows. When a document is set on the platen glass, the scanner unit 16 reads the image data of the document and supplies it to the image processing unit 15. The image processing unit 15, which is implemented by, for example, an ASIC, acquires the image data supplied from the scanner unit 16 and applies processing such as shading correction, various kinds of filtering, gradation processing, and gamma correction. The image processing unit 15 may of course instead be implemented as software in the control unit 11 or elsewhere. The processed image data is stored in the page memory 14 via the image data interface 13 as necessary. The printer driving unit 12 operates under the control of the control unit 11.

In the present embodiment, two methods are assumed for scanning multiple documents and compressing the scanned data. In the first method, the scanner unit 16 first performs a multi-page scan of the documents, after which the control unit 11 stores the scanned data once as a temporary file on the HDD 25 of the control unit 11. The control unit 11 then loads the multi-page temporary file stored on the HDD 25 into the RAM 23 and compresses it. In the second method, while the scanner unit 16 performs the multi-page scan, the control unit 11 sequentially stores the scanned data as temporary files on the HDD 25. In parallel with storing the temporary files on the HDD 25, the control unit 11 sequentially loads the stored temporary files into the RAM 23 and compresses them. The present embodiment is applicable to either of these methods.

[First Embodiment]
FIG. 2 shows a functional configuration that can be executed by the control unit 11. As shown in FIG. 2, the image processing apparatus 1 includes, as the characteristic configuration of the present invention, a dictionary target data extraction unit 40, a Symbol match determination unit 41, a Symbol information / Symbol position information registration unit 42, a dictionary buffer 43, a dictionary information output unit 44, a dictionary information output flag setting unit 45, and a compression encoding unit 46. These components are implemented as software on the CPU 21 of the control unit 11.

The dictionary target data extraction unit 40 reads the image data obtained as scan data, extracts dictionary target data from the image data, and outputs the extracted dictionary target data to the Symbol match determination unit 41 and the Symbol information / Symbol position information registration unit 42.

The Symbol match determination unit 41 consists of a Symbol comparison unit 51 and a Symbol comparison result output unit 52. The Symbol comparison unit 51 determines by comparison whether the dictionary target data matches any of the Symbol information temporarily held (buffered) in the dictionary buffer 43, and outputs the match determination result (that is, a result indicating a match when the two match, or a result indicating no match when they do not) to the Symbol comparison result output unit 52. The Symbol comparison result output unit 52 acquires the match determination result from the Symbol comparison unit 51 and outputs it to the Symbol information / Symbol position information registration unit 42. When comparing, the Symbol comparison unit 51 does not require the dictionary target data and the buffered Symbol information to agree 100% in pixel values: if the correlation between them is at or above a required reference value, the Symbol comparison unit 51 determines that the two match.
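The tolerant comparison performed by the Symbol comparison unit 51 might look like the following sketch. The text does not define the correlation measure or the reference value, so both are assumptions here: similarity is taken as the fraction of agreeing pixels between two equal-sized binary bitmaps, and the threshold of 0.9 is arbitrary. The names `similarity` and `find_match` are hypothetical.

```python
def similarity(a, b):
    """Fraction of pixels that agree between two equal-sized binary bitmaps."""
    # Bitmaps of different shape are treated as non-matching (an assumption).
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def find_match(candidate, symbol_buffer, threshold=0.9):
    """Return the ID of the first buffered Symbol at or above the threshold, else None."""
    for symbol_id, symbol in enumerate(symbol_buffer):
        if similarity(candidate, symbol) >= threshold:
            return symbol_id
    return None
```

Returning the first Symbol that clears the threshold, rather than the best one, keeps the sketch short; a real matcher could equally pick the highest-scoring candidate.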

The Symbol information / Symbol position information registration unit 42 consists of a Symbol information / ID registration unit 53, an ID / Symbol position information registration unit 54, and a selector 55. The Symbol information / ID registration unit 53 registers the dictionary target data as new Symbol information and assigns an ID number to the registered Symbol information. It outputs the registered Symbol information to the Symbol information buffer 56 of the dictionary buffer 43 and outputs the ID number assigned to that Symbol information to the ID / Symbol position information registration unit 54. The ID / Symbol position information registration unit 54 registers Symbol position information, which indicates where in the image the dictionary target data being registered exists, in association with the ID number, and outputs the registered Symbol position information and ID number to the Symbol position information buffer 57 of the dictionary buffer 43. The selector 55 acquires the match determination result from the Symbol match determination unit 41. If the result indicates no match, the selector 55 controls the Symbol information / ID registration unit 53 and the ID / Symbol position information registration unit 54 so that the Symbol information / ID registration unit 53 first registers the Symbol information and assigns an ID number, and the ID / Symbol position information registration unit 54 then registers the Symbol position information. If the result indicates a match, the selector 55 controls the two units so that the processing by the Symbol information / ID registration unit 53 is skipped and the ID / Symbol position information registration unit 54 registers the Symbol position information.

The dictionary information output unit 44 consists of a gate 58 and a dictionary buffer initialization unit 59. When the dictionary information output flag set by the dictionary information output flag setting unit 45 is ON, the gate 58 reads from the dictionary buffer 43 the Symbol information buffered in the Symbol information buffer 56 and the Symbol position information buffered in the Symbol position information buffer 57, and outputs them to the compression encoding unit 46. After outputting the Symbol information and Symbol position information, the gate 58 outputs to the dictionary buffer initialization unit 59 an initialization ON signal for initializing the Symbol information buffer 56 and the Symbol position information buffer 57 of the dictionary buffer 43. When the dictionary information output flag is OFF, the gate 58 does not output the buffered Symbol information and Symbol position information to the compression encoding unit 46, and outputs to the dictionary buffer initialization unit 59 an initialization OFF signal, which indicates that the Symbol information buffer 56 and the Symbol position information buffer 57 are not to be initialized.

The dictionary buffer initialization unit 59 initializes the dictionary buffer 43 in accordance with the initialization ON/OFF signal from the gate 58. When the signal is an initialization ON signal, the unit generates an initialization signal and outputs it to the Symbol information buffer 56 and the Symbol position information buffer 57 of the dictionary buffer 43. When the signal is an initialization OFF signal, the unit generates no initialization signal, so none is output to the Symbol information buffer 56 or the Symbol position information buffer 57. The dictionary information output flag setting unit 45 sets the dictionary information output flag to ON or OFF based on a parameter for the dictionary unit page count, which the user enters by operating the operation panel 17. The compression encoding unit 46 acquires the Symbol information and the Symbol position information from the gate 58 of the dictionary information output unit 44 and compression-encodes them.
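
As a rough illustration of how the gate 58 and the dictionary buffer initialization unit 59 cooperate, the following Python sketch outputs and clears the buffers only when the flag is ON. The function name `gate`, the `encode` callback, and the data shapes are assumptions made for this sketch, not part of the specification.

```python
def gate(flag_on, symbols, positions, encode):
    """Mimic gate 58 together with dictionary buffer initialization unit 59.

    symbols   -- dict standing in for Symbol information buffer 56
    positions -- list standing in for Symbol position information buffer 57
    encode    -- callable standing in for compression encoding unit 46
    """
    if not flag_on:
        # Initialization OFF: nothing is output and buffering continues.
        return False
    # Flag ON: pass a snapshot of both buffers to the encoder first ...
    encode(dict(symbols), list(positions))
    # ... then initialize both buffers (initialization ON).
    symbols.clear()
    positions.clear()
    return True
```

Passing copies to `encode` before clearing keeps the order described in the text: output first, initialization afterwards.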

Conventionally, when the image "ABCBAD" shown in FIG. 3 is compressed using dictionary-based compression, the image "A" appears twice, so only one copy of the image "A" is retained and the remaining "A" is represented by position information alone, which reduces the amount of data. Specifically, dictionary-based compression creates the following data in this case.

Four dictionary entries (Symbols) shown in FIG. 4: "A", "B", "C", "D"
Six items of dictionary (Symbol) position information: (Image A: position (0,0)), (Image B: position (6,0)),
(Image C: position (12,0)), (Image B: position (18,0)),
(Image A: position (24,0)), (Image D: position (30,0))
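
The data listed above can be reproduced with a short Python sketch. It assumes that each glyph is identified by a single character and that glyphs are laid out on one row at a 6-pixel pitch, as the listed positions suggest; the function name `build_dictionary` and the `pitch` parameter are hypothetical.

```python
def build_dictionary(glyphs, pitch=6):
    """Build a Symbol list and a position list for one row of glyphs."""
    symbols = []     # unique Symbols, in order of first appearance
    positions = []   # one (Symbol, (x, y)) entry per glyph occurrence
    for index, glyph in enumerate(glyphs):
        if glyph not in symbols:
            symbols.append(glyph)
        positions.append((glyph, (index * pitch, 0)))
    return symbols, positions


symbols, positions = build_dictionary("ABCBAD")
print(symbols)         # ['A', 'B', 'C', 'D']
print(len(positions))  # 6
```

Six glyph images are thus stored as four Symbols plus six position entries.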

However, although the conventional dictionary-based compression method becomes more effective as the amount of data to be compressed grows, it requires a correspondingly large buffer to hold the dictionary (Symbol) data temporarily. In addition, as the number of dictionary entries (Symbols) grows, the time required to match a dictionary target image against the dictionary (Symbols) also increases. In the present embodiment, therefore, a timing control signal is supplied during dictionary processing so that the data held in the buffer is output and encoded each time a fixed condition is met. This prevents the buffer usage from growing with the amount of input image data and prevents the processing time from growing with the matching time. As a result, data can be suitably compressed while both the increase in buffer usage and the time required for dictionary matching are kept down.

The dictionary processing in the image processing apparatus 1 of FIG. 2 will be described with reference to the flowchart of FIG. 5. To simplify the description, it is assumed that the scanning unit 16 scans a two-page document as shown in FIG. 6 and that the images of the first and second pages are entered into the dictionary. It is also assumed that, before the dictionary processing starts, the user has entered "2" on the operation panel 17 as the value of the dictionary unit page count parameter. Based on this value, the dictionary information output flag setting unit 45 sets the dictionary information output flag to ON each time dictionary registration for two pages of images is completed.

In operation 1, the dictionary target data extraction unit 40 reads the image data obtained as scan data, extracts dictionary target data from it, and outputs the extracted dictionary target data to the Symbol match determination unit 41 and the Symbol information / Symbol position information registration unit 42. Specifically, the dictionary target data extraction unit 40 extracts dictionary target data from the image data of the first page shown in FIG. 6. It searches the first-page image data for black pixels from left to right and from top to bottom, and detects the black pixel at position (2, 0), where the upper-left corner is (0, 0). By extracting the connected component of the detected black pixel, it extracts the dictionary target data "A" from the first-page image data.
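
Operation 1 can be sketched in Python as a raster scan followed by a breadth-first search. Several points here are assumptions rather than statements from the specification: 8-connectivity, the binary list-of-rows image format, the toy bitmap, and the idea that the registered position is the upper-left corner of the component's bounding box. All function names are hypothetical.

```python
from collections import deque


def first_black_pixel(image):
    """Scan left to right, top to bottom; return (x, y) of the first black pixel."""
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value:
                return x, y
    return None


def connected_component(image, seed):
    """Collect all black pixels 8-connected to seed (assumed connectivity)."""
    height, width = len(image), len(image[0])
    seen = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and image[ny][nx] and (nx, ny) not in seen):
                    seen.add((nx, ny))
                    queue.append((nx, ny))
    return seen


def bounding_box_origin(pixels):
    """Upper-left corner of the bounding box of a pixel set."""
    return min(x for x, _ in pixels), min(y for _, y in pixels)


# Toy page: an "A"-like glyph whose apex is at (2, 0), then a second glyph.
page = [
    [0, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 0],
]
seed = first_black_pixel(page)                # (2, 0)
glyph = connected_component(page, seed)
print(seed, bounding_box_origin(glyph))       # (2, 0) (0, 0)
```

Under these assumptions the first black pixel is found at (2, 0) while the component's bounding box starts at (0, 0), which is one plausible reading of why the detected pixel and the registered Symbol position differ in the walkthrough.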

In operation 2, the Symbol comparison unit 51 of the Symbol match determination unit 41 determines whether the dictionary target data matches any of the Symbol information temporarily held (buffered) in the dictionary buffer 43, and outputs the match determination result to the Symbol comparison result output unit 52. The Symbol comparison result output unit 52 acquires the result and outputs it to the Symbol information / Symbol position information registration unit 42. Specifically, when the dictionary target data "A" is extracted from the first-page image data shown in FIG. 6, the Symbol "A" is not yet held in the Symbol information buffer 56 of the dictionary buffer 43, so the Symbol comparison unit 51 determines that the dictionary target data matches none of the Symbol information temporarily held in the dictionary buffer 43. Note that specific Symbols may be registered and held in advance in the Symbol information buffer 56.

When the Symbol comparison unit 51 determines in operation 2 that the dictionary target data matches none of the Symbol information temporarily held in the dictionary buffer 43, it outputs the match determination result to the Symbol comparison result output unit 52, which in turn outputs it to the Symbol information / Symbol position information registration unit 42. In operation 3, the Symbol information / ID registration unit 53 registers the dictionary target data as new Symbol information and assigns an ID number to it. Specifically, it registers the Symbol "A" as new Symbol information and assigns the ID number ID1 to it.

The Symbol information / ID registration unit 53 outputs the registered Symbol information to the Symbol information buffer 56 of the dictionary buffer 43 and outputs the ID number assigned to it to the ID / Symbol position information registration unit 54. The ID / Symbol position information registration unit 54 registers, in association with the ID number, Symbol position information indicating where in the image the dictionary target data being registered is located. Specifically, it registers the Symbol position information for the Symbol "A" (position (0, 0) on the first page) in association with the ID number ID1.

The ID / Symbol position information registration unit 54 outputs the registered Symbol position information and ID number to the Symbol position information buffer 57 of the dictionary buffer 43. The Symbol information buffer 56 and the Symbol position information buffer 57 buffer the Symbol information and the Symbol position information received from the Symbol information / Symbol position information registration unit 42. FIG. 7 shows state 1 of the dictionary buffer 43.

In operation 5, the gate 58 of the dictionary information output unit 44 determines whether the dictionary information output flag is ON. At the point where the dictionary target data "A" has been registered as Symbol information, dictionary registration for two pages of images is not yet complete, so the dictionary information output flag setting unit 45 sets the flag to OFF. With the flag OFF, the gate 58 does not output the Symbol information buffered in the Symbol information buffer 56 or the Symbol position information buffered in the Symbol position information buffer 57 to the compression encoding unit 46, and outputs an initialization OFF signal to the dictionary buffer initialization unit 59. The dictionary buffer initialization unit 59 therefore generates no initialization signal and does not initialize the Symbol information buffer 56 or the Symbol position information buffer 57.

The processing then returns to operation 1.

Next, the dictionary target data extraction unit 40 extracts further dictionary target data from the first-page image data shown in FIG. 6, excluding any Symbol already registered by the dictionary processing performed so far. It searches the first-page image data for black pixels from left to right and from top to bottom, and detects the black pixel at position (6, 0). By extracting the connected component of that pixel, it extracts the dictionary target data "B" from the first-page image data.

To exclude Symbols already registered by earlier dictionary processing, the registered Symbols may be removed beforehand from the image data input to the dictionary target data extraction unit 40.

The processing then proceeds to operation 2. The Symbol information buffer 56 of the dictionary buffer 43 holds the Symbol "A" but not yet the Symbol "B", so the Symbol comparison unit 51 determines that the dictionary target data matches none of the Symbol information temporarily held in the dictionary buffer 43. In operation 3, the Symbol information / ID registration unit 53 registers the Symbol "B" as new Symbol information and assigns the ID number ID2 to it. The ID / Symbol position information registration unit 54 registers the Symbol position information for the Symbol "B" (position (6, 0) on the first page) in association with the ID number ID2.

The Symbol information buffer 56 and the Symbol position information buffer 57 of the dictionary buffer 43 buffer the Symbol information and the Symbol position information received from the Symbol information / Symbol position information registration unit 42. FIG. 8 shows state 2 of the dictionary buffer 43. The processing then returns to operation 1.

Similarly, the dictionary target data extraction unit 40 searches the first-page image data shown in FIG. 6 for black pixels from left to right and from top to bottom, and detects the black pixel at position (12, 0). By extracting the connected component of that pixel, it extracts the dictionary target data "C" from the first-page image data.

The processing then proceeds to operation 2. The Symbol information buffer 56 of the dictionary buffer 43 holds the Symbols "A" and "B" but not yet the Symbol "C", so the Symbol comparison unit 51 determines that the dictionary target data matches none of the Symbol information temporarily held in the dictionary buffer 43. In operation 3, the Symbol information / ID registration unit 53 registers the Symbol "C" as new Symbol information and assigns the ID number ID3 to it. The ID / Symbol position information registration unit 54 registers the Symbol position information for the Symbol "C" (position (12, 0) on the first page) in association with the ID number ID3.

The Symbol information buffer 56 and the Symbol position information buffer 57 of the dictionary buffer 43 buffer the Symbol information and the Symbol position information received from the Symbol information / Symbol position information registration unit 42. FIG. 9 shows state 3 of the dictionary buffer 43.

Next, since dictionary registration for the first-page image is complete, dictionary registration is performed for the second-page image shown in FIG. 6. The dictionary target data extraction unit 40 searches the second-page image data for black pixels from left to right and from top to bottom, and detects the black pixel at position (0, 0). By extracting the connected component of that pixel, it extracts the dictionary target data "B" from the second-page image data.

The processing then proceeds to operation 2. The Symbol information buffer 56 of the dictionary buffer 43 holds the Symbols "A", "B", and "C", and thus already holds the Symbol "B", so the Symbol comparison unit 51 determines that the dictionary target data matches one of the Symbol information items temporarily held in the dictionary buffer 43. That is, it determines that the Symbol "B" extracted as dictionary target data matches the Symbol "B" from the first page held in the dictionary buffer 43. In operation 4, the ID / Symbol position information registration unit 54 registers the Symbol position information for this Symbol "B" (position (0, 0) on the second page) in association with the ID number ID2, which identifies the Symbol "B" from the first page.

The Symbol information buffer 56 and the Symbol position information buffer 57 of the dictionary buffer 43 buffer the Symbol information and the Symbol position information received from the Symbol information / Symbol position information registration unit 42. FIG. 10 shows state 4 of the dictionary buffer 43. The processing then returns to operation 1.

Next, the dictionary target data extraction unit 40 extracts further dictionary target data from the second-page image data shown in FIG. 6. It searches the second-page image data for black pixels from left to right and from top to bottom, and detects the black pixel at position (6, 0). By extracting the connected component of that pixel, it extracts the dictionary target data "A" from the second-page image data.

The processing then proceeds to operation 2. The Symbol information buffer 56 of the dictionary buffer 43 holds the Symbols "A", "B", and "C", and thus already holds the Symbol "A", so the Symbol comparison unit 51 determines that the dictionary target data matches one of the Symbol information items temporarily held in the dictionary buffer 43. That is, it determines that the Symbol "A" extracted as dictionary target data matches the Symbol "A" from the first page held in the dictionary buffer 43. In operation 4, the ID / Symbol position information registration unit 54 registers the Symbol position information for this Symbol "A" (position (6, 0) on the second page) in association with the ID number ID1, which identifies the Symbol "A" from the first page.

The Symbol information buffer 56 and the Symbol position information buffer 57 of the dictionary buffer 43 buffer the Symbol information and the Symbol position information received from the Symbol information / Symbol position information registration unit 42. FIG. 11 shows state 5 of the dictionary buffer 43. The processing then returns to operation 1.

The dictionary target data extraction unit 40 then extracts further dictionary target data from the second-page image data shown in FIG. 6. It searches the second-page image data for black pixels from left to right and from top to bottom, and detects the black pixel at position (12, 0). By extracting the connected component of that pixel, it extracts the dictionary target data "D" from the second-page image data.

The processing then proceeds to operation 2. The Symbol information buffer 56 of the dictionary buffer 43 does not yet hold the Symbol "D", so the Symbol comparison unit 51 determines that the dictionary target data matches none of the Symbol information temporarily held in the dictionary buffer 43.

When the Symbol comparison unit 51 determines in operation 2 that the dictionary target data matches none of the Symbol information temporarily held in the dictionary buffer 43, it outputs the match determination result to the Symbol comparison result output unit 52, which in turn outputs it to the Symbol information / Symbol position information registration unit 42. In operation 3, the Symbol information / ID registration unit 53 registers the Symbol "D" as new Symbol information and assigns the ID number ID4 to it. The ID / Symbol position information registration unit 54 registers the Symbol position information for the Symbol "D" (position (12, 0) on the second page) in association with the ID number ID4.

The ID / Symbol position information registration unit 54 outputs the registered Symbol position information and ID number to the Symbol position information buffer 57 of the dictionary buffer 43. The Symbol information buffer 56 and the Symbol position information buffer 57 buffer the Symbol information and the Symbol position information received from the Symbol information / Symbol position information registration unit 42. FIG. 12 shows state 6 of the dictionary buffer 43.

In operation 5, the gate 58 of the dictionary information output unit 44 determines whether the dictionary information output flag is ON. At the point where the dictionary target data "D" has been registered as Symbol information, dictionary registration for two pages of images is complete, so the dictionary information output flag setting unit 45 sets the flag to ON. In operation 6, with the flag ON, the gate 58 outputs the Symbol information buffered in the Symbol information buffer 56 and the Symbol position information buffered in the Symbol position information buffer 57 to the compression encoding unit 46, and outputs to the dictionary buffer initialization unit 59 an initialization ON signal for initializing the two buffers. The compression encoding unit 46 acquires the Symbol information and the Symbol position information from the gate 58 and compression-encodes them. After the buffered Symbol information and Symbol position information have been output to the compression encoding unit 46, the dictionary information output flag setting unit 45 sets the dictionary information output flag to OFF.

In operation 7, the dictionary buffer initialization unit 59 generates an initialization signal for initializing the dictionary buffer 43 and outputs it to the dictionary buffer 43, thereby initializing the Symbol information buffer 56 and the Symbol position information buffer 57.
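
The two-page walkthrough of operations 1 to 7 can be condensed into the following Python sketch. It is a simplification under stated assumptions: glyphs arrive already extracted as (character, position) pairs, the flag is evaluated once per page rather than after every Symbol, and the names `run` and `encode` are hypothetical.

```python
def run(pages, unit_pages, encode):
    """Simulate dictionary processing with an output every unit_pages pages."""
    symbols, positions = {}, []   # stand-ins for buffers 56 and 57
    for page_no, glyphs in enumerate(pages, start=1):
        for glyph, (x, y) in glyphs:
            # Operation 2: look for a matching Symbol already in the buffer.
            sym_id = next((i for i, s in symbols.items() if s == glyph), None)
            if sym_id is None:
                # Operation 3: register a new Symbol with the next ID.
                sym_id = len(symbols) + 1
                symbols[sym_id] = glyph
            # Operations 3 and 4: the position is registered in either case.
            positions.append((sym_id, page_no, x, y))
        if page_no % unit_pages == 0:
            # Operations 5 and 6: flag ON, output to the encoder.
            encode(dict(symbols), list(positions))
            # Operation 7: initialize both buffers.
            symbols.clear()
            positions.clear()
    return symbols, positions


pages = [
    [("A", (0, 0)), ("B", (6, 0)), ("C", (12, 0))],   # page 1
    [("B", (0, 0)), ("A", (6, 0)), ("D", (12, 0))],   # page 2
]
flushed = []
run(pages, 2, lambda s, p: flushed.append((s, p)))
print(flushed[0][0])   # {1: 'A', 2: 'B', 3: 'C', 4: 'D'}
```

The single output contains four Symbols (ID1 to ID4) and six position entries, and both buffers are empty afterwards.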

Thus, by operating the operation panel 17 before dictionary processing to specify the dictionary unit page count parameter, the user can freely set the timing at which dictionary information is output to the compression encoding unit 46 and the dictionary buffer 43 is initialized. Accordingly, one dictionary can be created per page, as is already known, or a single dictionary can be created after all pages have been combined into one file by multi-page scanning. Because the dictionary buffer 43 can be initialized frequently at whatever timing the user wants, the dictionary buffer 43 can be used sparingly, and dictionary processing can be executed suitably even when sufficient work memory cannot be secured. In this way, data can be suitably compressed while both the increase in buffer usage and the time required for dictionary matching are kept down. In particular, creating one dictionary per page keeps the memory area of the dictionary buffer 43 small. Creating one dictionary for all pages minimizes the file size. Creating one dictionary every few pages reduces the file size while still limiting the memory area of the dictionary buffer 43.
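
The trade-off between buffer size and output size can be illustrated with a small Python sketch. It counts Symbols only, treats each character as one glyph, and assumes that any remaining buffer content is output at the last page; the function `dictionary_cost` and these simplifications are assumptions, not part of the specification.

```python
def dictionary_cost(pages, unit_pages):
    """Return (total Symbols output, peak Symbols held in the buffer)."""
    total_symbols = 0
    peak = 0
    current = set()   # Symbols currently held in the buffer
    for page_no, glyphs in enumerate(pages, start=1):
        current.update(glyphs)
        peak = max(peak, len(current))
        # Output at each unit boundary and (assumed) at the last page.
        if page_no % unit_pages == 0 or page_no == len(pages):
            total_symbols += len(current)
            current = set()
    return total_symbols, peak


pages = ["ABC", "BAD"]
print(dictionary_cost(pages, 1))   # (6, 3)
print(dictionary_cost(pages, 2))   # (4, 4)
```

With one dictionary per page the buffer never holds more than three Symbols but six Symbols are output in total, whereas one dictionary for both pages outputs only four Symbols at the cost of a larger buffer.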

Of course, in the dictionary processing described with reference to the flowchart of FIG. 5, it was assumed for simplicity that, as shown in FIG. 6, the scanning unit 16 scans a two-page document and the images of the first and second pages are processed into a dictionary. However, when, for example, images of 10 pages are processed, the dictionary processing of the 10 pages can be executed while setting the dictionary information output flag to ON each time dictionary registration of two pages of images is completed.

[Second Embodiment]
FIG. 13 illustrates a functional configuration that can be executed by the control unit 11. As shown in FIG. 13, the image processing apparatus 1 includes, as the characteristic configuration of the present invention, a dictionary target data extraction unit 40, a Symbol match determination unit 41, a Symbol information / Symbol position information registration unit 42, a dictionary buffer 43, a dictionary information output unit 44, a dictionary information output flag setting unit 45, and a compression encoding unit 46. These components are implemented as software on the CPU 21 of the control unit 11. Components corresponding to those in FIG. 2 are denoted by the same reference numerals, and their description is omitted to avoid repetition.

The dictionary information output flag setting unit 45 includes a dictionary timing determination unit 61 and a calculation unit 62. The dictionary information output flag setting unit 45 sets the dictionary information output flag to ON or OFF based on a parameter for the dictionary unit page count and a parameter for the processed page count, both entered by the user through the operation panel 17. Specifically, the dictionary timing determination unit 61 determines, based on these two parameters, whether the dictionary timing has been reached, and sets the dictionary information output flag to ON when it has. For example, when the total number of pages to be dictionary-processed is 5 and the dictionary unit page count is 3, the dictionary timing determination unit 61 determines that the dictionary timing has been reached when the page currently being processed is the third page and sets the dictionary information output flag to ON; otherwise it determines that the dictionary timing has not been reached and sets the dictionary information output flag to OFF. The dictionary timing determination unit 61 outputs an ON/OFF setting signal of the dictionary information output flag based on the dictionary unit page count to the calculation unit 62. Meanwhile, the dictionary information output flag setting unit 45 sets a final processing determination flag to ON when the page currently being processed is the final page, and to OFF otherwise. The ON/OFF setting signal of this final processing determination flag is input to the calculation unit 62. The calculation unit 62 performs an OR operation on the ON/OFF setting signal of the dictionary information output flag based on the dictionary unit page count, received from the dictionary timing determination unit 61, and the ON/OFF setting signal of the final processing determination flag, and outputs the result to the dictionary information output unit 44 as the setting signal of the dictionary information output flag.

FIG. 14 is a table showing the correspondence among the processed page count, the dictionary unit page count, the dictionary information output flag based on the dictionary unit page count, the final processing determination flag, and the dictionary information output flag. As shown in FIG. 14, when the total number of pages to be dictionary-processed is 5 and the dictionary unit page count is 3, the dictionary information output flag based on the dictionary unit page count, the final processing determination flag, and the dictionary information output flag take the following values. When the current processed page count is "1", they are OFF, OFF, and OFF, respectively. When the current processed page count is "2", they are OFF, OFF, and OFF, respectively. When the current processed page count is "3", they are ON, OFF, and ON, respectively. When the current processed page count is "4", they are OFF, OFF, and OFF, respectively. When the current processed page count is "5", they are OFF, ON, and ON, respectively.
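The flag logic of the setting unit 45 can be illustrated with a minimal Python sketch that reproduces the rows of FIG. 14. The function names are hypothetical, and the rule "ON at every multiple of the dictionary unit page count" is an assumption generalized from the 5-page, 3-page-unit example; the text itself only states the result for that case.

```python
def unit_page_flag(current_page: int, unit_pages: int) -> bool:
    """Flag from the dictionary timing determination unit 61.

    Assumption: the dictionary timing is reached whenever the current
    page number is a multiple of the dictionary unit page count.
    """
    return current_page % unit_pages == 0


def final_page_flag(current_page: int, total_pages: int) -> bool:
    """Final processing determination flag: ON only on the last page."""
    return current_page == total_pages


def dictionary_output_flag(current_page: int, total_pages: int, unit_pages: int) -> bool:
    """OR of the two flags, as performed by the calculation unit 62."""
    return unit_page_flag(current_page, unit_pages) or final_page_flag(current_page, total_pages)


# Rows of FIG. 14: 5 pages in total, dictionary unit page count of 3.
fig14_flags = [dictionary_output_flag(page, 5, 3) for page in range(1, 6)]
print(fig14_flags)  # [False, False, True, False, True]
```

Under the same assumption, a 10-page job with a unit of 2 pages turns the flag ON at pages 2, 4, 6, 8, and 10.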

Accordingly, when the total number of pages to be dictionary-processed is 5 and the dictionary unit page count is 3, the Symbol information and Symbol position information registered in the dictionary buffer 43 are output to the compression encoding unit 46 at the third page and at the fifth page, and the Symbol information and Symbol position information buffered in the dictionary buffer 43 are then initialized. Even in the image processing apparatus 1 shown in FIG. 13, the dictionary processing itself is the same as the processing shown in FIG. 5. However, the timing at which the Symbol information and Symbol position information registered in the dictionary buffer 43 are encoded and initialized differs from that in the image processing apparatus 1 of FIG. 2.
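The output-then-initialize cycle can be sketched as follows. This is an illustrative Python model, not the apparatus itself: `build_segments`, the dictionary layout, and the use of a plain list in place of the compression encoding unit 46 are all assumptions, as is the modulo rule for the dictionary timing. The registration rule (a matching symbol only gains a new position entry, a new symbol gets a new ID) follows the registration unit 42 as described in the abstract.

```python
def build_segments(pages, unit_pages):
    """Return one dictionary segment per flush.

    pages: list of pages, each a list of (symbol, position) pairs.
    A flush stands in for handing the buffer contents to the
    compression encoding unit 46 and then initializing the buffer.
    """
    symbol_ids = {}  # Symbol information -> ID (hypothetical layout)
    positions = {}   # ID -> list of (page number, position)
    segments = []
    total_pages = len(pages)

    for page_no, page in enumerate(pages, start=1):
        for symbol, position in page:
            if symbol not in symbol_ids:
                # No match in the buffer: register as new Symbol information.
                symbol_ids[symbol] = len(symbol_ids)
                positions[symbol_ids[symbol]] = []
            # Match or not, record where the symbol occurred.
            positions[symbol_ids[symbol]].append((page_no, position))

        # Dictionary information output flag: unit boundary OR final page.
        if page_no % unit_pages == 0 or page_no == total_pages:
            segments.append({"symbols": symbol_ids, "positions": positions})
            symbol_ids, positions = {}, {}  # initialize the buffer

    return segments


pages = [
    [("A", (0, 0)), ("B", (8, 0))],
    [("A", (0, 0))],
    [("C", (4, 4))],
    [("A", (0, 0))],
    [("A", (8, 8)), ("B", (0, 0))],
]
segments = build_segments(pages, unit_pages=3)
print(len(segments))  # 2
```

With 5 pages and a unit of 3, the first segment covers pages 1 to 3 and the second covers pages 4 and 5, so symbol "A" is registered once in each segment.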

As a result, a dictionary can be created every fixed number of pages, and the dictionary timing can also be controlled at the final page. In the case of FIGS. 13 and 14, the parameter for the processed page count is received from the operation panel 17; however, the configuration is not limited to this, and the control unit 11 of the image processing apparatus 1 may count up the page number itself. Further, the unit is not limited to the processed page count, and the dictionary may be created per processing region (for example, half of a page or one third of a page). Because the dictionary buffer 43 can thus be initialized frequently at any timing the user desires, the dictionary buffer 43 can be used sparingly. As a result, the amount of work memory can be reduced more efficiently, and the user can easily adjust the balance between the code amount and the memory usage as desired. It is therefore possible not only to suppress an increase in buffer usage but also to compress data suitably while reducing the time required for matching against the dictionary.

[Third Embodiment]
FIG. 15 illustrates a functional configuration that can be executed by the control unit 11. As shown in FIG. 15, the image processing apparatus 1 includes, as the characteristic configuration of the present invention, a dictionary target data extraction unit 40, a Symbol match determination unit 41, a Symbol information / Symbol position information registration unit 42, a dictionary buffer 43, a dictionary information output unit 44, a dictionary information output flag setting unit 45, and a compression encoding unit 46. These components are implemented as software on the CPU 21 of the control unit 11. Components corresponding to those in FIGS. 2 and 13 are denoted by the same reference numerals, and their description is omitted as appropriate to avoid repetition.

The dictionary information output flag setting unit 45 includes a dictionary timing determination unit 61 and a calculation unit 62. The dictionary information output flag setting unit 45 sets the dictionary information output flag to ON or OFF based on the dictionary buffer usage and the dictionary buffer allowable amount obtained from the dictionary buffer 43. Specifically, the dictionary timing determination unit 61 determines, based on the dictionary buffer usage and the dictionary buffer allowable amount, whether the dictionary timing has been reached, and sets the dictionary information output flag to ON when it has. For example, when the dictionary buffer allowable amount is 100 KB and the dictionary buffer usage is 105 KB, the usage exceeds the allowable amount, so the dictionary timing determination unit 61 determines that the dictionary timing has been reached and sets the dictionary information output flag to ON; when the dictionary buffer usage is equal to or less than the dictionary buffer allowable amount, it determines that the dictionary timing has not been reached and sets the dictionary information output flag to OFF. The dictionary timing determination unit 61 outputs an ON/OFF setting signal of the dictionary information output flag based on the dictionary buffer usage to the calculation unit 62. The dictionary buffer 43 manages the dictionary buffer usage and increments it successively as the dictionary buffer is used. As the usage of the dictionary buffer 43, the usage of the Symbol information buffer 56 may be used, the usage of the Symbol position information buffer 57 may be used, or the total usage of the dictionary buffer 43 combining both may be used.

Meanwhile, the dictionary information output flag setting unit 45 sets the final processing determination flag to ON when the page currently being processed is the final page, and to OFF otherwise. The ON/OFF setting signal of this final processing determination flag is input to the calculation unit 62. The calculation unit 62 performs an OR operation on the ON/OFF setting signal of the dictionary information output flag based on the dictionary buffer usage, received from the dictionary timing determination unit 61, and the ON/OFF setting signal of the final processing determination flag, and outputs the result to the dictionary information output unit 44 as the setting signal of the dictionary information output flag. Note that the Symbol information and Symbol position information registered in the dictionary buffer 43 may instead be encoded and initialized before the dictionary buffer usage exceeds the dictionary buffer allowable amount.

FIG. 16 is a table showing the correspondence among the processed page count, the dictionary buffer usage, the dictionary buffer allowable amount, the dictionary information output flag based on the dictionary buffer usage, the final processing determination flag, and the dictionary information output flag. As shown in FIG. 16, when the dictionary buffer allowable amount is 100 KB, the dictionary information output flag based on the dictionary buffer usage, the final processing determination flag, and the dictionary information output flag take the following values. When the current dictionary buffer usage is "30 KB", they are OFF, OFF, and OFF, respectively. When the current dictionary buffer usage is "70 KB", they are OFF, OFF, and OFF, respectively. When the current dictionary buffer usage is "105 KB", they are ON, OFF, and ON, respectively. When the current dictionary buffer usage is "45 KB", they are OFF, OFF, and OFF, respectively. Even when the current dictionary buffer usage is "70 KB", if the current page is the final page, they are OFF, ON, and ON, respectively.
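The usage-based rule of this embodiment can be checked against the rows of FIG. 16 with a short Python sketch. The function names are hypothetical; the comparison is "usage strictly greater than the allowable amount", as stated for the 100 KB / 105 KB example.

```python
def usage_based_flag(used_kb: int, allowed_kb: int) -> bool:
    """ON only when the dictionary buffer usage exceeds the allowable amount."""
    return used_kb > allowed_kb


def output_flag(used_kb: int, allowed_kb: int, is_final_page: bool) -> bool:
    """OR of the usage-based flag and the final processing determination flag."""
    return usage_based_flag(used_kb, allowed_kb) or is_final_page


# Rows of FIG. 16 as (buffer usage in KB, is final page); allowable amount 100 KB.
FIG16_ROWS = [(30, False), (70, False), (105, False), (45, False), (70, True)]
fig16_flags = [output_flag(used, 100, final) for used, final in FIG16_ROWS]
print(fig16_flags)  # [False, False, True, False, True]
```

A usage of exactly 100 KB leaves the usage-based flag OFF under this reading, because the text treats "equal to or less than" the allowable amount as not yet the dictionary timing.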

Accordingly, when the total number of pages to be dictionary-processed is 5, the Symbol information and Symbol position information registered in the dictionary buffer 43 are output to the compression encoding unit 46 when the usage of the dictionary buffer 43 exceeds the dictionary buffer allowable amount and at the final page, and the Symbol information and Symbol position information buffered in the dictionary buffer 43 are then initialized. Even in the image processing apparatus 1 shown in FIG. 13, the dictionary processing itself is the same as the processing shown in FIG. 5.

As a result, the dictionary buffer 43 can be used efficiently while its usage is managed, a dictionary can be created with a fixed allowable amount of the dictionary buffer 43 as the reference, and the dictionary timing can also be controlled at the final page. This is particularly effective when the usage of the Symbol information buffer 56 or the usage of the Symbol position information buffer 57 of the dictionary buffer 43 is fixed in advance. As a result, the amount of work memory can be reduced more efficiently. It is therefore possible not only to suppress an increase in buffer usage but also to compress data suitably while reducing the time required for matching against the dictionary.

The series of processes described in the embodiments of the present invention can be executed by software, but can also be executed by hardware.

In the embodiments of the present invention, the steps of the flowcharts have been shown as processing performed in time series in the described order. However, the steps need not be processed in time series and also include processing executed in parallel or individually.

DESCRIPTION OF SYMBOLS 1 ... image processing apparatus, 11 ... control unit, 12 ... printer drive unit, 13 ... image data interface, 14 ... page memory, 15 ... image processing unit, 16 ... scanner unit, 17 ... operation panel, 21 ... CPU, 22 ... ROM, 23 ... RAM, 24 ... bus, 25 ... HDD, 26 ... external communication unit, 31 ... panel control unit, 32 ... display unit, 33 ... operation keys, 40 ... dictionary target data extraction unit, 41 ... Symbol match determination unit, 42 ... Symbol information / Symbol position information registration unit, 43 ... dictionary buffer, 44 ... dictionary information output unit, 45 ... dictionary information output flag setting unit, 46 ... compression encoding unit, 51 ... Symbol comparison unit, 52 ... Symbol comparison result output unit, 53 ... Symbol information / ID registration unit, 54 ... ID / Symbol position information registration unit, 55 ... selector, 56 ... Symbol information buffer, 57 ... Symbol position information buffer, 58 ... gate, 59 ... dictionary buffer initialization unit, 61 ... dictionary timing determination unit, 62 ... calculation unit.

Claims

An image processing apparatus comprising:
a scanning unit that reads image data relating to a document;
an extraction unit that extracts dictionary target data from the image data;
a registration unit that registers the dictionary target data extracted by the extraction unit as Symbol information, assigns identification information to the Symbol information, and registers Symbol position information, indicating where in the image data the Symbol information exists, in association with the identification information;
a dictionary buffer that holds the Symbol information and the Symbol position information registered by the registration unit;
a compression encoding unit that compresses and encodes the Symbol information and the Symbol position information held in the dictionary buffer using a predetermined encoding method;
a setting unit that sets output presence/absence information, indicating whether or not to output the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit, to a first state or a second state based on whether or not the number of dictionary-processed document sheets has reached a predetermined unit number; and
an output unit that outputs the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit when the output presence/absence information is set to the first state by the setting unit,
wherein, when the dictionary target data extracted by the extraction unit matches any of the Symbol information held in the dictionary buffer, the registration unit does not register the dictionary target data as new Symbol information and registers the Symbol position information in association with the identification information assigned to the Symbol information that matches the dictionary target data, and when the dictionary target data extracted by the extraction unit does not match any of the Symbol information held in the dictionary buffer, the registration unit registers the dictionary target data as new Symbol information, assigns the identification information to the Symbol information, and registers the Symbol position information in association with the identification information.

The image processing apparatus according to claim 1, wherein the output unit does not output the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit when the output presence/absence information is set to the second state by the setting unit.

The image processing apparatus according to claim 1, further comprising an initialization unit that initializes the Symbol information and the Symbol position information held in the dictionary buffer when the Symbol information and the Symbol position information held in the dictionary buffer have been output to the compression encoding unit by the output unit.

The image processing apparatus according to claim 1, wherein the setting unit sets the output presence/absence information to the first state when the number of dictionary-processed document sheets has reached the unit number, and sets the output presence/absence information to the second state when the number of dictionary-processed document sheets has not reached the unit number.

The image processing apparatus according to claim 4, wherein the setting unit sets the output presence/absence information to the first state when the number of dictionary-processed document sheets has reached the final number of sheets to be processed.

The image processing apparatus according to claim 1, wherein the setting unit changes the output presence/absence information from the first state to the second state after the Symbol information and the Symbol position information held in the dictionary buffer have been output to the compression encoding unit by the output unit.

The image processing apparatus according to claim 1, wherein the setting unit sets the output presence/absence information to the first state when the usage of the dictionary buffer holding one or both of the Symbol information and the Symbol position information has reached a predetermined reference value, and sets the output presence/absence information to the second state when that usage has not reached the predetermined reference value.

The image processing apparatus according to claim 1, wherein the predetermined unit number is specified in accordance with a predetermined operation.

An image processing method comprising:
reading image data relating to a document;
extracting dictionary target data from the image data;
registering the extracted dictionary target data as Symbol information, assigning identification information to the Symbol information, and registering Symbol position information, indicating where in the image data the Symbol information exists, in association with the identification information;
holding the registered Symbol information and Symbol position information in a dictionary buffer;
compressing and encoding the held Symbol information and Symbol position information using a predetermined encoding method;
setting output presence/absence information, indicating whether or not to output the held Symbol information and Symbol position information, to a first state or a second state based on whether or not the number of dictionary-processed document sheets has reached a predetermined unit number; and
outputting the Symbol information and the Symbol position information held in the dictionary buffer when the output presence/absence information is set to the first state,
wherein, when the extracted dictionary target data matches any of the Symbol information held in the dictionary buffer, the dictionary target data is not registered as new Symbol information and the Symbol position information is registered in association with the identification information assigned to the Symbol information that matches the dictionary target data, and when the extracted dictionary target data does not match any of the Symbol information held in the dictionary buffer, the dictionary target data is registered as new Symbol information, the identification information is assigned to the Symbol information, and the Symbol position information is registered in association with the identification information.

An image processing apparatus comprising:
a scanning unit that reads image data relating to a document;
an extraction unit that extracts dictionary target data from the image data;
a registration unit that registers the dictionary target data extracted by the extraction unit as Symbol information, assigns identification information to the Symbol information, and registers Symbol position information, indicating where in the image data the Symbol information exists, in association with the identification information;
a dictionary buffer that holds the Symbol information and the Symbol position information registered by the registration unit;
a compression encoding unit that compresses and encodes the Symbol information and the Symbol position information held in the dictionary buffer using a predetermined encoding method;
a generation unit that generates an output control signal for outputting the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit, based on whether or not the number of dictionary-processed document sheets has reached a predetermined unit number; and
an output unit that outputs the Symbol information and the Symbol position information held in the dictionary buffer to the compression encoding unit in accordance with the output control signal generated by the generation unit,
wherein, when the dictionary target data extracted by the extraction unit matches any of the Symbol information held in the dictionary buffer, the registration unit does not register the dictionary target data as new Symbol information and registers the Symbol position information in association with the identification information assigned to the Symbol information that matches the dictionary target data, and when the dictionary target data extracted by the extraction unit does not match any of the Symbol information held in the dictionary buffer, the registration unit registers the dictionary target data as new Symbol information, assigns the identification information to the Symbol information, and registers the Symbol position information in association with the identification information.