JP2008193580A

JP2008193580A - Information processing apparatus

Info

Publication number: JP2008193580A
Application number: JP2007028037A
Authority: JP
Inventors: Tomofumi Kitazawa; 智文北澤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2007-02-07
Filing date: 2007-02-07
Publication date: 2008-08-21

Abstract

PROBLEM TO BE SOLVED: To prevent the deterioration of an image read by an optical reading process by using information recorded on an original, even when the capacity of information recordable on the original is limited.

SOLUTION: The information processing apparatus includes: an image reading unit 101 for acquiring a document image from an original; a character layout information reading unit 104 for acquiring character layout information; a character recognizing unit 103 which performs character recognition processing on the acquired document image to acquire text data; and a character data constitution unit 109 which reconstitutes the acquired text data on the basis of the acquired character layout information to acquire restored electronic data.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an information processing apparatus and a paper-like recording medium, and more particularly to a technique for preventing deterioration of character images caused by optical reading.

When a document is read optically to acquire a digital image of it (a read image), image quality may deteriorate because of blur or noise in the read image. The quality of the read image may also be lowered when, for example, the document is skewed during reading.

Several methods have been proposed for preventing this loss of read-image quality. For example, a copying apparatus is connected to a server that stores the electronic data corresponding to the document to be read. The apparatus acquires an identifier (ID) that specifies the document, either from a code (a barcode or two-dimensional barcode) printed on the document or from an RFID chip attached to or embedded in the document, and then retrieves the corresponding electronic data from the server, thereby preventing deterioration of the read image. However, a copying apparatus that is cut off from the server storing the electronic data cannot access the server and therefore cannot prevent deterioration of the read image.

A further idea, shown in FIG. 26, is to record the electronic data corresponding to a document on an RFID chip attached to or embedded in the document. When copying, the apparatus does not print from the read image acquired optically by the scanner; instead it reads the data recorded on the RFID chip and prints from that data, which prevents deterioration of the read image. However, an RFID chip small enough to be attached to or embedded in a document without inconvenience can currently hold only a few hundred bytes, which is insufficient to record the electronic data for one page of a document.

As another way of recording the electronic data corresponding to a document in machine-readable form, it has been proposed to encode the document, replace it with a pattern such as a barcode or two-dimensional code as shown in FIGS. 27 and 28, print the pattern, and then decode the pattern read by optical reading means (such as a scanner) to acquire the electronic data. At present, however, the capacity of a barcode or two-dimensional code is insufficient to record the electronic data corresponding to a typical document.

As shown in FIG. 29, it has also been proposed to encode the electronic data corresponding to a document and print it embedded as a background dot pattern. However, because the background dot pattern must be embedded so that it does not interfere with reading the characters of the document, the area in which the pattern can be placed is limited, and a large storage capacity cannot be obtained.

At present, recording methods such as those described above are limited to embedding information that needs only a small storage capacity, such as machine reading of slips in inventory or distribution management, or recording a FAX number.

There is also a technique that concerns a printed material in which an original document portion, on which the original document is printed, and a code information portion, obtained by encoding the original document, are printed together, with the code information portion placed in the margin of the original document portion. When a user copies such a printed material with a copying machine, the copying machine acquires the code information from the printed material, decodes it to obtain digital information, generates image data of the original document from this digital information, and performs the copy. This provides an image forming system that avoids the image deterioration caused by repeated copying and maintains the same image quality as the original document or electronic document information indefinitely (see, for example, Patent Document 1).
Patent Document 1: JP 2003-244424 A

However, the technique described in Patent Document 1 records all of the electronic data corresponding to the document in the recording portion provided on the document. It therefore requires a recording portion with a very large capacity and does not solve the problems described above.

Accordingly, an object of the present invention is to prevent deterioration of a read image caused by optical reading by using information recorded on a document, even when the capacity of information recordable on the document is limited.

The invention according to claim 1 is an information processing apparatus comprising: reading means for acquiring a document image and character layout information from a document; character recognition processing means for executing character recognition processing on the document image acquired by the reading means to acquire text data; and reconstruction means for reconstructing the text data acquired by the character recognition processing means on the basis of the character layout information to acquire restored electronic data.

The invention according to claim 2 is an information processing apparatus comprising: reading means for acquiring a document image and character layout information from a document; and character recognition processing means for executing character recognition processing on the document image acquired by the reading means, on the basis of the character layout information acquired by the reading means, to acquire text data.

The invention according to claim 3 is the information processing apparatus according to claim 2, wherein the character recognition means comprises: a template generation unit that generates character templates on the basis of the character layout information acquired by the reading means; and a character recognition processing unit that performs character recognition processing by collating, character by character, the document image acquired by the reading means with the templates generated by the template generation unit.

The invention according to claim 4 is the information processing apparatus according to claim 3, wherein the character recognition means further includes a character separation unit that cuts out characters from the document image on the basis of the character layout information acquired by the reading means, and the character recognition unit performs character recognition processing by collating the characters cut out from the document image by the character separation unit with the templates generated by the template generation unit.

The invention according to claim 5 is the information processing apparatus according to claim 3 or 4, wherein the reading means acquires from the document, in addition to the document image and the character layout information, character recognition auxiliary information in which predetermined characters included in the document image are associated with position information indicating the positions of those characters in the document. In the character recognition processing of the character recognition unit, the character recognition means refers to the character recognition auxiliary information on the basis of the position in the document of the character being processed and, when the auxiliary information contains an entry for that position, outputs the character associated with that position in the auxiliary information in place of the character recognition result.

The invention according to claim 6 is the information processing apparatus according to any one of claims 1 to 5, wherein the character layout information includes character font information indicating the font of a character and character size information indicating the size of a character.

The invention according to claim 7 is the information processing apparatus according to claim 6, wherein the character layout information further includes character modification information indicating a modification applied to a character.

The invention according to claim 8 is the information processing apparatus according to claim 6 or 7, wherein the character layout information further includes range information indicating the range of a character string.

The invention according to claim 9 is the information processing apparatus according to claim 5, wherein the predetermined characters included in the character recognition auxiliary information are characters that the character recognition processing unit may recognize erroneously.

The invention according to claim 10 is the information processing apparatus according to claim 5 or 9, wherein the characters that the character recognition processing unit may recognize erroneously are set separately for each case, according to the font of the character or the presence or absence of decoration applied to the character.

The invention according to claim 11 is the information processing apparatus according to any one of claims 1 to 10, wherein the reading means optically scans the document to acquire the document image and the character layout information.

The invention according to claim 12 is the information processing apparatus according to any one of claims 1 to 10, wherein the reading means comprises a first reading unit that optically scans the document to acquire the document image, and a second reading unit that acquires the character layout information from a recording portion attached to the document.

The invention according to claim 13 is an information processing apparatus that prints and outputs a paper-like recording medium having a document image and character layout information, wherein: the document image and the character layout information are acquired by reading means of another information processing apparatus; text data is acquired by character recognition processing means of the other information processing apparatus executing character recognition processing on the document image acquired by the reading means; and restored electronic data is acquired by reconstruction means of the other information processing apparatus reconstructing the text data acquired by the character recognition processing means on the basis of the character layout information.

The invention according to claim 14 is an information processing apparatus that prints and outputs a paper-like recording medium having a document image and character layout information, wherein: the document image and the character layout information are acquired by reading means of another information processing apparatus; and text data is acquired by character recognition processing means of the other information processing apparatus executing character recognition processing on the document image acquired by the reading means, on the basis of the character layout information acquired by the reading means.

According to the present invention, even when the capacity of information recordable on a document is limited, deterioration of a read image caused by optical reading can be prevented by using the information recorded on the document.

Next, the configuration of a first embodiment of the present invention will be described with reference to the drawings.

Referring to FIG. 1, which shows a configuration example of the information processing apparatus according to the first embodiment of the present invention, the information processing apparatus comprises an image reading unit 101, an image processing unit 102, a character recognition unit 103, a character layout information reading unit 104, a character layout information decoding unit 105, a typeface/arrangement information extraction unit 106, a typeface/arrangement information conversion unit 107, a typeface/arrangement information recording unit 108, a character data configuration unit 109, an image data configuration unit 110, a data recording unit 111, a recording medium 112, a data transfer unit 113, and a data receiving unit 114. The apparatus can automatically construct document data from the read image and the information in the machine-readable data recording means. It also includes a printing unit 115 for printing such documents.

Here, a document means a paper-like recording medium on which a document image and character layout information are recorded (the same applies to the second and third embodiments of the present invention described later). The character layout information may be printed on the paper-like recording medium, or it may be recorded electronically in an RFID chip or the like attached to or embedded in the paper-like recording medium. A document image means printed document data. Document data is electronic data that includes at least content such as figures, images, characters, and tables, together with the character layout data for that content.

Character layout information means the character layout data in the state in which it is recorded on a document. In the present embodiment, the character layout data describes the character string data (text data) within the document data and includes: character font information (indicating the font of the characters); character size information (indicating the size of the characters); character typeface information (indicating the typeface of the characters); arrangement information (indicating the position at which a character string is placed); and range information (indicating the range of a character string). If each range is defined so that the character string data within it shares the same character font information, character size information, character typeface information, and arrangement information, the capacity required for recording can be reduced. The same applies to the second embodiment of the present invention described later.
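As an illustration of how such range-based layout data might be held in memory, the following minimal Python sketch groups the five kinds of information into one record per range. The class name `LayoutRun`, its field names, the units, and the sample values are all assumptions made for this example; the specification does not define a concrete record format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LayoutRun:
    """One range of characters sharing the same attributes (hypothetical record)."""
    start: int       # range information: index of the first character in the range
    end: int         # range information: index one past the last character
    font: str        # character font information, e.g. "Mincho"
    size_pt: float   # character size information (points are an assumed unit)
    typeface: str    # character typeface information, e.g. "regular" or "bold"
    row: int         # arrangement information: line on the page
    col: int         # arrangement information: column of the first character


def layout_for(runs: List[LayoutRun], index: int) -> Optional[LayoutRun]:
    """Return the range that covers the character at `index`, or None."""
    for run in runs:
        if run.start <= index < run.end:
            return run
    return None


# Two ranges describe nine characters; attributes are stored once per range,
# not once per character, which is what keeps the recorded data small.
runs = [
    LayoutRun(0, 5, "Mincho", 10.5, "regular", row=1, col=1),
    LayoutRun(5, 9, "Gothic", 14.0, "bold", row=2, col=1),
]
print(layout_for(runs, 6))
```

Storing one record per range rather than per character follows the note above that characters within a range share their font, size, typeface, and arrangement.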

In the present embodiment, text data means the data obtained by applying character recognition processing to the read image, and restored electronic data means the information obtained by reconstructing the text data on the basis of the character layout information.

A read image means an image obtained by optically reading a document.

The machine-readable data recording means attached to the paper-like recording medium records information on character typeface, such as font and size, and on character layout, such as line spacing and alignment. For example, as in FIG. 4(b), which shows an example of the information recorded in the machine-readable data recording means of this embodiment, a row number and column number are recorded together with the font, size, and other attributes at that position.

Possible machine-readable data recording means include an RFID chip as shown in FIG. 26, a barcode pattern as shown in FIG. 27, a two-dimensional code pattern as shown in FIG. 28, and a background dot pattern as shown in FIG. 29. The means is not limited to these, provided that a machine can easily read and write the information. Recording information on an RFID chip requires an antenna for reading and writing the information. When a printed code pattern with embedded character layout information is used, the scanner reads the pattern at the same time as it reads the document, and the printer records the pattern at the same time as it prints the document.

The processing performed when document data is printed in this embodiment will be described with reference to FIG. 2. On receiving a print command (S201), the information processing apparatus extracts from the document data the character layout information on the fonts, character sizes, and arrangement used in the document (S202). When the character layout information is to be embedded in a printed code pattern, the apparatus encodes the character layout information and generates the printed code pattern (S203); the pattern is then printed together with the document data when the document is printed. When the machine-readable data recording means is an RFID chip, as in FIG. 3, the extracted character layout information is converted into a predefined recording format in preparation for writing to the RFID chip. Next, the restored electronic data, which serves as the image data to be printed, is generated by converting the document data (S204), and printing starts (S205).

FIG. 3, which shows another processing operation for printing document data, is basically the same as FIG. 2 (S301 to S304), except that the RFID chip is written immediately before printing (S305). Depending on the apparatus configuration, the RFID chip may instead be written immediately after printing, or in parallel with printing.

Font information need not be spelled out as "Mincho" or "Gothic". If corresponding symbols are fixed in advance, for example a half-width M for Mincho, a half-width G for Gothic, a half-width I for italic, and a half-width B for bold, fewer characters are needed and the amount of character layout information is reduced. The amount to be recorded can be reduced further by recording the most frequent font and character size once and then recording only the portions whose font or size differs from it.
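The two space-saving ideas in this paragraph (one-character symbols, and a default recorded once plus exceptions) can be sketched as follows. The symbols M, G, I, and B come from the text; the function name `encode_fonts` and the `default;start-endSYMBOL` string format are assumptions invented for this illustration, not a format defined by the specification.

```python
from collections import Counter
from typing import Dict, List

# One-character symbols named in the text.
FONT_SYMBOL: Dict[str, str] = {
    "Mincho": "M",
    "Gothic": "G",
    "italic": "I",
    "bold": "B",
}


def encode_fonts(fonts: List[str]) -> str:
    """Encode per-character font labels as '<default>;<start>-<end><symbol>;...'.

    The most frequent label is written once as the default; only the ranges
    that differ from it are listed (hypothetical format, inclusive indices).
    """
    if not fonts:
        return ""
    default = Counter(fonts).most_common(1)[0][0]
    parts = [FONT_SYMBOL[default]]
    i = 0
    while i < len(fonts):
        if fonts[i] == default:
            i += 1
            continue
        j = i
        while j < len(fonts) and fonts[j] == fonts[i]:
            j += 1  # extend the range while the label stays the same
        parts.append(f"{i}-{j - 1}{FONT_SYMBOL[fonts[i]]}")
        i = j
    return ";".join(parts)


fonts = ["Mincho"] * 6 + ["Gothic"] * 3 + ["Mincho"] * 4
print(encode_fonts(fonts))  # prints "M;6-8G"
```

In this example thirteen per-character labels collapse to six characters of payload, which is the kind of saving that matters when the recording means holds only a few hundred bytes.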

A document printed in this way (FIG. 4(a)) is read by the image reading unit 101 and the character layout information reading unit 104. As shown in FIG. 4(b), the text data extracted by character recognition processing (OCR processing) is combined with the character layout information recorded in the machine-readable data recording means, such as the font, size, and color of the characters, to construct the restored electronic data (FIG. 4(c)). The restored electronic data is then printed (copied), saved to another storage medium as word-processor data containing the text data and information such as character fonts, or transferred to another apparatus (FIG. 4(d)).

FIG. 5 shows this flow. First, the user selects a mode such as "copy", "send", or "save" (S501). Whichever mode is selected, construction of the restored electronic data starts next (S503).

In the restored electronic data construction step, the restored electronic data is constructed from the scanned read image (S5031). The character layout information recorded in the machine-readable recording portion, such as fonts, character sizes, and character arrangement, is decoded (S5032), and character recognition processing is applied to the optically read image (S5033) to generate text data. The text data generated by character recognition is then combined with the character layout information from the machine-readable data recording portion to generate the restored electronic data (S5034).
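The combining step (S5034) can be pictured with the following minimal Python sketch, which pairs recognized text with decoded layout attributes. The function name `reconstruct`, the tuple-based range format, and the sample text are assumptions for illustration only; the sketch also assumes the ranges do not overlap and takes the OCR result and the decoded layout information as given inputs.

```python
from typing import Dict, List, Tuple

# (start, end_exclusive, attributes) for a range that differs from the default.
Run = Tuple[int, int, Dict[str, object]]
Span = Tuple[str, Dict[str, object]]


def reconstruct(text: str, default: Dict[str, object], runs: List[Run]) -> List[Span]:
    """Split recognized text into spans, each paired with its layout attributes."""
    spans: List[Span] = []
    pos = 0
    for start, end, attrs in sorted(runs, key=lambda r: r[0]):
        if pos < start:
            # Text before this range uses the default attributes.
            spans.append((text[pos:start], dict(default)))
        spans.append((text[start:end], {**default, **attrs}))
        pos = end
    if pos < len(text):
        spans.append((text[pos:], dict(default)))
    return spans


ocr_text = "Quarterly report: sales rose"          # stand-in for the OCR output
default_attrs = {"font": "Mincho", "size": 10.5}   # stand-in for decoded defaults
heading = [(0, 16, {"font": "Gothic", "size": 14})]

for span_text, attrs in reconstruct(ocr_text, default_attrs, heading):
    print(repr(span_text), attrs)
```

The resulting list of styled spans is the sort of structure that could then be rendered for printing or written out as word-processor data.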

Next, the selected mode is determined (S502). When the "copy" mode has been selected, the number of copies, the scale, and so on are entered (S504, S505), and the constructed restored electronic data is printed according to the entered conditions (S506). When the "send" mode has been selected, a destination is entered (S507, S508), and the restored electronic data is sent to the specified destination (S509). When the "save" mode has been selected, a file name and the drive, medium, folder, or the like in which to save the data are entered (S510, S511), and the constructed restored electronic data is saved (S512).

When copying is performed within a system that can access the original data of the read image, the original data may be accessed by using the ID recorded in the RFID chip or embedded in the printed code pattern, and the original data may then be printed. Because character recognition processing then becomes unnecessary, copying can be performed faster and more reliably, without image deterioration.
In the "copy" mode, the font and arrangement information that was read may be written again to an RFID chip or printed code pattern. When the machine-readable data recording means is a printed code pattern, re-encoding and printing the decoded data allows the printed code pattern to be copied without degradation.

According to the present embodiment, the recording portion provided on the document does not hold all of the document data, so the necessary information can be kept within a limited capacity. By combining the character layout information stored in the storage means provided on the paper-like recording medium with the text data obtained by character recognition processing, deterioration of character images caused by copying can be prevented.

Next, a second embodiment of the present invention will be described.
In ordinary character recognition, character recognition processing is performed after the document has been read optically, and character string information is obtained. In the present embodiment, text data means the data obtained by applying character recognition processing to the read image on the basis of the character layout information (the same applies to the third embodiment of the present invention described later). In the present embodiment, restored electronic data means the text data (the same applies to the third to seventh embodiments of the present invention described later).

The general procedure of conventional character recognition processing is shown in FIG. 6. First, characters are input as image data by photoelectric conversion (S601). Processing such as quantization then converts the analog data into digital data. Preprocessing is a step that makes the characters easier to recognize: noise introduced when the characters were read is removed and the image is binarized (S602). In the character separation step, the characters are cut out one by one and normalized so that their sizes match a predetermined size (S603).
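The binarization of S602 and the size normalization of S603 can be illustrated with the short Python sketch below. The fixed threshold of 128, nearest-neighbour resampling, and the function names are assumptions chosen to keep the example small; the specification does not say which binarization or normalization method is used, and noise removal is omitted here.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map grayscale pixels (0 = black, 255 = white) to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def normalize(glyph: Image, size: int) -> Image:
    """Resample a cut-out glyph to size x size with nearest-neighbour sampling."""
    h, w = len(glyph), len(glyph[0])
    return [
        [glyph[r * h // size][c * w // size] for c in range(size)]
        for r in range(size)
    ]


scanned = [[0, 255], [200, 100]]   # tiny stand-in for a scanned character cell
binary = binarize(scanned)         # [[1, 0], [0, 1]]
for row in normalize(binary, 4):   # scale the 2x2 glyph up to 4x4
    print(row)
```

After normalization every glyph has the same dimensions, which is what allows the later stages to compare it against stored reference shapes.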

The character data that has undergone character separation is passed to the feature extraction unit, which extracts the features of the character shape (S604). Because normalized data alone still has a high dimensionality, it is compressed into lower-dimensional data that represents the character. Feature extraction methods include a method based on the core line obtained by thinning the character (bringing it close to a width of one dot), as in the example of FIG. 7, and a method based on the outline (using straight lines, curves, relative positions, and so on), as in the example of FIG. 8.

Next, the character data with extracted features is sent to the identification unit and collated against prepared character graphic patterns (S605); the collection of these patterns is the standard character dictionary in the figure. For printed type, there is also a method that skips feature extraction and compares the input pattern directly with reference character figures. Some methods add a post-processing step (S606) that applies morphological analysis, generally based on a morphological analysis dictionary, to check whether the recognized character string forms a valid sentence and thereby find misrecognized parts.

In the present embodiment, character layout information recorded in a print code pattern or an RFID chip is used to improve the accuracy and speed of character recognition, which was conventionally performed by image pattern recognition processing alone.

The restored electronic data is constructed as shown in FIG. 9, which illustrates the processing flow of the restored electronic data construction part. The difference from FIG. 5 lies in the step that constructs the restored electronic data (S903); the operations before and after it are the same.

In the step of constructing the restored electronic data (S903), once the original has been read (S9031), the character layout information is read from the machine-readable data recording means and decoded (S9032). The character size and font information are used first for character segmentation. If the font is monospaced, characters can be separated simply by cutting at the predetermined interval given by the font and character size information, without complex pattern recognition processing (S9033). As a result, characters such as 「奴」 or 「双」 are no longer split and misread as 「女又」 or 「又又」, and character separation no longer takes a long time.
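The fixed-interval segmentation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the sketch assumes that a full-width glyph in a monospaced font occupies a square cell whose side equals the point size (1 point = 1/72 inch) at the scan resolution.

```python
from typing import List, Tuple


def point_size_to_pixels(point_size: float, dpi: int) -> int:
    """Convert a font size in points to a cell width in pixels (1 pt = 1/72 inch)."""
    return round(point_size * dpi / 72)


def fixed_pitch_cells(line_left: int, line_width: int, char_width: int) -> List[Tuple[int, int]]:
    """Return (start, end) pixel columns for each character cell of one text line.

    No pattern recognition is involved: the cells are cut at a constant pitch
    taken from the recorded font and size information.
    """
    if char_width <= 0:
        raise ValueError("char_width must be positive")
    count = line_width // char_width
    return [
        (line_left + i * char_width, line_left + (i + 1) * char_width)
        for i in range(count)
    ]
```

For a 12-point font scanned at 600 dpi, `point_size_to_pixels(12, 600)` gives a 100-pixel cell, and the line is then cut every 100 pixels regardless of the strokes inside each cell.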

If half-width characters are mixed in with full-width characters, the segmentation of the characters after the half-width character shifts. Adding processing that detects this shift keeps character separation accurate even then. For example, when the match rate against the reference characters stays low for consecutive characters, the character at the position where the match rate first dropped is recognized as a half-width character.
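One way to detect the point where the match rate starts to stay low is sketched below. The threshold and the run length are assumptions chosen for illustration; the text does not specify them, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def find_drift_start(match_rates: Sequence[float],
                     threshold: float = 0.6,
                     min_run: int = 3) -> Optional[int]:
    """Return the index where a run of at least `min_run` consecutive low
    match rates begins, or None if no such run exists.

    The returned index is the cell that would be retried as a half-width character.
    """
    run_start = None
    for i, rate in enumerate(match_rates):
        if rate < threshold:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_run:
                return run_start
        else:
            run_start = None
    return None
```

A single low score does not trigger the retry; only a sustained drop does, which matches the idea that a segmentation shift affects every character after the half-width one.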

With a proportional font, characters cannot be cut out at a constant pitch, but the spacing between characters is uniform (for Latin fonts it may be the spacing between specific parts), so once the character size is known the character spacing is known as well. Characters can therefore be cut out so that this fixed spacing is maintained, which requires simpler pattern recognition processing than the conventional method. Accuracy and speed thus improve compared with segmentation by pattern recognition alone, where the font and character size are not known in advance.

After segmentation, the conventional method extracts character features as shown in FIG. 6, narrows down the candidate characters, and finally identifies the character by comparison with standard character figures. In the present embodiment, the character identification step (S9037) also uses the character layout information recorded in the print code pattern or RFID chip. Even though the characters themselves are not recorded, the typeface and character size are known, so the standard characters (reference templates) are restricted to that specific typeface and size for collation and identification. For example, suppose the typeface information recorded in the printed code pattern or the attached RFID chip shows that the characters to be recognized are printed in 11-point Gothic.

Take the case of reading and identifying the character 「永」. As shown in FIG. 10, templates of the candidate characters (for example 「永」, 「水」, and 「氷」) are generated in 11-point Gothic and compared with the image of the read character (S9036). If the printed character is in Mincho, the font information recorded in the machine-readable data recording means shows this, so Mincho templates are generated and compared instead. The character with the highest match rate in the comparison is selected. Because the font and size of the template are known, the shape match rate is high when the correct character is chosen. Since the template image is generated at the same size as the read character, the normalization step that was conventionally required becomes unnecessary. Together these improve both recognition accuracy and processing speed.
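The selection of the candidate with the highest match rate can be sketched as below. This is a minimal illustration, not the apparatus's actual matcher: the match rate is assumed to be the fraction of agreeing pixels between two binary bitmaps of equal size, and the tiny 3x3 templates stand in for glyphs rendered in the recorded font and size.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def match_rate(image: Bitmap, template: Bitmap) -> float:
    """Fraction of pixels on which the image and the template agree."""
    if len(image) != len(template) or any(len(a) != len(b) for a, b in zip(image, template)):
        raise ValueError("image and template must have the same size")
    total = sum(len(row) for row in image)
    agree = sum(p == q for a, b in zip(image, template) for p, q in zip(a, b))
    return agree / total


def identify(image: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Return the candidate whose template has the highest match rate."""
    return max(templates, key=lambda ch: match_rate(image, templates[ch]))


# Hypothetical templates, assumed to be rendered at the recorded font and size.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
scanned = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]  # a "T" with one pixel lost in scanning
result = identify(scanned, TEMPLATES)
```

Because the templates are produced at the size of the scanned glyph, the comparison needs no size normalization, which is the point made in the paragraph above.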

It is known that fixing the character size and typeface to a single kind improves recognition accuracy. In the present embodiment an appropriate template is prepared according to the printed character, so even a document that uses several fonts achieves the same recognition accuracy as one that uses a single fixed font, provided the font is not a special one that the information processing apparatus does not have.

In addition, the font and position of each character are known as soon as that character has been identified. Unlike the procedure described in the first embodiment, the processing can therefore omit the separate step of combining the character recognition result with the character layout information after recognition is finished.

The subsequent processing is the same as steps S5033 and S5034 shown in FIG. 5 of the first embodiment.

According to the present embodiment, restricting the font and size fixes the shape of the template, so the accuracy of character recognition can be improved. Many recognition processes for printed type determine the result by comparing the read image with character templates that serve as the recognition reference. The embodiment is effective because, when the font of the printed character differs from the font of the prepared character template, the match rate is low even for the same character and recognition accuracy cannot be raised.

Next, a third embodiment of the present invention will be described.
In the present embodiment, the character layout information means the character layout data as recorded on the original. The character layout data describes the character string data (text data) in the document data and includes character font information (information indicating the font of the characters), character size information (information indicating the size of the characters), character typeface information (information indicating the typeface of the characters), arrangement information (information indicating the position where the character string is placed), range information (information indicating the range of the character string; if each range is set so that the character string data within it shares the same font, size, typeface, and arrangement information, the capacity required for recording can be reduced), and character decoration information (information indicating decoration applied to the characters, such as underlines and strikethroughs).
Just as the font and character size are recorded in the machine-readable data recording means, the portions to which an underline or strikethrough is applied are recorded in the machine-readable data recording means and printed.
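The range-based layout data described above could be represented roughly as follows. The class and field names are hypothetical and the encoding is an assumption; the sketch only shows why one record per range of uniformly formatted characters needs less capacity than one record per character.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LayoutRun:
    """Layout attributes shared by all characters in [start, end)."""
    start: int                        # index of the first character in the range
    end: int                          # index one past the last character
    font: str                         # character font information
    typeface: str                     # character typeface information
    size_pt: float                    # character size information
    position: Tuple[int, int]         # arrangement information (x, y of the string)
    decoration: Optional[str] = None  # e.g. "underline" or "strikethrough"


def run_for(index: int, runs: List[LayoutRun]) -> Optional[LayoutRun]:
    """Return the layout run that covers the character at `index`, if any."""
    for run in runs:
        if run.start <= index < run.end:
            return run
    return None


runs = [
    LayoutRun(0, 20, "Gothic", "regular", 11.0, (100, 100)),
    LayoutRun(20, 25, "Gothic", "regular", 11.0, (100, 140), decoration="strikethrough"),
]
```

Here 25 characters are described by two records; the recognizer looks up the run for each character to decide which template font, size, and decoration to use.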

To recognize characters at a location recorded as having an underline or strikethrough, character templates are prepared by adding the strikethrough, shading, or other decoration to the original characters, and these are compared with the read image.

For example, as shown in FIG. 11, comparing an image that has a strikethrough with a template of a character without one lowers the match rate because the shapes differ. Therefore, for template matching, a reference template is created by adding a strikethrough to the original character shape as shown in FIG. 12 and is compared with the read image. Comparing against a struck-through template keeps recognition accuracy from dropping even for struck-through characters.
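Creating a struck-through reference template can be sketched as drawing a horizontal line across the middle of a plain template. The line position and thickness are assumptions for illustration; in practice they would follow how the printing side draws the strikethrough.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def add_strikethrough(template: Bitmap, thickness: int = 1) -> Bitmap:
    """Return a copy of the template with a horizontal line through its vertical center."""
    height = len(template)
    start = max(0, height // 2 - thickness // 2)
    struck = [row[:] for row in template]  # copy so the plain template stays usable
    for r in range(start, min(start + thickness, height)):
        struck[r] = [1] * len(struck[r])
    return struck


plain = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
struck = add_strikethrough(plain)
```

The struck template is then compared with the scanned glyph in place of the plain one, so the line no longer counts as a mismatch.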

According to the present embodiment, even when underlines or strikethroughs are present, character templates with an underline or overlaid with a strikethrough can be prepared, so recognition accuracy can be improved. Parts of a text may be underlined, and one or two lines may be drawn over characters as a strikethrough to keep a history of document revisions; such lines can prevent correct recognition. A struck-through character is hard to separate from its strikethrough, and comparing an underlined or struck-through character with an ordinary template (one with no overlaid line) gives a low match rate and lowers recognition accuracy, which is why the embodiment is effective.

Next, a fourth embodiment of the present invention will be described.
Most characters in an original are recognized correctly, but some are likely to be unrecognizable or misrecognized, such as characters with many strokes and characters with similar shapes (the digit 0 and the letter O, hiragana 「へ」 and katakana 「ヘ」, 「撤」 and 「撒」, and so on).

Here, the original means a paper-like recording medium on which character recognition auxiliary information is recorded in addition to the document image and the character layout information (the same applies to the fifth to seventh embodiments of the present invention described later). The character recognition unit keeps, in the recording means, a list prepared in advance of characters that may be misrecognized during character recognition processing (the misrecognized character list). The characters of the character string data in the document data are checked against this list, and for each character found in the list, its character code and position are recorded on the original as character recognition auxiliary information. In other words, the character recognition auxiliary information is information recorded on the original that indicates the character code and position of each character that may be misrecognized in character recognition processing. Like the character layout information, it may be printed on the paper-like recording medium or recorded electronically in an RFID chip or the like attached to or embedded in the medium.

In the present embodiment, the text data means data obtained by applying character recognition processing to the read image on the basis of the character layout information and the character recognition auxiliary information (the same applies to the fourth to seventh embodiments of the present invention).

Referring to FIG. 13, which shows the operation of the present embodiment, when a print command is received (S1301), the text to be printed is searched for characters that are difficult to recognize (S1303), either before or after the character layout information is extracted (S1302). If the document contains such a character, its position and character code are recorded as character recognition auxiliary information in the recording means, such as the print code pattern or RFID chip, in the same way as the character layout information.
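The search on the printing side can be sketched as a scan of the text against the misrecognized character list. The function name is hypothetical, and representing a position as a character index and a character code as a Unicode code point are assumptions; the text only states that a position and a character code are recorded.

```python
from typing import List, Set, Tuple


def build_auxiliary_info(text: str, misrecognized: Set[str]) -> List[Tuple[int, int]]:
    """Return (position, character code) for every character of `text`
    that appears in the misrecognized character list."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ch in misrecognized]


# The digit 0 and the letter O are one of the confusable pairs named in the text.
aux = build_auxiliary_info("A0BO", {"0", "O"})
```

The resulting list is what would be encoded into the print code pattern or written to the RFID chip alongside the layout information.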

Referring to FIG. 14, which shows the process of reading the printed document and reproducing the document data, the character layout information and the character recognition auxiliary information are read out (S14032). For a character recorded in the print code pattern or RFID chip as difficult to recognize, the text data is created from the recorded character and the position information in the character recognition auxiliary information, without character recognition processing (S14036). For a character that is not recorded there, and is therefore regarded as not difficult to recognize, character recognition processing is performed (S14035) to generate the text data. This is repeated for all characters (S14037). Because difficult characters are recorded in the print code pattern or RFID chip, the recorded characters are never misrecognized.
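The per-character branch between S14035 and S14036 can be sketched as follows. The names are hypothetical, `recognize` stands in for whatever character recognition routine is used, and positions are assumed to be character indices.

```python
from typing import Callable, Iterable, Sequence, Tuple


def restore_text(cells: Sequence[object],
                 auxiliary: Iterable[Tuple[int, int]],
                 recognize: Callable[[object], str]) -> str:
    """Build the text, taking recorded characters from the auxiliary
    information and recognizing only the remaining cells."""
    recorded = dict(auxiliary)  # position -> character code
    chars = []
    for i, cell in enumerate(cells):
        if i in recorded:
            chars.append(chr(recorded[i]))  # S14036: use the recorded character
        else:
            chars.append(recognize(cell))   # S14035: ordinary recognition
    return "".join(chars)
```

Since recognition is skipped for recorded positions, those characters cannot be misread, and no recognition time is spent on them.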

Returning to FIG. 13, when character recognition (including insertion of the data from the machine-readable data recording means) is finished, the restored electronic data is converted and generated (S1305), the RFID chip is written (S1306), and printing starts (S1307).

According to the present embodiment, the possibility that a read character is misrecognized and printed incorrectly, because it has many strokes or resembles another character, can be reduced. This is effective because such characters are hard for a machine to recognize or are prone to misrecognition, so that recognition takes time or fails and the document data is constructed or printed with wrong characters.

Next, a fifth embodiment of the present invention will be described.
When the typeface or size of characters changes frequently, the character layout information grows and only a small amount of information other than the character layout can be recorded. Therefore, as shown in FIG. 15, a list of character recognition difficulty is prepared, and characters and their positions are recorded in descending order of difficulty according to the free capacity of the print code pattern in which information is embedded or of the storage unit of the RFID chip. For example, hiragana 「へ」「べ」「ぺ」 and katakana 「ヘ」「ベ」「ペ」, hiragana 「り」 and katakana 「リ」, and katakana 「タ」 and kanji 「夕」 are all pairs that are likely to be misrecognized. Among them, hiragana 「へべぺ」 versus katakana 「ヘベペ」 is the most difficult, so if the text contains both 「へ」 and 「タ」, 「へ」 is recorded first. When the capacity of the machine-readable data recording means is insufficient, lower-priority characters (those among the listed characters that are less likely to fail recognition) are not recorded in the machine-readable data recording means.
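Recording in descending order of difficulty until the free capacity runs out can be sketched as below. The difficulty values, the fixed size per entry, and the function name are all assumptions for illustration; the actual ranking would come from a list such as the one in FIG. 15.

```python
from typing import Dict, List, Tuple


def select_entries(candidates: List[Tuple[int, str]],
                   difficulty: Dict[str, int],
                   capacity_bytes: int,
                   bytes_per_entry: int = 4) -> List[Tuple[int, str]]:
    """Keep the hardest (position, character) entries that fit in the free capacity.

    The result is returned in document order so it can be written sequentially.
    """
    max_entries = capacity_bytes // bytes_per_entry
    ranked = sorted(candidates, key=lambda e: difficulty.get(e[1], 0), reverse=True)
    return sorted(ranked[:max_entries])


# Hypothetical ranks: a higher value means harder to recognize.
DIFFICULTY = {"へ": 3, "り": 2, "タ": 1}
kept = select_entries([(0, "タ"), (3, "へ"), (5, "り")], DIFFICULTY, capacity_bytes=8)
```

With room for only two entries, 「タ」 is dropped because it has the lowest rank among the listed characters, mirroring the priority given to 「へ」 in the paragraph above.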

The difficulty of recognizing each character also depends on factors such as the character recognition method, so the ranking must be determined in advance.

According to the present embodiment, recording only the characters that are prone to misrecognition prevents recognition failures within a limited recording capacity. This is effective because the capacity of the print code pattern or RFID chip is limited, so it may not be possible to record information on every hard-to-recognize character in addition to the character layout information.

Next, a sixth embodiment of the present invention will be described.
The character recognition auxiliary information in the present embodiment is the same as in the fourth embodiment, but the misrecognized character list kept by the character recognition unit differs. The character recognition unit keeps, in the recording means, lists prepared in advance for each font of the characters that may be misrecognized during character recognition processing (misrecognized character lists). The character recognition unit obtains each character of the character string data in the document data together with its font, checks whether the character is in the misrecognized character list prepared for that font, and for each character found in the list records its character code and position on the original as character recognition auxiliary information. The character recognition auxiliary information is information recorded on the original that indicates the character code and position of each character that may be misrecognized in character recognition processing.

The order of difficulty also varies with the font, so a separate difficulty table (misrecognized character list) is prepared for each font, as shown in FIG. 16. Depending on the font used, characters more likely to fail recognition are recorded preferentially in the machine-readable recording unit. Comparing Gothic and Mincho as an example, distinguishing 「へべぺ」 (hiragana) from 「ヘベペ」 (katakana), and 「タ」 (katakana) from 「夕」 (kanji), is harder in Mincho than in Gothic. By contrast, 「ー」 (long vowel mark) and 「一」 (kanji), and 「ト」 (katakana) and 「卜」 (kanji), are easier to distinguish in Mincho than in Gothic. Therefore, when Mincho and Gothic characters are mixed on the same page and the text contains both a Mincho 「一」 (kanji) and a Gothic 「一」 (kanji), the Gothic 「一」 is recorded in the machine-readable recording unit in preference to the Mincho 「一」.
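A per-font difficulty lookup might look like the following sketch. The numeric ranks are invented for illustration and are not the values of FIG. 16; only their relative order (「一」 harder in Gothic, 「へ」 harder in Mincho) follows the text.

```python
from typing import Dict, List, Tuple

# Hypothetical per-font difficulty tables; a higher value means harder to recognize.
FONT_TABLES: Dict[str, Dict[str, int]] = {
    "mincho": {"へ": 3, "タ": 3, "一": 1, "ト": 1},
    "gothic": {"へ": 2, "タ": 2, "一": 3, "ト": 3},
}

Entry = Tuple[int, str, str]  # (position, character, font)


def rank_by_font(entries: List[Entry], tables: Dict[str, Dict[str, int]]) -> List[Entry]:
    """Order entries from hardest to easiest using the table of each entry's font."""
    return sorted(entries, key=lambda e: tables.get(e[2], {}).get(e[1], 0), reverse=True)


ranked = rank_by_font([(2, "一", "mincho"), (7, "一", "gothic")], FONT_TABLES)
```

The Gothic 「一」 comes first in the ranking, so it is the one recorded when capacity allows only one of the two.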

When the capacity of the recording unit is insufficient, lower-priority characters are not recorded, as in the fifth embodiment.

According to the present embodiment, since the characters that are easy or hard to distinguish differ from font to font, changing the recording priority in the machine-readable recording unit for each font reduces recognition failures. This is effective because the recognition difficulty of the same character differs by font, so a single uniform difficulty setting may be unsuitable for some fonts.

Next, a seventh embodiment of the present invention will be described.
The character recognition auxiliary information in the present embodiment is the same as in the fourth and sixth embodiments, but the misrecognized character list kept by the character recognition unit differs. The character recognition unit keeps, in the recording means, lists of the characters that may be misrecognized during character recognition processing, prepared separately for the cases with and without character decoration (misrecognized character lists). The character recognition unit obtains each character of the character string data in the document data together with its decoration information, checks whether the character is in the misrecognized character list prepared for the corresponding case, and for each character found in the list records its character code and position on the original as character recognition auxiliary information.

Even when a strikethrough is known to be present, some characters are strongly affected by it in machine recognition. For such characters, a list is prepared in which the recognition difficulty depends on whether a strikethrough is present, so that the priority for recording in the recording unit, such as the print code pattern or RFID chip, differs according to the presence or absence of a strikethrough.

For example, a misrecognized character list is prepared as shown in FIG. 17. The characters 「門」 and 「閂」 are easy to tell apart when there is no strikethrough, or when there is a strikethrough but the text is written vertically, so they are registered with a low recognition difficulty. In horizontal writing with a strikethrough they become hard to tell apart, so they are registered with a high difficulty. Characters that are easily confused to begin with, such as 「牛」 and 「午」, are set to an even higher difficulty for vertical writing with a strikethrough.
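A decoration-dependent difficulty table could be keyed by the character, the presence of a strikethrough, and the writing direction, as in this sketch. The numeric values and the default are assumptions and do not reproduce FIG. 17; only the relative order follows the text.

```python
from typing import Dict, Tuple

Key = Tuple[str, bool, bool]  # (character, has_strikethrough, is_vertical)

# Hypothetical difficulty values; a higher value means harder to recognize.
DECORATION_TABLE: Dict[Key, int] = {
    ("閂", False, False): 1,
    ("閂", True, True): 1,    # strikethrough in vertical writing: still easy
    ("閂", True, False): 3,   # strikethrough in horizontal writing: hard
    ("牛", False, False): 2,  # confusable with 午 even without decoration
    ("牛", True, True): 4,    # vertical strikethrough makes it harder still
}


def difficulty_of(char: str, has_strikethrough: bool, is_vertical: bool,
                  table: Dict[Key, int], default: int = 0) -> int:
    """Look up the recognition difficulty for a character in its decoration context."""
    return table.get((char, has_strikethrough, is_vertical), default)
```

The value returned here would feed the same capacity-limited priority ordering used in the fifth embodiment.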

Characters are recorded in the machine-readable data recording means in descending order of recognition difficulty, according to its free capacity.

Depending on the font type and character size, the influence of a strikethrough may be small, so even when a strikethrough is present the difficulty setting may be varied by font and size. For example, as shown in FIG. 18, when 「弌」 is set in a regular-script (正楷書体) font, the horizontal stroke inside the 「弋」 radical (shikigamae) barely overlaps a single strikethrough, so the recognition difficulty does not change with the presence or absence of the strikethrough. Its priority therefore drops, and depending on the capacity of the recording unit it may be left unrecorded. The difficulty setting for the presence or absence of a strikethrough may thus be varied for each font, as shown in FIG. 19.

According to the present embodiment, for characters whose legibility depends on the presence or absence of a strikethrough or underline, the priority of the characters recorded in the limited machine-readable data recording means changes accordingly, so the limited storage capacity is used effectively. This is effective because, when part of a horizontal stroke of a character lies near the position of the strikethrough, the strikethrough overlaps or comes close to that stroke and recognition becomes difficult. For example, as shown in FIG. 30, for 「門」 and 「閂」 the strikethrough overlaps the 「一」 inside the gate radical (mongamae) of 「閂」, so even when a strikethrough is known to be present it is hard to determine whether the character is 「門」 or 「閂」. Similarly, for 「弋」 and 「弌」, when the strikethrough overlaps the 「一」 inside the shikigamae radical, it is hard to determine which character it is. In vertical writing, the strikethrough runs vertically through the center of the character, so pairs such as 「一」 and 「十」 or 「大」 and 「木」 are hard to distinguish even when a strikethrough is known to be present, as shown in FIG. 31.

Next, an eighth embodiment of the present invention will be described.
For a document that has center-aligned lines and left-aligned lines as in FIG. 20, not all of the arrangement information is recorded in the print code pattern or RFID chip. Instead, during character recognition the arrangement information is detected from the character positions by replacing blank areas with character spaces, as shown in FIG. 21. Unless the layout is unusual, the character layout information of the original can be obtained from the character positions as long as the character extraction in the recognition processing is sufficiently accurate.

Even if a line is center-aligned, this is not detected as such; the line is simply recognized as having several characters' worth of blank space at its start. Strictly speaking this does not reproduce the original document data, but in most cases it causes no problem. Reducing the information on character arrangement leaves room to record more information on hard-to-recognize characters.
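Replacing the blank area at the start of a line with character spaces can be sketched as follows. The function names, the pixel coordinates, and the use of an ordinary space character are assumptions for illustration.

```python
def leading_blank_count(first_glyph_left: float, margin_left: float, char_width: float) -> int:
    """Number of whole character cells between the left margin and the first glyph."""
    if char_width <= 0:
        raise ValueError("char_width must be positive")
    # Round to the nearest whole cell; never return a negative count.
    return max(0, int((first_glyph_left - margin_left) / char_width + 0.5))


def restore_line(text: str, first_glyph_left: float, margin_left: float,
                 char_width: float, space: str = " ") -> str:
    """Prefix the recognized text with spaces that stand in for the blank area."""
    return space * leading_blank_count(first_glyph_left, margin_left, char_width) + text


centered = restore_line("Title", first_glyph_left=350, margin_left=50, char_width=100)
```

The restored line is left-aligned with three leading spaces rather than marked as centered, which is the approximation the paragraph above accepts.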

According to this embodiment, character arrangement information can be detected during character recognition, so it need not be recorded in advance, and the recording unit can hold other information instead. This is useful when the alignment of the lines changes frequently (centered, left-aligned, right-aligned, and so on), because the arrangement information would otherwise grow large and leave little room for the information, described later, that raises the accuracy of character recognition.

Next, a ninth embodiment of the present invention will be described.
Two modes are provided for the user to choose between: a copy mode that, like ordinary copying, prints the read image as it is without character recognition, and a mode that performs character recognition on the read image, constructs document data so that the character portions suffer no image degradation, and prints that data.

The mode that includes the character recognition and document data construction steps is used when word-processor document data is to be created from a paper document, when the copy is likely to be copied again, or when image degradation from copying is undesirable. When a copy is needed quickly, the optically read image is printed as an image without conversion to characters, as in a conventional copier. The same choice applies outside the copy mode: when a send mode or a save mode is selected, either the read image can be sent or saved as image data as it is, or the document data constructed by character recognition and document data construction can be sent or saved.

According to this embodiment, the user can choose to omit the electronic re-editing step and thereby select a mode that does not reduce copying speed. This is useful because constructing new document data through character recognition prevents degradation of character image quality but takes time, which is inconvenient for users who value copying speed over image quality.

Next, a tenth embodiment of the present invention will be described.
Because character recognition can fail, document data constructed via paper may differ from the read image that is the original data. To make this clear, information indicating that a copy was made is recorded: for a printed copy, in the display portion or in the machine-readable recording portion; for data saved electronically, as tag information of the restored electronic data.

As shown in FIG. 22, statements such as "This document was printed after character recognition of the read characters" and "Character recognition may fail" are printed as a headnote or footnote to indicate that the output is neither an ordinary copy nor a print from the read image that is the original data (this is the case of recording in the display portion for visual recognition).

Alternatively, when the restored electronic data is printed, the unique ID assigned to the paper, as recorded in the RFID chip or print code pattern, together with information indicating that the output is a copy made by reading the paper bearing that ID, is recorded in an RFID chip attached to the copied paper or embedded in a print code pattern.

The information may also be placed in a print code pattern and printed, as in FIG. 23. FIG. 23 shows the following case: when the read image that is the original data was printed, the ID "717346", for example, was assigned; the document bearing that ID was then read and copied in an environment without access to the original data; and data indicating that ID "717346" was copied is embedded in the print code pattern. For example, when the document "717346" undergoes the processing described above, it is given an ID such as "717346C". If "717346C" is in turn copied and printed by the same processing, the ID could become "717346CC", so that the generation of the copy can be identified.
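The ID scheme in this example ("717346", "717346C", "717346CC") can be sketched in a few lines of Python. The function names `copy_id`, `generation`, and `original_id` are hypothetical, and the sketch assumes that an original ID never ends with the marker character itself.

```python
COPY_MARKER = "C"  # marker used in the example above


def copy_id(source_id, marker=COPY_MARKER):
    """Return the ID to embed in a copy made from a document with source_id."""
    return source_id + marker


def generation(doc_id, marker=COPY_MARKER):
    """Count how many times the document has been copied (0 = original)."""
    return len(doc_id) - len(doc_id.rstrip(marker))


def original_id(doc_id, marker=COPY_MARKER):
    """Recover the ID of the original from any copy's ID."""
    return doc_id.rstrip(marker)


first = copy_id("717346")
second = copy_id(first)
print(first, second, generation(second), original_id(second))
```

Appending one marker per copy keeps the original ID recoverable, which supports the later association with the original data when it becomes accessible.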

When restored electronic data is generated from the read image and saved, the ID of the read image that is the original data and information indicating that it was copied are recorded as tag information of the constructed restored electronic data. Then, when the document is read in an environment with access to the original data, or when the document data is copied to a networked personal computer, its relationship to the original data can be confirmed.

According to this embodiment, it is apparent that a copied document or constructed restored electronic data is not the original. In addition, when a copied document is read, or constructed restored electronic data is used, in an environment with access to the original data, the recorded information provides a clue to its relationship with the read image that is the original data. This is useful for two reasons: it must be possible to recognize that documents and restored electronic data reproduced via printed paper do not necessarily match the original in every respect, and it is desirable that a copy of the document or constructed restored electronic data be easy to associate with the original in a system that has access to the original data.

Next, an eleventh embodiment of the present invention will be described.
A mode is provided that extracts only the content written in the margins, separately from the printed text, and prints it or stores or transfers it as image data.

If the arrangement and size of the characters are recorded at printing time in the printed print code pattern or in the RFID chip, the positions occupied by characters in the printout are known. As in FIG. 24, separating the character regions from the read image (FIG. 24(c)) makes it possible to extract the added portions (FIG. 24(d)). If the positions of figures and the like are also recorded, the margin areas, that is, the areas outside the characters and figures at the time of printing, can be detected as well.
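The separation step of FIG. 24 can be approximated by masking out the recorded character regions, so that whatever ink remains is treated as an addition. The following Python sketch assumes a binarized page held as a list of rows and regions given as `(top, left, bottom, right)` rectangles with exclusive bottom and right edges; these representations and the name `extract_additions` are illustrative assumptions, not taken from the patent.

```python
def extract_additions(page, printed_regions):
    """Return a copy of the page with the originally printed regions blanked.

    page: 2D list of 0/1 pixels (1 = ink).
    printed_regions: rectangles recorded at print time, as
        (top, left, bottom, right) with bottom/right exclusive.
    """
    result = [row[:] for row in page]
    for top, left, bottom, right in printed_regions:
        for y in range(max(0, top), min(bottom, len(result))):
            row = result[y]
            for x in range(max(0, left), min(right, len(row))):
                row[x] = 0  # erase ink that was part of the original print
    return result


page = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
regions = [(0, 0, 2, 2)]  # printed text occupied the top-left 2x2 block
for row in extract_additions(page, regions):
    print(row)
```

Writing that overlaps a printed region is lost in this simple form; handling that case would need more than rectangle masking.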

According to this embodiment, portions that appear to have been written after printing can be extracted using the character arrangement information and similar data stored in the printed print code pattern or the RFID chip. This is useful when only the margin notes are needed, or when a document that hides the original printed portion is to be created. A method that takes the difference from the read image that is the original data cannot mechanically separate the originally printed portion from the portion added in the margins if the original data is unavailable, and the alternative of cutting a mask out of paper or the like is laborious.

Next, a twelfth embodiment of the present invention will be described.
As in FIG. 25, a mode is provided that copies the document in its originally printed form, for cases where written notes are to be hidden or a clean document without handwriting is wanted.

A mode that copies the document with the added portions removed is provided; when it is selected, the document is printed as it was immediately after it was first printed. Outside the copy mode, the document can likewise be sent or saved as the document data at the time of printing, with post-printing additions removed.

In combination with the ninth embodiment, a further selectable mode is possible that uses only the information on the regions that were originally printed, before anything was written on the page, to copy, send, or save an image from which post-printing additions have been removed, without performing character recognition.
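The image-only variant described above can be sketched as the complement of margin-note extraction: keep pixels inside the recorded print regions and blank everything else, with no character recognition involved. As before, the binarized list-of-rows page, the `(top, left, bottom, right)` rectangles with exclusive bottom and right edges, and the name `keep_printed_only` are assumptions made for illustration.

```python
def keep_printed_only(page, printed_regions):
    """Return an image containing only the pixels inside printed regions.

    page: 2D list of 0/1 pixels (1 = ink).
    printed_regions: rectangles recorded at print time, as
        (top, left, bottom, right) with bottom/right exclusive.
    """
    height = len(page)
    width = len(page[0]) if page else 0
    result = [[0] * width for _ in range(height)]
    for top, left, bottom, right in printed_regions:
        for y in range(max(0, top), min(bottom, height)):
            for x in range(max(0, left), min(right, width)):
                result[y][x] = page[y][x]  # copy only the original print area
    return result


scanned = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print_area = [(0, 0, 2, 2)]
for row in keep_printed_only(scanned, print_area):
    print(row)
```

Because the pixels are copied from the scan rather than regenerated, blur and noise inside the printed regions remain; only additions outside them are removed.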

According to this embodiment, when the user wants the same form as at printing time, the character arrangement information is used to remove portions that were written or soiled after printing, and the document can be printed in that form. This is useful when a clean original with no (or little) writing or soiling is wanted, even in an environment without access to the read image that is the original data.

The embodiments described above are preferred embodiments of the present invention, and various modifications can be made without departing from the gist of the present invention. The present invention is also applicable to scanners, printers, copiers, multifunction peripherals, and the like.

FIG. 1 is a diagram showing a configuration example of the information processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the operation of the first embodiment of the present invention.
FIG. 3 is a diagram showing the operation of the first embodiment of the present invention.
FIG. 4 is a diagram showing an example of the information recorded in the machine-readable data recording means of the first embodiment of the present invention.
FIG. 5 is a diagram showing the operation of the first embodiment of the present invention.
FIG. 6 is a diagram showing the general procedure of character recognition processing.
FIG. 7 is a diagram showing a feature extraction method.
FIG. 8 is a diagram showing a feature extraction method.
FIG. 9 is a diagram showing the flow of processing in the document data construction part.
FIG. 10 is a diagram showing the reconstruction of document data.
FIG. 11 is a diagram showing comparison with a template.
FIG. 12 is a diagram showing comparison with a template.
FIG. 13 is a diagram showing the operation of the fourth embodiment of the present invention.
FIG. 14 is a diagram showing the operation of the fourth embodiment of the present invention.
FIG. 15 is a diagram showing a list of character recognition difficulty levels.
FIG. 16 is a diagram showing difficulty tables that differ for each font.
FIG. 17 is a diagram showing priority order.
FIG. 18 is a diagram showing an example of the seventh embodiment of the present invention.
FIG. 19 is a diagram showing priority order.
FIG. 20 is a diagram showing an example of the eighth embodiment of the present invention.
FIG. 21 is a diagram showing an example of the eighth embodiment of the present invention.
FIG. 22 is a diagram showing an example of the tenth embodiment of the present invention.
FIG. 23 is a diagram showing an example of the tenth embodiment of the present invention.
FIG. 24 is a diagram showing an example of the eleventh embodiment of the present invention.
FIG. 25 is a diagram showing an example of the twelfth embodiment of the present invention.
FIG. 26 is a diagram showing an RFID chip as the machine-readable data recording means.
FIG. 27 is a diagram showing a barcode pattern as the machine-readable data recording means.
FIG. 28 is a diagram showing a two-dimensional code pattern as the machine-readable data recording means.
FIG. 29 is a diagram showing a background dot pattern as the machine-readable data recording means.
FIG. 30 is a diagram showing an example in which a strikethrough overlaps or approaches the strokes of a character.
FIG. 31 is a diagram showing an example in which a strikethrough overlaps or approaches the strokes of a character.

Explanation of symbols

101 Image reading unit
102 Image processing unit
103 Character recognition unit
104 Character layout information reading unit
105 Character layout information decoding unit
106 Typeface/arrangement information extraction unit
107 Typeface/arrangement information conversion unit
108 Typeface/arrangement information recording unit
109 Character data construction unit
110 Image data construction unit
111 Data recording unit
112 Recording medium
113 Data transfer unit
114 Data receiving unit
115 Printing unit

Claims

1. An information processing apparatus comprising:
reading means for acquiring a document image and character layout information from a document;
character recognition processing means for acquiring text data by executing character recognition processing on the document image acquired by the reading means; and
reconstructing means for reconstructing the text data obtained by the character recognition processing means on the basis of the character layout information to obtain restored electronic data.

2. An information processing apparatus comprising:
reading means for acquiring a document image and character layout information from a document; and
character recognition processing means for acquiring text data by executing character recognition processing on the document image acquired by the reading means, on the basis of the character layout information acquired by the reading means.

3. The information processing apparatus according to claim 2, wherein the character recognition means includes:
a template generation unit that generates a character template on the basis of the character layout information acquired by the reading means; and
a character recognition processing unit that performs character recognition processing by collating, character by character, the document image acquired by the reading means with the template generated by the template generation unit.

4. The information processing apparatus according to claim 3, wherein the character recognition means further includes
a character separation unit that cuts out characters from the document image on the basis of the character layout information acquired by the reading means, and
the character recognition processing unit performs character recognition processing by collating the characters cut out from the document image by the character separation unit with the template generated by the template generation unit.

5. The information processing apparatus according to claim 3, wherein the reading means acquires from the document, together with the document image and the character layout information, character recognition auxiliary information in which a predetermined character included in the document image is associated with position information indicating the position of that character in the document, and
the character recognition means refers to the character recognition auxiliary information on the basis of the position of the character being processed by the character recognition processing unit and, when the character recognition auxiliary information contains information corresponding to that position, outputs the character corresponding to that position contained in the character recognition auxiliary information instead of the result of the character recognition processing.

6. The information processing apparatus according to claim 1, wherein the character layout information includes character font information indicating a character font and character size information indicating a character size.

7. The information processing apparatus according to claim 6, wherein the character layout information further includes character modification information indicating a modification applied to the character.

8. The information processing apparatus according to claim 6, wherein the character layout information further includes range information indicating a range of a character string.

9. The information processing apparatus according to claim 5, wherein the predetermined character included in the character recognition auxiliary information is a character that may be erroneously recognized by the character recognition processing unit.

10. The information processing apparatus according to claim 5 or 9, wherein the characters that may be erroneously recognized by the character recognition processing unit are set according to the character font or the presence or absence of decoration applied to the characters.

11. The information processing apparatus according to claim 1, wherein the reading means optically scans the document to obtain the document image and the character layout information.

12. The information processing apparatus according to claim 1, wherein the reading means includes:
a first reading unit that optically scans the document to obtain the document image; and
a second reading unit that acquires the character layout information from a recording unit attached to the document.

13. An information processing apparatus that prints and outputs a paper-like recording medium having a document image and character layout information, wherein
the document image and the character layout information are acquired by reading means of another information processing apparatus,
text data is acquired by character recognition processing means of the other information processing apparatus executing character recognition processing on the document image acquired by the reading means, and
restored electronic data is acquired by reconstructing means of the other information processing apparatus reconstructing the text data obtained by the character recognition processing means on the basis of the character layout information.

14. An information processing apparatus that prints and outputs a paper-like recording medium having a document image and character layout information, wherein
the document image and the character layout information are acquired by reading means of another information processing apparatus, and
text data is acquired by character recognition processing means of the other information processing apparatus executing character recognition processing on the document image acquired by the reading means, on the basis of the character layout information acquired by the reading means.