JP3731800B2

JP3731800B2 - Document data creation method and apparatus, and character data restoration method and apparatus

Info

Publication number: JP3731800B2
Application number: JP2000084848A
Authority: JP
Inventors: 勉黒瀬
Original assignee: Riso Kagaku Corp
Current assignee: Riso Kagaku Corp
Priority date: 2000-03-24
Filing date: 2000-03-24
Publication date: 2006-01-05
Anticipated expiration: 2020-03-24
Also published as: JP2001274976A

Description

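[0001]
[Technical Field of the Invention]
The present invention relates to a method and apparatus for creating bitmap data of a document created on a personal computer, word processor, or similar device, and to a method and apparatus for restoring character data from the bitmap.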
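[0002]
[Prior Art]
Floppy disks, CD-ROMs, and similar recording media (storage media) are widely known as means of storing and carrying the digital data (the character code and printing position of each character) of documents that users create freely on a personal computer (hereinafter "PC") or a word processor. The contents of a document recorded on such a medium can be restored without error using a PC or word processor equipped with a reader for that medium, such as an FD drive or a CD-ROM drive.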
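[0003]
However, to find the medium holding a desired document among many stored media (that is, to search), the recorded documents must be read one after another on a PC or word processor, relying on their file names, and displayed on screen so that their contents can be checked. This approach is inefficient.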
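[0004]
On the other hand, to check the contents of a document while it is being carried (handled), one must carry either a portable reader with a display together with the recording medium, or a printout (paper medium) of the document. A reader with a display large enough to be read comfortably, however, tends to be too large or too heavy to carry, while carrying printouts means that both the recording media and the printouts must be managed (filed), resulting in double management.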
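[0005]
One conceivable way to solve these problems is to handle documents only as printouts and to restore the digital data of a document by recognizing the printed characters with any of various known character recognition methods.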
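[0006]
In doing so, it is also conceivable, as proposed for example in JP-A-62-243087, to restore not only the character codes and character positions (hereinafter "character information") but also information indicating character attributes such as underline, double size, font, size, or style (hereinafter "attribute information").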
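[0007]
In the method proposed in JP-A-62-243087, attribute designation characters representing the attribute information, for example a circled "BS" to designate double-size characters or a circled "US" to designate underlining, are written on the sheet, in addition to the characters representing the document content, immediately before each character whose attribute changes. During character recognition, when an attribute designation character is recognized, the characters representing the document content are decorated according to that designation, for example by underlining them or rendering them double-size.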
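[0008]
With this method, documents created on a PC or the like can be managed by paper output alone. Document contents can therefore be checked and searched easily, the digital data of a document can be restored including not only its content but also its attribute information, and a document management method free of double management can be achieved.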
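[0009]
[Problems to be Solved by the Invention]
However, because the method of JP-A-62-243087 writes an attribute designation character immediately before each character whose attribute changes, the character layout of the original data (the positions at which characters are placed) no longer matches the character layout on the printout. As a result, merely looking at the characters on the printout does not reveal layout errors or poor balance in the original data or in the restored document.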
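[0010]
The present invention has been made in view of the above circumstances. Its object is to provide a method and apparatus for creating bitmap data of a document for character recognition, and a method and apparatus for restoring character data from the bitmap, that realize a document management method in which document contents can be checked and searched easily, the digital data of a document can be restored including its attribute information as well as its content, and double management does not arise, and in which the original data, and the document content and layout that will be restored from the printout, can be confirmed simply by looking at the printout.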
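[0011]
[Means for Solving the Problems]
A first document data creation method according to the present invention creates bitmap data of a document on the basis of document original data created by a computer. Character information and attribute information are extracted from the information on each character constituting the document original data. Bitmap data corresponding to the document original data is created from the extracted character information of each character, using bitmaps of carrier font characters obtained by deforming base font characters. The extracted attribute information of each character is converted into an information code on the basis of a first conversion table representing the correspondence between attribute information and the information codes that represent it. Among the pixels of the created bitmap data, pixels satisfying a predetermined condition are extracted as compositable pixels, that is, pixels into which the information code can be composited, and the information code is added to the compositable pixels in sequence.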
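[0012]
A second document data creation method according to the present invention differs from the first in that, when an information code is added (composited) to bitmap data of a document created as in the first method, the information code is compressed before it is added. Character information and attribute information are extracted from the information on each character constituting the document original data; bitmap data corresponding to the document original data is created from the extracted character information of each character, using bitmaps of carrier font characters obtained by deforming base font characters; and the extracted attribute information of each character is converted into an information code on the basis of the first conversion table. The information code is separable into unique information, which is specific to each character, and non-unique information, which is the same within a line or paragraph. The non-unique information of consecutive characters among those constituting the document original data is deleted, thereby compressing the information codes of those characters into a compressed information code. Among the pixels of the created bitmap data, pixels satisfying a predetermined condition are extracted as compositable pixels into which the compressed information code can be composited, and the compressed information code is added to the compositable pixels in sequence.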
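[0013]
Here, "compression" may compress only the information codes of predetermined characters, so long as the total amount of information code for all characters of the document is reduced.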
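[0014]
A "carrier font character" is a character that makes it convenient, in the character data restoration process described later, to recognize whether an information code has been added to its bitmap. Any character may be used, provided that the restoration process can recognize whether an information code has been added, and it may be obtained by deforming a conventionally used character.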
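[0015]
Likewise, "compositable pixels into which the compressed information code can be composited" are pixels that make it convenient, in the character data restoration process described later, to recognize whether an information code has been added to them. Any pixels may serve as compositable pixels, provided that the restoration process can recognize whether an information code has been added. It is desirable, however, that adding the information code should not impair the legibility of the original characters.
Alternatively, the carrier font characters may be deformed so that the run length of each group of black pixels lying on a line in the scanning direction is odd, and the compositable pixels may be white pixels adjacent to such black pixel groups that do not join two black pixel groups when changed to black.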
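[0016]
On the other hand, a first character data restoration method according to the present invention restores the character data of a document from bitmap data of a document created by the first document data creation method. The information code added to the compositable pixels in the document bitmap is extracted; the extracted information code is converted into attribute information with reference to a second conversion table representing the correspondence between information codes and attribute information; the extracted information code is removed from the document bitmap; and the character information is restored on the basis of the bitmap from which the information code has been removed.
Further, when the carrier font characters are deformed so that the run length of each black pixel group on a line in the scanning direction is odd, the information code may be extracted from the bitmap of a document created by the first document data creation method according to whether the run length of each such black pixel group is even or odd; the extracted information code is then converted into attribute information with reference to the second conversion table and removed from the document bitmap, and the character information is restored on the basis of the bitmap from which it has been removed.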
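[0017]
A second character data restoration method according to the present invention restores the character data of a document from the bitmap of a document created by the second document data creation method. The compressed information code added to the compositable pixels in the document bitmap is extracted; the extracted compressed information code is decompressed to obtain the information code before compression; the obtained information code is converted into attribute information with reference to the second conversion table; the extracted compressed information code is removed from the document bitmap; and the character information is restored on the basis of the bitmap from which the compressed information code has been removed.
Further, when the carrier font characters are deformed so that the run length of each black pixel group on a line in the scanning direction is odd, the compressed information code may be extracted from the bitmap of a document created by the second document data creation method according to whether the run length of each such black pixel group is even or odd; it is then decompressed to obtain the information code before compression, the obtained information code is converted into attribute information with reference to the second conversion table, the compressed information code is removed from the document bitmap, and the character information is restored on the basis of the bitmap from which it has been removed.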
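[0018]
In the first and second character data restoration methods, it is desirable to decorate the restored character information on the basis of the converted attribute information.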
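[0019]
A first document data creation apparatus according to the present invention carries out the first document data creation method; that is, it creates bitmap data of a document on the basis of document original data created by a computer, and comprises: carrier font character storage means for storing bitmap data of carrier font characters obtained by deforming base font characters; character/attribute information extraction means for extracting character information and attribute information from the information on each character constituting the document original data; bitmap data creation means for reading carrier font character bitmaps from the carrier font character storage means according to the extracted character information of each character and creating bitmap data corresponding to the document original data; attribute information conversion means for converting the extracted attribute information of each character into an information code on the basis of a first conversion table representing the correspondence between attribute information and the information codes that represent it; compositable pixel extraction means for extracting, from among the pixels of the created bitmap data, pixels satisfying a predetermined condition as compositable pixels into which the information code can be composited; and information code compositing means for adding the information code to the compositable pixels in sequence.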
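[0020]
A second document data creation apparatus according to the present invention carries out the second document data creation method; that is, it creates bitmap data of a document on the basis of document original data created by a computer. It comprises the same carrier font character storage means, character/attribute information extraction means, bitmap data creation means, and attribute information conversion means as the first apparatus, together with: information code compression means which, the information code being separable into unique information specific to each character and non-unique information that is the same within a line or paragraph, deletes the non-unique information of consecutive characters constituting the document original data and thereby compresses their information codes into a compressed information code; compositable pixel extraction means for extracting, from among the pixels of the created bitmap data, pixels satisfying a predetermined condition as compositable pixels into which the compressed information code can be composited; and information code compositing means for adding the compressed information code to the compositable pixels in sequence.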
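[0021]
A first character data restoration apparatus according to the present invention carries out the first character data restoration method; that is, it restores the character data of a document from the bitmap of a document created by the first document data creation apparatus, and comprises: information code extraction means for extracting the information code added to the compositable pixels in the document bitmap; information code conversion means for converting the extracted information code into attribute information with reference to a second conversion table representing the correspondence between information codes and attribute information; information code removal means for removing the extracted information code from the document bitmap; and character recognition means for restoring the character information on the basis of the bitmap from which the information code has been removed.
Further, when the carrier font characters are deformed so that the run length of each black pixel group on a line in the scanning direction is odd, the information code extraction means may extract the information code from the bitmap of a document created by the first document data creation apparatus according to whether the run length of each such black pixel group is even or odd, the apparatus otherwise comprising the same information code conversion means, information code removal means, and character recognition means.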
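[0022]
A second character data restoration apparatus according to the present invention carries out the second character data restoration method; that is, it restores the character data of a document from the bitmap of a document created by the second document data creation apparatus, and comprises: information code extraction means for extracting the compressed information code added to the compositable pixels in the document bitmap; information code restoration means for decompressing the extracted compressed information code to obtain the information code before compression; information code conversion means for converting the restored information code into attribute information with reference to the second conversion table; information code removal means for removing the extracted compressed information code from the document bitmap; and character recognition means for restoring the character information on the basis of the bitmap from which the compressed information code has been removed.
Further, when the carrier font characters are deformed so that the run length of each black pixel group on a line in the scanning direction is odd, the information code extraction means may extract the compressed information code from the bitmap of a document created by the second document data creation apparatus according to whether the run length of each such black pixel group is even or odd, the apparatus otherwise comprising the same information code restoration means, information code conversion means, information code removal means, and character recognition means.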
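[0023]
The first and second character data restoration apparatuses preferably further comprise decoration means for decorating the restored character information on the basis of the converted attribute information.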
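[0024]
[Effects of the Invention]
According to the first document data creation method and apparatus and the first character data restoration method and apparatus of the present invention (hereinafter collectively "the first invention"), in the document data creation process, character information and attribute information are extracted from the information on each character constituting the document original, and an information code representing the extracted attribute information is added to the bitmap data created from the extracted character information to correspond to the document original. In the character data restoration process, the information code of each character is extracted from the document bitmap and converted into attribute information with reference to the conversion table, while character recognition is performed on the bitmap from which the information code has been removed in order to restore the character information. The digital data of the original document can therefore be restored accurately, including its attribute information.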
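[0025]
Because the first invention thus allows the digital data of a document to be restored including attribute information as well as content, documents created on a PC or the like can be managed by paper output alone, and a document management method free of double management can be achieved.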
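[0026]
Moreover, in the document data creation process of the first document data creation method and apparatus, the attribute information of each character is added to the carrier font character bitmap as an information code rather than by writing attribute designation characters. Attribute information can therefore be added without disturbing the layout of the original data: each character is output at substantially the same position as in the original data of the document created on the PC or the like, the character layout on the printout does not diverge from that of the original data, and the original data, and the document content and layout restored from the printout, can be confirmed simply by looking at the printout.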
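[0027]
According to the second document data creation method and apparatus and the second character data restoration method and apparatus of the present invention (hereinafter collectively "the second invention"), the information code is compressed in the document data creation process and the compressed information code is then added in the same manner as in the first invention; in the character data restoration process, the compressed information code extracted from the document bitmap is decompressed into the original information code, and the attribute information and character information are obtained in the same manner as in the first invention. The number of information code bits needed to restore a given amount of information is thus smaller than without compression, so more attribute information can be restored than with the first invention.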
[0028]
[Embodiments of the Invention]
Embodiments of the present invention will now be described in detail with reference to the drawings.
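[0029]
Fig. 1 is a block diagram showing the configuration of an apparatus that creates the carrier font character bitmaps used by the apparatuses, described later, that carry out the document data creation method and the character data restoration method according to the embodiments of the present invention.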
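[0030]
As shown in Fig. 1, this bitmap font creation apparatus 1 comprises base font character storage means 10, which stores bitmap data of the base font characters making up a base bitmap font, and carrier font character creation means 11, which creates new font characters (hereinafter "carrier font characters") by deforming, with the method described below, the bitmap of the base font character represented by the read bitmap data so that the number of pixels in each black pixel group on each horizontal line (hereinafter the "run length") is odd.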
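[0031]
For some conventional font characters, every run length is already odd; otherwise, a carrier font character with odd run lengths can be created by bitmapping the conventional font character as a base font character and then applying a slight deformation by the method described below. The run lengths are made odd so that the attribute information of the characters can be added (composited) to the character bitmaps as binary data and the added binary data can later be recovered (separated); the details are given below.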
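[0032]
Next, the method by which the bitmap font creation apparatus 1 creates a bitmap font, that is, the many carrier font characters that make it up, will be described with reference to the flowchart of Fig. 2. In Fig. 2, step numbers are prefixed with "S".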
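[0033]
1) The carrier font character creation means 11 first selects and reads the data of an arbitrary character from the set of base font characters (font data) stored in the base font character storage means 10; these are ordinary characters having the specific font, size, and style on which the carrier font characters are based. It then scans the read base font character line by line in the horizontal direction within a region specific to its character size (step 1) and detects black pixels (step 2). It then counts the run length of the black pixel group that begins with each detected black pixel (step 3).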
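[0034]
The specific region may be the rectangular region drawn with a dotted line around each character in Fig. 6. "Scanning" here means scanning of image data; for example, the pixel data of a base font character may be stored in an area memory so as to correspond to its bitmap and scanned in that memory.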
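[0035]
It is then determined whether the run length counted in step 3 is odd (step 4). If it is not, the frontmost pixel of the black pixel group in the scanning direction is converted into a white pixel, making the run length odd (step 5).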
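[0036]
2) After the run length is found to be odd in step 4, or after it has been made odd in step 5, the processing from step 1 through step 4 or step 5 is repeated until all lines in the region specific to the character size have been scanned.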
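[0037]
By performing the above processing on all base font characters, font data of carrier font characters is created. The created font data may be recorded on a computer-readable medium such as a CD-ROM and distributed. Fig. 3(a) shows an example of the bitmap data of the carrier font characters for the kanji "旅" and "行".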
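The carrier font procedure of steps 1 to 5 can be illustrated with a short Python sketch. It assumes a glyph is held as a list of rows of 0/1 integers (1 = black) covering the character's specific region; the names `Bitmap`, `make_carrier_glyph`, and `run_lengths` are hypothetical and do not come from the patent.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels: 1 = black, 0 = white


def make_carrier_glyph(glyph: Bitmap) -> Bitmap:
    """Return a copy of `glyph` in which every horizontal black run is odd.

    Scans each line left to right (steps 1-3), checks the parity of each
    black run (step 4) and whitens the frontmost pixel of an even run (step 5).
    """
    out = [row[:] for row in glyph]
    for row in out:
        x, width = 0, len(row)
        while x < width:
            if row[x] == 0:
                x += 1
                continue
            start = x
            while x < width and row[x] == 1:
                x += 1
            if (x - start) % 2 == 0:
                row[start] = 0  # frontmost pixel in the scan direction -> white
    return out


def run_lengths(row: List[int]) -> List[int]:
    """Lengths of the black runs on one line, left to right."""
    lengths, count = [], 0
    for pixel in row + [0]:  # a trailing white pixel closes the last run
        if pixel == 1:
            count += 1
        elif count:
            lengths.append(count)
            count = 0
    return lengths


glyph = [[0, 1, 1, 0, 1, 1, 1],
         [1, 1, 1, 1, 0, 0, 0]]
carrier = make_carrier_glyph(glyph)
print(carrier)  # [[0, 0, 1, 0, 1, 1, 1], [0, 1, 1, 1, 0, 0, 0]]
```

Whitening the leading pixel shortens an even run by one without splitting it, so the glyph shape changes only at run edges.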
[0038]
Thus, with the above method, bitmap data of carrier font characters can be created simply.
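[0039]
Next, a first embodiment will be described of a document creation apparatus having a document data creation unit according to the present invention, which creates bitmap data of a document using a bitmap font made up of carrier font characters created as described above, and of a document restoration apparatus having a document data restoration unit according to the present invention, which restores the character data of the document from the printout created and output by the document creation apparatus. The document creation apparatus and the document restoration apparatus together are referred to as a paper digital interface system (hereinafter "paper DIF system").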
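[0040]
Fig. 4 is a block diagram showing the schematic configuration of the document creation apparatus, Fig. 5 is a block diagram showing the schematic configuration of the document restoration apparatus, and Fig. 6 shows an example of a document original created on a PC, word processor, or the like.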
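[0041]
As shown in Fig. 4, the document creation apparatus 2 comprises character/attribute information extraction means 20; a document data creation unit 30 made up of carrier font character storage means 31, document bitmap data creation means 32, compositable pixel extraction means 33, an attribute information conversion table 34 serving as the first conversion table of the present invention, attribute information conversion means 35, and information code compositing means 36; and image output means 40 such as a printer. According to the characters and attribute information of a document freely created by a user on a PC or word processor, it creates and outputs a printout 50 of an information-code-composited document, that is, a document for character recognition that uses carrier font characters.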
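[0042]
The carrier font character storage means 31 of the document data creation unit 30 stores the carrier font characters created by the bitmap font creation apparatus 1 described above. The font data may be obtained, for example, by reading a medium such as a CD-ROM on which the set of dedicated font characters is recorded with a reader (not shown) and storing the read data in the carrier font character storage means 31.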
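[0043]
The document bitmap data creation means 32 of the document data creation unit 30 reads from the carrier font character storage means 31 the bitmap data of the carrier font character corresponding to the character code contained in the character information of each character of a document freely created by the user on a PC or word processor, and pastes that bitmap data into an all-white digital image corresponding to the size of the original document, at the position given by the coordinates of the character position reference point, which are also contained in the character information. Bitmap data of the document using carrier font characters is thereby created.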
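[0044]
The coordinates of the character position reference point may be predetermined coordinates within the region specific to the character size; in this example they are the coordinates of the top-left vertex pixel, marked ● in Fig. 6, of the rectangular region a drawn with a dotted line around each character. Any other coordinates may be used instead, as long as they define the reference position of the character.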
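[0045]
The compositable pixel extraction means 33 of the document data creation unit 30 scans the image region of the document using carrier font characters line by line in the horizontal direction and extracts, as compositable pixels, the white pixels adjacent to the frontmost pixel, in the scanning direction, of each black pixel group.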
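[0046]
The attribute information conversion means 35 of the document data creation unit 30 converts the font, size, style, and character position reference point coordinate correction data into predetermined information codes on the basis of the attribute information conversion table 34 shown in Fig. 7, and fits each information code into a predetermined format.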
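[0047]
The information code compositing means 36 of the document data creation unit 30 adds the information codes fitted into the predetermined format, in a predetermined order, to the compositable pixels extracted by the compositable pixel extraction means 33, thereby creating bitmap data of an information-code-composited document using carrier font characters.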
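[0048]
As shown in Fig. 5, the document restoration apparatus 3 comprises image input means 60 such as a scanner with a resolution of 400 dpi; document bitmap data extraction means 70; and a character data restoration unit 88 made up of character recognition means 80, information code separation means 84, an information code conversion table 85 serving as the second conversion table of the present invention, information code conversion means 86, and character/attribute information reconstruction means 87 serving as decoration means. From the printout 50 created and output by the document creation apparatus 2, it restores the character data of the original document on a PC or word processor.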
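[0049]
The information code separation means 84 of the character data restoration unit 88 detects all black pixel groups in the characters extracted by the document bitmap data extraction means 70 by scanning line by line in the horizontal direction, counts the run length of each black pixel group to separate the information code, and corrects the black pixels representing information code "1" back to white pixels. That is, the information code separation means 84 serves as both the information code extraction means and the information code removal means of the present invention.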
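[0050]
The character recognition means 80 of the character data restoration unit 88 restores the characters (character codes in this example) and the coordinates of the character position reference points from the bitmap data of the document using carrier font characters, after correction by the information code separation means 84. Any of various known character recognition methods may be used.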
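[0051]
The information code conversion table 85 of the character data restoration unit 88 stores the same information as the attribute information conversion table 34 of the document creation apparatus 2 described above.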
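[0052]
The information code conversion means 86 of the character data restoration unit 88 groups the information codes separated by the information code separation means 84 one character's worth at a time, in ascending order of the character position reference point coordinates, fits them into the predetermined format, and converts (restores) them, on the basis of the information code conversion table 85, into four items of character attribute information: font, size, style, and character position reference point coordinate correction data.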
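[0053]
The character/attribute information reconstruction means 87 of the character data restoration unit 88 associates the character codes and character position reference point coordinates restored by the character recognition means 80 with the four items of attribute information (font, size, style, and character position reference point coordinate correction data) restored by the information code conversion means 86 in ascending order of the character position reference point coordinates.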
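[0054]
Image output means may be provided downstream of the character/attribute information reconstruction means 87 so that the original document is also restored as a printout.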
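[0055]
Next, the operation of the paper DIF system comprising the document creation apparatus 2 and the document restoration apparatus 3 will be described.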
[0056]
The document creation apparatus 2 outputs a printout 50 of a document using carrier font characters as follows.
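[0057]
1) First, the character/attribute information extraction means 20 extracts five items of character and attribute information for each character of a document freely created by the user on a PC or word processor: the character code, the coordinates of the character position reference point, the font, the size, and the style.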
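[0058]
When extracting attribute information, the character/attribute information extraction means 20 follows the character layout of the document data. Specifically, when extracting attribute information from the document data corresponding to the printout 50 shown in Fig. 6, the top left of the printout 50 is taken as the scanning origin, the rightward direction as the main scanning direction, and the downward direction as the sub-scanning direction. If the scanning direction differs from this, an appropriate rotation is applied to the image data before extraction. "Scanning" here again means scanning of image data; for example, the pixel data of a document created on a PC or the like may be stored in an area memory so as to correspond to its bitmap and scanned in that memory.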
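[0059]
The attribute information of the characters is extracted in order, starting from the character whose character position reference point comes first in scanning order. In Fig. 6, for example, extraction proceeds in the order "旅", "行", "日", "時", "一", "月", "十", "日".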
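[0060]
2) Next, the document bitmap data creation means 32 reads in sequence, according to the character code of each character, the bitmap data of the corresponding carrier font character stored in the carrier font character storage means 31. It then pastes the read bitmap data in sequence into an all-white digital image corresponding to the size of the original document, so that the character position reference point of each carrier font character falls at the position on the sheet where the character position reference point of the corresponding document character is placed. Bitmap data D1 of the document using carrier font characters is thereby created.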
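[0061]
In the bitmap data D1 created in this way, the number of pixels in each black pixel group on each horizontal line (corresponding to the line direction of the document) of the carrier font character bitmaps, that is, each run length, is odd.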
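[0062]
In the first embodiment, because the document restoration apparatus uses image input means 60 with a resolution of 400 dpi, a printout 50 with a dot resolution of 100 dpi is created from a document with a dot resolution of 400 dpi created by the user on a PC or word processor. This is because, for the document bitmap data extraction means 70 to extract the image data of the printout 50 accurately, the resolution of the image input means 60 must be at least the dot resolution of the printout 50. Accordingly, the document bitmap data creation means 32 pastes the carrier font character bitmap data so that the 100 dpi character position reference point coordinates, obtained by converting the 400 dpi coordinates marked ● in Fig. 6, coincide with the coordinates of the top-left vertex pixel, marked ○ in Fig. 3(a), of the region b specific to the character size of the carrier font character.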
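[0063]
In the first embodiment, information on the offset produced by the conversion from 400 dpi to 100 dpi (hereinafter "character position reference point coordinate correction data") is added to the carrier font characters together with the attribute information, so that the document restoration apparatus 3 can restore the coordinates of the character position reference points accurately.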
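[0064]
This is necessary because representing a 400 dpi font character at 100 dpi reduces the resolution to 1/4, so the positional accuracy of the character layout output by the document creation apparatus 2 also drops to 1/4. For example, if a character's coordinates on the original (at 400 dpi) are X = 401 in the main scanning direction and Y = 403 in the sub-scanning direction, its coordinates on the sheet output by the document creation apparatus 2 become X = 100 (remainder 1) and Y = 100 (remainder 3), and the remainders in parentheses cannot be reproduced. Restoring the character position reference point coordinates accurately therefore requires compensating for this loss of positional accuracy. In this example, information giving the X and Y components of the remainder is added to the carrier font characters, together with the attribute information, as the character position reference point coordinate correction data, and the remainder is restored from these X and Y components during restoration.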
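The remainder handling described here amounts to integer division by the 4:1 scale factor. A minimal Python sketch, assuming non-negative pixel coordinates and floor division as in the 401 to 100 (remainder 1) example; `SCALE`, `split_coordinate`, and `restore_coordinate` are hypothetical names.

```python
from typing import Tuple

SCALE = 4  # 400 dpi original / 100 dpi printout


def split_coordinate(coord_400dpi: int) -> Tuple[int, int]:
    """Return (100 dpi coordinate, remainder); the remainder is 0-3, i.e. 2 bits."""
    return divmod(coord_400dpi, SCALE)


def restore_coordinate(coord_100dpi: int, remainder: int) -> int:
    """Rebuild the 400 dpi coordinate from the printed position and the correction data."""
    return coord_100dpi * SCALE + remainder


x100, dx = split_coordinate(401)  # (100, 1)
y100, dy = split_coordinate(403)  # (100, 3)
print(restore_coordinate(x100, dx), restore_coordinate(y100, dy))  # 401 403
```

Because each remainder lies in the range 0 to 3, the X and Y components fit exactly into two-bit fields of an information code.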
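[0065]
3) Next, the compositable pixel extraction means 33 scans the image region of the document using carrier font characters line by line in the horizontal direction and extracts, as the compositable pixels, the white pixels adjacent to the frontmost pixel, in the scanning direction, of each black pixel group. However, any such white pixel that, if changed to black, would join two black pixel groups into a single new black pixel group is treated as a pixel into which no information code can be composited. This prevents the added information code from impairing the legibility of the original characters on paper. Fig. 3(b) shows the bitmaps of the carrier font characters "旅" and "行" of Fig. 3(a) together with their compositable pixels; in Fig. 3(b), "/" marks a compositable pixel and "×" marks a pixel into which no information code can be composited because changing it would join two black pixel groups.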
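The selection rule for compositable pixels can be sketched as follows, again assuming a bitmap of 0/1 rows scanned left to right and top to bottom. A run that starts at column 0 is assumed to have no compositable pixel, since no white pixel precedes it on the line; `compositable_pixels` is a hypothetical name.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black, 0 = white


def compositable_pixels(bitmap: Bitmap) -> List[Tuple[int, int]]:
    """List (row, col) of compositable white pixels in raster-scan order.

    A candidate is the white pixel immediately left of the frontmost pixel of
    a black run. It is excluded (the "x" pixels of Fig. 3(b)) when turning it
    black would join the run to the preceding run, i.e. when the pixel two
    places to the left is black.
    """
    pixels = []
    for y, row in enumerate(bitmap):
        for x in range(1, len(row)):
            if row[x] == 1 and row[x - 1] == 0:  # x is the start of a black run
                joins = x >= 2 and row[x - 2] == 1
                if not joins:
                    pixels.append((y, x - 1))
    return pixels


bitmap = [[0, 1, 0, 1, 1, 1, 0, 0, 1]]
print(compositable_pixels(bitmap))  # [(0, 0), (0, 7)]; column 2 would join two runs
```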
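[0066]
4) The attribute information conversion means 35 then converts the font, size, style, and character position reference point coordinate correction data into information codes on the basis of the attribute information conversion table 34 shown in Fig. 7 and fits them into the predetermined format.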
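[0067]
Specifically, as shown in Fig. 8(a), the font information is assigned to the three bits b_11, b_10, and b_9; the size information to the three bits b_8, b_7, and b_6; the style information to the two bits b_5 and b_4; the Y component of the character position reference point coordinate correction data to the two bits b_3 and b_2; and the X component to the two bits b_1 and b_0, forming 12-bit code data.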
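The 12-bit layout of Fig. 8(a) can be expressed with shifts and masks. In this sketch the field values are assumed to be small integer indices taken from a conversion table like that of Fig. 7, whose actual contents are not reproduced here; `Attributes`, `pack`, `unpack`, and `to_bits` are hypothetical names.

```python
from typing import List, NamedTuple


class Attributes(NamedTuple):
    font: int   # 3 bits -> b_11..b_9
    size: int   # 3 bits -> b_8..b_6
    style: int  # 2 bits -> b_5..b_4
    dy: int     # 2 bits -> b_3..b_2 (Y component of the correction data)
    dx: int     # 2 bits -> b_1..b_0 (X component of the correction data)


WIDTHS = (3, 3, 2, 2, 2)


def pack(a: Attributes) -> int:
    """Fit the attribute indices into one 12-bit code."""
    for value, bits in zip(a, WIDTHS):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (a.font << 9) | (a.size << 6) | (a.style << 4) | (a.dy << 2) | a.dx


def unpack(code: int) -> Attributes:
    """Inverse of pack()."""
    return Attributes(
        font=(code >> 9) & 0b111,
        size=(code >> 6) & 0b111,
        style=(code >> 4) & 0b11,
        dy=(code >> 2) & 0b11,
        dx=code & 0b11,
    )


def to_bits(code: int) -> List[int]:
    """Bits in the order b_11, ..., b_0 used when compositing."""
    return [(code >> i) & 1 for i in range(11, -1, -1)]


code = pack(Attributes(font=5, size=2, style=1, dy=3, dx=1))
print(code, to_bits(code))  # 2717 [1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]
```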
[0068]
Fig. 8(b) shows an example of data representing the information codes of the characters "旅", "行", "日", "時", "一", "月", "十", "日".
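[0069]
5) The information code compositing means 36 then adds the information codes fitted into the format of Fig. 8(a), in the order b_11, …, b_0, to the compositable pixels extracted by the compositable pixel extraction means 33, one after another in scanning order. "Scanning order" here means the order of a raster scan on the bitmap corresponding to the document original of Fig. 6: with the rightward direction as the main scanning direction X and the downward direction as the sub-scanning direction Y, main scanning starts from the top-left origin, and when one line of main scanning is complete the scan shifts to the next line. When the information code of one character has been added, the information code of the next character follows immediately.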
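[0070]
Fig. 9, an enlargement of Fig. 3(b), shows the correspondence between the compositable pixels marked "/" in Fig. 3(b) and the bits of the information codes of the characters shown in Fig. 8(b). As illustrated, the bits of the information codes correspond in order to the compositable pixels of Fig. 9 in scanning order; bit b_0 of "一" and the following bits correspond to compositable pixels of the next character (not shown), so that the information codes of all characters are assigned without omission.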
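[0071]
In the first embodiment, a compositable pixel is changed to a black pixel only when "1" is to be added as the information code, so that the run length of the adjacent black pixel group becomes even. Bitmap data of an information-code-composited document using carrier font characters is thereby created.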
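Compositing therefore only touches pixels for "1" bits. The sketch below assumes the compositable pixels have already been listed in raster order as (row, col) pairs, as in [0065]; `composite_bits` is a hypothetical name, and raising an error when there are more bits than pixels is an assumed policy, not something the text specifies.

```python
from typing import List, Sequence, Tuple

Bitmap = List[List[int]]  # 1 = black, 0 = white


def composite_bits(bitmap: Bitmap,
                   pixels: Sequence[Tuple[int, int]],
                   bits: Sequence[int]) -> Bitmap:
    """Blacken the k-th compositable pixel when the k-th bit is 1.

    Each composited "1" lengthens the adjacent odd black run by one pixel,
    making it even; a "0" leaves the pixel, and the odd run, unchanged.
    """
    if len(bits) > len(pixels):
        raise ValueError("more information-code bits than compositable pixels")
    out = [row[:] for row in bitmap]
    for (y, x), bit in zip(pixels, bits):
        if bit == 1:
            out[y][x] = 1
    return out


carrier = [[0, 1, 0, 0, 1, 1, 1]]
print(composite_bits(carrier, [(0, 0), (0, 3)], [1, 0]))
# [[1, 1, 0, 0, 1, 1, 1]]: first run now even ("1"), second still odd ("0")
```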
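[0072]
Fig. 3(c) shows an example of bitmap data using carrier font characters after information code compositing. It is created by adding pixels representing the information codes of Fig. 8(b) to the compositable pixels marked "/" in Fig. 3(b). Bit b_0 of "一" and the following bits are added to the next character (not shown), so that the information codes of all characters are added without omission.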
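[0073]
6) Finally, the image output means 40 outputs a printout 50 of the document at 100 dpi on the basis of the bitmap data D2 of the information-code-composited document using carrier font characters. The carrier font characters are thereby printed in substantially the same layout (character positions) as the original data of the document created on the PC or the like.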
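[0074]
In the first embodiment, a scale frame 55 corresponding to the image region size and resolution of the document is printed a few millimeters inside the edge of the sheet so that the document bitmap data extraction means 70 of the document restoration apparatus 3, described later, can extract the document bitmap data accurately. Fig. 10 shows an example of a printout 50, with this scale frame 55, of a document in which information codes have been added to the carrier font characters. Instead of a scale frame, other characters or symbols indicating a reference position may be printed.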
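[0075]
As Fig. 10 shows, some of the carrier font characters ("旅" and "行" in the figure) are slightly deformed compared with the original characters (base font characters) because their run lengths have been made odd and pixels representing information codes have been added, but their legibility is not impaired, and the user can easily check the text of the original document simply by looking at such a printout 50.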
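[0076]
Furthermore, since the attribute information of the characters is added to the carrier font character bitmaps as information codes rather than by writing attribute designation characters, each character is output at substantially the same position as in the original data of the document created on the PC or the like; the character layout on the printout does not diverge from that of the original data, and the original data, and the document content and layout restored from the printout, can be confirmed simply by looking at the printout.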
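[0077]
In addition, in the document data creation process of the first document data creation method and apparatus, the attribute information is added to the carrier font character bitmaps as information codes rather than by writing attribute designation characters, so attribute information can be added without disturbing the layout of the original data. Because each character is output at substantially the same position as in the original data of the document created on the PC or the like, the character layout on the printout does not diverge from that of the original data, and the original data, and the document content and layout restored from the printout, can be confirmed simply by looking at the printout.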
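[0078]
The document restoration apparatus 3, in turn, restores the digital data D1 of the original document from the printout 50 of the document using carrier font characters as follows.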
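[0079]
1) The image input means 60 reads the printout 50 of the information-code-composited document using carrier font characters as 400 dpi multilevel image data. The document bitmap data extraction means 70 then extracts from this multilevel image data the 100 dpi document bitmap data lying within the scale frame 55 of the printout 50.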
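[0080]
When the image input means 60 reads the printout 50, it follows the character layout of the printout 50. Specifically, when reading the printout 50 shown in Fig. 6, the top left of the printout 50 is taken as the scanning origin, the rightward direction as the main scanning direction, and the downward direction as the sub-scanning direction. If the reading direction differs from this, an appropriate rotation is applied to the image data before the processing described below.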
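[0081]
In the first embodiment, as described above, the scale frame 55 is printed on the printout 50. The relative position between the scale frame 55 and the printed characters faithfully reproduces the original image, so even if the printout is slightly shifted vertically or horizontally and read at a skew by the image input means 60, the misalignment can be corrected on the image data on the basis of this relative position using a well-known misregistration correction method. Document bitmap data free of misalignment can thus be extracted, enabling accurate character recognition. In other words, the scale frame 55 functions as positioning data for extracting the document bitmap data.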
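[0082]
2) The information code separation means 84 detects all black pixel groups by scanning line by line in the horizontal direction. It then counts the run length of each black pixel group and separates the information code, judging that "0" has been added if the run length is odd and "1" if it is even. The information codes are separated in order, starting from the character whose character position reference point comes first in scanning order. Together with this separation, the black pixel representing information code "1", that is, the frontmost pixel in the scanning direction of each black pixel group with an even run length, is corrected back to a white pixel. Bitmap data of the document using carrier font characters, with the information codes removed, is thereby created.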
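A decoder sketch for this step follows. One assumption goes beyond the text: to stay in step with an encoder that skips pixels which would join two runs, the sketch reads no bit from an odd run that had no usable compositable pixel (a run starting at column 0, or one whose pixel two places to the left is black). Surplus trailing "0" bits from unused compositable pixels are left for the caller to discard. `separate_bits` is a hypothetical name.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black, 0 = white


def separate_bits(bitmap: Bitmap) -> Tuple[List[int], Bitmap]:
    """Return (bits in raster order, bitmap with the information code removed)."""
    bits: List[int] = []
    out = [row[:] for row in bitmap]
    for row in out:
        x, width = 0, len(row)
        while x < width:
            if row[x] == 0:
                x += 1
                continue
            start = x
            while x < width and row[x] == 1:
                x += 1
            if (x - start) % 2 == 0:
                bits.append(1)   # even run: a "1" was composited
                row[start] = 0   # remove it by whitening the frontmost pixel
            elif start >= 1 and not (start >= 2 and row[start - 2] == 1):
                bits.append(0)   # odd run that had a usable compositable pixel
            # otherwise: odd run without a compositable pixel, carries no bit
    return bits, out


bits, cleaned = separate_bits([[1, 1, 0, 0, 1, 1, 1]])
print(bits, cleaned)  # [1, 0] [[0, 1, 0, 0, 1, 1, 1]]
```

The check on the pixel two places to the left can be made on the composited image: compositing only blackens the pixel directly before a run's start, so it never blackens the pixel two places to the left of an unchanged run's start.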
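[0083]
3) The character recognition means 80 restores the character codes and the 100 dpi character position reference point coordinates on the basis of the bitmap data of the document using carrier font characters from which the information codes have been removed by the information code separation means 84, and a collation table (not shown).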
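[0084]
4) The information code conversion means 86 groups the information codes separated by the information code separation means 84 one character's worth, that is, 12 bits, at a time and fits them into the format shown in Fig. 8(a) in the order b_11, …, b_0. It then converts them into the corresponding attribute information on the basis of the information code conversion table 85, whose contents are shown in Fig. 7. The four items of attribute information (font, size, style, and character position reference point coordinate correction data) are thereby restored in ascending scanning order of the character position reference point coordinates.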
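Grouping the separated bits back into per-character codes might look like the sketch below. It assumes the number of characters is known from the character recognition step, so that trailing bits from unused compositable pixels can be ignored, and it returns table indices rather than font or style names; `group_codes` and `fields` are hypothetical names.

```python
from typing import Dict, List

CODE_WIDTH = 12  # bits per character in the Fig. 8(a) format


def group_codes(bits: List[int], n_chars: int) -> List[int]:
    """Pack the separated bit stream, b_11 first, into one code per character."""
    if len(bits) < n_chars * CODE_WIDTH:
        raise ValueError("not enough bits for the recognized characters")
    codes = []
    for i in range(n_chars):
        code = 0
        for bit in bits[i * CODE_WIDTH:(i + 1) * CODE_WIDTH]:
            code = (code << 1) | bit
        codes.append(code)
    return codes  # trailing bits (unused compositable pixels) are ignored


def fields(code: int) -> Dict[str, int]:
    """Split one 12-bit code into the Fig. 8(a) fields (table indices)."""
    return {
        "font": (code >> 9) & 0b111,
        "size": (code >> 6) & 0b111,
        "style": (code >> 4) & 0b11,
        "dy": (code >> 2) & 0b11,
        "dx": code & 0b11,
    }


stream = [1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0]  # one character + 3 spare bits
print([fields(c) for c in group_codes(stream, 1)])
# [{'font': 5, 'size': 2, 'style': 1, 'dy': 3, 'dx': 1}]
```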
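[0085]
5) The character/attribute information reconstruction means 87 associates the character codes and 100 dpi character position reference point coordinates, in ascending order of those coordinates, with the font, size, style, and character position reference point coordinate correction data restored by the information code conversion means 86. Specifically, the character codes and 100 dpi character position reference point coordinates are sorted in ascending order of the coordinates and associated with the font, size, style, and coordinate correction data, and character bitmap data corresponding to the font, size, or style is placed. The 400 dpi character position reference point coordinates are also restored from the 100 dpi coordinates and the coordinate correction data, so that the character bitmap data is placed at the same positions as the characters of the original document data.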
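[0086]
In this way, the digital data of the original document, consisting of descriptions of character codes, can be restored accurately on a PC or word processor, including the attribute information. Since the characters (character codes in this example) are restored on the basis of the carrier font characters, the digital data of the document can be restored accurately at a recognition rate at least as high as that of conventional character recognition technology, and attribute information such as font, size, or style is not lost. Since the reference position information of each character is also restored, the positions at which characters should be placed are restored accurately as well, and the content can be checked in substantially the same layout as the original document on images or printouts output again from the restored digital data.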
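[0087]
Furthermore, since the digital data of the document is restored including attribute information as well as content, documents created on a PC or the like can be managed by paper output alone, and a document management method free of double management can be achieved.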
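[0088]
Next, a second embodiment of a document creation apparatus having a document data creation unit according to the present invention and a document restoration apparatus having a document data restoration unit according to the present invention will be described.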
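[0089]
The document creation apparatus 2 and document restoration apparatus 3 of the second embodiment have basically the same configuration as those of the first embodiment. The difference is that, whereas in the first embodiment all the attribute information of each character of the user's document is added to the carrier font characters as is, in the second embodiment it is compressed before being added, and the compressed attribute information is then restored. This difference is described below.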
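[0090]
Fig. 11 is a block diagram showing the schematic configuration of the document creation apparatus. The document restoration apparatus has the same configuration as in the first embodiment.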
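[0091]
In documents freely created by users, characters with the same font, size, and style are often used throughout a line or paragraph, as shown in Fig. 6. In that case, the same information codes for font, size, and style are added to the carrier font characters repeatedly, line by line or paragraph by paragraph.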
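[0092]
In the second embodiment, therefore, information code compression means 37 is provided between the attribute information conversion means 35 and the information code compositing means 36, as shown in Fig. 11; the information codes obtained by the attribute information conversion means 35 are compressed, and the compressed information codes are then added to the carrier font characters. Specifically, each information code is separated into information specific to each character (hereinafter "unique information"), such as the character position reference point coordinate correction data (x and y coordinates), and information that is often the same within a line or paragraph (hereinafter "non-unique information"), such as font, size, and style, and these are fitted into the formats shown in Figs. 12(a) and 12(b). For each character, it is then determined whether its non-unique information is the same as that of the preceding character in scanning order; if so, the non-unique information of that character is deleted, compressing the information code. So that the complete information codes can be recovered during document restoration from information codes whose non-unique information has been deleted, the result of this determination is added to the format of Fig. 8(a) as a non-unique information change flag. In the second embodiment, the flag is set to "0" when the non-unique information is the same and to "1" when it differs.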
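The compression can be sketched on top of the Fig. 8(a) codes: bits b_11 to b_4 (font, size, style) form the non-unique information and bits b_3 to b_0 (Y and X correction data) the unique information. The order flag, 4 unique bits, then 8 non-unique bits follows the decoding description in [0095]; keeping the Fig. 8(a) bit order inside each part, and always setting the flag for the first character, are assumptions. `compress` and `_bits` are hypothetical names.

```python
from typing import List, Optional


def _bits(value: int, width: int) -> List[int]:
    """Most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def compress(codes: List[int]) -> List[int]:
    """Compress 12-bit Fig. 8(a) codes into a flag + unique [+ non-unique] stream."""
    stream: List[int] = []
    previous: Optional[int] = None
    for code in codes:
        non_unique = code >> 4               # b_11..b_4: font, size, style (8 bits)
        unique = code & 0b1111               # b_3..b_0: Y and X correction data (4 bits)
        changed = non_unique != previous     # always True for the first character
        stream.append(1 if changed else 0)   # non-unique information change flag
        stream += _bits(unique, 4)
        if changed:
            stream += _bits(non_unique, 8)
        previous = non_unique
    return stream


print(compress([0xAB3, 0xAB1]))
# [1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
```

For example, eight characters of which three carry a set flag need 8 × 5 + 3 × 8 = 64 bits, against 8 × 12 = 96 bits without compression.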
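[0093]
Fig. 12(c) shows an example of data representing the compressed information codes of the characters "旅", "行", "日", "時", "一", "月", "十", "日". As a comparison of Fig. 8(b) and Fig. 12(c) shows, restoring the document of Fig. 6 requires 96 bits of data with the information code format of the first embodiment (Fig. 8(b)), whereas the second embodiment can restore it from 64 bits of information code, confirming that the information code is compressed.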
[0094]
Fig. 13 shows an example of bitmap data after information code compositing. It is created by adding the information codes of Fig. 12(c) in sequence to the compositable pixels shown in Fig. 3(b).
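[0095]
On the restoration side, before converting the information codes into attribute information, the information code conversion means 86 of the document restoration apparatus 3 shown in Fig. 5 re-edits the information codes separated by the information code separation means 84 into the format of Fig. 8(a) according to the non-unique information change flag. Specifically, for a character whose first bit, the non-unique information change flag, is "1", the following 4 bits are separated as unique information and the next 8 bits as non-unique information, and both are fitted into the format of Fig. 8(a). When the flag in the first bit is "0", the following 4 bits are separated as unique information, the non-unique information of the preceding character in scanning order is copied as the non-unique information, and both are fitted into the same format. In other words, the information code conversion means 86 of the second embodiment also functions as the information code restoration means of the present invention. The attribute information can then be restored in the same manner as in the first embodiment.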
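The matching decompression, which re-edits the stream into Fig. 8(a) codes as described in this paragraph, could be written as follows under the same assumptions about field order; `decompress` is a hypothetical name, and the character count is again assumed to come from character recognition.

```python
from typing import List, Optional


def decompress(stream: List[int], n_chars: int) -> List[int]:
    """Rebuild n_chars 12-bit Fig. 8(a) codes from a compressed bit stream."""
    codes: List[int] = []
    pos = 0
    previous: Optional[int] = None

    def take(width: int) -> int:
        """Read `width` bits, most significant first."""
        nonlocal pos
        if pos + width > len(stream):
            raise ValueError("bit stream ended early")
        value = 0
        for bit in stream[pos:pos + width]:
            value = (value << 1) | bit
        pos += width
        return value

    for _ in range(n_chars):
        flag = take(1)        # non-unique information change flag
        unique = take(4)      # Y and X correction data
        if flag == 1:
            previous = take(8)  # new font, size, style
        elif previous is None:
            raise ValueError("first character must carry its non-unique information")
        codes.append((previous << 4) | unique)  # copy or use the non-unique part
    return codes


print([hex(c) for c in decompress([1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1], 2)])
# ['0xab3', '0xab1']
```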
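[0096]
As described above, in the second embodiment the information codes are compressed before being added to the carrier font characters, so the number of information code bits that must be added to the carrier font characters to restore a given amount of information is smaller than in the first embodiment. In other words, more information can be restored than in the first embodiment. This makes it possible to add and restore, as attribute information, various other kinds of information in addition to font, size, and style, such as underline or double-size designations.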
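[0097]
In each of the above embodiments the image output means 40 is built into the document creation apparatus 2, but it may be a separate unit. The same applies to the image input means 60 of the document restoration apparatus 3.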
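[0098]
Although each embodiment has been described using a text document as an example, the present invention can also be applied to documents containing ruled lines, in the same way as to text documents, by regularizing the symbols that make up ruled lines, such as "─", "│", "┌", "┐", "┘", "└", "├", "┤", "┴", "┬", and "┼", in the same way as characters, incorporating them into the carrier font characters, and adding ruled-line thickness and styles such as broken lines to the attribute information.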
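[0099]
The carrier font characters need not be of a single kind. The character information may include the font and size in addition to the character code and character position, and carrier font characters corresponding to each font may be provided in multiple sizes. The bitmap data created by this apparatus then reflects the original document more faithfully. Which items of character information are treated as attribute information may be decided by balancing the fidelity of the generated bitmap data to the original against the capacity of the carrier font character storage means.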
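[0100]
To enable the document restoration apparatus to handle both documents in which information codes have been added to the carrier font characters without compression and documents in which compressed information codes have been added, a flag indicating which form was used may be added. Since compression is normally chosen per document, a single such flag per document, placed before the first bit b_11 of the information code of the first character, is sufficient.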
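[0101]
In each embodiment, conventional font characters have been described as the base font characters from which carrier font characters are created, but any font characters may be used as base font characters, including new font characters that may appear in the future. For example, the dedicated font characters proposed by the present applicant in Japanese Patent Application No. 2000-82156 may be used. In that case, the parts related to the dedicated font characters, such as the character recognition means 80, are of course as described in Japanese Patent Application No. 2000-82156.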
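[0102]
As described above, with the paper DIF system of the above embodiments, the user can check not only the characters but also the attributes of a document simply by looking at the printout 50 created by the document creation apparatus 2, and the document restoration apparatus 3 can restore the original document accurately on a PC or word processor from the printout 50 created by the document creation apparatus 2.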
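[0103]
Accordingly, by using the present invention as a document management method, document information can be managed by paper output alone, so document contents can be checked and searched easily, the digital data of documents can be restored without error, and a document management method free of double management can be established.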
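[Brief Description of the Drawings]
[Fig. 1] Block diagram showing the configuration of an apparatus that creates carrier font character bitmaps
[Fig. 2] Flowchart showing the method of creating carrier font characters
[Fig. 3] Diagrams showing the method of creating carrier font characters from base font characters: (a) an example of bitmap data of base font characters, (b) the same with the compositable pixels shown, (c) an example of bitmap data after information codes have been composited
[Fig. 4] Block diagram showing the schematic configuration of the document creation apparatus according to the first embodiment of the present invention
[Fig. 5] Block diagram showing the schematic configuration of the document restoration apparatus according to the first embodiment of the present invention
[Fig. 6] Diagram showing an example of a document original
[Fig. 7] Diagram showing an example of the correspondence between attribute information and information codes
[Fig. 8] Diagram showing an example of the information code format in the first embodiment
[Fig. 9] Enlarged view showing the correspondence between the compositable pixels marked "/" in Fig. 3(b) and the bits of the information codes of the characters shown in Fig. 8(b)
[Fig. 10] Diagram showing an example of a document in which information codes have been added to carrier font characters
[Fig. 11] Block diagram showing the schematic configuration of the document creation apparatus according to the second embodiment of the present invention
[Fig. 12] (a), (b) Diagrams showing an example of the information code format in the second embodiment; (c) an example of data representing compressed information codes
[Fig. 13] Diagram showing an example of bitmap data after information codes have been composited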
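[Explanation of Reference Numerals]
1 Bitmap font creation apparatus
2 Document creation apparatus
3 Document restoration apparatus
10 Base font character storage means
11 Carrier font character creation means
20 Character/attribute information extraction means
30 Document data creation unit
31 Carrier font character storage means
32 Bitmap data creation means
33 Compositable pixel extraction means
34 Attribute information conversion table
35 Attribute information conversion means
36 Information code compositing means
40 Image output means
50 Printout
60 Image input means
70 Document bitmap data extraction means
80 Character recognition means
85 Information code conversion table
86 Information code conversion means
87 Character/attribute information reconstruction means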
８８文字データ復元部[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a method and apparatus for creating bitmap data of a document created by a personal computer or a word processor, and a method and apparatus for restoring character data from a bitmap.
[0002]
[Prior art]
Conventionally, digital data (character code of each character and information on the printing position) of a document arbitrarily created by a user with a personal computer (hereinafter referred to as a personal computer) or a word processor (hereinafter referred to as a word processor) is stored or carried. As recording media (storage media) capable of recording, floppy disks, CD-ROMs, and the like are widely known. The content of a document recorded on this type of recording medium can be restored without error by using a device corresponding to the recording medium, for example, a personal computer or word processor equipped with a reading device such as an FD drive device or a CD-ROM drive device. be able to.
[0003]
However, in order to find a medium in which a desired document is recorded (so-called search) from a large number of storage media in a stored state, the recorded documents are sequentially ordered by file name using a personal computer or word processor. It needs to be read and displayed on the screen to check the contents. However, such a method has a problem that work efficiency is poor.
[0004]
On the other hand, in order to check the contents of a document by using the time during carrying (handling), carry a portable reading device having a display unit together with a recording medium, or carry a printed matter (paper medium) of the document. There is a need. However, in the method of carrying the reading device, there is a problem that the device having a display screen that is not difficult to see is too large or too heavy to carry. In the method of carrying the printed material, both the recording medium and the printed material are used. There is a problem in that double management occurs.
[0005]
Therefore, as a method for solving the above-described problems, the document is handled only by printed matter, and the digital data of the document is obtained by recognizing the characters of the printed matter by using various known character recognition methods. It is possible to restore.
[0006]
At this time, as proposed in, for example, JP-A-62-243087, not only character codes and character positions (hereinafter referred to as character information) but also characters such as underline, double angle, font, size, or style are used. It is also conceivable to restore together with information indicating attributes (hereinafter referred to as attribute information).
[0007]
Here, the method proposed in the above-mentioned Japanese Patent Application Laid-Open No. Sho 62-243087 is a method in which, in addition to characters indicating the document content on a sheet, for example, when double-size characters are specified, a double-enclosed BS is used. In the case of designation, an attribute designation character indicating the attribute information such as a circled US is entered immediately before the character whose attribute changes, and when the attribute designation character is recognized during character recognition, this attribute designation character Based on the information, the characters indicating the document content are modified, such as underlining or double-width characters.
[0008]
If this method is used, documents created on a personal computer or the like can be managed only by paper output, so it is easy to check and search the document content, and the digital data of the document can be attributed in addition to the document content. It is possible to restore the document including information, and to make a document management method that does not cause the problem of double management.
[0009]
[Problems to be solved by the invention]
However, in the method proposed in the above Japanese Patent Laid-Open No. 62-243087, the attribute designation character is entered immediately before the character whose attribute changes, so the character layout of the original data (position where the character is arranged) and the printed matter There is a problem that the character layout does not match. For this reason, just looking at the characters on the printed matter may cause a problem that it is impossible to notice an error in the layout of the original data or the restored document or a poor balance.
[0010]
The present invention has been made in view of the above circumstances, can easily check and search document contents, can restore digital data of documents including attribute information as well as document contents, and is also called double management. Bitmap data for character recognition documents that can confirm the original data and the document contents and layout restored from the printed material simply by looking at the printed material when realizing a document management method that does not cause a problem It is an object of the present invention to provide a method and apparatus for creating a character, and a method and apparatus for restoring character data from a bitmap.
[0011]
[Means for Solving the Problems]
  A first document data generation method according to the present invention is a document data generation method for generating bitmap data of a document based on document original data generated by a computer, and information on each character constituting the document original data. Character information and attribute information are extracted from the character information according to the extracted character information.Carrier with basic font characters modifiedBitmap data corresponding to the document manuscript data is created using a font character bitmap, and attribute information of each extracted character is stored.TheAttribute information andRepresents the attribute informationInformation codeWhenIs converted into the information code based on the first conversion table representing the correspondence ofPixels that meet the specified conditionsThe information code is extracted as a synthesizeable pixel that can be synthesized, and the information code is sequentially added to the synthesizeable pixel.
[0012]
  In the second document data creation method according to the present invention, when an information code is added (synthesized) to bitmap data of a document created in the same manner as the first document data creation method, the information code is added after being compressed. The character information and the attribute information are extracted from the information of each character constituting the document manuscript data, and according to the character information of each extracted character,Carrier with basic font characters modifiedBitmap data corresponding to the document manuscript data is created using a font character bitmap, and attribute information of each extracted character is stored.TheAttribute information andRepresents the attribute informationInformation codeWhenThe information code is converted into the information code based on the first conversion table representing the correspondence of the information, and the information code is separated into specific information unique to the character and non-unique information that is the same in line units or paragraph units. And deleting the non-unique information of consecutive characters among the characters constituting the document document data, compressing the information codes of the consecutive characters to generate a compressed information code, and the created bitmap Of each pixel of dataPixels that meet the specified conditionsThe compressed information code is extracted as a synthesizeable pixel that can be synthesized, and the compressed information code is sequentially added to the synthesizeable pixel.
[0013]
  Here, "compression" may be applied to the information codes of only certain predetermined characters, as long as the total amount of information codes for all characters of the document is reduced.
[0014]
  A "carrier font character" may be of any type, as long as, when an information code is added to its bitmap, the character data restoration process described later can conveniently recognize whether the information code has been added. It may be obtained by modifying a conventionally used font character.
[0015]
  Likewise, a "synthesizable pixel to which the (compressed) information code can be added" may be any pixel, as long as the character data restoration process described later can recognize whether it is a pixel to which an information code has been added. It is desirable to add the information code in a way that does not degrade the legibility of the original character.
  For example, the carrier font character may be modified so that the run length of each black pixel group on a line in the scanning direction is odd, and the synthesizable pixel may be a white pixel that is adjacent to a black pixel group on a line in the scanning direction and that does not connect two black pixel groups even when it is changed to a black pixel.
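The odd-run-length scheme can be illustrated on a single scan line. The following is a minimal sketch under stated assumptions, not the patent's implementation: pixels are 0/1 integers (1 = black), the synthesizable pixel is taken to be the white pixel immediately to the right of a run's last pixel, and bit 1 is written by blackening that pixel (the odd run becomes even) while bit 0 leaves it white. Only the current line is checked for bridging; two-dimensional connectivity is ignored. All names are hypothetical.

```python
from typing import Iterator, List, Tuple

Row = List[int]  # one scan line, 1 = black, 0 = white


def runs(row: Row) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each black run; end is exclusive."""
    i, n = 0, len(row)
    while i < n:
        if row[i]:
            j = i
            while j < n and row[j]:
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def synthesizable_pixels(row: Row) -> List[int]:
    """Indices of white pixels just after a run that can be blackened safely."""
    cells = []
    for _, end in runs(row):
        if end >= len(row):
            continue  # run reaches the end of the line: no white pixel after it
        if end + 1 < len(row) and row[end + 1]:
            continue  # blackening would join two black runs into one
        cells.append(end)
    return cells


def embed(row: Row, bits: List[int]) -> Row:
    """Write one bit per synthesizable pixel: 1 -> blacken it (run becomes even)."""
    out = list(row)
    for pos, bit in zip(synthesizable_pixels(row), bits):
        if bit:
            out[pos] = 1
    return out
```

For the line `[1, 0, 0, 1, 1, 1, 0, 1, 0, 0]`, the run of three pixels is skipped because the pixel after it is followed directly by another run, so only indices 1 and 8 can carry bits. Note that `zip` silently drops any bits beyond the available capacity.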
[0016]
  On the other hand, a first character data restoration method according to the present invention restores the character data of a document from the bitmap of a document created using the first document data creation method. The information code added to the synthesizable pixels in the document bitmap is extracted; the extracted information code is converted into attribute information with reference to a second conversion table indicating the correspondence between information codes and attribute information; the extracted information code is removed from the document bitmap; and the character information of the document is restored based on the bitmap from which the information code has been removed.
  Further, when the carrier font characters are modified so that the run length of each black pixel group on a line in the scanning direction is odd, the information code may be extracted from the bitmap of a document created using the first document data creation method according to whether the run length of each black pixel group on a line in the scanning direction is even or odd. The extracted information code is then converted into attribute information with reference to the second conversion table representing the correspondence between information codes and attribute information, the extracted information code is removed from the document bitmap, and the character information is restored based on the bitmap from which the information code has been removed.
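Extraction and removal on one scan line might look like the sketch below. It assumes a hypothetical writing convention: a bit is carried by the white pixel just to the right of a run, bit 1 blackens that pixel and bit 0 leaves it white, and pixels that would bridge two runs are never used. Under that assumption an even run carries bit 1 and is restored by whitening its last pixel, while an odd run carries bit 0 only if the pixel after it is white and not directly followed by another run; otherwise it carries no bit. This decoding rule is added here so that the reader stays in step with the writer; the text does not spell it out.

```python
from typing import Iterator, List, Tuple

Row = List[int]  # one scan line, 1 = black, 0 = white


def runs(row: Row) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each black run; end is exclusive."""
    i, n = 0, len(row)
    while i < n:
        if row[i]:
            j = i
            while j < n and row[j]:
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def extract_and_remove(row: Row) -> Tuple[List[int], Row]:
    """Return the bits carried by the line and the line with the added pixels removed."""
    bits: List[int] = []
    clean = list(row)
    n = len(row)
    for start, end in runs(row):
        if (end - start) % 2 == 0:
            bits.append(1)       # even run: one pixel was added after an odd run
            clean[end - 1] = 0   # remove it so the carrier glyph is restored
        elif end < n and (end + 1 >= n or not row[end + 1]):
            bits.append(0)       # usable carrier pixel that was left white
        # any other odd run had no usable carrier pixel and carries no bit
    return bits, clean
```

Applied to `[1, 1, 0, 1, 1, 1, 0, 1, 0, 0]`, this reads the bits `[1, 0]` and returns the line `[1, 0, 0, 1, 1, 1, 0, 1, 0, 0]`, in which every run is odd again and ready for character recognition.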
[0017]
  A second character data restoration method according to the present invention restores the character data of a document from a document bitmap created using the second document data creation method. The compressed information code added to the synthesizable pixels in the document bitmap is extracted; the extracted compressed information code is decompressed to obtain the information code before compression; the obtained information code is converted into attribute information with reference to a second conversion table indicating the correspondence between information codes and attribute information; the extracted compressed information code is removed from the document bitmap; and the character information of the document is restored based on the bitmap from which the compressed information code has been removed.
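Decompression can be sketched under a hypothetical stream format that the text does not specify: each character begins with a flag bit ("0" means its non-unique part follows, "1" means it reuses the previous character's), followed by a fixed-width unique part and, when flagged "0", a fixed-width non-unique part. The function name and field widths are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def decompress(bits: str, u_width: int, n_width: int) -> List[Tuple[str, str]]:
    """Rebuild per-character (unique, non_unique) codes from a flag-prefixed stream."""
    codes: List[Tuple[str, str]] = []
    prev: Optional[str] = None
    i = 0
    while i < len(bits):
        flag = bits[i]
        i += 1
        unique = bits[i:i + u_width]
        i += u_width
        if flag == "0":
            prev = bits[i:i + n_width]  # non-unique part is written out here
            i += n_width
        elif prev is None:
            raise ValueError("first character cannot reuse a non-unique part")
        if len(unique) != u_width or len(prev) != n_width:
            raise ValueError("truncated code stream")
        codes.append((unique, prev))
    return codes
```

With 2-bit unique parts and 4-bit non-unique parts, `decompress("00111001100110011", 2, 4)` yields three codes, the second of which inherits the line-level part `"1100"` from the first.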
  Further, when the carrier font characters are modified so that the run length of each black pixel group on a line in the scanning direction is odd, the compressed information code may be extracted from the bitmap of a document created using the second document data creation method according to whether the run length of each black pixel group on a line in the scanning direction is even or odd. The extracted compressed information code is then decompressed to obtain the information code before compression, the obtained information code is converted into attribute information with reference to the second conversion table representing the correspondence between information codes and attribute information, the extracted compressed information code is removed from the document bitmap, and the character information is restored based on the bitmap from which the compressed information code has been removed.
[0018]
In the first and second character data restoration methods, it is desirable that the restored character information be modified based on the converted attribute information.
[0019]
  A first document data creation device according to the present invention implements the first document data creation method; that is, it creates bitmap data of a document based on document original data created by a computer. It comprises: carrier font character storage means for storing bitmap data of carrier font characters, which are modified basic font characters; character/attribute information extraction means for extracting character information and attribute information from the information on each character constituting the document original data; bitmap data creation means for reading carrier font character bitmaps from the carrier font character storage means according to the extracted character information of each character and creating bitmap data corresponding to the document original data; attribute information conversion means for converting the extracted attribute information of each character into an information code based on a first conversion table representing the correspondence between attribute information and the information codes that represent it; information synthesizable pixel extraction means for extracting, from the pixels of the created bitmap data, pixels that satisfy a predetermined condition as synthesizable pixels to which the information code can be added; and information code synthesis means for sequentially adding the information code to the synthesizable pixels.
[0020]
  A second document data creation device according to the present invention implements the second document data creation method; that is, it creates bitmap data of a document based on document original data created by a computer. It comprises: carrier font character storage means for storing bitmap data of carrier font characters, which are modified basic font characters; character/attribute information extraction means for extracting character information and attribute information from the information on each character constituting the document original data; bitmap data creation means for reading carrier font character bitmaps from the carrier font character storage means according to the extracted character information of each character and creating bitmap data corresponding to the document original data; attribute information conversion means for converting the extracted attribute information of each character into an information code based on a first conversion table representing the correspondence between attribute information and the information codes that represent it; information code compression means for separating the information code into unique information specific to each character and non-unique information common to a whole line or paragraph, and deleting the non-unique information of consecutive characters among the characters constituting the document original data, thereby compressing the information codes of those consecutive characters into a compressed information code; information synthesizable pixel extraction means for extracting, from the pixels of the created bitmap data, pixels that satisfy a predetermined condition as synthesizable pixels to which the compressed information code can be added; and information code synthesis means for sequentially adding the compressed information code to the synthesizable pixels.
[0021]
  A first character data restoration device according to the present invention implements the first character data restoration method; that is, it restores the character data of a document from the bitmap of a document created by the first document data creation device. It comprises: information code extraction means for extracting the information code added to the synthesizable pixels in the document bitmap; information code conversion means for converting the extracted information code into attribute information with reference to a second conversion table indicating the correspondence between information codes and attribute information; information code removal means for removing the extracted information code from the document bitmap; and character recognition means for restoring the character information of the document based on the bitmap from which the information code has been removed.
  Further, when the carrier font characters are modified so that the run length of each black pixel group on a line in the scanning direction is odd, the device may comprise: information code extraction means for extracting the information code from the bitmap of a document created by the first document data creation device according to whether the run length of each black pixel group on a line in the scanning direction is even or odd; information code conversion means for converting the extracted information code into attribute information with reference to a second conversion table representing the correspondence between information codes and attribute information; information code removal means for removing the extracted information code from the document bitmap; and character recognition means for restoring the character information based on the bitmap from which the information code has been removed.
[0022]
  A second character data restoration device according to the present invention implements the second character data restoration method; that is, it restores the character data of a document from the bitmap of a document created by the second document data creation device. It comprises: information code extraction means for extracting the compressed information code added to the synthesizable pixels in the document bitmap; information code restoration means for decompressing the extracted compressed information code to obtain the information code before compression; information code conversion means for converting the restored information code into attribute information with reference to a second conversion table indicating the correspondence between information codes and attribute information; information code removal means for removing the extracted compressed information code from the document bitmap; and character recognition means for restoring the character information of the document based on the bitmap from which the compressed information code has been removed.
  Further, when the carrier font characters are modified so that the run length of each black pixel group on a line in the scanning direction is odd, the device may comprise: information code extraction means for extracting the compressed information code from the bitmap of a document created by the second document data creation device according to whether the run length of each black pixel group on a line in the scanning direction is even or odd; information code restoration means for decompressing the extracted compressed information code to obtain the information code before compression; information code conversion means for converting the restored information code into attribute information with reference to a second conversion table representing the correspondence between information codes and attribute information; information code removal means for removing the extracted compressed information code from the document bitmap; and character recognition means for restoring the character information based on the bitmap from which the compressed information code has been removed.
[0023]
It is desirable that the first and second character data restoration devices according to the present invention further include modification means for modifying the restored character information based on the converted attribute information.
[0024]
【Effects of the Invention】
According to the first document data creation method and apparatus and the first character data restoration method and apparatus of the present invention (hereinafter collectively referred to as the first invention), in the document data creation process, character information and attribute information are extracted from the information on each character constituting the document original, and an information code representing the extracted attribute information is added to the bitmap data corresponding to the document original created according to the extracted character information. In the character data restoration process, the information code of each character is extracted from the bitmap and converted into attribute information by referring to the conversion table, while the character information is restored by character recognition performed on the bitmap from which the information code has been removed. The digital data of the original document can therefore be restored accurately, including its attribute information.
[0025]
As described above, the first invention can restore the digital data of a document including not only its content but also its attribute information. As a result, a document created on a personal computer or the like can be managed with paper output alone, providing a document management method that does not suffer from the problem of double management.
[0026]
In the document data creation process of the first document data creation method and apparatus, the attribute information of each character is added as an information code to the bitmap of the carrier font character instead of being entered as an attribute designation character. Attribute information can therefore be added without disturbing the layout of the document: each character is output at approximately the same position as in the original data created on a personal computer or the like, so the character layout of the original data and that of the printed matter no longer diverge, and the content and layout of both the original data and the document restored from the printed matter can be checked simply by looking at the printed matter.
[0027]
According to the second document data creation method and apparatus and the second character data restoration method and apparatus of the present invention (hereinafter collectively referred to as the second invention), the information code is compressed in the document data creation process and then added by the same method as in the first invention. In the character data restoration process, the compressed information code extracted from the document bitmap is restored to the original information code, after which the attribute information and character information are obtained by the same method as in the first invention. Consequently, fewer information code bits are needed to carry the same amount of information than when the information code is not compressed, and the amount of attribute information that can be restored is larger than in the first invention.
[0028]
【Detailed Description of the Invention】
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0029]
FIG. 1 is a block diagram showing the configuration of an apparatus for creating the bitmaps of the carrier font characters used by the apparatuses, described later, that execute the document data creation method and the character data restoration method according to an embodiment of the present invention.
[0030]
As shown in FIG. 1, this bitmap font creation apparatus 1 comprises basic font character storage means 10 for storing the bitmap data of the basic font characters constituting a basic bitmap font, and carrier font character creation means 11 for creating new font characters (hereinafter referred to as carrier font characters) by modifying, with the method described later, the bitmap of each basic font character represented by the read bitmap data so that the number of pixels constituting each black pixel group on each horizontal line (hereinafter referred to as the "run length") is odd.
[0031]
Carrier font characters with odd run lengths may simply be conventional font characters whose run lengths all happen to be odd. If that is not the case, they can be created by taking conventional font characters as basic font characters, rendering their bitmaps, and then applying a slight deformation by the method described later. The run length is made odd so that character attribute information can be added (synthesized) to the character bitmap as binary data and later restored (separated) from it; the details are described later.
[0032]
Next, the method by which the bitmap font creation apparatus 1 creates a bitmap font, that is, creates the many carrier font characters constituting it, will be described with reference to the flowchart in FIG. 2. In FIG. 2, step numbers are prefixed with S.
[0033]
1) First, the carrier font character creation means 11 selects and reads the data of any one character from a set of basic font characters (font data) of a specific font, size, and style that is stored in the basic font character storage means 10 and serves as the basis of the carrier font characters. It then scans each horizontal line within a unique region determined by the character size of the read basic font character (step 1), detects black pixels (step 2), and counts the run length of the black pixel group starting from each detected black pixel (step 3).
[0034]
Here, the unique region may be, for example, the rectangular region bounded by dotted lines around each character in FIG. "Scanning" refers to scanning on the image data; for example, the pixel data of a basic font character may be stored in an area memory corresponding to the bitmap and scanned on that memory.
[0035]
Whether the run length counted in step 3 is odd is then determined (step 4). If it is not odd, the pixel at the foremost position in the scanning direction among the pixels constituting the black pixel group is converted into a white pixel, making the run length odd (step 5).
[0036]
2) When the run length is determined to be odd in step 4, or after it has been made odd in step 5, the processing from step 1 through step 4 or step 5 is repeated until all lines in the region determined by the character size have been scanned.
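Steps 1 to 5 amount to a single pass over every row of the glyph. A minimal sketch, assuming the glyph is a list of rows of 0/1 pixels (1 = black) and reading "the foremost pixel in the scanning direction" as the last pixel of the run when scanning left to right; the function name is hypothetical.

```python
from typing import List

Glyph = List[List[int]]  # rows of pixels, 1 = black, 0 = white


def make_carrier(glyph: Glyph) -> Glyph:
    """Return a copy of the glyph in which every horizontal black run has odd length."""
    out = [list(r) for r in glyph]
    for row in out:                      # step 1: scan each horizontal line
        i, n = 0, len(row)
        while i < n:
            if not row[i]:               # step 2: advance until a black pixel is found
                i += 1
                continue
            j = i
            while j < n and row[j]:      # step 3: count the run length
                j += 1
            if (j - i) % 2 == 0:         # step 4: is the run length even?
                row[j - 1] = 0           # step 5: whiten the trailing pixel
            i = j
    return out
```

Since an even run is at least two pixels long, whitening one pixel never deletes a stroke entirely; it shortens it by a single pixel, which is the "slight deformation" mentioned above.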
[0037]
By performing the above processing for all basic font characters, the font data of the carrier font characters is created. The created font data may be recorded on a computer-readable medium such as a CD-ROM and distributed. FIG. 3A shows an example of the bitmap data of the carrier font characters for the kanji "旅" (journey) and "行" (line).
[0038]
Thus, if the above method is used, bit map data of carrier font characters can be created by a simple method.
[0039]
Next, a first embodiment will be described of a document creation apparatus, provided with a document data creation unit according to the present invention, that creates bitmap data of a document using a bitmap font composed of the carrier font characters created as described above, and of a document restoration apparatus, provided with a character data restoration unit according to the present invention, that restores the character data of the document from the printed matter created and output by the document creation apparatus. The document creation apparatus and the document restoration apparatus are collectively referred to as a paper digital interface system (hereinafter, paper DIF system).
[0040]
FIG. 4 is a block diagram showing a schematic configuration of the document creation device, FIG. 5 is a block diagram showing a schematic configuration of the document restoration device, and FIG. 6 is a diagram showing an example of a document original created by a personal computer or a word processor.
[0041]
As shown in FIG. 4, the document creation apparatus 2 comprises character/attribute information extraction means 20; a document data creation unit 30 including carrier font character storage means 31, document bitmap data creation means 32, information synthesizable pixel extraction means 33, an attribute information conversion table 34 serving as the first conversion table of the present invention, attribute information conversion means 35, and information code synthesis means 36; and image output means 40 such as a printer. According to the character and attribute information of a document created freely with a personal computer or word processor, it creates and outputs, using carrier font characters, a printed matter 50 of an information-code-composite document that serves as a document for character recognition.
[0042]
The carrier font character storage means 31 constituting the document data creation unit 30 stores the carrier font characters created by the bitmap font creation device 1 described above. When obtaining font data, for example, a medium such as a CD-ROM in which a set of dedicated font characters is recorded may be read by a reading device (not shown), and the read data may be stored in the carrier font character storage means 31.
[0043]
The document bitmap data creation means 32 of the document data creation unit 30 reads, from the carrier font character storage means 31, the bitmap data of the predetermined carrier font character corresponding to the character code in the character information of each character of a document created by the user with a personal computer or word processor. It then pastes all the pixels of the read bitmap data at the predetermined position in a white digital image given by the coordinates of the character position reference point in that character's information, thereby creating bitmap data of the document using carrier font characters.
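The pasting step can be sketched as below. The font dictionary, the function name, and the representation of bitmaps as lists of 0/1 rows are hypothetical; the reference point is taken to be the upper-left pixel of each glyph's region, as in the example that follows in the text, and every glyph is assumed to fit inside the page.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # rows of pixels, 1 = black, 0 = white


def render_document(width: int, height: int,
                    chars: List[Tuple[str, Tuple[int, int]]],
                    font: Dict[str, Bitmap]) -> Bitmap:
    """Paste carrier glyphs onto a white page.

    chars: (character code, (x, y)) pairs, where (x, y) is the upper-left
    reference point of the glyph's region on the page.
    """
    page = [[0] * width for _ in range(height)]  # white digital image
    for code, (x, y) in chars:
        glyph = font[code]                       # carrier font character bitmap
        for dy, glyph_row in enumerate(glyph):
            for dx, px in enumerate(glyph_row):
                page[y + dy][x + dx] = px        # copy every pixel of the glyph region
    return page
```

For example, pasting a 2 x 2 glyph `[[1, 0], [1, 1]]` at reference point (1, 0) on a 4 x 2 page gives `[[0, 1, 0, 0], [0, 1, 1, 0]]`.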
[0044]
Here, the coordinates of the character position reference point may be predetermined coordinates within a unique region determined by the character size. In this example, they are the coordinates of the upper-left vertex pixel of the rectangular region a bounded by dotted lines around each character, indicated by ● in FIG. The invention is not limited to this, however, and other coordinates may be used as the character position reference point as long as they determine the reference position of the character.
[0045]
The information synthesizable pixel extraction means 33 of the document data creation unit 30 scans each horizontal line within the image area of the document using carrier font characters and extracts, as information synthesizable pixels, the white pixels adjacent to the foremost pixel in the scanning direction of each black pixel group.
[0046]
The attribute information conversion means 35 of the document data creation unit 30 converts each item of information (font, size, style, and character position reference point coordinate correction data) into a predetermined information code based on the attribute information conversion table 34 shown in FIG. 7, and fits each information code into a predetermined format.
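FIG. 7 is not reproduced here, so the table below is a hypothetical stand-in: the font names, sizes, style codes, and the 2-bit width of every field are invented for illustration. The sketch shows only the general idea of mapping each attribute to a code through a lookup table and concatenating the codes into a fixed per-character format.

```python
# Hypothetical stand-in for attribute information conversion table 34 (FIG. 7).
# The actual entries and bit widths are not given in this text.
FONT = {"Mincho": "00", "Gothic": "01"}
SIZE = {10.5: "00", 12: "01", 14: "10"}
STYLE = {"normal": "00", "bold": "01", "italic": "10", "underline": "11"}


def to_information_code(font: str, size: float, style: str, dx: int, dy: int) -> str:
    """Fit one character's attributes into an assumed fixed 10-bit format.

    dx, dy: coordinate correction data, each in 0..3 (see the 400 to 100 dpi example).
    """
    if not (0 <= dx <= 3 and 0 <= dy <= 3):
        raise ValueError("correction data must fit in 2 bits")
    return FONT[font] + SIZE[size] + STYLE[style] + format(dx, "02b") + format(dy, "02b")
```

For example, `to_information_code("Gothic", 12, "bold", 1, 3)` produces `"0101010111"`. A fixed-width format like this lets the restoring side cut the bit stream back into per-character codes without separators.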
[0047]
The information code synthesis means 36 of the document data creation unit 30 adds the information codes fitted into the predetermined format to the information synthesizable pixels extracted by the information synthesizable pixel extraction means 33 in a predetermined order, thereby creating bitmap data of an information-code-composite document using carrier font characters.
[0048]
As shown in FIG. 5, the document restoration apparatus 3 comprises image input means 60 such as a scanner with a resolution of 400 dpi, document bitmap data extraction means 70, and a character data restoration unit 88 including character recognition means 80, information code separation means 84, an information code conversion table 85 serving as the second conversion table of the present invention, information code conversion means 86, and character/attribute information reconstruction means 87 serving as the modification means. It restores, on a personal computer or word processor, the character data of the original document from the printed matter 50 created and output by the document creation apparatus 2.
[0049]
The information code separation means 84 of the character data restoration unit 88 scans each horizontal line of the characters extracted by the document bitmap data extraction means 70 to detect all black pixel groups, counts the run length of each group to separate out the information code, and corrects each black pixel representing the information code "1" to a white pixel. In other words, the information code separation means 84 serves as both the information code extraction means and the information code removal means of the present invention.
[0050]
The character recognition means 80 of the character data restoration unit 88 restores the characters (character codes in this example) and the coordinates of the character position reference points from the bitmap data of the document using carrier font characters, after correction by the information code separation means 84. Various known methods can be used for character recognition.
[0051]
The information code conversion table 85 of the character data restoration unit 88 stores the same information as the attribute information conversion table 34 of the document creation apparatus 2.
[0052]
The information code conversion means 86 of the character data restoration unit 88 groups the information codes separated by the information code separation means 84 character by character, in ascending order of the coordinates of the character position reference points, fits them into the predetermined format, and converts (restores) them, based on the information code conversion table 85, into the four items of attribute information: font, size, style, and character position reference point coordinate correction data.
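Conversely, the grouping and conversion can be sketched with a hypothetical inverse table. The 2-bit fields, 10-bit code length, and entries below are invented for illustration (FIG. 7 is not reproduced in this text); the sketch assumes the separated bits are already in the ascending order of the character position reference points.

```python
from typing import Dict, List, Union

# Hypothetical inverse of the conversion table; real entries are in FIG. 7.
FONT = {"00": "Mincho", "01": "Gothic"}
SIZE = {"00": 10.5, "01": 12, "10": 14}
STYLE = {"00": "normal", "01": "bold", "10": "italic", "11": "underline"}
CODE_LEN = 10  # assumed bits per character: font, size, style, dx, dy (2 bits each)


def decode_attributes(bits: str) -> List[Dict[str, Union[str, float, int]]]:
    """Cut the separated bit stream into per-character codes and decode each one."""
    if len(bits) % CODE_LEN:
        raise ValueError("bit stream is not a whole number of character codes")
    out = []
    for k in range(0, len(bits), CODE_LEN):
        c = bits[k:k + CODE_LEN]
        out.append({
            "font": FONT[c[0:2]],
            "size": SIZE[c[2:4]],
            "style": STYLE[c[4:6]],
            "dx": int(c[6:8], 2),  # coordinate correction data, main scanning direction
            "dy": int(c[8:10], 2),  # coordinate correction data, sub-scanning direction
        })
    return out
```

For example, `decode_attributes("0101010111")` returns one record with font `"Gothic"`, size `12`, style `"bold"`, `dx` 1, and `dy` 3.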
[0053]
The character/attribute information reconstruction means 87 of the character data restoration unit 88 associates the character codes and character position reference point coordinates restored by the character recognition means 80 with the four items of attribute information (font, size, style, and character position reference point coordinate correction data) restored by the information code conversion means 86, matching them in ascending order of the character position reference point coordinates.
[0054]
Note that an image output unit may be provided after the character / attribute information reconstruction unit 87 to restore the original document as a printed matter.
[0055]
Next, the operation of the paper DIF system including the document creation device 2 and the document restoration device 3 will be described.
[0056]
The document creation apparatus 2 outputs a printed matter 50 of a document using carrier font characters as follows.
[0057]
1) First, the character/attribute information extraction means 20 extracts five items of information for each character of a document created by the user with a personal computer or word processor: the character code, the character position reference point coordinates, the font, the size, and the style.
[0058]
The character/attribute information extraction means 20 extracts this information following the character layout of the document data. Specifically, when extracting it from the document data corresponding to the printed matter 50 shown in FIG. 6, the upper left of the printed matter 50 is taken as the scanning origin, the rightward direction as the main scanning direction, and the downward direction as the sub-scanning direction. If the scanning direction differs from this, extraction is performed after the image data has been rotated appropriately. "Scanning" refers to scanning on the image data; for example, the pixel data of a document created on a personal computer or the like is stored in an area memory corresponding to the bitmap and scanned on that memory.
[0059]
The information of each character is extracted in scanning order, starting from the character whose character position reference point comes first. For example, in FIG. 6, the characters "旅" (journey), "行" (line), "日" (day), "時" (time), "一" (one), "月" (month), "十" (ten), and "日" (day) are extracted in this order.
[0060]
2) Next, the document bitmap data creation means 32 sequentially reads the bitmap data of the predetermined carrier font character stored in the carrier font character storage means 31 according to the character code of each character. It then pastes the read bitmap data into a digital image of white pixels so that the character position reference point of each carrier font character lies at the position on the paper where the character position reference point of the corresponding document character is placed. As a result, document bitmap data D1 using carrier font characters is created.
[0061]
In the bitmap data D1 created in this way, the number of pixels constituting each black pixel group on each horizontal line (corresponding to the row direction of the document) of every carrier font character bitmap, that is, the run length, is odd.
[0062]
Further, in the first embodiment, because the document restoration apparatus uses image input means 60 with a resolution of 400 dpi, a printed matter 50 with a dot resolution of 100 dpi is created from a document with a dot resolution of 400 dpi created by the user on a personal computer or word processor. This is because the resolution of the image input means 60 must be higher than the dot resolution of the printed matter 50 for the document bitmap data extraction means 70 to extract the image data of the printed matter 50 accurately. Accordingly, the document bitmap data creation means 32 pastes the bitmap data of each carrier font character so that the character position reference point, obtained by converting the 400 dpi coordinates indicated by ● in FIG. 6 into 100 dpi coordinates, coincides with the upper-left vertex pixel of the unique region b determined by the character size of the carrier font character, indicated by ○ in FIG.
[0063]
In the first embodiment, information on the deviation introduced by the conversion from 400 dpi to 100 dpi (hereinafter referred to as character position reference point coordinate correction data) is added to the carrier font character together with the attribute information, so that the document restoration device 3 can restore the coordinates of the character position reference point accurately.
[0064]
This is necessary because expressing a 400 dpi font character at 100 dpi reduces the resolution to 1/4, and the positional accuracy of the character layout output by the document creation device 2 also drops to 1/4. For example, when the coordinates of the character position reference point at 400 dpi are X = 401 in the main scanning direction and Y = 403 in the sub-scanning direction, the coordinates on the paper output by the document creation device 2 become X = 100 (remainder 1) and Y = 100 (remainder 3), and the remainders in parentheses cannot be reproduced on paper. To restore the coordinates of the character position reference point accurately, this loss of positional accuracy must be corrected. In this example, information indicating the X-coordinate and Y-coordinate components of the remainder is therefore added to the carrier font character, together with the attribute information, as the character position reference point coordinate correction data, and the remainder is compensated for using these X and Y components in the restoration process.
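The remainder arithmetic above amounts to an integer division by the scale factor 4. A minimal Python sketch, assuming the 400 dpi to 100 dpi ratio stated in the text; the function names are hypothetical. Because the remainder is always 0 to 3, each coordinate component fits in 2 bits.

```python
from typing import Tuple

SCALE = 400 // 100  # a 400 dpi layout printed at 100 dpi -> factor 4

def to_print_coordinate(coord_400: int) -> Tuple[int, int]:
    """Split a 400 dpi coordinate into (100 dpi coordinate, remainder lost on paper)."""
    return divmod(coord_400, SCALE)

def from_print_coordinate(coord_100: int, remainder: int) -> int:
    """Recombine a 100 dpi coordinate with its stored remainder (correction data)."""
    if not 0 <= remainder < SCALE:
        raise ValueError("remainder must be in the range 0..3")
    return coord_100 * SCALE + remainder
```

With the example values, `to_print_coordinate(401)` gives `(100, 1)` and `to_print_coordinate(403)` gives `(100, 3)`; `from_print_coordinate(100, 3)` returns 403.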
[0065]
3) Next, the information synthesizable pixel extracting means 33 scans the image area of the document using carrier font characters one horizontal line at a time and extracts, as information synthesizable pixels, the white pixels adjacent to the foremost pixel, in the scanning direction, of each black pixel group. However, a white pixel that, if changed to black, would join two black pixel groups into a single new group is treated as a pixel into which no information code can be synthesized. This prevents the added information code from degrading the recognizability of the original characters in the paper output. FIG. 3B shows the information synthesizable pixels together with the bitmaps of the carrier font characters “journey” and “line” in FIG. In FIG. 3B, / marks a pixel into which information can be synthesized, and x marks a pixel into which no information code can be synthesized because doing so would connect two black pixel groups.
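The extraction rule of step 3) can be sketched per horizontal line in Python. Two points are assumptions, because the text does not pin them down: "foremost in the scanning direction" is read here as the white pixel immediately to the right of each black run (main scanning runs left to right), and the merge test only looks along the same line. `synthesizable_pixels` is a hypothetical name.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black, 0 = white

def synthesizable_pixels(bitmap: Bitmap) -> List[Tuple[int, int]]:
    """List (row, col) of white pixels that can carry one bit, in scanning order.

    Assumption: the candidate for each black run is the white pixel immediately to
    its right. It is rejected if turning it black would join this run to the next
    run on the same line (the pixels marked x in FIG. 3B).
    """
    slots = []
    for y, row in enumerate(bitmap):
        width = len(row)
        for x in range(width - 1):
            if row[x] == 1 and row[x + 1] == 0:  # a run ends at x, white pixel at x+1
                would_merge = x + 2 < width and row[x + 2] == 1
                if not would_merge:
                    slots.append((y, x + 1))
    return slots
```

In the row `[0, 1, 0, 1, 0, 0, 0]` the pixel between the two single-pixel runs is rejected, so only the pixel after the second run is returned. A run that ends at the right edge has no candidate.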
[0066]
4) Further, the attribute information conversion means 35 converts the font, size, style, and character position reference point coordinate correction data of each character into an information code based on the attribute information conversion table 34 shown in FIG., and fits the code to a predetermined format.
[0067]
Specifically, as shown in FIG., the font information is assigned to the 3 bits b₁₁, b₁₀, b₉, the size information to the 3 bits b₈, b₇, b₆, the style information to the 2 bits b₅, b₄, the Y-coordinate component of the character position reference point coordinate correction data to the 2 bits b₃, b₂, and likewise the X-coordinate component to the 2 bits b₁, b₀, giving code data consisting of 12 bits.
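The 12-bit layout can be illustrated with a small packing routine. This is a sketch under assumptions: the font, size, and style values are taken to be small integer indices standing in for entries of the attribute information conversion table 34 (whose contents appear only in a figure), and `pack_code` and `code_bits` are hypothetical names. The field widths and order follow the layout just described.

```python
from typing import List

# Field layout b11 ... b0 (most significant first): font 3, size 3, style 2, Y 2, X 2 bits.
FIELDS = [("font", 3), ("size", 3), ("style", 2), ("dy", 2), ("dx", 2)]

def pack_code(font: int, size: int, style: int, dy: int, dx: int) -> int:
    """Pack one character's attribute indices into a 12-bit information code."""
    values = {"font": font, "size": size, "style": style, "dy": dy, "dx": dx}
    code = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        code = (code << width) | value
    return code

def code_bits(code: int, n: int = 12) -> List[int]:
    """Return the bits b11, b10, ..., b0 in the order they are embedded."""
    return [(code >> i) & 1 for i in range(n - 1, -1, -1)]
```

For instance, font 1, size 2, style 3 with remainders Y = 3 and X = 1 packs to `0b001010111101` (701), whose bits in embedding order are `[0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1]`.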
[0068]
FIG. 8B shows an example of data indicating information codes for each character “journey” “line” “day” “hour” “one” “month” “ten” “day”.
[0069]
5) Further, the bits b₁₁, ..., b₀ of each character's information code, fitted to the format shown in FIG., are added in turn to the information synthesizable pixels extracted by the information synthesizable pixel extracting means 33, following the scanning order. Here, the scanning order is defined as follows: on the bitmap corresponding to the document shown in FIG., with the rightward direction as the main scanning direction X and the downward direction as the sub-scanning direction Y, main scanning starts from the scanning origin at the upper left, and when the main scanning of one line is complete, scanning moves to the next line. When all bits of the information code for one character have been added, the information code for the next character is added immediately after them.
[0070]
FIG. 9 is an enlarged view of FIG. 3B showing the correspondence between each information synthesizable pixel marked / in FIG. 3B and each bit of the information code of each character shown in FIG. 8B. As the figure shows, the bits of each character's information code are assigned to the information synthesizable pixels of FIG. 3B in scanning order; after the last bit b₀ shown, the following information synthesizable pixels are assigned to the next character (not shown), and so on until the information codes of all characters have been assigned.
[0071]
In the first embodiment, an information synthesizable pixel is changed to a black pixel only when the bit to be added is “1”, which changes the run length of that black pixel group to an even number; when the bit is “0”, the pixel stays white and the run length stays odd. In this way, bitmap data of the information-code-synthesized document using carrier font characters is created.
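Steps 5) and the parity rule above can be sketched together in Python. The sketch assumes the synthesizable pixels are already listed in scanning order (as produced in step 3) and that each character's code is a list of bits b₁₁ ... b₀; the characters' codes are simply concatenated so that each one follows the previous one directly. `embed_codes` is a hypothetical name.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black, 0 = white

def embed_codes(bitmap: Bitmap, slots: List[Tuple[int, int]],
                codes: List[List[int]]) -> Bitmap:
    """Write the characters' codes, one after another, into the slots in scanning order.

    slots: (row, col) of information synthesizable pixels, already in scanning order.
    A '1' turns its slot black (odd run -> even run); a '0' leaves it white.
    """
    stream = [bit for code in codes for bit in code]  # next character's code follows directly
    if len(stream) > len(slots):
        raise ValueError("not enough synthesizable pixels for the information code")
    out = [row[:] for row in bitmap]  # leave the input bitmap untouched
    for (y, x), bit in zip(slots, stream):
        if bit:
            out[y][x] = 1
    return out
```

Embedding the bits 1 and 0 into the row `[1, 0, 0, 1, 1, 1, 0]` with slots at columns 1 and 6 gives `[1, 1, 0, 1, 1, 1, 0]`: the first run becomes 2 pixels long (even, "1") and the second stays 3 pixels long (odd, "0").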
[0072]
FIG. 3C shows an example of bitmap data using carrier font characters after information code synthesis. This bitmap data is created by adding pixels representing the information code shown in FIG. 8B to the information synthesizable pixels marked / in FIG. 3B. After bit b₀ of “one”, the information code continues on the next character (not shown), until the information codes of all characters have been added.
[0073]
6) Finally, the image output means 40 outputs the printed matter 50 of the document at 100 dpi based on the bitmap data D2 of the information code synthesized document using the carrier font characters. As a result, carrier font characters are printed with substantially the same layout (character position) as the original data of a document created on a personal computer or the like.
[0074]
In the first embodiment, a scale frame 55 corresponding to the image area size and resolution of the document is printed a few millimeters in from the edge of the paper, so that the document bitmap data extraction means 70 of the document restoration device 3, described later, can extract the document bitmap data accurately. FIG. 10 shows an example of a printed matter 50 of a document with the scale frame 55 and with information codes added to the carrier font characters. Other characters or symbols indicating the reference position may also be printed in addition to the scale frame.
[0075]
As FIG. 10 shows, some of the carrier font characters (“journey” and “line” in the figure) are slightly deformed compared with the original characters (basic font characters), because their run lengths have been made odd or pixels representing information codes have been added. This slight deformation does not impair the recognizability of the characters, so the user can easily confirm the character content of the original document simply by looking at such a printed matter 50.
[0076]
In addition, because the character attribute information is added to the bitmaps of the carrier font characters as information codes rather than by inserting attribute designation characters, each character is output at approximately the same position as in the original data of the document created on a personal computer or the like. The character layout of the original data therefore matches the character layout on the printed matter, and the content and layout of the document that will be restored from the printed matter can be confirmed simply by looking at the printed matter.
[0077]
In the document data creation process of the first document data creation method and apparatus, the character attribute information is added to the bitmaps of the carrier font characters as information codes instead of by inserting attribute designation characters. Attribute information can therefore be added without disturbing the layout of the document, and because each character is output at approximately the same position as in the original data of the document created on a personal computer or the like, the character layout of the original data and that of the printed matter do not diverge. The original data, and the content and layout of the document restored from the printed matter, can be confirmed simply by looking at the printed matter.
[0078]
On the other hand, in the document restoration device 3, the digital data D1 of the original document is restored from the printed material 50 of the document using the carrier font characters as follows.
[0079]
1) The printed material 50 of the information code synthesized document using the carrier font characters is read by the image input means 60 as 400 dpi multi-value image data. Further, the document bitmap data extracting means 70 extracts 100 dpi document bitmap data existing in the scale frame 55 of the printed matter 50 from the multi-valued image data.
[0080]
The image input means 60 reads the printed matter 50 in accordance with its character layout. Specifically, when reading the printed matter 50 shown in FIG. 6, the upper left of the printed matter 50 is taken as the scanning origin, the rightward direction as the main scanning direction, and the downward direction as the sub-scanning direction. If the reading direction differs from this, the image data is first rotated appropriately and the processes described below are then performed.
[0081]
In the first embodiment, the scale frame 55 is printed on the printed matter 50 as described above. Because the relative positions of the scale frame 55 and the printed characters faithfully reproduce the original image, even if the printed matter is slightly shifted to the left or right or read at a slight skew by the image input means 60, the positional deviation in the image data can be corrected from those relative positions using a known positional deviation correction method. Document bitmap data free of positional deviation can thus be extracted, which enables accurate character recognition. In other words, the scale frame 55 serves as positioning data when document bitmap data is extracted.
[0082]
2) The information code separation means 84 scans each horizontal line, detects every black pixel group, and counts the run length of each. If the run length is odd, it determines that “0” has been added as information; if it is even, that “1” has been added; and it separates the information code accordingly. The information code is separated character by character, following the scanning order of the characters' character position reference point coordinates. Along with this separation, in each black pixel group with an even run length, the black pixel representing the information code “1”, located at the foremost end in the scanning direction, is changed back to a white pixel. This produces bitmap data of the document using carrier font characters with the information code removed.
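The separation step can be sketched in Python, assuming as in the extraction step that the added pixel is the right-hand end of a run. One deliberate difference from the text: the text reads a bit from every black run, but to stay consistent with the exclusion rule of step 3), this sketch gives an odd run a "0" only if that run had a usable slot on its right; runs whose slot was rejected or missing carry no bit. `separate_code` is a hypothetical name.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black, 0 = white

def separate_code(bitmap: Bitmap) -> Tuple[List[int], Bitmap]:
    """Read the information bits from run-length parity and remove the added pixels.

    Even run -> '1' (its last pixel was added and is turned back to white).
    Odd run  -> '0', but only if the run had a usable slot on its right; runs whose
    slot was rejected (it would merge two runs) or missing (line end) carry no bit.
    """
    bits: List[int] = []
    clean = [row[:] for row in bitmap]
    for y, row in enumerate(bitmap):
        width, x = len(row), 0
        while x < width:
            if not row[x]:
                x += 1
                continue
            start = x
            while x < width and row[x]:
                x += 1
            end = x - 1  # last black pixel of this run
            if (end - start + 1) % 2 == 0:
                bits.append(1)
                clean[y][end] = 0
            elif end + 1 < width and not (end + 2 < width and row[end + 2]):
                bits.append(0)
    return bits, clean
```

Applied to a bitmap encoded with the embedding sketch, this returns the same bit stream and the original carrier font bitmap, since the slot choice and the removal rule mirror each other.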
[0083]
3) The character recognition means 80 restores the character code and the 100 dpi coordinates of the character position reference point of each character, based on a collation table (not shown) and on the bitmap data of the document using carrier font characters from which the information code separation means 84 has removed the information code.
[0084]
4) The information code conversion means 86 groups the information code separated by the information code separation means 84 into units of one character, that is, 12 bits, and fits each group to the format shown in FIG. in the order b₁₁, ..., b₀. Each group is then converted into the corresponding attribute information based on the information code conversion table 85 shown in FIG. As a result, the five items of attribute information (font, size, style, and the X and Y components of the character position reference point coordinate correction data) are restored for each character, in ascending order of the character position reference point coordinates in the scanning order.
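Grouping the separated bits into 12-bit codes and splitting them into fields can be sketched as follows. The sketch assumes the number of characters is known from character recognition, so any trailing bits from unused synthesizable pixels are ignored. The field layout is the one described for the first embodiment; the numeric values would still have to be looked up in the information code conversion table 85, which is not reproduced here. `unpack_codes` is a hypothetical name.

```python
from typing import Dict, List

FIELDS = [("font", 3), ("size", 3), ("style", 2), ("dy", 2), ("dx", 2)]  # b11 ... b0
CODE_BITS = sum(width for _, width in FIELDS)  # 12

def unpack_codes(bits: List[int], count: int) -> List[Dict[str, int]]:
    """Split the first count*12 separated bits into per-character field values."""
    if len(bits) < count * CODE_BITS:
        raise ValueError("bit stream too short for the recognized characters")
    records, pos = [], 0
    for _ in range(count):
        record = {}
        for name, width in FIELDS:
            value = 0
            for b in bits[pos:pos + width]:
                value = (value << 1) | b  # most significant bit first
            record[name] = value
            pos += width
        records.append(record)
    return records
```

For example, the bits `0,0,1, 0,1,0, 1,1, 1,1, 0,1` decode to font 1, size 2, style 3, Y remainder 3, and X remainder 1.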
[0085]
5) The character/attribute information reconstruction means 87 associates the font, size, style, and character position reference point coordinate correction data restored by the information code conversion means 86 with the characters, in ascending order of their 100 dpi character position reference point coordinates. Specifically, the character codes and 100 dpi reference point coordinates are sorted in ascending order of those coordinates and matched with the font, size, style, and correction data, and the character bitmap data is arranged according to the font, size, or style. In addition, the 400 dpi coordinates of each character position reference point are restored from the 100 dpi coordinates and the correction data, so that the character bitmap data is placed at the same position as the corresponding character in the original document data.
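The reconstruction step can be sketched in Python as follows. Assumptions: recognized characters arrive as (code, x, y) tuples at 100 dpi; "ascending order of the coordinates" is taken to mean scanning order, that is, sorted by y (sub-scanning) and then x (main scanning); and the decoded attribute records carry `font`, `size`, `style`, `dx`, and `dy` keys. `reconstruct` is a hypothetical name.

```python
from typing import Dict, List, Tuple

SCALE = 4  # 400 dpi / 100 dpi

def reconstruct(chars: List[Tuple[str, int, int]],
                attrs: List[Dict[str, int]]) -> List[Dict[str, object]]:
    """Pair recognized characters with decoded attributes and restore 400 dpi positions.

    chars: (character code, x at 100 dpi, y at 100 dpi) from character recognition.
    attrs: decoded attribute records in scanning order.
    """
    if len(chars) != len(attrs):
        raise ValueError("character count and attribute count differ")
    # assumed scanning order of reference points: by y (sub-scan), then x (main scan)
    ordered = sorted(chars, key=lambda c: (c[2], c[1]))
    result = []
    for (code, x100, y100), a in zip(ordered, attrs):
        result.append({
            "code": code,
            "x": x100 * SCALE + a["dx"],  # 400 dpi coordinate = 4 * 100 dpi + remainder
            "y": y100 * SCALE + a["dy"],
            "font": a["font"], "size": a["size"], "style": a["style"],
        })
    return result
```

A character recognized at (100, 100) with remainders X = 1 and Y = 3 is placed back at (401, 403), matching the example given for the coordinate correction data.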
[0086]
In this way, the digital data of the original document, described by character codes, can be restored accurately on a personal computer or word processor, including its attribute information. Because characters (character codes in this example) are restored from the carrier font characters, the digital data of the document can be restored at least as accurately as with conventional character recognition technology, and attribute information such as font, size, and style is not lost. Moreover, because the reference position information of each character is also restored, the position where each character should be placed can be restored accurately, and when an image or printed matter is output again from the restored digital data, its content can be checked in the same layout as the original document.
[0087]
Furthermore, because the digital data of the document is restored including not only its content but also its attribute information, documents created on a personal computer or the like can be managed by paper output alone, giving a document management method free of the problem of double management.
[0088]
Next, a second embodiment of the document creation apparatus provided with the document data creation unit according to the present invention and the document restoration apparatus provided with the document data restoration unit according to the present invention will be described.
[0089]
The document creation device 2 and the document restoration device 3 of the second embodiment basically have the same configuration as in the first embodiment. The difference is that, whereas the first embodiment adds all of the attribute information of each character of the user's document to the carrier font characters as is, the second embodiment compresses the attribute information before adding it and then restores the attribute information added in this compressed form. This point is described below.
[0090]
FIG. 11 is a block diagram showing a schematic configuration of the document creation apparatus. The configuration of the document restoration apparatus is the same as that in the first embodiment.
[0091]
Normally, in a document arbitrarily created by a user, characters having the same font, size, and style are often used in units of lines or paragraphs as shown in FIG. In this case, the same information code regarding the font, size, and style is repeatedly added to the carrier font character in units of lines or paragraphs.
[0092]
Therefore, in the second embodiment, as shown in FIG. 11, an information code compression means 37 is provided between the attribute information conversion means 35 and the information code synthesis means 36. The information code obtained by the attribute information conversion means 35 is compressed by the information code compression means 37, and the compressed information code is then added to the carrier font characters. Specifically, the information code is separated into information unique to each character (hereinafter, unique information), such as the character position reference point coordinate correction data (X and Y coordinates), and information that is often the same for a whole line or paragraph (hereinafter, non-unique information), such as font, size, and style, and each part is fitted to the format shown in FIGS. Then, for each character, it is determined whether its non-unique information is the same as that of the preceding character in the scanning order; if it is, the non-unique information of that character is deleted, which compresses the information code. So that all of the original information codes can be restored from codes whose non-unique information has been deleted, the determination result is added to the format shown in FIG. as a non-unique information change flag. In the second embodiment, the flag is set to “0” when the non-unique information is the same as that of the preceding character and to “1” when it differs.
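The compression rule can be sketched in Python. The bit widths (a 1-bit change flag, 4 bits of unique information, and 8 bits of non-unique information) are those given in the restoration description of this embodiment. The ordering within each part (Y before X; font, size, style) is an assumption mirroring the first-embodiment format, and treating the first character as "changed" is also an assumption, since it has no predecessor to copy from. `compress` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Unique = Tuple[int, int]            # (dy, dx) correction remainders, 2 bits each
NonUnique = Tuple[int, int, int]    # (font 3 bits, size 3 bits, style 2 bits)

def _bits(value: int, width: int) -> List[int]:
    """Fixed-width, most-significant-first bit list."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

def compress(codes: List[Tuple[Unique, NonUnique]]) -> List[int]:
    """Drop non-unique information that repeats the previous character's (scanning order)."""
    out: List[int] = []
    prev: Optional[NonUnique] = None
    for (dy, dx), non_unique in codes:
        changed = non_unique != prev        # the first character always counts as changed
        out.append(1 if changed else 0)     # non-unique information change flag
        out += _bits(dy, 2) + _bits(dx, 2)  # unique information, always sent
        if changed:
            font, size, style = non_unique
            out += _bits(font, 3) + _bits(size, 3) + _bits(style, 2)
        prev = non_unique
    return out
```

A character whose font, size, and style repeat those of its predecessor costs 5 bits instead of 13; eight such characters with three changes of non-unique information come to 64 bits.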
[0093]
FIG. 12C shows an example of the compressed information code for each of the characters “journey”, “line”, “day”, “hour”, “one”, “month”, “ten”, and “day”. Comparing FIG. 8B with FIG. 12C shows that restoring the document of FIG. 6 requires 96 bits of data with the information code format of the first embodiment, as shown in FIG. 8B, whereas in the second embodiment a 64-bit information code suffices, which confirms that the information code is compressed.
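The two bit counts can be checked with a short calculation. It assumes the bit widths given in the restoration description of this embodiment: a 1-bit change flag and 4 bits of unique information for every character, plus 8 bits of non-unique information only when that character's flag $f_i$ is 1. For $n$ characters, $k$ of which carry a flag of 1:

```latex
B_{\text{1st}} = 12n, \qquad
B_{\text{2nd}} = \sum_{i=1}^{n} \bigl(1 + 4 + 8 f_i\bigr) = 5n + 8k,
\qquad k = \sum_{i=1}^{n} f_i
```

With $n = 8$, $B_{\text{1st}} = 96$, and $B_{\text{2nd}} = 64$ gives $40 + 8k = 64$, that is, $k = 3$. Three of the eight characters would then carry their non-unique information. This count is an inference from the stated bit widths, not a figure given in the text.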
[0094]
FIG. 13 shows an example of bitmap data after information code synthesis. This bitmap data is created by adding the information code shown in FIG. 12C, in order, to the information synthesizable pixels shown in FIG.
[0095]
In the document restoration apparatus 3 shown in FIG. 5, meanwhile, before converting the information code into attribute information, the information code conversion means 86 re-edits the information code separated by the information code separation means 84 into the format shown in FIG., according to the non-unique information change flag. Specifically, for a character whose first bit (the non-unique information change flag) is “1”, the following 4 bits are separated as unique information and the next 8 bits as non-unique information, and both are fitted to the format shown in FIG. When the first bit is “0”, the following 4 bits are separated as unique information, the non-unique information of the preceding character in the scanning order is copied, and both are fitted to the same format. In other words, the information code conversion means 86 of the second embodiment functions as the information code restoration means of the present invention. The attribute information can then be restored in the same manner as in the first embodiment.
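The re-editing rule just described can be sketched as a decompressor. The sketch assumes the same field order as the first-embodiment format (Y before X; font, size, style), that the number of characters is known from character recognition, and that the first character carries its non-unique information. `decompress` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Record = Tuple[Tuple[int, int], Tuple[int, int, int]]  # ((dy, dx), (font, size, style))

def decompress(bits: List[int], count: int) -> List[Record]:
    """Rebuild full per-character records from the compressed stream (second embodiment)."""
    pos = 0

    def take(width: int) -> int:
        nonlocal pos
        if pos + width > len(bits):
            raise ValueError("compressed stream ended early")
        value = 0
        for b in bits[pos:pos + width]:
            value = (value << 1) | b
        pos += width
        return value

    records: List[Record] = []
    prev: Optional[Tuple[int, int, int]] = None
    for _ in range(count):
        flag = take(1)                           # non-unique information change flag
        dy, dx = take(2), take(2)                # unique information (4 bits)
        if flag:
            prev = (take(3), take(3), take(2))   # font, size, style (8 bits)
        elif prev is None:
            raise ValueError("first character must carry non-unique information")
        records.append(((dy, dx), prev))         # flag 0: reuse the previous character's values
    return records
```

Decoding the 18-bit stream produced for two characters that share a font, size, and style returns both full records, the second with its non-unique information copied from the first.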
[0096]
As described above, because the second embodiment compresses the information code before adding it to the carrier font characters, fewer information code bits need to be added to the carrier font characters to restore the same amount of information than in the first embodiment. Conversely, more information can be restored than in the first embodiment. As a result, in addition to font, size, and style, various other information, such as underline designation and double-width designation, can be added and restored as attribute information.
[0097]
In each of the embodiments described above, the image output means 40 is built in the document creation device 2, but it may be configured separately. The same applies to the image input means 60 in the document restoration device 3.
[0098]
Each of the above embodiments has been described using a text document as an example. For a document containing ruled lines, however, the ruled-line elements “─”, “│”, “┌”, “┐”, “┘”, “└”, “├”, “┤”, “┴”, “┬”, and “┼” can be treated in the same way as characters and incorporated into the carrier font characters, and the ruled-line thickness or broken-line style can be added to the attribute information. In this way, the present invention can be applied to documents containing ruled lines as well as to text documents.
[0099]
The carrier font characters need not be of a single type. A plurality of sets of carrier font characters, one for each font, may be provided, with the character information then including not only the character code and character position but also the font and size. In this way, the bitmap data created by this apparatus reflects the original document more faithfully. Which information is treated as character information and which as attribute information may be decided by balancing the fidelity of the generated bitmap data to the original against the capacity of the carrier font character storage means.
[0100]
In addition, so that the document restoration device can handle both information codes added to carrier font characters without compression and compressed information codes, it is preferable to add a flag indicating in which form the information code has been added. Since whether compression is used is normally decided per document, it is sufficient to add a single flag before the first bit b₁₁ of the information code of the first character in the document.
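Reading such a per-document flag is straightforward; the sketch below only shows where it sits in the separated bit stream. The text does not say which flag value means "compressed", so the convention used here (1 = compressed, 0 = uncompressed) is an assumption, and `split_mode_flag` is a hypothetical name. The remaining bits would then go to the uncompressed 12-bit decoder or to the second-embodiment decompressor.

```python
from typing import List, Tuple

def split_mode_flag(bits: List[int]) -> Tuple[bool, List[int]]:
    """Read the single per-document flag placed before b11 of the first character.

    Assumed convention (not fixed by the text): 1 = compressed codes, 0 = uncompressed.
    Returns (is_compressed, remaining bits).
    """
    if not bits:
        raise ValueError("empty information code stream")
    return bits[0] == 1, bits[1:]
```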
[0101]
In each of the above embodiments, a conventional font character is used as the basic font character from which the carrier font character is created. However, the basic font character may be any font character, including new font characters that may appear in the future. For example, the dedicated font characters proposed by the present applicant in Japanese Patent Application No. 2000-82156 can be used. In that case, the parts relating to the dedicated font characters, such as the character recognition means 80, are of course as described in Japanese Patent Application No. 2000-82156.
[0102]
As described above, with the paper DIF system of the above embodiments, the user can confirm the content of a document, including not only its characters but also their attributes, simply by looking at the printed matter 50 created by the document creation device 2, and the document restoration device 3 can accurately restore the original document on a personal computer or word processor from that printed matter 50.
[0103]
Therefore, using the present invention as a document management method allows document information to be managed by paper output alone, so that document contents can be easily confirmed and searched, the digital data of the document can be restored without error, and the problem of double management does not arise.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of an apparatus for creating a bitmap of carrier font characters
FIG. 2 is a flowchart showing a method for creating a carrier font character.
FIG. 3 shows a method of creating a carrier font character from a basic font character: (a) shows an example of bitmap data of basic font characters, (b) shows the information synthesizable pixels together with the character bitmap, and (c) shows an example of bitmap data after the information code has been synthesized.
FIG. 4 is a block diagram showing a schematic configuration of a document creation apparatus according to the first embodiment of the present invention.
FIG. 5 is a block diagram showing a schematic configuration of the document restoration apparatus according to the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of a document document
FIG. 7 is a diagram showing an example of the correspondence between attribute information and information code
FIG. 8 is a diagram showing an example of an information code format in the first embodiment;
FIG. 9 is an enlarged view showing the correspondence between each information synthesizable pixel marked / in FIG. 3B and each bit of the information code of each character shown in FIG. 8B.
FIG. 10 is a view showing an example of a document in which an information code is added to a carrier font character.
FIG. 11 is a block diagram showing a schematic configuration of a document creation apparatus according to a second embodiment of the present invention.
FIGS. 12A and 12B show an example of an information code format according to the second embodiment, and FIG. 12C shows an example of data indicating a compressed information code.
FIG. 13 is a diagram showing an example of bitmap data after combining information codes
[Explanation of symbols]
1 Bitmap font creation device
2 Document creation device
3 Document restoration device
10 Basic font character storage means
11 Carrier font character creation means
20 Character / attribute information extraction means
30 Document data creation section
31 Carrier font character storage means
32 Bitmap data creation means
33 Information synthesizable pixel extraction means
34 Attribute information conversion table
35 Attribute information conversion means
36 Information code synthesis means
40 Image output means
50 Printed matter
60 Image input means
70 Document bitmap data extraction means
80 Character recognition means
85 Information code conversion table
86 Information code conversion means
87 Character / attribute information reconstruction means
88 Character data restoration part

Claims

A document data creation method for creating bitmap data of a document based on document manuscript data created by a computer,
Character information and attribute information are extracted from information of each character constituting the document manuscript data;
In accordance with the character information of each extracted character, a bitmap data corresponding to the document manuscript data is created using a bitmap of a carrier font character obtained by modifying a basic font character ,
The extracted attribute information of each character is converted into an information code based on a first conversion table showing the correspondence between the attribute information and information codes representing the attribute information, and pixels of the created bitmap data that satisfy a predetermined condition are extracted as synthesizable pixels into which the information code can be synthesized,
A document data creation method, wherein the information code is sequentially added to the compositable pixels.

The document data creation method according to claim 1, wherein the carrier font character is modified so that the run length of each black pixel group arranged on a line in the scanning direction has an odd value, and
the synthesizable pixel is a white pixel that is adjacent to a black pixel group arranged on a line in the scanning direction and that would not connect two black pixel groups even if it were changed to a black pixel.

A document data creation method for creating bitmap data of a document based on document manuscript data created by a computer,
Character information and attribute information are extracted from information of each character constituting the document manuscript data;
In accordance with the character information of each extracted character, a bitmap data corresponding to the document manuscript data is created using a bitmap of a carrier font character obtained by modifying a basic font character ,
The extracted attribute information of each character is converted into an information code based on a first conversion table showing the correspondence between the attribute information and information codes representing the attribute information; the information code is separated into unique information, which is unique to each character, and non-unique information, which is the same for each line or paragraph; the non-unique information of consecutive characters among the characters constituting the document manuscript data is deleted to generate a compressed information code in which the information codes of the consecutive characters are compressed; and pixels of the created bitmap data that satisfy a predetermined condition are extracted as synthesizable pixels into which the compressed information code can be synthesized,
A method of creating document data, wherein the compressed information code is sequentially added to the compositable pixels.

The document data creation method according to claim 3, wherein the carrier font character is modified so that the run length of each black pixel group arranged on a line in the scanning direction has an odd value, and
the synthesizable pixel is a white pixel that is adjacent to a black pixel group arranged on a line in the scanning direction and that would not connect two black pixel groups even if it were changed to a black pixel.

A method for restoring character data of a document from bitmap data of a document created using the document data creation method according to claim 1,
The information code added to the synthesizable pixels in the bitmap of the document is extracted, and the extracted information code is converted into attribute information by referring to a second conversion table indicating the correspondence between the information code and the attribute information,
A character data restoration method comprising: removing the extracted information code from the bitmap of the document; and restoring character information based on the bitmap from which the information code has been removed.

A method for restoring character data of a document from bitmap data of a document created using the document data creation method according to claim 2,
The information code is extracted from the bitmap of the document based on whether the run length of each black pixel group arranged on a line in the scanning direction is even or odd, and the extracted information code is converted into attribute information by referring to the second conversion table indicating the correspondence between the information code and the attribute information,
A character data restoration method comprising: removing the extracted information code from the bitmap of the document; and restoring character information based on the bitmap from which the information code has been removed.

A method for restoring character data of a document from a bitmap of the document created using the document data creation method according to claim 3,
The compressed information code added to the synthesizable pixels in the bitmap of the document is extracted, the information code before compression is obtained by restoring the extracted compressed information code, and the obtained information code is converted into attribute information by referring to a second conversion table representing the correspondence between the information code and the attribute information,
A character data restoration method comprising: removing the extracted compressed information code from the bitmap of the document; and restoring character information based on the bitmap from which the compressed information code has been removed.

A method for restoring character data of a document from a bitmap of the document created using the document data creation method according to claim 4,
The compressed information code is extracted from the bitmap of the document based on whether the run length of each black pixel group arranged on a line in the scanning direction is an even value or an odd value, the extracted compressed information code is restored to obtain the information code before compression, and the obtained information code is converted into attribute information by referring to a second conversion table indicating the correspondence between the information code and the attribute information;
A character data restoration method, wherein the extracted compressed information code is removed from the bitmap of the document, and character information is restored based on the bitmap from which the compressed information code is removed.

The character data restoration method according to claim 5, wherein the restored character information is modified based on the converted attribute information.
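One possible way to modify the restored characters, shown only as a sketch, is to attach each character's converted attributes and merge neighbouring characters with identical attributes into styled spans. The span representation and the attribute keys are assumptions; the claim does not prescribe an output format.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Attrs = Dict[str, object]

def apply_attributes(chars: str, attrs: List[Attrs]) -> List[Tuple[str, Attrs]]:
    """Pair recognized characters with their attributes and group equal neighbours."""
    if len(chars) != len(attrs):
        raise ValueError("one attribute record is needed per recognized character")
    spans = []
    # Sorted items give an order-independent key, so equal records compare equal.
    for key, group in groupby(zip(chars, attrs), key=lambda pair: tuple(sorted(pair[1].items()))):
        spans.append(("".join(ch for ch, _ in group), dict(key)))
    return spans
```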

A document data creation device that creates bitmap data of a document based on document original data created by a computer,
Carrier font character storage means for storing bitmap data of carrier font characters obtained by modifying basic font characters;
Character/attribute information extracting means for extracting character information and attribute information from the information of each character constituting the document manuscript data;
Bitmap data creation means for creating bitmap data corresponding to the document manuscript data by reading the bitmap of each carrier font character from the carrier font character storage means in accordance with the extracted character information of each character;
Attribute information conversion means for converting the extracted attribute information of each character into an information code based on a first conversion table indicating the correspondence between the attribute information and the information codes representing it;
Information synthesizable pixel extracting means for extracting, from among the pixels of the created bitmap data, pixels that satisfy a predetermined condition as synthesizable pixels with which the information code can be synthesized;
A document data creation device comprising information code synthesizing means for sequentially adding the information code to the synthesizable pixels.
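A first conversion table could be as simple as one fixed-width bit field per attribute, with the fields concatenated into a per-character information code. The attribute names, their values and the 5-bit layout below are hypothetical; the claim does not give the table.

```python
from typing import Dict, List, NamedTuple

class Attributes(NamedTuple):
    font: str
    size: str
    underline: bool
    double_width: bool

# Hypothetical first conversion table: one fixed-width bit field per attribute.
FONT_CODES: Dict[str, int] = {"mincho": 0, "gothic": 1}                    # 1 bit
SIZE_CODES: Dict[str, int] = {"10pt": 0, "12pt": 1, "16pt": 2, "24pt": 3}  # 2 bits
CODE_BITS = 5  # font (1) + size (2) + underline (1) + double width (1)

def attributes_to_code(a: Attributes) -> List[int]:
    """Pack one character's attributes into a 5-bit information code, MSB first."""
    value = (
        (FONT_CODES[a.font] << 4)
        | (SIZE_CODES[a.size] << 2)
        | (int(a.underline) << 1)
        | int(a.double_width)
    )
    return [(value >> (CODE_BITS - 1 - k)) & 1 for k in range(CODE_BITS)]
```

A fixed width keeps the restoration side simple: the extracted stream can be cut into equal slices without any delimiter, at the cost of spending the same number of pixels on every character.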

11. The document data creation device according to claim 10, wherein the carrier font characters are modified so that the run length of each black pixel group arranged on a line in the scanning direction is an odd value, and
the synthesizable pixel is a white pixel adjacent to a black pixel group arranged on the line in the scanning direction, which does not join two black pixel groups together even if it is changed to a black pixel.
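The odd-run carrier and the synthesizable-pixel condition can be sketched on one scanline. This is a one-dimensional simplification with assumptions of its own: the carrier is made by whitening the last pixel of each even run (a real carrier font would be designed as 2-D glyphs), and a 1 bit is written by blackening the white pixel just to the right of a run, turning its odd length even. All function names are hypothetical.

```python
from typing import List, Tuple

Row = List[int]  # one scanline: 1 = black, 0 = white

def black_runs(row: Row) -> List[Tuple[int, int]]:
    """(start, length) of each run of consecutive black pixels, left to right."""
    runs, i = [], 0
    while i < len(row):
        if row[i]:
            start = i
            while i < len(row) and row[i]:
                i += 1
            runs.append((start, i - start))
        else:
            i += 1
    return runs

def to_carrier(row: Row) -> Row:
    """Assumed carrier rule: whiten the last pixel of every even run so all runs are odd."""
    out = list(row)
    for start, length in black_runs(row):
        if length % 2 == 0:
            out[start + length - 1] = 0
    return out

def synthesizable(row: Row, i: int) -> bool:
    """White pixel beside exactly one black run, so blackening it joins no two runs."""
    if row[i]:
        return False
    left = i > 0 and row[i - 1] == 1
    right = i + 1 < len(row) and row[i + 1] == 1
    return left != right

def embed_bits(carrier: Row, bits: List[int]) -> Row:
    """One bit per run: for a 1, blacken the synthesizable pixel just after the run."""
    runs = black_runs(carrier)
    if len(bits) > len(runs):
        raise ValueError("not enough runs to carry the information code")
    out = list(carrier)
    for (start, length), bit in zip(runs, bits):
        pos = start + length  # first pixel to the right of the run
        if bit:
            if pos >= len(out) or not synthesizable(out, pos):
                raise ValueError(f"no synthesizable pixel after the run at {start}")
            out[pos] = 1  # odd run becomes even
    return out
```

A white pixel with black on both sides is rejected because blackening it would merge two runs into one and destroy the parity of both.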

A document data creation device that creates bitmap data of a document based on document original data created by a computer,
Carrier font character storage means for storing bitmap data of carrier font characters obtained by modifying basic font characters;
Character/attribute information extracting means for extracting character information and attribute information from the information of each character constituting the document manuscript data;
Bitmap data creation means for creating bitmap data corresponding to the document manuscript data by reading the bitmap of each carrier font character from the carrier font character storage means in accordance with the extracted character information of each character;
Attribute information conversion means for converting the extracted attribute information of each character into an information code based on a first conversion table indicating the correspondence between the attribute information and the information codes representing it;
Information code compression means for separating the information code into unique information specific to each character and non-unique information that is common within a line or a paragraph, and for generating a compressed information code by erasing the non-unique information of consecutive characters constituting the document manuscript data so as to compress the information codes of those consecutive characters;
Information synthesizable pixel extracting means for extracting pixels that match a predetermined condition among the pixels of the created bitmap data as synthesizable pixels capable of synthesizing the compressed information code;
An apparatus for creating document data, comprising: information code synthesizing means for sequentially adding the compressed information code to the synthesizable pixels.
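The compression step can be sketched as erasing the non-unique part whenever it repeats the previous character's value. The `(unique, non_unique)` pair representation, the `None` marker for an erased part, and the 1-bit presence flag used for the size estimate are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Code = Tuple[int, int]              # (unique part, non-unique part) of one character
Packed = Tuple[int, Optional[int]]  # non-unique part erased (None) when unchanged

def compress(codes: List[Code]) -> List[Packed]:
    """Erase the non-unique part whenever it repeats the previous character's."""
    packed: List[Packed] = []
    previous: Optional[int] = None
    for unique, shared in codes:
        packed.append((unique, None if shared == previous else shared))
        previous = shared
    return packed

def packed_size(packed: List[Packed], unique_bits: int, shared_bits: int) -> int:
    """Bits needed if each character also carries a 1-bit 'non-unique present' flag."""
    return sum(1 + unique_bits + (shared_bits if shared is not None else 0)
               for _, shared in packed)
```

With 2 unique bits and 6 non-unique bits per character, four characters of which the first three share one non-unique value need 24 bits instead of the 32 bits of the uncompressed codes, which reduces the number of synthesizable pixels consumed.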

The document data creation device according to claim 12, wherein the carrier font characters are modified so that the run length of each black pixel group arranged on a line in the scanning direction is an odd value, and
the synthesizable pixel is a white pixel adjacent to a black pixel group arranged on the line in the scanning direction, which does not join two black pixel groups together even if it is changed to a black pixel.

A character data restoration device for restoring character data of a document from a bitmap of the document created by the document data creation device according to claim 10,
Information code extracting means for extracting the information code added to the synthesizable pixels in the bitmap of the document;
Information code conversion means for converting the extracted information code into attribute information with reference to a second conversion table representing the correspondence between the information code and the attribute information;
Information code removing means for removing the extracted information code from the bitmap of the document;
A character data restoration device comprising character recognition means for restoring character information based on the bitmap from which the information code has been removed.

A character data restoration device for restoring character data of a document from a bitmap of the document created by the document data creation device according to claim 11,
  Information code extracting means for extracting the information code based on whether the run length of the black pixel group arranged on a line in the scanning direction is an even value or an odd value from the bitmap of the document;
  Information code conversion means for converting the extracted information code into attribute information with reference to a second conversion table representing the correspondence between the information code and the attribute information;
  Information code removing means for removing the extracted information code from the bitmap of the document;
  A character data restoration device comprising character recognition means for restoring character information based on a bitmap from which the information code has been removed.

A character data restoration device for restoring character data of a document from a bitmap of the document created by the document data creation device according to claim 12,
Information code extracting means for extracting the compressed information code added to the synthesizable pixels in the bitmap of the document;
Information code restoring means for restoring the extracted compressed information code to obtain the information code before compression;
Information code conversion means for converting the restored information code into attribute information with reference to a second conversion table representing the correspondence between the information code and the attribute information;
Information code removing means for removing the extracted compressed information code from the bitmap of the document;
A character data restoration device comprising character recognition means for restoring character information based on the bitmap from which the compressed information code has been removed.

A character data restoration device for restoring character data of a document from a bitmap of the document created by the document data creation device according to claim 13,
  Information code extracting means for extracting the compressed information code based on whether the run length of the black pixel group arranged on a line in the scanning direction is an even value or an odd value from the bitmap of the document;
  Information code restoring means for restoring the extracted compressed information code and obtaining the information code before compression;
  Information code conversion means for converting the restored information code into the attribute information with reference to a second conversion table representing the correspondence between the information code and the attribute information;
  Information code removing means for removing the extracted compressed information code from the bitmap of the document;
  A character data restoration device comprising: character recognition means for restoring character information based on a bitmap from which the compressed information code has been removed.

18. The character data restoration device according to claim 14, further comprising modifying means for modifying the restored character information based on the converted attribute information.