JP3728209B2

JP3728209B2 - Image processing method and apparatus, computer program, and storage medium

Info

Publication number: JP3728209B2
Application number: JP2001021658A
Authority: JP
Inventors: 淳田丸; 恵市岩村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2001-01-30
Filing date: 2001-01-30
Publication date: 2005-12-21
Anticipated expiration: 2021-01-30
Also published as: JP2002232679A

Description

【０００１】
【発明の属する技術分野】
本発明は、文書画像のデジタルデータに電子透かし情報を埋め込む、或いはそこから電子透かし情報を抽出する画像処理装置、及び方法、及びこの方法を記憶した記憶媒体に関するものである。
【０００２】
【従来の技術】
近年、インターネット上で画像や音声などのデジタルデータの著作権保護手段として、電子透かし技術が注目されている。電子透かし技術は、画像や音声などのデジタルデータを人間が知覚できない程度に操作することにより電子透かし情報を埋め込む技術である。
【０００３】
多値画像に対する電子透かし技術としては、一般的に画素の濃度の冗長性を利用した種々の方法が知られている。また、２値画像である文書画像に対する電子透かし技術としては、例えば、英文（欧文）の単語間の空白長を変更することにより電子透かし情報を埋め込む特開平９−１８６６０３（ＵＳＰ５８６１６１９）が知られている。
【０００４】
【発明が解決しようとする課題】
しかしながら、上記従来技術における、複数文字からなる単語を処理単位としてその単語間の距離を調節するという単純なものでは、自然な出力結果を得ることは難しい。
【０００５】
そこで、本発明は、文字間の空白部分の距離を利用して情報の埋め込みを行い、自然な出力を得ることを可能にし、且つ、単語間に対するものより、より多くの情報を埋め込むことを可能ならしめる画像処理装置及び方法及びプログラム及び記憶媒体を提供しようとするものである。
【０００６】
また、他の発明は、単語間の空白を利用するにしても、より自然な出力結果とする画像処理装置及び方法及びプログラム及び記憶媒体を提供しようとするものである。
【０００７】
【課題を解決するための手段】
かかる課題を解決するため、例えば本発明の画像処理装置は以下の構成を備える。すなわち、
文書画像に電子透かし情報を埋め込む画像処理装置であって、
前記文書画像における文字イメージ或いは文字を構成する要素である要素イメージに外接する矩形を抽出する抽出手段と、
埋め込み対象となる情報のビットの論理値に基づき、注目矩形の位置を、隣合う他の矩形のいずれかへのシフトするシフト量を決定するシフト量算出手段と、
該シフト量算出手段で算出したシフト量に基づいて、注目矩形の文字イメージ或いは要素イメージをシフトするシフト手段とを備え、
前記シフト量算出手段は、注目矩形の前に位置する他の矩形との距離をＰ、後に位置する矩形との距離をＳとしたとき、Ｐ−ＳとＰ＋Ｓとの比に基づいて、電子透かし情報が表すよう注目矩形の位置のシフト量を算出することを特徴とする。
【０００８】
【発明の実施の形態】
以下、添付図面に従って本発明に係る実施形態を詳細に説明する。
【０００９】
＜第１の実施の形態＞
本実施形態では、日本語の文書データ（漢字を用いた文書データ）に対して領域分割と文字要素外接矩形抽出からなる文書画像解析技術を用い、文字要素外接矩形間の空白長を用いて、情報を埋め込む電子透かしについて説明する。より具体的には、例えば特開平９−１８６６０３号（米国特許第５８６１６１９号）において単語間に適用された手法をそのまま採用した場合に、漢字の「へん」と「つくり」が離れてしまうなどの、不自然な文書画像を作りだしてしまう問題に鑑み、本実施形態では、単語間ではなく、文字或いは文字を構成する要素（へん、つくり等）間の空白長を利用し、且つ、できるだけ不自然さが少なく、デジタル上においては確実に透かし情報の抽出が可能な埋め込みを提案する。
【００１０】
本実施形態は、大きく分けて、電子透かし埋め込み手法と、電子透かし抽出手法に分けられる。図１に電子透かし埋め込み手法の概要を、図２に電子透かし抽出手法の概要を示す。以下、順を追って説明する。
【００１１】
＜埋め込み法＞
先ず、電子透かし埋め込み法について述べる。図１において、入力文書画像１０１（例えばイメージスキャナ等で読み取る）は、先ず後述する文書画像解析部１０２によって、テキスト領域やグラフ等の図形の領域に分割され、さらにテキスト領域に対しては、文字要素毎に外接矩形が抽出される。その結果が図示の符号１０４である。ここで、文字要素とは、文字認識技術で知られている射影技術（水平・垂直方向にドットのヒストグラムを作成し、行間、文字要素間を判定する技術）を用いて抽出された矩形領域（連続する有意なドットの広がりに外接する矩形）を指し、一つの文字である場合と、文字の構成要素（へん、つくり等）である場合がある。なお、文書画像中のテキスト領域（文字列領域）と非テキスト領域への領域分割（分離）は公知の技術を用いるとする。また、図示の符号１０４は分かり易く示しているものであり、実際は、例えば文書画像の左上隅を原点にし、その位置から各矩形の左上隅、右下隅の座標、或いは、矩形の左上隅の座標とその矩形の高さ及び幅等のデータを抽出し、それを文字の並び順にメモリに記憶するものであり、図示の如く矩形を描画するものではない。
【００１２】
また、抽出されたテキスト領域内の各文字（或いは文字要素）の外接矩形の情報（座標データ）から、外接矩形間の空白長を算出する。
【００１３】
詳細は後述するが、或る行の矩形の数が３０個（文字数が３０であるとは限らない）あった場合、両端の矩形は除外し、２番目の矩形、４番目の矩形、…と偶数番めの矩形に着目する。今、或る注目矩形とその前に位置する矩形（以下、前矩形）との間の空白長Ｐ、注目矩形とその後続する矩形（後矩形）との間の空白長Ｓを算出し、注目矩形を左右に微小距離シフトしてＰ，Ｓの関係を制御することで、注目矩形に対して情報１ビット（０か１）を埋め込む。１行が３０矩形の場合には、３０／２−１＝１４ビットを埋め込むことになる。なお、ここでは偶数番めの矩形について情報を埋め込む例を示したが、前後する矩形が存在する矩形に対して情報を埋めこめばよいので、この条件を満たす限りは奇数番目の矩形に情報を埋め込んでも構わない。また、埋め込む矩形が１つ置きに（間に埋め込み対象外の矩形を１つ設けること）したのは、連続する矩形に順に情報を埋め込むと、１つの空白距離が前後する矩形で共有する状態となってしまい、制御できるのが常に後矩形との距離としてなってしまい、徐々に誤差が累積されてしまい、レイアウトが大きく変わるためである。
【００１４】
上記のようにして、或る外接矩形内の領域を左右にシフトすることで、電子透かし情報１０６を埋め込んだ文書画像１０５を透かし埋め込み処理（１０３）で生成することになる。
【００１５】
＜文書画像解析技術＞
文書画像解析技術は本来、文字認識技術の要素技術の一つであり、入力された文書画像に対して、テキスト領域やグラフ等の図形の領域などへの分割と、テキスト領域に対しては、射影を用いて文字単位での切り出しを行うものである。例として、特開平６−６８３０１を挙げることができる。
【００１６】
＜抽出手法＞
次に電子透かし抽出手法について述べる。電子透かし抽出手法においては、電子透かし埋め込み手法と同様、先ず、文字認識技術で使われる文書画像解析手段２０２によって、透かし埋め込み画像２０１（イメージスキャナ等で読み取る）から、領域分割と射影によって、文字毎に外接矩形２０３を抽出する。次に抽出された外接矩形群の情報を用いて、隣接する外接矩形間の空白長を算出する。
【００１７】
その偶数番目の矩形に注目し、前矩形間の距離Ｐ、後矩形間の距離Ｓの関係から注目矩形に埋め込まれた情報１ビットを抽出する。これをその行の他の矩形（偶数番めの矩形）についても同様に行い、埋め込まれた情報を抽出していく。
【００１８】
＜埋め込み規則＞
次に埋め込み規則について述べる。先ず、１ビットの情報を埋め込む矩形の前後の空白長を上記のようにＰ，Ｓと定める（図３参照）。この１ビットを埋め込むための矩形は、先に説明したように１つの行において両端の矩形を除いて、一つおきに定まる。次にここで定めた空白長Ｐ，Ｓに対して、（Ｐ−Ｓ）／（Ｐ＋Ｓ）を算出し（但し、この値は０、若しくは小さな値になるので分母に小さな値を乗算、若しくは分子に大きな値を乗算することになる）、適当な量子化ステップで量子化する。量子化代表値には、交互に０か１を割り当てておく。
【００１９】
透かし情報抽出の際には、例えば、図４の式１によって、埋め込まれている値を抽出（「mod」は２で割ったときの余りを返す関数であり、２値化にすることを意味している）することができる。ここでαは量子化ステップを示す。
【００２０】
透かし情報埋め込みの際には、１ビットを埋め込むための矩形（文字若しくは文字構成要素の）外接矩形内の領域を、１ピクセルずつ左右にふりながら、図４の式１を算出し、その結果が埋め込もうとする値(０若しくは１)となるまで、平行移動の方向（左または右）と幅（ピクセル数）を探索する。
【００２１】
この探索のフローチャートを図５並びに図６に示す。また、これらの詳細について以下に説明する。なお、説明を簡単なものとするため、文書画像が横書きである例について説明する。
【００２２】
まず、変数の意味であるが、これは以下の通りである。
【００２３】
変数ｉは平行移動幅の候補を示す。変数Flag1は情報を埋め込むのに動かす矩形を、距離iだけ右に動かす際に右の矩形と接触するか否かを示し、接触する場合には１を取る。また、変数Flag2は距離iだけ左に矩形を動かす際に、接触するか否かを示す。Flag1同様、接触する場合に１を取る。
【００２４】
次に、このフローチャートの各処理を、処理のフローに沿って説明する。このフローはステップＳ５０１から開始する。
【００２５】
本処理が開始されると、まず、ステップＳ５０２で、変数群の初期値を設定する。すなわち、ｉ＝ｆｌａｇ１＝ｆｌａｇ２＝０とする。
【００２６】
次いで、ステップＳ５０３では、埋め込みのために動かす対象の注目矩形（注目文字或いは文字要素の部分イメージ、以下略）を距離ｉだけ右に平行移動した際に、右隣の文字若しくは文字要素である矩形（前矩形）に接触するか否かを判定する。接触する場合にはステップＳ５０４に進み、変数ｆｌａｇ１を“１”にセットする。
【００２７】
また、ステップＳ５０５では、注目矩形を距離ｉだけ左に平行移動した際に、左隣の矩形に接触するか否かを判定する。接触する場合にはステップＳ５０６でｆｌａｇ２に“１”にセットする。
【００２８】
ステップＳ５０７では、平行移動の距離ｉで示される値の際、左右両方に接触してしまうか否かを判定する。左右両方に接触してしまう場合には、ステップＳ５０８で平行移動幅を０とし、平行移動幅の計算を終了し、この場合は、埋め込みが不可能とする。
【００２９】
また、左右両方に接触はしないと判断した場合、図６のステップＳ６０１に進む。このステップＳ６０１では、注目矩形を距離iだけ右に平行移動する際に、埋め込もうとするビットが図４の式１によって、得られるか否かを判定する。得られる場合には、ステップＳ６０２において平行移動幅をｉとして平行移動幅の計算を終了する。なお、ここで平行移動幅は、値が正の場合に、右に平行移動することを意味し、負の場合には、左に平行移動することを意味するものと定める。
【００３０】
ステップＳ６０３では、注目矩形を距離iだけ左に平行移動する際に、埋め込もうとするビットが図４の式１によって、得られるか否かを判定する。得られる場合には、ステップＳ６０４において平行移動幅を−ｉとして平行移動幅の計算を終了する。
【００３１】
ステップＳ６０５においては、ｉの値をインクリメントし、ステップＳ５０３へ処理を戻す。
【００３２】
探索結果を、平行移動する方向と距離と定めた上で、実際に１ビットを埋め込むための文字の外接矩形内の領域を平行移動する。以上のような処理を、文書画像全体に対して行うことで、電子透かし情報を文書画像に埋め込むことができる。
【００３３】
埋め込むべき全文字或いは全文字要素についてのシフト量が決定される（決定された各矩形のシフト量は、その矩形を特定する情報と関連付けられて一旦メモリに格納される）ので、オリジナル文書画像中の該当する矩形内のイメージをその決定されたシフト量に従ってシフトし、埋め込み後の文書画像を生成することになる。
【００３４】
以上説明した埋め込み手法，抽出手法は、図７に示す信号処理装置を用いて実現可能である。
【００３５】
同図において、ホストコンピュータ７０１は例えば一般に普及しているパーソナルコンピュータで実現できる。スキャナ７１４は透かし情報の埋め込み対象となる文書の原稿を読み取るためのものである。また、コンピュータ７０１は読み取った画像を編集・保管することが可能である。更に、ここで得られた画像をプリンタ７１５から印刷させることが可能である。また、ユーザーからの各種マニュアル指示等は、マウス７１２、キーボード７１３からの入力により行われる。
【００３６】
ホストコンピュータ７０１の内部では、バス７１６により後述する各ブロックが接続され、種々のデータの受け渡しが可能である。
【００３７】
図中、７０３は、内部の各ブロックの動作を制御、或いは内部に記憶されたプログラムを実行することのできるＣＰＵである。７０４は、印刷されることが認められていない特定画像を記憶したり、あらかじめ必要な画像処理プログラム等を記憶しておくＲＯＭである。７０５は、ＣＰＵにて処理を行うために一時的にプログラムや処理対象の画像データを格納しておくＲＡＭである。７０６は、ＯＳやアプリケーション等のＲＡＭ等に転送されるプログラムや画像データをあらかじめ格納したり、処理後の画像データを保存することのできるハードディスク（ＨＤ）である。７０７は、原稿或いはフィルム等をＣＣＤにて読み取り、画像データを生成するスキャナと接続し、スキャナで得られた画像データを入力することのできるスキャナインターフェイス（Ｉ／Ｆ）である。７０８は、外部記憶媒体の一つであるＣＤ（ＣＤ−Ｒ）に記憶されたデータを読み込み或いは書き出すことのできるＣＤドライブである。７０９は、７０８と同様にＦＤからの読み込み、ＦＤへの書き出しができるＦＤドライブである。７１０も、７０８と同様にＤＶＤからの読み込み、ＤＶＤへの書き出しができるＤＶＤドライブである。尚、ＣＤ，ＦＤ，ＤＶＤ等に画像編集用のプログラム、或いはプリンタドライバが記憶されている場合には、これらプログラムをＨＤ７０６上にインストールし、必要に応じてＲＡＭ７０５に転送されるようになっている。７１１は、マウス７１２或いはキーボード７１３からの入力指示を受け付けるためにこれらと接続されるインターフェイス（Ｉ／Ｆ）である。
【００３８】
かかる構成において、アプリケーションを用いて、スキャナ７１４で透かし情報を埋め込む対象となる原稿を読み取ることになる（場合によってはＨＤ７０６やリムーバル記憶媒体に格納された文書画像でも良い）。次いで、透かし情報（例えば著作者名等の情報）をキーボード７１３等から入力し、その情報をビット列にして、先に説明した処理を実行するプログラムを実行させ、情報の埋め込みを行う。埋め込まれ、生成された文書画像は例えばネットワーク上に格納して第三者が閲覧できるようにしても良いし、プリンタ７１５で印刷するようにしても構わない。
【００３９】
また、上記実施形態では、埋め込むビットの情報として、図４の式を用いた。単純にＰからＳを減じた値が０以上か０未満かによって、“０”、“１”の１ビットを埋め込んでも良いが、この場合、文書画像の文字サイズ等は一切考慮されない。すなわち、文字が小さい文書や文字が大きな文書等に適応的に対処できない。かかる点、図４のようにすることで、文字サイズが大きい場合に相対的に文字（矩形）間の距離も大きくなることに適用できることになり、より自然な出力結果を得ることができる。また、図４における分母に乗算するαは小さな値を取ることになるが、この値も適応的に変化させるようにしてもよい（例えば、矩形間の平均距離に応じて変化する等）。
【００４０】
また、上記実施形態では、シフトする量を示す変数ｉは“１”、すなわち、読み取った画像における１画素単位にシフトする例を説明した。この場合の１画素はイメージスキャナの読み取り解像度に依存するものとなるものの、第三者が同じ解像度で読み取るとは限らない。そこで、シフトする最小単位は、その原稿の読み取り解像度に依存して決定するようにしてもよい。最も単純な手法は、現存するほとんどのイメージスキャナは最低でも２００ｄｐｉの解像度はあるだろうから、この２００ｄｐｉで読み取ったとしたときの１画素のサイズを基準にしする。従って、６００ｄｐｉの解像度を有するスキャナで読み取った場合には、変数ｉは“３”単位に増加させるようにする。
【００４１】
また、上記実施形態では、文字或いは文字を構成する要素に着目して処理を行ったが、図４の式を英語等の言語における文章中の単語間の空白部分に適用しても構わない。一般に英単語は、数文字で構成されるものであるので、一行中の単語間の空白部分の数は、日本語等の漢字圏より少なく、埋め込む情報量が少なくなるが、単語間の空白部分を自然な状態で維持することは可能になるという効果がある。単語間の空白と、文字間の空白の区別は、所定距離以上を単語間と認定する等の処理を行えばよい。当然、シフトするのは単語全体をシフトさせることになる。なお、埋め込まれた情報を抽出するのは、第１の実施形態での対象が文字から単語になったものであるので、その説明は省略する。
【００４２】
＜第２の実施の形態＞
上記第１の実施形態は、既存の文書画像を例にして説明したが、本願発明はこれに限定されるものではない。例えば、パーソナルコンピュータで動作しているＯＳが米国マイクロソフト社のＷｉｎｄｏｗｓの場合、それにインストールするプリンタドライバに上記の機能を搭載するようにしても構わない。
【００４３】
以下、プリンタドライバで実現する場合の動作処理を図８のフローチャートに従って説明する。
【００４４】
ワープロ等のアプリケーション等から印刷指示を受けると、本プリンタドライバが起動する。
【００４５】
まず、ステップＳ８１で印刷しようとする文書に対して埋め込みを行うか否かを問い合わせるメッセージを表示し、操作者に選択させる。なお、この選択は、印刷処理の際に表示されるダイアログボックスを活用し、そのオプションを選択した際に表示されるようにしても構わない。
【００４６】
いずれにせよ、ステップＳ８２では埋め込みが指示されたか否かを判断し、否の場合には、通常のプリンタドライバと同様の動作を行うべく、ステップＳ８３に進む（例えばＰＤＬ記述言語に変換して出力する処理）。
【００４７】
また、ステップＳ８２で埋め込みが指示されていると判断した場合には、ステップＳ８４に進み、操作者に埋め込むべき情報を入力を行わせる。なお、埋め込み情報を入力するタイミングは、このタイミングである必要はなく、例えばステップＳ８１で埋め込みを行うタイミングで入力するようにしても構わないし、予め著作情報を記述したファイルとして、或いはレジストリに登録しておき、その内容を活用するようにしても構わない。
【００４８】
ステップＳ８５では、ＯＳを介してアプリケーションより受信した印刷対象のデータを解析し、それが文字に関するものである場合には、印刷用の文字パターンを発生し、その外接矩形（１つの文字に対して２つ発生する場合もあり得る）の算出、及び、前後する矩形との距離に基づいて着目矩形に対し情報の埋め込みを行う。埋め込むべき矩形は、後続する矩形があることが必要になるから、実際は３つの矩形が生成される毎にに行うことになる。いずれにしても、アプリケーションから渡されるデータを解析すれば、それが文字に関するものであるのか、イメージや図形等の非文字情報であるのか確実に判別できることになるので、像域分離は単純なものとすることが可能となる。
【００４９】
こうして、印刷出力するだけの最低単位の埋め込み後のイメージ（例えば１ページ分のイメージ）が生成されると、それをステップＳ８６でＯＳに出力し、ＯＳはそのデータをプリンタに出力する処理を行うことになる。
【００５０】
上記の如く、本第２の実施形態の場合、透かし情報を埋め込む場合には、文字を含む文書データをイメージデータの形式でプリンタに出力せざるを得ないが、一般的なユーザーにとっては通常のアプリケーションにおける作業の一貫として埋め込みを行うことが可能となる。つまり、埋め込みに関する格別な知識がなくても、実現できることになる。
【００５１】
＜他の実施形態＞
上記の如く、本発明は、汎用の情報処理装置（パーソナルコンピュータ等）のコンピュータ（CPUあるいはMPU）によって実現可能である。従って、コンピュータに上記実施の形態を実現するためのソフトウエアのプログラムコードを供給し，このプログラムコードに従って上記システムあるいは装置のコンピュータが上記各種デバイスを動作させることにより上記実施の形態を実現する場合も本発明の範疇に含まれる．
またこの場合，前記ソフトウエアのプログラムコード自体が上記実施の形態の機能を実現することになり，そのプログラムコード自体，及びそのプログラムコードをコンピュータに供給するための記憶媒体、具体的には上記プログラムコードを格納した磁気ディスク（フロッピーディスクやハードディスク）、光ディスク（ＣＤＲＯＭ、ＣＤ−Ｒ等）、メモリカード、テープ（紙テープ、磁気テープ等）、ＲＯＭ素子等を用いることができる．
また，上記コンピュータが，供給されたプログラムコードのみに従って各種デバイスを制御することにより，上記実施の形態の機能が実現される場合だけではなく、上記プログラムコードがコンピュータ上で稼働しているOS(オペレーティングシステム)、あるいは他のアプリケーションソフト等と共同して上記実施の形態が実現される場合にもかかるプログラムコードは本発明の範疇に含まれる。
【００５２】
更に，この供給されたプログラムコードが，コンピュータの機能拡張ボードやコンピュータに接続された機能拡張ユニットに備わるメモリに格納された後，そのプログラムコードの指示に基づいてその機能拡張ボードや機能格納ユニットに備わるCPU等が実際の処理の一部または全部を行い，その処理によって上記実施の形態が実現される場合も本発明の範疇に含まれる。
【００５３】
以上説明した様に、本実施形態によれば、１文字毎、或いは該１文字を構成する"へん"や"つくり"などの部分構成毎に空白長が存在するような文書画像に適した、電子透かし情報の埋め込みを行うことができる。また、互いに隣接する各単語、文字、部分構成等の空白長がいかなるものであっても、電子透かし情報を埋め込む為のこの空白長の変動量をできるだけ小さく抑え、視覚的な画質劣化を抑えることができる。
【００５４】
また互いに隣接する各単語、文字、部分構成等の空白長がいかなるものであっても、電子透かし情報を埋め込む為のこの空白長の変動量をできるだけ小さく抑え、視覚的な画質劣化を抑えることが可能になる。
【００５５】
【発明の効果】
以上説明したように本発明によれば、文字間の空白部分の距離を利用して情報の埋め込みを行い、自然な出力を得ることを可能にし、且つ、単語間に対するものより、より多くの情報を埋め込むことが可能になる。
【００５６】
また、他の発明によれば、単語間の空白を利用するにしても、より自然な出力結果とすることが可能になる。
【図面の簡単な説明】
【図１】実施形態における電子透かし埋め込みのための機能ブロック図である。
【図２】実施形態における電子透かしの抽出のための機能ブロック図である。
【図３】実施形態における埋め込み処理を説明するための図である。
【図４】実施形態における埋め込み／抽出処理における量子化の演算式を示す図である。
【図５】実施形態における埋め込み処理のフローチャートである。
【図６】実施形態における埋め込み処理のフローチャートである。
【図７】実施形態における情報処理装置に適用した際のブロック構成図である。
【図８】第２の実施形態におけるプリンタドライバの動作処理手順を示すフローチャートである。
[0001]
[Technical Field of the Invention]
The present invention relates to an image processing apparatus and method for embedding digital watermark information in the digital data of a document image, or for extracting digital watermark information from it, and to a storage medium storing the method.
[0002]
[Prior art]
In recent years, digital watermark technology has attracted attention as a means of copyright protection for digital data such as images and sound on the Internet. Digital watermarking embeds watermark information by manipulating digital data such as images and sound to a degree that humans cannot perceive.
[0003]
For multi-valued images, various digital watermark techniques that exploit the redundancy of pixel density are generally known. For document images, which are binary images, a known technique is, for example, Japanese Patent Laid-Open No. 9-186603 (US Pat. No. 5,861,619), which embeds digital watermark information by changing the blank length between words in English (European-language) text.
[0004]
[Problems to be solved by the invention]
However, with the simple prior-art approach of treating a word made up of several characters as the processing unit and adjusting the distance between words, it is difficult to obtain a natural output result.
[0005]
An object of the present invention is therefore to provide an image processing apparatus, method, program, and storage medium that embed information by using the distance of the blank portions between characters, so that a natural output can be obtained and more information can be embedded than when the blanks between words are used.
[0006]
Another object of the present invention is to provide an image processing apparatus and method, a program, and a storage medium that provide a more natural output result even if a space between words is used.
[0007]
[Means for Solving the Problems]
To solve this problem, an image processing apparatus of the present invention has, for example, the following configuration. That is,
an image processing apparatus for embedding digital watermark information in a document image comprises:
extraction means for extracting rectangles circumscribing character images in the document image, or element images that are constituent elements of characters;
shift amount calculation means for determining, based on the logical value of a bit of the information to be embedded, a shift amount by which the position of a target rectangle is shifted toward either of the adjacent rectangles; and
shift means for shifting the character image or element image of the target rectangle based on the shift amount calculated by the shift amount calculation means,
wherein, where P is the distance from the target rectangle to the rectangle located before it and S is the distance to the rectangle located after it, the shift amount calculation means calculates the shift amount of the position of the target rectangle, based on the ratio of P − S to P + S, so that the position represents the digital watermark information.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings.
[0009]
<First Embodiment>
This embodiment describes a digital watermark that embeds information in Japanese document data (document data that uses kanji) by applying document image analysis, consisting of region segmentation and extraction of character-element circumscribed rectangles, and then using the blank lengths between those rectangles. More specifically, if the technique that Japanese Patent Laid-Open No. 9-186603 (US Pat. No. 5,861,619) applies between words were adopted as it is, it would produce unnatural document images, for example by pulling apart the “hen” (left-hand component) and “tsukuri” (right-hand component) of a kanji. In view of this problem, the present embodiment uses the blank lengths between characters, or between the elements that make up a character (hen, tsukuri, etc.), rather than between words, and proposes an embedding that looks as natural as possible while still allowing the watermark information to be extracted reliably from the digital data.
[0010]
This embodiment can be broadly divided into a digital watermark embedding method and a digital watermark extraction method. FIG. 1 shows an outline of the digital watermark embedding method, and FIG. 2 shows an outline of the digital watermark extraction method. In the following, description will be given in order.
[0011]
<Embedding method>
First, the digital watermark embedding method will be described. In FIG. 1, an input document image 101 (read, for example, by an image scanner) is first divided by a document image analysis unit 102, described later, into text regions and graphic regions such as graphs, and for each text region a circumscribed rectangle is extracted for every character element. The result is denoted by reference numeral 104 in the figure. Here, a character element is a rectangular region (a rectangle circumscribing a connected spread of significant dots) extracted with the projection technique known from character recognition (a technique that builds histograms of dots in the horizontal and vertical directions to find the gaps between lines and between character elements). It may be a whole character or a component of a character (hen, tsukuri, etc.). The division (separation) of the document image into text regions (character string regions) and non-text regions is assumed to use a known technique. Reference numeral 104 is drawn only for ease of understanding. In practice, with the upper left corner of the document image as the origin, data such as the coordinates of the upper left and lower right corners of each rectangle, or the coordinates of its upper left corner together with its height and width, are extracted and stored in memory in character order. The rectangles are not actually drawn as in the figure.
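The projection step can be illustrated with a minimal sketch. It assumes one text line is given as rows of 0/1 pixels and only performs the vertical projection that separates character elements; the function name `column_runs` and the sample data are hypothetical, and the region segmentation that the text delegates to known techniques is not shown.

```python
def column_runs(row_image):
    """Return (start, end) column spans whose vertical projection is non-zero.

    row_image: list of equal-length lists of 0/1 pixels for one text line.
    The 'end' column is exclusive.
    """
    width = len(row_image[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(line[x] for line in row_image) for x in range(width)]
    spans, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x                      # a character element begins
        elif not count and start is not None:
            spans.append((start, x))       # a blank column ends the element
            start = None
    if start is not None:
        spans.append((start, width))       # element runs to the right edge
    return spans


line = [
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
]
print(column_runs(line))  # [(1, 3), (5, 6), (7, 10)]
```

Each span gives the left and right edges of one circumscribed rectangle; the top and bottom edges would come from the same procedure applied in the horizontal direction.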
[0012]
Further, the blank length between the circumscribed rectangles is calculated from the circumscribed rectangle information (coordinate data) of each character (or character element) in the extracted text area.
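As a sketch of this calculation, assume each rectangle is stored as the upper left corner plus width and height, one of the two layouts mentioned above. The class `Rect` and the function `blank_lengths` are illustrative names, not part of the patent.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    left: int    # x of the upper left corner
    top: int     # y of the upper left corner
    width: int
    height: int

    @property
    def right(self):
        # First column to the right of the rectangle.
        return self.left + self.width


def blank_lengths(rects):
    """Blank length between each pair of horizontally adjacent rectangles."""
    return [b.left - a.right for a, b in zip(rects, rects[1:])]


row = [Rect(0, 0, 10, 12), Rect(14, 0, 8, 12), Rect(27, 0, 10, 12)]
print(blank_lengths(row))  # [4, 5]
```

For the middle rectangle in this row, the two values are the blank lengths on its left and right sides.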
[0013]
Although details will be described later, suppose a line contains 30 rectangles (which does not necessarily mean 30 characters). The rectangles at both ends are excluded, and attention is paid to the even-numbered rectangles: the second, the fourth, and so on. For a given target rectangle, the blank length P between it and the rectangle before it (hereinafter, the front rectangle) and the blank length S between it and the rectangle after it (the rear rectangle) are calculated. One bit of information (0 or 1) is embedded in the target rectangle by shifting it left or right by a minute distance so as to control the relationship between P and S. When one line has 30 rectangles, 30/2 − 1 = 14 bits are embedded. Although the example here embeds information in even-numbered rectangles, all that is required is that the target rectangle has a rectangle on each side, so information may be embedded in odd-numbered rectangles as long as this condition is met. The rectangles used for embedding are taken every other rectangle (leaving one unused rectangle between them) because, if information were embedded in consecutive rectangles in order, each blank would be shared by the two rectangles on either side of it. Only the distance to the rear rectangle could then be controlled, errors would gradually accumulate, and the layout would change considerably.
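The selection rule and the resulting capacity can be checked with a short sketch. The function name `carrier_indices` is hypothetical, and indices are 0-based, so the "second rectangle" of the text is index 1.

```python
def carrier_indices(n_rects):
    """0-based indices of the 2nd, 4th, ... rectangles, excluding both ends."""
    # Start at index 1 (the second rectangle) and stop before the last one.
    return list(range(1, n_rects - 1, 2))


print(len(carrier_indices(30)))  # 14, matching 30/2 - 1
print(carrier_indices(7))        # [1, 3, 5]
```

Because every selected rectangle has unselected neighbors on both sides, shifting one of them changes only its own P and S and leaves the gaps of the other selected rectangles untouched.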
[0014]
As described above, the watermark embedding process (103) generates a document image 105 in which the digital watermark information 106 is embedded by shifting the regions inside certain circumscribed rectangles to the left or right.
[0015]
<Document image analysis technology>
Document image analysis is originally one of the component technologies of character recognition. It divides an input document image into text regions and graphic regions such as graphs, and cuts text regions into individual characters using projection. An example is given in Japanese Patent Laid-Open No. 6-68301.
[0016]
<Extraction method>
Next, the digital watermark extraction method will be described. As in the embedding method, the document image analysis unit 202 used in character recognition first extracts a circumscribed rectangle 203 for each character from the watermark-embedded image 201 (read by an image scanner or the like) by region segmentation and projection. The blank lengths between adjacent circumscribed rectangles are then calculated from the extracted rectangle information.
[0017]
Attention is paid to the even-numbered rectangles, and the 1 bit of information embedded in each target rectangle is extracted from the relationship between the distance P to the front rectangle and the distance S to the rear rectangle. The same is done for the other even-numbered rectangles in the line, and the embedded information is extracted in turn.
[0018]
<Embedding rules>
Next, the embedding rules will be described. First, the blank lengths before and after a rectangle in which 1 bit of information is to be embedded are defined as P and S, as described above (see FIG. 3). As explained earlier, the rectangles used for embedding are every other rectangle in a line, excluding the rectangles at both ends. Next, (P − S)/(P + S) is calculated for these blank lengths (since this value is 0 or small, the denominator is multiplied by a small value or the numerator by a large value) and quantized with an appropriate quantization step. The values 0 and 1 are assigned alternately to the quantization representative values.
[0019]
When watermark information is extracted, the embedded value can be obtained, for example, by Equation 1 in FIG. 4 (“mod” is a function that returns the remainder after division by 2, which binarizes the result). Here, α denotes the quantization step.
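FIG. 4 is not reproduced in this text, so the exact form of Equation 1 is not available here. The sketch below assumes one form consistent with the description: the ratio (P − S)/(P + S) is divided by the quantization step α, rounded down, and reduced modulo 2. The floor rounding and the function name `extract_bit` are assumptions.

```python
import math


def extract_bit(p, s, alpha):
    """Assumed form of Equation 1: floor((P - S) / (alpha * (P + S))) mod 2."""
    if p + s <= 0:
        raise ValueError("P + S must be positive")
    # Neighboring quantization cells alternate between 0 and 1.
    return math.floor((p - s) / (alpha * (p + s))) % 2


print(extract_bit(5, 5, 0.15))  # 0
print(extract_bit(6, 4, 0.15))  # 1
print(extract_bit(7, 3, 0.15))  # 0
```

With P + S = 10 and α = 0.15, moving the rectangle by one pixel changes the quotient by about 1.33, so consecutive one-pixel shifts land in different cells and the extracted bit can be flipped with a very small move.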
[0020]
When watermark information is embedded, the region inside the circumscribed rectangle (of a character or character component) chosen to carry 1 bit is moved left and right one pixel at a time while Equation 1 in FIG. 4 is evaluated, and the direction (left or right) and width (number of pixels) of the translation are searched until the result equals the value to be embedded (0 or 1).
[0021]
Flowcharts for this search are shown in FIG. 5 and FIG. 6, and the details are described below. To simplify the description, an example in which the document image is written horizontally is used.
[0022]
First of all, the meaning of variables is as follows.
[0023]
The variable i is the candidate translation width. The variable Flag1 indicates whether the rectangle being moved to embed information touches the rectangle on its right when it is moved right by the distance i, and takes the value 1 if it does. The variable Flag2 indicates whether the rectangle touches its neighbor when it is moved left by the distance i. Like Flag1, it takes the value 1 if it does.
[0024]
Next, each step of the flowchart is described in processing order. The flow starts at step S501.
[0025]
When this process is started, first, in step S502, initial values of variable groups are set. That is, i = flag1 = flag2 = 0.
[0026]
Next, in step S503, it is determined whether the target rectangle to be moved for embedding (the partial image of the target character or character element; the same applies below) would touch the rectangle of the character or character element on its right (the front rectangle) if it were translated right by the distance i. If it would, the process proceeds to step S504, and the variable flag1 is set to “1”.
[0027]
In step S505, it is determined whether or not the rectangle of interest touches the rectangle on the left when the rectangle of interest is translated left by the distance i. If contact is made, flag2 is set to “1” in step S506.
[0028]
In step S507, it is determined whether, for the translation distance i, the rectangle would touch its neighbors on both the left and the right. If so, the translation width is set to 0 in step S508 and the calculation of the translation width ends. In this case, embedding is regarded as impossible.
[0029]
If it is determined that the rectangle does not touch on both sides, the process proceeds to step S601 in FIG. 6. In step S601, it is determined whether Equation 1 in FIG. 4 yields the bit to be embedded when the target rectangle is translated right by the distance i. If it does, the translation width is set to i in step S602 and the calculation ends. The translation width is defined so that a positive value means translation to the right and a negative value means translation to the left.
[0030]
In step S603, it is determined whether Equation 1 in FIG. 4 yields the bit to be embedded when the target rectangle is translated left by the distance i. If it does, the translation width is set to −i in step S604 and the calculation ends.
[0031]
In step S605, the value of i is incremented, and the process returns to step S503.
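Steps S501 to S605 can be sketched as a single loop. Several points are assumptions rather than statements of the text: Equation 1 is taken to be floor((P − S)/(α(P + S))) mod 2; P is taken to be the blank on the left of the target rectangle and S the blank on its right; "contact" is taken to mean that the shift is at least as large as the blank on that side; a direction already flagged as touching is skipped; and `None` is returned where the flowchart sets the width to 0 and reports failure, so that failure is distinguishable from a genuine zero shift.

```python
import math


def extract_bit(p, s, alpha):
    # Assumed form of Equation 1 (FIG. 4 is not reproduced in the text).
    return math.floor((p - s) / (alpha * (p + s))) % 2


def find_shift(p, s, bit, alpha):
    """Return the signed shift (positive = right), or None if embedding fails."""
    i = 0
    flag1 = flag2 = False                      # S502
    while True:
        if i >= s:                             # S503/S504: touches right neighbor
            flag1 = True
        if i >= p:                             # S505/S506: touches left neighbor
            flag2 = True
        if flag1 and flag2:                    # S507/S508: no room on either side
            return None
        if not flag1 and extract_bit(p + i, s - i, alpha) == bit:
            return i                           # S601/S602: shift right by i
        if not flag2 and extract_bit(p - i, s + i, alpha) == bit:
            return -i                          # S603/S604: shift left by i
        i += 1                                 # S605


print(find_shift(5, 5, 0, 0.15))  # 0  (already encodes 0)
print(find_shift(5, 5, 1, 0.15))  # 1  (one pixel to the right)
print(find_shift(1, 1, 1, 0.15))  # None
```

Because the search starts at i = 0 and grows outward, it returns the smallest displacement that produces the requested bit, which is what keeps the layout change small.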
[0032]
With the search result taken as the direction and distance of translation, the region inside the circumscribed rectangle of the character chosen to carry the bit is actually translated. By performing this processing over the entire document image, the digital watermark information can be embedded in the document image.
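A row-level round trip can be sketched as follows. It assumes rectangles are reduced to (left, right) column spans, Equation 1 has the form floor((P − S)/(α(P + S))) mod 2, and the search is a compact equivalent of the flowchart (right before left, growing distance, both blanks kept positive). The names `bit_of`, `embed_row`, and `extract_row` are hypothetical.

```python
import math


def bit_of(p, s, alpha):
    # Assumed form of Equation 1.
    return math.floor((p - s) / (alpha * (p + s))) % 2


def embed_row(spans, bits, alpha):
    """Return new spans with one bit placed in each even-numbered rectangle."""
    out = list(spans)
    for k, bit in zip(range(1, len(spans) - 1, 2), bits):
        left, right = spans[k]
        p = left - spans[k - 1][1]             # blank before the rectangle
        s = spans[k + 1][0] - right            # blank after the rectangle
        # Candidate shifts in order of growing distance, right before left.
        shifts = [dx for i in range(max(p, s)) for dx in (i, -i)]
        for dx in shifts:
            if p + dx > 0 and s - dx > 0 and bit_of(p + dx, s - dx, alpha) == bit:
                out[k] = (left + dx, right + dx)
                break                          # unchanged if no shift works
    return out


def extract_row(spans, alpha):
    return [bit_of(spans[k][0] - spans[k - 1][1],
                   spans[k + 1][0] - spans[k][1], alpha)
            for k in range(1, len(spans) - 1, 2)]


row = [(0, 10), (15, 25), (30, 40), (45, 55), (60, 70)]
marked = embed_row(row, [1, 0], alpha=0.15)
print(marked)                     # [(0, 10), (16, 26), (30, 40), (45, 55), (60, 70)]
print(extract_row(marked, 0.15))  # [1, 0]
```

Only the second rectangle moves in this example, by one pixel, and the odd-numbered rectangles stay where they were, so the line width is unchanged.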
[0033]
Once the shift amounts have been determined for all the characters or character elements to be embedded (each determined shift amount is temporarily stored in memory in association with information identifying its rectangle), the image inside each corresponding rectangle of the original document image is shifted by its determined amount, and the embedded document image is generated.
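Applying one stored shift to the image can be sketched as follows. The sketch assumes a binary image held as a list of pixel rows with 0 as background, and assumes the destination stays inside the image and overlaps no other rectangle, which the contact checks in the search are meant to ensure. The function name `shift_region` is hypothetical.

```python
def shift_region(image, left, top, right, bottom, dx):
    """Move the pixels inside a rectangle horizontally by dx, in place.

    right and bottom are exclusive; positive dx moves the region right.
    """
    # Copy the region first so that source and destination may overlap.
    patch = [row[left:right] for row in image[top:bottom]]
    for y in range(top, bottom):
        for x in range(left, right):
            image[y][x] = 0                    # clear the old position
    for dy, patch_row in enumerate(patch):
        for k, value in enumerate(patch_row):
            image[top + dy][left + dx + k] = value
    return image


page = [[0, 0, 1, 1, 0, 0, 0]]
print(shift_region(page, 2, 0, 4, 1, 2))  # [[0, 0, 0, 0, 1, 1, 0]]
```

Running this once per stored (rectangle, shift) pair on a copy of the original image would yield the embedded document image.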
[0034]
The embedding method and extraction method described above can be realized using the signal processing apparatus shown in FIG.
[0035]
In the figure, the host computer 701 can be realized by, for example, a widely used personal computer. A scanner 714 reads an original of a document to be embedded with watermark information. The computer 701 can edit and store the read image. Further, the image obtained here can be printed from the printer 715. In addition, various manual instructions from the user are performed by input from the mouse 712 and the keyboard 713.
[0036]
Inside the host computer 701, blocks described later are connected by a bus 716, and various data can be transferred.
[0037]
In the figure, reference numeral 703 denotes a CPU that controls the operation of each internal block or executes programs stored internally. Reference numeral 704 denotes a ROM that stores specific images that are not permitted to be printed and stores necessary image processing programs and the like in advance. Reference numeral 705 denotes a RAM that temporarily stores programs and the image data to be processed so that the CPU can process them. Reference numeral 706 denotes a hard disk (HD) that stores in advance the programs, such as the OS and applications, and image data to be transferred to the RAM or the like, and that can store processed image data. Reference numeral 707 denotes a scanner interface (I/F) that connects to a scanner, which reads an original, film, or the like with a CCD and generates image data, and that receives the image data obtained by the scanner. Reference numeral 708 denotes a CD drive that can read data from and write data to a CD (CD-R), which is one type of external storage medium. Reference numeral 709 denotes an FD drive that, like 708, can read from and write to an FD. Reference numeral 710 denotes a DVD drive that, like 708, can read from and write to a DVD. When an image editing program or a printer driver is stored on a CD, FD, DVD, or the like, the program is installed on the HD 706 and transferred to the RAM 705 as needed. Reference numeral 711 denotes an interface (I/F) that is connected to the mouse 712 and the keyboard 713 to accept input instructions from them.
[0038]
In such a configuration, a document to be embedded with watermark information is read by the scanner 714 using an application (in some cases, a document image stored in the HD 706 or a removable storage medium may be used). Next, watermark information (for example, information such as the author's name) is input from the keyboard 713 or the like, the information is converted into a bit string, a program for executing the processing described above is executed, and information is embedded. The embedded and generated document image may be stored on a network so that it can be viewed by a third party, or may be printed by the printer 715.
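The text does not say how the entered information is turned into a bit string. One straightforward possibility, shown purely as an assumption, is to encode the text as UTF-8 and emit each byte most significant bit first. Both function names are hypothetical.

```python
def text_to_bits(text, encoding="utf-8"):
    """Encode text to bytes and list the bits, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in text.encode(encoding)
            for shift in range(7, -1, -1)]


def bits_to_text(bits, encoding="utf-8"):
    """Inverse of text_to_bits; trailing bits that do not fill a byte are ignored."""
    usable = len(bits) - len(bits) % 8
    data = bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
        for i in range(0, usable, 8)
    )
    return data.decode(encoding, errors="replace")


bits = text_to_bits("A")
print(bits)                                 # [0, 1, 0, 0, 0, 0, 0, 1]
print(bits_to_text(text_to_bits("Canon")))  # Canon
```

At 14 bits per 30-rectangle line, even a short name spans several lines, so a practical scheme would also need a way to mark where the payload starts and ends, which the text does not describe.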
[0039]
In the above embodiment, the equation of FIG. 4 is used to determine the embedded bit. A bit of “0” or “1” could instead be embedded simply according to whether P minus S is 0 or more, or less than 0, but this takes no account of the character size of the document image. That is, it cannot adapt to documents with small characters or documents with large characters. The equation of FIG. 4, in contrast, accommodates the fact that the distances between characters (rectangles) become relatively larger when the character size is large, so a more natural output result can be obtained. The value α that multiplies the denominator in FIG. 4 is small, and it too may be varied adaptively (for example, according to the average distance between rectangles).
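The effect of dividing by P + S can be seen in a small numeric sketch: if the same layout is rendered at three times the size, the raw difference P − S triples, whereas the ratio stays the same. The function name `normalized_offset` and the sample values are illustrative only.

```python
def normalized_offset(p, s):
    """(P - S) / (P + S): a size-independent measure of where the rectangle sits."""
    return (p - s) / (p + s)


small_p, small_s = 6, 4        # blanks in a small-character document
large_p, large_s = 18, 12      # the same layout at three times the size

print(small_p - small_s, large_p - large_s)   # 2 6
print(normalized_offset(small_p, small_s))    # 0.2
print(normalized_offset(large_p, large_s))    # 0.2
```

Since the quantity being quantized does not depend on scale, the displacement needed to reach the next quantization cell grows in proportion to the blanks themselves, which is one reading of why the result looks natural at any character size.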
[0040]
In the above embodiment, the variable i indicating the shift amount advances in steps of “1”, that is, the shift is made in units of one pixel of the read image. One pixel in this case depends on the reading resolution of the image scanner, and a third party will not necessarily read the document at the same resolution. The minimum shift unit may therefore be determined according to the reading resolution of the original. The simplest approach takes as the reference the size of one pixel when reading at 200 dpi, since almost all existing image scanners can be expected to offer a resolution of at least 200 dpi. Accordingly, when the document is read by a scanner with a resolution of 600 dpi, the variable i is increased in steps of “3”.
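The step size follows from the ratio of the scan resolution to the 200 dpi reference. The sketch below reproduces the 600 dpi example; rounding for resolutions that are not multiples of 200 and the lower bound of 1 are assumptions, since the text gives only the one example. The function name `shift_step` is hypothetical.

```python
def shift_step(scan_dpi, base_dpi=200):
    """Pixels at scan_dpi that correspond to one pixel at base_dpi."""
    # Never go below one pixel, the smallest shift the image can represent.
    return max(1, round(scan_dpi / base_dpi))


print(shift_step(600))  # 3
print(shift_step(200))  # 1
```

In a search loop like the one in FIG. 5 and FIG. 6, this value would replace the increment of 1 applied to i.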
[0041]
In the above embodiment, processing focused on characters or the elements that make up characters, but the equation of FIG. 4 may also be applied to the blanks between words in text in a language such as English. Since an English word generally consists of several characters, a line contains fewer inter-word blanks than a line in a kanji-based language such as Japanese has inter-character blanks, so less information can be embedded. The advantage is that the blanks between words can be kept looking natural. Blanks between words can be distinguished from blanks between characters by, for example, treating any blank of at least a predetermined length as a blank between words. Naturally, the shift is applied to the whole word. Extraction of the embedded information is the same as in the first embodiment with words in place of characters, so its description is omitted.
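The threshold rule can be sketched by merging character spans into word spans. The sketch assumes (left, right) column spans with an exclusive right edge; the threshold value and the function name `group_into_words` are hypothetical.

```python
def group_into_words(spans, word_gap):
    """Merge character spans separated by less than word_gap into word spans."""
    if not spans:
        return []
    words = [list(spans[0])]
    for left, right in spans[1:]:
        if left - words[-1][1] >= word_gap:
            words.append([left, right])        # wide blank: a new word starts
        else:
            words[-1][1] = right               # narrow blank: same word
    return [tuple(word) for word in words]


chars = [(0, 5), (6, 11), (20, 25), (26, 30)]
print(group_into_words(chars, word_gap=5))  # [(0, 11), (20, 30)]
```

The resulting word spans can then be treated exactly like character rectangles: P and S become the blanks on either side of a word, and the whole word is shifted.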
[0042]
<Second Embodiment>
The first embodiment was described using an existing document image as an example, but the present invention is not limited to this. For example, when the OS running on a personal computer is Windows from Microsoft Corporation of the United States, the above function may be built into a printer driver installed on it.
[0043]
The processing performed when the function is implemented in a printer driver is described below with reference to the flowchart of FIG. 8.
[0044]
When a print instruction is received from an application such as a word processor, the printer driver is activated.
[0045]
First, in step S81, a message asking whether to perform embedding on the document to be printed is displayed, and the operator is asked to choose. This choice may instead be presented when the corresponding option is selected in the dialog box displayed during print processing.
[0046]
In either case, step S82 determines whether embedding has been instructed. If not, the process proceeds to step S83 to operate in the same way as an ordinary printer driver (for example, converting the data into a PDL (page description language) and outputting it).
[0047]
If it is determined in step S82 that embedding has been instructed, the process proceeds to step S84, where the operator is asked to input the information to be embedded. The embedding information need not be input at this point. For example, it may be input when embedding is selected in step S81, or the copyright information may be written to a file or registered in the registry in advance and its contents used.
[0048]
In step S85, the data to be printed, received from the application via the OS, is analyzed. If the data relates to characters, a character pattern for printing is generated together with its circumscribed rectangle (one per character). The calculation may be performed each time two rectangles have been generated, embedding information in the target rectangle based on its distances from the preceding and following rectangles; however, because the target rectangle must have a following rectangle, it is in practice performed each time three rectangles have been generated. In any case, analyzing the data passed from the application makes it possible to determine reliably whether each item is character data or non-character data such as images and figures, so image area separation becomes simple.
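The "every three rectangles" processing in step S85 can be pictured as a small streaming loop. The sketch below is only an illustration under stated assumptions: `embed_streaming` and `toy_shift` are hypothetical names, rectangles are reduced to `(left, right)` pixel extents, every other rectangle is used as the target, and `toy_shift` is a deliberately simple stand-in rule, not the formula of FIG. 4.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Rect = Tuple[int, int]  # hypothetical: (left, right) x-extent of one rectangle


def toy_shift(p: int, s: int, bit: int) -> int:
    """Toy stand-in for the shift calculation (NOT the FIG. 4 formula).

    Bit 1 makes the preceding gap the larger one, bit 0 the following gap.
    Assumes p + s >= 3 pixels.
    """
    total = p + s
    new_p = (2 * total) // 3 if bit else total // 3
    return new_p - p


def embed_streaming(
    rects: Iterable[Rect],
    bits: Iterable[int],
    shift_for_bit: Callable[[int, int, int], int] = toy_shift,
) -> Iterator[Rect]:
    """Consume rectangles as they are generated; embed once three are held."""
    bit_iter = iter(bits)
    pending: List[Rect] = []
    for rect in rects:
        pending.append(rect)
        if len(pending) < 3:
            continue  # the target still lacks a following rectangle
        prev, target, nxt = pending
        bit = next(bit_iter, None)
        if bit is not None:
            p = target[0] - prev[1]   # gap to the preceding rectangle
            s = nxt[0] - target[1]    # gap to the following rectangle
            dx = shift_for_bit(p, s, bit)
            target = (target[0] + dx, target[1] + dx)
        yield prev
        yield target
        pending = [nxt]  # the following rectangle anchors the next triple
    yield from pending  # flush rectangles that never formed a full triple
```

Because the rectangle that follows one target becomes the fixed rectangle preceding the next, only every other rectangle is moved and the neighbors used for measuring distances stay in place.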
[0049]
When an embedded image of the minimum unit sufficient for printing (for example, an image for one page) has been generated in this way, it is output to the OS in step S86, and the OS performs the processing for outputting the data to the printer.
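The flow of steps S81 to S86 can be summarized in a short control-flow sketch. Everything named here (`PrintItem`, `run_driver`, and the callback parameters) is hypothetical and stands in for driver and OS facilities that the description does not specify; the sketch only mirrors the branching of the flowchart.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PrintItem:
    kind: str  # "text" for character data; anything else is non-character data
    data: str


def run_driver(
    items: List[PrintItem],
    want_embed: Callable[[], bool],
    get_payload: Callable[[], bytes],
    embed_text: Callable[[PrintItem, bytes], str],
    render: Callable[[PrintItem], str],
) -> List[str]:
    # S81/S82: ask the operator whether to embed, then branch on the answer.
    if not want_embed():
        # S83: behave like an ordinary printer driver.
        return [render(item) for item in items]
    # S84: obtain the information to embed (here, from the operator callback).
    payload = get_payload()
    output: List[str] = []
    # S85: analyze the data; only character data goes through embedding.
    for item in items:
        if item.kind == "text":
            output.append(embed_text(item, payload))
        else:
            output.append(render(item))
    # S86: the result is handed back to the OS for output to the printer.
    return output
```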
[0050]
As described above, in the second embodiment, watermark information is embedded at the point where document data including characters is output to a printer in the form of image data, so embedding can be performed as part of ordinary work in the application. That is, it can be carried out without special knowledge of embedding.
[0051]
<Other embodiments>
As described above, the present invention can be realized by a computer (CPU or MPU) of a general-purpose information processing apparatus (personal computer or the like). Accordingly, the case where the above embodiment is realized by supplying software program code implementing it to a computer, and having that computer operate the various devices according to the program code, is also included in the scope of the present invention.
In this case, the program code of the software itself realizes the functions of the above embodiments, and the program code itself and a storage medium for supplying the program code to the computer constitute the present invention. Specifically, as the storage medium storing the program code, a magnetic disk (floppy disk or hard disk), an optical disk (CD-ROM, CD-R, etc.), a memory card, a tape (paper tape, magnetic tape, etc.), a ROM element, or the like can be used.
In addition, such program code is included in the scope of the present invention not only when the computer realizes the functions of the above embodiments by controlling the various devices according to the supplied program code alone, but also when the embodiments are realized by the program code in cooperation with the OS (operating system) running on the computer or with other application software.
[0052]
Further, the scope of the present invention also includes the case where the supplied program code is stored in a memory provided on a function expansion board of the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the above embodiment is realized by that processing.
[0053]
As described above, according to the present embodiment, digital watermark information can be embedded in a manner suited to document images in which a blank exists between characters or between the partial structures that make up a character, such as the "hen" (left-hand radical) and "tsukuri" (right-hand component) of a kanji. Also, whatever the blank lengths between adjacent words, characters, or partial structures, the change in blank length needed to embed the digital watermark information is kept to a minimum, which suppresses visible degradation of image quality.
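One way to picture how a ratio of P−S to P+S can carry a bit while keeping the change in blank length small is sketched below. The quantization used here (a step size `step`, with the parity of the quantization index taken as the bit) is an assumption made for illustration; the actual arithmetic is defined by FIG. 4, which is not reproduced in this section. The names `embed_bit` and `extract_bit` are hypothetical.

```python
def embed_bit(p: int, s: int, bit: int, step: float = 0.25) -> int:
    """Return the shift in pixels (positive = toward the following rectangle).

    Assumed scheme: quantize (P-S)/(P+S) with the given step and let the
    parity of the quantization index represent the bit. The nearest index
    with the required parity is chosen so the shift stays small.
    """
    total = p + s
    ratio = (p - s) / total
    q_max = int(1 / step) - 1  # keep |q * step| < 1 so both gaps stay positive
    candidates = [q for q in range(-q_max, q_max + 1) if q % 2 == bit]
    q = min(candidates, key=lambda k: abs(k * step - ratio))
    new_p = round((1 + q * step) * total / 2)  # gap P that yields ratio q*step
    return new_p - p


def extract_bit(p: int, s: int, step: float = 0.25) -> int:
    """Recover the bit from the measured gaps P and S."""
    return round((p - s) / ((p + s) * step)) % 2


# Usage: shifting the target by dx changes P to P + dx and S to S - dx.
p, s = 10, 10
dx = embed_bit(p, s, 1)
recovered = extract_bit(p + dx, s - dx)
```

Because the result is rounded to whole pixels, this sketch is only reliable when P+S is comfortably larger than 2/step pixels; narrower gaps would need a coarser step.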
[0055]
[Effects of the invention]
As described above, according to the present invention, information is embedded using the distance of the blank portions between characters, so a natural output can be obtained and more information can be embedded than when the spaces between words are used.
[0056]
Further, according to another invention, even if a space between words is used, a more natural output result can be obtained.
[Brief description of the drawings]
FIG. 1 is a functional block diagram for embedding a digital watermark in an embodiment.
FIG. 2 is a functional block diagram for extracting a digital watermark in the embodiment.
FIG. 3 is a diagram for explaining an embedding process in the embodiment.
FIG. 4 is a diagram illustrating an arithmetic expression for quantization in embedding / extraction processing in the embodiment.
FIG. 5 is a flowchart of an embedding process in the embodiment.
FIG. 6 is a flowchart of an embedding process in the embodiment.
FIG. 7 is a block configuration diagram when applied to the information processing apparatus in the embodiment.
FIG. 8 is a flowchart illustrating an operation processing procedure of a printer driver according to a second embodiment.

Claims

An image processing apparatus for embedding digital watermark information in a document image, comprising:
extraction means for extracting a rectangle circumscribing a character image in the document image or an element image that is an element constituting a character;
shift amount calculation means for determining, based on the logical value of a bit of the information to be embedded, a shift amount by which the position of a target rectangle is shifted toward one of the adjacent rectangles; and
shift means for shifting the character image or element image of the target rectangle based on the shift amount calculated by the shift amount calculation means,
wherein the shift amount calculation means calculates the shift amount of the position of the target rectangle so that the digital watermark information is represented, based on the ratio of P−S to P+S, where P is the distance from the rectangle located before the target rectangle and S is the distance from the rectangle located after it.

The image processing apparatus according to claim 1, wherein the document image is input from an image scanner.

The image processing apparatus according to claim 1 or 2, wherein the rectangle in which information is embedded is every other rectangle in the group of rectangles in one row.

An image processing method for embedding digital watermark information in a document image, comprising:
an extraction step of extracting a rectangle circumscribing a character image in the document image or an element image that is an element constituting a character;
a shift amount calculation step of determining, based on the logical value of a bit of the information to be embedded, a shift amount by which the position of a target rectangle is shifted toward one of the adjacent rectangles; and
a shift step of shifting the character image or element image of the target rectangle based on the shift amount calculated in the shift amount calculation step,
wherein the shift amount calculation step calculates the shift amount of the position of the target rectangle so that the digital watermark information is represented, based on the ratio of P−S to P+S, where P is the distance from the rectangle located before the target rectangle and S is the distance from the rectangle located after it.

A computer program for embedding digital watermark information in a document image, comprising:
program code for an extraction step of extracting a rectangle circumscribing a character image in the document image or an element image that is an element constituting a character;
program code for a shift amount calculation step of determining, based on the logical value of a bit of the information to be embedded, a shift amount by which the position of a target rectangle is shifted toward one of the adjacent rectangles; and
program code for a shift step of shifting the character image or element image of the target rectangle based on the shift amount calculated in the shift amount calculation step,
wherein the program code for the shift amount calculation step calculates the shift amount of the position of the target rectangle so that the digital watermark information is represented, based on the ratio of P−S to P+S, where P is the distance from the rectangle located before the target rectangle and S is the distance from the rectangle located after it.

The computer program according to claim 5, wherein the computer program is a printer driver program and further comprises:
program code for displaying a selection user interface for selecting whether to embed digital watermark information when a print instruction is given; and
program code for a step of, when embedding is instructed, generating a character pattern from the character information in the data passed from the host process and passing the generated character pattern, or an element image pattern constituting the character pattern, to the extraction step.

A storage medium storing the computer program according to claim 5 or 6.

An image processing apparatus for inputting a document image and extracting watermark information embedded in the document image, comprising:
circumscribed rectangle discrimination means for discriminating a rectangle circumscribing a character image in the document image or an element image that is an element constituting a character;
calculation means for calculating the distance between a target rectangle and each adjacent rectangle; and
embedded information extraction means for extracting the information embedded in the target rectangle based on the distances calculated by the calculation means,
wherein the embedded information extraction means extracts a bit of the embedded information based on the ratio of P−S to P+S, where P is the distance between the target rectangle and the rectangle located before it and S is the distance between the target rectangle and the rectangle located after it.

An image processing method for inputting a document image and extracting watermark information embedded in the document image, comprising:
a circumscribed rectangle discrimination step of discriminating a rectangle circumscribing a character image in the document image or an element image that is an element constituting a character;
a calculation step of calculating the distance between a target rectangle and each adjacent rectangle; and
an embedded information extraction step of extracting the information embedded in the target rectangle based on the distances calculated in the calculation step,
wherein the embedded information extraction step extracts a bit of the embedded information based on the ratio of P−S to P+S, where P is the distance between the target rectangle and the rectangle located before it and S is the distance between the target rectangle and the rectangle located after it.

An image processing apparatus for embedding digital watermark information in a document image consisting of a plurality of words, comprising:
digital watermark embedding means for embedding digital watermark information in the document image by changing the distances between each word and its adjacent words in the document image,
wherein the digital watermark embedding means represents the digital watermark information based on a quantized value of the ratio of P−S to P+S, where P and S are the distances between a predetermined word and the adjacent words before and after it, respectively.

An image processing method for embedding digital watermark information in a document image consisting of a plurality of words, comprising:
a digital watermark embedding step of embedding digital watermark information in the document image by changing the distances between each word and its adjacent words in the document image,
wherein, in the digital watermark embedding step, the digital watermark information is represented based on a quantized value of the ratio of P−S to P+S, where P and S are the distances between a predetermined word and the adjacent words before and after it, respectively.

A storage medium storing, in a computer-readable state, an image processing program for embedding digital watermark information in a document image consisting of a plurality of words, the program comprising:
a digital watermark embedding step of embedding digital watermark information in the document image by changing the distances between each word and its adjacent words in the document image,
wherein, in the digital watermark embedding step, the digital watermark information is represented based on a quantized value of the ratio of P−S to P+S, where P and S are the distances between a predetermined word and the adjacent words before and after it, respectively.

An image processing apparatus for extracting digital watermark information from a document image consisting of a plurality of words, comprising:
digital watermark detection means for detecting digital watermark information in the document image based on the distances between each word and its adjacent words in the document image,
wherein the digital watermark information detected by the digital watermark detection means is represented based on a quantized value of the ratio of P−S to P+S, where P and S are the distances between a predetermined word and the adjacent words before and after it, respectively.

An image processing method for extracting digital watermark information from a document image consisting of a plurality of words, comprising:
a digital watermark detection step of detecting digital watermark information in the document image based on the distances between each word and its adjacent words in the document image,
wherein the digital watermark information detected in the digital watermark detection step is represented based on a quantized value of the ratio of P−S to P+S, where P and S are the distances between a predetermined word and the adjacent words before and after it, respectively.

A storage medium storing, in a computer-readable state, an image processing program for extracting digital watermark information from a document image consisting of a plurality of words, the program comprising:
a digital watermark detection step of detecting digital watermark information in the document image based on the distances between each word and its adjacent words in the document image,
wherein the digital watermark information detected in the digital watermark detection step is represented based on a quantized value of the ratio of P−S to P+S, where P and S are the distances between a predetermined word and the adjacent words before and after it, respectively.