JP3964684B2 - Digital watermark embedding device, digital watermark detection device, digital watermark embedding method, and digital watermark detection method

Info

Publication number: JP3964684B2
Application number: JP2002003153A
Authority: JP
Inventors: 昌彦須崎
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2002-01-10
Filing date: 2002-01-10
Publication date: 2007-08-22
Anticipated expiration: 2022-01-10
Also published as: JP2003209676A

Description

【０００１】
【発明の属する技術分野】
本発明は，印刷済みの透かし入り文書に対し文字列の追加や消去などの変更が行われた場合に，その文書をスキャナ等でコンピュータに入力して処理を行うことにより，原本からの変更の有無やその位置を特定するための電子透かし埋め込み／検出技術に関するものである。
【０００２】
【従来の技術】
画像や文書データなどにコピー・偽造防止のための情報や機密情報を人の目には見えない形で埋め込む「電子透かし」は，保存やデータの受け渡しがすべて電子媒体上で行われることを前提としており，透かしによって埋め込まれている情報の劣化や消失がないため確実に情報検出を行うことができる。これと同様に，紙媒体に印刷された文書に対しても，文書が不正に改ざんされたりコピーされることを防ぐために，文字以外の視覚的に目障りではない形式でかつ容易に改ざんが不可能であるような機密情報を印刷文書に埋め込む方法が必要となっている。
【０００３】
印刷物として最も広く利用される白黒の二値の文書に対する電子透かし埋め込み方法としては，以下のような技術が知られている。
【０００４】
［１］特開平９−１７９４９４「機密情報記録方法」
４００ｄｐｉ以上のプリンタで印刷されることを想定する。情報を数値化し，基準点マークと位置判別マークとの距離（ドット数）により情報の表現を行う。
【０００５】
［２］特開２００１−７８００６「白黒２値文書画像への透かし情報埋め込み・検出方法及びその装置」
任意の文字列を囲む最小矩形をいくつかのブロックに分割し，それらを２つのグループ（グループ１，グループ２）に分ける（グループの数は３つ以上でも良い）。例えば信号が１の場合はグループ１のブロック中の特徴量を増やしグループ２の各ブロック中の特徴量を減らす。信号が０の場合は逆の操作を行う。ブロック中の特徴量は，文字領域の画素数や文字の太さ，ブロックを垂直にスキャンして最初に文字領域にぶつかる点までの距離などである。
【０００６】
［３］特開２００１−５３９５４「情報埋込み装置，情報読み出し装置，電子透かしシステム，情報埋め込み方法，情報読み出し方法及び記録媒体」
１つの文字を囲む最小矩形の幅と高さをその文字に対する特徴量として定め，２つ以上の文字間での特徴量の大小関係の分類パターンによりシンボルを表すものとする。例えば３つの文字からは６つの特徴量が定義でき，これらの大小関係のパターンの組合わせを列挙し，これらの組合わせを２つのグループに分類し，それぞれにシンボルを与える。埋め込む情報がシンボル０であって，これを表すために選択された文字の特徴量の組合わせパターンがシンボル１であった場合，６つの特徴量のうちいずれかを文字領域を膨らませるなどして変化させる。変化させるパターンは変化量が最小となるように選択する。
【０００７】
［４］特願平１０−２００７４３「文書処理装置」
万線スクリーン（細かい平行線で構成された特殊スクリーン）のスクリーン線を後方に移動させるかどうかにより情報を表現する。
【０００８】
【発明が解決しようとする課題】
ところで，印刷済みの文書に対する文字の上書きまたは消去などの変更が行われた場合，変更箇所を知るためには原本との比較が必要である。手書きなどで変更が行われた場合には，印刷文字と異なるためにすぐ見分けがつくが，同じフォントで上書き印刷を行った場合などには，変更箇所を見分けることが困難になる。従来例［１］〜［４］の中には，印刷文書内に埋め込んだ情報が正しく読み取れなくなることによって改ざんの有無を判定する方法もあるが，紙面の汚れや印刷時や読み取り時に雑音が付加された場合などでも情報が読み取れなくなるため，検出精度は高くない。
【０００９】
本発明は，従来の電子透かし埋め込み／検出技術が有する上記問題点に鑑みてなされたものであり，本発明の目的は，機密情報の検出精度を向上させることの可能な，新規かつ改良された電子透かし埋め込み装置，電子透かし検出装置，電子透かし埋め込み方法，及び，電子透かし検出方法を提供することである。
【００１０】
【課題を解決するための手段】
上記課題を解決するため，本発明の第１の観点によれば，文書画像に対して電子透かしにより機密情報を埋め込む，電子透かし埋め込み装置が提供される。本発明の電子透かし埋め込み装置（１００１）は，前記文書画像を参照して，前記機密情報（１００４）を基に透かし画像を作成する透かし画像形成部（１００６）を備え，前記透かし画像形成部（１００６）は，前記文書画像を，所定のフィルタによって所定のシンボルを識別可能なドットパターンを埋め込むための埋め込み領域を算出し，前記埋め込み領域に対し，文字領域の割合が所定の閾値以下であるか否かを判断し，文字領域の割合が所定の閾値以下である場合に，前記埋め込み領域の文字領域と重ならない領域に，前記機密情報の少なくとも一部を成すシンボルを識別可能なドットパターン（シンボルユニット）を所定の数埋め込むことを特徴とする。
【００１１】
また，前記透かし画像形成部（１００６）は，文字領域の割合が所定の閾値を超える前記埋め込み領域の，文字領域と重ならない領域に，前記シンボルユニットを複数種類，所定の数ずつ埋め込むことを特徴とする。
【００１２】
また，前記透かし画像形成部（１００６）は，前記埋め込み領域の文字領域と重なる領域に，前記機密情報とは無関係のシンボルであるドットパターン（背景ユニット）を埋め込むことを特徴とする。
【００１３】
また，上記課題を解決するため，本発明の第２の観点によれば，文書画像に対して電子透かしにより埋め込まれた機密情報を検出する，電子透かし検出装置が提供される。本発明の電子透かし検出装置（１００２）において，前記文書画像は，所定のフィルタによって所定のシンボルを識別可能なドットパターンを埋め込むための複数の埋め込み領域に分割され，前記各埋め込み領域に，前記機密情報の少なくとも一部を成すシンボルを識別可能なドットパターン（シンボルユニット），あるいは，前記機密情報とは無関係のシンボルであるドットパターン（背景ユニット）が埋め込まれることによって，前記機密情報が埋め込まれている。
【００１４】
そして，前記機密情報を検出する透かし検出部（１０１１）を備え，前記透かし検出部は，前記ドットパターンから所定のシンボルを識別可能な複数種類のフィルタを備え，前記各埋め込み領域ごとに，前記複数種類のフィルタによりマッチングを行い，一の前記フィルタのマッチング数が他のすべての前記フィルタのマッチング数に比べて非常に大きい前記埋め込み領域から，前記一のフィルタに対応する，前記機密情報の少なくとも一部を検出することを特徴とする。
【００１５】
また，前記透かし検出部は，前記各フィルタのマッチング数の間に差が無い前記埋め込み領域からは，前記機密情報を検出しないことを特徴とする。
【００１６】
また，上記課題を解決するため，本発明の第３の観点によれば，文書画像に対して電子透かしにより機密情報を埋め込む，電子透かし埋め込み方法が提供される。本発明の電子透かし埋め込み方法は，以下の第１〜第５工程を含むことを特徴とする。
▲１▼前記文書画像を，所定のフィルタによって所定のシンボルを識別可能なドットパターンを埋め込むための複数の埋め込み領域に分割する第１工程。
▲２▼前記各埋め込み領域ごとに，文字領域の割合が所定の閾値以下であるか否かを判断する第２工程。
▲３▼文字領域の割合が所定の閾値以下である場合に，前記埋め込み領域の文字領域と重ならない領域に，前記機密情報の少なくとも一部を成すシンボルを識別可能なドットパターン（シンボルユニット）を所定の数埋め込む第３工程。
▲４▼文字領域の割合が所定の閾値を超える前記埋め込み領域の，文字領域と重ならない領域に，前記シンボルユニットを複数種類，所定の数ずつ埋め込む第４工程。
▲５▼前記埋め込み領域の文字領域と重なる領域に，前記機密情報とは無関係のシンボルであるドットパターン（背景ユニット）を埋め込む第５工程。
【００１７】
また，上記課題を解決するため，本発明の第４の観点によれば，文書画像に対して電子透かしにより埋め込まれた機密情報を検出する，電子透かし検出方法が提供される。本発明の電子透かし検出方法において，前記文書画像は，所定のフィルタによって所定のシンボルを識別可能なドットパターンを埋め込むための複数の埋め込み領域に分割され，前記各埋め込み領域に，前記機密情報の少なくとも一部を成すシンボルを識別可能なドットパターン（シンボルユニット），あるいは，前記機密情報とは無関係のシンボルであるドットパターン（背景ユニット）が埋め込まれることによって，前記機密情報が埋め込まれている。
【００１８】
そして，前記各埋め込み領域ごとに，複数種類のフィルタによりマッチングを行い，一の前記フィルタのマッチング数が他のすべての前記フィルタのマッチング数に比べて非常に大きい前記埋め込み領域からは，前記一のフィルタに対応する，前記機密情報の少なくとも一部を検出し，前記各フィルタのマッチング数の間に差が無い前記埋め込み領域からは，前記機密情報を検出しないことを特徴とする。
【００１９】
上記電子透かし埋め込み装置，電子透かし検出装置，電子透かし埋め込み方法，及び，電子透かし検出方法によれば，以下のような効果が得られる。
（ａ）電子透かしを挿入する文書画像の文字の配置状態を参照し，文字領域以外の領域にのみ機密情報の一部を成す有効なシンボルユニットを埋め込むため，元の文書がどのようなものであっても確実に機密情報を埋め込むことができる。
（ｂ）シンボルユニットを埋め込まない領域には，シンボルユニットを複数種類，所定の数ずつ埋め込むことにより，検出時に，機密情報の一部を成す有効なシンボルユニットが埋め込まれていないことを確実に判定できる。
（ｃ）埋め込み情報の検出時に，ある領域に対する複数種類のフィルタの出力値のそれぞれの総和などによりシンボルの判定を行うため，情報検出の精度が高く保たれる。
【００２０】
また，前記透かし入り文書画像合成部は，前記機密情報の埋め込み時及び／又は検出時の各種属性情報を，前記透かし入り文書画像の４隅部分に記録することが好ましい。かかる構成によれば，以下の効果が得られる。
（ｄ）埋め込んだ信号数などの属性情報をセットする属性記録領域を信号を埋め込む領域の４隅に設定することで，機密情報を検出する際に入出力デバイスのハードウェア的な誤差に影響されることなく正確に属性情報を取り出すことが可能となり，それ以降の検出精度を向上させることができる。
【００２１】
さらに，上記電子透かし検出装置において，改ざん検出を行う機能を追加することが可能である。すなわち，本発明の他の電子透かし検出装置（２００２）は，文字消去改ざん検出部（２０１２）をさらに備え，前記文字消去改ざん検出部は，前記機密情報が埋め込まれた文書画像を所定の閾値で二値化することにより，文字領域の画素値を０，背景領域の画素値を１として文字領域抽出画像を作成し，前記機密情報が埋め込まれた文書画像の，前記シンボルユニットが検出できない領域の画素値を０，シンボルユニットが検出できる領域の画素値を１としてシンボルユニット抽出画像を作成し，前記文字領域抽出画像と前記シンボルユニット抽出画像とを比較することにより（差分画像を生成して），前記透かし入り文書画像に対する改ざんを検出することを特徴とする。かかる構成によれば，さらに以下のような効果が得られる。
（ｅ）印刷された文書に対し，任意の文字列を消去するような不正があった場合，その文書の原文がなくても改ざんが検出でき，改ざん場所も特定できる。
【００２２】
さらに，上記電子透かし検出装置において，改ざん検出のための付加情報を機密情報と共に挿入することにより改ざん検出を行う機能を追加することが可能である。すなわち，本発明の他の電子透かし検出装置（３００２）は，前記文書画像に対する前記機密情報の埋め込み時に埋め込まれた前記シンボルユニットの数を検出する，埋め込み信号数検出部（３０１３）と，前記入力画像に対する前記所定のフィルタの出力値を算出し，各埋め込み領域ごとに記録する，フィルタ出力値算出部（３０１４）と，前記埋め込み信号数検出部の検出値と，前記フィルタ出力値算出部の算出値とから，前記透かし画像に埋め込まれた前記シンボルユニットの数を検出するための最適な閾値を計算する，最適閾値判定部（３０１５）と，前記透かし入り文書画像に実際に埋め込まれている前記シンボルユニットの数を検出する，検出信号計数部（３０１６）と，前記埋め込み信号数検出部の検出値と，前記検出信号計数部の計数値とを比較することにより，前記透かし入り文書画像に対する改ざんの有無を判定する，改ざん判定部（３０１７）と，をさらに備えたことを特徴とする。
【００２３】
かかる構成によれば，さらに以下のような効果が得られる。
（ｆ）文書画像をいくつかのブロックに分割し，各ブロック内に埋め込んだシンボルユニット数を記録することで，ブロック単位で改ざん検出を行うことができ，改ざん場所の特定が可能となる。
（ｇ）埋め込んだ信号数を記録することで，信号検出や改ざん検出のための最適な閾値を求めることができる。
（ｈ）印刷された文書に対し，空白部分に文字列を追加したり，任意の文字列を修正液などで消去するような不正があった場合，その文書の原文がなくても改ざんが検出でき，改ざん場所も特定できる。
【００２４】
【発明の実施の形態】
以下に添付図面を参照しながら，本発明にかかる電子透かし埋め込み装置，電子透かし検出装置，電子透かし埋め込み方法，及び，電子透かし検出方法の好適な実施の形態について詳細に説明する。なお，本明細書及び図面において，実質的に同一の機能構成を有する構成要素については，同一の符号を付することにより重複説明を省略する。
【００２５】
（第１の実施の形態）
図１は，本発明の第１の実施の形態にかかる電子透かし埋め込み装置及び電子透かし検出装置の構成を示す説明図である。まず，電子透かし埋め込み装置１００１について説明する。
【００２６】
（電子透かし埋め込み装置１００１）
電子透かし埋め込み装置１００１は，文書データ１００３と，文書に埋め込む機密情報１００４を基に文書画像を構成し，紙媒体に印刷を行う装置である。文書データ１００３は，フォント情報やレイアウト情報を含むデータであり，ワープロソフト等の文書作成ツール等により作成されたデータである。また，機密情報１００４は，紙媒体に文字以外の形式で埋め込む情報であり，文字，画像，音声などの各種データである。
【００２７】
電子透かし埋め込み装置１００１は，図１に示したように，文書画像形成部１００５と，透かし画像形成部１００６と，透かし入り文書画像合成部１００７と，出力デバイス１００８により構成されている。
【００２８】
（文書画像形成部１００５）
文書画像形成部１００５では，文書データ１００３を紙面に印刷した状態の画像が作成される。具体的には，文書画像中の白画素領域は何も印刷されない部分であり，黒画素領域は黒の塗料が塗布される部分である。なお，本実施の形態では，白い紙面に黒のインク（単色）で印刷を行うことを前提として説明するが，本発明はこれに限定されず，カラー（多色）で印刷を行う場合であっても，同様に本発明を適用可能である。
【００２９】
（透かし画像形成部１００６）
透かし画像形成部１００６は，機密情報１００４をディジタル化して数値に変換したものをＮ元符号化し（Ｎは２以上の自然数である。符号化されたビット列を「符号語」と称する。），符号語の各シンボルをあらかじめ用意した透かし信号に割り当てる。本実施の形態は，透かし画像形成部１００６による透かしの埋め込み動作に特徴を有する。すなわち，透かし画像形成部１００６は，文書データ１００３に対し機密情報１００４を埋め込む際に，文書データ１００３を参照することにより，文書データ１００３がどのようなものであっても確実に機密情報１００４を埋め込むことができる。透かし信号とその埋め込み動作については，さらに後述する。
【００３０】
（透かし入り文書画像合成部１００７，出力デバイス１００８）
透かし入り文書画像合成部１００７は，文書画像と透かし画像を重ね合わせて透かし入りの文書画像を作成する。また，出力デバイス１００８は，プリンタなどの出力装置であり，透かし入り文書画像を紙媒体に印刷する。したがって，文書画像形成部１００５，透かし画像形成部１００６，透かし入り文書画像合成部１００７はプリンタドライバの中の一つの機能として実現されていても良い。
【００３１】
印刷文書１００９は，元の文書データ１００３に対して機密情報１００４を埋め込んで印刷されたものであり，物理的に保管・管理される。
【００３２】
電子透かし埋め込み装置１００１は以上のように構成されている。
次いで，透かし画像形成部１００６が文書データ１００３に対し機密情報１００４を埋め込む際に用いられる透かし信号について説明する。
【００３３】
（信号ユニット）
透かし信号は，ドット（黒画素）の配列によって任意の波長と方向を持つ波を表現したものである。以下，幅と高さがＳｗ，Ｓｈの矩形を１つの信号の単位として「信号ユニット」と称する。図２は信号ユニットの一例を示す説明図である。
【００３４】
幅Ｓｗと高さＳｈは異なっていても良いが，本実施の形態では説明を容易にするためＳｗ＝Ｓｈとする。長さの単位は画素数であり，図２の例ではＳｗ＝Ｓｈ＝１２である。これらの信号が紙面に印刷されたときの大きさは，透かし画像の解像度に依存しており，例えば透かし画像が６００ｄｐｉ（ｄｏｔｓ ｐｅｒ ｉｎｃｈ：解像度の単位であり，１インチ当たりのドット数）の画像であるとしたならば，図２の信号ユニットの幅と高さは，印刷文書上で１２／６００＝０．０２（インチ）となる。
【００３５】
図２（１）は，ドット間の距離が水平軸に対してａｒｃｔａｎ（３）（ａｒｃｔａｎはｔａｎの逆関数）の方向に密であり，波の伝搬方向はａｒｃｔａｎ（−１／３）である。以下，この信号ユニットをユニットＡと称する。図２（２）はドット間の距離が水平軸に対してａｒｃｔａｎ（−３）の方向に密であり，波の伝搬方向はａｒｃｔａｎ（１／３）である。以下，この信号ユニットをユニットＢと称する。
【００３６】
図３は，図２（１）の画素値の変化をａｒｃｔａｎ（１／３）の方向から見た断面図である。図３において，ドットが配列されている部分が波の最小値の腹（振幅が最大となる点）となり，ドットが配列されていない部分は波の最大値の腹となっている。
【００３７】
また，ドットが密に配列されている領域はそれぞれ１ユニットの中に２つ存在するため，この例では１ユニットあたりの周波数は２となる。波の伝播方向はドットが密に配列されている方向に垂直となるため，ユニットＡの波は水平方向に対してａｒｃｔａｎ（−１／３），ユニットＢの波はａｒｃｔａｎ（１／３）となる。なお，ａｒｃｔａｎ（ａ）の方向とａｒｃｔａｎ（ｂ）の方向が垂直のとき，ａ×ｂ＝−１である。
【００３８】
信号ユニットには図２（１），（２）で示されるもの以外にも，例えば図４（３）〜（５）で示されるようなドット配列が考えられる。図４（３）は，ドット間の距離が水平軸に対してａｒｃｔａｎ（１／３）の方向に密であり，波の伝搬方向はａｒｃｔａｎ（−３）である。以下，この信号ユニットをユニットＣと称する。
図４（４）は，ドット間の距離が水平軸に対してａｒｃｔａｎ（−１／３）の方向に密であり，波の伝搬方向はａｒｃｔａｎ（３）である。以下，この信号ユニットをユニットＤと称する。図４（５）は，ドット間の距離が水平軸に対してａｒｃｔａｎ（１）の方向に密であり，波の伝搬方向はａｒｃｔａｎ（−１）である。なお，図４（５）は，ドット間の距離が水平軸に対してａｒｃｔａｎ（−１）の方向に密であり，波の伝搬方向はａｒｃｔａｎ（１）であると考えることもできる。以下，この信号ユニットをユニットＥと称する。
【００３９】
（シンボルユニット）
信号ユニットに符号語のシンボルを割り当て，信号ユニットを透かし画像に埋め込むことにより，機密情報１００４を透かし画像に埋め込むことができる。以下，符号語のシンボルを割り当てた信号ユニットを「シンボルユニット」と称する。
【００４０】
機密情報１００４を符号語に変換する際の次元数により，必要なシンボルユニットの数が定まる。機密情報を２元符号化（Ｎ＝２）する場合には，シンボルユニットを２種類（例えば，ユニットＡ，ユニットＢ）用意し，例えば，ユニットＡにシンボル０を割り当て，ユニットＢにシンボル１を割り当てることができる。また，機密情報を４元符号化（Ｎ＝４）する場合には，シンボルユニットを４種類（例えば，ユニットＡ，ユニットＢ，ユニットＣ，ユニットＤ）用意し，例えば，ユニットＡにシンボル０を，ユニットＢにシンボル１を，ユニットＣにシンボル２を，ユニットＤにシンボル３を割り当てることができる。
【００４１】
（背景ユニット）
さらに，例えば，ユニットＥに符号語のシンボルとは無関係のシンボル（例えば，機密情報をＮ元符号化する場合，シンボルＮ）を割り当て，これを背景ユニットと定義し，これを隙間なく並べて透かし画像の背景とすることができる。以下，符号語のシンボルとは無関係のシンボルを割り当てた信号ユニットを「背景ユニット」と称する。背景ユニットを隙間なく並べて，そこにシンボルユニットを埋め込む場合には，埋め込もうとする位置の背景ユニットと，埋め込むシンボルユニットとを入れ替える。
【００４２】
図５（１）はユニットＥを背景ユニットと定義し，これを隙間なく並べて透かし画像の背景とした場合を示す説明図である。図５（２）は図５（１）の背景画像の中にシンボルユニットとしてのユニットＡを埋め込んだ一例を示し，図５（３）は図５（１）の背景画像の中にシンボルユニットとしてのユニットＢを埋め込んだ一例を示している。
【００４３】
図５に示した一例においては，各信号ユニット中のドットの数をすべて等しくしているため，これら信号ユニットを隙間なく並べることにより，透かし画像の見かけの濃淡が均一となる。したがって，印刷された紙面上では，単一の濃度を持つグレー画像が背景として埋め込まれているように見える。また，信号ユニットに対するシンボルの割り当ての組み合わせは無数に考えられる。このようにして，第三者（不正者）に透かし信号を簡単に解読できないようにすることができる。
【００４４】
（ユニットパターン）
機密情報１００４を符号化した符号語の各シンボルについて，単に対応するシンボルユニットを配置していくことによっても，透かし画像に機密情報１００４を埋め込むことは可能である。本実施の形態では，第三者による不正な解読を一層防止するために，符号語の各シンボルに対して，信号ユニットの配置パターン（以下，ユニットパターンと称する。）を定義し，ユニットパターンを配置することによって透かし画像に機密情報１００４を埋め込む方法について説明する。ユニットパターンの概念について，図６を参照しながら説明する。
【００４５】
図６は，本実施の形態で採用するユニットパターンとそのユニットパターンが表すシンボルの一例を示したものである。ここで，１つのユニットパターンを幅（列）×高さ（行）＝４×２の信号ユニットの行列とする。また，背景ユニットをユニットＥ（シンボル２）とし，これに埋め込まれるシンボルユニットをユニットＡ（シンボル０）及びユニットＢ（シンボル１）とする。
図６（１）では，ユニットＡ（シンボル０）を所定の閾値（例えば６）以上配置して，ユニットパターン全体としてシンボル０を表す。
図６（２）では，ユニットＢ（シンボル１）を所定の閾値（例えば６）以上配置して，ユニットパターン全体としてシンボル１を表す。
図６（３）では，ユニットＡとユニットＢとをほぼ同数（同数あるいはいずれかのシンボルユニットが１つ多い）配置して，ユニットパターン全体としてシンボル２を表す。
【００４６】
以上，電子透かし埋め込み装置１００１の構成及び透かし信号について説明した。次いで，電子透かし埋め込み装置１００１の動作について，図７〜図１３を参照しながら説明する。
【００４７】
まず，文書画像形成部１００５は，文書データ１００３を基に，文書が紙に印刷された状態の画像をページごとに作成する。この文書画像は白黒の二値画像であるものとして説明する。以下の説明中，文書画像の「文字領域」とはプリンタで印刷したときにインク（トナー）を塗布する領域であり，文書画像上では輝度値が０の画素（黒い画素）である。また，「背景領域」とはプリンタで印刷したときにインク（トナー）を塗布しない領域であり，文書画像の輝度値が１である画素（白い画素）のことを示すものとする。
【００４８】
透かし画像形成部１００６は，機密情報１００４から文書画像の背景として重ねあわせる透かし画像を作成する。以下に，透かし画像形成部１００６の動作について，図７を参照しながら説明する。図７は，透かし画像形成部１００６の処理の流れを示す説明図である。
【００４９】
（ステップＳ１１０１）
ステップＳ１１０１では，機密情報１００４をＮ元符号に変換する。Ｎは任意であるが，以下では簡単のためＮ＝２（機密情報１００４を２元符号に変換する）とする。従って，ステップＳ１１０１で生成される符号語は，０と１のビット列で表現されるものとする。このステップＳ１１０１では機密情報１００４をそのまま符号化しても良いし，暗号化したものを符号化しても良い。
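The conversion in step S1101 can be sketched as follows for N = 2. This is a minimal illustration, not the patent's implementation: the function name `to_binary_codeword` is hypothetical, the confidential information is assumed to be available as bytes, and the bit order (most significant bit first) is an assumption the text does not specify.

```python
def to_binary_codeword(secret: bytes) -> list[int]:
    """Expand each byte into eight symbols (0 or 1), most significant bit first.

    The bit order is an assumption; the source only says the result is a
    sequence of 0s and 1s.
    """
    bits = []
    for byte in secret:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits
```

If the information is encrypted first, as the paragraph allows, the ciphertext bytes would simply be passed in place of the plaintext.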
【００５０】
（ステップＳ１１０２）
ステップＳ１１０２では，符号語の各シンボルに対して，図６に示したように，ユニットパターンを割り当てる。
【００５１】
（ステップＳ１１０３）
ステップＳ１１０３では，シンボルユニット配置可否行列を定義する。シンボルユニット配置可否行列は文書画像を１つのブロックの大きさがＳｗ（幅）×Ｓｈ（高さ）画素のブロック画像に分割した画像を行列で表したものであり，文書画像の対応するブロックにシンボルユニットを埋め込めるかどうかを表すものである。これは文字領域にシンボルユニットを挿入した場合には検出不可能となるため，あらかじめシンボルユニットを埋め込むことが可能な場所を指定するための行列である。行列の要素の値が１であれば，文書画像の対応するブロックにはシンボルユニットを埋め込むことが可能であり，値が０であれば背景ユニットを埋め込むことになる。ここで，Ｓｗ，Ｓｈはそれぞれ信号ユニットの幅と高さであり，入力文書画像の大きさをＷ×Ｈとするとユニット行列Ｕｍの要素数は，
幅（列）×高さ（行）＝Ｍｗ×Ｍｈ＝Ｗ／Ｓｗ×Ｈ／Ｓｈ
となる。
【００５２】
シンボルユニット配置可否行列の各要素は文書画像の対応するブロック中に文字領域が存在するかどうかによって決定する。例えば，シンボルユニット配置可否行列の任意の要素（Ｘ，Ｙ）（Ｙ行Ｘ列）は入力文書画像のｘ＝Ｘ×Ｓｗ〜（Ｘ＋１）×Ｓｗ，ｙ＝Ｙ×Ｓｈ〜（Ｙ＋１）×Ｓｈの中に含まれている文字領域（輝度値が０の画素）がＴｎ画素以下である場合には１，文字領域がＴｎ画素より大きい場合には０とする。Ｔｎは閾値でありＳｗ×Ｓｈ×０．５以下の小さな数とする。
【００５３】
図８はシンボルユニット配置可否行列作成の例を示している。図８（１）はシンボルユニット配置可否行列の各要素に対応するブロックを入力文書画像上に重ねて示したものである。図８（２）では各ブロック中に文字領域が含まれている場合に，対応するブロックの値を０としていることを示している。図８（３）では，文字領域判定結果からシンボルユニット配置可否行列の各要素の値を決定している。
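The construction of the symbol-unit placement matrix in step S1103 might look like the sketch below. It assumes the document image is a 2-D list of pixel values with 0 for character pixels and 1 for background pixels, as defined earlier in the description; the function name and the handling of partial blocks at the image edge (they are dropped) are assumptions.

```python
def symbol_unit_placement_matrix(image, sw, sh, tn):
    """Return a matrix holding 1 where a symbol unit may be embedded, else 0.

    image: 2-D list, 0 = character pixel, 1 = background pixel.
    sw, sh: width and height of one signal unit in pixels.
    tn: a block is usable when it has at most tn character pixels.
    """
    height, width = len(image), len(image[0])
    mh, mw = height // sh, width // sw  # partial edge blocks are ignored (assumption)
    matrix = []
    for by in range(mh):
        row = []
        for bx in range(mw):
            text_pixels = sum(
                1
                for y in range(by * sh, (by + 1) * sh)
                for x in range(bx * sw, (bx + 1) * sw)
                if image[y][x] == 0
            )
            row.append(1 if text_pixels <= tn else 0)
        matrix.append(row)
    return matrix
```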
【００５４】
（ステップＳ１１０４）
ステップＳ１１０４ではユニットパターン配置可否行列を作成する。これは，文書画像中のこの行列に対応する領域にユニットパターンを挿入可能な場合には要素の値が１となり，挿入不可能な場合は０となる。ユニットパターンを幅（列）×高さ（行）＝４×２の信号ユニットの行列と定義すると，ユニットパターン挿入可否の判定は以下のように行う。まず，図８（３）に示したシンボルユニット配置可否行列を４×２の領域に区分する。１つの領域を構成する８個の信号ユニットのうち，所定の閾値Ｔｕ個（Ｔｕは６程度）以上がシンボルユニット埋め込み可能（シンボルユニット配置可否行列の値が１）であればユニットパターン埋め込み可能とし，それ以外の場合はユニットパターン埋め込み不可能とする。
【００５５】
図９はユニットパターン配置可否行列の作成過程の例を説明した図である。図９（１）は１つのユニットパターンが８つの信号ユニットから構成されていることを示している。図９（２）は各ユニットパターンに対し，対応するシンボルユニット配置可否行列の要素が１である数がＴｕ（＝６）以上のユニットパターンには１が，それ以外のユニットパターンには０が与えられていることを示している。図９（３）はユニットパターン配置可否行列の各要素の値をセットしていることを示している。
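Step S1104 reduces the symbol-unit placement matrix to one value per 4×2 unit pattern. A minimal sketch, assuming the input is the 0/1 matrix from step S1103 and that its dimensions are multiples of the pattern size; the function and parameter names are hypothetical.

```python
def unit_pattern_placement_matrix(symbol_ok, pw=4, ph=2, tu=6):
    """Return 1 for each pw x ph region with at least tu usable units, else 0."""
    rows, cols = len(symbol_ok) // ph, len(symbol_ok[0]) // pw
    result = []
    for py in range(rows):
        row = []
        for px in range(cols):
            usable = sum(
                symbol_ok[y][x]
                for y in range(py * ph, (py + 1) * ph)
                for x in range(px * pw, (px + 1) * pw)
            )
            row.append(1 if usable >= tu else 0)
        result.append(row)
    return result
```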
【００５６】
（ステップＳ１１０５）
ステップＳ１１０５ではユニットパターン配置可否行列を参照してユニットパターン行列を作成する。符号語のシンボルは，ユニットパターン行列中に繰り返しセットされるが，ユニットパターンが埋め込み不可能な要素にはセットされない。例えば図１０のように，ユニットパターン行列およびユニットパターン配置可否行列の大きさをＰｗ×Ｐｈ＝４×４であるとし，符号語のシンボルが（００１１）の４ビットであったとする。この図ではユニットパターン配置可否行列の１行２列目の要素の値が０であるため，符号語のシンボルの２ビット目（シンボル０）はセットされずに，シンボル２がセットされ，１行３列目に符号語のシンボルの２ビット目がセットされる。
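The filling rule of step S1105 can be sketched as follows. The sketch assumes the codeword is written in row-major order and restarts from its first symbol once exhausted, which matches the Fig. 10 example but is otherwise an assumption; `unit_pattern_matrix` and the `filler` parameter are hypothetical names, with symbol 2 marking positions where no codeword symbol is set.

```python
def unit_pattern_matrix(pattern_ok, codeword, filler=2):
    """Place codeword symbols repeatedly, skipping unusable positions."""
    result, index = [], 0
    for row in pattern_ok:
        out = []
        for ok in row:
            if ok:
                out.append(codeword[index % len(codeword)])
                index += 1
            else:
                out.append(filler)  # this position carries no codeword symbol
        result.append(out)
    return result
```

With a placement matrix whose row 1, column 2 is 0 and the codeword (0011), the second codeword bit lands in row 1, column 3, as in the example described for Fig. 10.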
【００５７】
（ステップＳ１１０６）
ステップＳ１１０６ではユニットパターン行列とシンボルユニット配置可否行列を基にユニット行列Ｕｍを作成する。ユニット行列Ｕｍはシンボルユニット配置可否行列と同じ大きさであり，信号ユニットの配置パターンを記述する行列である。信号ユニットの配置のルールを以下のように定める。
【００５８】
・ステップ１：シンボルユニット配置可否行列において要素が０の位置には背景ユニット（シンボル２）をセットする（図１１（１））。
・ステップ２：ユニットパターン行列の要素が符号語のシンボルの場合には，ユニット行列Ｕｍの対応する領域にそのシンボルに対応するシンボルユニットをセットする（図１１（２））。
・ステップ３：ユニットパターン行列が符号語のシンボル以外（ユニットパターン配置可否行列の値が０）の場合には，０を表すシンボルユニットと１を表すシンボルユニットを同じ数だけセットする（図１１（３））。
・ステップ４：信号ユニットがセットされていない領域に背景ユニットをセットする（図１１（４））。
【００５９】
要約すれば，文字領域には背景シンボルをセットし，任意のユニットパターンのうち背景領域がＴｕ（＝６）以上あれば符号語のシンボルを割り当て，それ以外の場合は背景領域に２種類のシンボルユニットを同じ数だけ割り当てる。背景領域が奇数の場合は残りの一つには背景シンボルをセットすることになる。これにより，符号語のシンボルが割り当てられているユニットパターンには同じシンボルユニットが６つ以上セットされているため，検出時には埋め込んだシンボルユニットに対するフィルタの出力値の合計値が，もう片方のフィルタの出力の合計値よりも大幅に大きくなり，符号語のシンボルが割り当てられていないユニットパターンは２つのフィルタの出力値の合計の差が小さくなる。したがって，符号語を割り当てたユニットパターンであるか割り当てていないユニットパターンであるかの判定が容易になる効果がある。
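The four placement rules of step S1106 can be combined into one pass, sketched below under stated assumptions: symbols 0 and 1 stand for units A and B, symbol 2 for the background unit, and the order in which the balanced 0/1 units are laid out inside a non-codeword pattern is an arbitrary choice of this sketch (the text only requires equal counts). The function name is hypothetical.

```python
def unit_matrix(symbol_ok, pattern_symbols, pw=4, ph=2, background=2):
    """Build the unit matrix Um from the placement matrix and pattern symbols."""
    # Start with background units everywhere (rules 1 and 4).
    um = [[background] * len(symbol_ok[0]) for _ in symbol_ok]
    for py, pattern_row in enumerate(pattern_symbols):
        for px, symbol in enumerate(pattern_row):
            # Usable cells of this pattern, scanned row by row.
            cells = [
                (y, x)
                for y in range(py * ph, (py + 1) * ph)
                for x in range(px * pw, (px + 1) * pw)
                if symbol_ok[y][x] == 1
            ]
            if symbol in (0, 1):
                # Rule 2: every usable cell carries the codeword symbol.
                for y, x in cells:
                    um[y][x] = symbol
            else:
                # Rule 3: equal numbers of 0 and 1; an odd leftover stays background.
                half = len(cells) // 2
                for i, (y, x) in enumerate(cells[: 2 * half]):
                    um[y][x] = 0 if i < half else 1
    return um
```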
【００６０】
（埋め込み条件の記録）
電子透かし検出装置１００２における入力画像における信号ユニットの大きさは，電子透かし埋め込み装置１００１で設定した信号ユニットの大きさ，および出力デバイスの解像度と入力デバイスの解像度の比から計算することができる。
【００６１】
例えば，
・電子透かし埋め込み装置１００１で文書中に埋め込んだ信号ユニットの大きさが幅×高さ＝Ｓｗ×Ｓｈ（画素）である。
・電子透かし検出装置１００２の入力画像における信号ユニットの大きさはＳｉｗ×Ｓｉｈである。
・出力デバイス１００８の解像度がＤｏｕｔ（ｄｐｉ），入力デバイス１０１０の解像度がＤｉｎ（ｄｐｉ）である。
とすると，
Ｓｉｗ＝Ｓｗ×Ｄｉｎ／Ｄｏｕｔ
Ｓｉｈ＝Ｓｈ×Ｄｉｎ／Ｄｏｕｔ
となる。
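As a quick check of the two formulas above, the nominal unit size in the scanned image can be computed directly. The function name is hypothetical; the multiplication is done before the division so that the worked values stay exact.

```python
def unit_size_in_input(sw, sh, d_out, d_in):
    """Nominal signal-unit size (in pixels) in the scanned image.

    sw, sh: unit size in the embedded image; d_out, d_in: printer and
    scanner resolutions in dpi.
    """
    return sw * d_in / d_out, sh * d_in / d_out
```

For Sw = Sh = 12, Dout = 600 and Din = 400 this yields 8 pixels in each direction, the value used in the example that follows in the text.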
【００６２】
しかしながらプリンタやスキャナの機械的な精度誤差（ＤｉｎやＤｏｕｔが設定通りの値になっていない）により，ＳｉｗやＳｉｈが必ずしも上記の式のようにはならない。これにより電子透かし検出装置１００２の入力画像における信号検出位置にずれが生じ，埋め込んだ機密情報１００４の検出精度が低下する。
【００６３】
例えばＳｗ＝Ｓｈ＝１２，Ｄｏｕｔ＝６００でＤｉｎ＝４００であるとすると，Ｓｉｗ＝Ｓｉｈ＝１２×４００／６００＝８となるが，ＤｏｕｔやＤｉｎに誤差が含まれていてＳｉｗやＳｉｈが８とならない場合が考えられる。Ｓｉｗの誤差が０．１％（０．００８画素の誤差，実際はもう少し大きい）であったとしても，Ａ４サイズの紙をスキャナ解像度４００ｄｐｉで取り込んだ場合は，入力画像の大きさはおよそ幅×高さ＝３０００×４０００（画素）程度となり，画像の左端を基準とした場合には画像の右端における位置ずれは３０００×０．００８＝２４（画素）となり，３つ分の信号ユニットのずれとなり検出精度に重大な影響を与える。
【００６４】
この解決策として，透かし画像に埋め込む信号ユニットの数を，任意の十分大きな整数Ｎｓの倍数としておくことで，入力画像の信号埋め込み領域の大きさから信号ユニットの大きさを逆算するときの誤差の吸収に利用する方法がある。以下では，これとは別の方法について説明する。
【００６５】
本実施の形態では，図１２に示したように，ユニット行列Ｕｍの４隅を透かし画像の埋め込んだ信号ユニット数（ユニット行列Ｕｍのサイズ）やプリンタの解像度などの埋め込み条件の記録領域（以下，属性記録領域と称する。）とする方法について説明する。すなわち，図１２（１）に示したように，ユニット行列Ｕｍ（ｘ，ｙ）の大きさをｘ＝１〜Ｍｗ，ｙ＝１〜ＭｈとしたときにＵｍ（１，１），Ｕｍ（Ｍｗ，１），Ｕｍ（１，Ｍｈ），Ｕｍ（Ｍｗ，Ｍｈ）付近を属性記録領域として使用する。
【００６６】
図１２（２）はユニット行列のサイズをユニット行列の属性情報記録部にセットする例の説明図である。ここではＭｗ，Ｍｈをそれぞれ１６ビットで表現し，属性記録領域１に記録する方法を示している。例えばＭｗ＝４００を１６ビットの２進数で表現すると，「０００００００１１００１００００」となる。これをユニット行列の表現規則（ステップＳ１１０５）にしたがって「１１１１１１１２２１１２１１１１」という値に変換する。これをユニット行列Ｕｍ（２，１）〜Ｕｍ（１７，１）に記録する。同様にＭｈ＝６００を二進数で表現し，ユニット行列の表現規則に変換し，これをユニット行列Ｕｍ（１，２）〜Ｕｍ（１，１７）に記録する。
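The encoding of Mw in the attribute recording area can be reproduced with a few lines. The mapping of bit 0 to the value 1 and bit 1 to the value 2 is inferred from the worked example in the paragraph above, and the function name is hypothetical.

```python
def encode_attribute(value, bits=16):
    """Encode an integer as a string of unit values (1 for bit 0, 2 for bit 1)."""
    binary = format(value, f"0{bits}b")  # fixed-width binary, MSB first
    return "".join("1" if b == "0" else "2" for b in binary)
```

Calling `encode_attribute(400)` gives the 16-element sequence that the text writes into Um(2,1) through Um(17,1).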
【００６７】
図１２（２）のＵｍ（２，２）〜Ｕｍ（１７，１７）の領域はプリンタの解像度など他の属性を記録しても良いし，電子透かし検出装置１００２においてユニット行列のサイズを確実に検出できるよう，ＭｗとＭｈを繰り返し記録しても良い。また，ここではＵｍ（１，１）〜Ｕｍ（１７，１７）を属性記録領域１として使用したが，Ｕｍ（１，１）〜Ｕｍ（Ｘ，Ｙ）をどの範囲まで広げるかは，属性として記録する情報量やＤｏｕｔやＤｉｎに含まれている誤差などの見積もりによって変化する。さらに，属性記録領域の大きさは既知であるか，または属性記録領域の範囲を属性記録領域自体に記録し，この領域には機密情報１００４を埋め込まず，また検出時もこの領域は無視するようにする。図１２（２）では属性記録領域１について説明したが，属性記録領域２〜属性記録領域４についても同様である。
【００６８】
ユニット画像の４隅に属性情報領域を設定することによる効果は，後述の電子透かし検出装置１００２の説明部分で説明する。
【００６９】
（ステップＳ１１０７）
図１３はステップＳ１１０７の例を示している。ステップＳ１１０７ではステップＳ１１０６で作成したユニット行列Ｕｍ（図１３（１））に従って信号ユニットを背景画像に配置する（図１３（２））。信号ユニットを並べることにより作成した背景画像に文書画像を重ね合わせ，透かし入り文書画像を作成する（図１３（３））。
【００７０】
以上，電子透かし埋め込み装置１００１について説明した。次いで，図１，図１４〜図１９を参照しながら，電子透かし検出装置１００２について説明する。
【００７１】
（電子透かし検出装置１００２）
電子透かし検出装置１００２は，紙媒体に印刷されている文書１００９を画像として取り込み，埋め込まれている機密情報１００４を復元する装置である。電子透かし検出装置１００２は，図１に示したように，入力デバイス１０１０と，透かし検出部１０１１により構成されている。
【００７２】
入力デバイス１０１０は，スキャナなどの入力装置であり，紙に印刷された文書１００９を多値階調のグレイ画像として計算機に取り込む。また，透かし検出部１０１１は，入力画像に対してフィルタ処理を行い，埋め込まれた信号を検出する。検出された信号からシンボルを復元し，埋め込まれた機密情報１００４を取り出す。
【００７３】
以上のように構成される電子透かし検出装置１００２の動作について，図１４〜図１９を参照しながら説明する。
【００７４】
（透かし検出部１０１１，ステップＳ１２０１）
図１４は透かし検出部１０１１の処理の流れを示している。
ステップＳ１２０１では入力デバイス１０１０から透かし入り印刷文書の画像を入力する。
【００７５】
（ステップＳ１２０２）
ステップＳ１２０２では入力された画像から信号ユニットが埋め込まれている領域（以下，信号領域と称する。）の輪郭線を検出し，画像の回転などの補正を行う。
【００７６】
図１５は信号領域の検出方法の説明図である。
図１５（１）はステップＳ１２０１で入力された画像である。ここでは信号領域の上端を検出する例を示している。入力された画像をＩｍｇ（ｘ，ｙ），ｘ＝０〜Ｗｉ−１，ｙ＝０〜Ｈｉ−１とする。また，電子透かし埋め込み装置１００１で文書中に埋め込んだ信号ユニットの大きさが幅×高さ＝Ｓｗ×Ｓｈ（画素），出力デバイス１００８の解像度をＤｏｕｔ（ｄｐｉ），入力デバイス１０１０の解像度をＤｉｎ（ｄｐｉ）として，
ｔＳｗ＝Ｓｗ×Ｄｉｎ／Ｄｏｕｔ
ｔＳｈ＝Ｓｈ×Ｄｉｎ／Ｄｏｕｔ
とする。すなわち，ｔＳｗとｔＳｈはＩｍｇにおける理論上の信号ユニットの大きさであり，信号検出フィルタはこの値を基に設計される。
【００７７】
この画像Ｉｍｇから信号領域の上端検出のためのサンプル領域Ｓ（ｘ），ｘ＝１〜Ｓｎを設定する。ＳｎはＷｉ／Ｎｐ（Ｎｐは１０〜２０程度の整数）であるものとする。また，Ｓ（ｘ）の幅はＷｓ＝ｔＳｗ×Ｎｔ（Ｎｔは２〜５程度の整数），高さはＨｓ＝Ｈｉ／Ｎｈ（Ｎｈは８程度）とし，Ｓ（ｘ）のＩｍｇにおける水平方向の位置はｘ×Ｎｐとする。
【００７８】
任意のＳ（ｎ）における信号領域の上端ＳＹ０（ｎ）の検出方法を以下に示す。
・ステップ１：ＩｍｇからＳ（ｎ）に対応する領域を切り取る（図１５（１））。
・ステップ２：Ｓ（ｎ）に対してフィルタＡとフィルタＢを施し，Ｓ（ｎ）内の水平方向における最大値をＦｓ（ｙ）に記録する（図１５（２））。
・ステップ３：ある閾値Ｔｙを設定し，Ｆｓ（１）〜Ｆｓ（Ｔｙ−１）の平均値をＶ０（Ｔｙ），Ｆｓ（Ｔｙ）〜Ｆｓ（Ｈｓ）の平均値をＶ１（Ｔｙ）とする。Ｖ１（Ｔｙ）−Ｖ０（Ｔｙ）が最大となるＴｙをＳ（ｎ）における信号領域の上端の位置としてＳＹ０（ｎ）にセットする（図１５（３））。
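Step 3 of the top-edge search amounts to finding the split point that maximises the gap between the mean filter output below and above it. A minimal sketch, assuming `fs` already holds the per-row maximum of the filter A and filter B outputs from step 2; the code uses 0-based indices, whereas the text counts from 1, and the function name is hypothetical.

```python
def find_top_edge(fs):
    """Return the row index ty that maximises mean(fs[ty:]) - mean(fs[:ty])."""
    best_ty, best_gap = None, float("-inf")
    for ty in range(1, len(fs)):
        v0 = sum(fs[:ty]) / ty               # mean output above the candidate edge
        v1 = sum(fs[ty:]) / (len(fs) - ty)   # mean output from the edge downward
        if v1 - v0 > best_gap:
            best_ty, best_gap = ty, v1 - v0
    return best_ty
```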
【００７９】
図１５（４）はＦｓ（ｙ）のｙに対する値の変化を示した図である。図のようにＩｍｇの信号ユニットのない領域では信号検出フィルタの出力値の平均値は小さく，一方，電子透かし埋め込み装置１００１によって背景部分にはシンボルユニット（ユニットＡまたはユニットＢ）を密に配置しているため信号検出フィルタの出力値が大きくなる（文書の余白部分は背景部分であり，ここにも密に埋め込んである）。したがって，信号領域とそれ以外の領域の境界付近を境に信号検出フィルタの出力値が大きく変動し，これを領域検出に利用している。
【００８０】
ステップ１〜ステップ３をＳ（ｘ），ｘ＝１〜Ｓｎについて行いＳＹ０（ｘ），ｘ＝１〜Ｓｎを得る。信号領域の上端はこれによって得られたサンプル点Ｓ０（ｘ×Ｎｐ，ＳＹ０（ｘ）），ｘ＝１〜Ｓｎを最小二乗法などを用いて直線近似して得る。他の輪郭線も上記と同様の方法を用いて検出し，例えば信号領域の上端が水平になるように信号領域を回転移動した画像を以下では入力画像と呼ぶ。
【００８１】
図１６は属性領域に埋め込まれたユニット行列の大きさを復元する方法の例を示している。ここでは，入力画像の信号領域は（Ｉｘ０，Ｉｙ０）〜（Ｉｘ１，Ｉｙ１）とし，属性記録領域１の情報を復元する例を示す。
・ステップ１：入力画像の（Ｉｘ０，Ｉｙ０）付近の領域を切り取る（図１６▲１▼）。
・ステップ２：切り取られた領域に対して属性記録領域１を設定する（図１６▲２▼）。属性記録領域１は電子透かし埋め込み装置１００１で設定したものと同じものであるとし，例えばＭｗを１６ビットで表したときの最上位ビットは（Ｉｘ０＋ｔＳｗ，Ｉｙ０）に，最下位ビットは（Ｉｘ０＋ｔＳｗ×１７，Ｉｙ０）に埋め込まれているものとして検出する。
・ステップ３：ステップ２で設定したＭｗの埋め込み領域に対し，フィルタＡとフィルタＢを施し，各ビット位置でフィルタＡとフィルタＢの出力値の大きいほうに対応するシンボルユニットが，そのビット位置に埋め込まれているものと判定する（図１６▲３▼）。
・ステップ４：電子透かし埋め込み装置１００１でセットしたときと逆の順序でＭｗの値を復元する（図１６▲４▼，▲５▼）。
【００８２】
入力画像における信号ユニットの大きさの理論値ｔＳｗ，ｔＳｈは誤差が含まれているものの，属性記録領域における信号検出位置は，図１５で検出した境界線をそれぞれ基準としているため，例えばＳｗ＝Ｓｈ＝１２，Ｄｏｕｔ＝６００，Ｄｉｎ＝４００の場合では，ｔＳｗ＝ｔＳｈ＝１２×４００／６００＝８であるため，属性記録領域は８×１７＝１３６画素程度の大きさしかなく，仮に誤差が１％（実際はこれより少ない）程度であっても属性領域の基準点から最も離れた位置でも１画素程度の誤差となり，ほぼ正確に信号検出位置を設定することが可能となる効果がある。
【００８３】
入力画像における信号ユニットの真の幅Ｓｉｗは，属性記録領域から取り出されたユニット行列の幅Ｍｗと図１５から得られた信号領域の幅Ｉｘ１−Ｉｘ０を基に，
Ｓｉｗ＝（Ｉｘ１−Ｉｘ０）／Ｍｗ
によって算出できる。同様に信号ユニットの真の高さＳｉｈは，
Ｓｉｈ＝（Ｉｙ１−Ｉｙ０）／Ｍｈ
によって算出できる。
【００８４】
（ステップＳ１２０３，Ｓ１２０４）
図１７はステップＳ１２０３とステップＳ１２０４の説明図である。ステップＳ１２０３はユニットパターンごとにフィルタ出力値の合計を計算する。図１７において，ユニットパターンＵ（ｘ，ｙ）を構成する信号ユニット毎にフィルタＡとのコンボリューション（たたみこみ積分）を計算し，それぞれの信号ユニットに対するコンボリューションの出力値の総和をユニットパターンに対するフィルタＡの出力値Ｆｕ（Ａ，ｘ，ｙ）と定義する。ただし，信号ユニット毎のコンボリューションは，フィルタＡの位置を信号ユニット毎に水平・垂直方向にずらしながら計算した結果の最大値とする。
【００８５】
フィルタＢについても同様にしてユニットパターンＵ（ｘ，ｙ）に対する出力値Ｆｕ（Ｂ，ｘ，ｙ）を計算する。
【００８６】
ステップＳ１２０４ではＦｕ（Ａ，ｘ，ｙ）とＦｕ（Ｂ，ｘ，ｙ）を比較し，これらの差の絶対値｜Ｆｕ（Ａ，ｘ，ｙ）−Ｆｕ（Ｂ，ｘ，ｙ）｜があらかじめ定められた閾値Ｔｐより小さければ符号語のシンボルが割り当てられていないものとする。それ以外の場合はＦｕ（Ａ，ｘ，ｙ）とＦｕ（Ｂ，ｘ，ｙ）の大きいほうのシンボルが割り当てられているものと判定する。すなわち，Ｆｕ（Ａ，ｘ，ｙ）＞Ｆｕ（Ｂ，ｘ，ｙ）であればＵ（ｘ，ｙ）にはシンボル０が埋め込まれ，Ｆｕ（Ａ，ｘ，ｙ）＜Ｆｕ（Ｂ，ｘ，ｙ）であればＵ（ｘ，ｙ）にはシンボル１が埋め込まれているものとする。
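The decision rule of step S1204 is compact enough to state in code. This sketch returns 2 for a unit pattern that carries no codeword symbol, following the convention used later for the unit pattern matrix; the function name is hypothetical, and the source does not say what happens when the two outputs are exactly equal with a non-positive threshold (here symbol 1 is returned).

```python
def decide_symbol(fu_a, fu_b, tp):
    """Classify one unit pattern from its summed filter A and B outputs."""
    if abs(fu_a - fu_b) < tp:
        return 2  # outputs too close: no codeword symbol was assigned
    return 0 if fu_a > fu_b else 1
```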
【００８７】
図１８はステップＳ１２０３とステップＳ１２０４を実現する別の方法の説明図である。図１８において，図１７と同様にユニットパターンＵ（ｘ，ｙ）を構成する信号ユニット毎にフィルタＡとのコンボリューションを計算する。このときシンボルユニット検出のための閾値Ｔｓを定め，各信号ユニットに対するフィルタＡのコンボリューションがＴｓ以上であれば，この信号はシンボルＡであると判定する。ユニットパターンＵ（ｘ，ｙ）内においてシンボルＡであると判定された数をユニットパターンＵ（ｘ，ｙ）に対するフィルタＡの出力値Ｆｕ（Ａ，ｘ，ｙ）と定義する。ただし，信号ユニット毎のコンボリューションは，フィルタＡの位置を信号ユニット毎に水平・垂直方向にずらしながら計算した結果の最大値とする。
【００８８】
フィルタＢについても同様にしてユニットパターンＵ（ｘ，ｙ）に対する出力値Ｆｕ（Ｂ，ｘ，ｙ）を計算する。
【００８９】
Ｆｕ（Ａ，ｘ，ｙ）とＦｕ（Ｂ，ｘ，ｙ）がユニットパターン内で検出された信号数であることを除き，ステップＳ１２０４の処理は図１７と同じとなる。
【００９０】
入力画像から得られるすべてのユニットパターンに対して図１７または図１８，もしくは図１７と図１８の処理を同時に行い，ユニットパターン行列Ｕを作成する。
【００９１】
（ステップＳ１２０５）
ステップＳ１２０５では判定されたシンボルを基に埋め込まれた情報を復号する。図１９はユニットパターン行列から符号語を取り出す方法の例を示している。図１９ではシンボルが割り当てられていない要素にはシンボル２がセットされているものとし，シンボル２がセットされている要素を無視してシンボルを取り出して符号語を復元する。
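Reading the symbols back out of the unit pattern matrix in step S1205 can be sketched as a single filtered traversal. Row-major reading order is assumed to mirror the embedding order, and the function name is hypothetical; splitting the resulting stream into repeated copies of the codeword is left out because the text does not describe it.

```python
def extract_symbols(pattern_matrix, unassigned=2):
    """Collect codeword symbols in row-major order, skipping unassigned cells."""
    return [s for row in pattern_matrix for s in row if s != unassigned]
```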
【００９２】
以上説明したように，本実施の形態によれば，以下の効果が得られる。
（ａ）透かしを挿入する文書画像の文字の配置状態を参照し，文字に重ならない領域にのみ意味のある情報（符号語のシンボル）を埋め込むため，元の文書がどのようなものであっても確実に機密情報を埋め込むことができる。
（ｂ）符号語のシンボルを埋め込まない領域には，相反する信号ユニットを同じ数だけ配置することにより，検出時にシンボルが埋め込まれていないことを確実に判定できる。
（ｃ）埋め込み情報の検出時に，ある領域に対する２つのフィルタの出力値のそれぞれの総和などによりシンボルの判定を行うため，情報検出の精度が高く保たれる。
（ｄ）埋め込んだ信号数などの属性情報をセットする属性記録領域を信号を埋め込む領域の４隅に設定することで，機密情報を検出する際に入出力デバイスのハードウェア的な誤差に影響されることなく正確に属性情報を取り出すことが可能となり，それ以降の検出精度を向上させることができる。
【００９３】
（第２の実施の形態）
図２０は，本発明の第２の実施の形態にかかる電子透かし埋め込み装置及び電子透かし検出装置の構成を示す説明図である。第１の実施の形態と異なる点は，電子透かし検出装置２００２に文字消去改ざん検出部２０１２が追加された点である。ここで，電子透かし埋め込み装置１００１の構成および動作は第１の実施の形態と同一であるものとする。
【００９４】
（文字消去改ざん検出部２０１２）
文字消去改ざん検出部２０１２は，印刷文書１００９に対して文字部分を修正液などで消去するなどの改ざんが行われた場合に，改ざんの有無および改ざん場所を検出する処理を行う部分である。
【００９５】
図２１は文字消去改ざん検出部の動作を示す流れ図である。
ここで想定している改ざんは，印刷文書１００９の文字部分の一部が修正液などで消去されたような場合である。改ざん検出の基本的な原理は以下の通りである。電子透かし埋め込み装置１００１により作成された印刷文書１００９には，
（１）文字領域には背景ユニットが埋め込まれており，シンボルユニットは埋め込まれていない。
（２）背景領域（文字領域以外の領域）にはシンボルユニットが密に埋め込まれている。
という特徴がある。このため，（印刷文書の）文字領域を消去することによってその領域を背景領域に改ざんされても，その部分に新たにシンボルユニットを埋め込むことが困難であるため，信号検出時に「背景領域であるにもかかわらずシンボルユニットが埋め込まれていない領域（不正な領域）」が出現することを利用している。
【００９６】
以下の説明では，入力デバイス１０１０で入力された画像に対して回転などの補正を行った画像を，第１の実施の形態と同様に入力画像と呼ぶ。
また，
・電子透かし埋め込み装置１００１で文書中に埋め込んだ信号ユニットの大きさが幅×高さ＝Ｓｗ×Ｓｈ（画素）である。
・埋め込んだ信号ユニット数は，横×高さ＝ｎｗ×ｎｈである。
・埋め込んだシンボルユニットはユニットＡとユニットＢの二種類である。
・入力画像における信号ユニットの大きさはＳｉｗ×Ｓｉｈである。
という前提で説明を行う。
【００９７】
ここで，出力デバイスの解像度と入力デバイスの解像度が等しい場合には，Ｓｉｗ＝Ｓｗ，Ｓｉｈ＝Ｓｈとなるが，解像度が異なる場合にはそれらの比によってＳｉｗとＳｉｈが計算される。例えば出力デバイスの解像度が６００ｄｐｉで入力デバイスの解像度が４００ｄｐｉであった場合には，Ｓｉｗ＝Ｓｗ×２／３，Ｓｉｈ＝Ｓｈ×２／３となる。
【００９８】
（ステップＳ２１０１）
ステップＳ２１０１では入力画像を二値化することによって文字領域を検出する。入力デバイスによって入力された画像において，文字領域は輝度値が小さく（黒に近い色），背景領域は輝度値が大きい（白に近い色）。印刷文書には全面に透かし信号が埋め込まれているが，透かし信号は単位面積当たりのドットの数が文字領域に比べて非常に少ないため，二値化処理によって，透かし信号を含む背景領域と文字領域とを切り分けることができる。二値化の閾値はあらかじめ定めておいても良いし，入力画像の輝度値の分布から判別分析法などの画像処理手法を用いて動的に定めても良い。
【００９９】
図２２は，文字領域抽出画像の説明図である。
二値化された画像において，文字領域に対応する画素値を０，背景領域に対応する画素値を１と定める。二値化によって得られる画像をさらにｎｗ×ｎｈの大きさに縮小した画像を文字領域抽出画像と呼ぶ（図２２）。縮小する際は縮小前の二値画像のＳｉｗ×Ｓｉｈ画素ブロックから１画素の値を決定するが，縮小前の二値画像のＳｉｗ×Ｓｉｈ画素のうち値が０である画素が所定の閾値Ｔｎ以上であれば，それに対応する文字領域抽出画像の画素値を０とする。閾値ＴｎはＳｉｗ×Ｓｉｈ×０．５程度の値とする。
【０１００】
ステップＳ２１０２では入力画像に対してユニット行列Ｕｍの作成およびシンボルユニットの検出を行い，ユニット抽出画像を生成する。
【０１０１】
（ステップＳ２１０２）
図２３はステップＳ２１０２の説明図である。まず，入力画像をｎｗ×ｎｈ個のブロック（１ブロックの大きさはＳｉｗ×Ｓｉｈ）に分割し（図２３（２）），分割されたブロックごとに信号検出フィルタによりシンボルユニットの検出を行う。
【０１０２】
シンボルユニットが検出できたかどうかの判定は図１８の説明と同様に，信号検出フィルタＡまたはＢのどちらかの出力値が閾値Ｔｓを超えた場合，そのブロックにはシンボルユニットが埋め込まれていると判定する。ここでは，どちらの信号が埋め込まれているかを区別する必要がないことに注意する。すべてのブロックに対してシンボルユニットの検出が行われた結果を画像で表したものをユニット抽出画像と呼ぶ（図２３（３））。ユニット抽出画像の大きさは分割されたブロック数と同じでｎｗ×ｎｈである。図２３のユニット抽出画像において，白い画素（画素値は１）はシンボルユニットが検出されたものとし，黒い画素（画素値は０）はシンボルユニットが検出できなかったものとする。
【０１０３】
（ステップＳ２１０３）
ステップＳ２１０３ではステップＳ２１０１で作成した文字領域抽出画像とステップＳ２１０２で作成したユニット抽出画像を比較し，改ざんを検出する。入力画像の文字領域に重なる部分にはもともとシンボルユニットを埋め込んでいない，すなわち背景部分にのみシンボルユニットを埋め込んでいるため，ユニット抽出画像におけるシンボルユニット検出領域（画素値は１）と文字領域抽出画像における背景領域（画素値は１）は一致する。したがって文字領域抽出画像とユニット抽出画像の差分画像を生成し，値が０にならない領域が「印刷文書（紙）に対して文字消去による改ざんが行われた部分」として検出される。
【０１０４】
図２４はステップＳ２１０３の説明図である。ここでは改ざんが行われていないため，差分画像はすべての画素の値が０となる。
【０１０５】
図２５は改ざんがあった場合の説明図である。入力画像の文字領域が消去されており，文字領域抽出画像のそれに対応する領域からは文字領域が抽出されないため画素値が１となる（図２５（３））。しかしながら同じ領域からユニットシンボルは検出されないため，ユニット抽出画像の対応する領域は画素値が０となる（図２５（６））。したがって，これらの差分画像を計算したときに画素値が０とならない領域が出現し，ここが改ざん場所であることが特定できる（図２５（９））。
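The comparison in step S2103 can be sketched as an element-wise difference of the two extracted images. Both inputs are assumed to be 2-D lists of the same size: in the character-region image 0 marks text and 1 background, and in the unit-extraction image 1 marks a detected symbol unit and 0 none. The function name is hypothetical.

```python
def erased_blocks(text_image, unit_image):
    """Return (x, y) positions where the two extracted images disagree."""
    return [
        (x, y)
        for y, (t_row, u_row) in enumerate(zip(text_image, unit_image))
        for x, (t, u) in enumerate(zip(t_row, u_row))
        if t != u  # a non-zero difference marks a suspected erasure
    ]
```

An empty result corresponds to the untampered case of Fig. 24; any returned position corresponds to a block that looks like background but carries no symbol unit, as in Fig. 25.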
【０１０６】
なお，信号ユニットの大きさを非常に小さいものにし，１つの文字の隙間にもシンボルユニットを埋め込むことによって，修正液等で文字を消去して上から異なる文字を上書したとしても，文字の隙間に埋め込んだシンボルユニットが消されているため，これが検出不能になる。したがってこれを利用すれば，ユニット抽出画像において，シンボルユニットが全く検出できない領域があれば，改ざんされている可能性があるとすることも可能である。
【０１０７】
文字領域抽出画像およびユニット抽出画像は，それぞれ領域拡大縮小などの画像処理手法を用いて雑音除去等を行って高周波成分を除去した後，改ざん検出を行っても良い。これにより，雑音成分に影響されずに安定した改ざん検出を行うことが可能となる。
【０１０８】
以上説明したように，本実施の形態によれば，以下の効果が得られる。
（ｅ）印刷された文書に対し，主に任意の文字列を修正液などで消去するような不正があった場合，その文書の原文がなくても改ざんが検出でき，改ざん場所も特定できる。
【０１０９】
（第３の実施の形態）
図２６は本発明の第３の実施の形態の構成を示す図である。第１の実施の形態と異なる点は，電子透かし埋め込み装置３００１に埋め込み信号数記録部３０１２が追加され，電子透かし検出装置３００２に埋め込み信号数検出部３０１３，フィルタ出力値算出部３０１４，最適閾値判定部３０１５，検出信号計数部３０１６および改ざん判定部３０１７が追加された点である。ここで，その他の構成要素の動作は第１の実施の形態と同一であるものとする。
【０１１０】
（埋め込み信号数記録部３０１２）
埋め込み信号数記録部３０１２は，透かし画像に埋め込む信号数を透かし画像自体に記録する部分である。この信号数は原文の文書画像の文字数やレイアウトに依存して変化する。
【０１１１】
（埋め込み信号数検出部３０１３）
埋め込み信号数検出部３０１３は埋め込み信号数記録部３０１２で記録した数値を復元し，透かし画像作成時に埋め込んだ信号数を取り出す。
【０１１２】
（フィルタ出力値算出部３０１４）
フィルタ出力値算出部３０１４では入力画像に対する信号検出フィルタの出力値を計算し，信号埋め込み位置ごとに記録する。
【０１１３】
（最適閾値判定部３０１５）
最適閾値判定部３０１５ではフィルタ出力値算出部３０１４で計算した出力値と埋め込み信号数検出部３０１３で得た埋め込み信号数を利用して，信号検出のための最適な閾値を計算する。
【０１１４】
（検出信号計数部３０１６）
検出信号計数部３０１６は最適閾値判定部３０１５で得られた閾値を用いて入力画像に埋め込まれた信号数を数える。
【０１１５】
（改ざん判定部３０１７）
改ざん判定部３０１７は，埋め込み信号数検出部３０１３で記録した正解数と検出信号計数部３０１６で得られた信号数を比較することにより改ざん等が行われたかどうかの判定，および改ざん場所の特定を行う。
【０１１６】
（埋め込み信号数記録部３０１２）
図２７は埋め込み信号数記録部３０１２の動作を示す流れ図である。
【０１１７】
（ステップＳ３１０１）
図２８はステップＳ３１０１の説明図である。ステップＳ３１０１では，まずユニット行列Ｕｍ（図２８（２））の左端のＩｗ個分の要素を埋め込みシンボルユニット数の記録用のユニット（記録用ユニット帯と呼ぶ）として使用する（図２８（３））。次に，ユニット行列Ｕｍの記録用ユニット帯を除いた部分を（横×縦＝）Ｂｗ×Ｂｈ個のブロックに分割する（これをユニット数記録単位行列Ｎｕ（ｘ，ｙ）ｘ＝１〜Ｂｗ，ｙ＝１〜Ｂｈと呼ぶ）。各ブロックの大きさはユニット行列Ｕｍの要素数を大きさの単位として（幅×高さ＝）ｂｗ×ｂｈとする（図２８（４））。
【０１１８】
ユニット行列Ｕｍの左端に記録用ユニット帯を配置する場合，ユニット数記録単位行列に関して設定可能なパラメータは，横方向のブロック数，ブロックの高さ方向の大きさである。残りの縦方向のブロック数とブロックの幅方向の大きさは，設定したパラメータおよび記録用ユニット帯の幅，ユニット行列Ｕｍのパラメータから自動的に決定される。
【０１１９】
以下の説明では，ユニット行列Ｕｍの大きさ（要素数）をＭｗ×Ｍｈとしたとき，横方向のブロック数をＢｗ＝４，ブロックの高さ方向の大きさをｂｈ＝１６，記録用ユニット帯の幅をＩｗ＝４とする。したがって，縦方向のブロック数はＢｈ＝Ｍｈ／ｂｈ＝Ｍｈ／１６，ブロックの幅方向の大きさはｂｗ＝（Ｍｗ−Ｉｗ）／Ｂｗ＝（Ｍｗ−４）／４となる。
【０１２０】
（ステップＳ３１０２，ステップＳ３１０３）
図２９はステップＳ３１０２およびステップＳ３１０３の説明図である。ステップＳ３１０２ではユニット行列Ｕｍにおいてユニット数記録単位行列の各要素に対応する領域に含まれるシンボルユニットの数を計測する。図２９の例ではユニット数記録単位行列Ｎｕ（Ｘ，Ｙ）におけるシンボルユニット数の計測方法を示しており，以下のステップにより実行される。
・ステップ１：Ｎｕ（Ｘ，Ｙ）に対応するユニット行列Ｕｍ上での領域を取り出す（図２９▲１▼，▲２▼）。
・ステップ２：ステップ１で取り出された領域内に埋め込まれているシンボルユニットの数を計測する（図２９▲３▼，▲４▼）。
ここで，シンボルユニットの埋め込み規則は第１の実施の形態で説明したものと同様に，入力文書画像の文字領域にはシンボルユニットは埋め込まれていないものとする。図２９の例では，この領域に埋め込まれたシンボルユニット数は７１であったものとする。
【０１２１】
ステップＳ３１０３はステップＳ３１０２で計測されたシンボルユニット数を記録用ユニット帯に記録する。以下にそのステップを示す。
・ステップ３：Ｎｕ（Ｘ，Ｙ）＝７１を二進数で表現する（図２９▲６▼）。
・ステップ４：ステップ３の結果を記録用ユニット帯の対応する領域にセットする。（図２９▲７▼，▲８▼）
【０１２２】
ここで示した例は，ユニット数記録単位行列の１行に対応するユニット行列Ｕｍの行数ｂｈを１６，記録用ユニット帯の幅Ｉｗを４としているため，ユニット数記録単位行列の各行に対して記録用のユニット数はＩｗ×ｂｈ＝４×１６＝６４となる。またユニット数記録単位行列の列数Ｂｗは４であるため，ユニット数記録単位行列の１つの要素に割り当てられる記録用のユニット数（単位記録ユニット数と呼ぶ）はＩｗ×ｂｈ／Ｂｗ＝６４／４＝８となる。したがって，ユニット記録単位行列の各行に対応する記録用ユニット帯の１〜２行目にはユニット記録単位行列の１列目の情報を，３〜４行目には２列目，５〜６行目には３列目，７〜８行目には４列目の情報をそれぞれ単位記録ユニット数（８ビット）で記録することになる。
【０１２３】
この例ではユニット数を記録しているが，ユニット記録単位行列の「各要素に対応するユニット行列Ｕｍの領域中に埋め込むことができる信号ユニット数の最大値」に対する「シンボルユニット数」の割合を記録しても良い。割合を記録する方式は，「ユニット記録単位行列の各要素に対応するユニット行列Ｕｍの範囲が大きく，その中に含まれるユニット数も多くなり，この数を表現するために必要なビット数が単位記録ユニット数を超えるような場合」や「ユニット記録単位行列の列数を増やしたため，ユニット記録単位行列の一つの要素の情報を表現するために割り当てられる単位記録ユニット数が少なくなった場合」に有効となる。また，改ざん場所の特定はユニット記録単位行列の要素単位に行うため，同じ入力文書画像に対してユニット記録単位行列の行数や列数を増やすことにより，印刷文書に対する改ざん場所の特定を詳細に行うことが可能になる利点があるが，それだけ記録用ユニット帯を大きく取るか，または単位記録ユニット数を小さくする必要がある。
【０１２４】
なお，記録用ユニット帯は文書画像の文字領域に重ならないよう，文書画像の余白部分に設定する。また，記録用ユニット帯はユニット行列Ｕｍの右端，または上端，下端に設定しても，以降の処理を「記録用ユニット帯が文書画像の上下にある」という前提で行えば同じ効果が得られる。
【０１２５】
さらに，ユニット行列Ｕｍの左右に記録用ユニット帯を設定し，それぞれ同じ情報をセットしても良い。この場合，用紙が汚れたりして片方の記録用ユニット帯の情報が読み取れなくなった場合でも，もう一方の記録用ユニット帯から情報を読み取ることにより，安定して改ざん検出等の処理を行うことができる効果がある。これは上下方向についても同様である。
【０１２６】
なお，記録用ユニット帯を文書画像のどこに設定したかは，第１の実施の形態で説明した属性記録領域に記述することにより，既知である必要はなくなる。
【０１２７】
（埋め込み信号数検出部３０１３）
以下の説明では第２の実施の形態の説明と同様に，
・電子透かし埋め込み装置３００１で文書中に埋め込んだ信号ユニットの大きさがＳｗ×Ｓｈ（画素）である。
・埋め込んだ信号ユニット数は，横×高さ＝ｎｗ×ｎｈである。
・埋め込んだシンボルユニットはユニットＡとユニットＢの二種類である。
・入力画像における信号ユニットの大きさはＳｉｗ×Ｓｉｈである。
という前提で説明を行う。
【０１２８】
図３０は埋め込み信号数検出部３０１３の説明図である。埋め込み信号数の検出は以下のステップで行う。
・ステップ１：入力画像をｎｗ×ｎｈ個のブロックに分割して，ユニット行列Ｕｍを設定する（図３０▲１▼）。
・ステップ２：ユニット行列Ｕｍの記録用ユニット帯に相当する部分を取り出す（図３０▲２▼）。
・ステップ３：記録用ユニット帯に信号検出フィルタを施すことによって埋め込んだビット列を復元する（図３０▲３▼，▲４▼）。図３０▲３▼において，記録用ユニット帯に相当するユニット行列Ｕｍの各要素に対応する入力画像上の領域に対し，２つのフィルタ（フィルタＡとフィルタＢ）の出力値を計算し，出力値が大きいほうのフィルタに対応するシンボルユニットが埋め込まれているものとする。この例ではフィルタＡの出力値が大きいためユニットＡ（シンボル０）が埋め込まれていると判定されている。
・ステップ４：復元されたビット列を基にユニット数記録単位行列を復元する（図３０▲５▼）。
【０１２９】
（フィルタ出力値算出部３０１４）
図３１はフィルタ出力値算出部３０１４の説明図である。
ここでは埋め込み信号数検出部で設定したユニット行列Ｕｍの各要素に対して，以下のステップにより信号検出フィルタの出力値を記録する。
・ステップ１：ユニット行列Ｕｍの任意の要素に対応する入力画像の領域に対して信号検出フィルタ（フィルタＡとフィルタＢ）の出力値を計算する（図３１（１））。信号検出フィルタはそれぞれ対象とする領域に対して上下左右にずらしながら出力値を計算し，フィルタＡによる出力値の最大値とフィルタＢによる出力値の最大値の大きいほうを求める。
・ステップ２：ユニット行列Ｕｍのすべての要素についてステップ１を行い，出力値をフィルタ出力値行列Ｆｍ（ｘ，ｙ），ｘ＝１〜ｎｗ，ｙ＝１〜ｎｈの対応する要素に記録する。
【０１３０】
（最適閾値判定部３０１５）
図３２は最適閾値判定部３０１５の説明図である。
ここでの閾値は，ユニット行列Ｕｍの各領域に対応する入力画像の領域にユニットシンボルが埋め込まれているかどうかを判定するための閾値（Ｔｓと呼ぶ）であり，フィルタ出力値行列の任意の要素の値が閾値Ｔｓを超えたならば，入力画像のそれに対応する位置にはシンボルユニットが埋め込まれているものと判定する。最適閾値判定は以下のステップで行われる。
・ステップ１：フィルタ出力値行列の要素（信号検出フィルタの出力値）の平均Ｆａ，標準偏差Ｆｓなどから閾値ｔｓの初期値を設定する（図３２▲１▼）。ここでは例えば初期値をｔｓ＝Ｆａ−Ｆｓ＊３とする。
・ステップ２：フィルタ出力値行列をｔｓによって二値化し，ユニット抽出画像を作成する（図３２▲２▼）。
・ステップ３：ユニット抽出画像に対してユニット数記録単位行列を当てはめる（図３２▲３▼）。
・ステップ４：ユニット抽出画像のユニット数記録単位行列の各要素に対応する領域中のシンボルユニット数を数え，ユニット数記録単位行列に記録する（図３２▲４▼）。
・ステップ５：埋め込み信号数検出部３０１３で復号された記録用ユニット帯に記録されていたシンボルユニット数とステップ４から得られたシンボルユニット数の差分の絶対値をユニット数記録単位行列の要素毎に計算し，すべての要素についての合計値をＳｆ（ｔｓ）とする（図３２▲５▼）。
・ステップ６：Ｓｆ（ｔｓ）が最小となるｔｓをＴｓとして記録する（図３２▲６▼）。
・ステップ７：ｔｓにΔｔを加え，ｔｓを更新する（図３２▲７▼）。Δｔはあらかじめ定めた値か，ステップ１で求めた標準偏差Ｆｓ（例えばΔｔ＝Ｆｓ×０．１とするなど）から算出しても良い。
・ステップ８：ｔｓが予定した値に達したならば終了する。そうでなければ，ステップ２に戻る（図３２▲８▼）。
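The threshold search in steps 2 to 8 can be sketched as a simple sweep. The sketch assumes `fm` is the filter output matrix with the recording band already removed, `recorded` holds the counts decoded from the recording band, and the sweep range and step are passed in explicitly rather than derived from the mean and standard deviation. All names are hypothetical.

```python
def count_per_block(fm, ts, bw, bh):
    """Count cells whose filter output exceeds ts in each bw x bh block."""
    rows, cols = len(fm) // bh, len(fm[0]) // bw
    return [
        [
            sum(
                1
                for y in range(by * bh, (by + 1) * bh)
                for x in range(bx * bw, (bx + 1) * bw)
                if fm[y][x] > ts
            )
            for bx in range(cols)
        ]
        for by in range(rows)
    ]


def optimal_threshold(fm, recorded, bw, bh, ts_start, ts_end, dt):
    """Return the threshold whose per-block counts best match the recorded ones."""
    best_ts, best_sf = None, float("inf")
    ts = ts_start
    while ts <= ts_end:
        counts = count_per_block(fm, ts, bw, bh)
        # Sf(ts): sum of absolute differences over all blocks.
        sf = sum(
            abs(r - c)
            for r_row, c_row in zip(recorded, counts)
            for r, c in zip(r_row, c_row)
        )
        if sf < best_sf:  # ties keep the smallest threshold (a choice of this sketch)
            best_ts, best_sf = ts, sf
        ts += dt
    return best_ts
```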
【０１３１】
（検出信号計数部３０１６）
図３３は検出信号計数部３０１６の説明図である。
この部分の処理は最適閾値判定部３０１５から得られた最適閾値Ｔｓによってフィルタ出力値行列を二値化したユニット抽出画像を用いて，最適閾値判定部３０１５とほぼ同一の処理を行う。
・ステップ１：フィルタ出力値行列をＴｓによって二値化し，ユニット抽出画像を作成する（図３３▲１▼）。
・ステップ２：ユニット抽出画像に対してユニット数記録単位行列を当てはめる（図３３▲２▼）。
・ステップ３：ユニット抽出画像のユニット数記録単位行列の各要素に対応する領域中のシンボルユニット数を数え，ユニット数記録単位行列に記録する（図３３▲３▼）。
・ステップ４：埋め込み信号数検出部３０１３で復号された記録用ユニット帯に記録されていたシンボルユニット数とステップ３から得られたシンボルユニット数の差分Ｄ（Ｘ，Ｙ）をユニット数記録単位行列の要素毎に計算する（図３３▲４▼）。ユニット数記録単位行列の任意の要素Ｎｕ（Ｘ，Ｙ）におけるＤ（Ｘ，Ｙ）は，記録用ユニット帯から復元されたユニットシンボル数をＲ（Ｘ，Ｙ），ステップ３で計測されたユニットシンボル数をＣ（Ｘ，Ｙ）としてＤ（Ｘ，Ｙ）＝Ｒ（Ｘ，Ｙ）−Ｃ（Ｘ，Ｙ）によって計算されるものとする。
【０１３２】
（改ざん判定部３０１７）
ユニット数記録単位行列の任意の要素Ｎｕ（Ｘ，Ｙ）における改ざんの判定はＤ（Ｘ，Ｙ）を用いて以下のように行う。
（１）文字が追加された改ざん：Ｄ（Ｘ，Ｙ）＞ＴＡ（ＴＡは正の整数）「記録されていたユニットシンボル数より検出されたユニットシンボル数の方が少ない場合には，本来埋め込まれていたユニットシンボルの上に文字が追加されたために検出不能になったと判断。」
（２）文字が消去された改ざん：Ｄ（Ｘ，Ｙ）＜ＴＣ（ＴＣは負の整数）「記録されていたユニットシンボル数より検出されたユニットシンボル数の方が多い場合には，本来埋め込まれていなかったユニットシンボルが検出された。これは文字を修正液などで消去した後に，画像に文字領域ではない汚れが生じ，これに信号検出フィルタが反応したものと判断。第２の実施の形態で検出できなかった文字消去による改ざんを検出するため。」
（３）改ざんなし：（１），（２）以外
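The three-way judgement above can be written as a small function. It assumes D = R − C as defined for the detected-signal counter, with TA a positive and TC a negative integer; the function name and the string labels are hypothetical.

```python
def judge_block(recorded, counted, ta, tc):
    """Classify one block from the recorded and the detected unit counts."""
    d = recorded - counted
    if d > ta:
        return "added"   # fewer units detected than recorded
    if d < tc:
        return "erased"  # more units detected than recorded
    return "none"
```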
【０１３３】
以上説明したように，本実施の形態によれば，以下の効果が得られる。
（ｆ）文書画像をいくつかのブロックに分割し，各ブロック内に埋め込んだシンボルユニット数を記録することで，ブロック単位で改ざん検出を行うことができ，改ざん場所の特定が可能となる。
（ｇ）埋め込んだ信号数を記録することで，信号検出や改ざん検出のための最適な閾値を求めることができる。
（ｈ）印刷された文書に対し，空白部分に文字列を追加したり，任意の文字列を修正液などで消去するような不正があった場合，その文書の原文がなくても改ざんが検出でき，改ざん場所も特定できる。
【０１３４】
以上，添付図面を参照しながら本発明にかかる電子透かし埋め込み装置，電子透かし検出装置，電子透かし埋め込み方法，及び，電子透かし検出方法の好適な実施形態について説明したが，本発明はかかる例に限定されない。当業者であれば，特許請求の範囲に記載された技術的思想の範疇内において各種の変更例または修正例に想到し得ることは明らかであり，それらについても当然に本発明の技術的範囲に属するものと了解される。
【０１３５】
例えば，第３の実施の形態の処理のあとに第１の実施の形態の処理を行い，最適閾値判定部３０１５で得られた最適閾値Ｔｓを第１の実施の形態のステップＳ１２０３およびステップＳ１２０４の処理（図１８）の閾値Ｔｓとして用いても良い。これによりＴｓを固定値とした場合よりも埋め込み情報の抽出の精度が向上する。
【０１３６】
同様に第３の実施の形態の処理のあとに第２の実施の形態の処理を行い，最適閾値判定部３０１５で得られた最適閾値Ｔｓを第２の実施の形態の閾値Ｔｓとして用いることも可能である。これによりＴｓを固定値とした場合よりも改ざん検出の精度が向上する。
【０１３７】
【発明の効果】
以上説明したように，本発明の主要な効果を挙げれば以下の通りである。
（ａ）透かしを挿入する文書画像の文字の配置状態を参照し，文字に重ならない領域にのみ意味のある情報（符号語のシンボル）を埋め込むため，元の文書がどのようなものであっても確実に機密情報を埋め込むことができる。
（ｂ）符号語のシンボルを埋め込まない領域には，相反する信号ユニットを同じ数だけ配置することにより，検出時にシンボルが埋め込まれていないことを確実に判定できる。
（ｃ）埋め込み情報の検出時に，ある領域に対する２つのフィルタの出力値のそれぞれの総和などによりシンボルの判定を行うため，情報検出の精度が高く保たれる。
（ｄ）埋め込んだ信号数などの属性情報をセットする属性記録領域を信号を埋め込む領域の４隅に設定することで，機密情報を検出する際に入出力デバイスのハードウェア的な誤差に影響されることなく正確に属性情報を取り出すことが可能となり，それ以降の検出精度を向上させることができる。
【０１３８】
さらに，上記発明の実施の形態で説明した応用例によれば，さらに，以下の効果を得ることができる。
（ｅ）印刷された文書に対し，主に任意の文字列を修正液などで消去するような不正があった場合，その文書の原文がなくても改ざんが検出でき，改ざん場所も特定できる。
（ｆ）文書画像をいくつかのブロックに分割し，各ブロック内に埋め込んだシンボルユニット数を記録することで，ブロック単位で改ざん検出を行うことができ，改ざん場所の特定が可能となる。
（ｇ）埋め込んだ信号数を記録することで，信号検出や改ざん検出のための最適な閾値を求めることができる。
（ｈ）印刷された文書に対し，空白部分に文字列を追加したり，任意の文字列を修正液などで消去するような不正があった場合，その文書の原文がなくても改ざんが検出でき，改ざん場所も特定できる。
【図面の簡単な説明】
【図１】電子透かし埋め込み装置及び電子透かし検出装置の構成を示す説明図である。
【図２】信号ユニットの一例を示す説明図であり，（１）はユニットＡを，（２）はユニットＢを示している。
【図３】図２（１）の画素値の変化をａｒｃｔａｎ（１／３）の方向から見た断面図である。
【図４】信号ユニットの一例を示す説明図であり，（３）はユニットＣを，（４）はユニットＤを，（５）はユニットＥを示している。
【図５】背景画像の説明図であり，（１）はユニットＥを背景ユニットと定義し，これを隙間なく並べた透かし画像の背景とした場合を示し，（２）は（１）の背景画像の中にユニットＡを埋め込んだ一例を示し，（３）は（１）の背景画像の中にユニットＢを埋め込んだ一例を示している。
【図６】ユニットパターンの一例を示す説明図である。
【図７】透かし画像形成部１００６の処理の流れを示す流れ図である。
【図８】シンボルユニット配置可否行列の一例を示す説明図である。
【図９】ユニットパターン配置可否行列の一例を示す説明図である。
【図１０】ユニットパターン行列の一例を示す説明図である。
【図１１】ユニット行列の一例を示す説明図である。
【図１２】属性記録領域の説明図である。
【図１３】ステップＳ１１０７の一例を示す説明図である。
【図１４】透かし検出部１０１１の処理の流れを示す説明図である。
【図１５】信号領域の検出方法の説明図である。
【図１６】属性領域に埋め込まれたユニット行列の大きさを復元する方法の一例を示す説明図である。
【図１７】ステップＳ１２０３とステップＳ１２０４の説明図である。
【図１８】ステップＳ１２０３とステップＳ１２０４を実現する別の方法の説明図である。
【図１９】ユニットパターン行列から符号語を取り出す方法の一例を示す説明図である。
【図２０】本発明の第２の実施の形態の構成を示す図である。
【図２１】文字消去改ざん検出部の動作を示す流れ図である。
【図２２】文字領域抽出画像の説明図である。
【図２３】ステップＳ２１０２の説明図である。
【図２４】ステップＳ２１０３の説明図である。
【図２５】改ざんがあった場合の説明図である。
【図２６】本発明の第３の実施の形態の構成を示す説明図である。
【図２７】埋め込み信号数記録部３０１２の動作を示す流れ図である。
【図２８】ステップＳ３１０１の説明図である。
【図２９】ステップＳ３１０２およびステップＳ３１０３の説明図である。
【図３０】埋め込み信号数検出部３０１３の説明図である。
【図３１】フィル列出力値算出部３０１４の説明図である。
【図３２】最適閾値判定部３０１５の説明図である。
【図３３】検出信号計数部３０１６の説明図である。
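[Explanation of Reference Numerals]
1001, 2001, 3001 Digital watermark embedding device
1002, 2002, 3002 Digital watermark detection device
1003, 2003, 3003 Document data
1004, 2004, 3004 Confidential information
1005, 2005, 3005 Document image forming unit
1006, 2006, 3006 Watermark image forming unit
1007, 2007, 3007 Watermarked document image composition unit
1008, 2008, 3008 Output device
1009, 2009, 3009 Printed document (paper)
1010, 2010, 3010 Input device
1011, 2011 Watermark detection unit
2012 Character erasure / alteration detection unit
3012 Embedded signal number recording unit
3013 Embedded signal number detection unit
3014 Filter output value calculation unit
3015 Optimum threshold value determination unit
3016 Detection signal counting unit
3017 Tampering judgment unit
[0001]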
BACKGROUND OF THE INVENTION
The present invention relates to a digital watermark embedding / detection technique that, when a printed watermarked document has been changed by adding or erasing character strings, identifies whether the document has been changed from the original and where, by inputting the document into a computer with a scanner or the like and processing it.
[0002]
[Prior art]
“Digital watermarking,” which embeds information for preventing copying and counterfeiting, or confidential information, in images and document data in a form invisible to the human eye, is based on the premise that all storage and data transfer take place on electronic media. Because the information embedded by the watermark is not degraded or lost, it can be detected reliably. Similarly, for documents printed on paper media, there is a need for a method of embedding confidential information in the printed document in a form other than text that is not visually obtrusive and cannot easily be tampered with, in order to prevent the document from being illicitly altered or copied.
[0003]
The following techniques are known as digital watermark embedding methods for black and white binary documents that are most widely used as printed materials.
[0004]
[1] Japanese Patent Laid-Open No. 9-179494 “Confidential Information Recording Method”
Assume that printing is performed with a printer of 400 dpi or more. Information is digitized and information is expressed by the distance (number of dots) between the reference point mark and the position determination mark.
[0005]
[2] Japanese Patent Application Laid-Open No. 2001-78006 “Method and apparatus for embedding / detecting watermark information in a monochrome binary document image”
The minimum rectangle surrounding an arbitrary character string is divided into several blocks, and the blocks are divided into two groups (group 1 and group 2; there may be three or more groups). For example, when the signal is 1, the feature amount in each block of group 1 is increased and the feature amount in each block of group 2 is decreased. When the signal is 0, the reverse operation is performed. The feature amount of a block is, for example, the number of pixels in the character area, the thickness of the characters, or the distance to the first point at which a vertical scan of the block hits the character area.
[0006]
[3] JP 2001-53954 A "Information embedding device, information reading device, digital watermark system, information embedding method, information reading method and recording medium"
The width and height of the minimum rectangle that encloses one character are defined as the feature amounts of that character, and a symbol is represented by the classification pattern of the magnitude relationships of the feature amounts among two or more characters. For example, six feature amounts can be defined from three characters; the combinations of patterns of their magnitude relationships are enumerated, these combinations are classified into two groups, and a symbol is assigned to each group. If the information to be embedded is symbol 0 but the combination pattern of the feature amounts of the characters selected to represent it corresponds to symbol 1, one of the six feature amounts is changed, for example by expanding the character area. The pattern to change is selected so that the amount of change is minimized.
[0007]
[4] Japanese Patent Application No. 10-200743 “Document Processing Device”
Information is expressed by whether or not the screen lines of the line screen (special screen composed of fine parallel lines) are moved backward.
[0008]
[Problems to be solved by the invention]
When a printed document is changed, for example by overwriting or erasing characters, comparison with the original is necessary in order to find the changed parts. Changes made by handwriting are easy to spot because they differ from the printed characters, but when text is overprinted in the same font, for example, the changed parts become difficult to distinguish. Some of the conventional examples [1] to [4] determine whether tampering has occurred from the fact that the information embedded in the printed document can no longer be read correctly. However, the information also becomes unreadable when the paper is stained or when noise is added during printing or reading, so the detection accuracy is not high.
[0009]
The present invention has been made in view of the above problems of conventional digital watermark embedding / detection techniques, and an object of the present invention is to provide a new and improved digital watermark embedding device, digital watermark detection device, digital watermark embedding method, and digital watermark detection method capable of improving the detection accuracy of confidential information.
[0010]
[Means for Solving the Problems]
In order to solve the above problems, according to a first aspect of the present invention, there is provided a digital watermark embedding apparatus that embeds confidential information in a document image by digital watermarking. The digital watermark embedding device (1001) of the present invention includes a watermark image forming unit (1006) that creates a watermark image based on the confidential information (1004) with reference to the document image. The watermark image forming unit (1006) is characterized in that it calculates embedding areas for embedding, in the document image, dot patterns from which a predetermined symbol can be identified by a predetermined filter, determines whether the ratio of the character area to each embedding area is equal to or less than a predetermined threshold, and, when the ratio of the character area is equal to or less than the predetermined threshold, embeds a predetermined number of dot patterns (symbol units), from which a symbol forming at least a part of the confidential information can be identified, in the region of the embedding area that does not overlap the character area.
[0011]
Further, the watermark image forming unit (1006) is characterized in that, in an embedding area where the ratio of the character area exceeds the predetermined threshold, it embeds a predetermined number of the symbol units of a plurality of types in the region that does not overlap the character area.
[0012]
Further, the watermark image forming unit (1006) is characterized in that it embeds a dot pattern (background unit), which represents a symbol unrelated to the confidential information, in the region of the embedding area that overlaps the character area.
[0013]
In order to solve the above problem, according to a second aspect of the present invention, there is provided a digital watermark detection apparatus for detecting confidential information embedded in a document image by a digital watermark. In the digital watermark detection apparatus (1002) of the present invention, the document image is divided into a plurality of embedding areas for embedding dot patterns from which a predetermined symbol can be identified by a predetermined filter, and the confidential information has been embedded by embedding, in each embedding area, dot patterns (symbol units) from which a symbol forming at least a part of the confidential information can be identified, or dot patterns (background units) representing a symbol unrelated to the confidential information.
[0014]
The apparatus includes a watermark detection unit (1011) that detects the confidential information. The watermark detection unit has a plurality of types of filters, each capable of identifying a predetermined symbol from the dot pattern, performs matching on each embedding area with the plurality of types of filters, and detects at least a part of the confidential information corresponding to one filter from an embedding area in which the number of matches of that filter is much larger than the number of matches of every other filter.
[0015]
The watermark detection unit may be configured not to detect the confidential information from an embedding area in which there is no difference between the match counts of the filters.
[0016]
In order to solve the above problem, according to a third aspect of the present invention, there is provided a digital watermark embedding method for embedding confidential information in a document image by digital watermark. The digital watermark embedding method of the present invention includes the following first to fifth steps.
(1) A first step of dividing the document image into a plurality of embedding areas for embedding a dot pattern capable of identifying a predetermined symbol by a predetermined filter.
(2) A second step of determining, for each of the embedding areas, whether or not the character area ratio is equal to or less than a predetermined threshold value.
(3) A dot pattern (symbol unit) that can identify a symbol that forms at least a part of the confidential information in an area that does not overlap the character area of the embedded area when the ratio of the character area is equal to or less than a predetermined threshold. A third step of embedding a predetermined number.
(4) A fourth step of embedding a plurality of types and a predetermined number of the symbol units in an area that does not overlap the character area in the embedding area where the ratio of the character area exceeds a predetermined threshold.
(5) A fifth step of embedding a dot pattern (background unit), which is a symbol unrelated to the confidential information, in an area overlapping the character area of the embedding area.
[0017]
In order to solve the above problems, according to a fourth aspect of the present invention, there is provided a digital watermark detection method for detecting confidential information embedded in a document image with a digital watermark. In the electronic watermark detection method of the present invention, the document image is divided into a plurality of embedding areas for embedding a dot pattern capable of identifying a predetermined symbol by a predetermined filter, and each embedded area includes at least the confidential information. The confidential information is embedded by embedding a dot pattern (symbol unit) that can identify a part of the symbol or a dot pattern (background unit) that is a symbol unrelated to the confidential information.
[0018]
Then, for each of the embedding areas, matching is performed using a plurality of types of filters. From an embedding area in which the number of matches of one filter is much larger than the number of matches of every other filter, at least a part of the confidential information corresponding to that filter is detected, and the confidential information is not detected from an embedding area in which there is no difference between the match counts of the filters.
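The decision rule of paragraphs [0014], [0015], and [0018] can be illustrated with a minimal sketch. The function name `decide_symbol` and the `margin` parameter are hypothetical: the text only requires one filter's match count to be "much larger" than all the others and gives no number, so the margin here is an assumed stand-in.

```python
def decide_symbol(match_counts, margin=2):
    """Return the index of the dominant filter, or None if no filter dominates.

    match_counts[i] is the number of matches of filter i in one embedding area.
    `margin` is an assumed stand-in for "much larger"; the source gives no value.
    """
    best = max(range(len(match_counts)), key=lambda i: match_counts[i])
    others = [c for i, c in enumerate(match_counts) if i != best]
    # The winning filter must beat every other filter by at least `margin`.
    if all(match_counts[best] - c >= margin for c in others):
        return best
    return None  # no clear difference: no confidential information detected


print(decide_symbol([7, 1]))  # filter 0 dominates
print(decide_symbol([4, 4]))  # no difference between the filters
```

Returning `None` rather than guessing mirrors the statement that nothing is detected from an area where the match counts do not differ.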
[0019]
According to the digital watermark embedding device, the digital watermark detection device, the digital watermark embedding method, and the digital watermark detection method described above, the following effects can be obtained.
(A) Because the character arrangement of the document image into which the digital watermark is inserted is referenced, and valid symbol units forming part of the confidential information are embedded only in areas other than the character area, the confidential information can be embedded reliably whatever the original document is.
(B) By embedding a predetermined number of symbol units of a plurality of types in an area where no codeword symbol is embedded, it can be reliably determined at detection time that no valid symbol unit forming part of the confidential information is embedded there.
(C) When detecting embedded information, symbols are determined based on the sum of output values of a plurality of types of filters for a certain area, so that the accuracy of information detection is kept high.
[0020]
Further, it is preferable that the watermarked document image composition unit records various attribute information at the time of embedding and / or detecting the confidential information at four corners of the watermarked document image. According to this configuration, the following effects can be obtained.
(D) By placing attribute recording areas, which hold attribute information such as the number of embedded signals, at the four corners of the signal embedding area, the attribute information can be extracted accurately when the confidential information is detected, without being affected by hardware errors of the input / output devices, and the accuracy of subsequent detection can be improved.
[0021]
Furthermore, a function for detecting tampering can be added to the digital watermark detection apparatus. That is, another digital watermark detection apparatus (2001) of the present invention further includes a character erasure / alteration detection unit (2012). The character erasure / alteration detection unit binarizes the document image in which the confidential information is embedded with a predetermined threshold to create a character area extraction image in which the pixel value of the character area is 0 and the pixel value of the background area is 1; creates a symbol unit extraction image in which the pixel value of regions of that document image where symbol units cannot be detected is 0 and the pixel value of regions where symbol units can be detected is 1; and detects tampering of the watermarked document image by comparing the character area extraction image with the symbol unit extraction image (by generating a difference image). According to such a configuration, the following effect can be further obtained.
(E) When a printed document has been tampered with by erasing an arbitrary character string, the tampering can be detected and its location identified even without the original of the document.
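The comparison of the two extraction images can be sketched as follows. This is an illustration only: the function name `difference_image` is hypothetical, both images are assumed to be already aligned and of equal size, and a per-element exclusive OR is assumed as the "difference image", which the text does not specify in this paragraph.

```python
def difference_image(char_img, unit_img):
    """Element-wise XOR of two equally sized 0/1 images (lists of rows).

    char_img: 0 = character area, 1 = background area.
    unit_img: 0 = symbol unit not detectable, 1 = symbol unit detectable.
    A 1 in the result marks a position where the two images disagree.
    """
    return [[c ^ u for c, u in zip(crow, urow)]
            for crow, urow in zip(char_img, unit_img)]


# Untouched text: characters (0) coincide with undetectable units (0).
# Erased text: the page now looks like background (1) but no unit is found (0).
char_img = [[1, 0, 1],
            [1, 1, 1]]   # the character at row 1, column 1 has been erased
unit_img = [[1, 0, 1],
            [1, 0, 1]]
for row in difference_image(char_img, unit_img):
    print(row)
```

Positions holding 1 in the result are candidates for the tampering location.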
[0022]
Furthermore, a function for detecting tampering can be added to the digital watermark detection apparatus by inserting additional information for tampering detection together with the confidential information. That is, another digital watermark detection apparatus (3001) of the present invention further comprises: an embedded signal number detection unit (3013) that detects the number of the symbol units that were embedded when the confidential information was embedded in the document image; a filter output value calculation unit (3014) that calculates the output values of the predetermined filters for the input image and records them for each embedding area; an optimum threshold value determination unit (3015) that calculates, from the detection value of the embedded signal number detection unit and the values calculated by the filter output value calculation unit, an optimum threshold for detecting the number of the symbol units embedded in the watermark image; a detection signal counting unit (3016) that detects the number of symbol units actually embedded in the watermarked document image; and a tampering judgment unit (3017) that determines whether the watermarked document image has been tampered with by comparing the detection value of the embedded signal number detection unit with the count value of the detection signal counting unit.
[0023]
According to such a configuration, the following effects can be further obtained.
(F) By dividing the document image into several blocks and recording the number of symbol units embedded in each block, alteration detection can be performed in units of blocks, and the alteration location can be specified.
(G) By recording the number of embedded signals, it is possible to obtain an optimum threshold for signal detection and tamper detection.
(H) When a printed document has been tampered with, for example by adding a character string to a blank area or erasing an arbitrary character string with correction fluid, the tampering can be detected and its location identified even without the original of the document.
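Effects (F) to (H) rest on comparing, block by block, the recorded number of embedded symbol units with the number actually detected. A minimal sketch of that comparison is given below; the function name `tampered_blocks` and the `tolerance` parameter are hypothetical, and how much disagreement should count as tampering is an assumption that the text does not fix here.

```python
def tampered_blocks(recorded_counts, detected_counts, tolerance=0):
    """Return indices of blocks whose detected count differs from the record.

    recorded_counts[i]: number of symbol units recorded for block i at embedding.
    detected_counts[i]: number of symbol units counted in block i at detection.
    tolerance: assumed allowance for print/scan noise (not given in the source).
    """
    return [
        i
        for i, (rec, det) in enumerate(zip(recorded_counts, detected_counts))
        if abs(rec - det) > tolerance
    ]


print(tampered_blocks([10, 8, 12], [10, 5, 13]))
print(tampered_blocks([10, 8, 12], [10, 5, 13], tolerance=1))
```

Because the result is a list of block indices, it directly indicates where in the document the suspected change lies.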
[0024]
DETAILED DESCRIPTION OF THE INVENTION
Exemplary embodiments of a digital watermark embedding device, a digital watermark detection device, a digital watermark embedding method, and a digital watermark detection method according to the present invention will be described below in detail with reference to the accompanying drawings. In the present specification and drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.
[0025]
(First embodiment)
FIG. 1 is an explanatory diagram showing the configuration of the digital watermark embedding device and the digital watermark detection device according to the first embodiment of the present invention. First, the digital watermark embedding device 1001 will be described.
[0026]
(Digital watermark embedding apparatus 1001)
An electronic watermark embedding device 1001 is a device that forms a document image based on document data 1003 and confidential information 1004 embedded in the document, and prints it on a paper medium. The document data 1003 is data including font information and layout information, and is data created by a document creation tool such as word processing software. The confidential information 1004 is information embedded in a paper medium in a format other than characters, and is various data such as characters, images, and sounds.
[0027]
As shown in FIG. 1, the digital watermark embedding apparatus 1001 includes a document image forming unit 1005, a watermark image forming unit 1006, a watermarked document image synthesizing unit 1007, and an output device 1008.
[0028]
(Document Image Forming Unit 1005)
The document image forming unit 1005 creates an image of the document data 1003 as it would appear when printed on paper. Specifically, a white pixel region in the document image is a portion where nothing is printed, and a black pixel region is a portion where black ink is applied. In this embodiment, the description assumes printing on white paper with black ink (single color). However, the present invention is not limited to this and can be applied in the same way to color (multicolor) printing.
[0029]
(Watermark image forming unit 1006)
The watermark image forming unit 1006 digitizes the confidential information 1004, converts it into numerical values, and N-ary encodes them (N is a natural number of 2 or more; the encoded symbol string is referred to as a “code word”). It then assigns each symbol of the code word to a watermark signal prepared in advance. This embodiment is characterized by the watermark embedding operation of the watermark image forming unit 1006. That is, the watermark image forming unit 1006 refers to the document data 1003 when embedding the confidential information 1004 in it, so that the confidential information 1004 can be embedded reliably whatever the document data 1003 is. The watermark signal and its embedding operation will be described later.
[0030]
(Watermarked document image composition unit 1007, output device 1008)
The watermarked document image composition unit 1007 creates a watermarked document image by superimposing the document image and the watermark image. The output device 1008 is an output device such as a printer, and prints a watermarked document image on a paper medium. Therefore, the document image forming unit 1005, the watermark image forming unit 1006, and the watermarked document image synthesizing unit 1007 may be realized as one function in the printer driver.
[0031]
The print document 1009 is printed by embedding confidential information 1004 in the original document data 1003, and is physically stored and managed.
[0032]
The digital watermark embedding apparatus 1001 is configured as described above.
Next, a watermark signal used when the watermark image forming unit 1006 embeds confidential information 1004 in the document data 1003 will be described.
[0033]
(Signal unit)
The watermark signal represents a wave having an arbitrary wavelength and direction by an arrangement of dots (black pixels). Hereinafter, a rectangle having a width and a height of Sw and Sh is referred to as a “signal unit” as one signal unit. FIG. 2 is an explanatory diagram showing an example of the signal unit.
[0034]
Although the width Sw and the height Sh may differ, in the present embodiment Sw = Sh for ease of explanation. The unit of length is the number of pixels. In the example of FIG. 2, Sw = Sh = 12. The size of these signals when printed on paper depends on the resolution of the watermark image. For example, if the watermark image is a 600 dpi image (dpi, dots per inch, is a unit of resolution giving the number of dots per inch), the width and height of the signal unit of FIG. 2 on the printed document are 12 / 600 = 0.02 (inch).
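The size calculation can be checked with a short sketch. The function name `printed_size_inches` is hypothetical; the numbers are those of the FIG. 2 example, and the millimetre conversion is added only for orientation.

```python
def printed_size_inches(pixels, dpi):
    """Printed length in inches of `pixels` dots at a resolution of `dpi`."""
    return pixels / dpi


size_in = printed_size_inches(12, 600)  # Sw = Sh = 12 pixels at 600 dpi
size_mm = size_in * 25.4                # 1 inch = 25.4 mm
print(size_in, round(size_mm, 3))
```

At this resolution a signal unit is about half a millimetre on a side, which is why a page of units reads as a uniform gray background.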
[0035]
In FIG. 2 (1), the dots are densely spaced in the direction of arctan (3) (arctan is the inverse function of tan) with respect to the horizontal axis, and the wave propagation direction is arctan (−1/3). Hereinafter, this signal unit is referred to as unit A. In FIG. 2 (2), the dots are densely spaced in the direction of arctan (−3) with respect to the horizontal axis, and the wave propagation direction is arctan (1/3). Hereinafter, this signal unit is referred to as unit B.
[0036]
FIG. 3 is a cross-sectional view of the change in pixel value in FIG. 2 (1) as viewed from the direction of arctan (1/3). In FIG. 3, the portions where dots are arranged are the antinodes at the minimum value of the wave (the points of maximum amplitude), and the portions where no dots are arranged are the antinodes at the maximum value of the wave.
[0037]
In addition, since each unit contains two regions in which dots are densely arranged, the frequency per unit is 2 in this example. Since the wave propagation direction is perpendicular to the direction in which the dots are densely arranged, the wave of unit A propagates in the direction arctan (−1/3) with respect to the horizontal, and the wave of unit B in the direction arctan (1/3). Note that a × b = −1 when the direction of arctan (a) and the direction of arctan (b) are perpendicular.
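The perpendicularity condition a × b = −1 can be checked numerically for the unit A and unit B directions. The helper names below are hypothetical and serve only as a sanity check of the stated relation.

```python
import math


def is_perpendicular(a, b, tol=1e-9):
    """True if the directions arctan(a) and arctan(b) are at right angles."""
    return abs(a * b + 1.0) < tol


def angle_between_deg(a, b):
    """Angle in degrees between the directions arctan(a) and arctan(b)."""
    return abs(math.degrees(math.atan(a) - math.atan(b)))


# Unit A: dots dense along arctan(3), wave propagates along arctan(-1/3).
print(is_perpendicular(3, -1 / 3), angle_between_deg(3, -1 / 3))
# Unit B: dots dense along arctan(-3), wave propagates along arctan(1/3).
print(is_perpendicular(-3, 1 / 3), angle_between_deg(-3, 1 / 3))
```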
[0038]
In addition to the signal units shown in FIGS. 2 (1) and (2), dot arrangements such as those shown in FIGS. 4 (3) to (5) are conceivable. In FIG. 4 (3), the dots are densely spaced in the direction of arctan (1/3) with respect to the horizontal axis, and the wave propagation direction is arctan (−3). Hereinafter, this signal unit is referred to as unit C.
In FIG. 4 (4), the dots are densely spaced in the direction of arctan (−1/3) with respect to the horizontal axis, and the wave propagation direction is arctan (3). Hereinafter, this signal unit is referred to as unit D. In FIG. 4 (5), the dots are densely spaced in the direction of arctan (1) with respect to the horizontal axis, and the wave propagation direction is arctan (−1). FIG. 4 (5) can equally be regarded as having dots densely spaced in the direction of arctan (−1) with respect to the horizontal axis and a wave propagation direction of arctan (1). Hereinafter, this signal unit is referred to as unit E.
[0039]
(Symbol unit)
By assigning codeword symbols to the signal unit and embedding the signal unit in the watermark image, the confidential information 1004 can be embedded in the watermark image. Hereinafter, a signal unit to which a codeword symbol is assigned is referred to as a “symbol unit”.
[0040]
The number of symbol unit types required is determined by the base N used when the confidential information 1004 is converted into a code word. When the confidential information is binary encoded (N = 2), two types of symbol units (for example, unit A and unit B) are prepared; for example, symbol 0 can be assigned to unit A and symbol 1 to unit B. When the confidential information is quaternary encoded (N = 4), four types of symbol units (for example, unit A, unit B, unit C, and unit D) are prepared; for example, symbol 0 can be assigned to unit A, symbol 1 to unit B, symbol 2 to unit C, and symbol 3 to unit D.
[0041]
(Background unit)
Further, for example, a symbol unrelated to the codeword symbols (for example, symbol N when the confidential information is N-ary encoded) can be assigned to unit E, which is then defined as the background unit and arranged without gaps to form the background of the watermark image. Hereinafter, a signal unit to which a symbol unrelated to the codeword symbols is assigned is referred to as a “background unit”. When background units are arranged without gaps and symbol units are embedded among them, the background unit at each embedding position is replaced with the symbol unit to be embedded.
[0042]
FIG. 5 (1) is an explanatory diagram showing a case where the unit E is defined as a background unit and arranged as a background of a watermark image without gaps. FIG. 5 (2) shows an example in which the unit A as a symbol unit is embedded in the background image of FIG. 5 (1), and FIG. 5 (3) shows the symbol unit in the background image of FIG. 5 (1). An example in which the unit B is embedded is shown.
[0043]
In the example shown in FIG. 5, since the number of dots in each signal unit is all equal, by arranging these signal units without gaps, the apparent density of the watermark image becomes uniform. Therefore, it appears that a gray image having a single density is embedded as a background on the printed paper. There are an infinite number of combinations of symbol assignments for signal units. In this way, it is possible to prevent a third party (illegal person) from easily deciphering the watermark signal.
[0044]
(Unit pattern)
It is also possible to embed the confidential information 1004 in the watermark image simply by arranging the corresponding symbol unit for each symbol of the code word obtained by encoding the confidential information 1004. In the present embodiment, in order to further prevent unauthorized decoding by a third party, an arrangement pattern of signal units (hereinafter referred to as a “unit pattern”) is defined for each symbol of the code word, and a method of embedding the confidential information 1004 in the watermark image by arranging these unit patterns is described. The concept of the unit pattern will be described with reference to FIG. 6.
[0045]
FIG. 6 shows an example of a unit pattern employed in the present embodiment and symbols represented by the unit pattern. Here, one unit pattern is a matrix of signal units of width (column) × height (row) = 4 × 2. Also, the background unit is unit E (symbol 2), and the symbol units embedded therein are unit A (symbol 0) and unit B (symbol 1).
In FIG. 6 (1), at least a predetermined threshold number (for example, 6) of unit A (symbol 0) are arranged, and the unit pattern as a whole represents symbol 0.
In FIG. 6 (2), at least a predetermined threshold number (for example, 6) of unit B (symbol 1) are arranged, and the unit pattern as a whole represents symbol 1.
In FIG. 6 (3), unit A and unit B are arranged in substantially equal numbers (the same number, or one type exceeding the other by only one), and the unit pattern as a whole represents symbol 2.
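The three cases of FIG. 6 can be summarized in a small sketch that reads the symbol of one unit pattern from the signal units it contains. The function name `unit_pattern_symbol`, the representation of units as the strings "A", "B", and "E", and the `None` result for combinations the text does not describe are all assumptions made for illustration.

```python
def unit_pattern_symbol(units, threshold=6):
    """Return the symbol (0, 1 or 2) represented by one 4 x 2 unit pattern.

    units: flat list of the 8 signal units in the pattern, each "A", "B" or "E".
    threshold: minimum number of identical symbol units (6 in the example).
    """
    count_a = sum(1 for u in units if u == "A")
    count_b = sum(1 for u in units if u == "B")
    if count_a >= threshold:
        return 0  # FIG. 6 (1): mostly unit A
    if count_b >= threshold:
        return 1  # FIG. 6 (2): mostly unit B
    if abs(count_a - count_b) <= 1:
        return 2  # FIG. 6 (3): A and B in roughly equal numbers
    return None   # case not covered by the description (assumption)


print(unit_pattern_symbol(["A"] * 6 + ["E"] * 2))
print(unit_pattern_symbol(["A"] * 3 + ["B"] * 3 + ["E"] * 2))
```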
[0046]
The configuration of the digital watermark embedding apparatus 1001 and the watermark signal have been described above. Next, the operation of the digital watermark embedding apparatus 1001 will be described with reference to FIGS.
[0047]
First, the document image forming unit 1005 creates, for each page, an image in which the document is printed on paper based on the document data 1003. This document image will be described as a monochrome binary image. In the following description, a “character area” of a document image is an area to which ink (toner) is applied when printed by a printer, and is a pixel having a luminance value of 0 (black pixel) on the document image. The “background region” is a region where ink (toner) is not applied when printing with a printer, and indicates a pixel (white pixel) having a luminance value of 1 in the document image.
[0048]
The watermark image forming unit 1006 creates a watermark image to be superimposed as a background of the document image from the confidential information 1004. The operation of the watermark image forming unit 1006 will be described below with reference to FIG. FIG. 7 is an explanatory diagram showing a processing flow of the watermark image forming unit 1006.
[0049]
(Step S1101)
In step S1101, the confidential information 1004 is converted into an N-ary code. N is arbitrary, but in the following, for simplicity, N = 2 is assumed (the confidential information 1004 is converted into a binary code). Accordingly, the code word generated in step S1101 is represented by a bit string of 0s and 1s. In step S1101, the confidential information 1004 may be encoded as it is, or it may be encrypted first and then encoded.
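For N = 2, step S1101 amounts to turning the confidential information into a bit string. A minimal sketch follows; the function name `to_codeword`, the byte-wise input, and the most-significant-bit-first order are assumptions, since the text does not prescribe a particular binary representation.

```python
def to_codeword(data: bytes) -> list:
    """Convert bytes into a list of bits (0/1), most significant bit first."""
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


codeword = to_codeword(b"A")  # 0x41
print(codeword)
```

If the information is encrypted first, as the paragraph allows, the same conversion would simply be applied to the ciphertext bytes.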
[0050]
(Step S1102)
In step S1102, a unit pattern is assigned to each symbol of the code word as shown in FIG.
[0051]
(Step S1103)
In step S1103, a symbol unit arrangement availability matrix is defined. The symbol unit arrangement availability matrix corresponds to the document image divided into blocks of Sw (width) × Sh (height) pixels, and each element indicates whether a symbol unit can be embedded in the corresponding block of the document image. Because a symbol unit inserted in the character area cannot be detected, this matrix designates in advance the places where symbol units can be embedded. If the value of a matrix element is 1, a symbol unit can be embedded in the corresponding block of the document image; if the value is 0, a background unit is embedded. Here, Sw and Sh are the width and height of the signal unit, respectively. If the size of the input document image is W × H, the number of elements of the unit matrix Um is
width (columns) × height (rows) = Mw × Mh = (W / Sw) × (H / Sh).
[0052]
Each element of the symbol unit arrangement availability matrix is determined by whether a character area exists in the corresponding block of the document image. For example, an arbitrary element (X, Y) (row Y, column X) of the symbol unit arrangement availability matrix is set to 1 if the character area (pixels with a luminance value of 0) contained in the region x = X × Sw to (X + 1) × Sw, y = Y × Sh to (Y + 1) × Sh of the input document image is Tn pixels or fewer, and to 0 if it is more than Tn pixels. Tn is a threshold and is a small number not exceeding Sw × Sh × 0.5.
[0053]
FIG. 8 shows an example of creating a symbol unit arrangement availability matrix. FIG. 8 (1) shows the blocks corresponding to the elements of the symbol unit arrangement availability matrix superimposed on the input document image. FIG. 8 (2) shows that the value of a block is set to 0 when the block contains a character area. FIG. 8 (3) shows the value of each element of the symbol unit arrangement availability matrix as determined from the character area determination result.
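Step S1103 can be sketched as follows for a binary image stored as a list of rows (0 = black character pixel, 1 = white background pixel). The function name `symbol_unit_availability` is hypothetical, and dropping any partial blocks at the right and bottom edges is an assumption, since the text does not say how a remainder is handled.

```python
def symbol_unit_availability(image, sw, sh, tn):
    """Build the symbol unit arrangement availability matrix.

    image: list of rows of 0/1 pixels (0 = character area, 1 = background).
    sw, sh: width and height of a signal unit in pixels.
    tn: a block is usable (1) if it has at most tn character pixels.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    mw, mh = width // sw, height // sh  # Mw x Mh = (W / Sw) x (H / Sh)
    matrix = []
    for by in range(mh):
        row = []
        for bx in range(mw):
            black = sum(
                1
                for y in range(by * sh, (by + 1) * sh)
                for x in range(bx * sw, (bx + 1) * sw)
                if image[y][x] == 0
            )
            row.append(1 if black <= tn else 0)
        matrix.append(row)
    return matrix


image = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
print(symbol_unit_availability(image, sw=2, sh=2, tn=0))
```

In the toy image only the upper-right 2 × 2 block contains a character pixel, so only that element becomes 0.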
[0054]
(Step S1104)
In step S1104, a unit pattern arrangement availability matrix is created. The value of an element is 1 when a unit pattern can be inserted into the region of the document image corresponding to that element, and 0 when it cannot. If a unit pattern is defined as a matrix of signal units of width (columns) × height (rows) = 4 × 2, whether a unit pattern can be inserted is determined as follows. First, the symbol unit arrangement availability matrix shown in FIG. 8 (3) is divided into 4 × 2 regions. If, of the eight signal units constituting one region, at least a predetermined threshold Tu (Tu is about 6) can hold a symbol unit (the value of the symbol unit arrangement availability matrix is 1), the unit pattern can be embedded. Otherwise, the unit pattern cannot be embedded.
[0055]
FIG. 9 is a diagram for explaining an example of the process of creating the unit pattern arrangement availability matrix. FIG. 9 (1) shows that one unit pattern is composed of eight signal units. FIG. 9 (2) shows that 1 is assigned to each unit pattern for which Tu (= 6) or more of the corresponding elements of the symbol unit arrangement availability matrix are 1, and 0 is assigned to the other unit patterns. FIG. 9 (3) shows the resulting value of each element of the unit pattern arrangement availability matrix.
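As an illustration of step S1104, the following sketch counts the embeddable signal units in each 4 × 2 region and compares the count with Tu. The function name and the parameter names uw and uh (unit pattern width and height in signal units) are assumptions introduced for this example.

```python
def unit_pattern_availability(sym_avail, uw=4, uh=2, tu=6):
    """sym_avail: symbol unit arrangement availability matrix (rows of 0/1).
    Returns a matrix with one 0/1 element per uw x uh unit pattern."""
    rows, cols = len(sym_avail) // uh, len(sym_avail[0]) // uw
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Number of signal units in this region that can hold a symbol unit.
            ones = sum(
                sym_avail[r * uh + dy][c * uw + dx]
                for dy in range(uh)
                for dx in range(uw)
            )
            out_row.append(1 if ones >= tu else 0)
        out.append(out_row)
    return out


sym_avail = [
    [1, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 0, 1],
]
# Left region has 7 embeddable units, right region has 3.
print(unit_pattern_availability(sym_avail))  # [[1, 0]]
```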
[0056]
(Step S1105)
In step S1105, a unit pattern matrix is created with reference to the unit pattern arrangement availability matrix. The symbols of the codeword are set repeatedly in the unit pattern matrix, but are not set in elements where a unit pattern cannot be embedded. For example, as shown in FIG. 10, assume that the size of the unit pattern matrix and the unit pattern arrangement availability matrix is Pw × Ph = 4 × 4 and that the codeword is the 4 bits (0011). In this figure, since the value of the element in the first row and second column of the unit pattern arrangement availability matrix is 0, the second bit of the codeword (symbol 0) is not set there; symbol 2 is set instead, and the second bit of the codeword is set in the first row, third column.
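The assignment in step S1105 can be sketched as below. The row-major scan order and the restart of the codeword after its last bit are assumptions made for this sketch; the function name is hypothetical.

```python
NO_SYMBOL = 2  # symbol 2 marks an element that carries no codeword symbol


def build_unit_pattern_matrix(pattern_avail, codeword):
    """Lay the codeword symbols out repeatedly, skipping unavailable elements."""
    matrix, i = [], 0
    for row in pattern_avail:
        out = []
        for ok in row:
            if ok:
                out.append(codeword[i % len(codeword)])
                i += 1  # advance only when a symbol was actually placed
            else:
                out.append(NO_SYMBOL)
        matrix.append(out)
    return matrix


avail = [
    [1, 0, 1, 1],
    [1, 1, 1, 1],
]
print(build_unit_pattern_matrix(avail, [0, 0, 1, 1]))
# [[0, 2, 0, 1], [1, 0, 0, 1]]
```

In the first row the second codeword bit is deferred to the third column, as in the example of FIG. 10.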
[0057]
(Step S1106)
In step S1106, a unit matrix Um is created based on the unit pattern matrix and the symbol unit arrangement availability matrix. The unit matrix Um is the same size as the symbol unit arrangement availability matrix and describes the arrangement pattern of signal units. The rules for the arrangement of signal units are defined as follows.
[0058]
Step 1: A background unit (symbol 2) is set at a position where the element is 0 in the symbol unit arrangement availability matrix (FIG. 11 (1)).
Step 2: If the element of the unit pattern matrix is a codeword symbol, the symbol unit corresponding to the symbol is set in the corresponding region of the unit matrix Um (FIG. 11 (2)).
Step 3: If the element of the unit pattern matrix is not a codeword symbol (the unit pattern arrangement availability matrix has a value of 0), the same number of symbol units representing 0 and symbol units representing 1 are set (FIG. 11 (3)).
Step 4: A background unit is set in an area where no signal unit is set (FIG. 11 (4)).
[0059]
In summary, a background symbol is set in the character area. If the background area of a given unit pattern is Tu (= 6) units or more, a codeword symbol is assigned to it; otherwise, the two types of symbol units are assigned to the background area in equal numbers. If the background area has an odd number of units, a background symbol is set for the remaining one. As a result, a unit pattern to which a codeword symbol is assigned contains six or more identical symbol units, so at detection time the total output value of the filter for the embedded symbol unit is much larger than the total output value of the other filter. In a unit pattern to which no codeword symbol is assigned, the difference between the total output values of the two filters is small. It is therefore easy to determine whether or not a codeword symbol is assigned to a unit pattern.
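The four rules of step S1106 can be sketched as follows. The values 0 and 1 for the two symbol units and 2 for the background unit, the alternating order used to balance the two symbol units, and the function name are all assumptions for illustration; the text fixes only that the two counts must be equal.

```python
BG = 2  # background unit (symbol 2)


def build_unit_matrix(sym_avail, pattern_matrix, uw=4, uh=2):
    # Steps 1 and 4: start with background units everywhere.
    um = [[BG] * len(row) for row in sym_avail]
    for r, prow in enumerate(pattern_matrix):
        for c, sym in enumerate(prow):
            # Embeddable cells of this unit pattern, in row-major order.
            cells = [
                (r * uh + dy, c * uw + dx)
                for dy in range(uh)
                for dx in range(uw)
                if sym_avail[r * uh + dy][c * uw + dx] == 1
            ]
            if sym in (0, 1):
                # Step 2: fill every embeddable cell with the codeword symbol.
                for y, x in cells:
                    um[y][x] = sym
            else:
                # Step 3: equal numbers of 0-units and 1-units; an odd
                # leftover cell keeps its background unit.
                usable = len(cells) - len(cells) % 2
                for k, (y, x) in enumerate(cells[:usable]):
                    um[y][x] = k % 2
    return um


sym_avail = [
    [1, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 1, 0],
]
um = build_unit_matrix(sym_avail, [[1, 2]])
print(um)  # [[1, 1, 1, 1, 0, 2, 2, 1], [1, 1, 2, 1, 2, 2, 2, 2]]
```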
[0060]
(Recording embedding conditions)
The size of the signal unit in the input image in the digital watermark detection apparatus 1002 can be calculated from the size of the signal unit set in the digital watermark embedding apparatus 1001 and the ratio between the resolution of the output device and the resolution of the input device.
[0061]
For example, let
the size of the signal unit embedded in the document by the digital watermark embedding apparatus 1001 be width × height = Sw × Sh (pixels),
the size of the signal unit in the input image of the digital watermark detection apparatus 1002 be Siw × Sih, and
the resolution of the output device 1008 be Dout (dpi) and the resolution of the input device 1010 be Din (dpi).
Then
Siw = Sw × Din / Dout
Sih = Sh × Din / Dout
[0062]
However, Siw and Sih do not necessarily satisfy the above formulas because of mechanical accuracy errors of printers and scanners (the actual Din and Dout differ from the set values). As a result, the signal detection position in the input image of the digital watermark detection apparatus 1002 shifts, and the detection accuracy of the embedded confidential information 1004 decreases.
[0063]
For example, if Sw = Sh = 12, Dout = 600 and Din = 400, then Siw = Sih = 12 × 400 / 600 = 8, but because Dout and Din contain errors, Siw and Sih may not be exactly 8. Even if the Siw error is only 0.1% (an error of 0.008 pixels; in practice it is somewhat larger), when A4 size paper is captured at a scanner resolution of 400 dpi the input image is approximately 3000 pixels wide, so with the left edge of the image as the reference the displacement at the right edge of the image is 3000 × 0.008 = 24 (pixels), a displacement of three signal units, which significantly affects detection accuracy.
[0064]
As one solution, the number of signal units embedded in the watermark image can be set to a multiple of an arbitrarily large integer Ns, so that the error is absorbed when the size of the signal unit is calculated from the size of the signal embedding area of the input image. Another method is described below.
[0065]
In the present embodiment, as shown in FIG. 12, a method is described in which areas for recording the embedding conditions, such as the number of signal units embedded in the watermark image (the size of the unit matrix Um) and the resolution of the printer, are set at the four corners of the unit matrix Um (hereinafter referred to as attribute recording areas). That is, as shown in FIG. 12 (1), when the unit matrix Um (x, y) has the size x = 1 to Mw and y = 1 to Mh, the vicinities of Um (1, 1), Um (Mw, 1), Um (1, Mh), and Um (Mw, Mh) are used as attribute recording areas.
[0066]
FIG. 12 (2) is an explanatory diagram of an example in which the unit matrix size is set in the attribute recording area of the unit matrix. Here, a method of expressing Mw and Mh by 16 bits each and recording them in the attribute recording area 1 is shown. For example, when Mw = 400 is expressed as a 16-bit binary number, it becomes "0000000110010000". This is converted into the value "1111111221121111" according to the unit matrix expression rule (step S1105), that is, each 0 bit becomes 1 and each 1 bit becomes 2. This is recorded in the unit matrix elements Um (2, 1) to Um (17, 1). Similarly, Mh = 600 is expressed as a binary number, converted according to the unit matrix expression rule, and recorded in the unit matrix elements Um (1, 2) to Um (1, 17).
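The conversion of Mw into unit matrix values can be checked with a short sketch. The mapping (bit 0 to the value 1, bit 1 to the value 2) is inferred from the converted string "1111111221121111" given for Mw = 400; the function name is hypothetical.

```python
def encode_attribute(value, bits=16):
    """Express value as a fixed-width binary number, then map each bit to
    the unit matrix value inferred from the worked example (0 -> 1, 1 -> 2)."""
    binary = format(value, "0{}b".format(bits))
    return "".join("1" if b == "0" else "2" for b in binary)


print(format(400, "016b"))    # 0000000110010000
print(encode_attribute(400))  # 1111111221121111
print(encode_attribute(600))  # 1111112112122111
```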
[0067]
In the area Um (2, 2) to Um (17, 17) in FIG. 12 (2), other attributes such as the resolution of the printer may be recorded, or Mw and Mh may be recorded repeatedly so that the digital watermark detection apparatus 1002 can reliably detect the size of the unit matrix. Although Um (1, 1) to Um (17, 17) are used as the attribute recording area 1 here, the range Um (1, 1) to Um (X, Y) to which it is extended depends on the amount of attribute information to be recorded and on the estimated errors contained in Dout and Din. The size of the attribute recording area is either known in advance or recorded in the attribute recording area itself; the confidential information 1004 is not embedded in this area, and the area is ignored when the confidential information is detected. Although the attribute recording area 1 has been described with reference to FIG. 12 (2), the same applies to the attribute recording areas 2 to 4.
[0068]
The effect of setting the attribute information areas at the four corners of the unit image will be described in the description part of the digital watermark detection apparatus 1002 described later.
[0069]
(Step S1107)
FIG. 13 shows an example of step S1107. In step S1107, the signal units are arranged in the background image according to the unit matrix Um (FIG. 13 (1)) created in step S1106 (FIG. 13 (2)). The document image is superimposed on the background image created by arranging the signal units, and a watermarked document image is created (FIG. 13 (3)).
[0070]
The digital watermark embedding device 1001 has been described above. Next, the digital watermark detection apparatus 1002 will be described with reference to FIGS. 1 and 14 to 19.
[0071]
(Digital watermark detection apparatus 1002)
The digital watermark detection apparatus 1002 is an apparatus that takes in a document 1009 printed on a paper medium as an image and restores embedded confidential information 1004. As shown in FIG. 1, the digital watermark detection apparatus 1002 includes an input device 1010 and a watermark detection unit 1011.
[0072]
The input device 1010 is an input device such as a scanner, and captures a document 1009 printed on paper as a multi-value gray image in a computer. In addition, the watermark detection unit 1011 performs a filtering process on the input image and detects an embedded signal. The symbol is restored from the detected signal, and the embedded confidential information 1004 is taken out.
[0073]
The operation of the digital watermark detection apparatus 1002 configured as described above will be described with reference to FIGS.
[0074]
(Watermark Detection Unit 1011, Step S1201)
FIG. 14 shows a processing flow of the watermark detection unit 1011.
In step S1201, an image of a watermarked print document is input from the input device 1010.
[0075]
(Step S1202)
In step S1202, a contour line of a region in which the signal unit is embedded (hereinafter referred to as a signal region) is detected from the input image, and correction such as image rotation is performed.
[0076]
FIG. 15 is an explanatory diagram of a signal region detection method.
FIG. 15 (1) shows the image input in step S1201. Here, an example of detecting the upper end of the signal area is shown. Let the input image be Img (x, y), x = 0 to Wi-1, y = 0 to Hi-1. Also, let the size of the signal unit embedded in the document by the digital watermark embedding apparatus 1001 be width × height = Sw × Sh (pixels), the resolution of the output device 1008 be Dout (dpi), and the resolution of the input device 1010 be Din (dpi), and define
tSw = Sw × Din / Dout
tSh = Sh × Din / Dout
That is, tSw and tSh are the theoretical signal unit sizes in Img, and the signal detection filter is designed based on these values.
[0077]
Sample regions S (x), x = 1 to Sn, for detecting the upper end of the signal region are set in the image Img. Sn is Wi / Np (Np is an integer of about 10 to 20). The width of S (x) is Ws = tSw × Nt (Nt is an integer of about 2 to 5), its height is Hs = Hi / Nh (Nh is about 8), and its horizontal position in Img is x × Np.
[0078]
A method for detecting the upper end SY0 (n) of the signal region at an arbitrary S (n) will be described below.
Step 1: A region corresponding to S (n) is cut from Img (FIG. 15 (1)).
Step 2: Filter A and Filter B are applied to S (n), and the maximum value in the horizontal direction in S (n) is recorded in Fs (y) (FIG. 15 (2)).
Step 3: A threshold Ty is set; the average value of Fs (1) to Fs (Ty-1) is defined as V0 (Ty), and the average value of Fs (Ty) to Fs (Hs) is defined as V1 (Ty). The Ty that maximizes V1 (Ty) − V0 (Ty) is taken as the position of the upper end of the signal region in S (n) and set to SY0 (n) (FIG. 15 (3)).
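Step 3 amounts to a search for the split position that best separates small filter outputs from large ones. A minimal sketch, assuming Fs has already been computed as a list in which index y − 1 holds Fs (y); the function name is hypothetical.

```python
def find_upper_edge(fs):
    """Return the Ty (1-based) that maximises V1(Ty) - V0(Ty), where
    V0 is the mean of Fs(1)..Fs(Ty-1) and V1 the mean of Fs(Ty)..Fs(Hs)."""
    hs = len(fs)
    best_ty, best_gap = None, float("-inf")
    for ty in range(2, hs + 1):  # Ty = 1 would leave V0 undefined
        v0 = sum(fs[: ty - 1]) / (ty - 1)
        v1 = sum(fs[ty - 1:]) / (hs - ty + 1)
        if v1 - v0 > best_gap:
            best_ty, best_gap = ty, v1 - v0
    return best_ty


# Filter output is small for the first three rows, then large.
fs = [1, 2, 1, 9, 10, 9, 10]
print(find_upper_edge(fs))  # 4
```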
[0079]
FIG. 15 (4) shows how the value of Fs (y) changes with y. As shown in the figure, in areas of Img that contain no signal units, the average output value of the signal detection filter is small. In contrast, symbol units (unit A or unit B) are densely arranged in the background portion by the document image output unit 1001, so the output value of the signal detection filter there is large (the margins of the document are background, so symbol units are densely embedded there as well). The output value of the signal detection filter therefore changes sharply around the boundary between the signal region and the other regions, and this is used for region detection.
[0080]
Steps 1 to 3 are performed for S (x), x = 1 to Sn, to obtain SY0 (x), x = 1 to Sn. The upper end of the signal region is obtained by fitting a straight line, using the least squares method or the like, to the resulting sample points S0 (x × Np, SY0 (x)), x = 1 to Sn. The other contour lines are detected in the same way. In the following, the image obtained by, for example, rotating the image so that the upper end of the signal region becomes horizontal is referred to as the input image.
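The line fit over the sample points can be sketched with an ordinary least squares fit of y = slope × x + intercept. The sample data below are synthetic, and deriving a rotation angle from the slope is an assumption about how the correction might be applied.

```python
import math


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic sample points (x * Np, SY0(x)) with Np = 15: an upper edge
# that starts near y = 40 and rises by 0.02 pixels per pixel.
points = [(15 * x, 40 + 0.02 * 15 * x) for x in range(1, 11)]
slope, intercept = fit_line(points)
angle_deg = math.degrees(math.atan(slope))  # skew to undo by rotation
print(round(slope, 4), round(intercept, 2), round(angle_deg, 3))
```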
[0081]
FIG. 16 shows an example of a method for restoring the size of the unit matrix embedded in the attribute area. Here, an example in which the signal area of the input image is (Ix0, Iy0) to (Ix1, Iy1) and information in the attribute recording area 1 is restored is shown.
Step 1: A region near (Ix0, Iy0) of the input image is cut out ((1) in FIG. 16).
Step 2: The attribute area 1 is set in the clipped region ((2) in FIG. 16). The attribute area 1 is assumed to be the same as that set in the document image output unit 1001. For example, when Mw is expressed by 16 bits, detection proceeds on the assumption that the most significant bit is embedded at (Ix0 + tSw, Iy0) and the least significant bit at (Ix0 + tSw × 17, Iy0).
Step 3: Filter A and filter B are applied to the Mw embedding area set in step 2, and at each bit position the symbol unit corresponding to the larger of the output values of filter A and filter B is determined to be embedded ((3) in FIG. 16).
Step 4: The value of Mw is restored by reversing the procedure used by the document image output unit 1001 (FIG. 16 (4), (5)).
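Steps 3 and 4 can be sketched as follows, assuming the filter outputs at the 16 bit positions are already available, most significant bit first. The association of filter A with bit 0 and filter B with bit 1 is an assumption chosen to match the symbol assignment used in step S1204; the function name is hypothetical.

```python
def decode_attribute(out_a, out_b):
    """out_a, out_b: filter A and filter B outputs per bit position (MSB first).
    The larger output decides the bit: filter A -> 0, filter B -> 1."""
    bits = ["0" if a > b else "1" for a, b in zip(out_a, out_b)]
    return int("".join(bits), 2)


# Synthetic filter outputs for the bit string of Mw = 400.
pattern = "0000000110010000"
out_a = [9.0 if b == "0" else 1.0 for b in pattern]
out_b = [1.0 if b == "0" else 9.0 for b in pattern]
print(decode_attribute(out_a, out_b))  # 400
```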
[0082]
The theoretical values tSw and tSh of the signal unit size in the input image include an error, but the signal detection positions in the attribute recording area are referenced to the boundary line detected in FIG. 15. For example, when Sw = Sh = 12, Dout = 600, and Din = 400, tSw = tSh = 12 × 400 / 600 = 8, so the attribute recording area spans only about 8 × 17 = 136 pixels. Even if the error is about 1% (in practice it is smaller), the error at the position farthest from the reference point of the attribute area is only about one pixel, so the signal detection positions can be set almost exactly.
[0083]
The true width Siw of the signal unit in the input image can be calculated from the width Mw of the unit matrix extracted from the attribute recording area and the width Ix1 − Ix0 of the signal area obtained in FIG. 15 as
Siw = (Ix1 − Ix0) / Mw
Similarly, the true height Sih of the signal unit can be calculated as
Sih = (Iy1 − Iy0) / Mh
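As a numerical illustration, the signal unit size in pixels is taken here as the extent of the signal area divided by the number of signal units along that direction. The coordinates below are hypothetical values chosen for the example.

```python
def true_unit_size(ix0, iy0, ix1, iy1, mw, mh):
    """Signal unit size in the input image: the extent of the signal area
    in pixels divided by the number of signal units along each axis."""
    return (ix1 - ix0) / mw, (iy1 - iy0) / mh


# Hypothetical signal area spanning (100, 50) to (3308, 4862), Mw x Mh = 400 x 600.
siw, sih = true_unit_size(100, 50, 3308, 4862, 400, 600)
print(round(siw, 3), round(sih, 3))  # 8.02 8.02
```

With a theoretical size of 8 pixels, the measured 8.02 pixels reflects a device error of about 0.25%, which the detection positions can then take into account.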
[0084]
(Steps S1203 and S1204)
FIG. 17 is an explanatory diagram of steps S1203 and S1204. In step S1203, the sum of the filter output values is calculated for each unit pattern. In FIG. 17, the convolution with filter A is calculated for each signal unit constituting the unit pattern U (x, y), and the sum of the convolution output values over the signal units is defined as the output value Fu (A, x, y) of filter A for the unit pattern. Here, the convolution value for each signal unit is the maximum of the values calculated while shifting the position of filter A horizontally and vertically over that signal unit.
[0085]
Similarly, the output value Fu (B, x, y) for the unit pattern U (x, y) is calculated for the filter B.
[0086]
In step S1204, Fu (A, x, y) and Fu (B, x, y) are compared. If the absolute value of their difference |Fu (A, x, y) − Fu (B, x, y)| is smaller than a predetermined threshold Tp, it is determined that no codeword symbol is assigned. Otherwise, the symbol corresponding to the larger of Fu (A, x, y) and Fu (B, x, y) is determined to be assigned. That is, if Fu (A, x, y) > Fu (B, x, y), symbol 0 is taken to be embedded in U (x, y), and if Fu (A, x, y) < Fu (B, x, y), symbol 1 is taken to be embedded in U (x, y).
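The decision rule of steps S1203 and S1204 can be sketched as follows, assuming the per-signal-unit convolution maxima have already been computed. Returning 2 for "no codeword symbol" follows the symbol 2 convention used elsewhere in the description; the function name is hypothetical.

```python
def decide_symbol(conv_a, conv_b, tp):
    """conv_a, conv_b: per-signal-unit outputs of filter A and filter B
    for one unit pattern. Returns 0, 1, or 2 (no codeword symbol)."""
    fu_a, fu_b = sum(conv_a), sum(conv_b)
    if abs(fu_a - fu_b) < tp:
        return 2  # outputs are balanced: no codeword symbol assigned
    return 0 if fu_a > fu_b else 1


print(decide_symbol([9, 8, 9, 9, 8, 9, 1, 1], [1, 2, 1, 1, 2, 1, 1, 1], tp=10))  # 0
print(decide_symbol([9, 1, 9, 1, 9, 1, 1, 1], [1, 9, 1, 9, 1, 9, 1, 1], tp=10))  # 2
```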
[0087]
FIG. 18 is an explanatory diagram of another method for realizing steps S1203 and S1204. In FIG. 18, as in FIG. 17, the convolution with filter A is calculated for each signal unit constituting the unit pattern U (x, y). A threshold Ts for symbol unit detection is set, and if the convolution value of filter A for a signal unit is equal to or greater than Ts, that signal unit is determined to be symbol A. The number of signal units determined to be symbol A in the unit pattern U (x, y) is defined as the output value Fu (A, x, y) of filter A for the unit pattern U (x, y). As before, the convolution value for each signal unit is the maximum of the values calculated while shifting the position of filter A horizontally and vertically over that signal unit.
[0088]
Similarly, the output value Fu (B, x, y) for the unit pattern U (x, y) is calculated for the filter B.
[0089]
The processing in step S1204 is the same as that in FIG. 17 except that Fu (A, x, y) and Fu (B, x, y) are the number of signals detected in the unit pattern.
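The counting variant differs only in how Fu is formed. A minimal sketch, in which the function names are hypothetical and Tp is compared against counts rather than summed outputs:

```python
def count_detected(conv, ts):
    """Number of signal units whose filter output is at least Ts."""
    return sum(1 for v in conv if v >= ts)


def decide_symbol_by_count(conv_a, conv_b, ts, tp):
    fu_a, fu_b = count_detected(conv_a, ts), count_detected(conv_b, ts)
    if abs(fu_a - fu_b) < tp:
        return 2  # no codeword symbol assigned
    return 0 if fu_a > fu_b else 1


conv_a = [9, 8, 9, 9, 8, 9, 1, 1]
conv_b = [1, 2, 1, 1, 2, 1, 1, 1]
print(count_detected(conv_a, ts=5), count_detected(conv_b, ts=5))  # 6 0
print(decide_symbol_by_count(conv_a, conv_b, ts=5, tp=3))          # 0
```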
[0090]
The unit pattern matrix U is created by performing the processing of FIG. 17, the processing of FIG. 18, or both together on all the unit patterns obtained from the input image.
[0091]
(Step S1205)
In step S1205, the embedded information is decoded based on the determined symbol. FIG. 19 shows an example of a method for extracting a code word from a unit pattern matrix. In FIG. 19, it is assumed that symbol 2 is set to an element to which no symbol is assigned, and the code word is restored by taking out the symbol while ignoring the element to which symbol 2 is set.
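The extraction in FIG. 19 can be sketched as a scan that skips symbol 2. The row-major scan order and the function name are assumptions for this example.

```python
def extract_symbols(unit_pattern_matrix, no_symbol=2):
    """Read symbols in row-major order, ignoring elements set to symbol 2."""
    return [s for row in unit_pattern_matrix for s in row if s != no_symbol]


detected = [
    [0, 2, 0, 1],
    [1, 0, 0, 1],
]
print(extract_symbols(detected))  # [0, 0, 1, 1, 0, 0, 1]
```

The first four symbols reproduce the codeword (0011), after which the repetition begins.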
[0092]
As described above, according to the present embodiment, the following effects can be obtained.
(A) The character arrangement of the document image into which the watermark is to be inserted is referenced, and meaningful information (codeword symbols) is embedded only in areas that do not overlap characters, so confidential information can be embedded reliably whatever the original document is.
(B) By arranging equal numbers of the two conflicting signal units in areas where no codeword symbol is embedded, it can be reliably determined at detection time that no symbol is embedded there.
(C) When embedded information is detected, symbol determination is based on the sum of the output values of two filters over a certain area, so the accuracy of information detection is kept high.
(D) By placing the attribute recording areas, which hold attribute information such as the number of embedded signals, at the four corners of the signal embedding area, the attribute information can be extracted accurately without being affected by hardware errors of the input/output devices when confidential information is detected, which raises the accuracy of subsequent detection.
[0093]
(Second Embodiment)
FIG. 20 is an explanatory diagram showing the configuration of the digital watermark embedding device and the digital watermark detection device according to the second embodiment of the present invention. The difference from the first embodiment is that a character erasure / alteration detection unit 2012 is added to the digital watermark detection apparatus 2002. Here, it is assumed that the configuration and operation of the digital watermark embedding apparatus 1001 are the same as those in the first embodiment.
[0094]
(Character Erase / Falsification Detection Unit 2012)
The character erasure / alteration detection unit 2012 is a part that performs processing for detecting the presence / absence of alteration and the location of alteration when the print document 1009 is altered such as by erasing the character portion with a correction liquid.
[0095]
FIG. 21 is a flowchart showing the operation of the character erasure / alteration detection unit.
The alteration assumed here is the case where part of the character portion of the printed document 1009 has been erased with correction liquid or the like. The basic principle of falsification detection is as follows. The print document 1009 created by the digital watermark embedding apparatus 1001 has the following features.
(1) Background units are embedded in the character area, and symbol units are not.
(2) Symbol units are densely embedded in the background area (the area other than the character area).
For this reason, even if a character area of the printed document is erased and turned into background, it is difficult to embed new symbol units in that area. Detection exploits the fact that a region then appears that is background and yet contains no symbol units (an illegal region).
[0096]
In the following description, as in the first embodiment, the image obtained by applying correction such as rotation to the image input by the input device 1010 is referred to as the input image.
The explanation also assumes the following.
- The size of the signal unit embedded in the document by the digital watermark embedding apparatus 1001 is width × height = Sw × Sh (pixels).
- The number of embedded signal units is width × height = nw × nh.
- There are two types of embedded symbol units: unit A and unit B.
- The size of the signal unit in the input image is Siw × Sih.
[0097]
Here, when the resolution of the output device is equal to the resolution of the input device, Siw = Sw and Sih = Sh. However, when the resolutions are different, Siw and Sih are calculated based on their ratio. For example, when the resolution of the output device is 600 dpi and the resolution of the input device is 400 dpi, Siw = Sw × 2/3 and Sih = Sh × 2/3.
[0098]
(Step S2101)
In step S2101, the character area is detected by binarizing the input image. In the image input by the input device, the character area has small luminance values (colors close to black) and the background area has large luminance values (colors close to white). Although the watermark signal is embedded over the entire surface of the printed document, it has far fewer dots per unit area than the character area, so the character area can be separated from the background area by binarization. The binarization threshold may be determined in advance, or may be determined dynamically from the distribution of luminance values of the input image using an image processing method such as the discriminant analysis method.
[0099]
FIG. 22 is an explanatory diagram of a character area extraction image.
In the binarized image, the pixel value corresponding to the character area is set to 0, and the pixel value corresponding to the background area is set to 1. The image obtained by reducing the binarized image to a size of nw × nh is referred to as a character region extraction image (FIG. 22). On reduction, the value of one pixel is determined from each Siw × Sih pixel block of the binary image: if the number of pixels with a value of 0 among the Siw × Sih pixels is equal to or greater than a predetermined threshold Tn, the corresponding pixel of the character region extraction image is set to 0. The threshold Tn is set to about Siw × Sih × 0.5.
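Binarization followed by block reduction can be sketched as below. The sketch assumes a fixed binarization threshold and integer Siw and Sih; the function and parameter names are hypothetical.

```python
def character_region_image(gray, siw, sih, bin_threshold, tn):
    """gray: 2D list of luminance values. Returns the reduced image in which
    0 marks a character block and 1 marks a background block."""
    nh, nw = len(gray) // sih, len(gray[0]) // siw
    out = []
    for by in range(nh):
        row = []
        for bx in range(nw):
            # Pixels darker than the threshold binarize to 0 (character).
            dark = sum(
                1
                for y in range(by * sih, (by + 1) * sih)
                for x in range(bx * siw, (bx + 1) * siw)
                if gray[y][x] < bin_threshold
            )
            row.append(0 if dark >= tn else 1)
        out.append(row)
    return out


gray = [
    [20, 20, 230, 230],
    [20, 20, 230, 30],
]
# Tn = Siw * Sih * 0.5 = 2 for 2 x 2 blocks.
print(character_region_image(gray, siw=2, sih=2, bin_threshold=128, tn=2))  # [[0, 1]]
```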
[0100]
In step S2102, a unit matrix Um is generated from the input image and a symbol unit is detected to generate a unit extracted image.
[0101]
(Step S2102)
FIG. 23 is an explanatory diagram of step S2102. First, an input image is divided into nw × nh blocks (the size of one block is Siw × Sih) (FIG. 23 (2)), and a symbol unit is detected by a signal detection filter for each of the divided blocks.
[0102]
As in the description of FIG. 18, when the output value of either signal detection filter A or B exceeds the threshold Ts, it is determined that a symbol unit is embedded in the block. It is not necessary here to distinguish which of the two signals is embedded. The image representing the result of symbol unit detection for all blocks is referred to as a unit extracted image (FIG. 23 (3)). The size of the unit extracted image equals the number of divided blocks, nw × nh. In the unit extracted image of FIG. 23, a white pixel (pixel value 1) indicates that a symbol unit was detected, and a black pixel (pixel value 0) indicates that no symbol unit was detected.
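Given per-block filter outputs, the unit extracted image follows directly. A minimal sketch, assuming the outputs of filters A and B have already been computed for each of the nw × nh blocks; the function name is hypothetical.

```python
def unit_extracted_image(out_a, out_b, ts):
    """out_a, out_b: per-block outputs of filters A and B.
    A block is 1 when either output exceeds Ts, regardless of which."""
    return [
        [1 if max(a, b) > ts else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(out_a, out_b)
    ]


out_a = [[9, 1, 1], [1, 8, 1]]
out_b = [[1, 9, 2], [7, 1, 1]]
print(unit_extracted_image(out_a, out_b, ts=5))  # [[1, 1, 0], [1, 1, 0]]
```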
[0103]
(Step S2103)
In step S2103, the character region extraction image created in step S2101 is compared with the unit extracted image created in step S2102 to detect tampering. Since symbol units are not embedded in the portions that overlap the character area of the input image, that is, symbol units are embedded only in the background portion, the symbol unit detection area (pixel value 1) in the unit extracted image coincides with the background area (pixel value 1) in the character region extraction image. Therefore, a difference image between the character region extraction image and the unit extracted image is generated, and a region where the value is not zero is detected as "a portion where the print document (paper) has been tampered with by erasing characters".
[0104]
FIG. 24 is an explanatory diagram of step S2103. Since no alteration has been performed here, the value of all the pixels in the difference image is zero.
[0105]
FIG. 25 is an explanatory diagram of a case where tampering has occurred. Because a character area of the input image has been erased, no character area is extracted from the corresponding area of the character region extraction image, and its pixel value is 1 (FIG. 25 (3)). However, no symbol unit is detected in that area either, so the corresponding area of the unit extracted image has a pixel value of 0 (FIG. 25 (6)). When the difference image is calculated, an area whose pixel value is not 0 therefore appears, and this can be identified as the tampered location (FIG. 25 (9)).
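The comparison in step S2103 can be sketched as an element-wise difference of the two nw × nh images. Reporting the mismatching blocks as (x, y) positions is a choice made for this example; the function name is hypothetical.

```python
def tampered_blocks(char_img, unit_img):
    """Return (x, y) positions where the character region extraction image
    and the unit extracted image differ (non-zero difference)."""
    return [
        (x, y)
        for y, (crow, urow) in enumerate(zip(char_img, unit_img))
        for x, (c, u) in enumerate(zip(crow, urow))
        if c != u
    ]


# A character at block (2, 1) was erased: it now looks like background (1)
# in the character image, but no symbol unit is detected there (0).
char_img = [[1, 1, 0], [1, 1, 1]]
unit_img = [[1, 1, 0], [1, 1, 0]]
print(tampered_blocks(char_img, unit_img))  # [(2, 1)]
```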
[0106]
Even when the unit is very small and symbol units are embedded in the gaps within a single character, erasing the character with correction fluid and overwriting another character also erases the symbol units embedded in those gaps, so they can no longer be detected. Using this, if the unit extracted image contains an area where no symbol units can be detected at all, that area may have been tampered with.
[0107]
The character region extraction image and the unit extraction image may be subjected to tampering detection after removing high-frequency components by performing noise removal or the like using an image processing method such as region enlargement / reduction. This makes it possible to perform stable tamper detection without being affected by noise components.
[0108]
As described above, according to the present embodiment, the following effects can be obtained.
(E) If there is a fraud in which an arbitrary character string is erased mainly with a correction liquid in a printed document, the alteration can be detected even if there is no original text of the document, and the alteration location can be specified.
[0109]
(Third embodiment)
FIG. 26 is a diagram showing the configuration of the third embodiment of the present invention. The differences from the first embodiment are that an embedded signal number recording unit 3012 is added to the digital watermark embedding device 3001, and that an embedded signal number detection unit 3013, a filter output value calculation unit 3014, an optimum threshold determination unit 3015, a detection signal counting unit 3016, and a tampering determination unit 3017 are added to the digital watermark detection device 3002. The operation of the other components is the same as in the first embodiment.
[0110]
(Embedded signal number recording unit 3012)
The embedded signal number recording unit 3012 is a part that records the number of signals embedded in the watermark image in the watermark image itself. The number of signals varies depending on the number of characters and layout of the original document image.
[0111]
(Embedded signal number detection unit 3013)
The embedded signal number detection unit 3013 restores the numerical value recorded by the embedded signal number recording unit 3012 and extracts the number of signals embedded at the time of creating the watermark image.
[0112]
(Filter output value calculation unit 3014)
The filter output value calculation unit 3014 calculates the output value of the signal detection filter for the input image and records it for each signal embedding position.
[0113]
(Optimum threshold determination unit 3015)
The optimum threshold value determination unit 3015 calculates an optimum threshold value for signal detection using the output value calculated by the filter output value calculation unit 3014 and the number of embedded signals obtained by the embedded signal number detection unit 3013.
[0114]
(Detection signal counting unit 3016)
The detection signal counting unit 3016 counts the number of signals embedded in the input image using the threshold value obtained by the optimum threshold value determination unit 3015.
[0115]
(Falsification determination unit 3017)
The tampering determination unit 3017 compares the correct number of signals extracted by the embedded signal number detection unit 3013 with the number of signals obtained by the detection signal counting unit 3016 to determine whether tampering or the like has been performed, and identifies the tampering location.
[0116]
(Embedded signal number recording unit 3012)
FIG. 27 is a flowchart showing the operation of the embedded signal number recording unit 3012.
[0117]
(Step S3101)
FIG. 28 is an explanatory diagram of step S3101. In step S3101, first, the Iw columns of elements at the left end of the unit matrix Um (FIG. 28 (2)) are reserved as units for recording the number of embedded symbol units (called the recording unit band) (FIG. 28 (3)). Next, the portion of the unit matrix Um excluding the recording unit band is divided into (horizontal × vertical =) Bw × Bh blocks; this is called the unit number recording unit matrix Nu (x, y), x = 1 to Bw, y = 1 to Bh. The size of each block, measured in elements of the unit matrix Um, is (width × height =) bw × bh (FIG. 28 (4)).
[0118]
When the recording unit band is arranged at the left end of the unit matrix Um, parameters that can be set for the unit number recording unit matrix are the number of blocks in the horizontal direction and the size in the height direction of the blocks. The number of remaining vertical blocks and the size in the width direction of the blocks are automatically determined from the set parameters, the width of the recording unit band, and the parameters of the unit matrix Um.
[0119]
In the following description, the size (number of elements) of the unit matrix Um is Mw × Mh, the number of blocks in the horizontal direction is Bw = 4, the size of the blocks in the height direction is bh = 16, and the width of the recording unit band is Iw = 4. Therefore, the number of blocks in the vertical direction is Bh = Mh / bh = Mh / 16, and the size of the blocks in the width direction is bw = (Mw − Iw) / Bw = (Mw − 4) / 4.
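The derived parameters can be checked numerically. In this sketch the block width is taken from the matrix width Mw after removing the recording unit band, and the values Mw = 404 and Mh = 640 are hypothetical, chosen so that both divisions are exact.

```python
def block_layout(mw, mh, n_blocks_x=4, block_h=16, band_w=4):
    """Return (Bh, bw): the number of blocks vertically and the block width,
    derived from the set parameters Bw, bh and the band width Iw."""
    n_blocks_y = mh // block_h             # Bh = Mh / bh
    block_w = (mw - band_w) // n_blocks_x  # bw = (Mw - Iw) / Bw
    return n_blocks_y, block_w


print(block_layout(404, 640))  # (40, 100)
```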
[0120]
(Step S3102, Step S3103)
FIG. 29 is an explanatory diagram of steps S3102 and S3103. In step S3102, the number of symbol units included in the area corresponding to each element of the unit number recording unit matrix in the unit matrix Um is measured. The example of FIG. 29 shows a method of measuring the number of symbol units in the unit number recording unit matrix Nu (X, Y), which is executed by the following steps.
Step 1: An area on the unit matrix Um corresponding to Nu (X, Y) is extracted (FIG. 29 (1), (2)).
Step 2: The number of symbol units embedded in the area extracted in Step 1 is measured (FIG. 29 (3), (4)).
Here, it is assumed that the symbol unit embedding rule is that no symbol unit is embedded in the character area of the input document image in the same manner as described in the first embodiment. In the example of FIG. 29, it is assumed that the number of symbol units embedded in this area is 71.
[0121]
In step S3103, the number of symbol units measured in step S3102 is recorded in the recording unit band. The steps are shown below.
Step 3: N (X, Y) = 71 is expressed in binary (FIG. 29 (6)).
Step 4: The result of step 3 is set in the corresponding area of the recording unit band (FIG. 29 (7), (8)).
[0122]
In the example shown here, the number of rows bh of the unit matrix Um corresponding to one row of the unit number recording unit matrix is 16, and the width Iw of the recording unit band is 4. Therefore, for each row of the unit number recording unit matrix, the number of recording units is Iw × bh = 4 × 16 = 64. These recording units are shared among the Bw = 4 columns of the unit number recording unit matrix (Iw × bh ÷ Bw), and in this example the number of recording units assigned to one element of the unit number recording unit matrix (referred to as the unit recording unit number) is 8. Accordingly, in the part of the recording unit band corresponding to each row of the unit number recording unit matrix, the information for the first column is recorded in the first and second rows, the second column in the third and fourth rows, the third column in the fifth and sixth rows, and the fourth column in the seventh and eighth rows, each using the unit recording unit number (8 bits).
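The recording of steps 3 and 4 can be sketched as follows, using the layout described above (8 bits per element, written into band rows that are Iw = 4 units wide, so that each element occupies two band rows). The function name `encode_counts` and the most-significant-bit-first ordering are assumptions of this sketch; the text does not state the bit order.

```python
def encode_counts(counts, bits_per_count=8, Iw=4):
    """Pack the symbol-unit counts of one row of Nu into band rows.

    counts         : counts for columns 1..Bw of one row of Nu
    bits_per_count : unit recording unit number (8 in the example)
    Iw             : width of the recording unit band in units
    Returns a list of band rows, each a list of Iw bits.
    """
    bits = []
    for n in counts:
        if not 0 <= n < (1 << bits_per_count):
            raise ValueError(f"count {n} does not fit in {bits_per_count} bits")
        # Assumed bit order: most significant bit first.
        bits.extend(int(b) for b in format(n, f"0{bits_per_count}b"))
    return [bits[i:i + Iw] for i in range(0, len(bits), Iw)]


rows = encode_counts([71, 0, 12, 255])
print(rows[0], rows[1])  # [0, 1, 0, 0] [0, 1, 1, 1]  (71 = 01000111)
```

Each bit would then be rendered as a symbol unit in the band, for example unit A for 0 and unit B for 1.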
[0123]
In this example the number of units is recorded, but the ratio of the number of symbol units to the maximum number of signal units that can be embedded in the area of the unit matrix Um corresponding to each element of the unit number recording unit matrix may be recorded instead. Recording the ratio is effective in two cases: when the area of the unit matrix Um corresponding to each element of the unit number recording unit matrix is large, so that the number of units it contains requires more bits than the unit recording unit number; and when the number of columns of the unit number recording unit matrix has been increased, so that fewer recording units are allocated to each element. In addition, because the tampering location is identified per element of the unit number recording unit matrix, increasing the number of rows and columns of that matrix for the same input document image allows the tampering location in the printed document to be identified more precisely. In exchange, either the recording unit band must be enlarged or the unit recording unit number must be reduced.
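One possible way to record a ratio instead of a raw count is sketched below. The text does not specify how the ratio is quantized; the function names `ratio_code` and `ratio_decode`, the 8-bit code width, and the round-to-nearest rule are all assumptions for illustration.

```python
def ratio_code(count, capacity, bits=8):
    """Quantize count/capacity to an integer code of the given bit width."""
    if capacity <= 0 or not 0 <= count <= capacity:
        raise ValueError("count must lie between 0 and capacity")
    levels = (1 << bits) - 1
    # Round to nearest using integer arithmetic only.
    return (count * levels + capacity // 2) // capacity


def ratio_decode(code, capacity, bits=8):
    """Recover an approximate count from a ratio code."""
    levels = (1 << bits) - 1
    return (code * capacity + levels // 2) // levels


code = ratio_code(500, 1000)
print(code, ratio_decode(code, 1000))
```

The recovered count is only approximate, so any tampering thresholds applied later would need to allow for the quantization error.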
[0124]
The recording unit band is set in the margin of the document image so that it does not overlap the character area of the document image. The band may also be set at the right end, the upper end, or the lower end of the unit matrix Um; the same effect is obtained provided the subsequent processing is carried out for that band position (for example, treating the band as lying above or below the document image).
[0125]
Furthermore, recording unit bands may be set on both the left and right of the unit matrix Um, with the same information recorded in each. In this case, even if the paper becomes dirty and the information in one recording unit band cannot be read, the information can be read from the other band, so that processing such as alteration detection can be performed stably. The same applies to the vertical direction.
[0126]
If the position of the recording unit band within the document image is written in the attribute recording area described in the first embodiment, that position does not need to be known in advance.
[0127]
(Embedded signal number detection unit 3013)
As in the explanation of the second embodiment, the following explanation is based on these premises:
- The size of the signal unit embedded in the document by the digital watermark embedding apparatus 3001 is Sw × Sh (pixels).
- The number of embedded signal units is (width × height =) nw × nh.
- There are two types of embedded symbol units: unit A and unit B.
- The size of the signal unit in the input image is Siw × Sih.
[0128]
FIG. 30 is an explanatory diagram of the embedded signal number detection unit 3013. The number of embedded signals is detected in the following steps.
Step 1: The input image is divided into Sw × Sh blocks and a unit matrix Um is set ((1) in FIG. 30).
Step 2: A portion corresponding to the recording unit band of the unit matrix Um is taken out ((2) in FIG. 30).
Step 3: The embedded bit string is restored by applying the signal detection filters to the recording unit band ((3) and (4) in FIG. 30). In FIG. 30 (3), the output values of two filters (filter A and filter B) are calculated for the area of the input image corresponding to each element of the unit matrix Um within the recording unit band, and the symbol unit corresponding to the filter with the larger output value is judged to be embedded. In this example, the output value of filter A is larger, so it is determined that unit A (symbol 0) is embedded.
Step 4: The unit number recording unit matrix is restored based on the restored bit string ((5) in FIG. 30).
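Steps 3 and 4 can be sketched as follows, assuming the filter outputs for the band elements have already been computed and are supplied in reading order. The function name `decode_band`, the mapping of unit B to symbol 1, the tie-breaking rule, and the most-significant-bit-first grouping into 8-bit values are assumptions of this sketch; the text states only that unit A corresponds to symbol 0.

```python
def decode_band(filter_a, filter_b, bits_per_count=8):
    """Restore recorded counts from per-element filter outputs.

    filter_a, filter_b : output values of filter A and filter B for each
                         band element, in reading order
    Returns the list of counts recovered from the bit string.
    """
    # Unit A (symbol 0) is chosen when filter A responds more strongly;
    # otherwise unit B (assumed to be symbol 1) is chosen.
    bits = [0 if a > b else 1 for a, b in zip(filter_a, filter_b)]
    counts = []
    for i in range(0, len(bits) - bits_per_count + 1, bits_per_count):
        value = 0
        for bit in bits[i:i + bits_per_count]:
            value = (value << 1) | bit  # assumed MSB-first order
        counts.append(value)
    return counts


# Toy outputs encoding 01000111 (= 71): high A for 0, high B for 1.
pattern = [0, 1, 0, 0, 0, 1, 1, 1]
a = [0.9 if p == 0 else 0.2 for p in pattern]
b = [0.2 if p == 0 else 0.9 for p in pattern]
print(decode_band(a, b))  # [71]
```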
[0129]
(Filter output value calculation unit 3014)
FIG. 31 is an explanatory diagram of the filter output value calculation unit 3014.
Here, the output value of the signal detection filter is recorded by the following steps for each element of the unit matrix Um set by the embedded signal number detection unit.
Step 1: The output value of the signal detection filter (filter A and filter B) is calculated for the region of the input image corresponding to an arbitrary element of the unit matrix Um (FIG. 30 (1)). The signal detection filter calculates the output value while shifting the target region up, down, left and right, and obtains the larger of the maximum value of the output value by the filter A and the maximum value of the output value by the filter B.
Step 2: Step 1 is performed on all the elements of the unit matrix Um, and the output values are recorded in the corresponding elements of the filter output value matrix Fm (x, y), x = 1 to Sw, y = 1 to Sh.
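The two steps above can be sketched as follows. The signal detection filters themselves are not reproduced here: `respond_a` and `respond_b` are hypothetical callables that return the output of filter A and filter B for the unit-sized region whose top-left pixel is (x, y). The function name, the parameter names, and the shift range `max_shift` are likewise assumptions.

```python
def filter_output_matrix(respond_a, respond_b, n_cols, n_rows,
                         unit_w, unit_h, max_shift=1):
    """Build the filter output value matrix Fm.

    For every element of the unit matrix, the target region is shifted
    up, down, left and right by up to max_shift pixels, and the larger of
    the best filter A output and the best filter B output is recorded.
    """
    Fm = []
    for row in range(n_rows):
        out_row = []
        for col in range(n_cols):
            x0, y0 = col * unit_w, row * unit_h
            best = float("-inf")
            for dy in range(-max_shift, max_shift + 1):
                for dx in range(-max_shift, max_shift + 1):
                    best = max(best,
                               respond_a(x0 + dx, y0 + dy),
                               respond_b(x0 + dx, y0 + dy))
            out_row.append(best)
        Fm.append(out_row)
    return Fm


# Toy filters: A responds strongly at one pixel position only.
toy_a = lambda x, y: 1.0 if (x, y) == (5, 1) else 0.1
toy_b = lambda x, y: 0.3
print(filter_output_matrix(toy_a, toy_b, n_cols=2, n_rows=1,
                           unit_w=4, unit_h=4))  # [[0.3, 1.0]]
```

Shifting the region compensates for small misalignment between the unit grid and the scanned image; a real implementation would also have to handle regions that fall outside the image.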
[0130]
(Optimum threshold determination unit 3015)
FIG. 32 is an explanatory diagram of the optimum threshold value determination unit 3015.
The threshold here (referred to as Ts) is used to determine whether a symbol unit is embedded in the area of the input image corresponding to each element of the unit matrix Um: if an element of the filter output value matrix exceeds the threshold Ts, it is determined that a symbol unit is embedded at the corresponding position in the input image. The optimum threshold is determined in the following steps.
Step 1: The initial value of the threshold value ts is set from the average Fa, standard deviation Fs, etc. of the elements of the filter output value matrix (output values of the signal detection filter) (FIG. 32 (1)). Here, for example, the initial value is ts = Fa−Fs * 3.
Step 2: The filter output value matrix is binarized by ts to create a unit extraction image ((2) in FIG. 32).
Step 3: The unit number recording unit matrix is applied to the unit extracted image ((3) in FIG. 32).
Step 4: The number of symbol units in the area corresponding to each element of the unit number recording unit matrix of the unit extracted image is counted and recorded in the unit number recording unit matrix ((4) in FIG. 32).
Step 5: For each element of the unit number recording unit matrix, the absolute value of the difference between the number of symbol units recorded in the recording unit band, as decoded by the embedded signal number detection unit 3013, and the number of symbol units obtained in step 4 is calculated, and the total over all elements is taken as Sf (ts) ((3) in FIG. 32).
Step 6: Record ts at which Sf (ts) is minimum as Ts ((3) in FIG. 32).
Step 7: Δt is added to ts to update ts ((7) in FIG. 32). Δt may be calculated from a predetermined value or the standard deviation Fs obtained in step 1 (for example, Δt = Fs × 0.1).
Step 8: The process ends when ts reaches the expected value. Otherwise, the process returns to step 2 ((8) in FIG. 32).
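The threshold search can be sketched as follows. The filter output value matrix `Fm` passed in is assumed to cover only the block area (the recording unit band already removed), and `recorded` holds the counts decoded from the band. The starting value Fa − 3·Fs and the increment 0.1·Fs follow the examples in the text; the stopping value Fa + 3·Fs, the function names, and the use of the population standard deviation are assumptions of this sketch.

```python
from statistics import mean, pstdev


def count_per_block(Fm, threshold, bw, bh):
    """Count elements exceeding the threshold in each bw x bh block."""
    n_block_rows, n_block_cols = len(Fm) // bh, len(Fm[0]) // bw
    counts = [[0] * n_block_cols for _ in range(n_block_rows)]
    for y in range(n_block_rows * bh):
        for x in range(n_block_cols * bw):
            if Fm[y][x] > threshold:
                counts[y // bh][x // bw] += 1
    return counts


def optimal_threshold(Fm, recorded, bw, bh, span=3.0, step=0.1):
    """Sweep ts and return (Ts, Sf(Ts)) with the smallest total error."""
    values = [v for row in Fm for v in row]
    Fa, Fs = mean(values), pstdev(values)
    n_steps = int(round(2 * span / step))
    best_ts, best_sf = None, None
    for k in range(n_steps + 1):
        ts = Fa - span * Fs + k * step * Fs  # assumed range: Fa-3Fs .. Fa+3Fs
        detected = count_per_block(Fm, ts, bw, bh)
        sf = sum(abs(r - d)
                 for rec_row, det_row in zip(recorded, detected)
                 for r, d in zip(rec_row, det_row))
        if best_sf is None or sf < best_sf:
            best_ts, best_sf = ts, sf
    return best_ts, best_sf


Fm = [[1.0, 0.2, 1.0, 1.0],
      [0.2, 0.2, 1.0, 0.2]]
Ts, Sf = optimal_threshold(Fm, recorded=[[1, 3]], bw=2, bh=2)
print(Sf)  # 0
```

Computing ts from the step index rather than by repeated addition avoids accumulating floating-point error over the sweep.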
[0131]
(Detection signal counting unit 3016)
FIG. 33 is an explanatory diagram of the detection signal counting unit 3016.
In this part of the processing, the unit extraction image obtained by binarizing the filter output value matrix with the optimal threshold value Ts obtained from the optimal threshold value determination unit 3015 is used to perform substantially the same processing as the optimal threshold value determination unit 3015.
Step 1: The filter output value matrix is binarized by Ts, and a unit extraction image is created (FIG. 33 (1)).
Step 2: The unit number recording unit matrix is applied to the unit extracted image ((2) in FIG. 33).
Step 3: The number of symbol units in the area corresponding to each element of the unit number recording unit matrix of the unit extracted image is counted and recorded in the unit number recording unit matrix ((3) in FIG. 33).
Step 4: The difference D (X, Y) between the number of symbol units recorded in the recording unit band, as decoded by the embedded signal number detection unit 3013, and the number of symbol units obtained in step 3 is calculated for each element of the unit number recording unit matrix ((4) in FIG. 33). For an arbitrary element Nu (X, Y) of the unit number recording unit matrix, let R (X, Y) be the number of symbol units restored from the recording unit band and C (X, Y) the number of symbol units counted in step 3; then D (X, Y) = R (X, Y) − C (X, Y).
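The four steps above can be sketched as follows. As before, `Fm` is assumed to cover only the block area, and the function names are hypothetical.

```python
def count_per_block(Fm, threshold, bw, bh):
    """Binarize Fm with the threshold and count hits per bw x bh block."""
    n_block_rows, n_block_cols = len(Fm) // bh, len(Fm[0]) // bw
    counts = [[0] * n_block_cols for _ in range(n_block_rows)]
    for y in range(n_block_rows * bh):
        for x in range(n_block_cols * bw):
            if Fm[y][x] > threshold:
                counts[y // bh][x // bw] += 1
    return counts


def difference_matrix(Fm, Ts, recorded, bw, bh):
    """Return D = R - C for each element of Nu.

    recorded : R, the counts restored from the recording unit band
    C is obtained by counting detections above the optimal threshold Ts.
    """
    counted = count_per_block(Fm, Ts, bw, bh)
    return [[r - c for r, c in zip(rec_row, cnt_row)]
            for rec_row, cnt_row in zip(recorded, counted)]


Fm = [[0.9, 0.1, 0.8, 0.9],
      [0.1, 0.1, 0.1, 0.7]]
print(difference_matrix(Fm, Ts=0.5, recorded=[[3, 3]], bw=2, bh=2))  # [[2, 0]]
```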
[0132]
(Falsification determination unit 3017)
Tampering determination for an arbitrary element Nu (X, Y) of the unit number recording unit matrix is performed as follows using D (X, Y).
(1) Tampering by adding characters: D (X, Y) > TA (TA is a positive integer). When the number of detected symbol units is smaller than the number of recorded symbol units, it is judged that originally embedded symbol units could not be detected because characters were added on top of them.
(2) Tampering by erasing characters: D (X, Y) < TC (TC is a negative integer). When the number of detected symbol units is larger than the number of recorded symbol units, symbol units that were not originally embedded have been detected. It is judged that, after characters were erased with correction fluid, dirt remained in what became a non-character region and the signal detection filter reacted to it. This makes it possible to detect tampering by character erasure that could not otherwise be detected.
(3) No alteration: Other than (1) and (2)
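The three-way determination can be sketched as follows. The function name `classify_blocks`, the labels, and the sample thresholds TA = 3 and TC = −3 are assumptions for illustration; the text only requires TA to be a positive integer and TC a negative integer.

```python
def classify_blocks(D, TA, TC):
    """Label each element of the difference matrix D.

    'added'  : D > TA, fewer units detected than recorded
    'erased' : D < TC, more units detected than recorded
    'none'   : otherwise
    """
    if TA <= 0 or TC >= 0:
        raise ValueError("TA must be positive and TC negative")
    return [["added" if d > TA else "erased" if d < TC else "none"
             for d in row]
            for row in D]


print(classify_blocks([[5, -4, 1]], TA=3, TC=-3))
# [['added', 'erased', 'none']]
```

Because the result keeps the row and column layout of the unit number recording unit matrix, the position of each label indicates the tampered block.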
[0133]
As described above, according to the present embodiment, the following effects can be obtained.
(F) By dividing the document image into several blocks and recording the number of symbol units embedded in each block, alteration detection can be performed in units of blocks, and the alteration location can be specified.
(G) By recording the number of embedded signals, it is possible to obtain an optimum threshold for signal detection and tamper detection.
(H) If a printed document has been tampered with, for example by adding a character string to a blank area or by erasing an arbitrary character string with correction fluid, the alteration can be detected and its location identified even without the original document.
[0134]
The preferred embodiments of the digital watermark embedding device, the digital watermark detection device, the digital watermark embedding method, and the digital watermark detection method according to the present invention have been described above with reference to the accompanying drawings. However, the present invention is not limited to these examples. It will be obvious to those skilled in the art that various changes or modifications can be conceived within the scope of the technical idea described in the claims, and it is understood that these naturally belong to the technical scope of the present invention.
[0135]
For example, the processing of the first embodiment may be performed after the processing of the third embodiment, with the optimum threshold Ts obtained by the optimum threshold determination unit 3015 used as the threshold Ts in the processing of steps S1203 and S1204 of the first embodiment (FIG. 18). This improves the accuracy of extracting embedded information compared with the case where Ts is a fixed value.
[0136]
Similarly, the processing of the second embodiment may be performed after the processing of the third embodiment, with the optimum threshold Ts obtained by the optimum threshold determination unit 3015 used as the threshold Ts of the second embodiment. This improves the accuracy of tampering detection compared with the case where Ts is a fixed value.
[0137]
[Effects of the invention]
As described above, the main effects of the present invention are as follows.
(A) Because the character layout of the document image into which the watermark is inserted is referenced and meaningful information (symbols of the code word) is embedded only in areas that do not overlap characters, confidential information can be embedded reliably whatever the original document is.
(B) By arranging equal numbers of mutually opposing signal units in areas where no symbol of the code word is embedded, it can be reliably determined at detection time that no symbol is embedded there.
(C) When embedded information is detected, the symbol is determined from the sum of the output values of the two filters over a certain area, so the accuracy of information detection is kept high.
(D) By placing attribute recording areas, which hold attribute information such as the number of embedded signals, at the four corners of the signal embedding area, the attribute information can be extracted accurately without being affected by hardware errors of the input/output devices when confidential information is detected, which improves the accuracy of subsequent detection.
[0138]
Furthermore, according to the application example described in the embodiment of the present invention, the following effects can be further obtained.
(E) If a printed document has been tampered with, mainly by erasing an arbitrary character string with correction fluid, the alteration can be detected and its location identified even without the original document.
(F) By dividing the document image into several blocks and recording the number of symbol units embedded in each block, alteration detection can be performed in units of blocks, and the alteration location can be specified.
(G) By recording the number of embedded signals, it is possible to obtain an optimum threshold for signal detection and tamper detection.
(H) If a printed document has been tampered with, for example by adding a character string to a blank area or by erasing an arbitrary character string with correction fluid, the alteration can be detected and its location identified even without the original document.
[Brief description of the drawings]
FIG. 1 is an explanatory diagram illustrating a configuration of a digital watermark embedding device and a digital watermark detection device.
FIG. 2 is an explanatory diagram showing an example of a signal unit, where (1) shows a unit A and (2) shows a unit B.
FIG. 3 is a cross-sectional view of the change in pixel value in FIG. 2 (1) as viewed from the direction of arctan (1/3).
FIG. 4 is an explanatory diagram illustrating examples of signal units, including a unit C and a unit D.
FIG. 5 is an explanatory diagram of a background image, in which (1) shows a case where unit E is defined as a background unit and arranged without gaps to form the background of a watermark image, (2) shows an example in which unit A is embedded in the background image of (1), and (3) shows an example in which unit B is embedded in the background image of (1).
FIG. 6 is an explanatory diagram showing an example of a unit pattern.
FIG. 7 is a flowchart showing a process flow of the watermark image forming unit 1006.
FIG. 8 is an explanatory diagram illustrating an example of a symbol unit arrangement availability matrix.
FIG. 9 is an explanatory diagram showing an example of a unit pattern arrangement availability matrix.
FIG. 10 is an explanatory diagram showing an example of a unit pattern matrix.
FIG. 11 is an explanatory diagram illustrating an example of a unit matrix.
FIG. 12 is an explanatory diagram of an attribute recording area.
FIG. 13 is an explanatory diagram showing an example of step S1107.
FIG. 14 is an explanatory diagram showing a flow of processing of the watermark detection unit 1011.
FIG. 15 is an explanatory diagram of a signal region detection method.
FIG. 16 is an explanatory diagram illustrating an example of a method for restoring the size of a unit matrix embedded in an attribute region.
FIG. 17 is an explanatory diagram of steps S1203 and S1204.
FIG. 18 is an explanatory diagram of another method for realizing steps S1203 and S1204.
FIG. 19 is an explanatory diagram illustrating an example of a method of extracting a code word from a unit pattern matrix.
FIG. 20 is a diagram showing a configuration of a second exemplary embodiment of the present invention.
FIG. 21 is a flowchart showing the operation of the character erasure / alteration detection unit.
FIG. 22 is an explanatory diagram of a character region extraction image.
FIG. 23 is an explanatory diagram of step S2102.
FIG. 24 is an explanatory diagram of step S2103.
FIG. 25 is an explanatory diagram when tampering occurs.
FIG. 26 is an explanatory diagram showing a configuration of the third exemplary embodiment of the present invention.
FIG. 27 is a flowchart showing the operation of the embedded signal number recording unit 3012.
FIG. 28 is an explanatory diagram of step S3101.
FIG. 29 is an explanatory diagram of steps S3102 and S3103.
FIG. 30 is an explanatory diagram of the embedded signal number detection unit 3013.
FIG. 31 is an explanatory diagram of the filter output value calculation unit 3014.
FIG. 32 is an explanatory diagram of the optimum threshold value determination unit 3015.
FIG. 33 is an explanatory diagram of the detection signal counting unit 3016.
[Explanation of symbols]
1001, 2001, 3001 Electronic watermark embedding apparatus
1002, 2002, 3002 Electronic watermark detection apparatus
1003, 2003, 3003 Document data
1004, 2004, 3004 Confidential information
1005, 2005, 3005 Document image forming unit
1006, 2006, 3006 Watermark image forming unit
1007, 2007, 3007 Watermarked document image composition unit
1008, 2008, 3008 Output device
1009, 2009, 3009 Printed document (paper)
1010, 2010, 3010 Input device
1011, 1011 Watermark detection unit
2012 Character Erase / Falsification Detection Unit
3012 Embedded signal number recording unit
3013 Embedded signal number detection unit
3014 Filter output value calculation unit
3015 Optimal threshold determination unit
3016 Detection signal number counting unit
3017 Tampering determination unit

Claims

A digital watermark embedding device that embeds confidential information in a document image by a digital watermark, comprising:
a watermark image forming unit that creates a watermark image based on the confidential information with reference to the document image,
wherein the watermark image forming unit
calculates, from the document image, an embedding area in which a dot pattern whose predetermined symbol can be identified by a predetermined filter is to be embedded, and determines whether the ratio of a character area to the embedding area is equal to or less than a predetermined threshold,
when the ratio of the character area is equal to or less than the predetermined threshold, embeds a predetermined number of dot patterns (symbol units) identifying a symbol that forms at least part of the confidential information in the part of the embedding area that does not overlap the character area, and embeds dot patterns (background units) that are symbols unrelated to the confidential information in the part of the embedding area that overlaps the character area,
and thereby forms the watermark image with a uniform apparent density.

2. The digital watermark embedding device according to claim 1, wherein the watermark image forming unit
divides the document image into a plurality of embedding areas and determines, for each embedding area, whether the ratio of the character area is equal to or less than the predetermined threshold.

3. The digital watermark embedding device according to claim 1, wherein the watermark image forming unit
embeds the symbol units, in a plurality of types and a predetermined number, in the part that does not overlap the character area of an embedding area in which the ratio of the character area exceeds the predetermined threshold.

4. The digital watermark embedding device according to claim 1, wherein the symbol unit is a dot pattern in which dots are arranged at a predetermined interval in a predetermined direction.

5. The digital watermark embedding device according to claim 1, wherein the symbol unit allows a predetermined symbol to be identified by a predetermined filter by changing the direction and/or interval of the dots.

6. The digital watermark embedding device according to claim 1, further comprising:
a watermarked document image composition unit that creates a watermarked document image by superimposing the document image and the watermark image; and
an output device that prints the watermarked document image.

7. The digital watermark embedding device according to claim 6, wherein the watermarked document image composition unit
records various attribute information used when embedding and/or detecting the confidential information at the four corners of the watermarked document image.

The digital watermark embedding device according to any one of claims 5, 6 and 7, further comprising an embedded signal number recording unit that records the numbers of the symbol units and the background units embedded in the watermark image in a blank area of the watermark image.

A digital watermark detection device for detecting confidential information embedded as a digital watermark by the digital watermark embedding device according to any one of claims 1 to 8, wherein
the document image is divided into a plurality of embedding areas for embedding dot patterns whose predetermined symbol can be identified by a predetermined filter,
the confidential information is embedded by embedding, in each embedding area, dot patterns (symbol units) identifying a symbol that forms at least part of the confidential information or dot patterns (background units) that are symbols unrelated to the confidential information,
the device comprises a watermark detection unit that detects the confidential information, and
the watermark detection unit
has a plurality of types of filters capable of identifying a predetermined symbol from the dot patterns,
performs matching on each of the embedding areas using the plurality of types of filters, and
detects, from an embedding area in which the number of matches of one filter is much larger than the number of matches of every other filter, at least part of the confidential information corresponding to that one filter.

10. The digital watermark detection device according to claim 9, wherein the watermark detection unit
does not detect the confidential information from an embedding area in which there is no difference between the numbers of matches of the filters.

11. The digital watermark detection device according to claim 9, further comprising a character erasure / alteration detection unit,
wherein the character erasure / alteration detection unit
creates a character region extraction image, in which the pixel value of the character region is 0 and the pixel value of the background region is 1, by binarizing the document image in which the confidential information is embedded with a predetermined threshold,
creates a symbol unit extraction image, in which the pixel value of an area where the symbol unit cannot be detected is 0 and the pixel value of an area where the symbol unit can be detected is 1, from the document image in which the confidential information is embedded, and
detects alteration of the watermarked document image by comparing the character region extraction image with the symbol unit extraction image.

The digital watermark detection device according to claim 9, further comprising:
an embedded signal number detection unit that detects the number of the symbol units embedded when the confidential information was embedded in the document image;
a filter output value calculation unit that calculates the output value of the predetermined filter for the input image and records it for each embedding area;
an optimum threshold determination unit that calculates, from the detection value of the embedded signal number detection unit and the calculated value of the filter output value calculation unit, an optimum threshold for detecting the number of the symbol units embedded in the watermark image;
a detection signal counting unit that detects the number of the symbol units actually embedded in the watermarked document image; and
a tampering determination unit that determines whether the watermarked document image has been tampered with by comparing the detection value of the embedded signal number detection unit with the count value of the detection signal counting unit.

13. The digital watermark detection device according to claim 12, wherein the embedded signal number detection unit
detects the number of the symbol units embedded at the time of embedding the confidential information from information recorded in a blank area of the watermarked document image.

14. The digital watermark detection device according to claim 12, wherein the optimum threshold determination unit
sums, over all the embedding areas, the difference between the detection value of the embedded signal number detection unit and the calculated value of the filter output value calculation unit, and determines the threshold so that the total value is minimized.

A digital watermark embedding method for embedding confidential information in a document image by a digital watermark, comprising:
a first step of dividing the document image into a plurality of embedding areas for embedding dot patterns whose predetermined symbol can be identified by a predetermined filter;
a second step of determining, for each of the embedding areas, whether the ratio of the character area is equal to or less than a predetermined threshold;
a third step of embedding, when the ratio of the character area is equal to or less than the predetermined threshold, a predetermined number of dot patterns (symbol units) identifying a symbol that forms at least part of the confidential information in the part of the embedding area that does not overlap the character area;
a fourth step of embedding the symbol units, in a plurality of types and a predetermined number, in the part that does not overlap the character area of an embedding area in which the ratio of the character area exceeds the predetermined threshold; and
a fifth step of embedding dot patterns (background units) that are symbols unrelated to the confidential information in the part of the embedding area that overlaps the character area.

A digital watermark detection method for detecting confidential information embedded in a document image by the digital watermark embedding method according to claim 15, wherein
the document image is divided into a plurality of embedding areas for embedding dot patterns whose predetermined symbol can be identified by a predetermined filter, and
the confidential information is embedded by embedding, in each embedding area, dot patterns (symbol units) identifying a symbol that forms at least part of the confidential information or dot patterns (background units) that are symbols unrelated to the confidential information,
the method comprising:
performing matching on each of the embedding areas using a plurality of types of filters;
detecting, from an embedding area in which the number of matches of one filter is much larger than the number of matches of every other filter, at least part of the confidential information corresponding to that one filter; and
not detecting the confidential information from an embedding area in which there is no difference between the numbers of matches of the filters.