JP2004515799A

JP2004515799A - Stegotext encoder and decoder

Info

Publication number: JP2004515799A
Application number: JP2002515737A
Authority: JP
Inventors: Sewell, Roger Fane; Owen, Mark St. John; Barlow, Stephen John; Long, Simon Paul
Original assignee: Activated Content Corporation, Inc.
Priority date: 2000-07-27
Filing date: 2001-07-27
Publication date: 2004-05-27
Anticipated expiration: 2021-07-27
Also published as: US7298841B2; US20040028222A1; AU2001275712A1; DE60110086T2; CA2417499A1; US20090190755A1; EP1305901B1; ATE293316T1; EP1305901A2; JP5105686B2; US20080152128A1; DE60110086D1; US7916862B2; WO2002011326A3; US7917766B2; WO2002011326A2

Abstract

The invention comprises an encoder for encoding a stegotext and a decoder for decoding the encoded stegotext, the stegotext being generated by modulating the log power spectrogram of a covertext signal with at least one key, the or each key having been added or subtracted in the log domain to the covertext power spectrogram in accordance with the data of the watermark code with which the stegotext was generated, and the modulated power spectrogram having been returned into the original domain of the covertext. The decoder carries out Fast Fourier Transformation and rectangular polar conversion of the stegotext signal so as to transform the stegotext signal into the log power spectrogram domain; subtracts in the log power domain positive and negative multiples of the key or keys from blocks of the log power spectrogram and evaluates the probability of the results of such subtractions representing an unmodified block of covertext in accordance with a predetermined statistical model.

Description

【０００１】
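The present invention relates to the watermarking of analogue or digital signals. The signals may be video or data signals, but it should be understood that the invention is particularly, though not exclusively, concerned with the watermarking of audio signals.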
【０００２】
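The term "watermarking" is intended to cover procedures in which data is added to a main signal in such a way that the added data does not affect the main purpose of the main signal. The main signal is often called the "covertext", and the signal carrying the added watermark data is often called the "stegotext". Thus, for an audio signal, the presence of the added data in the stegotext is intended to be effectively imperceptible to a listener when the stegotext is played back. The presence of the added data nevertheless allows a user with a suitable decoder to identify the origin of the covertext. If the user's equipment contains suitable circuitry, the user can be prevented from reproducing the main data carried in the original covertext signal when the recovered watermark data is not compatible with that equipment. In addition, the user must be able to reproduce the covertext.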
【０００３】
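Such techniques clearly have great potential for music recordings. Considerable effort has therefore been devoted to the problem of watermarking audio signals in such a way that those entitled to listen to the stegotext do not have their enjoyment spoiled by spurious sounds caused by the added coded data.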
【０００４】
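On the other hand, it is important that the watermark is robust enough to remain effective after the many kinds of conventional signal processing that may be applied to recorded and transmitted audio material, and to withstand direct attempts to remove or invalidate the added coded data.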
【０００５】
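Apparatus and methods for watermarking analogue signals are described in International Patent Specification WO 98/53565, which discloses several techniques that have been used for watermarking signals.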
【０００６】
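One watermarking method proposed in that earlier published specification involves computing the short-term autocorrelation function of the audio signal and then adding an additional signal which is hard to hear and which changes the value of the short-term autocorrelation function at some specific delay or delays, thereby producing a specific waveform that carries data at a low rate. The actual modulation of data onto this waveform can be carried out using any of several suitable modulation techniques. At the receiving end of the system, a watermark reader (i.e. a decoder) computes the short-term autocorrelation function of the stegotext and applies the demodulation appropriate to the modulation technique used. If the reader has access to the data originally used to modulate the autocorrelation function, the added coded data can be removed from the stegotext.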
【０００７】
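However, the short-term autocorrelation function of many audio signals can easily be altered so that it is arbitrarily close to zero at a delay of arbitrary length, without changing the sound of the underlying audio. A signal watermarked in this way is therefore relatively easy to attack so that the watermark is rendered ineffective.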
【０００８】
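The present invention is concerned with providing a watermarking system that does not have the above drawback, and with providing a decoder for decoding the watermarked signal.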
【０００９】
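According to a first aspect of the invention there is provided an encoder for encoding a covertext signal to generate a stegotext, the encoder comprising:
first transform means for carrying out a Fast Fourier Transformation and rectangular-to-polar conversion of the covertext signal so as to transform the covertext signal into a log power spectrogram;
means for providing at least one key, the or each key being in the form of a two-dimensional pattern of predetermined size;
a multiplier for adding or subtracting, in the log power spectrogram domain, multiples of the key, or if there is a plurality of keys multiples of one or more of the keys, to blocks of the transformed covertext signal;
means for controlling the addition or subtraction of the key or keys by the multiplier in accordance with data representing a desired code; and
second transform means for carrying out polar-to-rectangular conversion and an inverse Fast Fourier Transformation of the modulated covertext signal so as to generate the stegotext.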
【００１０】
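According to a second aspect of the invention there is provided a method of encoding a covertext signal to generate a stegotext, the method comprising:
carrying out a Fast Fourier Transformation and rectangular-to-polar conversion of the covertext signal so as to transform the covertext signal into the power spectrogram domain;
providing at least one key, the or each key being in the form of a two-dimensional pattern of predetermined size;
adding or subtracting, in the log power spectrogram domain, multiples of the key, or if there is a plurality of keys multiples of one or more of the keys, to segments of the transformed covertext signal;
controlling the addition or subtraction of the multiples of the key or keys in the addition/multiplication step in accordance with data representing a desired code; and
carrying out polar-to-rectangular conversion and an inverse Fast Fourier Transformation of the modulated covertext signal so as to generate the stegotext.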
【００１１】
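According to a third aspect of the invention there is provided a decoder for decoding a stegotext generated by modulating the log power spectrogram of a covertext signal with at least one key (K), multiples of the or each key having been added or subtracted in the log domain to the covertext power spectrogram in accordance with the watermark code with which the stegotext was generated, and by returning the modulated power spectrogram into the original domain of the covertext, the decoder comprising: transform means for carrying out a Fast Fourier Transformation and rectangular-to-polar conversion of the stegotext signal so as to transform the stegotext signal into the log power spectrogram domain; means for providing the key or keys with which the log power spectrogram of the original covertext signal was encoded; calculation means for subtracting, in the log power domain, positive and negative multiples of the key or keys from blocks of the log power spectrogram and for evaluating the probability of the results of such subtractions representing an unmodified block of covertext in accordance with a predetermined statistical model; and extraction means for recovering the encoded data from the output of the calculation means.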
【００１２】
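According to a fourth aspect of the invention there is provided a method of decoding a stegotext generated by modulating the log power spectrogram of a covertext signal with at least one key (K), multiples of the or each key having been added or subtracted in the log domain to the covertext power spectrogram in accordance with the data of the watermark code with which the stegotext was generated, and by returning the modulated power spectrogram into the original domain of the covertext, the method comprising: carrying out a Fast Fourier Transformation and rectangular-to-polar conversion of the stegotext signal so as to transform the stegotext signal into the log power spectrogram domain; providing the key or keys with which the log power spectrogram of the original covertext signal was encoded; subtracting, in the log power domain, positive and negative multiples of the key or keys from blocks of the log power spectrogram and evaluating the probability of the results of such subtractions representing an unmodified block of covertext in accordance with a predetermined statistical model; and recovering the encoded data from the output of the calculation.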
【００１３】
According to a fifth aspect of the invention there is provided a watermark key generator as set out in claim 45.
【００１４】
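In order that the invention may be better understood, embodiments of it will now be described by way of example with reference to the accompanying drawings.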
【００１５】
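Referring now to Fig. 1, the basic system consists of a key generator (1), an encoder (2) and a decoder (3). The key generator (1) produces a pseudo-random key from an integer seed value input at (1'). The encoder (2) uses the key to mark (watermark) with data a music file input at (4) as the covertext, so as to generate the stegotext. The data is input to the encoder (2) at (2'). The decoder (3), which receives the stegotext over a transmission line (5), again uses the key to read the data back from the marked file and outputs the recovered data at (6). To ensure that the data is read back correctly, the same key must be used for the encoding and decoding operations. Since the key can be regenerated from the seed whenever required, the seed value is all that is needed to decode a marked file. The transmission line (5) can of course take a wide variety of forms: the stegotext could be recorded on any suitable medium or transmitted by radio, fibre cable and so on. In what follows any unmarked file is called a covertext and a watermarked file is called a stegotext. This embodiment is described in relation to music, but it will be appreciated that the techniques and apparatus described can also be used with non-music material such as speech or video data.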
【００１６】
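Fig. 2 of the accompanying drawings is a block diagram of a more detailed embodiment of the invention. Here the covertext is an unmarked audio file shown at (10). The source of this audio file is shown at 10'. It may be a pick-up of a live event, a recording medium such as tape or disc, or a signal transmitted by radio or over the Internet. The audio file is input to the encoder (2) and converted into a power spectrogram in circuit (11). The reasons for this conversion are as follows. It is not practicable to carry information in the phase component of the stegotext. The human ear is essentially insensitive to phase, and some compression algorithms exploit this. A watermarking technique that relies on phase is therefore unlikely to be robust to compression. Moreover, an audio file can be processed by adding a random "group delay" to it, which scrambles the phases of its frequency components. Such processing, which is computationally inexpensive, generally destroys any particular waveform present in the audio file. A watermark that relies on the shape of the signal in the time domain can therefore be made unreadable by this processing.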
【００１７】
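The invention therefore proposes watermarking the covertext by way of its power spectrogram. In this way only the magnitude of each frequency component in the covertext is modified, and the phase of each frequency component is preserved through the marking process. The phase information is discarded at the decoder. The procedure will now be described in detail.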
【００１８】
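To compute the power spectrogram of the covertext, the covertext is divided into blocks 2Y samples long which overlap by half their length, so that a new block starts every Y samples. In this embodiment, which as described is designed for audio files with a sample rate f_s = 44100 Hz, Y is set to 1024.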
【００１９】
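Each block is multiplied by a window function known as the analysis window, and the Fourier transform of the windowed block is computed. The purpose of the window function is to ensure that the sample values taper towards zero at either end of the block, avoiding discontinuities, since the Fourier transform treats the block as the repeating unit of a periodic function. Because the windowed block consists of real samples, its Fourier transform is conjugate-symmetric with respect to positive and negative frequencies. The negative-frequency components carry no additional information and can therefore be discarded.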
【００２０】
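Each Fourier coefficient is a complex number whose magnitude represents the amplitude of the corresponding frequency component and whose argument represents its phase. When the phase information is discarded, what remains is the power spectrum of the signal. Strictly speaking, the power spectrum is obtained by squaring the magnitude of each Fourier coefficient.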
【００２１】
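When several successive power spectra are arranged side by side, a grid of values is formed. One axis, conventionally the vertical, represents frequency, and the other, conventionally the horizontal, represents time. This grid is the power spectrogram of the audio sample. Fig. 3 shows an example of a power spectrogram taken from a passage of music. In this figure the values in the grid are shown as shades of grey. The column on the right, running from −8 to 3, is a scale against which the brightness levels of the spectrogram can be matched so that the spectrogram can be read.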
【００２２】
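The choice of Y determines the resolution of the spectrogram. In the frequency direction the resolution is f_s/2Y, and in the time direction it is Y/f_s. In this embodiment these values are 21.5 Hz and 23.2 ms respectively. The axes of Fig. 3 are measured in these units.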
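As a quick check of the figures quoted above, the two resolutions can be computed directly from f_s and Y. The function name `spectrogram_resolution` is an illustrative choice, not something taken from the specification.

```python
def spectrogram_resolution(fs_hz, Y):
    """Return (frequency step in Hz, time step in ms) for blocks of 2Y samples hopped by Y."""
    freq_step_hz = fs_hz / (2 * Y)   # spacing between spectrogram rows
    time_step_ms = 1000.0 * Y / fs_hz  # spacing between spectrogram columns
    return freq_step_hz, time_step_ms


# Values for the embodiment described above: f_s = 44100 Hz, Y = 1024.
df, dt = spectrogram_resolution(44100, 1024)
print(f"{df:.1f} Hz per row, {dt:.1f} ms per column")
```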
【００２３】
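It might appear difficult to reconstruct an audio waveform satisfactorily from its power spectrogram, but this is possible if the phase information has been retained. The spectrogram data can be returned to the time domain by an inverse Fourier transform, and the resulting segments overlapped as before and added together.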
【００２４】
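For the purpose of watermarking the covertext from which the spectrogram was derived, it has been found that a satisfactory audio waveform can be reproduced by the above method provided that the modifications to the spectrogram are small and the original phase information is retained. It should be noted that the reconstructed time-domain segments are no longer guaranteed to taper to zero at either end, and that the subjective quality of the final waveform is therefore improved if the segments are windowed with a synthesis window before being added together as described above. The analysis and synthesis windows must be chosen so as to ensure that there is no overall amplitude modulation through the system. In this embodiment each of these windows is the square root of a raised-cosine function.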
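The following is a minimal sketch of the analysis and synthesis windowing described above, assuming a periodic raised-cosine (Hann) window of length 2Y and a hop of Y samples. The function names and the exact window phase are assumptions made for illustration. With square-root raised-cosine windows at both ends, the product of the two windows sums to one across overlapping blocks, so an unmodified spectrogram reproduces the interior of the signal without amplitude modulation.

```python
import numpy as np


def sqrt_raised_cosine(n):
    """Square root of a periodic raised-cosine window of length n (an assumed form)."""
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n))


def analyse(x, Y):
    """Split x into half-overlapping blocks of 2Y samples, window them and
    keep only the non-negative-frequency Fourier coefficients."""
    w = sqrt_raised_cosine(2 * Y)
    starts = range(0, len(x) - 2 * Y + 1, Y)
    return np.array([np.fft.rfft(w * x[s:s + 2 * Y]) for s in starts])


def synthesise(spec, Y):
    """Inverse-transform each column, apply the synthesis window and overlap-add."""
    w = sqrt_raised_cosine(2 * Y)
    out = np.zeros(Y * (len(spec) + 1))
    for i, column in enumerate(spec):
        out[i * Y:i * Y + 2 * Y] += w * np.fft.irfft(column, n=2 * Y)
    return out
```

Only the first and last Y samples, which are covered by a single block, are not reconstructed exactly in this sketch.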
【００２５】
In Fig. 2 the modulation of the spectrogram is carried out, in response to the bit stream to be encoded, in the circuit shown generally at (12).
【００２６】
Finally, in block (2), circuit (13) returns the modulated power spectrogram to the time domain and combines the results to form the stegotext. In Fig. 2 the stegotext is shown at 15.
【００２７】
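The decoder (3) comprises a circuit (16) for converting the stegotext into a log spectrogram, and a circuit (17) which correlates the log spectrogram with the key so that circuit (18) can extract the bit stream that represents the watermark code and is output at (19).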
【００２８】
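It has been found that the extent to which elements of the power spectrogram of an audio signal can be modulated without audible effect is roughly proportional to their original level. The power spectrogram may therefore be added to or subtracted from by up to a fixed amount in decibel terms. The amount of modulation that is perceptible depends on the listening environment, but is typically about 1 dB. In this embodiment the watermarking process is therefore carried out in the "log power spectrogram domain", and consists of adding to or subtracting from the power spectrogram in accordance with the key produced by the key generator 1 and the data to be encoded as the watermark.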
【００２９】
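Because larger spectrogram elements can be modulated by larger amounts, the information they carry is less sensitive to noise than that carried by elements of smaller amplitude. However, it is not possible to know in advance which elements these will be. The watermarking described is therefore arranged to exploit whatever elements the covertext makes available for carrying information. Thus, in this embodiment, every spectrogram element is modulated in circuit 12 so as to maximise the information-carrying capacity of the watermark. Each data bit of the watermark accordingly induces a modulation pattern over a region of the spectrogram. The modulation pattern is applied in one sense to encode a "1" bit and in the opposite sense to encode a "zero" bit. Bits are encoded at regular intervals, that is, at a regular horizontal spacing T in the spectrogram.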
【００３０】
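The covertext may contain short segments, such as silent passages of an audio file, in which a watermark cannot be hidden. It is therefore essential that each data bit acts on as long a portion of the stegotext as possible. This embodiment uses two approaches to this problem.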
【００３１】
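Fig. 4 of the accompanying drawings illustrates one of these approaches graphically. In this approach the spectrogram modulation patterns for adjacent bits overlap. In Fig. 4 each rectangle K represents a copy of the modulation pattern. Each spectrogram modulation pattern K is X time units wide and Y frequency units high, Y being the full height of the spectrogram. In this embodiment X is 32 and T = 5. Thus the first 32-column-wide block of the covertext power spectrogram is modulated by the key of the same size; when the key is then stepped by T (5 columns), the first 5 columns of the covertext remain modulated only by the corresponding 5 columns of the key. In the next iteration columns 6 to 37 of the covertext are modulated by the key, so that columns 6 to 32 have been modulated twice. In the third iteration columns 6 to 10 of the first block remain doubly modulated, columns 11 to 32 are modulated a third time, columns 33 to 37 receive their second modulation, and columns 38 to 42 receive their first. This sequence is repeated over the whole length of the covertext. The values of X and T can of course vary over a wide range; for example, X may be 256 and T may be 10.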
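The column counts in the walk-through above can be reproduced with a few lines of code. This sketch simply counts how many key copies touch each spectrogram column; `modulation_count` is an illustrative name, and columns are zero-indexed in the code whereas the text counts from 1.

```python
import numpy as np


def modulation_count(n_bits, X=32, T=5):
    """Number of overlapping key copies covering each spectrogram column
    after n_bits copies of an X-column key have been applied at spacing T."""
    cover = np.zeros((n_bits - 1) * T + X, dtype=int)
    for i in range(n_bits):
        cover[i * T:i * T + X] += 1  # copy i spans columns i*T .. i*T + X - 1
    return cover


# After three iterations: columns 1-5 once, 6-10 twice, 11-32 three times,
# 33-37 twice and 38-42 once, matching the description above.
print(modulation_count(3))
```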
【００３２】
The second approach applies an error-correcting code to the message bits so as to spread the effect of each bit still further in time.
【００３３】
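The convolutional encoder shown in Fig. 5 is used to spread the effect of each input bit over a long portion of the music while reducing the memory needed in the decoder compared with using a longer key. The data stream to be encoded is input on line (30) to a shift register, which in this embodiment consists of three D-type flip-flops (31, 32 and 33). A clock (clk/2) is supplied on line (34). An output switch (35), flipped at the clock rate (clk), is connected to the outputs of a pair of exclusive-OR gates (36, 37) so as to select one or other of two exclusive-OR combinations of the bits in the shift register formed by the three flip-flops. In this example the upper exclusive-OR gate (36) is connected to all three bits of the shift register, and the lower gate (37) is connected to bit 0 and bit 2. The encoder is defined by the two-dimensional matrix [111; 101], where the first row of the matrix corresponds to the connections of the shift register to the upper exclusive-OR gate and the second row corresponds to the connections to the lower exclusive-OR gate (37). The connection patterns can be expressed as polynomials with coefficients from the set {0, 1}. In this case the polynomials are X^2 + X + 1 (gate 36) and X^2 + 1 (gate 37).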
【００３４】
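In this encoder each input bit affects six consecutive output bits (the total number of entries in the matrix), and the output bit rate is twice the input bit rate (the number of rows in the matrix). Such a code is called a "rate 1/2 code". The entries in each row of the matrix, the "generator polynomials", must be chosen carefully. In this embodiment only codes whose rate is the reciprocal of an integer can be used. It should be understood that this restriction arises solely from the kind of encoder used in this embodiment and has no other significance. If other forms of error-correction coding were to be used, the restriction would therefore not necessarily apply. The identity code, which passes its input to the output unchanged, is specified by the matrix [1].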
【００３５】
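The code is called "convolutional" because it can be implemented using a convolution, as follows. Zeros are first inserted into the input data bits in accordance with the code rate. Suppose, for example, that the original data is (1011). Zeros are inserted to give (1000101), so that the original bits now occupy every second position. Convolving this with (111011), which is the above encoder matrix written out as a single row, gives (111022212111). Taking this modulo 2 gives (111000010111). The modulo-2 operation performs the function of the exclusive-OR gates (36 and 37) in the encoder. The 4-bit sequence has thus been encoded into a 12-bit code word. In general, an n-bit sequence is encoded into a (2n+4)-bit code word.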
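The worked example above can be reproduced with a few lines of NumPy. This is a sketch of the zero-insertion and convolution view of the encoder only; the function name `conv_encode` and the array layout are illustrative choices. The single-row form (111011) is obtained by reading the matrix [111; 101] column by column.

```python
import numpy as np

# Rows are the generator taps for the upper gate (36) and lower gate (37).
GENERATORS = np.array([[1, 1, 1],
                       [1, 0, 1]])


def conv_encode(bits, generators=GENERATORS):
    """Encode bits by zero insertion, convolution and reduction modulo 2."""
    n_rows = generators.shape[0]            # 2 for a rate 1/2 code
    taps = generators.T.reshape(-1)         # interleaved taps: 1 1 1 0 1 1
    stuffed = np.zeros(n_rows * len(bits) - (n_rows - 1), dtype=int)
    stuffed[::n_rows] = bits                # (1011) -> (1000101)
    return np.convolve(stuffed, taps) % 2   # modulo 2 plays the role of XOR


print(conv_encode([1, 0, 1, 1]))  # the 12-bit code word from the text
```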
【００３６】
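Although a convolutional code has been described, it will be appreciated that many other kinds of suitable error-correcting code can be used. Such codes include Reed-Solomon codes, BCH codes, Golay codes, Fire codes, Turbo codes, Gallager codes and Mackay-Neal codes.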
【００３７】
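Synchronisation encoding is carried out before the convolutional encoding, so that a synchronisation flag is inserted into the stream of data bits to be encoded. Synchronisation of the encoded bit stream is thus achieved by inserting a start flag into it. The flag pattern is a "0" followed by five "1"s. To ensure that this pattern does not otherwise occur in the data stream, an extra zero bit is inserted after every sequence of four "1"s. This is known as "zero stuffing". The stuffed zeros are removed by the decoder. This procedure incurs a penalty of 6 bits per start flag plus an overall reduction in data rate of just over 3%. Those skilled in the art will appreciate that many alternatives to this are possible.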
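A minimal sketch of the flag insertion and zero stuffing described above is given below. The function names and the list-of-ints bit representation are assumptions for illustration; the flag pattern and the rule of inserting a zero after every run of four ones come from the text.

```python
FLAG = [0, 1, 1, 1, 1, 1]  # a "0" followed by five "1"s


def stuff(bits):
    """Insert a zero after every run of four consecutive ones."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 4:
            out.append(0)  # stuffed zero, so five ones never occur in the data
            run = 0
    return out


def unstuff(bits):
    """Remove the zeros inserted by stuff()."""
    out, run, skip = [], 0, False
    for b in bits:
        if skip:           # this bit is the stuffed zero: drop it
            skip, run = False, 0
            continue
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 4:
            skip = True
    return out


def frame(bits):
    """Prefix the stuffed data with the start flag."""
    return FLAG + stuff(bits)
```

Because the stuffed data never contains five consecutive ones, the flag pattern can only appear where it was deliberately inserted.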
【００３８】
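Turning now to Fig. 6, this shows how the two processes just described are incorporated into the encoder. As before, (10) denotes the covertext to be encoded; K denotes the current window being processed and K−1 the previous window. At multiplier (41) the extracted window K is multiplied by the analysis window function (42), so that the extracted window tapers at each end. In circuit (43) the Fast Fourier Transform of the extracted window, as modified by the analysis window function, is computed. A rectangular-to-polar conversion is carried out in circuit (44) to produce the required power spectrogram. This power spectrogram is modified as already described in circuit (45), which corresponds to circuit 12 of Fig. 2; the phase component of the spectrogram is left unchanged.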
【００３９】
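To complete the generation of the stegotext, a polar-to-rectangular conversion is carried out in circuit (46). An inverse Fast Fourier Transform is carried out in circuit (47), and the output of the inverse Fast Fourier Transform circuit (47) is multiplied at (49) by the synthesis window function (48). Finally the overlapping windows are added together at (50) to produce the stegotext shown at (15).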
【００４０】
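It will be appreciated that it is desirable to have a number of different watermarks available. Essentially, the availability of a number of different watermarks makes it considerably harder for an attacker to decode, remove or forge the hidden message without knowing which watermark is in use.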
【００４１】
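In this embodiment the term "key" is used to denote a particular one of a family of watermarks. In this embodiment also, the keys are generated pseudo-randomly, and any one key is defined by a single integer used as a seed. This is the seed input shown in Fig. 1.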
【００４２】
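In this embodiment a key is an array K(t, f) of spectrogram modulation values, where t and f are integer indices and −1 ≤ K(t, f) ≤ +1. K(t, f) is defined to be zero outside the range −X/2 ≤ t < X/2 and 0 ≤ f < Y. Let the spectrogram of the covertext be G(t, f) and the spectrogram of the stegotext be H(t, f). Let d_i denote the data bits to be encoded, where d_i is ±1 (rather than 0 or 1). For simplicity, error-correction coding is ignored. The encoding algorithm is then given by: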
【００４３】
【数３】

【００４４】
so that, with an appropriate choice of branch cut:
【００４５】
【数４】

【００４６】
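where s is a real constant that sets the encoding strength. In equations (1) and (2), G and H are complex but K is real. In equation (1), therefore, arg H = arg G. The watermark is thus encoded in the power spectrum, and the phases of the original spectral components are preserved.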
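The two equations referred to above are not reproduced in this text, so the following is only a sketch of one form consistent with the surrounding definitions: a real, key-derived quantity is added to log G, which multiplies G by a positive real factor and leaves arg H = arg G. The function name `embed`, the array layout (time index first) and the choice to start copy i of the key at column i·T, rather than centring it on the origin as in the definition of K, are all assumptions made for illustration. Any constant factor between log power and log magnitude is taken to be absorbed into s.

```python
import numpy as np


def embed(G, key, bits, T, s):
    """Modulate a complex spectrogram G[t, f] with overlapping copies of a real key.

    bits are +1/-1; copy i of the key is added (or subtracted) in the log
    domain over columns i*T .. i*T + X - 1, scaled by the strength s.
    """
    X = key.shape[0]
    log_mod = np.zeros(G.shape)
    for i, d in enumerate(bits):
        log_mod[i * T:i * T + X] += d * key   # add or subtract the key in the log domain
    H = G * np.exp(s * log_mod)               # real positive factor, so the phase is unchanged
    return H, log_mod
```

A caller would need G to have at least (len(bits) − 1)·T + X columns for every key copy to fit.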
【００４７】
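It will be appreciated that the design of the key is of the greatest importance in producing a stegotext that is robust to attack. The considerations involved in designing a key will now be described in detail.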
【００４８】
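A key consisting solely of a white-noise pattern, in which every cell of the key is independent and identically distributed, is attractive for many reasons. It is computationally easy to generate and has the greatest possible information-carrying capacity. It generally has low correlation with the covertext and a single narrow autocorrelation peak. Experiments have shown that it is robust to a wide variety of manipulations of the audio file while being encoded at a strength low enough to be inaudible. However, the stegotext can be attacked with a group-delay attack on the spectrogram, in which individual rows are randomly shifted left and right; with the spectrogram resolution given above, the group-delay parameters can be arranged so as to shift rows by one column or more. This destroys any correlation between the stegotext and the key. It appears to be impossible to choose a spectrogram resolution that simultaneously gives a perceptually satisfactory stegotext and robustness to all forms of this group-delay attack.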
【００４９】
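Furthermore, the stegotext can be resampled so that all frequencies are raised by, for example, 5% (less than one semitone) and the text is shortened in time by the same factor. The effect on the spectrogram is to stretch it vertically and compress it horizontally. This procedure is illustrated graphically in Fig. 7, in which 15A represents the original stegotext and 15B the altered stegotext. It can be seen that very few cells still coincide. Along the frequency axis, cells with f ≥ 20 do not overlap their previous positions at all. Here again the correlation function is destroyed.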
【００５０】
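The first of these two problems, that is, stretching in one dimension, can be overcome by modifying the key so that it contains repeated columns. Experiments have shown that repeating each spectrogram column 12 times is sufficient to ensure that the group delay needed to destroy the correlation function has a perceptually unacceptable effect on the stegotext. The price paid is a reduction in information-carrying capacity: the autocorrelation peak of the key becomes broader and lower, so that a higher encoding strength is needed to achieve a given robustness.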
【００５１】
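The second problem can be overcome by exhaustive search. The correlation function can be evaluated over a range of different resampling rates, and the factor by which the file was resampled can be determined by finding the one that gives the strongest correlation. Unfortunately, it is possible to resample the stegotext so that the pitch changes while the overall duration stays constant, or so that the pitch stays constant while the overall duration changes. The latter process is common in broadcast applications where, for example, a piece of music needs to fit exactly into a given slot. There is therefore potentially a two-dimensional space to search: the stegotext may have been stretched arbitrarily in frequency and/or in time. If the key has been modified to contain repeated columns as described above, the autocorrelation function is broad and the range of possible time stretches therefore needs to be sampled only sparsely, but the computational burden is nevertheless heavy.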
【００５２】
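The present invention, however, provides a solution to this problem. If the effect on the key of a stretch about some fixed origin is examined carefully, it is found that the relative effect of the stretch is constant over the whole key. It is the absolute effect that varies and that gives rise to the above problems. In this embodiment the key pattern is modified so that progressively higher spatial frequencies are filtered out the further one moves from the origin.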
【００５３】
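For the purposes of the following explanation, the covertext, stegotext and key are regarded as images in the log spectrogram domain. "Frequency" here means spatial frequency within these images, not the frequency of the underlying audio.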
【００５４】
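Consider first the problem in one dimension. Let f(t) be a sinusoid, f(t) = sin ωt. Stretching by a factor α gives g(t) = sin αωt. The phase angle between the two is φ = αωt − ωt = ωt(α − 1). Requiring the correlation between f(t) and g(t), computed over an interval around a suitably chosen t, to exceed some threshold is equivalent to constraining the phase angle, so that |φ| < φ0. Hence ω is bounded in terms of t: |ω| < φ0/(|α − 1| |t|), or, if α is chosen to be the largest stretch to which resistance is required, 1/ω > C|t| for some positive constant C. In view of this relationship it is simpler to speak in terms of the timescale τ of the sinusoid, where τ = 1/ω.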
【００５５】
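It is now possible to specify the frequency content of a function that correlates with itself when stretched. It must contain no frequency components whose timescale is shorter than a threshold timescale (τ = C|t|), where the constant C sets the desired degree of resistance to stretching. Such a function can be obtained by suitably filtering a white-noise signal. What is required is a low-pass filter whose cut-off frequency varies in inverse proportion to t. Such a filter is referred to below as a "swept" filter.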
【００５６】
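As already described for this embodiment, the keys corresponding to successive data bits overlap. To minimise the overlap between the frequency components present at a given time from one copy of the key and those present from earlier or later copies, a high-pass filter is also applied to the key. The overall effect is therefore that of a band-pass filter. The cut-off frequency of the high-pass filter is swept so as to match the low-pass characteristic of the adjacent key. This is illustrated in the graph of Fig. 8. The "bandwidth" Δ, which is constant in terms of timescale, is given by Δ = CT, where T is the interval between successive applications of the key to the covertext. Fig. 9 shows how the copies of the key for four successive bits d_0, d_1, d_2 and d_3 overlap.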
【００５７】
An example of the result of applying a swept band-pass filter of the above kind to a white-noise signal is shown in Fig. 10.
【００５８】
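In the same way, a two-dimensional key can be generated from a two-dimensional white-noise pattern. A filter with characteristics varying as described above is applied separately in each dimension. After filtering, the data values are passed through a non-linear function in order to satisfy the condition −1 ≤ K(t, f) ≤ +1. In this embodiment the sine function is used.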
【００５９】
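An example of the resulting pattern is shown in the power spectrogram of Fig. 11. Here the axes are "time" and "frequency" (meaning audio frequency in this case). They correspond to the axes of the spectrogram of Fig. 3 to which the key is applied. The origin in the time direction is at the centre of the key, while the origin in the frequency direction is at the top. The column on the right of Fig. 11 serves the same purpose as the scale column of Fig. 3.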
【００６０】
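This embodiment applies band-pass filtering along the Y axis as well as along the X axis, but since copies of the key do not overlap in that direction, low-pass filtering would suffice. Using a low-pass rather than a band-pass filter would increase the information-carrying capacity of the key.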
【００６１】
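As the value of the constant C in the above equations is increased, the keys generated become resistant to larger stretches. The characteristics of the high-pass and low-pass filters become more similar to each other, and the bandwidth of the filter therefore becomes narrower. This reduces the information-carrying capacity of the key. There is therefore a trade-off between resistance to stretching and information-carrying capacity. In this embodiment C = 0.15 (pixels per cycle) per pixel, and keys generated in this way perform satisfactorily for stretches of up to about ±6% in both the time and frequency directions. It will be seen that the above definition of C is expressed in terms of pixels. In this context the term pixel has a different meaning depending on whether filtering in the horizontal or the vertical direction of the spectrogram is being considered.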
【００６２】
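In the horizontal direction the term pixel is used to mean the time interval between columns of the spectrogram. When filtering in the vertical direction is considered, the term pixel is used to mean the frequency difference between two adjacent rows of the spectrogram.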
【００６３】
In Fig. 11, therefore, a horizontal pixel is about 23 milliseconds and a vertical pixel is about 22 Hz.
【００６４】
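Thus, when τ > C|t| is used for the bandwidth of the low-pass filter, τ is measured in pixels per cycle and t denotes the X or Y coordinate, measured in pixels as defined above, of the spectrogram point in question relative to the reference point, that is, the origin. In Fig. 11 the reference point is at the centre of the top edge of the image. This reference point has been chosen to correspond to zero frequency. Other reference points could be chosen, but the zero-frequency condition is preferred.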
【００６５】
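The peak of the correlation of a typical key with itself after stretching over a range in both frequency and time is shown in Fig. 12. The values are normalised so that the peak autocorrelation of the key is 1.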
【００６６】
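When the two-dimensional correlation between the key and the stegotext is computed, it is found that if the stegotext has been stretched in the frequency direction, the peak of the correlation may move slightly away from the line y = 0. For this reason this embodiment uses a two-dimensional correlation: the values of the function at small offsets in the y direction are added together to form a one-dimensional function that is passed to a bit synchroniser.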
【００６７】
Referring now to Fig. 23, this shows a flow diagram of the generation of a key K.
【００６８】
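At step K_1 a seed integer is input, and at step K_2 this number is supplied to a Tausworthe generator, which produces uniformly distributed random numbers. The output of the Tausworthe generator is supplied at step K_3 to be converted by the Box-Cox method into one-dimensional Gaussian-distributed random numbers. For a key with X = 32 and Y = 1024 there are 32768 such random numbers. The processes carried out by the Tausworthe generator and the Box-Cox method are described in detail in the book "Principles of Random Variate Generation" by John Dagpunar, published in 1988 by Clarendon Press in the Oxford Science Publications series.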
【００６９】
At step K_4 the 32768 random numbers are rearranged into a two-dimensional array of 32 × 1024 random numbers.
【００７０】
At step K_5 the two-dimensional swept filtering described above is carried out.
【００７１】
At step K_6 the data values are passed through the non-linear sine function in order to satisfy the condition −1 ≤ K(t, f) ≤ +1.
【００７２】
Finally, at step K_9, the key can either be used directly in the encoder or decoder, or stored in a suitable readable memory.
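The key-generation steps above can be sketched as follows. This is a deliberately simplified illustration under several stated assumptions, not the filter used in the embodiment: NumPy's default generator stands in for the Tausworthe generator and Gaussian conversion, and the swept band-pass filter is replaced by a crude swept low-pass filter (a moving average whose width grows in proportion to the distance from the origin). The names `swept_lowpass` and `make_key` are hypothetical.

```python
import numpy as np


def swept_lowpass(a, origin, C, axis):
    """Crude swept low-pass filter: average over a window whose width is
    roughly C * |distance from origin| pixels along the given axis."""
    a = np.moveaxis(a, axis, 0)
    out = np.empty_like(a)
    n = a.shape[0]
    for i in range(n):
        half = int(C * abs(i - origin) / 2)       # window half-width grows with |t|
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out[i] = a[lo:hi].mean(axis=0)
    return np.moveaxis(out, 0, axis)


def make_key(seed, X=32, Y=1024, C=0.15):
    """Seed -> Gaussian noise -> X-by-Y array -> swept filtering -> sine."""
    rng = np.random.default_rng(seed)             # stand-in for the Tausworthe generator
    noise = rng.standard_normal(X * Y).reshape(X, Y)
    k = swept_lowpass(noise, origin=X // 2, C=C, axis=0)  # time origin at the key centre
    k = swept_lowpass(k, origin=0, C=C, axis=1)           # frequency origin at zero frequency
    return np.sin(k)                              # forces -1 <= K(t, f) <= +1
```

The same seed always reproduces the same key, which is the property the encoder and decoder rely on.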
【００７３】
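All of the above processes can be carried out by a suitably programmed computer, as shown at 300, and can be stored on a recording medium 301, which may be a CD-ROM, DVD, disc, tape or any other suitable storage medium.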
【００７４】
The basic steps and principles of encoding a covertext in accordance with the invention having been described, Fig. 13 shows a block diagram of an encoder.
【００７５】
As in the earlier figures, (10) denotes the covertext, which in this embodiment is the music to be encoded, and (15) denotes the final stegotext.
【００７６】
In circuit (51) the covertext is converted into a log-abs spectrogram.
【００７７】
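The spectrogram generated in this way in the FFT circuit (51) is clocked into a spectrogram buffer (53) by a clock (52). The FFT circuit (51) carries out the overlapped segmentation of its input and the windowing described with reference to Fig. 6. The clock (52) ensures that the contents of the spectrogram buffer (53) represent, in spectrogram form, an amount of music equal to the length of the key, which in this embodiment is 256 or 32 columns.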
【００７８】
The data to be encoded is supplied at (55) to a circuit (54), which adds the synchronisation flags and carries out zero stuffing as described above.
【００７９】
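The output of circuit (54) is supplied to a convolutional encoder (56), which corresponds to the encoder described with reference to Fig. 5 and to which the required polynomials are supplied at (57).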
【００８０】
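The key matrix is supplied to the encoder at (58) and passed to a circuit (59), where it is converted into sets of values that can be multiplied directly into the spectrogram held in the spectrogram buffer (53). These values take the form of two matrices, one for encoding a zero bit and the other for encoding a one bit. The matrices are the antilogarithm of the key and the reciprocal of the antilogarithm of the key. Multiplying these matrices into the held spectrogram is equivalent to adding or subtracting in the log spectrogram domain.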
【００８１】
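The strength with which the key modulates the contents of the buffer (53) is determined by an input (60). This input corresponds to the real constant s of equation 2.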
【００８２】
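The two matrices are selectively multiplied with the contents of the spectrogram buffer (53) as shown at (61), the selection being made in accordance with the output of the convolutional encoder (56) so that the music stored in the buffer (53) is encoded with a single data bit. The contents of the buffer (53) are shifted by one clock period for each bit written and passed on to an IFFT circuit (62), so that the main encoding loop is executed once for each bit written.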
【００８３】
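The output of the IFFT circuit (62) is applied to an anti-clip buffer (63). This ensures that the data read from circuit (62) does not clip when it is read out as a music file. If clipping is about to occur, an amplitude-modulation curve is generated which gradually reduces the output volume so that clipping is just avoided. The volume is then gradually increased back to normal when it is safe to do so.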
【００８４】
Finally, the output of the anti-clip buffer circuit (63) is output as the stegotext (15).
【００８５】
Fig. 13 also includes a scrambler 65. Many possible scramblers could be used, but a typical one is described in the CCITT V.32 standard. Like the convolutional encoder, the scrambler is optional.
【００８６】
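For simplicity, the above description has been given in terms of a single key. It will of course be appreciated that more than one key may be used, each key being generated from a different seed integer. In addition, multiples of the key or keys can be used to watermark the stegotext. In the embodiment described above the multiple is "1", so that wherever reference is made to a multiple of a key it is to be understood that the multiple may be "1", that is, that the key remains unchanged apart from its sign.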
【００８７】
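The practical use of two or more different keys to watermark a stegotext, or to retrieve a watermark code, is entirely analogous to the embodiments described in this specification. Thus, where there is more than one key, which key is multiplied into the spectrogram at (61) at any given time is determined by the data to be encoded. If multiples other than ±1 are used to modulate the spectrogram, more than one bit can be encoded each time. The same set of multiples is of course used for decoding.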
【００８８】
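An embodiment of an encoder according to the invention having been described, attention is now turned to the problem of decoding a stegotext, which may have been compressed or stretched, in order to recover the coded data.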
【００８９】
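The characteristics of the key used in the encoder of Fig. 13 to modulate the power spectrum of the covertext having been described, it will be appreciated that, when decoding a stegotext to extract the watermark data, the data bits can be identified by correlating the key with the power spectrogram of the stegotext. If the stegotext has not been attacked, or otherwise stretched or compressed, there is a clear correlation between the stegotext and the key at those log elements that were modified in accordance with the data.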
【００９０】
The key described above can cope with distortions of the stegotext that include stretches of ±6% in either the vertical or the horizontal direction.
【００９１】
Several approaches are possible for dealing with the case in which the stegotext has been stretched by more than the ±6% tolerated by the key.
【００９２】
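However, the embodiments of the invention adopt a different approach that does not involve direct correlation, and this will now be described in detail. Before the actual circuitry of the decoder is described, the principles involved are first explained.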
【００９３】
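Most demodulators and decoders in the world have been designed to be optimal on the assumption that the wanted signal is corrupted by noise that is additive, white, stationary and Gaussian. Most of them are one-dimensional, in the sense that only one real value is received at any one time; a few are two-dimensional. In this embodiment signals are encountered in which many real values are received at any one time, in the form of an entire spectrogram column.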
【００９４】
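Noise is called Gaussian if, for any individual sample, the marginal probability distribution of the noise value in that sample (that is, the distribution that applies when the values of the other samples are unknown) is a Gaussian, or normal, distribution with zero mean.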
【００９５】
Noise is called white if, when its Fourier spectrum is considered, the marginal probability distribution of any individual element of the spectrum is identical to that of any other element.
【００９６】
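Noise is called stationary if the marginal probability distribution of any time-domain sample is identical to that of any other sample, and the joint probability distribution of any one excerpt of the noise of a given length is identical to that of any other such excerpt.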
【００９７】
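In almost all cases all of these assumptions are violated to some extent. In general this does not matter. In the present invention, however, departures from these assumptions are extremely important and are therefore described in detail.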
【００９８】
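Student noise, for example, can be regarded as an example of non-Gaussian one-dimensional noise. The Student distribution is like a Gaussian distribution except that the standard deviation is different for each sample. Specifically, the inverse variance is drawn afresh from a gamma distribution for each sample.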
【００９９】
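Figs. 14 and 15 show, respectively, an excerpt of one-dimensional white Gaussian noise of variance 1 and an excerpt of one-dimensional white Student noise (with the shape parameter m of the associated gamma distribution equal to 1, and scaled so that it also has variance 1). Although both have the same variance, they look quite different. The Student-distributed noise contains a number of large spikes of a kind that would occur only extremely rarely in Gaussian-distributed noise.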
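The construction of Student noise described above can be sketched as follows: each sample gets its own inverse variance drawn from a gamma distribution, and the sample itself is then drawn from a zero-mean Gaussian with that inverse variance. The function names, the use of NumPy's generator and the choice of gamma scale 1/m (which gives the inverse variance a mean of 1) are illustrative assumptions; no attempt is made here to reproduce the variance scaling used for the figures.

```python
import numpy as np


def gaussian_noise(n, rng):
    """White Gaussian noise with zero mean and unit variance."""
    return rng.standard_normal(n)


def student_noise(n, m, rng):
    """White Student noise: a fresh inverse variance per sample, drawn from
    a gamma distribution with shape m, then a zero-mean Gaussian draw."""
    inv_var = rng.gamma(shape=m, scale=1.0 / m, size=n)
    return rng.standard_normal(n) / np.sqrt(inv_var)


rng = np.random.default_rng(0)
g = gaussian_noise(2000, rng)
t = student_noise(2000, 1.0, rng)
print("largest |Gaussian| sample:", np.abs(g).max())
print("largest |Student| sample: ", np.abs(t).max())
```

Plotting the two arrays side by side gives a picture of the kind described for Figs. 14 and 15, with occasional large spikes in the Student case.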
【０１００】
This matters where the noise is impulsive and where outliers would (under the Gaussian assumption) force conclusions that are far from correct.
【０１０１】
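Another, less commonly encountered, example of non-Gaussian noise is noise distributed as the logarithm of the ratio of two quantities each of which is gamma-distributed. Such noise has many downward spikes but no upward spikes (or vice versa). Noise of this kind has been found to be relevant to the problem of audio watermarking.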
【０１０２】
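Whiteness can be violated in more ways than are immediately obvious. One example is coloured noise, in which some frequencies are present more than others but the noise is stationary. Typical cases are pink noise, in which low frequencies predominate; band-limited noise, in which the other frequencies have been filtered out; and "1/f" noise, in which the power spectral density is inversely proportional to frequency down to some limit above zero.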
【０１０３】
In general, however, there are other kinds of non-white noise.
【０１０４】
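One example is one-dimensional non-stationary noise. Even when only one dimension is considered, noise can be non-stationary. Examples are noise from a resistor placed too close to an intermittent heat source, or noise recorded after a filter containing a variable capacitance (which causes the "colour" as well as the amplitude to change with time).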
【０１０５】
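On the other hand, where the signal is multidimensional, the noise in different dimensions may be correlated even if the signal or signals are stationary in time. Alternatively, the noise in some channels or dimensions may have a larger amplitude than the noise in others. And of course there is nothing to prevent multidimensional noise from also being non-stationary in time; the log-abs spectrogram of music is precisely such a case.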
【０１０６】
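In general, violation of any of these assumptions could matter when attempting to decode a watermark from a stegotext. However, where the concern is robustness in decoding watermarks of the kind with which this invention is concerned, particularly in music, experiments have clearly shown that non-Gaussianity is not very important, whereas general non-whiteness, in the form both of correlation between signals and of correlation between different frequency components and amplitude variations at different frequencies, is extremely important. A decoder that takes non-Gaussianity into account has the advantage of producing more sensible values for various intermediate variables in the calculation, whereas a decoder that does not take non-whiteness into account performs poorly and wastes memory and flops.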
【０１０７】
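The basis of what follows is the concept of the multidimensional Gaussian distribution. Suppose that x is a real N-vector (that is, a column vector with N elements), and let μ be the mean of the distribution of x (also an N-vector).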
【０１０８】
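Just as in the one-dimensional case there is a variable σ representing the standard deviation (or s representing 1/σ^2), so in the N-dimensional case there is a real N×N matrix S, which plays a role similar to, but more complicated than, that of s in the one-dimensional case.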
【０１０９】
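Consider first the two-dimensional case in which the distribution of x is spherically (i.e. circularly) symmetric about μ. In this case any slice across the distribution through μ looks like a one-dimensional Gaussian distribution with standard deviation σ, say. Here S is I/σ^2, where I denotes the identity matrix. If the distribution is not circularly symmetric but elliptically symmetric, with the major axis of the ellipse aligned with one of the coordinate axes, S is a diagonal matrix whose two diagonal elements are different and positive. If the ellipse is not aligned with the coordinate axes, S is a 2×2 symmetric matrix in which all entries are non-zero and the diagonal elements are positive. The inverse V of S is known as the covariance matrix of the distribution, and S is called the icov matrix (icov being short for inverse covariance).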
【０１１０】
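S has a further important property. It is positive definite, which means that y'Sy is a positive scalar for any non-zero vector y. (Here ' denotes the transpose. y'Sy is always a scalar, but it is guaranteed to be positive only if S is positive definite and y is non-zero.)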
【０１１１】
In the N-dimensional case the same picture holds, except that "circle" is replaced by "N-dimensional sphere" and "ellipse" by "N-dimensional ellipsoid".
【０１１２】
In this case the formula for the probability density of such a distribution is:
【０１１３】
【数５】

【０１１４】
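Suppose now that a "random" zero-mean N-dimensional Gaussian distribution needs to be considered. "Random" has, of course, no real meaning unless it is stated from which distribution the draw is made. What is needed is a distribution over N-dimensional Gaussian distributions with zero mean. That in turn means a distribution over icov matrices, or in other words a distribution over positive definite symmetric N×N matrices S.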
【０１１５】
One distribution that meets these requirements is the Wishart distribution.
【０１１６】
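The Wishart distribution has two parameters (apart from N): k, the "tightness" or "shape" parameter, and V, the "scale matrix". k must be greater than N−1; the larger it is, the more tightly the distribution is concentrated about its mean. V must be positive definite and symmetric. The mean is kV^−1. The density is: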
【０１１７】
【数６】

【０１１８】
When N is 1 this reduces to the gamma distribution, except that, if m and r are the shape and scale parameters of the gamma distribution, V is 2r and k is 2m.
【０１１９】
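When one-dimensional Student noise was described earlier, it was mentioned that a new value of s = 1/σ^2 is drawn from a gamma distribution for each sample; the actual sample is then drawn from a one-dimensional Gaussian with zero mean and inverse covariance s.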
【０１２０】
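One way to generate multidimensional Student noise is then to draw a new icov matrix S afresh for each sample, after which the sample itself is drawn from a multidimensional Gaussian distribution with zero mean and icov matrix S. Each new icov matrix is drawn from a Wishart distribution with some suitable parameters k and V.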
【０１２１】
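In the special case in which the scale matrix V of the above Wishart distribution is of the form rI for some positive real number r, this Student distribution is spherically symmetric and is given by: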
【０１２２】
【数７】

【０１２３】
The probability density of the distance R of x from the origin can therefore be expressed as:
【０１２４】
【数８】

【０１２５】
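It is now possible to consider what the log-abs spectrogram of music actually looks like. It can be modelled as multidimensional Student noise, but a better fit is obtained with a slightly different model. Instead of drawing a new icov matrix for every sample, a new icov matrix is drawn once per bit period, or less frequently. Each sample from the resulting quasi-Student distribution is then regarded as one column of the spectrogram. This is called the "detV" distribution, for reasons that will become clear below.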
【０１２６】
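The question of what the parameters of the associated Wishart distribution should be is dealt with later in this specification. Ideally these parameters would be determined from some large corpus of music, but in practice using only a small number of pieces will probably make little difference provided that they are representative of the kinds of music likely to be encountered. The "residual uncertainty" can be handled by a Wishart distribution with k not much greater than N.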
【０１２７】
The special case in which the Wishart scale matrix V is a very small multiple of the identity (in practice e^−12 I) is called the "detZ" distribution, where "Z" stands for zero.
【０１２８】
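With this background, it is now possible to define the characteristics of decoders that can be used to decode a stegotext generated in accordance with the invention. Each decoder is designed as the optimal bit-by-bit memoryless decoder for a particular model of the noise (that is, of the covertext or music). Each decoder therefore computes the Bayesian posterior distribution of the data bit in question in the light of what has been received, and then selects the value (0 or 1) with the highest posterior probability. It should be noted that the term "optimal" has a very specific meaning here: the decoder is the one that produces the best output, regardless of whether it is the cheapest or the most expensive decoder to implement.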
【０１２９】
Such a decoder will obviously be optimal in practice only if the model is accurate. Otherwise the effective capacity of the channel is reduced.
【０１３０】
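Thus a decoder can be implemented using a correlation function, computed using FFTs, to give the bit-by-bit memoryless decoder for the assumption that the "noise" is stationary, white and Gaussian. In practice this results in a substantial loss of capacity when the signal strength is reduced to 0.005. The loss is mitigated to some extent by discarding all the low-frequency components of the key, but it is still considerable.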
【０１３１】
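The decoder embodiments to be described therefore involve a change of noise model, to one that defines the noise as non-white multidimensional Gaussian noise constrained to be stationary in time and to have a known icov matrix.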
【０１３２】
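A decoder for this noise model is very simple to implement. The known icov matrix S is taken, its Cholesky decomposition is formed (that is, its triangular positive definite transpose-square-root C such that C'C = S), and both the log-abs spectrogram of the received signal and the key are premultiplied by C. Because premultiplying by C converts the assumed non-white noise into white noise, this decoder can be called a "whitened Gaussian" decoder.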
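The whitening step described above can be sketched with NumPy. `numpy.linalg.cholesky` returns a lower-triangular factor L with L L' = S, so the matrix C with C'C = S is its transpose. The function names are illustrative, and the sketch covers only the premultiplication, not the subsequent correlation.

```python
import numpy as np


def whitening_matrix(S):
    """Return the upper-triangular C with C'C = S for a positive definite icov matrix S."""
    L = np.linalg.cholesky(S)  # lower triangular, S = L @ L.T
    return L.T


def whiten(S, received, key):
    """Premultiply both the received log-abs spectrogram block and the key by C."""
    C = whitening_matrix(S)
    return C @ received, C @ key


# Noise with covariance V = S^-1 becomes white after premultiplying by C:
# C V C' = I.
A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
S = A @ A.T + np.eye(3)
C = whitening_matrix(S)
print(np.round(C @ np.linalg.inv(S) @ C.T, 6))
```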
【０１３３】
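This decoder works well when the icov matrix S is exactly the matrix S that models the noise (that is, the log-abs spectrogram of the music). To see why, the noise may be regarded as a very elongated ellipsoid in N-space. The decoder works by stretching the ellipsoid into a sphere while at the same time stretching the transmitted signal by the same amount in each direction. As a result, the parts of the signal that are parallel to the shortest axes of the ellipsoid are stretched far beyond the noise and are therefore easily recovered.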
【０１３４】
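This decoder runs into trouble, however, if the S being used is not the S of the music from which the watermark is being decoded. Intuitively, the reason is that instead of stretching the noise into a sphere it stretches it into another elongated ellipsoid, so that the absolute magnitude of the noise remains large in comparison with that of the stretched signal. At the very least, therefore, this method requires the correct icov matrix for each piece of music to be available when the watermark is read, which is already a considerable disadvantage.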
【０１３５】
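However, because the operation is carried out in the log-abs domain, the situation is even worse than this. Consider the distortion caused by adding white Gaussian noise to the time-domain music signal. If the operation were carried out in the time domain, this would simply widen all the narrow dimensions of the ellipsoid slightly. For the same to be true in the log-abs spectrogram domain, however, the added distorting noise would have to change its spectral content continually so as to remain strictly proportional to the spectral content of the music. Since this is not the case, more complicated behaviour is to be expected. This behaviour involves rotating the ellipsoid in one or more different directions, and the decoder turns out to fail in just the same way as if the icov matrix of a different piece of music had been used.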
【０１３６】
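To overcome these problems, the decoder now to be described uses the "detV" distribution, that is, a multidimensional Gaussian noise distribution with an unknown icov matrix that is drawn from a Wishart distribution and is locally constant over a one-bit period. This takes account of the fact that neither the original icov matrix of the music being decoded nor the exact effect on it of any distortion introduced is known, but that it is known that such a matrix exists.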
【０１３７】
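The decoder described is therefore designed to be the optimal decoder for this noise model. Accordingly, the decoder embodiments described below adopt a probabilistic approach in which the posterior distribution of each data bit given the received signal is computed. To do this, the decoder described uses Bayes' theorem.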
【０１３８】
Consider, then, a signal consisting of M columns of the log-abs spectrogram, each a vector of N elements, received during one bit period.
【０１３９】
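The mean of the noise distribution is assumed to be zero. Provided that the actually observed mean of the whole spectrogram is subtracted, this is very nearly equivalent to the mean being unknown and being believed a priori to be equally likely to take any value.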
【０１４０】
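Let K be the value of the key in the log-abs spectrogram as actually used, that is, already multiplied by the strength at which it was used (and with allowance made for any use of 10 rather than e as the base of logarithms). If the key consists of M columns each of height N, K is an N×M matrix. Let b be the bit value in question. For simplicity it is assumed that there are no overlapping bits; it has been found that their presence or absence makes little difference to the decoder. For simplicity, let b be +1 or −1 rather than 1 or 0. Let X be the matrix consisting of the M columns of the log-abs spectrogram of the music into which b was watermarked. Let Y be the received columns of the log-abs spectrogram in the stegotext that correspond to X. The timing is assumed here to be known.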
【０１４１】
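What needs to be found is P(b|Y). It is of course equally useful to know P(b = +1|Y)/P(b = −1|Y) (or its logarithm); to decode, b = 1 if this ratio is greater than 1 (respectively, if its logarithm is positive), and b = −1 otherwise.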
【０１４２】
What is the starting information? It can be summarised in the following equations:
【０１４３】
【数９】

【０１４４】
【数１０】

【０１４５】
【数１１】

【０１４６】
【数１２】

【０１４７】
where X_m is the m-th column of X, and k and V are the parameters of the associated Wishart distribution.
【０１４８】
The decoder then applies Bayes' theorem, as shown in the following equations, to find the required posterior probabilities of the successively received data bits.
【０１４９】
【数１３】

【０１５０】
【数１４】

【０１５１】
【数１５】

【０１５２】
【数１６】

【０１５３】
【数１７】

【０１５４】
【数１８】

【０１５５】
【数１９】

【０１５６】
【数２０】

【０１５７】
In the above, equation 13 follows from equation 12 by invoking equation 7. Equation 14 follows by invoking equation 8.
【０１５８】
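Equation 15 is derived from the preceding equation using the basic result of probability theory that P(A) = ∫P(A, B) dB, that is, that the probability of a union of mutually exclusive events is the sum or integral of the individual probabilities, together with the definition of conditional probability, P(A|B) = P(A, B)/P(B).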
【０１５９】
Equation 16 follows from equation 15 using equation 9. Equation 17 follows from equation 16 using equation 10. Finally, equation 18 is obtained by simplifying through collecting factors.
【０１６０】
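The integral in equation 18 is exceptionally complicated, but it yields equation 19, from which equation 20 follows. Equation 20 is obtained from equation 19 by dividing the version of equation 19 with b = 1 by the version with b = −1.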
【０１６１】
【数２１】

【０１６２】
【数２２】

【０１６３】
Since detV decoding is at the heart of the best algorithm, the best way of evaluating the right-hand side of this last equation is now discussed.
【０１６４】
Let W be the Cholesky decomposition of V, so that W'W = V. Let U = W^−1, so that U'VU = I. Then:
【０１６５】
【数２３】

【０１６６】
【数２４】

【０１６７】
【数２５】

【０１６８】
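The singular value decomposition (SVD) is applied to the contents of the innermost brackets. By means of the SVD any matrix can be written as the product of an orthogonal matrix, a diagonal matrix and another orthogonal matrix. The SVD has to be used twice, once for the bracket in the numerator and once for the bracket in the denominator. The procedure is shown for the numerator only.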
【０１６９】
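Thus U'Y − U'K is a matrix, and by means of the SVD it is possible to compute L, D and R such that LDR = U'Y − U'K (equation 24), where L and R are orthogonal matrices and D is a diagonal matrix. The following seven equations follow in turn from equation (24).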
【０１７０】
【数２６】

【０１７１】
In these equations, the orthogonality of R implies that R'R = I (and similarly for L), and the last line follows because L is orthogonal and real, so that det(L) = det(L') = ±1.
【０１７２】
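But I + DD' is a diagonal matrix computed from a diagonal matrix, and its determinant is therefore the product of its diagonal elements, each of which is easily evaluated. It is therefore only necessary to know D; there is no need to know L or R.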
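The determinant shortcut described above can be checked numerically. For any real matrix A with singular values d_i, det(I + AA') equals the product of (1 + d_i^2), so only the singular values are needed. The function name below is an illustrative choice, and the result is returned as a logarithm because the decoder works with log-determinants.

```python
import numpy as np


def logdet_I_plus_AAt(A):
    """log det(I + A A') computed from the singular values of A alone."""
    d = np.linalg.svd(A, compute_uv=False)   # L and R are never formed
    return float(np.sum(np.log1p(d ** 2)))   # sum of log(1 + d_i^2)


A = np.random.default_rng(3).standard_normal((6, 4))
direct = np.linalg.slogdet(np.eye(6) + A @ A.T)[1]
print(logdet_I_plus_AAt(A), direct)  # the two values agree
```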
【０１７３】
The basic principles of the decoding procedure having been discussed, an embodiment of a decoder will now be described with reference to Fig. 16.
【０１７４】
Before the actual decoder of Fig. 16 is discussed, it is essential to be able to relate the various stages of the decoder to the foregoing theoretical discussion.
【０１７５】
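It will be appreciated that Y represents the spectrogram block received by the decoder and is equal to X + bK, where b is the code and K is the key. If the values of b and K are known, therefore, the probability that the received Y takes a particular value is exactly the same as the probability that Y − bK represents the original covertext. This is what is expressed by equation (8).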
【０１７６】
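Turning now to equation (9), this expresses the probability that X is the spectrogram of unwatermarked music. In this equation S is the unknown icov matrix. If music were to be represented as white Gaussian noise, S in equation (9) would always be a multiple of the identity matrix. As already explained, however, this does not represent real music accurately. This is why the icov matrix is used in the present invention.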
【０１７７】
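It is therefore assumed that, for a portion of a piece of music though not for the whole piece, each column of X is drawn independently from the same multidimensional Gaussian distribution defined by this unknown icov matrix S. The decoder thus operates on the assumption that S is different in different parts of the music. Equation 9 expresses this mathematically.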
【０１７８】
To decode a watermarked spectrogram, however, it is essential to know what values X is likely to take. Equation 9 can provide this information only if the value of S is known.
【０１７９】
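The function of equation 10 is therefore to determine what values S is likely to take. In this equation S is assumed to be distributed according to a Wishart distribution with parameters V and k. The way in which these parameters are chosen is described later in this specification.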
【０１８０】
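The basic principles involved having been discussed, a first embodiment of a decoder will now be described with reference to Fig. 16 of the accompanying drawings.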
【０１８１】
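In Fig. 16 the music-based stegotext is again represented by (15). For simplicity the following description refers only to music, but other forms of covertext can of course be processed in the same way.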
【０１８２】
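At (100) the stegotext is multiplied point by point by a window function comparable to the window function disclosed at (51, 52) in Fig. 6; at (101) the Fast Fourier Transform of the windowed stegotext is taken; and at (102) this is converted into the log domain.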
【０１８３】
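At (103) the output of (102) is premultiplied by a previously generated matrix U', which is the matrix U' appearing in equations 22 to 25. The generation of U' is described later. If the output of (102) is denoted F, the output of (103) is U'*F, and this is applied to a high-pass filter (104) so as to remove any DC offset. The output U'*F of (104) is a log-abs spectrogram whose length depends on the initial stegotext.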
【０１８４】
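To process this spectrogram, it is partitioned into blocks each having a width corresponding to K. In this embodiment and in the second embodiment each block is 32 columns wide in the time dimension and 1024 rows high in the frequency dimension, but these values can of course be changed. Each partitioned block represents U'Y as it appears in equations 23, 24 and 25.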
【０１８５】
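This partitioning is carried out at (105) in such a way that each partitioned block overlaps the preceding block by one column less than the width of the block, corresponding to the situation shown in Fig. 4 of the accompanying drawings.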
【０１８６】
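U'K is added to each block obtained in this way at (106) and subtracted from it at (107), so that two different results are obtained at (108) and (109): (1) left-hand side: X_−1 = U'Y + U'K, and (2) right-hand side: X_+1 = U'Y − U'K.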
【０１８７】
It will be appreciated that these values correspond to the values found in equations 23 and 24.
【０１８８】
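The next stage of the decoder involves computing the logarithms of the denominator and of the numerator of equation 23. This is achieved using the equations shown under 24. At stages (110, 111) the log-determinants of the denominator and numerator of equation 23 are determined, and at stages (112, 113) the log-determinants output are each scaled by −((k+M)/2). This factor also appears, of course, in the above equations. The derivation of this factor is described later.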
【０１８９】
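At stage (114) the quantity in equation 23 is computed by subtracting the logarithm of the denominator from the logarithm of the numerator, these being represented by the scaled values obtained at (112) and (113).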
【０１９０】
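The value obtained in this way is stored in a buffer (115) as the log ratio of the probabilities that the data bit present is 1 or −1. The buffer (115) can therefore be regarded as holding the posterior probabilities of a sequence of code bits, the length of the sequence being determined by the size of the buffer.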
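The stages just described can be sketched as a single function. The equations themselves are not reproduced in this text, so the form below is an assumption inferred from the description: form U'Y − U'K and U'Y + U'K, take log det(I + AA') of each from singular values, scale both by −((k+M)/2), and subtract. Equal prior probabilities for the two bit values are also assumed, and the function names are hypothetical.

```python
import numpy as np


def logdet_I_plus_AAt(A):
    """log det(I + A A') from the singular values of A."""
    d = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(np.log1p(d ** 2)))


def log_posterior_ratio(UY, UK, k):
    """Sketch of log[P(b=+1|Y) / P(b=-1|Y)] for one block.

    UY is the premultiplied received block U'Y (N rows, M columns), UK is the
    premultiplied key U'K, and k is the Wishart shape parameter.
    """
    M = UY.shape[1]
    scale = -(k + M) / 2.0
    log_plus = scale * logdet_I_plus_AAt(UY - UK)   # residual if b = +1
    log_minus = scale * logdet_I_plus_AAt(UY + UK)  # residual if b = -1
    return log_plus - log_minus


# A block that is exactly the key (zero covertext) favours b = +1.
K = np.arange(1, 7).reshape(3, 2) / 10.0
print(log_posterior_ratio(K, K, k=4) > 0)
```

The hypothesis whose residual looks most like unmodified covertext (the smaller determinant) receives the larger posterior probability.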
【０１９１】
バッファ（１１５）への個々のエントリのシーケンスを検討すると、これらのエントリは、前述したように個々の値がマトリクス処理の結果を表す、ゼロ軸を中心に分布している値からなるものである。バッファ（１１５）は２５６の計数値を保持することができるが、勿論、この数も可変である。図１７はバッファ（１１５）に記憶された値のシーケンスを示すグラフである。
【０１９２】
バッファ（１１５）内の値は黒い曲線（１５０）によって表される。縦の実線１５１はビットがエンコードされた時間を表す。これらの時間は後述する様式でクロック抽出回路（１１６）によって決定される。
【０１９３】
ここでバッファ（１１５）からオリジナル・コードを表す値を抽出することが必要である。この場合も、元のステゴテキストは拡張または圧縮されることがあるので、コードのビット速度の抽出はこのことに考慮しなければならないことを理解しなければならない。
【０１９４】
これはクロック抽出回路（１１６）で実行される手順である。ここでは可能性があるスライス・ポイントのあらゆるシーケンスが考慮され、ゼロからの総合偏差が最大であるポイントのシーケンスが、埋め込まれたコードのためのクロックとして選択される。
【０１９５】
ネストされた一対のループが、可能性があるビットのクロック周波数、および位相のオフセットにわたって効果的に反復される。各々の反復で、スライスされた値の二乗和が計算される。この和が最大になるような周波数と位相の値が、クロックを表す標識の集合として段階（１１７）に戻される。
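The nested search over clock frequency and phase described above can be sketched as follows. This is only an illustrative sketch: the function name `extract_clock`, the candidate periods, the restriction to whole-column phase offsets and the synthetic buffer are assumptions made here and are not taken from the specification.

```python
def extract_clock(buffer, periods):
    """Return (period, phase, indices) maximising the sum of squared slices.

    `periods` lists candidate bit periods in buffer entries (hypothetical
    values); phases are tried in whole-entry steps, which is an assumption.
    """
    best_score, best = -1.0, None
    for period in periods:                    # outer loop: candidate bit periods
        for phase in range(int(period)):      # inner loop: candidate phase offsets
            indices = []
            t = float(phase)
            while int(t + 0.5) < len(buffer):
                indices.append(int(t + 0.5))  # nearest buffer entry to each tick
                t += period
            score = sum(buffer[i] ** 2 for i in indices)
            if score > best_score:
                best_score, best = score, (period, phase, indices)
    return best


# Synthetic buffer: strong values every 5th entry, starting at entry 2.
demo = [0.1] * 60
for n, i in enumerate(range(2, 60, 5)):
    demo[i] = 3.0 if n % 2 == 0 else -3.0

period, phase, indices = extract_clock(demo, [4.0, 5.0, 6.0])
```

Because a raw sum of squares grows with the number of slice points, a practical search would probably restrict the candidate periods to a narrow band around the nominal bit spacing or normalise by the slice count; either choice is a refinement assumed here rather than stated in the text.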
【０１９６】
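In stage (117) two pointers are used to indicate where the first and last values are expected to be found in the buffer (115). They are managed so that, as the data is sliced block after block, the extracted bits can be joined together without gaps or repetitions. The posterior probability vector is shifted at the same rate as the data in the spectrogram buffer (105).
【０１９７】
The clock generated by the clock extraction circuit (116) is used at (117) to slice the data to be read from the buffer (115) into the buffer stage (118). It will be recalled that the original key was added to 32 columns of the log-magnitude spectrogram of the cover text at intervals of 5 columns. Code bits are therefore expected at approximately every fifth column of the sequence of values stored in the buffer, even if the stegotext was compressed or stretched before decoding. However, the result of slicing the buffered data in response to the extracted clock is still not a sufficiently accurate representation of the original code, because it is accurate only insofar as the decoder's assumption that the music has a detV distribution holds. As stated above, this is not entirely true. Some of the points at 5-column intervals may therefore be in error, and allowance must be made for the fact that the original music does not have a detV distribution.
【０１９８】
Consider first the output to stage (118). This consists of a sequence of values intended to be log(P(b_n=1|Y_n)/P(b_n=0|Y_n)).
【０１９９】
That would be the required sequence if the music actually had a detV distribution. As already explained, however, it does not.
【０２００】
If C_1, C_2, ..., C_n are defined as the sequence of values output from stage (117), then, if the music had a detV distribution, C_n = log(P(b_n=1|Y_n)/P(b_n=0|Y_n)).
【０２０１】
Since the music does not have a detV distribution, it is necessary to find a function f such that f(C_n) approximates this logarithm more closely than C_n does, so that a more appropriate sequence can be fed to the error-correction decoding stage (120) of the decoder; it will be recalled that the watermark code was additionally encoded with an error correction code in the encoder.
【０２０２】
The output of the data slicing stage (117) is a series of values on either side of zero, positive values suggesting "+1" and negative values suggesting "−1".
【０２０３】
Each "+1" value deviates from a value referred to as α. Similarly each "−1" value deviates from −α, where α is estimated as the mean absolute value of C_n, that is, α = mean|C_n|.
【０２０４】
The amount by which the positive C_n differ from α and the negative C_n differ from −α also needs to be estimated. This value is defined as σ, so that σ = std(|C_n| − α).
【０２０５】
With σ now available, the original values C_1, C_2, ..., C_n are scaled in stage (118) so that they have a standard deviation of 1 about +α/σ or −α/σ. This is summarized by the equations a_n = C_n/σ and β = α/σ.
【０２０６】
Thus, if h_n = |a_n| − β, then h_n has zero mean and unit standard deviation, but is not necessarily Gaussian distributed.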
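The scaling step just described (α, σ, a_n, β and h_n) can be illustrated with a short sketch. The function name `normalise_slices` and the sample values are hypothetical, and the use of the sample standard deviation (dividing by n − 1) for std is an assumption, since the text does not say which estimator is intended.

```python
from statistics import mean, stdev


def normalise_slices(c):
    """Scale sliced values C_n as described: returns alpha, sigma, beta, a, h."""
    alpha = mean([abs(x) for x in c])            # alpha = mean |C_n|
    sigma = stdev([abs(x) - alpha for x in c])   # sigma = std(|C_n| - alpha), sample std assumed
    a = [x / sigma for x in c]                   # a_n = C_n / sigma
    beta = alpha / sigma                         # beta = alpha / sigma
    h = [abs(x) - beta for x in a]               # h_n = |a_n| - beta
    return alpha, sigma, beta, a, h


# Hypothetical sliced values, for illustration only.
alpha, sigma, beta, a, h = normalise_slices([2.0, -1.0, 3.0, -2.0])
```

With these definitions h has zero mean and unit standard deviation by construction, which is the property the following paragraphs rely on when fitting a Student distribution.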
【０２０７】
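In this embodiment h_n is assumed to follow a one-dimensional Student distribution of the kind already discussed. Therefore,
【０２０８】
【数２７】

and,
【０２０９】
as a result,
【数２８】

holds.
【０２１０】
From equation 26, the required
【数２９】

follows.
【０２１１】
The derivation of the values r and m will now be described. A large sample of values of h_n typical of those encountered in practice is collected, and a MAP estimate is performed using equation (25) as the likelihood of the individual values of h_n, with improper uniform priors on m and log(r).
【０２１２】
Accordingly, in stage (118) equation 28 is used to calculate the corrected sequence, which still carries the added error-correction encoding that must finally be decoded, so as to obtain what is called a "likelihood" map containing a vector of calculated log-likelihood ratios.
【０２１３】
FIG. 18 shows the log-likelihood map derived from the contents of the buffer (115) by the procedure just described. Reference numerals 150 and 151 have the same meaning in FIG. 18 as in FIG. 17.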
【０２１４】
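The final stages of the decoder of FIG. 16 are conventional.
【０２１５】
The convolutional encoder shown in FIG. 5 produces two output bits for each input code bit.
【０２１６】
Therefore, for every two bits present at the output of circuit (118), a decision must be made as to which bits form part of the desired code.
【０２１７】
To perform this function, the simplest form of convolutional decoder (120) examines each possible output bit and, for each such bit, considers all possible values of the surrounding bits within a fixed window. This procedure is carried out in the phase search circuit (119). The size of the window is a trade-off between performance and computational load. For example, for a window containing 10 values in the buffer, a total of 1024 sequences must be evaluated.
【０２１８】
For each of the 1024 sequences, the probability of the values in the buffer over that window is calculated by adding or subtracting the relevant value in the buffer according to whether the associated bit is +1 or −1.
【０２１９】
The probabilities of all 512 sequences having +1 at the position under consideration are summed, as are those of the other 512 sequences having zero at that position. This gives the probabilities that the bit under consideration is 1 or 0.
【０２２０】
This procedure is illustrated in FIG. 19. In this figure (250) is a schematic representation of the values sliced from circuit (118). win_i represents a 10-value window and 7 represents the element under consideration. win_i+2 represents the next window in the sequence and 8 the next element to be considered. Finally, V_i represents the result of the evaluation just performed for element 7, and V_i+2 represents the result of the next evaluation.
【０２２１】
As shown in FIG. 19, because the convolutional encoder of FIG. 5 supplies two output bits for each code bit, the window is stepped along the contents of the buffer in two-bit intervals. The procedure must be carried out twice, once over the even-numbered and once over the odd-numbered values in the buffer. Two sequences, each with an associated probability, are thereby produced, and the final selection is made in favour of the more probable sequence.
【０２２２】
What has been described so far is the simplest form of encoder/decoder.
【０２２３】
However, it may be advantageous to use some ratio other than 2 to 1.
【０２２４】
For example, if the ratio is 4 to 1, it would be necessary to use four sequences, with the window stepped uniformly and successively over four values in the buffer, and to select the most probable output bits from these sequences.
【０２２５】
Equally, it will be apparent to those skilled in the art that there are other ways of decoding the code, such as a Viterbi decoder.
【０２２６】
Polynomials corresponding to the generator polynomials used in the encoding stage are supplied at 120' to the maximum likelihood decoder 120, and finally the added synchronization bits, and the zeros added during zero stuffing, are removed at (120), leaving the decoded data. The descrambler (122) is required only if the optional scrambler (65) of FIG. 13 was used in the encoding process.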
【０２２７】
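FIG. 20 of the accompanying drawings shows another embodiment of the decoder. It will be seen that the decoder of FIG. 20 has most of its elements in common with the decoder of FIG. 16, and the same reference numerals are used wherever these common elements appear.
【０２２８】
The operation of the decoder of FIG. 20 is based on the concept of a projection map in a vector space.
【０２２９】
Of the drawings, FIG. 21 shows an orthogonal projection map f from a two-dimensional vector space onto a one-dimensional subspace.
【０２３０】
In this figure a set of random points v_1, v_2, v_3, v_4 and v_5 is mapped by the function f onto a single line, namely the line labelled L.
【０２３１】
In general terms, let V be a real N-dimensional vector space with a dot product written v_1·v_2. A projection map on V is a map f: V→V satisfying the following equations.
For any v, v_1, v_2 ∈ V and any real number r,
r f(v) = f(rv)
f(v_1 + v_2) = f(v_1) + f(v_2)
f(f(v)) = f(v).
If, in addition, for any v_1, v_2 ∈ V,
f(v_1) = 0 → v_1·f(v_2) = 0,
the map is called an orthogonal projection map.
【０２３２】
A subspace W of V is a subset of V such that r w_1 + w_2 ∈ W for any w_1, w_2 ∈ W and any real number r. For each subspace W of V there is exactly one orthogonal projection map f_W onto W.
【０２３３】
For example, N may be 2, V may be the set of all real 2-vectors, and W may be the set of all 2-vectors whose first element is twice the second. In FIG. 21 W is represented by the line L, and the random points are joined by dotted lines to their images in W under f_W.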
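The two-dimensional example above can be checked numerically. The sketch below builds the orthogonal projection onto the line W spanned by (2, 1) using the standard formula B = w w'/(w·w); the helper names `projection_matrix` and `apply_matrix` and the test vector are hypothetical and chosen here only for illustration.

```python
def projection_matrix(w):
    """Orthogonal projection onto the line spanned by w: B = w w' / (w . w)."""
    norm2 = sum(x * x for x in w)
    return [[wi * wj / norm2 for wj in w] for wi in w]


def apply_matrix(b, v):
    """Matrix-vector product b v for small dense lists."""
    return [sum(bij * vj for bij, vj in zip(row, v)) for row in b]


# W: all 2-vectors whose first element is twice the second.
B = projection_matrix([2.0, 1.0])
v = [3.0, 4.0]                      # arbitrary point, assumed for the demo
image = apply_matrix(B, v)          # f_W(v), which lies on the line W
residual = [vi - pi for vi, pi in zip(v, image)]
```

Here `image` has its first element equal to twice its second, applying `B` again leaves it unchanged, and `residual` is orthogonal to `image`, which are the defining properties listed above.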
【０２３４】
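It will be appreciated that all the embodiments of the invention described, such as the encoder of FIG. 13 and the decoder of FIG. 16, operate by manipulating a matrix of values in the log-magnitude spectrogram, so that any matrix operation that reduces the computational requirements while retaining the necessary information, particularly when extracting the code from the stegotext, is of considerable value. The projection map concept described above provides such a tool. Thus, if A is a matrix whose column span is W, there exists a matrix B such that f_W(v) = Bv.
【０２３５】
The following is an example of how the required projection matrix can be calculated, using the Matlab (RTM) programming language.
【０２３６】
If A is a matrix whose column span is W, there exists a matrix B, such that f_W(v) = Bv, which can be calculated by the following Matlab statements.
【０２３７】

% Now replace the non-zero elements on the diagonal of D by 1
% and pad D with zeros to be the same size as L
d = diagfrom(D);
dnnz = sum(dkey > 1e-10 * dkey(1));
D1 = diagsz(ones(dnnz, 1), size(L));
% B operates by calculating the components of any v in V in the basis L,
% zeroing the components orthogonal to W,
% then returning to the orthogonal coordinate system
B = L * D1 * L';
If the last line of these statements is replaced by
B_0 = L(:, 1:dnnz)';
【０２３８】
then B_0 not only performs the projection map f_W but also switches to an orthogonal coordinate system for W that has fewer dimensions than the orthogonal coordinate system for V, with the trivial exception, of course, of the case W = V.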
【０２３９】
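It will be recalled from the discussion of the decoder of FIG. 16, and of the equations relating to its operation, that U'Y represents the log-magnitude spectrogram of a segment of the stegotext, corrected by previously calculated statistics of samples similar to the cover text. If W denotes the subspace spanned by the columns of U'K, and the received data is orthogonally projected onto W as discussed in the simple example of FIG. 21, the calculations needed to extract the watermark code can be greatly simplified, provided that the projection of Y onto the new subspace does not cause excessive loss of information.
【０２４０】
It has been found that performing such a projection does little to impair the robustness of the stegotext against attack.
【０２４１】
Thus, considering equation 25 again, performing the projection B_0 makes it necessary to evaluate det(I + (B_0U'Y − B_0U'K)(B_0U'Y − B_0U'K)'), where B_0 is the matrix associated with f_W as already discussed with reference to FIG. 21.
【０２４２】
To carry out this evaluation, B_0U' can be precalculated, together with B_0U'K, and stored in ROM.
【０２４３】
The next steps are to evaluate T = (B_0U')Y − (B_0U'K), to evaluate I + TT', and to perform a Cholesky decomposition such that C'C = I + TT', noting that det(I + (B_0U'Y − B_0U'K)(B_0U'Y − B_0U'K)') = det(C)^2. Since C is triangular, det(C)^2 is easy to evaluate.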
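The determinant evaluation via a Cholesky factor can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the factor is computed as a lower-triangular matrix (the transpose of the C in the text, which has the same determinant), and the small matrix T is made up for the demonstration.

```python
import math


def cholesky_lower(m):
    """Lower-triangular factor low with low * low' = m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def log_det_i_plus_ttt(t):
    """log det(I + T T'), computed as 2 * sum(log(diagonal of the Cholesky factor))."""
    rows = len(t)
    m = [[(1.0 if i == j else 0.0) + sum(a * b for a, b in zip(t[i], t[j]))
          for j in range(rows)] for i in range(rows)]
    low = cholesky_lower(m)
    return 2.0 * sum(math.log(low[i][i]) for i in range(rows))


# Hypothetical 2 x 2 example: I + T T' = [[6, 2], [2, 2]], whose determinant is 8.
value = log_det_i_plus_ttt([[1.0, 2.0], [0.0, 1.0]])
```

Working with the logarithm directly matches the decoder's use of log determinants and avoids overflow when the determinant itself is very large.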
【０２４４】
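With the above appreciated, the differences between the decoder of FIG. 20 and the decoder of FIG. 16 can now be discussed.
【０２４５】
In FIG. 20 the dotted box (124) represents a controllable resampler circuit that is present only in one variant of the embodiment, described below. Likewise, the dotted connection between box (124) and circuit (116) is present only in the variant described below. As before, K represents the key originally used to encode the cover text. As previously described, this key was generated from random white noise, produced from a random integer seed, which was then filtered by a two-dimensional swept band-pass filter.
【０２４６】
A matrix multiplier (201) matrix-multiplies the key K, held in ROM (202), by the predefined statistical data U' described above, to generate U'K. This data U' is stored in ROM (203). The output of the multiplier (201) is supplied to a box (204) which generates the projection matrix B_0, and this projection matrix is multiplied by the output of the multiplier (201) in a multiplier (205) to generate B_0U'K. In addition, the B_0 output by (204) is supplied to one input of a matrix multiplier (206). U' is supplied to its other input, so that the output of the multiplier (206) is B_0U'.
【０２４７】
The decoder of FIG. 20 otherwise operates in the same way as the decoder of FIG. 16, and elements of the FIG. 20 decoder that operate in the same way as elements of the FIG. 16 decoder bear the same reference numerals.
【０２４８】
Thus, in the multiplier (103), each block of Y as already defined is multiplied by B_0U'. Similarly, at (106, 107) the value B_0U'K, rather than U'K, is added and subtracted respectively.
【０２４９】
The other elements of the second embodiment of the decoder operate in exactly the same way as those of the first embodiment.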
【０２５０】
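It is now necessary to describe the procedure for generating the prior statistics required for the Bayesian processing in the two decoder embodiments described above. This is described with reference to the flow chart of FIG. 22.
【０２５１】
In step S1 of this figure a number of music samples are concatenated. The samples can be selected from a wide range of music and can be produced, for example, by playing excerpts from a suitable number of CDs. The CD excerpts can be mixed with, or replaced by, taped, live or broadcast music.
【０２５２】
The end result is a length of music covering a wide range of music. The selection of excerpts can of course also be skewed so that the statistical data is based on a particular type of music, allowing the user to select an appropriate data set for decoding. In this way several different sets of statistical data can be stored in ROM (203) for the user to select as appropriate.
【０２５３】
In step S2 the log power spectrogram of the concatenated music samples is generated. In step S3 the mean of the columns of the power spectrogram so obtained is calculated, and in step S4 this mean is subtracted from each column to give A.
【０２５４】
The latter two steps can be approximated by passing the rows of the spectrogram through a high-pass filter with suitable characteristics.
【０２５５】
In step S5 the covariance matrix of the values obtained in step S4 is generated, so that E = AA'/N.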
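Steps S3 to S5 can be sketched as follows. The spectrogram is represented here as a list of frequency rows, each holding one value per time column; the function name `covariance_of_columns`, the interpretation of N as the number of columns, and the tiny example matrix are assumptions made for illustration only.

```python
def covariance_of_columns(spec):
    """Subtract the mean column from every column (S3, S4), then form E = A A' / N (S5)."""
    rows, n = len(spec), len(spec[0])
    mean_col = [sum(row) / n for row in spec]                  # S3: mean over time of each row
    a = [[x - m for x in row] for row, m in zip(spec, mean_col)]  # S4: centred matrix A
    e = [[sum(a[i][t] * a[j][t] for t in range(n)) / n         # S5: covariance matrix E
          for j in range(rows)] for i in range(rows)]
    return a, e


# Hypothetical 2-row, 2-column log power spectrogram.
A, E = covariance_of_columns([[1.0, 3.0], [2.0, 6.0]])
```

For this example A is [[-1, 1], [-2, 2]] and E is [[1, 2], [2, 4]]; a real spectrogram would have on the order of a thousand rows and many more columns.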
【０２５６】
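In step S6 it is assumed that each column is drawn independently from a detV distribution of the kind described above, whose Wishart scale-matrix parameter is r*E and whose shape parameter is k. It is further assumed that log r has an improper uniform prior and that k has an improper uniform prior. In step S6 the MAP (maximum a posteriori) values of r and k are calculated using Bayes' theorem.
【０２５７】
In step S7 V is set equal to rE and k is set to its MAP value, and U' is calculated using these values. U' is thus the matrix that transforms the mean local covariance matrix of the log power spectrogram of the samples into the identity matrix. Finally, in step S8, U' is stored.
【０２５８】
Some variants of the decoder shown in FIG. 20 will now be described.
【０２５９】
Extracting the MAP values of timing frequency and phase from the buffer (115) may not give the most accurate performance. In a variant of the two decoders described, the whole range of possible timing frequencies ω and phases φ is therefore considered. For each ω and φ, the posterior probability P(ω, φ|D), where D is the data, is estimated by summing the absolute values of the values sliced from the buffer (115) on the basis of ω and φ and exponentiating the result.
【０２６０】
In the decoders described above, the values in the buffer (115) are then sliced using the values of ω and φ for which P(ω, φ|D) is greatest.
【０２６１】
In the variant now being described, by contrast, the audio input is first resampled by a resampler circuit (123) so as to stretch or compress the stegotext. A random sample is then drawn from P(ω, φ|D) by a random sampler which replaces the clock extraction circuit (116). This procedure takes account of the fact that the MAP value may occur at a narrow, tall spike in the distribution when the true value actually lies in a broad peak which, although lower, contains more probability.
【０２６２】
The manner in which this random sampling of the buffer (115) in circuit (116) operates in conjunction with the resampler (123) will now be considered.
【０２６３】
First, if the stegotext has been stretched (or compressed) by less than 6%, it is likely that, owing to the nature of the key (K), circuit (116) will immediately select appropriate timing for the data slicing circuit (117) and will likewise output a correct correction value to circuit (123) so as to undo the detected stretch (or compression).
【０２６４】
If, on the other hand, the stretch (or compression) is 6% or more, the values in the buffer (115), and hence the calculated values of P(ω, φ|D), are incorrect and all of similar magnitude. As a result a random value is sent to the resampler circuit. Two cases need to be considered. The first is where the value fed back happens to be within ±6% of the correct value. If the full range of possible stretch is ±10%, the probability of this occurring is at least 1/2. When the newly resampled stegotext reaches the buffer (115), circuit (116) can determine the correction now required. The correct value is then fed back to circuit (123) and passed to circuit (117).
【０２６５】
The second case is where the value fed back to circuit (123) is not within ±6% of the correct value. The result is then again incorrect, the next value to be fed back is chosen at random, and the procedure continues as before. Since the probability of selecting a suitable value at each iteration is at least 1/2, only a few iterations are needed before the correct timing and phase are found. The resampler circuit can of course start by leaving the input stegotext unchanged.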
【０２６６】
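Finally, a problem common to all decoders processing exactly reproducible inputs is that in some circumstances a particular input will not decode correctly and, in the absence of any randomness in the input, the same problem will recur every time the input is repeated. It is therefore proposed that decoders such as those described above with reference to FIGS. 16 and 20, and their variants, be provided with means for avoiding this problem. One such means is simply to add truly random noise to the input to the decoder. This has the disadvantage of degrading decoder performance.
【０２６７】
Another alternative is to arrange for the actual input to be decoded together with a period of zero signal, or of random noise, having a truly random length within a predetermined range.
【０２６８】
It will be appreciated that in the foregoing specification the various embodiments of the encoder and decoder have been defined in terms of circuit elements such as "filters", "multipliers", "buffers" and "circuits". However, apart from the actual recording or playback of signals, all of these circuit elements can be replaced by suitable software operations. In particular, the encoder described with reference to FIG. 13 can be replaced in all its functional aspects by a general-purpose computer receiving suitable code. An example of such code is given in connection with the generator of the matrix B_0 used in the decoder of FIG. 20. The functions of all the steps and blocks shown in FIGS. 13, 16 and 20 can therefore be carried out as software steps.
【０２６９】
In the case of the decoder embodiments, where they are used in individual systems that not only decode the stegotext but also reproduce the stegotext as output, for example as music, the decoder may take the form of one or more integrated microprocessors, possibly using very large scale integration.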
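[Brief Description of the Drawings]
[FIG. 1]
A block diagram of a system for encoding a cover text signal with additional data to generate a stegotext, and for decoding the stegotext.
[FIG. 2]
A block diagram of an encoder and a decoder that can be used in the embodiment of FIG. 1 to generate and decode a stegotext.
[FIG. 3]
A graph of the power spectrum of a passage of music.
[FIG. 4]
A graph showing the overlap of modulation patterns when the power spectrogram is modified.
[FIG. 5]
A block diagram of a convolutional encoder.
[FIG. 6]
A block diagram of the encoder/decoder shown in more detail than in the embodiment of FIG. 2.
[FIG. 7]
A graph showing a time-stretching attack on a stegotext.
[FIG. 8]
A graph showing the parameters of a filter used in the system of FIG. 1 in accordance with an embodiment of the invention.
[FIG. 9]
A graph showing the filter characteristics of successive keys.
[FIG. 10]
A graph showing a one-dimensional white noise signal to which a swept band-pass filter has been applied.
[FIG. 11]
A graph showing the result for a two-dimensional white noise signal to which a swept filter has been applied in each direction.
[FIG. 12]
A graph showing the effect of stretching on correlation.
[FIG. 13]
A block diagram of one embodiment of an encoder according to the invention.
[FIG. 14]
A graph showing different types of noise.
[FIG. 15]
A graph showing different types of noise.
[FIG. 16]
FIG. 16 shows how FIGS. 16A and 16B fit together.
[FIG. 16A]
FIG. 16A shows the first half of one embodiment of the decoder of the system shown in FIG. 1.
[FIG. 16B]
FIG. 16B shows the second half of one embodiment of the decoder of the system shown in FIG. 1.
[FIG. 17]
A graph showing the contents of a buffer of the decoder of FIG. 16.
[FIG. 18]
A graph showing the result of processing the values shown in FIG. 17.
[FIG. 19]
A graph illustrating the operation of the maximum likelihood convolutional code decoder forming part of the decoders of FIGS. 16 and 20.
[FIG. 20]
FIG. 20 shows how FIGS. 20A and 20B fit together.
[FIG. 20A]
FIG. 20A shows the first half of a second embodiment of the decoder.
[FIG. 20B]
FIG. 20B shows the second half of the second embodiment of the decoder.
[FIG. 21]
A graph showing a projection map in a vector space.
[FIG. 22]
A diagram showing the generation of the music-related parameters used in the decoders of FIGS. 16 and 20.
[FIG. 23]
A graph showing the generation of a key.
[0001]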
The present invention relates to digital watermarking (watermarking) of an analog signal or a digital signal. Although the signal may be a video signal or a data signal, it is to be understood that the invention is particularly, but not exclusively, concerned with the watermarking of audio signals.
[0002]
The term "watermarking" is intended to cover procedures for adding data to a main signal in such a way that the added data does not interfere with the main purpose of the main signal. The main signal is often called the "cover text", and the signal containing the added watermark data is often called the "stegotext". In the case of an audio signal, the intention is that the presence of the added data in the stegotext should be virtually imperceptible to the listener when the stegotext is played back. The presence of the added data nevertheless allows the origin of the cover text to be identified by a user who has a suitable decoding device. If the user's equipment is provided with suitable circuitry, the user can be prevented from reproducing the main data contained in the original cover text signal when the recovered watermark data is not compatible with that equipment. In addition, the user must be able to play the cover text.
[0003]
Such techniques clearly have great potential for recorded music. Considerable effort has accordingly been devoted to watermarking audio signals in such a way that the spurious sounds produced by the added coded data do not impair the enjoyment of those entitled to listen to the stegotext.
[0004]
At the same time, it is important that the watermark be robust enough to remain effective after the various kinds of conventional signal processing that may be applied to recorded and transmitted audio material, and to withstand direct attempts to remove or invalidate the added coded data.
[0005]
An apparatus and method for performing digital watermarking on an analog signal is described in International Patent Specification WO 98/53565, which discloses some techniques that have been used for digital watermarking signals.
[0006]
One method of watermarking proposed in this earlier published specification involves calculating the short-term autocorrelation function of the audio signal and then adding a signal that is difficult to hear and that changes the value of the short-term autocorrelation function at one or more particular delays, thereby producing a particular waveform that conveys data at a low rate. The actual modulation of the data onto this waveform can be performed using any of several suitable modulation techniques. At the receiving end, a watermark reader (or decoder) calculates the short-term autocorrelation function of the stegotext and applies the demodulation appropriate to the modulation technique used. If the reader has access to the data originally used to modulate the autocorrelation function, the added coded data can be removed from the stegotext.
[0007]
However, the short-term autocorrelation function of many audio signals can easily be modified so that it is arbitrarily close to zero at any given delay without changing the sound of the underlying audio. A watermarked signal can therefore be attacked relatively easily so as to nullify the effect of the watermarking.
[0008]
The present invention relates to providing a digital watermark processing system that does not have the above-mentioned disadvantages, and a decoder that decodes a signal subjected to digital watermark processing.
[0009]
According to a first aspect of the present invention, there is provided an encoder for encoding a cover text signal to generate a stegotext, the encoder comprising:
first transform means for performing a fast Fourier transform and a rectangular-to-polar conversion of the cover text signal, thereby transforming the cover text signal into a logarithmic power spectrogram;
means for providing at least one key, the or each key being in the form of a two-dimensional pattern of predetermined size;
a multiplier for adding or subtracting, in the logarithmic power spectrogram domain, a multiple of the key or, where there is more than one key, multiples of one or more of the keys to blocks of the transformed cover text signal;
means for controlling the addition or subtraction of the key or keys by the multiplier in accordance with data representing a desired code; and
second transform means for performing a polar-to-rectangular conversion and an inverse fast Fourier transform of the modulated cover text signal, thereby generating the stegotext.
[0010]
According to a second aspect of the present invention, there is provided a method of encoding a cover text signal to generate a stegotext, the method comprising:
performing a fast Fourier transform and a rectangular-to-polar conversion of the cover text signal, thereby transforming the cover text signal into the logarithmic power spectrogram domain;
providing at least one key, the or each key being in the form of a two-dimensional pattern of predetermined size;
adding or subtracting, in the logarithmic power spectrogram domain, a multiple of the key or, where there is more than one key, multiples of one or more of the keys to segments of the transformed cover text signal;
controlling the addition or subtraction of the multiples of the key or keys in the adding or subtracting step in accordance with data representing a desired code; and
performing a polar-to-rectangular conversion and an inverse fast Fourier transform of the modulated cover text signal, thereby generating the stegotext.
[0011]
According to a third aspect of the invention, there is provided a decoder for decoding a stegotext generated by modulating the logarithmic power spectrogram of a cover text signal with at least one key (K), a multiple of the or each key having been added to or subtracted from the cover text power spectrogram in the logarithmic domain in accordance with the data of the watermark code with which the stegotext was generated, and the modulated power spectrogram having been returned to the original domain of the cover text, the decoder comprising: transform means for performing a fast Fourier transform and a rectangular-to-polar conversion of the stegotext signal, thereby transforming the stegotext signal into the logarithmic power spectrogram domain; means for supplying the key or keys with which the logarithmic power spectrogram of the original cover text signal was encoded; calculating means for subtracting positive and negative multiples of the key or keys from blocks of the logarithmic power spectrogram in the logarithmic power domain and for evaluating, in accordance with a predetermined statistical model, the probability that the results of such subtractions represent an unmodified block of cover text; and extraction means for recovering the encoded data from the output of the calculating means.
[0012]
According to a fourth aspect of the invention, there is provided a method of decoding a stegotext generated by modulating the logarithmic power spectrogram of a cover text signal with at least one key (K), a multiple of the or each key having been added to or subtracted from the cover text power spectrogram in the logarithmic domain in accordance with the data of the watermark code with which the stegotext was generated, and the modulated power spectrogram having been returned to the original domain of the cover text, the method comprising: performing a fast Fourier transform and a rectangular-to-polar conversion of the stegotext signal, thereby transforming the stegotext signal into the logarithmic power spectrogram domain; providing the key or keys with which the logarithmic power spectrogram of the original cover text signal was encoded; subtracting positive and negative multiples of the key or keys from blocks of the logarithmic power spectrogram in the logarithmic power domain and evaluating, in accordance with a predetermined statistical model, the probability that the results of such subtractions represent an unmodified block of cover text; and recovering the encoded data from the results of that evaluation.
[0013]
According to a fifth aspect of the present invention, there is provided a digital watermark key generator according to claim 45.
[0014]
For a better understanding of the invention, embodiments of the invention will now be described by way of example with reference to the accompanying drawings.
[0015]
Referring now to FIG. 1, the basic system comprises a key generator (1), an encoder (2) and a decoder (3). The key generator (1) generates a pseudo-random key from an integer seed value input at (1'). The encoder (2) uses the key to mark a music file, input at (4) as the cover text, with data, thereby generating the stegotext. The data is input to the encoder (2) at (2'). The decoder (3), which receives the stegotext over the transmission line (5), uses the key to read the data back from the marked file and outputs the recovered data at (6). The same key must be used for encoding and decoding to ensure that the data is read back correctly. The key can of course be regenerated from the seed whenever needed, so the seed value is all that is required to decode a marked file. The transmission line (5) can take a wide variety of forms: the stegotext could be recorded on any suitable medium or transmitted by radio, over optical fibre cable, or the like. In the following, a file that has not been marked is called the cover text and a file that has been watermarked is called the stegotext. Although this embodiment is described in connection with music, it will be appreciated that the techniques and devices described can also be used for non-musical material such as speech or video data.
[0016]
FIG. 2 of the accompanying drawings shows a block diagram of a more detailed embodiment according to the present invention. In this figure the cover text is an unmarked audio file indicated by (10). The source of this audio file is indicated by 10'. It may be a recording medium such as a tape or disk, or a signal from a live event transmitted by radio or over the Internet. The audio file is input to the encoder (2) and converted into a power spectrogram in circuit (11). The reason for this conversion is as follows. It is inadvisable to convey information in the phase component of the stegotext. The human ear is essentially insensitive to phase, a fact exploited by some compression algorithms, so watermarking techniques that depend on phase cannot be expected to be robust to compression. Moreover, an audio file can be processed by adding a random "group delay" that scrambles the phases of its frequency components. Such processing, which is not computationally intensive, generally destroys any particular waveform in the audio file. Watermarks that depend on the waveform of the signal in the time domain can therefore be rendered unreadable by this process.
[0017]
Therefore, the present invention proposes to perform digital watermarking of a cover text using a power spectrogram of the cover text. In this way, only the magnitude of each frequency component in the cover text is modified, and the phase of each frequency component is preserved through the marking process. The phase information is discarded at the decoder. Here, this procedure will be described in detail.
[0018]
To calculate the power spectrogram of the cover text, the cover text is divided into blocks 2Y samples long that overlap by half their length, so that a new block starts every Y samples. In the embodiment described, which is designed for audio files with a sample rate f_s = 44100 Hz, Y is set to 1024.
[0019]
Each block is multiplied by a window function, known as the analysis window, and the Fourier transform of the windowed block is calculated. The purpose of the window function is to make the sample values taper towards zero at either end of the block so that discontinuities are avoided, since the Fourier transform treats the block as one period of a periodic function. Because the windowed block consists of real-valued samples, its Fourier transform is conjugate symmetric between positive and negative frequencies. The negative frequency components carry no additional information and can therefore be discarded.
[0020]
Each Fourier coefficient is a complex number whose magnitude represents the amplitude of the corresponding frequency component and whose argument represents the phase. If the phase information is discarded, what remains is the power spectrum of the signal. In a strict sense, the power spectrum is obtained by squaring the magnitude of each Fourier coefficient.
[0021]
When several successive power spectra are arranged side by side, a grid of numbers is formed. One axis, generally the vertical one, represents frequency, and the other, generally the horizontal one, represents time. This grid is the power spectrogram of the audio sample. FIG. 3 shows an example of a power spectrogram extracted from a passage of music. In this figure the values in the grid are shown as different shades. The column on the right, running from −8 to 3, is a scale against which the brightness levels of the spectrogram can be matched so that the spectrogram can be evaluated.
[0022]
The resolution of the spectrogram is determined by the choice of Y. The resolution is f_s/2Y in the frequency direction and Y/f_s in the time direction. In this embodiment these values are 21.5 Hz and 23.2 ms respectively. The axes in FIG. 3 are measured in these units.
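The two resolution figures quoted above follow directly from f_s = 44100 Hz and Y = 1024, as the short calculation below shows. The variable names are chosen here for illustration only.

```python
# Spectrogram resolution for the parameters quoted in the text.
sample_rate_hz = 44100      # f_s
half_block = 1024           # Y; each block is 2Y samples long

frequency_resolution_hz = sample_rate_hz / (2 * half_block)   # f_s / 2Y
time_resolution_s = half_block / sample_rate_hz               # Y / f_s

print(f"{frequency_resolution_hz:.1f} Hz, {time_resolution_s * 1000:.1f} ms")
```

Doubling Y would halve the frequency step and double the time step, which is the usual trade-off when choosing the block length.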
[0023]
It might seem difficult to reconstruct the audio waveform adequately from the power spectrogram, but it is possible provided the phase information is retained. The spectrogram data is returned to the time domain by an inverse Fourier transform, and the resulting blocks are overlapped and added together as before.
[0024]
For the purpose of watermarking the cover text from which the spectrogram was obtained, it has been found that this method reproduces a satisfactory audio waveform provided that the spectrogram is modified only slightly and the original phase information is retained. Because the reconstructed time-domain segments are no longer guaranteed to taper to zero at either end, they are multiplied by a synthesis window before being overlapped and added together as described above, which improves the subjective quality of the final waveform. The analysis and synthesis windows must be chosen so that there is no overall amplitude modulation through the system. In this embodiment each of these windows is the square root of a raised cosine function.
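The requirement that the analysis and synthesis windows introduce no overall amplitude modulation can be checked numerically. The sketch below assumes a periodic raised cosine of the form 0.5(1 − cos(2πn/N)) over a block of N = 2Y samples; that exact form, the function name `sqrt_raised_cosine` and the small block length are assumptions made for illustration.

```python
import math


def sqrt_raised_cosine(length):
    """Square root of a periodic raised cosine window of the given length (assumed form)."""
    return [math.sqrt(0.5 * (1.0 - math.cos(2.0 * math.pi * n / length)))
            for n in range(length)]


block = 16                      # 2Y, kept small for the demonstration
hop = block // 2                # blocks overlap by half their length
window = sqrt_raised_cosine(block)

# Analysis window times synthesis window is the raised cosine itself; two
# half-overlapping copies should add up to a constant gain of 1.
overlap_gain = [window[n] ** 2 + window[n + hop] ** 2 for n in range(hop)]
```

Every entry of `overlap_gain` equals 1 to within rounding, so an unmodified spectrogram passes through analysis, synthesis and overlap-add with unit gain.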
[0025]
In FIG. 2, the modulation of the spectrogram is performed entirely in the circuit indicated by (12) in response to the bit stream to be encoded.
[0026]
Finally, in block (2), circuit (13) returns the modulated power spectrograms to the time domain and combines them for conversion to stegotext. In FIG. 2, the stego text is indicated by 15.
[0027]
The decoder (3) comprises a circuit (16) for converting the stegotext into a log spectrogram, a circuit (17) for correlating the log spectrogram, and a circuit (18) for extracting the bit stream representing the watermark code, which is output at (19).
[0028]
It has been found that the extent to which the elements of the power spectrogram of an audio signal can be modulated without audible effect is approximately proportional to their original level. Additions to or subtractions from the power spectrogram can therefore be made up to a fixed amount in decibel terms. The amount of modulation that can be perceived depends on the listening environment but is typically about 1 dB. In this embodiment the watermarking process is accordingly carried out in the "logarithmic power spectrogram domain" and consists of adding to or subtracting from the log power spectrogram in accordance with the key generated by the key generator 1 and the data to be encoded as the watermark.
[0029]
Because larger spectrogram elements can be modulated more heavily, the information they convey is less susceptible to noise than that conveyed by elements of smaller amplitude. It is impossible, however, to know in advance which elements these will be. The watermarking process described is therefore arranged to exploit whatever elements of the cover text are available for conveying information. In this embodiment every spectrogram element is modulated in circuit 12 so as to maximize the information-carrying capacity of the watermark. Each data bit of the watermark thus gives rise to a modulation pattern over a region of the spectrogram. The modulation pattern is applied in one sense to encode a "1" bit and in the opposite sense to encode a "zero" bit. The bits are encoded at regular intervals, that is, at a regular horizontal spacing T in the spectrogram.
[0030]
There may be short segments of the cover text in which a watermark cannot be hidden, such as silent passages in an audio file. It is therefore essential that each data bit affect as long a portion of the stegotext as possible. This embodiment uses two approaches to achieve this.
[0031]
FIG. 4 of the accompanying drawings illustrates one of these approaches graphically. In this approach the spectrogram modulation patterns for adjacent bits overlap. In FIG. 4 each square K represents a copy of the modulation pattern. Each spectrogram modulation pattern K is X time units wide and Y frequency units high, Y being the full height of the spectrogram. In this embodiment X is 32 and T = 5. Thus, if the first 32-column-wide block of the cover text power spectrogram is modulated by a key of the same size and the key is then stepped by T (5 columns), the first 5 columns of the cover text are modulated only by the first five columns of the key. In the next repetition of the modulation, columns 6 to 37 of the cover text are modulated by the key, so that columns 6 to 32 are modulated twice. In the third repetition, columns 6 to 10 remain modulated only twice, columns 11 to 32 are modulated a third time, columns 33 to 37 are modulated a second time, and columns 38 to 42 are modulated for the first time. This sequence is repeated over the length of the cover text. The values of X and T can of course vary over a wide range. For example, X may be 256 and T may be 10.
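The overlap pattern described for X = 32 and T = 5 can be reproduced by simply counting how many key copies cover each spectrogram column. The function name `modulation_counts` is hypothetical, and columns are numbered from 0 in the code whereas the text numbers them from 1.

```python
def modulation_counts(num_blocks, width=32, step=5):
    """Count how many copies of the key overlap each spectrogram column."""
    total = width + step * (num_blocks - 1)
    counts = [0] * total
    for block in range(num_blocks):
        start = block * step                 # each key copy starts `step` columns later
        for col in range(start, start + width):
            counts[col] += 1
    return counts


counts = modulation_counts(3)                # the three repetitions walked through in the text
```

After three repetitions, columns 1 to 5 have been modulated once, 6 to 10 twice, 11 to 32 three times, 33 to 37 twice and 38 to 42 once, matching the description above.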
[0032]
The second approach is to apply an error correction code to the message bits so as to spread the effect of each bit still further in time.
[0033]
The convolutional encoder shown in FIG. 5 is used to spread the effect of each input bit over a longer portion of the music while requiring less memory in the decoder than the use of longer keys would. The data stream to be encoded is input on line (30) to a shift register, which in this embodiment consists of three D-type flip-flops (31, 32 and 33) clocked by a clock (clk/2) supplied on line (34). An output switch (35), toggled at the clock rate (clk), is connected alternately to the outputs of two exclusive-OR gates (36, 37), each of which forms an exclusive-OR combination of bits in the shift register. In this embodiment the upper exclusive-OR gate (36) is connected to all three bits of the shift register and the lower gate (37) to bit 0 and bit 2. The encoder is defined by the two-dimensional matrix [111; 101], whose first row corresponds to the connections from the shift register to the upper exclusive-OR gate (36) and whose second row corresponds to the connections to the lower exclusive-OR gate (37). The pattern of connections can also be expressed as polynomials with coefficients from the set {0,1}; in this case the polynomials are X² + X + 1 (gate 36) and X² + 1 (gate 37).
[0034]
In this encoder each input bit affects six consecutive output bits (the total number of entries in the matrix), and the output bit rate is twice the input bit rate (the number of rows in the matrix). Such a code is called a "rate 1/2 code". The entries in each row of the matrix, the "generator polynomials", must be chosen appropriately. In this embodiment only codes whose rate is the reciprocal of an integer can be used. This restriction arises solely from the type of encoder used in this embodiment; if another form of error correction coding were used, it would not necessarily apply. The identity code, which passes its input to the output unchanged, is specified by the matrix [1].
[0035]
The code is called "convolutional" because it can be implemented using convolution, as follows. Zeros are first inserted between the input data bits according to the code rate. Suppose, for example, that the original data is (1011). Zeros are inserted into this data to give (1000101), so that the data is now at half its original rate. Convolving this with the encoder matrix written as a single row, (111011), gives (111022212111). Taking the result modulo 2 gives (111000010111). The modulo-2 operation corresponds to the action of the exclusive-OR gates (36 and 37) in the encoder. A 4-bit sequence has thus been encoded into a 12-bit codeword. In general, a sequence of n bits is encoded into a (2n + 4)-bit codeword.
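The worked example can be reproduced with a few lines of code. The sketch below follows the convolution view described above for the [111; 101] code; the function name `convolutional_encode` and its return of the intermediate sequences are choices made here for illustration, not part of the specification.

```python
def convolutional_encode(bits, generators=((1, 1, 1), (1, 0, 1))):
    """Encode by zero insertion, convolution with the interleaved generator rows, then mod 2."""
    rate = len(generators)
    # Interleave the generator rows column by column: [111; 101] -> 1 1 1 0 1 1.
    kernel = [g[i] for i in range(len(generators[0])) for g in generators]
    # Insert (rate - 1) zeros between successive input bits.
    spaced = []
    for index, bit in enumerate(bits):
        if index:
            spaced.extend([0] * (rate - 1))
        spaced.append(bit)
    # Plain integer convolution of the spaced data with the kernel.
    raw = [0] * (len(spaced) + len(kernel) - 1)
    for i, s in enumerate(spaced):
        for j, k in enumerate(kernel):
            raw[i + j] += s * k
    return spaced, raw, [value % 2 for value in raw]


spaced, raw, codeword = convolutional_encode([1, 0, 1, 1])
```

For the input (1011) this yields the spaced data (1000101), the raw convolution (111022212111) and the 12-bit codeword (111000010111), which is the same output the shift-register encoder of FIG. 5 would produce if two trailing zeros were clocked in to flush the register.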
[0036]
Although convolutional codes have been described, it will be appreciated that many other types of suitable error correction codes can be utilized. Such codes include Reed-Solomon codes, BCH codes, Golay codes, Fire codes, Turbo codes, Gallager codes, and MacKay-Neal codes.
[0037]
Synchronization coding is performed before convolutional encoding: a synchronization flag is inserted into the stream of data bits to be encoded, so that synchronization of the bit stream is achieved by means of a start flag. The flag pattern is a "0" followed by five "1"s. An additional zero bit is inserted after any sequence of four "1"s in the data to ensure that this pattern does not otherwise occur in the data stream. This is known as "zero stuffing". The stuffed zeros are removed by the decoder. This procedure incurs a penalty of 6 bits per start flag plus an overall reduction in data rate of just over 3%. One skilled in the art will appreciate that many alternatives are possible.
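A minimal sketch of the flag insertion and zero stuffing described above is given below. The function names `stuff`, `unstuff`, and `frame` are hypothetical, and bits are represented as Python lists of 0 and 1 purely for illustration.

```python
FLAG = [0, 1, 1, 1, 1, 1]  # a "0" followed by five "1"s

def stuff(bits):
    """Insert a 0 after every run of four consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 4:
            out.append(0)  # the stuffed zero breaks the run
            run = 0
    return out

def unstuff(bits):
    """Remove the zero that follows every run of four consecutive 1s."""
    out, run, skip = [], 0, False
    for b in bits:
        if skip:           # this bit is a stuffed zero: drop it
            skip = False
            continue
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 4:
            skip, run = True, 0
    return out

def frame(bits):
    """Prefix the start flag to the zero-stuffed payload."""
    return FLAG + stuff(bits)
```

Because the stuffed payload never contains five consecutive "1"s, the flag pattern cannot appear inside it. For random data a run of four "1"s is completed on average once every 30 bits, which is consistent with the overhead of just over 3% quoted above.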
[0038]
Turning now to FIG. 6, this figure shows how the two processes just described can be incorporated into an encoder. Again, (10) indicates the cover text to be encoded, K indicates the window currently being processed, and K-1 indicates the previous window. At the multiplier (41), the extracted window K is multiplied by the analysis window function (42) so that the extracted window tapers off at each end. In circuit (43), a fast Fourier transform of the windowed segment is computed. A rectangular-to-polar conversion is performed in circuit (44) to generate the required power spectrogram. This power spectrogram is modified, as already mentioned, in circuit (45), which corresponds to the circuit 12 of FIG. 2; the phase components of the spectrogram remain unchanged.
[0039]
To complete the generation of the stegotext, a polar-to-rectangular conversion is performed in circuit (46). The inverse fast Fourier transform is performed by circuit (47), and its output is multiplied at (49) by the synthesis window function (48). Finally, the overlapping windows are added at (50) to create the stegotext shown at (15).
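The signal path of FIG. 6 can be sketched as an overlap-add loop. The following is a sketch under stated assumptions rather than the embodiment itself: the square-root Hann window, the 50% overlap, the function name `watermark_frames`, and the `modulate` callback are all choices made here for illustration. It works on the log magnitude, which differs from the log power only by a factor of two.

```python
import numpy as np

def watermark_frames(cover, n_fft, modulate):
    """Window, FFT, modify log magnitude, inverse FFT, window, overlap-add."""
    hop = n_fft // 2
    # Assumed window: square-root periodic Hann for both analysis and synthesis.
    n = np.arange(n_fft)
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft))
    out = np.zeros(len(cover))
    for k, start in enumerate(range(0, len(cover) - n_fft + 1, hop)):
        segment = cover[start:start + n_fft] * window      # multiplier (41)
        spectrum = np.fft.rfft(segment)                     # FFT (43)
        magnitude = np.abs(spectrum)                        # rectangular to polar (44)
        phase = np.angle(spectrum)
        log_mag = modulate(k, np.log(magnitude + 1e-12))    # modification (45)
        spectrum = np.exp(log_mag) * np.exp(1j * phase)     # polar to rectangular (46)
        segment = np.fft.irfft(spectrum, n_fft) * window    # IFFT (47), multiplier (49)
        out[start:start + n_fft] += segment                 # overlap-add (50)
    return out
```

With a `modulate` callback that returns its input unchanged, the interior of the output matches the input, which is a convenient check that the windows and overlap are consistent before any key is applied.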
[0040]
It will be appreciated that it is desirable to have several different watermarks available. With several different watermarks in use, it becomes extremely difficult for an attacker to decode, remove, or forge hidden messages without knowing which watermark has been used.
[0041]
In this embodiment, the term "key" is used to denote a particular member of a family of similar watermarks. Also in this embodiment, the keys are generated pseudo-randomly, and any one key is defined by a single integer used as a seed. This is the seed input shown in FIG.
[0042]
In this embodiment, the key is an array of spectrogram modulation values K(t, f), where t and f are integer indices and −1 ≤ K(t, f) ≤ +1. K(t, f) is defined to be zero outside the ranges −X/2 ≤ t < X/2 and 0 ≤ f < Y. Let the spectrogram of the cover text be G(t, f) and the spectrogram of the stegotext be H(t, f). Let d_i represent the data bits to be encoded, where d_i is ±1 (not 0 or 1). For simplicity, error correction coding is ignored. The encoding algorithm is then given by:
[0043]
[Equation 3]

[0044]
where an appropriate choice of branch cut is made for the logarithm.
[0045]
(Equation 4)

[0046]
Here, s is a real constant that determines the encoding strength. In equations (1) and (2), G and H are complex numbers, but K is real. Therefore, in equation (1), argH = argG. Thus, the watermark is encoded in the power spectrum, preserving the phase of the original spectral components.
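A form of equations (1) and (2) that is consistent with the surrounding definitions can be written as follows. This is a reconstruction under assumptions, not a reproduction of the original equations: the sum over the bit index i and the symbol T (the interval, in spectrogram columns, between successive applications of the key) are assumed here.

```latex
\begin{align}
\log H(t,f) &= \log G(t,f) + s \sum_i d_i \, K(t - iT,\, f) \tag{1} \\
H(t,f) &= G(t,f)\, \exp\!\Big( s \sum_i d_i \, K(t - iT,\, f) \Big) \tag{2}
\end{align}
```

Because the exponent in (2) is real, the factor multiplying G is a positive real number, so only the magnitude of each spectral component changes and arg H = arg G, as stated above.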
[0047]
It will be appreciated that key design is of the greatest importance in generating a stegotext that is robust against attack. The considerations involved in designing the key will now be described in detail.
[0048]
A key consisting solely of a white noise pattern, in which the cells of the key are independent and identically distributed, is attractive for a number of reasons. It is computationally easy to generate and has the greatest possible information-carrying capacity. In general it has low correlation with the cover text and has a single narrow autocorrelation peak. Experiments have shown that such a key is robust to a wide variety of manipulations of the audio file, even when encoded at an intensity low enough to be inaudible. However, the stegotext can be subjected to a group delay attack on the spectrogram, in which individual rows are randomly shifted left and right, and the group delay parameters can be arranged so that the shift amounts to one or more columns at the spectrogram resolution used. This destroys any correlation between the stegotext and the key. It appears impossible to select a spectrogram resolution that both gives a perceptually satisfactory stegotext and is robust against all forms of this group delay attack.
[0049]
Furthermore, it is possible to resample the stegotext so that all frequencies are raised, for example, by 5% (less than one semitone) and the text is shortened in time by the same factor. The effect on the spectrogram is to expand the spectrogram vertically and shrink it horizontally. This procedure is illustrated graphically in FIG. 7, where 15A represents the original stego text and 15B is the modified stego text. It can be seen that very few of the cells still match. Along the frequency axis, a cell with f ≧ 20 does not overlap at all with the previous position of the cell. Also in this case, the correlation function is destroyed.
[0050]
The first of these two problems, which involves distortion in one dimension only, can be eliminated by modifying the key to include repeated columns. Experiments have shown that 12 repetitions of each spectrogram column are sufficient to ensure that the group delay required to destroy the correlation function has a perceptually unacceptable effect on the stegotext. The price paid is a decrease in information-carrying capacity: the autocorrelation peak of the key is wider and lower, so a higher encoding strength is required to obtain a given robustness.
[0051]
The second problem can be addressed by an exhaustive search. The correlation function can be evaluated at a range of different resampling rates, and by finding the rate that gives the strongest correlation, it is possible to determine the factor by which the file was resampled. Unfortunately, it is also possible to resample the stegotext such that the pitch changes but the overall duration stays constant, or such that the pitch stays constant but the overall duration changes. The latter process is common, for example, in broadcast applications where it is desirable to fit a piece of music exactly into a given slot. The search space is therefore potentially two-dimensional: the stegotext may have been arbitrarily expanded in frequency and/or time. If the key is modified to include repeated columns as described above, the autocorrelation function is wide, so the range of possible time expansions need only be sampled sparsely; nevertheless, the computational burden is great.
[0052]
However, the present invention provides a solution to this problem. Careful consideration of the effect of expansion about a fixed origin on the key shows that the relative effect of the expansion is constant across the key; it is the absolute effect that varies, and this is what causes the above problems. In this embodiment, the key pattern is modified so that higher spatial frequencies are progressively filtered out with increasing distance from the origin.
[0053]
For the purposes of the following description, the cover text, the stegotext, and the keys are considered as images in the log spectrogram domain. "Frequency" here means spatial frequency in these images, not the frequency of the underlying audio.
[0054]
First, consider the problem in one dimension. Let f(t) be a sine wave, f(t) = sin ωt. When it is expanded by a factor α, it becomes g(t) = sin αωt. The phase difference between the two is φ = αωt − ωt = ωt(α − 1). Requiring that the correlation between f(t) and g(t), calculated over a suitably chosen interval around t, exceeds a certain threshold is equivalent to constraining the phase difference: |φ| < φ0. Hence ω is constrained in terms of t: |ω| < φ0/((α − 1)|t|); or, if α is chosen to be the largest expansion against which resistance is required, 1/|ω| > C|t| for some positive constant C. In view of this relationship, it is more convenient to work with the time scale τ of the sine wave, where τ = 1/ω.
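The relationship between the phase difference φ = ωt(α − 1) and the correlation can be checked numerically. The sketch below is illustrative only: the function name `local_correlation`, the window width, and the chosen values of ω and α are assumptions, not values taken from the embodiment.

```python
import numpy as np

def local_correlation(omega, alpha, t, half_width=1.0, n=4001):
    """Normalized correlation of sin(wt) and sin(a*w*t) on a window centred at t."""
    u = np.linspace(t - half_width, t + half_width, n)
    f = np.sin(omega * u)
    g = np.sin(alpha * omega * u)
    return float(np.dot(f, g) / np.sqrt(np.dot(f, f) * np.dot(g, g)))

omega = 2 * np.pi    # one cycle per unit time (illustrative)
alpha = 1.05         # 5% expansion (illustrative)
near = local_correlation(omega, alpha, t=1.0)    # phase difference 0.1 * pi
far = local_correlation(omega, alpha, t=10.0)    # phase difference pi
```

Close to the origin the two sine waves remain strongly correlated, whereas at the point where the phase difference reaches π they are almost exactly anti-correlated, which is why the admissible ω must shrink as |t| grows.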
[0055]
It is now possible to specify the frequency content of a function that remains correlated with itself when expanded. It must not include frequency components with a time scale shorter than the time scale threshold (τ = C|t|), where the constant C sets the desired degree of expansion resistance. Such a function can be obtained by suitably filtering a white noise signal. A low-pass filter whose cutoff frequency varies in inverse proportion to |t| is required. Such a filter is referred to below as a "sweep" filter.
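One simple way to realize a low-pass filter whose cutoff falls as 1/|t| is a smoothing kernel whose width grows in proportion to the distance from the origin. The sketch below makes that assumption explicit: the Gaussian kernel, the identification of its standard deviation with the time scale threshold C|t|, and the name `sweep_lowpass` are choices made here for illustration and are not stated in the text.

```python
import numpy as np

def sweep_lowpass(x, origin, c):
    """Low-pass filter x with a cutoff that falls as 1/|t - origin|."""
    idx = np.arange(len(x))
    y = np.empty(len(x))
    for i in idx:
        sigma = c * abs(i - origin)        # time scale threshold tau = C|t|
        if sigma < 1e-9:
            y[i] = x[i]                    # no smoothing at the origin itself
            continue
        w = np.exp(-0.5 * ((idx - i) / sigma) ** 2)   # Gaussian kernel (assumed)
        y[i] = np.dot(w, x) / w.sum()
    return y
```

Applied to white noise, this leaves the samples near the origin almost untouched and smooths the samples far from the origin heavily, so fine detail survives only where a small expansion moves it by a small absolute amount.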
[0056]
As already described, in this embodiment the copies of the key corresponding to successive data bits overlap. A high-pass filter is therefore also applied to the key, to minimize the overlap between the frequency components contributed at a particular point in time by one copy of the key and those contributed by the preceding or following copies. The overall effect is thus that of a band-pass filter. The cutoff frequency of the high-pass filter is swept to match the low-pass characteristic of the adjacent key. This is shown in the graph of FIG. The "bandwidth", which is constant in terms of time scale, is given by Δ = CT, where T is the interval between successive applications of the key to the cover text. FIG. 9 shows how overlapping copies of the key for four consecutive bits d₀, d₁, d₂, and d₃ are arranged.
[0057]
An example of the result of applying such a kind of swept bandpass filter to a white noise signal is shown in FIG.
[0058]
Similarly, a two-dimensional key can be generated from a two-dimensional white noise pattern. Filters having varying characteristics as described above are applied separately to each dimension. After filtering, the data values are passed through a non-linear function to enforce the condition −1 ≤ K(t, f) ≤ +1. In this embodiment, a sine function is used.
[0059]
An example of the resulting pattern is shown in the power spectrogram of FIG. Here, the axes are "time" and "frequency" (which here means audio frequency). These correspond to the axes of the spectrogram in FIG. 3 to which the key is applied. The origin in the time direction is at the center of the key, while the origin in the frequency direction is at the top. The bar on the right-hand side of FIG. 11 serves the same function as the scale bar in FIG.
[0060]
This embodiment applies band-pass filtering not only along the X-axis but also along the Y-axis, although low-pass filtering would be sufficient in that direction because copies of the key do not overlap there. Using a low-pass filter instead of a band-pass filter can increase the information-carrying capacity of the key.
[0061]
As the value of the constant C in the above equation increases, the keys generated will withstand greater expansion. However, the characteristics of the high-pass and low-pass filters then move closer together, so the bandwidth of the filter becomes narrower. This reduces the information-carrying capacity of the key. There is thus a trade-off between resistance to expansion and information-carrying capacity. In this embodiment, C = 0.15 (pixels per cycle) per pixel, and the key so generated works well enough under expansions of up to about ±6% in both the time and frequency directions. Note that the above definition of C refers to pixels. In this context, the term pixel has a different meaning for horizontal and for vertical filtering of the spectrogram.
[0062]
In the horizontal direction, the term pixel is used to mean the time interval between columns of the spectrogram. When considering vertical filtering, the term pixel is used to mean the frequency difference between two adjacent rows of the spectrogram.
[0063]
Thus, in FIG. 11, horizontal pixels are about 23 milliseconds and vertical pixels are about 22 Hz.
[0064]
Thus, when the relation τ > C|t| is used to set the bandwidth of the low-pass filter, τ is measured in pixels per cycle, and t represents the X or Y coordinate of the point of the spectrogram concerned, measured in pixels as defined above from the reference point, i.e., the origin. In FIG. 11, the reference point is at the center of the upper edge of the image. This reference point has been selected to correspond to zero frequency. Other reference points can be selected, but one at zero frequency is preferred.
[0065]
The peak correlation of a standard key with itself after a range of expansions in both frequency and time is shown in FIG. The numerical values are normalized such that the peak autocorrelation of the key is 1.
[0066]
Computing the two-dimensional correlation between the key and the stegotext reveals that, if the stegotext has been expanded in frequency, the peak of the correlation may shift slightly away from the line y = 0. For this reason, this embodiment uses a two-dimensional correlation: the values of the function at small offsets in the y-direction are added together to form a one-dimensional function, which is passed to the bit synchronizer.
[0067]
Reference is now made to FIG. 23, which shows a flow diagram for generating key K.
[0068]
In step K₁, a seed integer is input, and in step K₂ it is supplied to a Tausworthe generator, which generates uniformly distributed random numbers. In step K₃, the output of the Tausworthe generator is converted into one-dimensional Gaussian-distributed random numbers by the Box-Cox method. For a key in which X = 32 and Y = 1024, 32768 such random numbers are required. The processes performed by the Tausworthe generator and the Box-Cox method are described in detail in the book "Principles of Random Variate Generation" by John Dagpunar, published in the Oxford Science Publications series by Clarendon Press in 1988.
[0069]
In step K₄, the 32768 random numbers are rearranged into a two-dimensional array of 32 × 1024 random numbers.
[0070]
In step K₅, the two-dimensional sweep filtering described above is performed.
[0071]
In step K₆, the data values are passed through a non-linear sine function in order to enforce the condition −1 ≤ K(t, f) ≤ +1.
[0072]
Finally, in step K₉, the key is either used directly by the encoder or decoder or stored in a suitable readable memory.
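The key generation flow can be sketched end to end as follows. Several substitutions are made here and should be read as assumptions: NumPy's seeded generator stands in for the Tausworthe generator and the Box-Cox step, the helper `sweep_smooth` is a simplified low-pass-only stand-in for the sweep filtering (no high-pass part), and the names `generate_key` and `sweep_smooth` are hypothetical.

```python
import numpy as np

def sweep_smooth(x, origin, c):
    """Gaussian smoothing whose width grows as c * |distance from origin|."""
    idx = np.arange(len(x))
    y = np.array(x, dtype=float)
    for i in idx:
        sigma = c * abs(i - origin)
        if sigma > 1e-9:
            w = np.exp(-0.5 * ((idx - i) / sigma) ** 2)
            y[i] = np.dot(w, x) / w.sum()
    return y

def generate_key(seed, x_size=32, y_size=1024, c=0.15):
    rng = np.random.default_rng(seed)               # K1, K2: seeded generator
    noise = rng.standard_normal(x_size * y_size)    # K3: Gaussian random numbers
    grid = noise.reshape(x_size, y_size)            # K4: two-dimensional array
    # K5: filter each dimension separately; the time origin is at the centre
    # of the key and the frequency origin is at index 0.
    grid = np.apply_along_axis(sweep_smooth, 0, grid, x_size // 2, c)
    grid = np.apply_along_axis(sweep_smooth, 1, grid, 0, c)
    return np.sin(grid)                             # K6: enforce -1 <= K <= 1
```

Because every step after seeding is deterministic, the same seed integer always reproduces the same key, which is what allows an encoder and a decoder to share a key by sharing only the seed.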
[0073]
All of the above processes can be performed by a suitably programmed computer, as shown at 300, and the result can be stored on a recording medium 301, which may be a CD, ROM, DVD, disk, tape, or any other suitable storage medium.
[0074]
Having described the basic steps and principles of cover text encoding according to the present invention, FIG. 13 shows a block diagram of an encoder.
[0075]
As in the above figures, (10) represents cover text which is music to be encoded in this embodiment, and (15) represents final stego text.
[0076]
In circuit (51), the cover text is converted into a log magnitude spectrogram.
[0077]
The spectrogram generated by the FFT circuit (51) is clocked by a clock (52) into a spectrogram buffer (53). The FFT circuit (51) performs the overlapping segmentation of the input and the windowing described with reference to FIG. The clock (52) ensures that the contents of the spectrogram buffer (53) represent an amount of music, in spectrogram form, equal to the length of the key, which in this embodiment is 256 or 32 columns.
[0078]
The data to be encoded is supplied at (55) to the circuit (54), which adds the synchronization flag and performs zero stuffing as described above.
[0079]
The output of the circuit (54) is supplied to a convolutional encoder (56), which corresponds to the encoder described with reference to FIG. 5 and is supplied at (57) with the required polynomials.
[0080]
The key matrix is supplied at (58) to circuit (59) of the encoder, where it is converted into a set of values by which the spectrogram held in the spectrogram buffer (53) can be multiplied directly. These values take the form of two matrices, one for encoding a zero bit and the other for encoding a one bit. These matrices are the exponential of the key and the reciprocal of the exponential of the key. Multiplying the held spectrogram by these matrices is equivalent to adding or subtracting the key in the log spectrogram domain.
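The equivalence between multiplying by these matrices and adding or subtracting the key in the log domain can be illustrated as follows. This is a sketch only: the function name `key_multipliers`, the random stand-in key, and the strength value 0.05 are assumptions, and no claim is made here about which of the two matrices the embodiment assigns to which bit value.

```python
import numpy as np

def key_multipliers(key, s):
    """Return exp(+s*K) and its reciprocal exp(-s*K)."""
    up = np.exp(s * key)
    return up, 1.0 / up

rng = np.random.default_rng(0)
key = np.sin(rng.standard_normal((4, 6)))        # stand-in key with values in [-1, 1]
spec = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
up, down = key_multipliers(key, s=0.05)

marked_plus = spec * up       # adds s*K to the log magnitude
marked_minus = spec * down    # subtracts s*K from the log magnitude
```

Since both matrices contain only positive real values, the multiplication changes the magnitude of each spectrogram cell and leaves its phase untouched.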
[0081]
The intensity with which the key modulates the contents of the buffer (53) is determined by the input (60). This input corresponds to the real constant s in Equation 2.
[0082]
As shown at (61), one of the two matrices is selected and multiplied with the contents of the spectrogram buffer (53), the selection being made according to the output of the convolutional encoder (56), so that the music held in the buffer (53) is encoded with a single data bit. For each bit written, the contents of the buffer (53) are shifted by one clock period into the IFFT circuit (62), so the main encoding loop is executed once per bit written.
[0083]
The output of the IFFT circuit (62) is applied to a clip prevention buffer (63). This ensures that the data from the circuit (62) does not clip when it is output as a music file. If clipping is likely, an amplitude modulation curve is generated such that the volume of the output is tapered down just enough for clipping to be avoided. Once it is safe to do so, the volume is gradually restored to its normal level.
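One possible form of such an amplitude modulation curve is a piecewise-linear gain that dips only around blocks that would otherwise clip. The sketch below is an assumption about how this could be done, not the circuit (63) itself: the block length, the linear taper, and the name `prevent_clipping` are all illustrative.

```python
import numpy as np

def prevent_clipping(x, block=1024, limit=1.0):
    """Scale x by a piecewise-linear gain curve so that |output| <= limit."""
    n_blocks = -(-len(x) // block)                    # ceiling division
    padded = np.zeros(n_blocks * block)
    padded[:len(x)] = x
    peaks = np.abs(padded).reshape(n_blocks, block).max(axis=1)
    gains = np.minimum(1.0, limit / np.maximum(peaks, 1e-12))
    # The gain at each block boundary is the smaller of the two adjacent block
    # gains, so the interpolated curve never exceeds what any block requires.
    edge = np.minimum(np.r_[gains[0], gains], np.r_[gains, gains[-1]])
    t = np.arange(len(x)) / block
    curve = np.interp(t, np.arange(n_blocks + 1), edge)
    return x * curve
```

Blocks that are far from any loud passage keep a gain of exactly one, while the blocks on either side of a loud passage carry the gradual taper down and back up.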
[0084]
Finally, the output from the clip prevention buffer circuit (63) is output as a stegotext (15).
[0085]
FIG. 13 also includes a scrambler 65. Many possible scramblers can be used; a standard one is described in the CCITT V.32 standard. Including a scrambler is optional, as is including a convolutional encoder.
[0086]
For brevity, the above description assumes the use of a single key. It will of course be appreciated that more than one key may be used, each generated from a different seed integer. Additionally, multiples of the key or keys can be used to watermark the stegotext. In the above embodiment the multiple is "1"; accordingly, wherever a multiple of a key is mentioned, it is to be understood that the multiple may be "1", i.e., that the key remains unchanged apart from its sign.
[0087]
The actual method of using two or more different keys for watermarking a stegotext, or for searching for a watermark code, is closely similar to the embodiments described herein. Thus, if there is more than one key, which key is multiplied into the spectrogram at (61) at any point depends on the data to be encoded. Whenever multiples other than ±1 are used to modulate the spectrogram, one or more bits can be encoded each time. Of course, the same set of multiples is used for decoding.
[0088]
Having described an embodiment of an encoder according to the present invention, we now turn to the problem of decoding a stegotext, which may have been compressed or expanded, in order to recover the encoded data.
[0089]
Given the characteristics of the key used in the encoder of FIG. 13 to modulate the power spectrum of the cover text, it will be appreciated that, when decoding the stegotext to extract the watermark data, the data bits can be identified by correlating the key with the power spectrogram. If the stegotext has not been attacked or otherwise expanded or compressed, there is a clear correlation between the stegotext and the key in those log-domain elements that were modified according to the data.
[0090]
The aforementioned keys can address distortions in the stego text that include ± 6% expansion of the stego text in either the vertical or horizontal direction.
[0091]
Several approaches are possible to handle the case where the stego text has undergone more than the ± 6% expansion allowed by the key.
[0092]
However, embodiments of the present invention employ another approach that does not involve direct correlation, which will now be described in detail. Before describing the actual circuit of the decoder, the principle involved therein will first be basically described.
[0093]
Most demodulators and decoders in the world have been designed to be optimal on the assumption that the desired signal is corrupted by noise that is additive, white, stationary, and Gaussian. Most are one-dimensional in the sense that only one real value is received at any particular time, and some are two-dimensional. In this embodiment, the signal is such that many real values, in the form of an entire spectrogram column, are received at any one time.
[0094]
If, for any individual sample, the marginal probability distribution of the noise value in that sample (i.e., the distribution that applies when the values of the other samples are unknown) is a zero-mean Gaussian (normal) distribution, the noise is called Gaussian noise.
[0095]
When the Fourier spectrum of a noise is considered, if the marginal probability distribution of any individual element of the spectrum is the same as the marginal probability distribution of any other element, the noise is called white noise.
[0096]
If the marginal probability distribution of any time-domain sample is the same as that of any other sample, and the joint probability distribution of any excerpt of the noise of a given length is the same as the joint probability distribution of any other excerpt of that length, the noise is called stationary noise.
[0097]
In almost all cases, all of these assumptions are violated to some extent. In general, this is not important. In the present invention, however, departures from the above assumptions are very important and will therefore be described in detail.
[0098]
Student noise can be considered as an example of non-Gaussian one-dimensional noise. The Student distribution is similar to the Gaussian distribution except that the standard deviation differs for each sample. Specifically, the inverse variance is drawn afresh for each sample from a gamma distribution.
[0099]
FIGS. 14 and 15 show, respectively, an excerpt of one-dimensional white Gaussian noise of variance 1 and an excerpt of one-dimensional white Student noise (with the shape parameter m of the associated gamma distribution equal to 1, scaled so as also to have variance 1). Both have the same variance, but they look quite different. Student-distributed noise has some large spikes, which would occur only very rarely in Gaussian noise.
[0100]
This matters when the noise is impulsive, and in situations where outliers would (under Gaussian assumptions) force conclusions that are far from correct.
[0101]
Another, less commonly encountered, example of non-Gaussian noise is noise that is distributed as the logarithm of the ratio of two quantities, each of which is gamma-distributed. Such noise has many downward spikes but no upward spikes (or vice versa). Such noise has been found to be relevant to audio watermarking problems.
[0102]
Whiteness can be violated in more ways than are immediately apparent. The obvious example is colored noise: noise in which some frequencies are present more strongly than others, but which is nevertheless stationary. Typical examples are pink noise, in which low frequencies predominate; band-limited noise, in which other frequencies have been filtered out; and "1/f" noise, in which the power spectral density is inversely proportional to frequency down to some limit above zero.
[0103]
However, in general, there are other types of non-white noise.
[0104]
One example is one-dimensional non-stationary noise; noise can be non-stationary even when only one dimension is considered. Examples are noise from a resistor placed too close to an intermittent heat source, or noise recorded after a filter that includes a varying capacitance (which causes changes in "color" as well as in amplitude over time).
[0105]
On the other hand, if the signal is multidimensional, the noise in different dimensions may be correlated, even if the signals are stationary with respect to time. Alternatively, noise on one channel or dimension may have a greater amplitude than noise on another. Of course, there is nothing to prevent multidimensional noise from also being non-stationary with respect to time, as is the case, for example, with the log magnitude spectrogram of music.
[0106]
In general, violation of any of these assumptions can matter when attempting to decode a watermark from a stegotext. However, experiments have clearly shown that, where the point of interest is robustness in decoding watermarks of the type involved in the present invention, particularly in music, non-Gaussianity is of lesser importance, whereas general non-whiteness, in the form both of correlation between the signals in different frequency components and of differing amplitudes at different frequencies, is very important. Decoders that take account of non-Gaussianity have the advantage of producing more meaningful numerical values for various intermediate variables in the computation, whereas decoders that do not take account of non-whiteness have reduced performance and waste memory and floating-point operations.
[0107]
The basis for what follows is the concept of a multidimensional Gaussian distribution. Suppose x is a real N-vector (i.e., a column vector with N elements). Let μ be the mean of the distribution of x (also an N-vector).
[0108]
Just as in the one-dimensional case there is a variable σ representing the standard deviation (or s = 1/σ²), so in the N-dimensional case there is a real N×N matrix S, which plays a role similar to, though more complex than, that of s in the one-dimensional case.
[0109]
First, consider the two-dimensional case in which the distribution of x is spherically symmetric (that is, circularly symmetric) about μ. In this case, any slice through the distribution passing through μ will look like a one-dimensional Gaussian distribution with some standard deviation σ. In this case, S is I/σ², where I represents the identity matrix. If the distribution is not circularly symmetric but elliptically symmetric, with the major axis of the ellipse aligned with one of the coordinate axes, then S is a diagonal matrix with two different positive diagonal elements. If the ellipse is not aligned with the coordinate axes, S is a 2×2 symmetric matrix with all entries non-zero and positive diagonal elements. The inverse V of S is known as the covariance matrix of the distribution, and S is called the icov matrix (icov: abbreviation for inverse covariance).
[0110]
S has another important property: it is positive definite, which means that y'Sy is a positive scalar for any non-zero vector y (where ' denotes the transpose; y'Sy is always a scalar, but it is only because S is positive definite that it is guaranteed to be positive whenever y is non-zero).
[0111]
In the case of N dimensions, the same picture holds, except that "circles" become "N-dimensional spheres" and "ellipses" become "N-dimensional ellipsoids".
[0112]
In this case, the formula for the probability density of such a distribution is:
[0113]
(Equation 5)
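For reference, the standard form of the density of an N-dimensional Gaussian distribution with mean μ and icov matrix S is shown below. It is given here as the textbook expression in the notation of this description; the layout of the original equation may differ.

```latex
P(x \mid \mu, S) = \sqrt{\frac{\det S}{(2\pi)^{N}}}\;
  \exp\!\left( -\tfrac{1}{2}\,(x-\mu)'\, S\, (x-\mu) \right)
```

For N = 1 and S = 1/σ² this reduces to the familiar one-dimensional Gaussian density with standard deviation σ.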

[0114]
Suppose now that it is necessary to consider a "random" zero-mean N-dimensional Gaussian distribution. "Random", of course, has no real meaning unless it is stated from which distribution the object is drawn. What is needed is a distribution over zero-mean N-dimensional Gaussian distributions; that is, a distribution over icov matrices or, equivalently, a distribution over positive definite symmetric N×N matrices S.
[0115]
One of the distributions that meets these requirements is the Wishart distribution.
[0116]
The Wishart distribution has two parameters (other than N): k, which is a "tightness" or "shape" parameter, and V, which is a "scale matrix". k must be greater than N − 1; the larger it is, the tighter the distribution is around its mean. V must be positive definite and symmetric. The mean is kV^-1. The density is as follows:
[0117]
(Equation 6)

[0118]
If N is 1, this reduces to a gamma distribution; specifically, if m and r are the shape and scale parameters of the gamma distribution, then V is 2r and k is 2m.
[0119]
As described above, for one-dimensional Student noise, s = 1/σ² is drawn from the gamma distribution for each sample, and the actual sample is then drawn from a one-dimensional Gaussian with zero mean and inverse variance s.
[0120]
One way to generate multidimensional Student noise is therefore to draw a new icov matrix S for each new sample, and then to draw the sample itself from the multidimensional Gaussian distribution with zero mean and icov matrix S. Each new icov matrix is drawn from a Wishart distribution with some appropriate parameters k and V.
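The two-stage sampling just described can be sketched as follows. The sampler below is a simplification that assumes an integer k at least equal to N, so that a Wishart draw can be built as a sum of k outer products; the function names `sample_wishart_icov` and `sample_student` are hypothetical.

```python
import numpy as np

def sample_wishart_icov(rng, k, v):
    """Draw S with mean k * inv(v); integer k >= N assumed for this simple sampler."""
    n = v.shape[0]
    chol = np.linalg.cholesky(np.linalg.inv(v))    # chol @ chol.T = inv(v)
    g = chol @ rng.standard_normal((n, k))         # k columns, each ~ N(0, inv(v))
    return g @ g.T

def sample_student(rng, k, v):
    """Draw a fresh icov matrix S, then one zero-mean Gaussian sample with icov S."""
    s = sample_wishart_icov(rng, k, v)
    lower = np.linalg.cholesky(s)                  # s = lower @ lower.T
    z = rng.standard_normal(v.shape[0])
    return np.linalg.solve(lower.T, z)             # covariance of result is inv(s)
```

Averaging many draws of S recovers the mean kV^-1 quoted above, and because S changes from sample to sample the resulting noise shows occasional large excursions that a single fixed Gaussian would not produce.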
[0121]
In the special case where the scale matrix V of the above Wishart distribution is of the form rI for some positive real number r, this Student distribution is spherically symmetric and is given by:
[0122]
(Equation 7)

[0123]
Thus, the probability density of the distance R of x from the origin can be expressed as:
[0124]
(Equation 8)

[0125]
We can now consider what the log magnitude spectrogram of music actually looks like. It can be modeled as multidimensional Student noise, but a slightly different model gives a better fit. Instead of drawing a new icov matrix for every sample, a new icov matrix is drawn once per bit period or less frequently. Each sample from the resulting quasi-Student distribution is then taken to be one column of the spectrogram. This is called the "detV" distribution for reasons that will become apparent below.
[0126]
The question of what the parameters of the relevant Wishart distribution should be is dealt with later in this specification. Ideally these parameters would be determined from a broad corpus of music, but in practice using only a small number of works probably makes little difference, provided that they include representatives of the types of music likely to be encountered. The remaining uncertainty can be handled by using a Wishart distribution with k not much greater than N.
[0127]
The special case in which the Wishart scale matrix V is a very small multiple of the identity (in practice e^-12 I) is called the "detZ" distribution. In this case, "Z" stands for zero.
[0128]
Having set out the background above, it is now possible to define the characteristics of a decoder that can be used to decode the stegotext generated in accordance with the present invention. Each decoder is designed as an optimal bit-by-bit memoryless decoder for a particular noise model (i.e., model of the cover text or music). Thus, each decoder calculates the Bayesian posterior distribution of the data bit given what was received, and then selects the value (0 or 1) with the highest posterior probability. Note that the term "optimal" has a very specific meaning here: it refers to the decoder that produces the best output, regardless of whether it is the cheapest or the most expensive decoder to run.
[0129]
Obviously, such a decoder is actually optimal only if the model is accurate. Otherwise, the effective capacity of the channel is reduced.
[0130]
Thus, a bit-by-bit memoryless decoder that assumes the "noise" to be stationary, white, and Gaussian can be implemented using a correlation function, which can in turn be computed using an FFT. In practice, when the signal strength is reduced to 0.005, this results in a significant loss of capacity. The loss is mitigated to some extent by discarding all the low frequency components of the key, but it remains significant.
[0131]
The described embodiment of the decoder therefore changes the noise model to one that defines the noise as non-white multidimensional Gaussian noise, constrained to be stationary and to have a known icov matrix.
[0132]
The decoder can be implemented very simply for this noise model. The known icov matrix S is taken and its Cholesky decomposition is computed (that is, its triangular transposed square root C such that C'C = S), and both the log magnitude spectrogram of the received signal and the key are premultiplied by C. This decoder can be referred to as a "whitened Gaussian" decoder because premultiplying by C converts the assumed non-white noise into white noise.
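The whitening step can be checked numerically. The following is a minimal sketch, not the patented implementation: it assumes a hypothetical 2x2 icov matrix S, computes an upper-triangular C with C'C = S, and confirms that noise with covariance inv(S) has identity covariance after premultiplication by C. The helper names (cholesky_upper, matmul, transpose) are illustrative only.

```python
import math

def cholesky_upper(S):
    """Return upper-triangular C with C'C = S for a symmetric positive definite S."""
    n = len(S)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = S[i][j] - sum(C[k][i] * C[k][j] for k in range(i))
            C[i][j] = math.sqrt(s) if i == j else s / C[i][i]
    return C

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical 2x2 icov matrix S and the covariance it implies (its inverse).
S = [[4.0, 1.0], [1.0, 2.0]]
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
cov = [[S[1][1] / det_S, -S[0][1] / det_S],
       [-S[1][0] / det_S, S[0][0] / det_S]]

C = cholesky_upper(S)
# Covariance of C x when x has covariance inv(S) is C * inv(S) * C'.
whitened_cov = matmul(matmul(C, cov), transpose(C))
```

Here whitened_cov comes out as the identity matrix to within rounding error, which is what "whitening" means for this noise model.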
[0133]
This decoder works well when the icov matrix S is exactly the matrix S that models the noise (ie the log magnitude spectrogram of music). To understand why, the noise may be considered as a very elongated ellipsoid in N space. This decoder operates by extending the ellipsoid into a sphere, while simultaneously extending the transmitted signal by the same amount in each direction. As a result, the signal part parallel to the shortest axis of the ellipsoid is significantly extended beyond the noise and is therefore easily recoverable.
[0134]
However, this decoder runs into problems if the S used is not the S of the music whose digital watermark is being decoded. Intuitively, the reason is that instead of the noise being stretched into a sphere, it is stretched into another elongated ellipsoid, so that the absolute magnitude of the noise remains large compared with the magnitude of the signal. At the very least, therefore, this method requires that an appropriate icov matrix be available for each musical work at the time the digital watermark is read, and this is already a considerable disadvantage.
[0135]
The situation is even worse than this, however, because the operation is performed in the log magnitude domain. Consider the distortion caused by adding white Gaussian noise to a music signal in the time domain. If the decoder operated in the time domain, this would only slightly widen all of the narrow dimensions of the ellipsoid. For the same to hold in the log magnitude spectrogram domain, however, the added noise would have to vary its spectral content so as to remain strictly proportional to the spectral content of the music. Since it does not, more complex behavior is to be expected. This behavior involves rotating the ellipsoid into one or more different orientations, and the decoder then fails just as it does when the icov matrix of a different piece of music is used.
[0136]
To overcome these problems, the decoder now to be described uses the "detV" noise model: a multidimensional Gaussian noise distribution whose icov matrix is unknown, is drawn from a Wishart distribution, and is locally constant over a one-bit period. This reflects the fact that neither the original icov matrix of the music being decoded nor the exact effect on it of any distortion introduced in distribution is known, although such a matrix is known to exist.
[0137]
The described decoder is therefore designed to be the optimal decoder for this noise model. Accordingly, the decoder embodiments described below employ a probabilistic approach that calculates the posterior distribution of each data bit given the received signal. To that end, the described decoder uses Bayes' theorem.
[0138]
Suppose, then, that during one bit period a signal is received that consists of M columns of the log magnitude spectrogram, each column being a vector of N elements.
[0139]
The mean of the noise distribution is assumed to be zero. Provided that the actually observed mean of the entire spectrogram is subtracted, this is very nearly equivalent to assuming that the mean is unknown and, a priori, equally likely to take any value.
[0140]
Let K be the value of the key in the log magnitude spectrogram as actually used; that is, it is assumed to have already been multiplied by the strength used (and by whatever factor allows for the use of 10 rather than e as the base of the logarithm). If the key consists of M columns, each of height N, then K is an N x M matrix. Let b be the bit value in question. For simplicity it is assumed that there are no overlapping bits, since it has been found that allowing for them makes little difference to the decoder. For convenience, let b be +1 or -1 instead of 1 or 0. Let X be the matrix consisting of M columns of the log magnitude spectrogram of the music to which the digital watermark is applied (the cover text). Let Y be the received log magnitude spectrogram sequence in the stegotext corresponding to X. Here it is assumed that the timing is known.
[0141]
What is to be found is P(b | Y). Equivalently, it suffices to know the ratio P(b = +1 | Y) / P(b = -1 | Y) (or its logarithm): if this is greater than 1 (respectively, positive), the bit is decoded as b = +1, and otherwise as b = -1.
[0142]
What is the starting information? It can be summarized in the following equations:
[0143]
(Equation 9)

[0144]
(Equation 10)

[0145]
(Equation 11)

[0146]
(Equation 12)

[0147]
where X_m is the mth column of X, and k and V are the parameters of the associated Wishart distribution.
[0148]
The decoder then applies the Bayes theorem as shown in the following equation to find the required posterior probabilities of the continuously received data bits.
[0149]
(Equation 13)

[0150]
[Equation 14]

[0151]
(Equation 15)

[0152]
(Equation 16)

[0153]
[Equation 17]

[0154]
(Equation 18)

[0155]
[Equation 19]

[0156]
(Equation 20)

[0157]
Equation 13 above follows from Equation 12 by invoking Equation 7. Equation 14 follows by invoking Equation 8.
[0158]
Equation 15 is derived from the preceding equation using the basic result of probability theory that P(A) = ∫ P(A, B) dB, that is, that the probability of a union of mutually exclusive events is the sum or integral of the individual probabilities, together with the definition of conditional probability, P(A | B) = P(A, B) / P(B).
[0159]
Equation 16 is obtained from Equation 15 using Equation 9. Equation 17 is derived from Equation 16 using Equation 10. Finally, Equation 18 is obtained by collecting factors and simplifying.
[0160]
The integration in Equation 18 is exceptionally complex, but it results in Equation 19, from which Equation 20 follows. Specifically, Equation 20 is obtained by dividing the version of Equation 19 with b = +1 by the version with b = -1.
[0161]
(Equation 21)

[0162]
(Equation 22)

[0163]
Since detV decoding is at the heart of the best algorithm, the best way to evaluate the right-hand side of this last equation will now be discussed.
[0164]
If W is the Cholesky decomposition of V, then W′W = V. If U = W^-1, then U′VU = I. Therefore,
[0165]
[Equation 23]

[0166]
(Equation 24)

[0167]
(Equation 25)

[0168]
Singular value decomposition (SVD) is applied to the contents of the innermost parentheses. SVD allows any matrix to be written as the product of an orthogonal matrix, a diagonal matrix, and another orthogonal matrix. SVD must be used twice, once for the parentheses in the numerator and once for those in the denominator. The procedure is shown here for the numerator only.
[0169]
Thus, U′Y − U′K is a matrix, and SVD allows L, D and R to be calculated such that LDR = U′Y − U′K (Equation 24), where L and R are orthogonal matrices and D is a diagonal matrix. The next seven equations follow in turn from Equation 24.
[0170]
(Equation 26)

[0171]
In these equations, R′R = I holds because R is orthogonal (and the same applies to L). Since L is orthogonal and real, det(L) = det(L′) = ±1. Thus the last line holds.
[0172]
However, I + DD 'is a diagonal matrix calculated from the diagonal matrix, so its determinant is the product of the elements on the diagonal, each of which can be easily evaluated. Therefore, it is only necessary to know D, not L or R.
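The observation that only D is needed can be illustrated with a small numerical check. The sketch below is an illustration under stated assumptions, not the patent's procedure: it builds a hypothetical 2x3 matrix T = LDR from known orthogonal factors (plain rotations) and a known diagonal D, then compares det(I + TT′) computed directly with the product of (1 + d²) over the diagonal entries d of D.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rotation2(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def rotation3_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

d = [3.0, 0.5]                                   # hypothetical singular values
L = rotation2(0.7)                               # orthogonal 2x2
D = [[d[0], 0.0, 0.0], [0.0, d[1], 0.0]]         # 2x3 "diagonal" matrix
R = rotation3_z(-1.1)                            # orthogonal 3x3
T = matmul(matmul(L, D), R)

TTt = matmul(T, transpose(T))
G = [[TTt[i][j] + (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]
det_direct = G[0][0] * G[1][1] - G[0][1] * G[1][0]   # det(I + T T') computed directly
det_from_d = math.prod(1.0 + x * x for x in d)       # product over the diagonal of I + D D'
```

Both values agree (12.5 for these numbers), whatever rotation angles are chosen for L and R.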
[0173]
Having discussed the basic principles of the decoding procedure, an embodiment of the decoder will now be described with reference to FIG.
[0174]
Before discussing the actual decoder of FIG. 16, it is important to be able to relate the various stages within the decoder to the above theoretical discussion.
[0175]
Thus, it will be appreciated that Y represents the spectrogram block received by the decoder and is equal to X + bK, where b is the code bit and K is the key. Hence, if the values of b and K are known, the probability that the received Y takes a particular value is exactly the probability that Y - bK represents the original cover text. This is expressed by Equation 8.
[0176]
Turning now to Equation 9, this represents the probability that X is a spectrogram of music that has not been watermarked. In this equation, S is an unknown icov matrix. If the music were to be represented as white Gaussian noise, S in Equation 9 would always be a multiple of the identity matrix. However, as mentioned above, this does not accurately represent actual music. This is the reason for using the icov matrix in the present invention.
[0177]
It is therefore assumed that the columns of X, though not those of the entire musical work, are each drawn independently from the same multidimensional Gaussian distribution defined by this unknown icov matrix S. The decoder thus operates on the assumption that S may differ between different parts of the music. Equation 9 expresses this mathematically.
[0178]
However, in order to decode a digitally watermarked spectrogram, it is essential to know what value X may take. Equation 9 can provide this information only if the value of S is known.
[0179]
Thus, the function of Equation 10 is to determine what values S may take. In this equation, it is assumed that S is distributed according to a Wishart distribution with parameters V and k. The manner in which these parameters are selected will be described later herein.
[0180]
Having discussed the relevant basic principles, a first embodiment of the decoder will now be described with reference to FIG. 16 of the accompanying drawings.
[0181]
In FIG. 16, the music-based stegotext is again represented by (15). The following description refers only to music for simplicity, but of course other forms of cover text can be treated as well.
[0182]
At (100), the stegotext is multiplied point by point by a window function comparable to the window function shown at (51, 52) of FIG. 6; at (101) the fast Fourier transform of the windowed stegotext is taken; and at (102) the result is converted to the logarithmic domain.
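Stages (100) to (102) can be sketched for a single frame as follows. This is a simplified illustration only: the Hann window is an assumption (the text refers only to "a window function"), a direct discrete Fourier transform stands in for the FFT, and the function name log_magnitude_column and the small floor inside the logarithm are hypothetical choices.

```python
import cmath
import math

def log_magnitude_column(frame):
    """Window one frame, take its DFT, and return the log magnitude of each bin."""
    n = len(frame)
    # Hann window: an assumption; the source only refers to "a window function".
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    column = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # abs() is the rectangular-to-polar step; the floor avoids log(0).
        column.append(math.log(abs(acc) + 1e-12))
    return column

n = 64
frame = [math.sin(2.0 * math.pi * 8 * t / n) for t in range(n)]  # test tone in bin 8
column = log_magnitude_column(frame)
peak_bin = max(range(len(column)), key=lambda k: column[k])
```

Repeating this for successive frames and stacking the resulting columns side by side gives the log magnitude spectrogram F that is passed to (103).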
[0183]
At (103), the output of (102) is premultiplied by the previously generated matrix U′, which is the matrix U′ appearing in Equations 22 to 25. The generation of U′ will be described later. If the output of (102) is denoted F, the output of (103) is U′ * F, which is applied to the high-pass filter (104) to remove any DC offset. The output of (104) is a log magnitude spectrogram of a length corresponding to the original stegotext.
[0184]
To process this spectrogram, it is partitioned into blocks, each having a width corresponding to that of K. In the present embodiment and the second embodiment, each block has a width of 32 columns in the time dimension and a height of 1024 rows in the frequency dimension. These values can, of course, be changed. Each partitioned block represents U′Y as it appears in Equations 23, 24 and 25.
[0185]
This partitioning is performed at (105) such that each partitioned block overlaps the preceding block by one column less than the block width, corresponding to the situation shown in FIG. 4 of the accompanying drawings.
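The partitioning can be pictured as a window sliding along the time axis. The sketch below assumes that successive blocks are offset by exactly one column, so that they overlap by the block width minus one; the function name sliding_blocks and the toy dimensions are illustrative only.

```python
def sliding_blocks(spectrogram, width):
    """Yield (start_column, block) for every block of `width` consecutive columns.

    `spectrogram` is a list of rows, each row a list of per-column values.
    Successive blocks are offset by one column, so they overlap by width - 1.
    """
    n_cols = len(spectrogram[0])
    for start in range(n_cols - width + 1):
        yield start, [row[start:start + width] for row in spectrogram]

# Toy spectrogram: 3 frequency rows by 6 time columns
# (the embodiment described in the text uses 1024 rows and 32-column blocks).
spec = [[10 * r + c for c in range(6)] for r in range(3)]
blocks = list(sliding_blocks(spec, width=4))
```

Each block produced this way plays the role of U′Y for one candidate bit position.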
[0186]
Each block obtained as described above has U′K added to it at (106) and subtracted from it at (107), so that two different results are obtained at (108) and (109): (1) on the left-hand side, X₋₁ = U′Y + U′K, and (2) on the right-hand side, X₊₁ = U′Y − U′K.
[0187]
It will now be appreciated that these values correspond to the values found in equations 23 and 24.
[0188]
The next stage of the decoder involves calculating the logarithm of each of the denominator and numerator of Equation 23. This is achieved using the expression shown in Equation 24. In steps (110, 111), the log determinants of the denominator and the numerator of Equation 23 are determined, and in steps (112, 113) these log determinants are each scaled by -((k + M) / 2). This factor is, of course, the one appearing in the above equations. Its derivation will be described later.
[0189]
In step (114), the logarithm of the quantity in Equation 23 is calculated by subtracting the scaled log determinant of the denominator from that of the numerator, these being the scaled values obtained at (112) and (113).
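Steps (106) to (114) can be sketched numerically as below. The equations themselves are not reproduced in this text, so the sketch rests on an assumption drawn from the surrounding description: that the log ratio is the difference of the two log determinants of I + XX′, each scaled by -((k + M) / 2), with X = U′Y − U′K for b = +1 and X = U′Y + U′K for b = -1. All names and the toy matrices are hypothetical.

```python
import math

def log_det_i_plus_ttt(T):
    """log det(I + T T') via a Cholesky factorisation of the positive definite I + T T'."""
    n = len(T)
    G = [[sum(a * b for a, b in zip(T[i], T[j])) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    C = [[0.0] * n for _ in range(n)]  # upper triangular, C'C = G
    for i in range(n):
        for j in range(i, n):
            s = G[i][j] - sum(C[k][i] * C[k][j] for k in range(i))
            C[i][j] = math.sqrt(s) if i == j else s / C[i][i]
    return 2.0 * sum(math.log(C[i][i]) for i in range(n))

def log_posterior_ratio(UY, UK, k):
    """Assumed form of log(P(b=+1|Y) / P(b=-1|Y)) for a whitened block U'Y and key U'K."""
    M = len(UY[0])  # columns per block
    x_plus = [[y - q for y, q in zip(ry, rk)] for ry, rk in zip(UY, UK)]   # hypothesis b = +1
    x_minus = [[y + q for y, q in zip(ry, rk)] for ry, rk in zip(UY, UK)]  # hypothesis b = -1
    scale = -(k + M) / 2.0
    return scale * log_det_i_plus_ttt(x_plus) - scale * log_det_i_plus_ttt(x_minus)

UK = [[0.5, -0.5, 0.5], [-0.5, 0.5, 0.5]]      # hypothetical 2x3 whitened key
noise = [[0.1, 0.0, -0.1], [0.0, 0.1, 0.05]]   # hypothetical whitened cover text
UY_plus = [[x + q for x, q in zip(rn, rk)] for rn, rk in zip(noise, UK)]   # bit +1 embedded
UY_minus = [[x - q for x, q in zip(rn, rk)] for rn, rk in zip(noise, UK)]  # bit -1 embedded

ratio_plus = log_posterior_ratio(UY_plus, UK, k=4)
ratio_minus = log_posterior_ratio(UY_minus, UK, k=4)
```

With these toy values ratio_plus is positive and ratio_minus is negative, so the sign of the value written to the buffer (115) indicates the more probable bit.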
[0190]
The value thus obtained is stored in the buffer (115) as the log ratio of the probabilities that a data bit of +1 or a data bit of -1 is present. The buffer (115) can thus be regarded as holding the posterior probabilities of a sequence of code bits, the length of the sequence being determined by the size of the buffer.
[0191]
Considering the sequence of individual entries in the buffer (115), these entries consist of values distributed about zero, each value being the result of the matrix processing described above. The buffer (115) can hold 256 values, although this number can of course be varied. FIG. 17 is a graph showing a sequence of values stored in the buffer (115).
[0192]
The values in the buffer (115) are represented by the black curve (150). The vertical solid line 151 represents the time at which the bit was encoded. These times are determined by the clock extraction circuit (116) in the manner described below.
[0193]
It is now necessary to extract from the buffer (115) the values representing the original code. It should again be understood that, since the original stegotext may have been expanded or compressed, the extraction of the bit rate of the code must take this into account.
[0194]
This is a procedure executed by the clock extraction circuit (116). Here, all possible sequences of slice points are considered, and the sequence of points with the largest total deviation from zero is selected as the clock for the embedded code.
[0195]
In effect, a pair of nested loops iterates over the possible bit clock frequencies and phase offsets. At each iteration, the sum of squares of the sliced values is calculated. The frequency and phase values that maximize this sum are passed to step (117) as a set of indicators representing the clock.
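The nested search can be sketched as follows. This is a minimal illustration, assuming that the clock is parameterised by a period in columns and an integer phase offset, and that slice points are rounded to the nearest buffer index; the function name extract_clock and the synthetic buffer are hypothetical.

```python
def extract_clock(values, periods, max_phase):
    """Return the (period, phase) whose slice points have the largest sum of squares."""
    best = None
    best_score = float("-inf")
    for period in periods:                 # outer loop: candidate bit-clock periods
        for phase in range(max_phase):     # inner loop: candidate phase offsets
            score = 0.0
            n = 0
            while True:
                idx = int(phase + n * period + 0.5)  # nearest buffer index
                if idx >= len(values):
                    break
                score += values[idx] ** 2
                n += 1
            if score > best_score:
                best_score, best = score, (period, phase)
    return best

# Synthetic buffer: strong values every 5 entries starting at index 2, weak elsewhere.
buffer_values = [0.1] * 100
for i, idx in enumerate(range(2, 100, 5)):
    buffer_values[idx] = 3.0 if i % 2 == 0 else -3.0

clock = extract_clock(buffer_values, periods=[4.5, 4.75, 5.0, 5.25, 5.5], max_phase=5)
```

Because shorter periods yield more slice points, comparing raw sums is only reasonable over a narrow range of candidate periods, as in this example.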
[0196]
In step (117), two pointers are used to indicate where the first and last values are expected to be found in buffer (115). These are processed so that the data is sliced from block to block and the extracted bits can be combined together without gaps or repetitions. The posterior probability vector is shifted at the same rate as the data in the spectrogram buffer (105).
[0197]
The clock generated by the clock extraction circuit (116) is used at (117) to slice the data to be read from the buffer (115) into the buffer stage (118). It will be recalled that the original key was added to 32 columns of the log magnitude spectrogram of the cover text at intervals of 5 columns. Thus, even if the stegotext has been compressed or expanded prior to decoding, code bits are expected at approximately every fifth column of the sequence of values stored in the buffer. However, the result of slicing the buffered data according to the extracted clock is still not a sufficiently accurate representation of the original code, because the decoder assumes that the music has a detV distribution. As mentioned earlier, this is not entirely true, so some of the points at five-column intervals may be incorrect. Allowance must therefore be made for the fact that the original music does not have a detV distribution.
[0198]
First consider the values passed to step (118). These consist of a sequence of values purporting to be log(P(b_n = 1 | Y_n) / P(b_n = 0 | Y_n)).
[0199]
This would be the required sequence if the music actually had a detV distribution. As already explained, however, that is not the case.
[0200]
If C₁, C₂, ..., C_n is defined as the sequence of values output from step (117), then, if the music had a detV distribution, C_n = log(P(b_n = 1 | Y_n) / P(b_n = 0 | Y_n)).
[0201]
Since music does not have a detV distribution, it is necessary to find a function f such that f(C_n) is closer to this logarithm than C_n is, so that a more appropriate sequence can be input to the error correction decoding stage (120) of the decoder. It will be recalled that, in the encoder, the watermark code was additionally encoded with an error correction code.
[0202]
The output of the data slicing stage (117) is a series of values on either side of zero, positive values representing "+1" and negative values representing "-1".
[0203]
Each "+1" value is scattered about a value denoted α. Similarly, each "-1" value is scattered about -α, where α is equal to the mean of the absolute values |C_n|.
[0204]
It is also necessary to estimate the amount by which the positive C_n differ from α and the negative C_n differ from -α. Denoting this quantity σ, σ = std(|C_n| - α).
[0205]
Once σ has been obtained, the original values C₁, C₂, ..., C_n are scaled in step (118) so that they have a standard deviation of 1 about +α/σ or -α/σ. This is done by setting a_n = C_n / σ and β = α / σ.
[0206]
Therefore, if h_n = |a_n| - β, then h_n has a mean of zero and a standard deviation of one, but is not necessarily Gaussian.
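The scaling just described can be checked with a few lines of arithmetic. The sketch below assumes the population standard deviation is meant by "std" (the text does not say which estimator is used), and the function name normalise_sliced_values and the sample values are hypothetical.

```python
import statistics

def normalise_sliced_values(c):
    """Return (a, beta, h) with a_n = C_n / sigma, beta = alpha / sigma, h_n = |a_n| - beta."""
    abs_c = [abs(x) for x in c]
    alpha = statistics.fmean(abs_c)        # alpha: mean of |C_n|
    # sigma = std(|C_n| - alpha); subtracting the constant alpha does not change the spread.
    sigma = statistics.pstdev(abs_c)
    a = [x / sigma for x in c]
    beta = alpha / sigma
    h = [abs(x) - beta for x in a]
    return a, beta, h

c = [2.1, -1.8, 2.4, -2.2, 1.5, -2.6, 1.9, 2.3]   # hypothetical sliced values
a, beta, h = normalise_sliced_values(c)
h_mean = statistics.fmean(h)
h_std = statistics.pstdev(h)
```

For any input with non-constant magnitudes, h_mean is zero and h_std is one to within rounding error, as stated above.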
[0207]
In the present embodiment, h_n is assumed to follow a one-dimensional Student distribution of the kind already discussed. Therefore,
[0208]
[Equation 27]

[0209]
As a result,
[Equation 28]

[0210]
From Equation 26, the required result follows:
(Equation 29)

[0211]
The derivation of the values r and m will now be described. In practice, a large sample of typical values of h_n is collected and, using Equation 25 as the likelihood of h_n, a MAP estimate is made with an improper uniform prior on m and log(r).
[0212]
Thus, using Equation 28 in step (118), the corrected sequence is calculated. The result is what is called a "likelihood" map, containing the vector of calculated log likelihood ratios, from which the error-correction-encoded code is finally decoded.
[0213]
FIG. 18 shows a log likelihood map derived from the contents of the buffer (115) by the procedure just described. Reference numerals 150 and 151 are used in FIG. 18 in the same meaning as in FIG.
[0214]
The final stage of the decoder in FIG. 16 is a conventional one.
[0215]
The convolutional encoder shown in FIG. 5 produces two output bits for each input code bit.
[0216]
Therefore, for every two bits present at the output of the circuit (118), a determination must be made as to which bits are part of the desired code.
[0217]
To perform this function, the simplest form of convolutional decoder (120) examines each possible output bit and, for each such bit, considers all possible values of the surrounding bits in a fixed window. This procedure is performed by the phase search circuit (119). The size of the window is determined by a trade-off between performance and computational complexity. For example, for a window containing 10 values in the buffer, a total of 1024 sequences must be evaluated.
[0218]
For each of the 1024 sequences, the probability of the value in the buffer over the window is calculated by adding or subtracting the associated value in the buffer depending on whether the associated bit is +1 or -1.
[0219]
The probabilities of all 512 sequences with 1 at the position under consideration are added together, as are those of the other 512 sequences with 0 at that position. This gives the probability that the bit under consideration is 1 or 0.
[0220]
This procedure is shown in FIG. In this figure, (250) is a schematic representation of the values sliced from the circuit (118). win_i represents a 10-value window, and 7 represents the pixel under consideration. win_{i+2} represents the next window in this sequence, and 8 represents the next pixel under consideration. Finally, V_i represents the result of the evaluation just performed for the pixel 7, and V_{i+2} represents the result of the following evaluation.
[0221]
As shown in FIG. 19, since the convolutional encoder of FIG. 5 produces two output bits for each code bit, the window is stepped at two-value intervals along the contents of the buffer. This procedure must be performed twice, once over the even-numbered values in the buffer and once over the odd-numbered values. In this way two sequences are generated, each having an associated probability, and the sequence with the higher probability is finally selected.
[0222]
What has been described is the simplest form of encoder / decoder.
[0223]
It may, however, be advantageous to use a ratio other than 2:1.
[0224]
For example, if the ratio is 4:1, it would be necessary to use four sequences, with the window stepped uniformly at intervals of four values in the buffer, and to select the most probable output bits from these sequences.
[0225]
Similarly, it will be apparent to those skilled in the art that there are other ways in which the code can be decoded, such as a Viterbi decoder.
[0226]
A polynomial corresponding to the polynomial used in the encoding stage is supplied at (120′) to the maximum likelihood decoder (120), after which the synchronization bits added last and the zeros added during zero padding are removed at (120), leaving the decoded data. The descrambler (122) is needed only if a scrambler (65) of FIG. 13 was used in the encoding process.
[0227]
FIG. 20 of the accompanying drawings shows another embodiment of the decoder. It can be seen that the decoder of FIG. 20 has most of its integers in common with the decoder of FIG. The same reference numerals are therefore used where these common integers appear.
[0228]
The basis of the operation of the decoder in FIG. 21 is the concept of a projection map in a vector space.
[0229]
In the drawing, FIG. 21 shows an orthogonal projection map f from a two-dimensional vector space to a one-dimensional subspace.
[0230]
In this figure, a set of random points v₁, v₂, v₃, v₄ and v₅ is mapped by the function f onto a single line, namely the line denoted by L.
[0231]
Formally, let V be a real N-dimensional vector space with a dot product written v₁ · v₂. A projection map on V is a function f: V → V which satisfies the following: for any v, v₁, v₂ ∈ V and any real number r,
r f(v) = f(r v)
f(v₁ + v₂) = f(v₁) + f(v₂)
f(f(v)) = f(v).
If, in addition, for any v₁, v₂ ∈ V,
f(v₁) = 0 → v₁ · f(v₂) = 0,
then f is said to be an orthogonal projection map.
[0232]
A subspace W of V is a subset of V such that r w₁ + w₂ ∈ W for any w₁, w₂ ∈ W and any real number r. For each subspace W of V there is exactly one orthogonal projection map f_W onto W.
[0233]
For example, N may be 2, V may be the set of all real 2-vectors, and W may be the set of all 2-vectors whose first element is twice the second element. In FIG. 19, W is represented by the line L, and the random points are joined by dotted lines to their images in W under f_W.
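This two-dimensional example is small enough to work through numerically. The sketch below uses the standard formula B = w w′ / (w′ w) for orthogonal projection onto the line spanned by a single vector w; the function names and the test vectors are illustrative only.

```python
def projection_matrix(w):
    """Matrix B of the orthogonal projection onto the line spanned by the vector w."""
    norm_sq = sum(x * x for x in w)
    return [[wi * wj / norm_sq for wj in w] for wi in w]

def apply(B, v):
    """Return the matrix-vector product B v."""
    return [sum(b * x for b, x in zip(row, v)) for row in B]

w = [2.0, 1.0]               # first element is twice the second, as in the example above
B = projection_matrix(w)
v1 = [3.0, -1.0]             # an arbitrary point
v2 = [-1.0, 2.0]             # a point orthogonal to w
image = apply(B, v1)         # f_W(v1), which lies on the line W
twice = apply(B, image)      # f_W(f_W(v1))
killed = apply(B, v2)        # f_W(v2)
```

Here image is (2, 1), applying the map a second time leaves it unchanged, and v2 is mapped to zero while being orthogonal to image, in line with the defining properties listed above.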
[0234]
It will be appreciated that all of the described embodiments of the present invention, such as the encoder of FIG. 13 and the decoder of FIG. 16, operate by manipulating matrices of values in the log magnitude spectrogram, so that any tool which reduces the computational requirements while retaining the information needed, especially when extracting the code from the stegotext, is of considerable value. The aforementioned concept of a projection map in a vector space provides such a tool. Thus, if A is a matrix whose column space is W, there is a matrix B such that f_W(v) = Bv.
[0235]
The following is an example of a method of calculating a required projection matrix using the Matlab (RTM) programming language.
[0236]
If A is a matrix whose column space is W, then a matrix B such that f_W(v) = Bv can be calculated by the following Matlab statements.
[0237]

[L, D, R] = svd(A);   % assumed first statement (missing from this text): SVD of A
% Replace the non-zero elements on the diagonal of D with 1,
% and pad with zeros to make the result the same size as L.
d = diag(D);
dnnz = sum(d > 1e-10 * d(1));
D1 = diag([ones(dnnz, 1); zeros(size(L, 1) - dnnz, 1)]);
% B operates by calculating the components of any v in V in the basis L,
% setting the components orthogonal to W to zero,
% and returning to the Cartesian coordinate system.
B = L * D1 * L';
To obtain B₀ instead, replace the last of these statements with
B0 = L(:, 1:dnnz)';
[0238]
Thus B₀ implements the projection map f_W and, except in the trivial case where W = V, also changes to a Cartesian coordinate system for W, which has fewer dimensions than the Cartesian coordinate system for V.
[0239]
In discussing the decoder of FIG. 16 and the equations associated with its operation, it will be recalled that U′Y is a representation of the log magnitude spectrogram of a segment of stegotext, corrected by previously calculated statistics of samples similar to the cover text. If W denotes the subspace spanned by the columns of U′Y and the received data is orthogonally projected onto W, as discussed in the simple example of FIG. 21, the calculation needed to extract the watermark code can be greatly simplified, provided that the projection of Y onto the new subspace does not cause excessive loss of information.
[0240]
It has been found that performing such projections does not significantly impair the robustness of the stegotext against attacks.
[0241]
Thus, considering Equation 25 again, with the projection B₀ applied it is now det(I + (B₀U′Y − B₀U′K)(B₀U′Y − B₀U′K)′) that needs to be evaluated, where B₀ is the matrix associated with f_W, as discussed previously with reference to FIG.
[0242]
To perform this evaluation, B₀U′ and B₀U′K can be precalculated and stored in ROM.
[0243]
The steps then to be performed are as follows. T = (B₀U′)Y − (B₀U′K) is evaluated, I + TT′ is evaluated, and a Cholesky decomposition is performed to give C such that C′C = I + TT′, so that det(I + (B₀U′Y − B₀U′K)(B₀U′Y − B₀U′K)′) = det(C)². Since C is triangular, det(C)² is easy to evaluate.
[0244]
Given the above, the differences between the decoder of FIG. 20 and the decoder of FIG. 16 can now be discussed.
[0245]
In FIG. 20, the dashed box (124) represents a controllable resampler circuit that is present in only one variant of the embodiment, described below. Likewise, the dotted connection lines connecting the box (124) and the circuit (116) exist only in the variants described below. As before, K represents the key originally used to encode the cover text. As mentioned above, this key was generated from random white noise, itself generated from a random integer seed, which was then filtered by a two-dimensional swept bandpass filter.
[0246]
The matrix multiplier (201) multiplies the key K held in the ROM (202) by the previously defined statistical data U′ to generate U′K. The data U′ is stored in the ROM (203). From the output of multiplier (201) the projection matrix B₀ is generated at (204), and this is then multiplied by the output of multiplier (201) in multiplier (205) to generate B₀U′K. The B₀ output by (204) is also supplied to one input of the matrix multiplier (206). Since U′ is supplied to the other input, the output of the multiplier (206) is B₀U′.
[0247]
From this point, the decoder of FIG. 20 operates similarly to the decoder of FIG. 16, and the elements of the decoder of FIG. 20 that operate in the same manner as elements of the decoder of FIG. 16 are given the same reference numerals.
[0248]
Thus, in the multiplier (103), each block of Y as already defined is multiplied by B₀U′. Similarly, at (106, 107), the value B₀U′K is added and subtracted, respectively, instead of U′K.
[0249]
The other elements of the second embodiment of the decoder operate exactly the same as the elements of the first embodiment.
[0250]
It is now necessary to describe the procedure for generating the prior statistics required for the Bayesian processing of the two embodiments of the decoder described above. This will be described with reference to the flowchart of FIG.
[0251]
In step S1 of this figure, a plurality of music samples are concatenated. The music samples can be selected from a wide range of music and can be created, for example, by playing excerpts from an appropriate number of CDs. The CD excerpts can be mixed with, or replaced by, tape, live, or broadcast music.
[0252]
The end result is a body of music excerpts long enough to cover a wide range of music. The selection of excerpts can, of course, also be biased toward particular types of music, so that statistical data sets based on different types of music are available and the user can select an appropriate data set for decoding. In this way, several different sets of statistical data can be stored in the ROM (203) and selected by the user as appropriate.
[0253]
In step S2, a logarithmic power spectrogram of the concatenated music sample is generated. In step S3, the average value of the columns of the power spectrogram thus obtained is calculated, and in step S4, the average value is subtracted from each column to calculate A.
[0254]
The latter two steps can be approximated by having the rows of the spectrogram pass through a high pass filter with appropriate characteristics.
[0255]
In step S5, a covariance matrix of the values obtained in step S4 is generated, so that E = AA '/ N.
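Steps S3 to S5 amount to a few lines of arithmetic. The sketch below is illustrative only: it assumes that the N in E = AA'/N denotes the number of spectrogram columns, and the function name column_covariance and the toy spectrogram are hypothetical.

```python
def column_covariance(spectrogram):
    """Subtract the mean column (steps S3, S4) and return A and E = A A' / n_cols (step S5)."""
    n_rows, n_cols = len(spectrogram), len(spectrogram[0])
    means = [sum(row) / n_cols for row in spectrogram]   # mean column: one value per row
    A = [[x - m for x in row] for row, m in zip(spectrogram, means)]
    E = [[sum(a * b for a, b in zip(A[i], A[j])) / n_cols for j in range(n_rows)]
         for i in range(n_rows)]
    return A, E

# Toy log power spectrogram: 2 frequency rows by 4 time columns.
spec = [[1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0]]
A, E = column_covariance(spec)
```

For this input E is [[1.25, 2.5], [2.5, 5.0]], a symmetric matrix as a covariance must be.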
[0256]
In step S6, it is assumed that each column is drawn independently from a detV distribution of the aforementioned type whose Wishart scale matrix is r * E and whose shape parameter is k. It is further assumed that log r has an improper uniform prior distribution and that k has an improper uniform prior distribution. The MAP (maximum a posteriori) values of r and k are then calculated using Bayes' theorem.
[0257]
In step S7, V is set equal to rE and k is set to the value just found, and U′ is calculated using these values. U′ is therefore a matrix that converts the average local covariance matrix of the log power spectrogram of the samples into an identity matrix. Finally, U′ is stored in step S8.
[0258]
Some variations shown in FIG. 20 will now be described.
[0259]
Extracting the MAP values of the timing frequency and phase from the buffer (115) may not give the most accurate performance. In the two decoder variants now described, therefore, the full range of possible timing frequencies ω and phases φ is taken into account. For each pair of ω and φ, the absolute values of the values sliced from the buffer (115) on the basis of that ω and φ are summed and the result is exponentiated, giving an estimate of the posterior probability P(ω, φ | D).
[0260]
In the case of the above-mentioned decoder, the values in the buffer (115) are sliced using the values of ω and φ at which P (ω, φ | D) becomes the maximum value.
[0261]
In the alternative variant, the input is first resampled by a resampler circuit (123) to expand or compress the stegotext. A random sample is then drawn from P(ω, φ | D) by a random sampler that takes the place of the clock extraction circuit (116). This procedure allows for the possibility that the MAP value lies on a narrow, high spike in the distribution while the true value actually lies within a broad peak that is lower but contains more probability in total.
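The posterior estimate and the random draw can be sketched together. This is a minimal illustration under stated assumptions: timing is parameterised by a period in columns and an integer phase, the candidate grid is hypothetical, and Python's random.choices with a fixed seed stands in for the random sampler.

```python
import math
import random

def timing_posterior(values, periods, max_phase):
    """Return {(period, phase): probability}, with P proportional to exp(sum of |sliced values|)."""
    log_weights = {}
    for period in periods:
        for phase in range(max_phase):
            total, n = 0.0, 0
            while int(phase + n * period + 0.5) < len(values):
                total += abs(values[int(phase + n * period + 0.5)])
                n += 1
            log_weights[(period, phase)] = total
    peak = max(log_weights.values())
    # Subtracting the peak before exponentiating avoids overflow; it cancels on normalising.
    weights = {key: math.exp(lw - peak) for key, lw in log_weights.items()}
    norm = sum(weights.values())
    return {key: w / norm for key, w in weights.items()}

# Synthetic buffer: larger magnitudes every 5 entries starting at index 2.
buffer_values = [0.1] * 60
for idx in range(2, 60, 5):
    buffer_values[idx] = 1.0

posterior = timing_posterior(buffer_values, periods=[4.75, 5.0, 5.25], max_phase=5)
map_estimate = max(posterior, key=posterior.get)     # the MAP choice of the first variant
rng = random.Random(0)                               # fixed seed so the sketch is repeatable
keys = list(posterior)
sample = rng.choices(keys, weights=[posterior[key] for key in keys], k=1)[0]  # random draw
```

Taking map_estimate corresponds to the variant that slices at the maximum of P(ω, φ | D), while sample corresponds to the variant that draws the timing at random from the same distribution.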
[0262]
Thus, consider the manner in which random sampling of the buffer (115) in the circuit (116) described above is performed in conjunction with the resampler (123).
[0263]
First, if the stegotext has been expanded (or compressed) by less than 6%, it is highly likely, depending on the nature of the key (K), that the circuit (116) will immediately select the appropriate timing for the data slicing circuit (117) and will likewise output an appropriate correction value to the circuit (123) to undo the detected expansion (or compression).
[0264]
On the other hand, when the expansion (or compression) is 6% or more, the values in the buffer (115), and hence the calculated values of P(ω, φ | D), are meaningless and are all of similar size. As a result, a random value is sent to the resampler circuit. Two cases then need to be considered. In the first case, the value fed back happens to lie within ±6% of the proper value. If the total range of possible expansion is ±10%, the probability of this occurring is at least １／. When the newly resampled stegotext reaches the buffer (115), the circuit (116) can determine the correction now needed. From then on, the appropriate values are fed back to the circuit (123) and sent to the circuit (117).
[0265]
In the second case, the value fed back to the circuit (123) does not lie within ±6% of the proper value. The next value to be fed back is then again selected at random, and the above procedure continues. Since the probability of choosing a suitable value at each iteration is １／, only a few iterations are required before the correct timing and phase are found. The resampler circuit can, of course, start by leaving the input stegotext unchanged.
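The feedback loop described in the last two paragraphs can be simulated with a short sketch. It rests on simplifying assumptions that are not in the specification: the residual stretch is modelled as the true stretch minus the correction, a lock is declared whenever that residual is below the 6% capture range, random corrections are uniform over ±10%, and the function name is hypothetical.

```python
import random

def tries_until_lock(true_stretch, rng, capture=0.06, search=0.10,
                     max_tries=1000):
    """Count resampler settings tried before the residual stretch is small
    enough for the clock extraction to lock."""
    correction = 0.0  # first pass: leave the input stegotext unchanged
    for attempt in range(1, max_tries + 1):
        if abs(true_stretch - correction) < capture:
            return attempt
        # Out of range: the buffer carries no usable information, so the
        # next correction is effectively a random value.
        correction = rng.uniform(-search, search)
    raise RuntimeError("no lock within max_tries")

rng = random.Random(1)  # seeded only to make the sketch repeatable

# A 3% stretch is inside the capture range and locks on the first pass.
small = tries_until_lock(0.03, rng)

# An 8% stretch needs random corrections before it locks.
counts = [tries_until_lock(0.08, rng) for _ in range(2000)]
mean_tries = sum(counts) / len(counts)
```

Under these assumptions the number of passes follows a geometric distribution, so the average stays at a few iterations.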
[0266]
Finally, a problem common to all decoders that process exactly reproducible inputs is that in some circumstances a particular input may fail to decode properly and, if nothing about the input varies, the same failure recurs on every repetition. It is therefore proposed to provide decoders, such as those described above in connection with FIGS. 16 and 21 and their variants, with means for avoiding this problem. One option is simply to add truly random noise to the input to the decoder. This has the disadvantage of degrading the performance of the decoder.
[0267]
Another option is to precede the actual input to be decoded with a period of zero signal, or of random noise, whose length is truly random within a predetermined range.
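A minimal sketch of this second option follows, assuming that the period of zero signal or noise is placed in front of the input. The function name and parameters are hypothetical, and Python's `secrets` module, which draws on the operating system's entropy source, is used here as a stand-in for a truly random source.

```python
import secrets

def prepend_random_gap(samples, min_len, max_len, noise_amplitude=0.0):
    """Return the input preceded by a gap of unpredictable length.

    The gap is silence by default, or uniform noise when noise_amplitude > 0.
    """
    gap_len = min_len + secrets.randbelow(max_len - min_len + 1)
    if noise_amplitude == 0.0:
        gap = [0.0] * gap_len
    else:
        rng = secrets.SystemRandom()
        gap = [rng.uniform(-noise_amplitude, noise_amplitude)
               for _ in range(gap_len)]
    return gap + list(samples)

stegotext = [0.2, -0.1, 0.4, 0.3]  # illustrative samples only
padded = prepend_random_gap(stegotext, min_len=16, max_len=64)
```

Because the gap length differs from run to run, the block boundaries seen by the decoder shift, so an input that failed once need not fail in the same way again, and the signal itself is left untouched.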
[0268]
It will be appreciated that in the foregoing specification, various embodiments of the encoder and decoder have been defined in terms of circuit elements such as "filters", "multipliers", "buffers", and "circuits". However, apart from the actual recording or reproduction of the signal, all of these circuit elements can be replaced by appropriate software operations. Thus, in particular, the encoder described with reference to FIG. 13 can be replaced, in all its functional aspects, by a general purpose computer provided with appropriate code. An example of such code was given in connection with the generation of the matrix B₀ used in the decoder. Thus, the functions of all the steps and blocks shown in FIGS. 13, 16 and 20 can be implemented as software steps.
[0269]
In the case of the decoder embodiments, where these are used in self-contained systems that not only decode the stegotext but also generate an output from it, such as music, the decoder may take the form of one or more integrated microprocessors using very large scale integration.
[Brief description of the drawings]
FIG. 1
A block diagram of a system for encoding a covertext signal with additional data to generate stegotext, and for decoding it.
FIG. 2
A block diagram of an encoder and decoder that can be used in the embodiment of FIG. 1 to generate and decode stegotext.
FIG. 3
A graph of the power spectrum of a passage of music.
FIG. 4
A graph showing overlapping modulation patterns when a power spectrogram is modified.
FIG. 5
A block diagram of a convolutional encoder.
FIG. 6
A block diagram of an encoder/decoder shown in more detail than the embodiment of FIG. 2.
FIG. 7
A graph showing a time-stretching attack on stegotext.
FIG. 8
A graph illustrating parameters of a filter used in an embodiment of the present invention of the system of FIG. 1.
FIG. 9
A graph showing filter characteristics of successive keys.
FIG. 10
A graph showing a one-dimensional white noise signal filtered with a swept band-pass filter.
FIG. 11
A graph showing the result of filtering a two-dimensional white noise signal with a swept filter in each direction.
FIG. 12
A graph showing the effect of stretching on correlation.
FIG. 13
A block diagram of one embodiment of an encoder according to the present invention.
FIG. 14
A graph showing different types of noise.
FIG. 15
A graph showing different types of noise.
FIG. 16
A diagram showing how FIGS. 16A and 16B fit together.
FIG. 16A
A diagram illustrating the first half of one embodiment of the decoder of the system shown in FIG. 1.
FIG. 16B
A diagram illustrating the second half of one embodiment of the decoder of the system shown in FIG. 1.
FIG. 17
A graph showing the contents of a buffer of the decoder.
FIG. 18
A graph showing a result of processing the values shown in FIG. 17.
FIG. 19
A graph illustrating the operation of a maximum likelihood convolutional code decoder forming part of the decoders of FIGS. 16 and 20.
FIG. 20
A diagram showing how FIGS. 20A and 20B fit together.
FIG. 20A
A diagram illustrating the first half of the second embodiment of the decoder.
FIG. 20B
A diagram illustrating the second half of the second embodiment of the decoder.
FIG. 21
A graph showing a projection map in a vector space.
FIG. 22
A diagram illustrating generation of the music-related parameters used in the decoders of FIGS. 16 and 20.
FIG. 23
A graph showing key generation.

Claims

An encoder for encoding a covertext signal to generate a stegotext, comprising:
First transforming means for performing a fast Fourier transform and a rectangular-to-polar conversion of the covertext signal, thereby transforming the covertext signal into a logarithmic power spectrogram;
Means for providing at least one key, the or each key being in the form of a two-dimensional pattern of a predetermined size;
A multiplier for adding or subtracting, in the logarithmic power spectrogram domain, a multiple of the key or, if there are multiple keys, multiples of one or more of the keys to or from a block of the transformed covertext signal;
Means for controlling the addition or subtraction of said key(s) by the multiplier according to data representing a desired code; and
Second transforming means for performing a polar-to-rectangular conversion and an inverse fast Fourier transform of the modulated covertext signal, thereby producing a stegotext.
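The chain of operations recited above can be illustrated, for a single spectrogram column, by the following sketch. It is a simplification under stated assumptions: a naive discrete Fourier transform stands in for the fast Fourier transform, there is no segment overlap or tapering, the key is one column rather than a two-dimensional pattern, and the frame, key values and function names are invented for illustration. The key is chosen symmetric (key[k] equals key[N−k]) so that the output remains a real signal.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]

def log_power(spec):
    return [math.log(abs(z) ** 2) for z in spec]

def modulate(frame, key, bit, gain=1.0):
    """Add (bit = +1) or subtract (bit = -1) gain * key in the log power
    domain, keeping the phase of every frequency bin unchanged."""
    polar = [cmath.polar(z) for z in dft(frame)]        # rectangular -> polar
    # Adding g * key to log power multiplies the magnitude by exp(g * key / 2).
    shifted = [(r * math.exp(bit * gain * kv / 2.0), ph)
               for (r, ph), kv in zip(polar, key)]
    new_spec = [cmath.rect(r, ph) for r, ph in shifted]  # polar -> rectangular
    return [z.real for z in idft(new_spec)]

frame = [1.0, 0.5, -0.3, 0.8, -0.6, 0.2, 0.9, -0.4]  # illustrative covertext
key = [0.0, 0.3, -0.2, 0.1, 0.4, 0.1, -0.2, 0.3]     # symmetric key column
stego = modulate(frame, key, bit=+1)

# The change in log power between stegotext and covertext equals the key.
delta = [a - b for a, b in zip(log_power(dft(stego)), log_power(dft(frame)))]
```

Calling `modulate` with `bit=-1` subtracts the same key instead, which is how one data bit selects between the two modulations.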

The encoder of claim 1, wherein the first transforming means is adapted to divide the covertext into overlapping segments before performing the fast Fourier transform and the rectangular-to-polar conversion, and includes a multiplier for multiplying each segment of the covertext by a function such that each segment tapers off at both ends.

The encoder according to claim 1, wherein said first transforming means transforms each segment into the logarithmic power spectrogram domain to generate a block having the same height and the same number of columns as said key.

The encoder according to claim 3, wherein each block of the logarithmic power spectrogram of the covertext signal is X columns wide, and the multiplier is adapted to apply the key to the blocks in steps of T, where T is an integer number of key columns, such that the key is applied, at least in part, approximately 2X/T times to each spectrogram block.

The encoder according to any one of the preceding claims, wherein the second transforming means is adapted to perform a polar-to-rectangular conversion and an inverse fast Fourier transform on each modulated block of the covertext signal, and to combine the resulting segments to generate the stegotext.

The encoder according to any one of the preceding claims, including an error-correcting convolutional encoder for error-correction encoding the watermark code data before that data is used to control the watermarking of the covertext.

A method for encoding a covertext signal to generate a stegotext, comprising:
Performing a fast Fourier transform and a rectangular-to-polar conversion of the covertext signal, thereby transforming the covertext signal into the power spectrogram domain;
Providing at least one key, the or each key being in the form of a two-dimensional pattern of a predetermined size;
Adding or subtracting, in the logarithmic power spectrogram domain, multiples of the key or, if there are multiple keys, multiples of one or more of the keys to or from a segment of the transformed covertext signal;
Controlling the addition or subtraction of multiples of said key(s) in the adding or subtracting step according to data representing a desired code; and
Performing a polar-to-rectangular conversion and an inverse fast Fourier transform of the modulated covertext signal, thereby generating a stegotext.

The method of claim 7, including dividing the covertext into overlapping segments before performing the fast Fourier transform and the rectangular-to-polar conversion, and multiplying each segment of the covertext by a function such that each segment tapers off at both ends.

The method according to claim 7, wherein each segment of the covertext is transformed into the log power domain to generate a block having the same height and the same number of columns as the key, each block of the log power spectrogram of the covertext signal is X columns wide, and the key is applied to the blocks in steps of T, where T is an integer number of key columns, such that the key is applied, at least in part, 2X/T times to each spectrogram block.

The method according to any one of the preceding claims, wherein the conversion of the modulated covertext signal into stegotext is performed by carrying out a polar-to-rectangular conversion and an inverse fast Fourier transform on each modulated block of the covertext signal and combining the resulting segments, each of the segments resulting from the polar-to-rectangular conversion and the inverse fast Fourier transform being multiplied, before the segments are combined to produce the stegotext, by a function such that each segment tapers off at both ends.

Processor-executable instructions for controlling a processor to perform a method according to any one of claims 7 to 9, or a storage medium storing, or an electrical signal carrying, processor-executable instructions for controlling a processor to perform the method.

A signal conveying, in a readable format, a stegotext encoded by a method according to any one of claims 7 to 9, or a storage medium storing a stegotext encoded by a method according to any one of claims 7 to 9.

A decoder for decoding a stegotext generated by modulating the log power spectrogram of a covertext signal with at least one key (K), the or each key having been added to or subtracted from the covertext power spectrogram in the log domain according to the data of the watermark code with which the stegotext was generated, and by returning the modulated power spectrogram to the original domain of the covertext, the decoder comprising:
Transforming means for performing a fast Fourier transform and a rectangular-to-polar conversion of the stegotext signal, thereby transforming the stegotext signal into the logarithmic power spectrogram domain;
Means for providing the key(s) with which the logarithmic power spectrogram of the original covertext signal was encoded;
Calculating means for subtracting, in the log power domain, positive and negative multiples of the key(s) from blocks of the log power spectrogram and for evaluating, according to a predetermined statistical model, the probability that the result of each such subtraction represents an unmodified block of covertext; and
Extraction means for recovering the encoded data from the output of the calculating means.

The decoder according to claim 13, including means for storing the evaluated probabilities as log ratio values, and means for extracting individual time slices from the stored values, wherein the predetermined statistical model assumes that individual time slices of the spectrogram are distributed according to a multidimensional Gaussian distribution whose mean and covariance matrix parameters may or may not change.

The decoder of claim 14, wherein the predetermined statistical model assumes that the mean is constant and predetermined, or that the mean parameter is constant from time slice to time slice but is not predetermined and is drawn from another distribution, or that the mean does not differ between time slices separated by no more than the width of the key and is drawn from some other distribution.

The decoder according to claim 14 or 15, wherein the predetermined statistical model assumes that the covariance matrix parameter is constant and predetermined, or that the covariance matrix parameter is constant from time slice to time slice but is not predetermined and is drawn from another distribution, or that the covariance matrix parameter does not differ between time slices separated by no more than the width of the key and is drawn from some other distribution.

The decoder according to any one of claims 14, 15, or 16, wherein said another distribution is an improper uniform distribution.

The decoder according to claim 16, wherein said another distribution is a Wishart distribution having parameters k, v.

The decoder according to claim 12, wherein the predetermined statistical model is derived by combining a plurality of sample sections of covertext, generating a log power spectrogram of the combined samples, and generating a covariance matrix (U′) from the log power spectrogram, the matrix (U′) converting the mean local covariance matrix of the log power spectrogram of the samples into an identity matrix.

The decoder of claim 19, further comprising:
Means for providing the preceding sample matrix (U′), and means for providing at least one matrix (U′K) that is the result of the matrix multiplication of the preceding sample matrix (U′) with the key(s) (K) with which the power spectrogram of the covertext was encoded;
Means for providing successive segments (Y) of the power spectrogram of the covertext;
Means for multiplying each segment (Y) by said matrix (U′) to generate a matrix (U′Y); and
Means for deriving from the respective matrices (U′Y) and (U′K), using Bayes' theorem, a series of values representing the log ratio of the probabilities that the data bit, if any, is "+1" or "−1".

The decoder of claim 20, further comprising:
First matrix multiplying means for multiplying the power spectrogram of the stegotext in the logarithmic domain by the preceding sample matrix (U′), thereby generating a matrix (U′F);
Means for dividing said matrix (U′F) into blocks (Y) whose length and height correspond to those of said key(s) (K);
Adding and subtracting means for adding the matrix (U′K) to, and subtracting it from, each block (Y), thereby generating (U′Y + U′K) and (U′Y − U′K);
Means for generating a scaled log determinant of each of said matrices (U′Y + U′K) and (U′Y − U′K); and
Subtracting means for subtracting the scaled log determinant of (U′Y + U′K) from that of (U′Y − U′K), thereby generating the series of values representing the log ratio of the probabilities that each data bit of interest is "−1" or "+1".

The decoder according to claim 18, or according to any one of claims 19 to 21 when dependent on claim 18, wherein the means for generating the scaled log determinants includes means for generating the log determinants of (U′Y + U′K) and (U′Y − U′K) and for scaling each log determinant by −[k + M]/2, where k represents the shape parameter of the Wishart distribution and M represents the number of columns in the key, wherein each key (K) is x columns wide and y bits high, and wherein the stegotext is generated by adding K to, or subtracting K from, blocks (Y) of stegotext in steps of T, where T is an integer number of key columns, so that each key can be applied multiple times to each spectrogram block.

The decoder according to claim 13, or according to any one of claims 14 to 22 when dependent on claim 13, further comprising a buffer for storing the log ratio values, and clock extracting means for extracting a clock from the values in the buffer, wherein the clock extracting means uses the regular clock of the input stegotext to generate series of slice points for the values in the buffer and selects the series of points having the maximum overall deviation from zero as the clock for the code embedded in the data in the buffer.

The decoder of claim 23, further comprising:
A controllable resampler adapted to resample the stegotext so that the covertext can be expanded or compressed;
Means for extracting a random sample from the contents of said buffer; and
Means for controlling the resampler in response to the random samples drawn from the buffer until the output of the resampler is at a frequency that allows extraction of the code.

The decoder according to claim 23 or 24, comprising scaling means for scaling the series of output values output from the buffer in response to the operation of the clock extraction means, such that the values have a standard deviation of 1 about +α/σ or −α/σ, where α is the mean of the absolute values of the series of values (C_n) extracted from the buffer, and σ = std(|C_n| − α).
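The scaling recited above can be sketched as follows, on the assumption that it amounts to dividing each value C_n by σ and that std denotes the population standard deviation; the function name and the sample values are hypothetical.

```python
import statistics

def scale_sliced_values(values):
    """Divide each C_n by sigma so that the magnitudes are spread with unit
    standard deviation about alpha / sigma."""
    magnitudes = [abs(c) for c in values]
    alpha = statistics.fmean(magnitudes)
    # std(|C_n| - alpha): subtracting the constant alpha does not change it.
    sigma = statistics.pstdev(magnitudes)
    scaled = [c / sigma for c in values]
    return scaled, alpha, sigma

c_n = [2.1, -1.8, 2.4, 1.9, -2.2, -1.6, 2.0, -2.3]  # illustrative values
scaled, alpha, sigma = scale_sliced_values(c_n)
```

After scaling, positive values cluster about +α/σ and negative values about −α/σ, which puts the slicer output on a known scale for the later stages.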

The decoder according to claim 25, including means for correcting the scaled values C_n according to statistical data obtained from said predetermined statistical model, said correction being performed by applying the following function:

wherein r and m are pre-calculated Student distribution parameters.

The decoder according to claim 19, or according to any one of claims 20 to 26 when dependent on claim 19, wherein the matrix (U′) represents the matrix (B₀U′), where (B₀) is a matrix that transforms the matrix (U′) into rectangular coordinates having fewer dimensions than the original coordinate system of (U′).

A method for decoding a stegotext generated by modulating the log power spectrogram of a covertext signal with at least one key (K), the or each key having been added to or subtracted from the covertext power spectrogram in the log domain according to the data of the watermark code with which the stegotext was generated, and by returning the modulated power spectrogram to the original domain of the covertext, the method comprising:
Performing a fast Fourier transform and a rectangular-to-polar conversion of the stegotext signal, thereby transforming the stegotext signal into the logarithmic power spectrogram domain;
Providing the key(s) with which the log power spectrogram of the original covertext signal was encoded;
Subtracting, in the log power domain, positive and negative multiples of the key(s) from blocks of the log power spectrogram and evaluating, according to a predetermined statistical model, the probability that the result of each such subtraction represents an unmodified block of covertext; and
Recovering the encoded data from the results of said evaluation.

The method of claim 28, including storing the evaluated probabilities as log ratio values and extracting individual time slices from the stored values, wherein the predetermined statistical model assumes that the individual time slices of the spectrogram are distributed according to a multidimensional Gaussian distribution.

The method of claim 29, wherein the predetermined statistical model assumes that the covariance matrix parameters are constant from time slice to time slice but are not predetermined and are drawn from another distribution, or that they do not differ between time slices separated by an amount not exceeding the width of the key and are drawn from some other distribution.

The method of claim 30, wherein said another distribution is a Wishart distribution having parameters k, v.

The method according to any one of claims 28 to 31, wherein the predetermined statistical model is derived by combining a plurality of sample sections of covertext, generating a log power spectrogram of the combined samples, and generating a covariance matrix (U′) from the log power spectrogram, the matrix (U′) transforming the mean local covariance matrix of the log power spectrogram of the samples into an identity matrix.

The method of claim 32, comprising:
Providing the preceding sample matrix (U′);
Providing at least one matrix (U′K) that is the result of the matrix multiplication of the preceding sample matrix (U′) with the key(s) (K) with which the power spectrogram of the covertext was encoded;
Providing successive segments (Y) of the power spectrogram of the covertext;
Multiplying each segment (Y) by said matrix (U′), thereby producing a matrix (U′Y); and
Deriving from the respective matrices (U′Y) and (U′K), using Bayes' theorem, a series of values representing the log ratio of the probabilities that the data bit, if any, is "+1" or "−1".

The method of claim 33, further comprising:
Multiplying the power spectrogram of the stegotext in the logarithmic domain by the preceding sample matrix (U′), thereby generating a matrix (U′F);
Dividing the matrix (U′F) into blocks (Y) whose length and height correspond to those of the key(s) (K);
Adding the matrix (U′K) to, and subtracting it from, each block (Y), thereby generating (U′Y + U′K) and (U′Y − U′K);
Generating a scaled log determinant of each of said matrices (U′Y + U′K) and (U′Y − U′K); and
Subtracting the scaled log determinant of (U′Y + U′K) from that of (U′Y − U′K), thereby generating the series of values representing the log ratio of the probabilities that each data bit of interest is "−1" or "+1".

The method according to claim 31, or according to any one of claims 32 to 34 when dependent on claim 31, wherein the log determinants of (U′Y + U′K) and (U′Y − U′K) are generated and each log determinant is then scaled by −[k + M]/2, where k represents the shape parameter of the Wishart distribution and M represents the number of columns in the key.

The method according to any one of claims 28 to 35, wherein each key (K) is x columns wide and y bits high, and the stegotext is generated by adding each key (K) to, or subtracting it from, blocks (Y) of stegotext in steps of T, where T is an integer number of key columns, such that each key can be applied multiple times to each spectrogram block.

The method according to claim 29, or according to any one of claims 30 to 36 when dependent on claim 29, including storing the log ratio values in a buffer and extracting a clock used to extract individual time slices from the values in the buffer, by generating series of slice points for the values in the buffer and selecting the series of points having the maximum overall deviation from zero as the clock for the code embedded in the data in the buffer.

Controllably resampling the stegotext so that the covertext can be expanded or compressed;
Extracting a random sample from the contents of the buffer; and
Controlling the resampling in response to the random samples drawn from the buffer until the resampled output is at a frequency that allows extraction of the code.

The method described, wherein the series of output values C_n output from the buffer in response to the clock extraction is scaled such that said values have a standard deviation of 1 about +α/σ or −α/σ, where α is the mean of the absolute values of the extracted series of values (C_n), and σ = std(|C_n| − α).

The method of claim 39, wherein the scaled value is corrected according to statistical data obtained from the predetermined statistical model.

The method of claim 40, wherein the correction is performed by applying the following function to the extracted series of values (C_n):

wherein r and m are pre-computed Student distribution parameters.

The method according to claim 32, or according to any one of claims 33 to 41 when dependent on claim 32, wherein the matrix (U′) represents a matrix (B₀U′), where (B₀) is a matrix that transforms the matrix (U′) into rectangular coordinates having fewer dimensions than the original coordinate system of (U′).

A storage medium storing processor-executable instructions for controlling a processor to perform the method of any one of claims 28 to 42.

An electrical signal carrying processor-executable instructions for controlling a processor to perform the method of any one of claims 28 to 42.

A digital watermark key generator for generating a key for digitally watermarking a covertext, comprising: means for generating a two-dimensional noise pattern of a predetermined height and width; and means for filtering the noise pattern at a cutoff frequency that varies with position within the pattern.

The key generator of claim 45, wherein the variation with position is substantially inversely proportional to the magnitude of the coordinate, in one dimension, of the position relative to a reference point in the pattern.

The key generator of claim 46, wherein the reference point is at the center of the dimension, or at one end of another dimension.

The key generator according to any one of claims 43 to 47, wherein the filtering means functions as a low-pass filter, filtering the noise pattern so that it has no frequency components with a time scale shorter than a time scale threshold τ, where τ = C|t|, C being a positive constant in the range 0.05 to 0.4 pixels per cycle per pixel and t being the coordinate in the dimension relative to the reference point, the key generator comprising second means for filtering the white noise signal transversely to the dimension.

The key generator of claim 48, wherein the second filtering means has a cutoff frequency that varies inversely with the magnitude of the coordinate in the second dimension, so that the noise pattern has no frequency components with a time scale shorter than τ = C|t|, where t is the coordinate in the second dimension relative to the reference point, the key generator further including:
A random number generator responsive to a numerical seed input to generate uniformly distributed random numbers;
Conversion means for converting the random numbers so generated into one-dimensional Gaussian-distributed random numbers; and
Means for rearranging the Gaussian-distributed random numbers into a two-dimensional noise pattern, wherein the random number generator is a Tausworthe generator and the conversion means uses the Box-Cox conversion method.

A method for generating a key for digitally watermarking a covertext, comprising generating a two-dimensional noise pattern of a predetermined height and width, and filtering the noise pattern in one dimension at a cutoff frequency that varies with position in the pattern, wherein the variation with position is substantially inversely proportional to the magnitude of the coordinate, in that dimension, of the position relative to a reference point in the pattern, and the filtering acts as a low-pass filter that leaves the noise pattern with no frequency components having a time scale shorter than a time scale threshold τ, where τ = C|t|, C being a positive constant in the range 0.05 to 0.4 pixels per cycle per pixel and t being the coordinate in said dimension relative to the reference point.
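The position-dependent low-pass filtering with threshold τ = C|t| can be illustrated, for one row of a noise pattern, by the sketch below. A moving average whose width grows with |t| is used here as a stand-in for the filter, which is an assumption rather than the filter of the specification; the function names, the value C = 0.2 and the seed are likewise hypothetical.

```python
import random

def position_dependent_lowpass(noise, c, reference):
    """Smooth each sample over a window about tau = c * |t| samples wide,
    where t is the offset of the sample from the reference point."""
    out = []
    for i in range(len(noise)):
        tau = c * abs(i - reference)   # time-scale threshold at this position
        half = int(tau / 2)
        lo = max(0, i - half)
        hi = min(len(noise), i + half + 1)
        window = noise[lo:hi]
        out.append(sum(window) / len(window))
    return out

def roughness(xs):
    """Mean absolute difference between neighbouring samples."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:])) / (len(xs) - 1)

rng = random.Random(7)  # seeded only to make the sketch repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(201)]  # one row of white noise
key_row = position_dependent_lowpass(noise, c=0.2, reference=100)
```

Near the reference point the row keeps its fine detail, while far from it only slow variations survive, so the cutoff frequency falls roughly as 1/|t|.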

A storage medium storing a key generated by the method of claim 50, or an electrical signal conveying a key generated by the method of claim 50.

An encoder for encoding a covertext signal to generate a stegotext, comprising:
Transforming means for performing a fast Fourier transform and a rectangular-to-polar conversion of the covertext signal, thereby transforming the covertext signal into a logarithmic power spectrogram;
Means for providing a key in the form of a two-dimensional noise pattern generated by the method according to claim 50; and
Means for modulating the logarithmic power spectrogram with the key according to a digital watermark code.