JPH11272299A

JPH11272299A - Method for embedding watermark bit during voice encoding

Info

Publication number: JPH11272299A
Application number: JP10115854A
Authority: JP
Inventors: Kineo Matsui; 甲子雄松井; Munetoshi Iwakiri; 宗利岩切
Original assignee: Toyo Communication Equipment Co Ltd
Current assignee: Toyo Communication Equipment Co Ltd
Priority date: 1998-03-23
Filing date: 1998-03-23
Publication date: 1999-10-08
Anticipated expiration: 2018-03-23
Also published as: JP3355521B2

Abstract

PROBLEM TO BE SOLVED: To provide a method for secretly embedding a digital watermark using part of the speech code produced by G.729 CS-ACELP, the ITU-T-recommended digital speech coding method. SOLUTION: A key k_p is introduced that labels the candidates for a pulse position whose information in the speech code has several adjacent candidates. When watermark bits are embedded during synthesis with the multi-pulse excitation codebook used in speech coding, the fourth pulse position m_3 is selected, according to the watermark bit to be embedded, from the candidates labeled by the key k_p. Because the pulse positions are controlled by a random bit sequence generated from text, watermark information of up to 200 bit/s can be embedded and transmitted secretly: even if the transmitted code carrying the watermark is played back directly as speech, it causes no audible discomfort, so outsiders do not notice it.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method for secretly embedding a digital watermark using part of the speech code of digital speech.

[0002]

2. Description of the Related Art
Linear pulse code modulation (PCM), which quantizes waveform amplitudes sampled in accordance with the sampling theorem, has long been known as the most basic technique for digitizing speech. Code-excited linear prediction (CELP) is known as one technique that groups the digital waveform values obtained by PCM into frames and generates a speech code for each frame. Furthermore, the ITU, the international standards organization, has published ITU-T Recommendation G.729, 8 kbit/s CS-ACELP (Conjugate-Structure Algebraic CELP). This recommendation is one of the coding methods based on CELP and reproduces high-quality speech despite a large reduction in the amount of code.

[0003] However, when speech is transmitted as digital code, as under this recommendation, a perfect copy of the speech can easily be made at the destination, and it has been pointed out that this makes protecting the copyright of authors, producers, and performers more difficult. As a countermeasure, attempts have been made to embed copyright information (a digital watermark) that identifies illegal copies of digitally coded media, exploiting the ambiguity of human perception (e.g., Kineo Matsui: Digital Watermarking, Journal of the Institute of Image Electronics Engineers of Japan, Vol. 26, No. 3, pp. 266-274, 1997).

[0004] For embedding special signals such as digital watermarks in digital audio, Boney et al. have proposed a watermark-embedding method that exploits the auditory masking phenomenon (Boney et al.: Digital Watermarks for Audio Signals, Proc. of the International Conference on Multimedia Computing and Systems, pp. 473-480, 1996). Matsui et al. have also proposed a method that embeds and transmits document data disguised as digitization noise (Kineo Matsui et al.: Embedding Text Information in Speech Codes under Adaptive Differential PCM Coding, Journal of the Information Processing Society of Japan, Vol. 38, No. 10, pp. 2053-2061, 1997).

[0005] Iwakiri et al. have also proposed a clever method for embedding a digital watermark in the international standard (recommendation) G.726 (Munetoshi Iwakiri et al.: Embedding Text Information in Speech Codes under Adaptive Differential PCM Coding, IPSJ Transactions, Vol. 38, No. 10, pp. 2053-2061, 1997).

[0006]

Problems to Be Solved by the Invention
However, with the methods proposed by Boney et al. and Matsui et al., a third party may be able to identify the embedding positions, and the watermark information may be lost through the heavy compression applied when speech data is distributed or stored. For Recommendation G.726, Iwakiri et al. have proposed a clever watermark-embedding method as noted above, but no comparable method has been proposed for Recommendation G.729. The object of the present invention is therefore to provide, by a method different from the one shown by Iwakiri et al. for G.726, a method for embedding a digital watermark in the compressed speech code of Recommendation G.729, together with a simple method for hiding the presence of that watermark.

[0007]

Means for Solving the Problems
The basic idea of the present invention is to focus on the structure of the multi-pulse excitation used when encoding digital speech data and to embed bit-serialized data during its synthesis process. By selecting the speech codes to be embedded unpredictably and varying the embedding rule, the presence of the embedding can be hidden.

[0008] The invention of claim 1 is a method for embedding watermark bits when speech is digitally encoded using at least a fixed codebook and transmitted, characterized in that "1" or "0" is assigned to each of a plurality of adjacent pulse-position candidates in the fixed codebook, a first key is defined that selects a pulse-position candidate according to whether it is assigned "1" or "0", and the pulse position selected by the first key is used at the bit position in the transmitted speech code where the watermark is embedded. The invention focuses on the structure of the fixed codebook (multi-pulse excitation) used when encoding digital speech data and embeds bit-serialized data during the synthesis process.

[0009] The invention of claim 2 is the invention of claim 1 in which, in the fixed codebook, "1" or "0" is assigned to each possible value of the sum of a predetermined number of the pulse-position candidates; a second key is defined that selects whether or not to embed the watermark according to whether the value assigned to that sum is "1" or "0"; and the watermark is embedded by mapping the sum obtained by feedback from the output speech code onto the second key. Because the speech codes that receive watermark bits are selected unpredictably, it becomes difficult for a third party to identify the speech codes containing watermark information, which reduces the chance that the key will be analyzed by a third party.

[0010] The invention of claim 3 is the invention of claim 2 in which a third key is defined whose assignment of "1" and "0" is the reverse of that of the first key; whether the sum of a predetermined number of the pulse-position candidates is even or odd is detected; one of the first key and the third key is associated with the even values of the sum and the other with the odd values, so that both cases never use the same key; and the pulse position selected by the first or third key is used at the bit position in the transmitted speech code where the watermark is embedded. Varying the embedding rule in this way reduces the chance that the key will be analyzed even when the same key is used over a long period.

[0011]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
An embodiment of the present invention is described below with reference to the drawings and equations, in the following order: "I. Overview of Recommendation G.729"; "II. Embedding Method" (a method of embedding a text bit sequence in the speech code using the multi-pulse excitation, corresponding to claim 1); "III. Method for Improving Secrecy" (a method of hiding the presence of the embedding to make key analysis difficult, corresponding to claims 2 and 3); "IV. Search Algorithm" (the algorithm actually used to search for the pulse position for embedding); and "V. Experimental Results".

[0012] I. Overview of Recommendation G.729
As background for understanding the present invention, ITU-T Recommendation G.729 is briefly outlined below. As stated above, the purpose of the present invention is to provide a method for embedding a digital watermark in speech code compressed with a fixed codebook, as in Recommendation G.729, and a simple method for hiding the presence of that watermark; it is not to describe G.729 in detail. The following description of ITU-T Recommendation G.729 therefore omits parts that have little bearing on the present invention, even where they are essential to the configuration and operation of G.729. For anything not sufficiently explained here, please refer to the text of Recommendation G.729.

[0013] ITU-T Recommendation G.729 encodes and decodes speech with the conjugate-structure algebraic code-excited linear prediction scheme (hereinafter, CS-ACELP). One frame is 10 ms long and contains 80 samples; one subframe is 5 ms long and contains 40 samples. The bit rate is 8 kbit/s: speech is sampled, an 80-bit speech code is generated for each frame, and the code is transmitted at 8 kbit/s.
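As a quick check of the figures above, the following Python sketch derives the per-frame counts. It assumes an 8 kHz sampling rate, which is what 80 samples per 10 ms implies; the constant names are illustrative, not taken from the recommendation.

```python
SAMPLE_RATE_HZ = 8000   # 80 samples per 10 ms frame implies 8 kHz sampling
FRAME_MS = 10           # frame length
SUBFRAME_MS = 5         # subframe length
BIT_RATE_BPS = 8000     # G.729 bit rate

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
samples_per_subframe = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000
frames_per_second = 1000 // FRAME_MS
bits_per_frame = BIT_RATE_BPS // frames_per_second

print(samples_per_frame, samples_per_subframe, bits_per_frame)  # 80 40 80
```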

[0014] Recommendation G.729, which uses CS-ACELP, was standardized under strict requirements, including speech quality equivalent to that of the 32 kbit/s Recommendation G.726 and robustness to transmission errors for use in FPLMTS (Future Public Land Mobile Telecommunication Systems), the third-generation personal communication system. The result is a high-quality codec with low delay. Recommendation G.729 is therefore applicable not only to FPLMTS but also to voice-capable multiplexers for asynchronous transfer mode (ATM) and frame relay, and implementation in some products has already begun.

[0015] The G.729 encoder encodes (1) the quantized coefficients of the linear prediction synthesis filter obtained by analyzing the input speech (corresponding to the shape of the mouth), (2) the signal patterns selected from the adaptive codebook and the noise codebook, which serve as the excitation source (corresponding to vocal-cord vibration), and (3) the gain parameters that adjust the volume (corresponding to loudness). The decoder synthesizes speech from these parameters. To improve quantization efficiency, inter-frame prediction is applied to the linear prediction synthesis filter coefficients and the gains: the current parameters are predicted from those obtained in the previous frame, and only the difference is encoded.
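The inter-frame prediction idea can be illustrated with a deliberately simplified sketch: each parameter is predicted as its previously decoded value and only the quantized difference is sent. G.729 itself uses more elaborate moving-average predictors for the LSP and gain parameters, so the predictor, the step size, and the function names here are assumptions chosen only to show the principle.

```python
def encode_params(frames, step=0.05):
    """Encode parameter vectors as quantized differences from the previous decoded vector."""
    prev = [0.0] * len(frames[0])
    indices = []
    for frame in frames:
        diff_idx = [round((x - p) / step) for x, p in zip(frame, prev)]
        indices.append(diff_idx)
        # Track the decoder's reconstruction so encoder and decoder predictors stay in sync.
        prev = [p + k * step for p, k in zip(prev, diff_idx)]
    return indices


def decode_params(indices, step=0.05):
    """Rebuild parameter vectors by accumulating the decoded differences."""
    prev = [0.0] * len(indices[0])
    out = []
    for diff_idx in indices:
        prev = [p + k * step for p, k in zip(prev, diff_idx)]
        out.append(prev)
    return out


frames = [[0.10, 0.20], [0.12, 0.25], [0.11, 0.30]]
decoded = decode_params(encode_params(frames))
```

Because the encoder predicts from its own copy of the decoder output, quantization errors do not accumulate: each reconstructed value stays within half a step of the original.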

[0016] The features of Recommendation G.729 are as follows:
(1) The noise codebook adopts a signal representation called "algebraic".
(2) This speeds up computation and removes the need to store all noise-codebook patterns, as the conventional CELP scheme must.
(3) Memory is saved and the computation needed for pattern selection is reduced.
(4) The noise codebook is selected by analysis-by-synthesis (A-b-S), which is computationally expensive but, given a sufficient bit allocation, can synthesize speech arbitrarily close to the input.
(5) A parity bit for detecting transmission bit errors is added to the pitch period obtained from the adaptive codebook, so resistance to transmission errors is better than that of Recommendation G.726.

[0017] FIG. 1 is a block diagram of the basic G.729 encoder. It consists of: a preprocessing unit 2 that high-pass filters and scales the input signal 1; a linear prediction analysis unit 3 that computes the linear prediction filter coefficients from the preprocessed signal once per 10 ms frame, converts them into line spectrum pairs (LSP), quantizes them, and outputs the linear prediction coefficients 7; a linear prediction synthesis filter 4 that receives the linear prediction coefficients and other inputs; an adder 5 that receives the signals from the preprocessing unit 2 and the linear prediction synthesis filter 4 and outputs the error signal; a perceptual weighting filter 6 that filters the output of the adder 5 using the linear prediction coefficients 7; a fixed codebook search unit 8 and a pitch analysis unit 9 that receive the output of the perceptual weighting filter 6; a parameter encoding unit 10 that receives the outputs of the fixed codebook search unit 8, the pitch analysis unit 9 and others and, using the linear prediction coefficients 7, outputs the transmitted bit stream 11; a gain quantization unit 12 that receives the outputs of the fixed codebook search unit 8 and the pitch analysis unit 9 and outputs the gains; an adaptive codebook 13, which is an excitation source; a fixed codebook 14, which is both an excitation source and the noise codebook; the gain Gp 15 of the adaptive codebook 13; and the gain Gc 16 of the fixed codebook 14. Its operation is described in detail in Recommendation G.729 and is not explained here.

[0018] Recommendation G.729 is characterized by the fixed codebook (noise codebook) used for encoding, and the present invention exploits this feature. FIG. 2 is a table of the G.729 fixed codebook. It is a multi-pulse excitation fixed codebook: from the candidates shown in the table for the 40 samples of a 5 ms subframe, four pulse positions m_0 to m_3 and their polarities s_0 to s_3 are determined.

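Because FIG. 2 itself is not reproduced here, the sketch below writes out the pulse-position tracks of the G.729 algebraic codebook as given in the recommendation and counts the bits needed to index them. The dictionary layout and the sorted order of the m_3 candidates are presentation choices, not the bit-level index mapping of G.729.

```python
# Pulse-position tracks of the G.729 algebraic (fixed) codebook for a 40-sample subframe.
TRACKS = {
    0: list(range(0, 40, 5)),                                   # m0: 0, 5, ..., 35
    1: list(range(1, 40, 5)),                                   # m1: 1, 6, ..., 36
    2: list(range(2, 40, 5)),                                   # m2: 2, 7, ..., 37
    3: sorted(list(range(3, 40, 5)) + list(range(4, 40, 5))),   # m3: 3, 4, 8, 9, ..., 38, 39
}

# Bits to index each track, plus one polarity bit (s0..s3) per pulse.
position_bits = {i: (len(c) - 1).bit_length() for i, c in TRACKS.items()}
total_bits = sum(position_bits.values()) + len(TRACKS)

print(position_bits, total_bits)  # {0: 3, 1: 3, 2: 3, 3: 4} 17
```

Only the m_3 track contains neighbouring positions (3 and 4, 8 and 9, and so on), which is the property the embedding method of section II relies on.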
[0019] In Recommendation G.729, the pulse positions and polarities are searched from the table of FIG. 2 as follows. For example, the 10th-order linear prediction filter 1/Â(z), corresponding to the linear prediction synthesis filter 4 in FIG. 1, is defined as in Equation 1 (= G.729 Equation 2), where â_i are the quantized linear prediction coefficients.

(Equation 1)
$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}}$$

[0020] Likewise, the perceptual weighting filter W(z), corresponding to the perceptual weighting filter 6 in FIG. 1, is defined as in Equation 2 (= G.729 Equation 27), where γ_1 and γ_2 are weighting coefficients that depend on the spectral shape of the input signal and determine the characteristics of W(z).

(Equation 2)
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 + \sum_{i=1}^{10} \gamma_1^{i} a_i z^{-i}}{1 + \sum_{i=1}^{10} \gamma_2^{i} a_i z^{-i}}$$

[0021] The linear prediction filter and the perceptual weighting filter are cascaded into the weighted synthesis filter W(z)/Â(z). The 40-dimensional code vector c(n) derived from the table of FIG. 2 is expressed as Equation 3 (= G.729 Equation 45), where δ(n) in Equation 3 is the unit pulse of Equation 4.

(Equation 3)
$$c(n) = s_0\,\delta(n-m_0) + s_1\,\delta(n-m_1) + s_2\,\delta(n-m_2) + s_3\,\delta(n-m_3), \qquad n = 0, \ldots, 39$$

(Equation 4)
$$\delta(n) = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$
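Equations 3 and 4 simply place four signed unit pulses in a 40-sample vector. A minimal Python sketch, with a hypothetical helper name `code_vector`:

```python
def code_vector(positions, signs, length=40):
    """Build c(n) = sum_i s_i * delta(n - m_i) for four pulses in one subframe."""
    c = [0] * length
    for m, s in zip(positions, signs):
        c[m] += s  # delta(n - m) is nonzero only at n == m
    return c


c = code_vector(positions=[5, 11, 27, 39], signs=[+1, -1, +1, -1])
print([n for n, v in enumerate(c) if v != 0])  # [5, 11, 27, 39]
```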

[0022] First, the 40-dimensional code vector c(n) is processed by the filter P(z) of Equation 5 (= G.729 Equation 46).

(Equation 5)
$$P(z) = \frac{1}{1 - \beta z^{-T}}$$

In Equation 5, T is the pitch delay, and β is the quantized gain ĝ_p^(m−1) of the adaptive codebook (adaptive codebook 13 in FIG. 1) in the previous subframe (m−1), bounded as in Equation 6 (= G.729 Equation 47).

(Equation 6)
$$\beta = \hat{g}_p^{(m-1)}, \qquad 0.2 \le \beta \le 0.8$$

Accordingly, the impulse response h(n) of W(z)/Â(z) is modified as in Equation 7 (= G.729 Equation 49).

(Equation 7)
$$h(n) \leftarrow h(n) + \beta\, h(n-T), \qquad n = T, \ldots, 39$$
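The pitch enhancement of Equations 5 to 7 amounts to one recursive update over the subframe. The sketch below (hypothetical function name, pure Python lists) applies P(z) = 1/(1 − βz^(−T)) in place and clips β to the range of Equation 6; applying the same function to h(n) gives the modification of Equation 7.

```python
def pitch_sharpen(x, T, beta):
    """Filter x(n) through P(z) = 1 / (1 - beta * z^-T) within one 40-sample subframe."""
    beta = min(max(beta, 0.2), 0.8)   # Equation 6: 0.2 <= beta <= 0.8
    y = list(x)
    for n in range(T, len(y)):
        y[n] += beta * y[n - T]       # y(n) = x(n) + beta * y(n - T)
    return y


h = [1.0] + [0.0] * 39                # a unit impulse as a stand-in for h(n)
h_mod = pitch_sharpen(h, T=25, beta=0.5)
print(h_mod[0], h_mod[25])            # 1.0 0.5
```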

[0023] Next, the target signal x'(n) is obtained from Equation 8 (= G.729 Equation 50).

(Equation 8)
$$x'(n) = x(n) - g_p\, y(n), \qquad n = 0, \ldots, 39$$

In Equation 8, x(n) is the target signal obtained by subtracting the zero-input response of the weighted synthesis filter W(z)/Â(z) from the weighted speech sw(n), g_p is the adaptive-codebook gain, and y(n) is the convolution of the adaptive-codebook vector v(n) with the impulse response h(n).
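Equation 8 removes the adaptive-codebook contribution from the target. In the sketch below, `v` stands for the adaptive-codebook vector and `g_p` for its gain; both names, like the truncated convolution helper, are illustrative assumptions rather than identifiers from the recommendation.

```python
def convolve_truncated(v, h):
    """y(n) = sum_{i=0}^{n} v(i) h(n - i), kept to the subframe length."""
    return [sum(v[i] * h[n - i] for i in range(n + 1)) for n in range(len(v))]


def update_target(x, v, h, g_p):
    """x'(n) = x(n) - g_p * y(n), with y(n) the filtered adaptive-codebook vector."""
    y = convolve_truncated(v, h)
    return [xn - g_p * yn for xn, yn in zip(x, y)]


h = [1.0, 0.5] + [0.0] * 38
v = [1.0] + [0.0] * 39
x = [2.0, 1.0] + [0.0] * 38
x2 = update_target(x, v, h, g_p=2.0)   # here the adaptive codebook explains x exactly
```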

[0024] Next, using x'(n) from Equation 8, d(n) is obtained from Equation 9 (= G.729 Equation 52).

(Equation 9)
$$d(n) = \sum_{i=n}^{39} x'(i)\, h(i-n), \qquad n = 0, \ldots, 39$$
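Equation 9 is a backward filtering (correlation) of the updated target with the impulse response. A direct, unoptimized sketch with a hypothetical function name:

```python
def backward_filter(x2, h):
    """d(n) = sum_{i=n}^{L-1} x'(i) h(i - n), n = 0, ..., L-1."""
    L = len(x2)
    return [sum(x2[i] * h[i - n] for i in range(n, L)) for n in range(L)]


h = [1.0, 0.5] + [0.0] * 38
x2 = [0.0] * 39 + [1.0]      # a single sample at the end of the subframe
d = backward_filter(x2, h)
print(d[38], d[39])          # 0.5 1.0
```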

[0025] The value given by Equation 10 (= G.729 Equation 54) is denoted C for the codebook with algebraic structure, where m_i is the position of the i-th pulse.

(Equation 10)
$$C = \sum_{i=0}^{3} s_i\, d(m_i)$$

Further, the value given by Equation 11 (cf. G.729 Equation 55) is taken as the energy E.

(Equation 11)

Here, the matrix φ'(i, j) is obtained from Equation 12 (= G.729 Equation 56) and Equation 13 (= G.729 Equation 57).

(Equation 12)
$$\phi'(i,j) = \operatorname{sign}[d(i)]\,\operatorname{sign}[d(j)]\,\phi(i,j), \qquad i = 0, \ldots, 39,\ j = i+1, \ldots, 39$$

(Equation 13)
$$\phi'(i,i) = 0.5\,\phi(i,i), \qquad i = 0, \ldots, 39$$

The matrix φ(i, j) is obtained from Equation 14 (= G.729 Equation 51).

(Equation 14)
$$\phi(i,j) = \sum_{n=j}^{39} h(n-i)\, h(n-j), \qquad i = 0, \ldots, 39,\ j = i, \ldots, 39$$
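The quantities in Equations 10 to 14 can be computed directly as below. This sketch evaluates the energy in its expanded form, E = Σ_i φ(m_i, m_i) + 2 Σ_{i<j} s_i s_j φ(m_i, m_j), with the signs fixed as s_i = sign[d(m_i)]; that sign information is what the modified matrix φ' folds in, but the exact arrangement of Equation 11 is not reproduced here. Function names are hypothetical.

```python
def correlation_matrix(h):
    """phi(i, j) = sum_{n=j}^{L-1} h(n - i) h(n - j) for j >= i, mirrored for j < i."""
    L = len(h)
    phi = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i, L):
            phi[i][j] = phi[j][i] = sum(h[n - i] * h[n - j] for n in range(j, L))
    return phi


def correlation_and_energy(positions, d, phi):
    """Return (C, E) for four pulse positions with signs s_i = sign(d(m_i))."""
    signs = [1 if d[m] >= 0 else -1 for m in positions]
    C = sum(s * d[m] for s, m in zip(signs, positions))
    E = sum(phi[m][m] for m in positions)
    E += 2 * sum(signs[a] * signs[b] * phi[positions[a]][positions[b]]
                 for a in range(4) for b in range(a + 1, 4))
    return C, E


h = [1.0] + [0.0] * 39                    # unit impulse: phi becomes the identity
phi = correlation_matrix(h)
d = [float(k % 7 - 3) for k in range(40)]
C, E = correlation_and_energy([0, 1, 2, 3], d, phi)
print(C, E)                               # 6.0 4.0
```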

[0026] The codebook of FIG. 2 is searched by analysis-by-synthesis (A-b-S) so as to maximize C²/E, as shown in G.729 Equation 53. Because the amount of processing is generally enormous, a threshold is introduced as follows to reduce the search load.
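To make the search criterion concrete, the sketch below performs the brute-force version of the search that the threshold is meant to avoid: it visits all 8 × 8 × 8 × 16 = 8192 position combinations and keeps the one maximizing C²/E. It is an illustration only; the G.729 reference search is structured differently, and the function name is hypothetical.

```python
from itertools import product

TRACKS = [list(range(0, 40, 5)), list(range(1, 40, 5)), list(range(2, 40, 5)),
          sorted(list(range(3, 40, 5)) + list(range(4, 40, 5)))]


def search_full(d, phi):
    """Return the (m0, m1, m2, m3) maximizing C^2 / E, with s_i = sign(d(m_i))."""
    best, best_score = None, -1.0
    for pos in product(*TRACKS):
        signs = [1 if d[m] >= 0 else -1 for m in pos]
        C = sum(s * d[m] for s, m in zip(signs, pos))
        E = sum(si * sj * phi[mi][mj]
                for si, mi in zip(signs, pos) for sj, mj in zip(signs, pos))
        if E > 0 and C * C / E > best_score:
            best, best_score = pos, C * C / E
    return best, best_score


identity = [[1.0 if i == j else 0.0 for j in range(40)] for i in range(40)]
d = [0.0] * 40
d[10], d[21], d[32], d[39] = 5.0, -4.0, 3.0, 2.0
print(search_full(d, identity))   # ((10, 21, 32, 39), 49.0)
```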

[0027] Using the maximum max_3 and the average av_3 of the correlation due to the first three pulses, together with the coefficient K_3 = 0.4, the threshold thr_3 is obtained from Equation 15 (= G.729 Equation 60).

(Equation 15)
$$thr_3 = av_3 + K_3\,(max_3 - av_3)$$

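[0028] The fourth pulse m_3 is then searched only for combinations of pulse positions m_0, m_1 and m_2 whose correlation exceeds thr_3. In addition, a maximum search count TI_max per frame is used to keep the processing delay from growing. The receiving (decoding) side, on the other hand, only extracts each parameter from the received code and synthesizes the output speech, so its processing load is much smaller than that of encoding.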
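Paragraphs [0027] and [0028] prune this search. The following sketch follows the G.729 form of the threshold under stated assumptions: max_3 is the sum of the per-track maxima of |d| over the first three tracks, av_3 is the sum of their per-track averages, and the m_3 loop is entered only when |d(m_0)| + |d(m_1)| + |d(m_2)| exceeds thr_3. The per-frame cap TI_max is omitted, and the function name is hypothetical.

```python
from itertools import product

TRACKS = [list(range(0, 40, 5)), list(range(1, 40, 5)), list(range(2, 40, 5)),
          sorted(list(range(3, 40, 5)) + list(range(4, 40, 5)))]


def search_with_threshold(d, phi, K3=0.4):
    """Maximize C^2/E, entering the m3 loop only when the first three pulses exceed thr3."""
    dabs = [abs(v) for v in d]
    max3 = sum(max(dabs[m] for m in t) for t in TRACKS[:3])
    av3 = sum(sum(dabs[m] for m in t) / len(t) for t in TRACKS[:3])
    thr3 = av3 + K3 * (max3 - av3)
    best, best_score, entered = None, -1.0, 0
    for m0, m1, m2 in product(*TRACKS[:3]):
        if dabs[m0] + dabs[m1] + dabs[m2] <= thr3:
            continue                      # pruned: the 16-candidate m3 loop is skipped
        entered += 1
        for m3 in TRACKS[3]:
            pos = (m0, m1, m2, m3)
            signs = [1 if d[m] >= 0 else -1 for m in pos]
            C = sum(dabs[m] for m in pos)  # s_i * d(m_i) = |d(m_i)|
            E = sum(si * sj * phi[mi][mj]
                    for si, mi in zip(signs, pos) for sj, mj in zip(signs, pos))
            if E > 0 and C * C / E > best_score:
                best, best_score = pos, C * C / E
    return best, best_score, entered


identity = [[1.0 if i == j else 0.0 for j in range(40)] for i in range(40)]
d = [0.0] * 40
d[10], d[21], d[32], d[39] = 5.0, -4.0, 3.0, 2.0
print(search_with_threshold(d, identity))   # ((10, 21, 32, 39), 49.0, 22)
```

In this toy case the m_3 loop is entered for 22 of the 512 (m_0, m_1, m_2) combinations, yet the result matches the exhaustive search.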
This is performed only for a combination with pulse positions m ₀ , m ₁ , and m ₂ exceeding hr ₃ . In addition, an increase in processing delay is suppressed by using the maximum search processing amount TI _max for each frame. On the other hand, the transmitting side extracts each parameter from the transmitted code and synthesizes the output speech, so that the processing amount is smaller and faster than the encoding.

[0029] II. Embedding Method
The following describes the principle of applying the present invention to the CS-ACELP coding of Recommendation G.729 to embed watermark information in the speech code.

[0030] In the table of FIG. 2, the pulse position m_3 of the fourth pulse i_3, unlike the candidates for the pulse positions m_0 to m_2 of the other pulses i_0 to i_2, has adjacent candidates. As shown in FIG. 3, which illustrates substitution of the fourth pulse position m_3, replacing the selected optimal pulse position m_3 with its adjacent candidate m'_3 is expected to have little effect on the reproduced speech. This property is used to embed a special signal sequence, such as watermark information, in the multi-pulse excitation part of the speech code.
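The property used in [0030] is easy to verify: on the m_3 track every candidate has a partner one sample away, whereas the other tracks are spaced five samples apart. A small sketch (the helper name `adjacent_candidate` is hypothetical):

```python
M3_CANDIDATES = sorted(list(range(3, 40, 5)) + list(range(4, 40, 5)))
OTHER_TRACKS = [list(range(k, 40, 5)) for k in range(3)]   # m0, m1, m2


def adjacent_candidate(m3):
    """Return m3', the neighbour of m3 within its pair {5k+3, 5k+4}."""
    if m3 % 5 == 3:
        return m3 + 1
    if m3 % 5 == 4:
        return m3 - 1
    raise ValueError(f"{m3} is not a candidate for m3")


print([adjacent_candidate(m) for m in (3, 4, 38, 39)])       # [4, 3, 39, 38]
# No track other than m3 contains two positions one sample apart.
print(any(b - a == 1 for t in OTHER_TRACKS for a, b in zip(t, t[1:])))  # False
```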

[0031] First, a key k_p that labels the candidates for pulse position m_3 is introduced. For example, as shown in FIG. 4, which illustrates the classification of the fourth pulse position m_3 by the key k_p, if k_p = "00001111", the most significant bit of k_p is "0", so "0" is assigned to the m_3 candidate {3} and "1" to the adjacent candidate {4}. The least significant bit of k_p is "1", so "1" is assigned to the m_3 candidate {38} and "0" to the adjacent candidate {39}. In this way, every candidate for pulse position m_3 is labeled "0" or "1".
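The labeling rule can be written as a small table builder. The two cases spelled out in [0031] (the MSB labels {3, 4}, the LSB labels {38, 39}) suggest the general rule assumed here: bit b of the 8-bit key, counted from the MSB, labels the pair {5b + 3, 5b + 4}, giving the lower position the key bit and the upper position its complement. That generalization and the function name are assumptions.

```python
def label_table(k_p):
    """Map every m3 candidate to a label 0 or 1 from an 8-character key string (MSB first)."""
    if len(k_p) != 8 or set(k_p) - {"0", "1"}:
        raise ValueError("k_p must be 8 binary digits")
    labels = {}
    for b, bit in enumerate(k_p):
        lower = 5 * b + 3               # pair {5b+3, 5b+4}
        labels[lower] = int(bit)
        labels[lower + 1] = 1 - int(bit)
    return labels


labels = label_table("00001111")
print(labels[3], labels[4], labels[38], labels[39])   # 0 1 1 0
```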

[0032] To embed a watermark bit "0" in the speech code, pulse position m_3 is selected from the candidates labeled "0" by the key k_p in FIG. 4; to embed a watermark bit "1", m_3 is selected from the candidates labeled "1". Repeating this embeds the binarized watermark information.
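Embedding then restricts the m_3 choice to candidates whose label equals the watermark bit. In the sketch below, `scores` stands for the search criterion (for example C²/E) evaluated for each m_3 candidate with m_0 to m_2 already fixed; the scores, like the function names, are hypothetical inputs.

```python
def label_table(k_p):
    """Labels for the 16 m3 candidates from an 8-bit key string (pair rule of [0031])."""
    labels = {}
    for b, bit in enumerate(k_p):
        labels[5 * b + 3] = int(bit)
        labels[5 * b + 4] = 1 - int(bit)
    return labels


def choose_m3(scores, labels, wbit):
    """Pick the best-scoring m3 among the candidates labeled with the watermark bit."""
    allowed = [m for m in scores if labels[m] == wbit]
    return max(allowed, key=lambda m: scores[m])


labels = label_table("00001111")
scores = {m: 0.0 for m in labels}
scores[3], scores[4] = 1.0, 0.9   # the unconstrained optimum would be m3 = 3
print(choose_m3(scores, labels, 0), choose_m3(scores, labels, 1))   # 3 4
```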

[0033] A signer who knows the key k_p can easily extract the watermark information by checking whether the label of the pulse position m_3 contained in the speech code is "1" or "0".
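Extraction is a table lookup. The round-trip sketch below assumes a simplified embedder as a shortcut for the restricted search of [0032]: it takes the unconstrained optimum m_3 and, if its label disagrees with the watermark bit, moves to the adjacent candidate. The random optima are placeholders for real search results, and the function names are hypothetical.

```python
import random


def label_table(k_p):
    """Labels for the 16 m3 candidates from an 8-bit key string (pair rule of [0031])."""
    labels = {}
    for b, bit in enumerate(k_p):
        labels[5 * b + 3] = int(bit)
        labels[5 * b + 4] = 1 - int(bit)
    return labels


def embed(optimal_m3, bits, labels):
    """Swap each optimum to its neighbour when its label differs from the watermark bit."""
    out = []
    for m3, w in zip(optimal_m3, bits):
        if labels[m3] != w:
            m3 = m3 + 1 if m3 % 5 == 3 else m3 - 1
        out.append(m3)
    return out


def extract(m3_sequence, labels):
    """Read one watermark bit per subframe from the label of m3."""
    return [labels[m3] for m3 in m3_sequence]


rng = random.Random(7)
labels = label_table("00001111")
optima = [rng.choice(sorted(labels)) for _ in range(200)]
bits = [rng.randint(0, 1) for _ in range(200)]
recovered = extract(embed(optima, bits, labels), labels)
```

Each pair holds one "0" and one "1" label, so a single swap always suffices.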

[0034] III. Method for Improving Secrecy
If every subframe is embedded by the method described in "II. Embedding Method", one watermark bit is carried by the position of the fourth pulse m_3 in each subframe. The m_3 position information amounts to 800 bits per second (4 bits in each of 200 subframes), so 200 bits of watermark information can be embedded per second. However, if the watermark is embedded in every code with the same key k_p, an unauthorized third party is more likely to be able to analyze k_p. Secrecy was therefore improved by the following method.
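The capacity figures in [0034] follow from the frame structure; a short arithmetic check (names illustrative):

```python
SUBFRAME_MS = 5
M3_INDEX_BITS = 4                      # 16 candidates for m3
WATERMARK_BITS_PER_SUBFRAME = 1        # one label bit carried by m3

subframes_per_second = 1000 // SUBFRAME_MS
m3_bits_per_second = subframes_per_second * M3_INDEX_BITS
capacity_bps = subframes_per_second * WATERMARK_BITS_PER_SUBFRAME
print(m3_bits_per_second, capacity_bps)   # 800 200
```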

[0035] First, let c_p be the sum of the pulse positions m_0 to m_3, as shown in Equation 16.

(Equation 16)
$$c_p = m_0 + m_1 + m_2 + m_3$$

The sum c_p takes one of the 58 values shown in FIG. 5, a table of the possible values of c_p.
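The count of 58 in FIG. 5 can be reproduced by enumerating every combination of the G.729 tracks. The sketch also confirms the property used later in [0044]: the possible sums come in pairs of consecutive integers.

```python
from itertools import product

TRACKS = [list(range(0, 40, 5)), list(range(1, 40, 5)), list(range(2, 40, 5)),
          sorted(list(range(3, 40, 5)) + list(range(4, 40, 5)))]

sums = sorted({sum(p) for p in product(*TRACKS)})
pairs_consecutive = all(sums[k + 1] == sums[k] + 1 for k in range(0, len(sums), 2))
print(len(sums), sums[:4], sums[-2:], pairs_consecutive)   # 58 [6, 7, 11, 12] [146, 147] True
```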

[0036] If the values of the pulse positions m_0 to m_3 contained in the speech code are random, then, from the table of FIG. 2, c_p can be regarded as the sum of numbers drawn at random from nearly consecutive natural numbers, and its frequency of occurrence should therefore be close to a normal distribution.
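Under the idealization of [0036], where each pulse position is drawn uniformly from its track, the distribution of c_p can be computed exactly rather than simulated. The sketch counts all 8192 combinations; it is not a reproduction of the measured data in FIG. 6.

```python
from collections import Counter
from itertools import product

TRACKS = [list(range(0, 40, 5)), list(range(1, 40, 5)), list(range(2, 40, 5)),
          sorted(list(range(3, 40, 5)) + list(range(4, 40, 5)))]

counts = Counter(sum(p) for p in product(*TRACKS))
total = sum(counts.values())                                     # 8 * 8 * 8 * 16 = 8192
mean = sum(v * n for v, n in counts.items()) / total
symmetric = all(counts[v] == counts[153 - v] for v in counts)    # 6 + 147 = 153
even_fraction = sum(n for v, n in counts.items() if v % 2 == 0) / total
print(total, mean, symmetric, even_fraction)   # 8192 76.5 True 0.5
```

The counts are symmetric and bell-shaped around 76.5, consistent with the near-normal shape described above, and exactly half of the combinations give an even sum.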

[0037] To obtain an example supporting this, the inventors examined the frequency of occurrence of c_p using the pulse positions m_0 to m_3 in the speech code obtained from the English female voice Ews (see the table of FIG. 9, described later), and obtained the results shown in FIG. 6.

[0038] FIG. 6 shows the frequency of occurrence of the sum c_p of the four pulse positions m_0 to m_3. Since it approximately follows a normal distribution, the candidates for each of m_0 to m_3 can be regarded as selected almost at random. The watermark bits were therefore distributed using the feedback processing structure shown in FIG. 7.

[0039] FIG. 7(a) shows the feedback control in the transmitting device, and FIG. 7(b) shows the feedforward control in the receiving device, which uses the result of the transmitter's feedback processing. In FIG. 7(a), input speech 21 enters the G.729 encoder 22 and is output as the 8 kbit/s output code 23 (m_0, m_1, m_2, m_3). Part of the output code 23 is fed back to the control device 24, which outputs the embedded-code selection signal 25 to the G.729 encoder 22. The G.729 encoder 22 encodes the input speech 21 in accordance with this fed-back selection signal 25 and outputs the result as output code 23.

[0040] In FIG. 7(b), the output code 23 is input both to the G.729 decoder 26 and to the control device 27, and the control device 27 outputs the embedded-code search signal 28 to the G.729 decoder 26. The G.729 decoder 26 searches for and extracts the embedded code and outputs the output speech 29.

[0041] Here, a key k_con is introduced that assigns "0" or "1" to each of the 58 possible values of the sum c_p shown in FIG. 5. The key k_con is a 58-bit binary number.
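The source does not say how the 58 bits of k_con are ordered against the values of c_p. The sketch assumes the most significant bit corresponds to the smallest sum (6) and the least significant bit to the largest (147); the function name is hypothetical.

```python
from itertools import product

TRACKS = [list(range(0, 40, 5)), list(range(1, 40, 5)), list(range(2, 40, 5)),
          sorted(list(range(3, 40, 5)) + list(range(4, 40, 5)))]
SUMS = sorted({sum(p) for p in product(*TRACKS)})   # the 58 values of FIG. 5


def kcon_bit(k_con, c_p):
    """Return the bit c_pb of the 58-bit key k_con assigned to the sum c_p."""
    if not 0 <= k_con < 1 << len(SUMS):
        raise ValueError("k_con must be a 58-bit number")
    idx = SUMS.index(c_p)                 # raises ValueError if c_p is impossible
    return (k_con >> (len(SUMS) - 1 - idx)) & 1


k_con = 1 << 57                            # only the bit for c_p = 6 is set
print(kcon_bit(k_con, 6), kcon_bit(k_con, 7))   # 1 0
```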

[0042] FIG. 8 shows how the bit value c_pb of the key k_con is extracted. This is the bit that corresponds to the sum c_p of the output speech code fed back through the control system of FIG. 7. In FIG. 8, the sum c_p is first obtained from the fed-back output speech code. Next, the bit value c_pb of the key k_con that corresponds to this sum c_p is extracted. When c_pb is "1", a watermark bit is embedded in the speech code. When c_pb is "0", no watermark bit is embedded. Repeating this process distributes the watermark bits over the entire speech code. As a result, a third party who does not know the key k_con finds it difficult to identify which speech codes contain watermark information.
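The per-subframe decision of FIG. 8 can be sketched as follows. The text does not specify how the 58 sum values are mapped to the 58 bits of k_con. This sketch assumes that the sums are taken in ascending order and matched to key bits starting from the most significant bit, which follows the "counted from the most significant bit" convention of tap(). The function name check follows the notation of [0046]. The alternating key is hypothetical.

```python
# All 58 values the sum c_p can take, in ascending order (cf. FIG. 5).
SUMS = sorted({a + b + c + d
               for a in range(0, 40, 5)
               for b in range(1, 40, 5)
               for c in range(2, 40, 5)
               for d in [*range(3, 40, 5), *range(4, 40, 5)]})
KEY_BITS = len(SUMS)  # 58


def check(k_con: int, c_p: int) -> int:
    """Return c_pb, the bit of the 58-bit key k_con assigned to the sum c_p.

    Assumed mapping: the smallest sum (6) uses the most significant key bit,
    the largest sum (147) the least significant one.
    """
    index = SUMS.index(c_p)  # ValueError if c_p is not a possible sum
    return (k_con >> (KEY_BITS - 1 - index)) & 1


ALL_ONES = (1 << KEY_BITS) - 1      # every subframe is selected for embedding
ALTERNATING = int("10" * 29, 2)     # hypothetical key: bits 1, 0, 1, 0, ...

print(check(ALL_ONES, 101))                          # 1
print(check(ALTERNATING, 6), check(ALTERNATING, 7))  # 1 0
```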

[0043] However, a problem remains even with the key k_con described above. If embedding is carried out over a long period with a special value of the key k_p of FIG. 4, such as "00000000" or "11111111", the statistical characteristics of the embedded watermark data are likely to be reflected in the speech code. Long-term use of the same key k_p is therefore undesirable for concealing the existence of the watermark. The key k_p is therefore varied during embedding, so that the bias in the frequency of occurrence of the sum c_p is spread out.

[0044] All the values in the table of FIG. 5 form pairs of two consecutive numbers, so the sum c_p is even with a probability of 1/2. Therefore, whenever the sum c_p is even, the key k_p is updated as follows:

[Equation 17]

This spreads out the bias.
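The parity argument can be checked directly. Each consecutive pair in FIG. 5 contains exactly one even value, so half of the 58 sums are even. Equation 17 is not reproduced in this text. The minimal sketch below therefore leaves the key-update rule as a caller-supplied function. The 8-bit rotation used in the usage lines is a hypothetical placeholder and is not the patent's Equation 17.

```python
from typing import Callable

# All 58 values the sum c_p can take, in ascending order (cf. FIG. 5).
SUMS = sorted({a + b + c + d
               for a in range(0, 40, 5)
               for b in range(1, 40, 5)
               for c in range(2, 40, 5)
               for d in [*range(3, 40, 5), *range(4, 40, 5)]})


def maybe_update_key(k_p: int, c_p: int, update: Callable[[int], int]) -> int:
    """Apply the key-update rule only when the sum c_p is even (cf. [0044])."""
    return update(k_p) if c_p % 2 == 0 else k_p


def rotate_left_8(k: int) -> int:
    """HYPOTHETICAL stand-in for Equation 17: rotate an 8-bit key left by one bit."""
    return ((k << 1) | (k >> 7)) & 0xFF


# Each consecutive pair (6, 7), (11, 12), ... holds exactly one even value.
evens = [c for c in SUMS if c % 2 == 0]
print(len(evens), len(SUMS))                            # 29 58
print(maybe_update_key(0b10000000, 6, rotate_left_8))   # 1   (6 is even: updated)
print(maybe_update_key(0b10000000, 7, rotate_left_8))   # 128 (7 is odd: unchanged)
```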

[0045] IV. Search Algorithm
This section describes in detail the search algorithm that uses the keys described above to choose each pulse position. It refers to FIG. 9, the operation flowchart of the encoding procedure, and FIG. 10, the operation flowchart of the decoding procedure.

[0046] The search algorithm below uses the following notation:
thr_3: search processing threshold
S_max: maximum value of C/E
TI_max: maximum search processing amount
time: search processing amount
L0, ..., L4: loop processes
i_0, ..., i_4: pulse position candidates
m_0, ..., m_3: optimum pulse positions
CE(x_0, x_1, x_2, x_3): function that computes C/E from the pulse positions x_0, x_1, x_2, x_3
tbit: bit value to be embedded
mode: embedding execution flag (1: embed, 0: do not embed; initial value 0)
tap(x, y): function that extracts the value of the y-th bit of x, counted from the most significant bit
check(k_con, c_p): function that extracts the bit value c_pb of the key k_con corresponding to the sum c_p
get(T): function that extracts one bit at a time from the data file T to be embedded
put(tbit, T): function that outputs the extracted bit value tbit to the data file T
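The following is a minimal sketch of the helper functions in this notation. It assumes that k_p is an 8-bit key, because the index (i_3 - 3)/5 ranges from 0 to 7. It also assumes that the data file T is read most-significant-bit first. That bit order is an assumption, and T is modeled here as an in-memory byte string (for get) and a list (for put) rather than as a file.

```python
from typing import Iterator


def tap(x: int, y: int, width: int = 8) -> int:
    """Bit y of x counted from the most significant end (y = 0 is the MSB)."""
    return (x >> (width - 1 - y)) & 1


def get(t: bytes) -> Iterator[int]:
    """Yield the bits of the data to be embedded, one at a time, MSB first."""
    for byte in t:
        for y in range(8):
            yield tap(byte, y)


def put(tbit: int, t: list[int]) -> None:
    """Append one extracted bit to the list standing in for the data file T."""
    t.append(tbit)


k_p = 0b10110010
print([tap(k_p, y) for y in range(8)])   # [1, 0, 1, 1, 0, 0, 1, 0]

out: list[int] = []
for bit in get(b"A"):                    # 'A' = 0x41 = 01000001
    put(bit, out)
print(out)                               # [0, 1, 0, 0, 0, 0, 0, 1]
```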

[0047] The encoding procedure in the flowchart of FIG. 9 runs as follows. In step S1, the search processing threshold thr_3 is calculated. In step S2, the maximum value S_max of C/E is set to "0". In step S3, the search processing amount time is set to "0". In step S4, loop process L0 is started over the pulse position candidates i_0 = 0, 5, ..., 35. In step S5, loop process L1 is started over i_1 = 1, 6, ..., 36. In step S6, loop process L2 is started over i_2 = 2, 7, ..., 37.

[0048] In step S7, the candidates i_0, i_1, i_2 are tested against the search processing threshold thr_3. If the threshold condition is met (step S7: YES), the process proceeds to step S8. In step S8, loop process L3 is started over the pulse position candidates i_3 = 3, 8, ..., 38. Step S9 tests whether one of two conditions holds:
(1) the embedding execution flag mode is "0" (no embedding, or the initial value); or
(2) mode is "1" (embedding) and tap(k_p, (i_3 - 3)/5) equals tbit, the bit value to be embedded. tap(k_p, (i_3 - 3)/5) is the bit at position (i_3 - 3)/5 of the key k_p, counted from the most significant bit.
If either condition holds (step S9: YES), the process proceeds to step S10. Otherwise (step S9: NO), it proceeds to step S14.

[0049] In step S10, the C/E value S is set to CE(i_0, i_1, i_2, i_3), which is C/E computed from the pulse position candidates i_0, i_1, i_2, i_3. In step S11, it is determined whether S is greater than the maximum value S_max. If it is (step S11: YES), the process proceeds to step S12. Otherwise (step S11: NO), it proceeds to step S14. In step S12, S_max is set to S. In step S13, the optimum pulse positions are set to m_0 = i_0, m_1 = i_1, m_2 = i_2 and m_3 = i_3. In step S14, loop process L3 ends. In step S15, loop process L4 is started over the pulse position candidates i_4 = 4, 9, ..., 39.

[0050] Step S16 tests whether one of two conditions holds:
(1) mode is "0"; or
(2) mode is "1" and tap(k_p, (i_4 - 4)/5) differs from tbit. tap(k_p, (i_4 - 4)/5) is the bit at position (i_4 - 4)/5 of k_p, counted from the most significant bit.
If either condition holds (step S16: YES), the process proceeds to step S17. Otherwise (step S16: NO), it proceeds to step S21. In step S17, the C/E value S is set to CE(i_0, i_1, i_2, i_4). In step S18, it is determined whether S is greater than S_max. If it is (step S18: YES), the process proceeds to step S19. Otherwise (step S18: NO), it proceeds to step S21.
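Steps S9 and S16 together decide which fourth-pulse positions the search may use. Below is a minimal sketch, assuming that tap() reads bit y of an 8-bit k_p from the most significant end. For each key bit index j, the sketch shows that exactly one of the adjacent candidates 3 + 5j and 4 + 5j is admissible when mode is 1. The search therefore still has eight choices for m_3.

```python
def tap(x: int, y: int, width: int = 8) -> int:
    """Bit y of x counted from the most significant end (y = 0 is the MSB)."""
    return (x >> (width - 1 - y)) & 1


def admissible_m3(k_p: int, tbit: int, mode: int) -> list[int]:
    """Fourth-pulse positions allowed by step S9 (i_3) and step S16 (i_4)."""
    if mode == 0:
        # No embedding: every candidate on both sub-tracks is allowed.
        return sorted([*range(3, 40, 5), *range(4, 40, 5)])
    allowed = [i3 for i3 in range(3, 40, 5) if tap(k_p, (i3 - 3) // 5) == tbit]
    allowed += [i4 for i4 in range(4, 40, 5) if tap(k_p, (i4 - 4) // 5) != tbit]
    return sorted(allowed)


print(admissible_m3(0b10110010, tbit=1, mode=1))  # [3, 9, 13, 18, 24, 29, 33, 39]
```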

[0051] In step S19, S_max is set to S. In step S20, the optimum pulse positions are set to m_0 = i_0, m_1 = i_1, m_2 = i_2 and m_3 = i_4. In step S21, loop process L4 ends. In step S22, the search processing amount time is incremented by 1. In step S23, it is determined whether time is greater than the maximum search processing amount TI_max. If it is (step S23: YES), the process proceeds to step S27. Otherwise (step S23: NO), it proceeds to step S24. In step S24, loop process L2 ends. In step S25, loop process L1 ends. In step S26, loop process L0 ends.
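Steps S4 to S26 amount to an exhaustive maximization of C/E, restricted to the admissible fourth-pulse positions. The sketch below is simplified:
- It omits the thr_3 pruning of step S7 and the TI_max budget of steps S22 and S23.
- It takes the C/E evaluation as a caller-supplied function, ce, because the actual G.729 correlation and energy computation is outside the scope of this text.

The toy criterion and target positions in the usage lines are hypothetical.

```python
from typing import Callable

Positions = tuple[int, int, int, int]


def tap(x: int, y: int, width: int = 8) -> int:
    """Bit y of x counted from the most significant end (y = 0 is the MSB)."""
    return (x >> (width - 1 - y)) & 1


def search(ce: Callable[[Positions], float], k_p: int, tbit: int,
           mode: int) -> Positions:
    """Return (m_0, m_1, m_2, m_3) maximizing ce under the key constraint.

    Simplified: no thr_3 pruning (step S7) and no TI_max budget (S22, S23).
    S_max starts at -inf instead of 0 so that criteria taking negative
    values, like the toy one below, also work.
    """
    s_max, best = float("-inf"), (0, 1, 2, 3)
    for i0 in range(0, 40, 5):                  # loop L0
        for i1 in range(1, 40, 5):              # loop L1
            for i2 in range(2, 40, 5):          # loop L2
                for i3 in range(3, 40, 5):      # loop L3 with the test of step S9
                    if mode == 0 or tap(k_p, (i3 - 3) // 5) == tbit:
                        s = ce((i0, i1, i2, i3))
                        if s > s_max:
                            s_max, best = s, (i0, i1, i2, i3)
                for i4 in range(4, 40, 5):      # loop L4 with the test of step S16
                    if mode == 0 or tap(k_p, (i4 - 4) // 5) != tbit:
                        s = ce((i0, i1, i2, i4))
                        if s > s_max:
                            s_max, best = s, (i0, i1, i2, i4)
    return best


TARGET = (10, 21, 32, 14)  # hypothetical "ideal" positions for a toy criterion


def toy_ce(p: Positions) -> float:
    """Hypothetical stand-in for C/E: larger when p is closer to TARGET."""
    return -sum((a - b) ** 2 for a, b in zip(p, TARGET))


print(search(toy_ce, 0b10110010, tbit=1, mode=0))  # (10, 21, 32, 14)
print(search(toy_ce, 0b10110010, tbit=1, mode=1))  # (10, 21, 32, 13)
```

With mode = 1, the search gives up the unconstrained optimum 14 for the nearest admissible position, 13. This small change in pulse position is what carries the bit.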

[0052] In step S27, the sum c_p is set to the sum of the optimum pulse positions, m_0 + m_1 + m_2 + m_3. In step S28, the embedding execution flag mode is set to check(k_con, c_p), which is the bit value c_pb of the key k_con corresponding to c_p. In step S29, it is determined whether mode is "1". If it is (step S29: YES), the process proceeds to step S30. Otherwise (step S29: NO), it proceeds to step S33.

[0053] In step S30, the bit value to be embedded, tbit, is set to get(T), which is the next bit taken from the data file T. In step S31, it is determined whether the sum c_p is even. If it is (step S31: YES), the process proceeds to step S32. Otherwise (step S31: NO), it proceeds to step S33. In step S33, the optimum pulse positions m_0, m_1, m_2, m_3 are output.
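Steps S27 to S31 close the feedback loop of FIG. 7(a). The sum of the positions just chosen decides, through k_con, whether the next subframe carries a bit. The minimal sketch below assumes that the 58 sums, in ascending order, map to the bits of k_con starting from the most significant bit. It only reports whether step S32 would be reached, because the action taken in step S32 is not stated in this text. The names Decision and after_search are hypothetical.

```python
from typing import Iterator, NamedTuple, Optional

# All 58 values the sum c_p can take, in ascending order (cf. FIG. 5).
SUMS = sorted({a + b + c + d
               for a in range(0, 40, 5)
               for b in range(1, 40, 5)
               for c in range(2, 40, 5)
               for d in [*range(3, 40, 5), *range(4, 40, 5)]})


def check(k_con: int, c_p: int) -> int:
    """Bit of the 58-bit key k_con for c_p (assumed: smallest sum -> MSB)."""
    return (k_con >> (len(SUMS) - 1 - SUMS.index(c_p))) & 1


class Decision(NamedTuple):
    c_p: int
    mode: int              # 1: the next subframe carries a watermark bit
    tbit: Optional[int]    # bit taken from the data by get(T), if any
    reaches_s32: bool      # True when step S31 answers YES


def after_search(m: tuple[int, int, int, int], k_con: int,
                 data_bits: Iterator[int]) -> Decision:
    """Steps S27 to S31 for one subframe's optimum positions m."""
    c_p = sum(m)                            # step S27
    mode = check(k_con, c_p)                # step S28
    if mode != 1:                           # step S29: NO, go to S33
        return Decision(c_p, mode, None, False)
    tbit = next(data_bits)                  # step S30: tbit = get(T)
    return Decision(c_p, mode, tbit, c_p % 2 == 0)  # step S31


bits = iter([1, 0, 1])
print(after_search((0, 1, 2, 3), (1 << 58) - 1, bits))
# Decision(c_p=6, mode=1, tbit=1, reaches_s32=True)
print(after_search((0, 1, 2, 4), 0, bits))
# Decision(c_p=7, mode=0, tbit=None, reaches_s32=False)
```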

[0054] The decoding procedure in the flowchart of FIG. 10 runs as follows. In step S51, the embedding execution flag mode is set to "0". In step S52, the optimum pulse positions m_0, m_1, m_2, m_3 are extracted from the received speech code. In step S53, the sum c_p is set to m_0 + m_1 + m_2 + m_3. In step S54, it is determined whether mode is "0". If it is (step S54: YES), the process proceeds to step S60. Otherwise (step S54: NO), it proceeds to step S55.

[0055] In step S55, it is determined whether m_3 - 3 is 0 or a multiple of 5. If it is (step S55: YES), the process proceeds to step S56. Otherwise (step S55: NO), it proceeds to step S57. In step S56, the embedded bit value tbit is set to tap(k_p, (m_3 - 3)/5), the bit at position (m_3 - 3)/5 of k_p counted from the most significant bit. In step S57, it is determined whether m_3 - 4 is 0 or a multiple of 5. If it is (step S57: YES), the process proceeds to step S58. Otherwise (step S57: NO), it proceeds to step S59.

[0056] In step S58, the embedded bit value tbit is set to tap(k_p, (m_3 - 4)/5), the bit at position (m_3 - 4)/5 of k_p counted from the most significant bit. In step S59, the extracted bit value tbit is output to the data file T. In step S60, mode is set to check(k_con, c_p), which is the bit value c_pb of the key k_con corresponding to c_p. In step S61, it is determined whether the sum c_p is even. If it is (step S61: YES), the process proceeds to step S62. Otherwise (step S61: NO), it returns to step S52. In step S62, the key k_p is updated and the process returns to step S52. Steps S52 to S61 (or S62) are then repeated.
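The following is a minimal sketch of the decoding loop of FIG. 10. It assumes an 8-bit k_p read from the most significant bit, and that the 58 sums, in ascending order, map to the bits of k_con starting from the most significant bit. It makes two further assumptions:
- Encoder step S16 embeds on the i_4 sub-track only where the key bit differs from tbit. To invert the encoder, the sketch therefore recovers the complement of tap(k_p, (m_3 - 4)/5) in step S58. The text as given reads tap without a complement.
- The key update of step S62 (Equation 17) is left as an optional caller-supplied function.

```python
from typing import Callable, Iterable, Optional

# All 58 values the sum c_p can take, in ascending order (cf. FIG. 5).
SUMS = sorted({a + b + c + d
               for a in range(0, 40, 5)
               for b in range(1, 40, 5)
               for c in range(2, 40, 5)
               for d in [*range(3, 40, 5), *range(4, 40, 5)]})


def tap(x: int, y: int, width: int = 8) -> int:
    """Bit y of x counted from the most significant end (y = 0 is the MSB)."""
    return (x >> (width - 1 - y)) & 1


def check(k_con: int, c_p: int) -> int:
    """Bit of the 58-bit key k_con for c_p (assumed: smallest sum -> MSB)."""
    return (k_con >> (len(SUMS) - 1 - SUMS.index(c_p))) & 1


def extract_bits(frames: Iterable[tuple[int, int, int, int]], k_p: int,
                 k_con: int,
                 update: Optional[Callable[[int], int]] = None) -> list[int]:
    """Recover watermark bits from received pulse positions (FIG. 10)."""
    out: list[int] = []
    mode = 0                                            # step S51
    for m in frames:                                    # step S52
        c_p = sum(m)                                    # step S53
        m3 = m[3]
        if mode != 0:                                   # step S54: NO
            if (m3 - 3) % 5 == 0:                       # step S55
                out.append(tap(k_p, (m3 - 3) // 5))     # steps S56, S59
            elif (m3 - 4) % 5 == 0:                     # step S57
                # Steps S58, S59; complement assumed so that the "!=" test
                # of encoder step S16 is inverted.
                out.append(1 - tap(k_p, (m3 - 4) // 5))
        mode = check(k_con, c_p)                        # step S60
        if c_p % 2 == 0 and update is not None:         # steps S61, S62
            k_p = update(k_p)                           # Equation 17 stand-in
    return out


frames = [(0, 1, 2, 3), (0, 1, 2, 13), (0, 1, 2, 9), (0, 1, 2, 8), (0, 1, 2, 14)]
print(extract_bits(frames, 0b10110010, (1 << 58) - 1))  # [1, 1, 0, 0]
```

The first frame yields no bit because mode starts at 0. Each later frame's bit is enabled by the sum of the frame before it.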

[0057] V. Experimental Results
V-i. Outline of the Experimental System
In the embodiment of the present invention, a digital watermark is embedded as described above. A third party must not be able to detect the embedding from abnormalities in the reproduced sound quality, so it is important that embedding does not significantly degrade the sound quality. The inventors therefore built a simulator that follows the algorithm of Recommendation G.729 and carried out experiments with it.

[0058] FIG. 11 is a table of the experimental speech samples: Japanese and English, spoken by male and female speakers. The samples are male and female utterances in Japanese and English extracted from FM radio and English conversation tapes. They were sampled at 8 kHz and quantized to 16 bits.

[0059] The keys k_p and k_con below were used for embedding. Their values are written in hexadecimal digits 0 to F, as shown in Equations 18 and 19.

[Equation 18]

[Equation 19]

[0060] The key k_p was given the special value of Equation 18 because, as described above, such a value is expected to make the statistical characteristics of the embedded data show up most strongly in the statistics of the sum c_p obtained from the speech code.

[0061] The key k_con was set as in Equation 19 because this value causes the watermark to be embedded in every code, which maximizes the degradation of the reproduced sound quality. Equation 18 thus represents the worst case for the key k_p, and Equation 19 the worst case for the key k_con. Such a situation should normally be avoided. The experiment was nevertheless deliberately performed under these worst-case conditions, to confirm that the present invention remains effective even in the worst case.

[0062] In this experiment, the watermark information was English text taken from the RFC (Request For Comments) documents that define Internet standards. Each character is represented by an 8-bit ASCII code, so the data statistically contains many "0" bits.
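The following check illustrates why ASCII text contains many 0 bits. When 7-bit ASCII characters are stored in 8-bit bytes, the most significant bit is always 0, so at least one bit in eight is 0 whatever the content. Each space (0x20) also contributes seven zeros. The sample sentence is an arbitrary illustration and is not the RFC text used in the experiment.

```python
def zero_bit_fraction(text: str) -> float:
    """Fraction of 0 bits when text is encoded as 8-bit ASCII codes."""
    data = text.encode("ascii")
    ones = sum(bin(byte).count("1") for byte in data)
    return 1 - ones / (8 * len(data))


sample = "to be or not to be"
print(all(byte < 0x80 for byte in sample.encode("ascii")))  # True: MSB always 0
print(round(zero_bit_fraction(sample), 3))                  # 0.556
```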

[0063] V-ii. Sound Quality Evaluation Method
The watermarked speech data were evaluated subjectively by opinion rating, based on the absolute judgment of the evaluators. Several evaluators rated the sound quality on an absolute five-level scale, and the mean opinion score (MOS) was computed from their ratings.

[0064] The rating scale in this experiment was: very good: 5, good: 4, fair: 3, poor: 2, very poor: 1. To avoid the influence of the subjects' preconceptions, two versions of each speech sample were prepared, one without embedding and one with embedding. The two versions could not be told apart by their outward presentation.

[0065] A playback system was also prepared in which any sample could be played at will, and the subjects rated the samples after listening to and comparing them freely. If embedding caused an audible difference in sound quality, the MOS of the embedded speech would be expected to differ greatly from that of the non-embedded speech.
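The MOS is the arithmetic mean of the 1 to 5 ratings. The minimal sketch below computes it. The ratings in the usage line are hypothetical and are not the data behind FIG. 13.

```python
from statistics import fmean

# Five-level absolute rating scale used in [0064].
SCALE = {5: "very good", 4: "good", 3: "fair", 2: "poor", 1: "very poor"}


def mos(ratings: list[int]) -> float:
    """Mean opinion score: the arithmetic mean of 1-5 ratings."""
    if not ratings or any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers from 1 to 5")
    return fmean(ratings)


# Hypothetical ratings from four listeners (not the FIG. 13 data).
print(mos([5, 4, 3, 4]), mos([4, 4, 3, 3]))  # 4.0 3.5
```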

[0066] V-iii. Experimental Results and Discussion
This experiment used the key k_con of Equation 19, so the amount of watermark data embedded in the speech code reaches its maximum of 200 bits per second. Eight people in their twenties with normal hearing rated the reproduced speech with and without the embedded watermark. FIG. 13 is a table of the resulting MOS values. According to FIG. 13, the MOS is about 3.7 both with and without the watermark, so the results are nearly the same whether or not a watermark is present.

[0067] This suggests that, in the above experiment, embedding the watermark data caused almost no audible difference in sound quality; the subjects could not identify the speech that contained embedded data. The subjects could not identify it even under the worst-case conditions set by Equations 18 and 19. It is therefore considered very unlikely that, in ordinary embodiments of the present invention, a third party could pick out the watermarked speech codes from audible differences in the reproduced sound.

[0068] Next, a portion of the reproduced waveform was cut out to observe how the embedding process affected the waveform shape. FIG. 12 shows three waveforms:
(a) the reproduced speech waveform without watermark embedding;
(b) the reproduced speech waveform with watermark embedding;
(c) the difference between (a) and (b).
The waveforms cover a 0.2 s speech interval corresponding to the pronunciation "think" in the English male speech data Em of FIG. 11. Waveform (b) therefore carries 40 bits of embedded watermark information.
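Both figures, 200 bit/s and 40 bits in 0.2 s, follow from the G.729 frame structure if one watermark bit is carried per fixed-codebook search. G.729 splits each 10 ms frame of 80 samples into two 40-sample subframes, and each subframe has its own fixed-codebook search over positions 0 to 39. The one-bit-per-subframe rate is an assumption consistent with the text, not a figure the text states. A worked check:

```python
SAMPLE_RATE_HZ = 8000
SUBFRAME_SAMPLES = 40       # G.729 subframe: 5 ms at 8 kHz
BITS_PER_SUBFRAME = 1       # at most one watermark bit per fixed-codebook search

subframes_per_second = SAMPLE_RATE_HZ // SUBFRAME_SAMPLES    # 200
max_rate_bps = subframes_per_second * BITS_PER_SUBFRAME       # 200 bit/s
bits_in_interval = round(max_rate_bps * 0.2)                  # 0.2 s "think" segment

print(subframes_per_second, max_rate_bps, bits_in_interval)   # 200 200 40
```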

[0069] The difference waveform (c) in FIG. 12 is not a flat line, so the reproduced speech waveforms of FIGS. 12(a) and 12(b) do differ. The difference is thought to result from the phase change caused by altering the codebook pulse positions in the present invention. However, human hearing is generally insensitive to phase shifts. As long as the waveform shapes of FIGS. 12(a) and 12(b) show no unnatural distortion, a slight phase change between them does not make the speech sound unnatural. Listeners comparing the two reproduced signals are therefore expected to notice almost no significant difference. This is also consistent with FIG. 13, which shows no difference between the cases with and without watermark embedding.

[0070] Moreover, in normal use only the watermarked speech codes would be published. Unwatermarked speech for comparison, as used in this experiment, would not be released. A party who intercepts the speech codes by illicit means therefore cannot compare them with an unwatermarked waveform, and cannot obtain a difference waveform such as FIG. 12(c). It is therefore considered very difficult for a third party to identify speech codes carrying an embedded watermark of the present invention from the shape of the reproduced waveform.

[0071] Next, when a large amount of watermark data is embedded with a key k_p such as that of Equation 18, the statistical bit characteristics of the embedded data are expected to be reflected in the speech code. To check this, all speech codes of the female speech Ews in FIG. 11 were embedded, and the frequency of occurrence of the sum c_p was examined by the same method as in FIG. 6. The result is FIG. 14, which shows the frequency of occurrence of the sum c_p of the four pulse positions when the key k_p is fixed to the value of Equation 18.

[0072] In FIG. 14, unlike FIG. 6, every other bar of the histogram is raised or lowered, which clearly shows the influence of embedding. Using a special key k_p such as that of Equation 18 for a long time therefore lets the statistical features of the embedded bit sequence appear in the reproduced speech. This increases the chance that a third party will notice the embedded watermark. It is generally desirable that the presence of a watermark be hard to notice. The method of the present invention described above in "III. Method for Improving Confidentiality" was therefore used to vary only the key k_p and spread out the statistical bias.
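The alternating pattern in FIG. 14 can be explained from steps S9 and S16. The minimal sketch below assumes that tap() reads bit y of an 8-bit k_p from the most significant end. Take k_p = 00000000 (hexadecimal 00) and tbit = 0. Then only the sub-track 3, 8, ..., 38 is admissible, so every reachable c_p is congruent to 1 modulo 5, and the other member of each consecutive pair in FIG. 5 never appears. Data dominated by 0 bits therefore over-populates every other bar. A key with mixed bits brings sums from both sub-tracks back; the mixed key used here is hypothetical.

```python
def tap(x: int, y: int, width: int = 8) -> int:
    """Bit y of x counted from the most significant end (y = 0 is the MSB)."""
    return (x >> (width - 1 - y)) & 1


def reachable_sums(k_p: int, tbit: int) -> set[int]:
    """Sums c_p reachable when embedding tbit with key k_p (mode = 1)."""
    m3_options = [i3 for i3 in range(3, 40, 5) if tap(k_p, (i3 - 3) // 5) == tbit]
    m3_options += [i4 for i4 in range(4, 40, 5) if tap(k_p, (i4 - 4) // 5) != tbit]
    return {a + b + c + d
            for a in range(0, 40, 5) for b in range(1, 40, 5)
            for c in range(2, 40, 5) for d in m3_options}


fixed = reachable_sums(0x00, tbit=0)
mixed = reachable_sums(0b10110010, tbit=0)   # hypothetical mixed-bit key
print(sorted({c % 5 for c in fixed}))        # [1]    -> only 6, 11, 16, ... occur
print(sorted({c % 5 for c in mixed}))        # [1, 2] -> sums from both sub-tracks occur
```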

[0073] FIG. 15 shows the frequency of occurrence of the sum c_p of the four pulse positions when the key k_p is varied. Compared with FIG. 14, varying the key k_p spreads out and reduces the statistical bias. The key variation of the present invention described above in "III. Method for Improving Confidentiality" is therefore considered an effective way to eliminate the statistical bias caused by embedding.

[0074] The embodiment described above provides a method for secretly embedding watermark information in the speech code of the 8 kbit/s CS-ACELP coder of Recommendation G.729. Experiments with this method also confirmed that speech carrying embedded watermark information can be reproduced without giving listeners any sense of incongruity.

[0075] Digital speech is generally vulnerable to code errors: reproducing speech codes that contain errors without correction greatly degrades the sound quality. Speech codes produced with the method of the present invention are therefore also expected to be strongly affected when code errors occur. Random code errors could, however, be handled, for example by applying error-correction techniques to these speech codes.

[0076] The method of the present invention can also be used to transmit private information that the sender wants to share only with a specific party, by secretly embedding it in a speech code. Example applications are described in:
- Steele et al., "Simultaneous transmission of speech and data using code-breaking techniques," The Bell System Technical Journal, Vol. 60, No. 9, pp. 2081-2105 (1981);
- Wong et al., "Transmitting data on the phase of speech signals," The Bell System Technical Journal, Vol. 61, No. 10, pp. 2947-2970 (1982).
In such uses, the information can be transmitted without a third party who tries to obtain it by illicit means even knowing that a secret message exists.

[0077] As described above, a watermark embedded by the method of the present invention is very unlikely to be detected by a third party. The method can therefore also be applied to embedding digital watermarks for the copyright protection of audio software, and to the secret transmission of document data.

[0078]

[Effects of the Invention] The watermark bit embedding method of the present invention focuses on the structure of the multi-pulse excitation used when encoding digital speech data, and embeds bit-sequence data during the synthesis process. For Recommendation G.729, it thus provides a way to embed a digital watermark in the compressed speech code. This method differs from the one shown by Iwakiri et al. for Recommendation G.726.

[0079] Selecting the speech codes to be embedded in an unpredictable manner also makes it difficult for a third party to identify the speech codes that contain watermark information, which reduces the chance that the key will be analyzed. Changing the embedding rule further reduces that chance even when the same key is used for a long time. The invention therefore provides a simple way of concealing the existence of the embedded digital watermark.

[Brief description of the drawings]

【図１】勧告Ｇ．７２９の基本的な符号器の構成を示す
ブロック図である。FIG. 729 is a block diagram illustrating the configuration of a basic encoder of FIG.

【図２】勧告Ｇ．７２９の固定符号帳を示す図表であ
る。FIG. 729 is a chart showing a fixed codebook of G.729.

【図３】４番目のパルス情報のパルス位置ｍ_３の置換を
示す図である。3 is a diagram showing a fourth substitution pulse position m ₃ of pulse information.

【図４】鍵ｋ_ｐによる４番目のパルス情報のパルス位置
ｍ_３の分類を示す図である。4 is a diagram showing the classification of pulse position m ₃ of the fourth pulse information by the key k _p.

【図５】４つのパルス位置の合計値ｃ_ｐが取り得る値を
示す図表である。5 is a table showing the four sum c _p possible values of the pulse position.

【図６】４つのパルス位置の合計値ｃ_ｐの出現頻度を示
す図である。6 is a diagram showing the frequency of occurrence of the sum c _p of four pulse positions.

【図７】（ａ）は送信装置の制御を示す図であり、
（ｂ）は受信装置の制御を示す図である。FIG. 7A is a diagram illustrating control of a transmission device;
(B) is a figure which shows control of a receiver.

【図８】フィードバックした出力音声符号の４つのパル
ス位置の合計値に対応する鍵のビット値ｃ_ｐｂの抽出処
理を示す図である。FIG. 8 is a diagram illustrating a process of extracting a key bit value _cpb corresponding to a total value of four pulse positions of a feedback output speech code.

FIG. 9 is an operation flowchart of the encoding procedure.

FIG. 10 is an operation flowchart of the decoding procedure.

FIG. 11 is a table of the Japanese and English test utterances spoken by male and female speakers.

FIG. 12(a) is a diagram showing a speech waveform without watermark embedding. FIG. 12(b) is a diagram showing a speech waveform with watermark embedding. FIG. 12(c) is a diagram showing the waveform of the difference between (a) and (b).

FIG. 13 is a table showing the mean opinion score (MOS) of the reproduced speech quality.

FIG. 14 is a diagram showing the frequency of occurrence of the sum c_p of the four pulse positions when the key k_p is fixed.

FIG. 15 is a diagram showing the frequency of occurrence of the sum c_p of the four pulse positions when the key k_p is varied.

[Explanation of symbols]

1, 21: input speech; 2: preprocessing unit; 3: linear prediction analysis unit; 4: linear prediction synthesis filter; 5, 17: adders; 6: perceptual (auditory) weighting filter; 7: linear prediction coefficients; 8: fixed codebook search; 9: pitch analysis; 10: parameter encoding; 11: transmitted bit stream; 12: gain quantization; 13: adaptive codebook; 14: fixed codebook; 15: adaptive codebook gain (pitch gain); 16: fixed codebook gain; 22: G.729 encoding device; 23: output code; 24, 27: control devices; 25: embedded-code selection signal; 26: decoding device; 28: embedded-code search signal; 29: output speech; m_0 to m_3: pulse positions; c_p: sum of the four pulse positions; c_pb: key bit value corresponding to the sum of the four pulse positions of the fed-back output speech code; s_0 to s_3: polarity information.

Claims

[Claims]

1. A watermark bit embedding method for speech encoding, for embedding watermark bits when speech is digitally encoded and transmitted using at least a fixed codebook, characterized in that:
- "1" or "0" is assigned to each of a plurality of adjacent pulse position candidates in the fixed codebook;
- a first key is determined that selects a pulse position candidate according to "1" or "0"; and
- a pulse position selected by the first key is used at the bit position where a watermark is embedded in the transmitted speech code.
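The labeling step of claim 1 can be illustrated with the G.729 fixed codebook. In that codebook the fourth pulse m_3 has 16 candidate positions: 3, 8, ..., 38 and 4, 9, ..., 39. The sketch below is minimal and rests on stated assumptions:
- The hypothetical key k_p gives the adjacent candidates 5j+3 and 5j+4 opposite labels.
- A mismatching m_3 is therefore moved by one sample to its partner.
- The names `make_key`, `embed_bit` and `extract_bit` are illustrative and do not come from the patent.

```python
import random

# Track of the fourth pulse m_3 in the G.729 fixed codebook:
# positions 3, 8, ..., 38 and 4, 9, ..., 39 (16 candidates).
M3_CANDIDATES = sorted(list(range(3, 40, 5)) + list(range(4, 40, 5)))


def make_key(seed):
    """Hypothetical key k_p: label each candidate of m_3 with 0 or 1.

    Assumption: the adjacent candidates 5j+3 and 5j+4 always get opposite
    labels, so either bit is reachable by moving m_3 at most one sample.
    """
    rng = random.Random(seed)
    key = {}
    for base in range(3, 40, 5):
        b = rng.randint(0, 1)
        key[base], key[base + 1] = b, 1 - b
    return key


def embed_bit(m3, bit, key):
    """Return a position for m_3 whose label under k_p equals the watermark bit."""
    if key[m3] == bit:
        return m3  # the searched position already carries the bit
    # Otherwise substitute the adjacent candidate, which carries the other label.
    return m3 + 1 if m3 % 5 == 3 else m3 - 1


def extract_bit(m3, key):
    """Receiver side: the watermark bit is the label of the received m_3."""
    return key[m3]
```

Swapping within a pair keeps the excitation change to a one-sample shift of a single pulse. The abstract's wording, selecting m_3 from the candidates labeled by k_p, also admits a different approach: restricting the codebook search itself to those candidates.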

2. The watermark bit embedding method for speech encoding according to claim 1, characterized in that:
- "1" or "0" is assigned to each possible value of the sum of a predetermined number of the pulse position candidates in the fixed codebook;
- a second key is determined that selects whether or not to embed the watermark according to whether the sum corresponds to "1" or "0"; and
- the watermark is embedded by mapping the sum obtained by feedback from the output speech code onto the second key.
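The feedback selection of claim 2 can be sketched as a transmitter/receiver pair. This is a minimal sketch under stated assumptions:
- k_p uses the same adjacent-pair labeling as above.
- The hypothetical `second_key` holds one bit c_pb for every possible sum c_p. On the G.729 tracks, c_p ranges from 6 to 147.
- The decision for each subframe is taken from the sum of the previously output code. The receiver sees that same code and can repeat the decision.
- The input `searched` stands in for the pulse positions found by the ordinary codebook search.
- All function names are illustrative.

```python
import random

# Possible values of c_p = m0 + m1 + m2 + m3 on the G.729 tracks:
# 0 + 1 + 2 + 3 = 6 up to 35 + 36 + 37 + 39 = 147.
C_MIN, C_MAX = 6, 147


def make_keys(seed):
    """Hypothetical keys: k_p labels m_3 candidates, second_key maps each c_p to a bit c_pb."""
    rng = random.Random(seed)
    k_p = {}
    for base in range(3, 40, 5):  # adjacent pair (5j+3, 5j+4) gets opposite labels
        b = rng.randint(0, 1)
        k_p[base], k_p[base + 1] = b, 1 - b
    second_key = {c: rng.randint(0, 1) for c in range(C_MIN, C_MAX + 1)}
    return k_p, second_key


def selected(prev_code, second_key):
    """Embed in this subframe only if the fed-back code's sum maps to 1."""
    return prev_code is not None and second_key[sum(prev_code)] == 1


def encode(searched, bits, k_p, second_key):
    """searched: (m0, m1, m2, m3) per subframe from the ordinary codebook search."""
    out, pending, prev = [], list(bits), None
    for m0, m1, m2, m3 in searched:
        if selected(prev, second_key) and pending:
            bit = pending.pop(0)
            if k_p[m3] != bit:
                m3 = m3 + 1 if m3 % 5 == 3 else m3 - 1  # move to the partner candidate
        prev = (m0, m1, m2, m3)  # feedback: the code actually output
        out.append(prev)
    return out


def decode(codes, n_bits, k_p, second_key):
    """Receiver: repeat the same selection on the received codes and read k_p[m3]."""
    bits, prev = [], None
    for code in codes:
        if selected(prev, second_key) and len(bits) < n_bits:
            bits.append(k_p[code[3]])
        prev = code
    return bits
```

The selection is driven by the code that was actually transmitted. As a result, the receiver needs no side information to find the embedded subframes, and an observer without `second_key` cannot tell which subframes carry bits.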

3. The watermark bit embedding method for speech encoding according to claim 2, characterized in that:
- a third key is determined whose assignment of "1" and "0" is the reverse of that in the first key;
- it is detected whether the sum of the predetermined number of pulse position candidates is even or odd;
- the first key and the third key are associated one each with the even and odd values of the sum, so that both parities are never associated with the same key; and
- a pulse position selected by the first key or the third key is used as the bit position at which the watermark is embedded in the speech code.
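Claim 3 adds a third key, the inverse of the first, chosen by the parity of the pulse-position sum. The sketch below makes two assumptions:
- The parity comes from the same fed-back sum c_p used in claim 2.
- `even_uses_first` is a hypothetical switch deciding which parity gets the first key.

The adjacent-pair labeling and all function names are likewise illustrative.

```python
import random


def make_first_key(seed):
    """Hypothetical first key k_p with opposite labels on each adjacent pair of m_3 candidates."""
    rng = random.Random(seed)
    k1 = {}
    for base in range(3, 40, 5):
        b = rng.randint(0, 1)
        k1[base], k1[base + 1] = b, 1 - b
    return k1


def third_key(k1):
    """Third key: the reverse assignment of '1' and '0' in the first key."""
    return {pos: 1 - b for pos, b in k1.items()}


def key_for(c_p, k1, k3, even_uses_first=True):
    """Pick k1 or k3 by the parity of c_p; the two parities never share a key."""
    is_even = c_p % 2 == 0
    return k1 if is_even == even_uses_first else k3


def embed_bit(m3, bit, key):
    """Move m_3 to its adjacent partner if its label under `key` differs from `bit`."""
    if key[m3] == bit:
        return m3
    return m3 + 1 if m3 % 5 == 3 else m3 - 1
```

Under this scheme the same m_3 decodes to opposite bits depending on whether c_p is even or odd. This is one way the embedding rule can keep changing while the stored key stays fixed.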