JP4736794B2 - Digital watermark encoding apparatus, digital watermark decoding apparatus, digital watermark encoding method, digital watermark decoding method, and program

Info

Publication number: JP4736794B2
Application number: JP2005372652A
Authority: JP
Inventors: 邦博須賀
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2005-12-26
Filing date: 2005-12-26
Publication date: 2011-07-27
Anticipated expiration: 2025-12-26
Also published as: JP2007171834A

Abstract

PROBLEM TO BE SOLVED: To provide a digital watermark encoding apparatus and related devices that effectively protect audio data while allowing data representing speech to be supplied freely. SOLUTION: An audio data input unit E1 of the digital watermark encoding apparatus E acquires audio data representing speech, and a pitch specifying unit E2 specifies the frequency of the pitch component of that speech. A BPF E3 extracts the pitch component lying near that frequency, and a pitch shift unit E4 shifts its frequency. A switching unit E5 processes the frequency-shifted pitch component so that it has an intermittent (on/off) pattern representing the value of the data to be embedded as a digital watermark, and an adding unit E6 adds it to the original audio data. The digital watermark decoding apparatus acquires the watermarked audio data and extracts the pitch component in the same way as the encoding apparatus E, and a bit string extraction unit specifies the embedded data from the pattern in which the pitch component is multiplexed. COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a digital watermark encoding apparatus, a digital watermark decoding apparatus, a digital watermark encoding method, a digital watermark decoding method, and a program.

One technique for synthesizing speech is the so-called recording-and-editing method, which is used in station voice guidance systems, in-vehicle navigation devices, and the like.
In the recording-and-editing method, each word is associated in advance with audio data representing speech that reads the word aloud. The text to be synthesized is divided into words, and the audio data associated with those words is retrieved and concatenated (see, for example, Patent Document 1).
Patent Document 1: JP 10-49193 A

One conceivable way to allow the speaker of the synthesized speech obtained by the recording-and-editing method to be changed, or otherwise to diversify the synthesized speech, is to record the audio data on removable media (portable recording media) and to swap between multiple removable media holding different audio data as needed. However, audio data recorded on removable media is vulnerable to unauthorized copying, tampering, and other unauthorized use.

The present invention has been made in view of the above circumstances. Its object is to provide a digital watermark encoding apparatus, a digital watermark decoding apparatus, a digital watermark encoding method, a digital watermark decoding method, and a program that effectively protect audio data while allowing data representing speech to be supplied freely.

To achieve the above object, a digital watermark encoding apparatus according to a first aspect of the present invention comprises:
audio signal acquisition means for acquiring an audio signal representing speech; and
audio signal processing means for generating a digitally watermarked audio signal by processing the acquired audio signal so that the distribution, on the time axis, of the portions multiplexed by superimposing on the acquired audio signal a pitch component obtained by shifting the frequency of the pitch component of the speech represented by that audio signal represents the value of the data to be embedded as a digital watermark.

The audio signal processing means may comprise, for example:
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal;
pitch component frequency shifting means for shifting the frequency of the pitch component signal;
pitch component processing means for acquiring the frequency-shifted pitch component signal and processing it so that it has, on the time axis, an intermittent (on/off) pattern representing the value of the data to be embedded as a digital watermark; and
adding means for generating the digitally watermarked audio signal by adding the processed pitch component signal and the audio signal together.

The pitch component extraction means may comprise, for example:
pitch frequency specifying means for specifying the frequency of the pitch component of the speech represented by the audio signal; and
a band-pass filter that extracts, as the pitch component, the component of the audio signal in the vicinity of the specified frequency.

A digital watermark decoding apparatus according to a second aspect of the present invention comprises:
audio signal acquisition means for acquiring a digitally watermarked audio signal representing speech;
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
specifying means for specifying the pattern of the distribution, on the time axis, of those portions of the extracted pitch component signal in which the pitch component of the speech is multiplexed with a superimposed pitch component obtained by shifting the frequency of that pitch component, and for specifying, based on the specified pattern, the data embedded in the audio signal as a digital watermark.

A digital watermark encoding method according to a third aspect of the present invention is executed in an apparatus having an audio signal acquisition unit and an audio signal processing unit, and comprises:
an audio signal acquisition step in which the audio signal acquisition unit acquires an audio signal representing speech; and
an audio signal processing step in which the audio signal processing unit generates a digitally watermarked audio signal by processing the acquired audio signal so that the distribution, on the time axis, of the portions multiplexed by superimposing on the acquired audio signal a pitch component obtained by shifting the frequency of the pitch component of the speech represented by that audio signal represents the value of the data to be embedded as a digital watermark.

A digital watermark decoding method according to a fourth aspect of the present invention is executed in an apparatus having an audio signal acquisition unit, a pitch component extraction unit, and a specifying unit, and comprises:
an audio signal acquisition step in which the audio signal acquisition unit acquires a digitally watermarked audio signal representing speech;
a pitch component extraction step in which the pitch component extraction unit extracts, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
a specifying step in which the specifying unit specifies the pattern of the distribution, on the time axis, of those portions of the extracted pitch component signal in which the pitch component of the speech is multiplexed with a superimposed pitch component obtained by shifting the frequency of that pitch component, and specifies, based on the specified pattern, the data embedded in the audio signal as a digital watermark.

A program according to a fifth aspect of the present invention causes a computer to function as:
audio signal acquisition means for acquiring an audio signal representing speech; and
audio signal processing means for generating a digitally watermarked audio signal by processing the acquired audio signal so that the distribution, on the time axis, of the portions multiplexed by superimposing on the acquired audio signal a pitch component obtained by shifting the frequency of the pitch component of the speech represented by that audio signal represents the value of the data to be embedded as a digital watermark.

A program according to a sixth aspect of the present invention causes a computer to function as:
audio signal acquisition means for acquiring a digitally watermarked audio signal representing speech;
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
means for specifying the pattern of the distribution, on the time axis, of those portions of the extracted pitch component signal in which the pitch component of the speech is multiplexed with a superimposed pitch component obtained by shifting the frequency of that pitch component, and for specifying, based on the specified pattern, the data embedded in the audio signal as a digital watermark.

According to the present invention, a digital watermark encoding apparatus, a digital watermark decoding apparatus, a digital watermark encoding method, a digital watermark decoding method, and a program are realized that effectively protect audio data while allowing data representing speech to be supplied freely.

Embodiments of the present invention are described below with reference to the drawings, taking a digital watermark encoding apparatus and a digital watermark decoding apparatus as examples.

(Digital watermark encoding apparatus)
FIG. 1 is a diagram showing the configuration of a digital watermark encoding apparatus E according to an embodiment of the present invention. As shown in the figure, the digital watermark encoding apparatus E comprises an audio data input unit E1, a pitch specifying unit E2, a BPF (band-pass filter) E3, a pitch shift unit E4, a switching unit E5, and an adding unit E6.

The audio data input unit E1, the pitch specifying unit E2, the BPF E3, the pitch shift unit E4, the switching unit E5, and the adding unit E6 are each configured from, for example, a processor such as a CPU (central processing unit) and a memory that stores the program the processor executes.
A single processor may perform some or all of the functions of the audio data input unit E1, the pitch specifying unit E2, the BPF E3, the pitch shift unit E4, the switching unit E5, and the adding unit E6.

The audio data input unit E1 acquires, from an external source, input audio data in a digital format (for example, PCM (pulse code modulation) format) representing a speech waveform, and supplies it to the pitch specifying unit E2, the BPF E3, and the adding unit E6.

The audio data input unit E1 may acquire the input audio data by any method. For example, it may acquire the data from an external device (such as a hard disk device) or from a network via an interface circuit (not shown), or it may read the data from a recording medium (such as a flexible disk or CD-ROM) set in a recording medium drive device (not shown) via that drive device.

The pitch specifying unit E2 specifies the time variation of the frequency of the pitch component of the speech represented by the input audio data supplied from the audio data input unit E1, and continuously supplies data representing the specified time variation (hereinafter called pitch frequency data) to the BPF E3.

Specifically, the pitch specifying unit E2 may specify the time variation of the pitch component frequency by, for example, applying cepstrum analysis to the input audio data. That is, the waveform represented by the input audio data is divided into many short segments on the time axis; the intensity of each segment is converted to a value substantially equal to the logarithm of its original value (the base of the logarithm is arbitrary); and the spectrum of the converted segment (that is, the cepstrum) is obtained by a fast Fourier transform (or by any other technique that generates data representing the Fourier transform of a discrete variable). The lowest of the frequencies at which this cepstrum has a local maximum is then specified as the frequency of the pitch component in that segment.
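The cepstrum-based pitch estimate described above can be illustrated with a minimal sketch. It uses the conventional real cepstrum (inverse transform of the log-magnitude spectrum) and simply picks the strongest cepstral peak inside an assumed pitch range, which is a simplification of the selection rule in the text. The function names, the Hann window, the 50 to 500 Hz search range, and the log floor are all assumptions made for illustration, not details taken from the patent.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_pitch_hz(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the pitch of one frame from the peak of its real cepstrum."""
    n = len(frame)
    # Hann window (an assumption; the source does not name a window).
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    # Log-magnitude spectrum; the small floor avoids log(0).
    log_mag = [math.log(abs(v) + 1e-9) for v in fft(windowed)]
    # log_mag is real and even, so its inverse FFT equals its FFT divided by n.
    cepstrum = [v.real / n for v in fft(log_mag)]
    # Search only quefrencies (lags in samples) inside the assumed pitch range.
    q_lo = int(sample_rate / fmax)
    q_hi = int(sample_rate / fmin)
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / q_peak
```

Calling `estimate_pitch_hz` on successive short frames would yield a pitch track over time, which plays the role of the pitch frequency data; the frame length must be a power of two and longer than twice the largest lag searched.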

Good results can be expected if the time variation of the pitch component frequency is specified from pitch waveform data obtained by first converting the input audio data according to, for example, the technique disclosed in JP 2003-108172 A. Specifically, the input audio data is filtered to extract a pitch signal; based on the extracted pitch signal, the waveform represented by the input audio data is divided into sections of unit pitch length; and for each section the phase offset is identified from its correlation with the pitch signal and the phases of the sections are aligned, thereby converting the input audio data into a pitch waveform signal. The resulting pitch waveform signal is then treated as the input audio data and subjected to cepstrum analysis or the like to specify the time variation of the pitch component frequency.

The BPF E3 filters the signal input to it so as to pass the components at and near its own center frequency and substantially block all other components.
Specifically, it sets its center frequency to be substantially equal to the value indicated by the pitch frequency data supplied from the pitch specifying unit E2 (that is, the current value of the frequency of the pitch component of the input audio data). It then filters the input audio data supplied from the audio data input unit E1 and supplies the filtered input audio data (hereinafter called the pitch component) to the pitch shift unit E4. The pitch component is always in the same data format, for example digital data with a sampling interval substantially identical to that of the input audio data.
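As a rough illustration of a band-pass filter centered on the pitch frequency, the following sketch implements a second-order (biquad) band-pass section with unity gain at its center frequency. The filter topology, the function name `bandpass`, and the default quality factor `q=5.0` are assumptions; the source does not specify how the BPF is realized. The center frequency is fixed per call here, so tracking a time-varying pitch would require recomputing the coefficients as the pitch frequency data changes.

```python
import math


def bandpass(samples, sample_rate, center_hz, q=5.0):
    """Second-order band-pass (biquad) with unity gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Normalized coefficients; b1 is zero for this filter type.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct-form I difference equation.
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A higher `q` narrows the passband around the pitch frequency at the cost of a longer settling time.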

The pitch shift unit E4 acquires the pitch component from the BPF E3, shifts the frequency of this pitch component by a fixed amount (shown as "Δf" in FIG. 2), as illustrated in FIG. 2, and supplies the result to the switching unit E5. Specifically, the pitch shift unit E4 may perform the shift by, for example, generating a local oscillation signal with a frequency equal to Δf, mixing this local oscillation signal with the pitch component, and filtering the resulting signal to extract the component whose frequency equals the sum of, or the difference between, the frequency of the local oscillation signal and the frequency of the pitch component.
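The reason mixing produces sum and difference frequencies can be seen from the product-to-sum identity. In the worked equation below, the pitch component is idealized as a single cosine of frequency f_p and the local oscillation signal as a cosine of frequency Δf; the symbol f_p and the unit amplitudes are assumptions introduced only for this illustration.

```latex
\cos(2\pi f_p t)\,\cos(2\pi\,\Delta f\, t)
  = \tfrac{1}{2}\cos\bigl(2\pi (f_p + \Delta f)\, t\bigr)
  + \tfrac{1}{2}\cos\bigl(2\pi (f_p - \Delta f)\, t\bigr)
```

Filtering the mixed signal so that only one of the two terms on the right remains yields a component shifted by +Δf or by -Δf relative to the original pitch frequency.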

The switching unit E5 provides the path along which the frequency-shifted pitch component passes from the pitch shift unit E4 to the adding unit E6. It also acquires, from an external source, the binary bit string to be embedded as a digital watermark (for example, data representing the name of the creator of the input audio data as a binary bit string). By opening and closing the path according to the value of each bit in this bit string, it encodes the pitch component so that it carries the information represented by the bit string.

Specifically, as shown for example in FIG. 3, the switching unit E5 may examine the value of each bit in order from the head of the bit string: if the bit is "1", it passes the pitch component to the adding unit E6 for a fixed time (shown as "ΔT" in FIG. 2), and if the bit is "0", it blocks the pitch component for that same time ΔT. It repeats this operation until it reaches the last bit of the bit string.
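The gating behavior described above can be sketched in a few lines. Here ΔT is expressed as a whole number of samples, and the function name `gate_by_bits` and its parameters are hypothetical names chosen for illustration.

```python
def gate_by_bits(component, bits, samples_per_bit):
    """Pass the component during '1' bits and mute it during '0' bits."""
    out = []
    for i, bit in enumerate(bits):
        start = i * samples_per_bit
        # Slice one interval of length samples_per_bit (may be short at the end).
        chunk = component[start:start + samples_per_bit]
        out.extend(chunk if bit == 1 else [0.0] * len(chunk))
    return out
```

For example, with `samples_per_bit=2` and bits `[1, 0, 1]`, the third and fourth samples of the component are replaced by zeros while the rest pass unchanged.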

The switching unit E5 may acquire the bit string by any method; for example, it may do so by substantially the same method the audio data input unit E1 uses to acquire the input audio data. Alternatively, the processor that performs the processing implementing the switching unit E5 may hand a bit string generated by another process it executes to that processing as the bit string to be embedded as a digital watermark.

The adding unit E6 acquires the input audio data supplied from the audio data input unit E1 and, while the frequency-shifted pitch component is being passed to it through the switching unit E5, acquires this pitch component as well. It then generates and outputs output audio data whose value corresponds to the sum of the simultaneously acquired values of the input audio data and the pitch component (the value of the pitch component being taken as 0 while no pitch component is supplied). The output audio data is in the same format as the input audio data acquired by the audio data input unit E1.
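A minimal sketch of the addition step, assuming both signals are lists of floating-point samples at the same sampling rate; the function name `add_watermark` is hypothetical. A real PCM implementation would also need to guard against overflow when the sum exceeds the sample range, which this sketch does not do.

```python
def add_watermark(input_audio, gated_component):
    """Sample-wise sum; a missing component sample counts as 0."""
    out = []
    for i, sample in enumerate(input_audio):
        # Treat the component as 0 wherever none is supplied.
        extra = gated_component[i] if i < len(gated_component) else 0.0
        out.append(sample + extra)
    return out
```

The output always has the same length as the input audio, matching the statement that the output data has the same format as the input data.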

The output audio data corresponds to the input audio data acquired by the audio data input unit E1 with a digital watermark applied, and can be said to have embedded in it the information represented by the bit string acquired by the switching unit E5.
Specifically, suppose for example that the output audio data is generated by extracting the pitch component of the input audio data with the BPF E3, processing it as shown in FIGS. 2 and 3, and adding it to the original input audio data in the adding unit E6. The time variation of the pitch components contained in this output audio data is then as shown in FIG. 4. That is, in sections where a bit of value "0" is embedded, the output audio data is substantially identical to the input audio data, whereas in sections where a bit of value "1" is embedded, a second pitch component whose frequency is shifted by Δf from the original pitch component is superimposed on the input audio data, so that the pitch component is multiplexed.

The adding unit E6 may output the output audio data by any method. For example, it may store the data in a storage area of an external device (such as a hard disk device) via an interface circuit (not shown), or send it out to a network. It may also write the data to a recording medium set in a recording medium drive device (not shown) via that drive device.

(Digital watermark decoding apparatus)
FIG. 5 is a diagram showing the configuration of a digital watermark decoding apparatus D according to the embodiment of the present invention. As shown in the figure, the digital watermark decoding apparatus D comprises an audio data input unit D1, a pitch specifying unit D2, a BPF D3, and a bit string extraction unit D4.

The audio data input unit D1, the pitch specifying unit D2, the BPF D3, and the bit string extraction unit D4 are each configured from, for example, a processor such as a CPU and a memory that stores the program the processor executes. A single processor may perform some or all of the functions of the audio data input unit D1, the pitch specifying unit D2, the BPF D3, and the bit string extraction unit D4.

The audio data input unit D1 acquires, from an external source, digitally watermarked audio data in digital format corresponding to the output audio data described above (hereinafter called the decoding target audio data), and supplies it to the pitch specifying unit D2 and the BPF D3. The audio data input unit D1 may acquire the decoding target audio data by any method; for example, it may use substantially the same method by which the audio data input unit E1 of the digital watermark encoding apparatus E acquires the input audio data.

The pitch specifying unit D2 specifies the time variation of the frequency of the pitch component of the speech represented by the decoding target audio data supplied from the audio data input unit D1, using, for example, substantially the same technique as the pitch specifying unit E2 of the digital watermark encoding apparatus E. It then continuously supplies pitch frequency data representing the specified time variation to the BPF D3.

For the portions of the pitch frequency data that correspond to sections of the decoding target audio data containing no pitch component, the pitch specifying unit D2 may interpolate the values by a technique such as Lagrange interpolation or linear interpolation.
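The linear-interpolation option can be sketched as follows, assuming the pitch frequency data is a list with `None` marking frames where no pitch component was found. The function name `fill_gaps`, the `None` convention, and the choice to hold the nearest known value at the two ends of the track are assumptions for illustration; the source only states that interpolation may be used.

```python
def fill_gaps(pitch_track):
    """Linearly interpolate None entries between known pitch values."""
    known = [i for i, v in enumerate(pitch_track) if v is not None]
    if not known:
        return list(pitch_track)
    out = list(pitch_track)
    # Hold the nearest known value before the first and after the last one.
    for i in range(known[0]):
        out[i] = pitch_track[known[0]]
    for i in range(known[-1] + 1, len(out)):
        out[i] = pitch_track[known[-1]]
    # Interpolate inside each gap bounded by two known values.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            out[i] = (1 - w) * pitch_track[left] + w * pitch_track[right]
    return out
```

With this filled-in track, a band-pass filter can keep a sensible center frequency even across unvoiced or silent sections.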

The BPF D3 performs substantially the same function as, for example, the BPF E3 of the digital watermark encoding apparatus E: it filters the signal input to it so as to pass the components at and near its own center frequency and substantially block all other components.
It then supplies the filtered decoding target audio data (hereinafter called the decoding pitch component) to the bit string extraction unit D4.

The bit string extraction unit D4 acquires the decoding pitch component supplied from the BPF D3 and, based on the pattern of frequency shifts in this decoding pitch component, extracts the bit string embedded in the decoding target audio data. It then outputs the extracted bit string to the outside.

Specifically, when the time variation of the pitch component frequency of the decoding target audio data is as shown in FIG. 4, for example, the bit string extraction unit D4 may extract the bit string by examining the decoding pitch component from its head in successive intervals of length ΔT and determining whether the pitch component is multiplexed in each interval, generating a bit of value "1" when it determines that the pitch component is multiplexed and a bit of value "0" when it determines that it is not.
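One way to decide whether a shifted pitch component is present in each ΔT interval is to measure the signal power at the expected shifted frequency and compare it with a threshold. The sketch below does this with a single-bin DFT. It assumes a fixed, known shifted frequency and omits the band-pass filtering stage, both simplifications; the function names, the `threshold` parameter, and the detection rule itself are assumptions, since the source does not specify how multiplexing is detected.

```python
import math


def tone_power(segment, sample_rate, freq_hz):
    """Power of one frequency in a segment (single-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(segment))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(segment))
    return (re * re + im * im) / (len(segment) ** 2)


def extract_bits(audio, sample_rate, shifted_hz, samples_per_bit, threshold):
    """Emit 1 for each full interval that contains the shifted component."""
    bits = []
    # A trailing partial interval, if any, is ignored.
    for start in range(0, len(audio) - samples_per_bit + 1, samples_per_bit):
        window = audio[start:start + samples_per_bit]
        power = tone_power(window, sample_rate, shifted_hz)
        bits.append(1 if power > threshold else 0)
    return bits
```

In a practical decoder the shifted frequency would follow the tracked pitch plus Δf rather than being constant, and the threshold would need to be chosen relative to the level of the embedded component.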

The bit string extraction unit D4 may output the bit string by any method. For example, it may store the bit string in a storage area of an external device via an interface circuit (not shown), or send it out to a network. It may also write the bit string to a recording medium set in a recording medium drive device (not shown) via that drive device.

The digital watermark encoding apparatus E described above extracts the pitch component from the input audio data, shifts its frequency, switches this pitch component on and off according to the values of the bit string to be embedded, and adds it to the original input audio data, thereby embedding the information represented by the bit string in the audio data (that is, applying a digital watermark to the audio data). The digital watermark decoding apparatus D then extracts from the audio data the bit string that the digital watermark encoding apparatus E embedded in it.

In general, the frequency of the pitch component of human speech is around 200 Hz. Although the human audible range is approximately 20 to 20,000 Hz, sensitivity to components around 200 Hz is low, and changes introduced into such components are difficult to hear. Consequently, it is difficult for a person to distinguish speech reproduced from audio data watermarked by the digital watermark encoding apparatus E from speech reproduced from the same audio data before watermarking. In other words, the effect of the watermark applied by the digital watermark encoding apparatus E on the speech is kept to a negligible level.

The digital watermark encoding apparatus E applies the digital watermark by multiplexing part of the pitch component of the audio data. Therefore, anyone who does not know which of the multiplexed pitch components is the original one cannot completely restore the pre-watermark audio data from the audio data watermarked by the encoding apparatus E.
The embedded bit string can also be extracted from data obtained by making an analog copy of audio data watermarked by the encoding apparatus E.

The configurations of the digital watermark encoding apparatus E and the digital watermark decoding apparatus D are not limited to those described above.
For example, the audio data input unit E1 may include a microphone, an amplifier, a sampling circuit, an A/D (Analog-to-Digital) converter, a PCM encoder, and the like. In this case, the audio data input unit E1 may create the audio data by amplifying the audio signal representing the sound picked up by its own microphone, sampling and A/D-converting it, and then applying PCM modulation to the sampled audio signal.
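As a rough illustration of the last stage of such an input path, the following Python sketch quantizes already-sampled values into PCM bytes. The 16-bit signed little-endian format, the [-1.0, 1.0] input range, and the function names are assumptions for illustration only; the description does not specify a PCM word length or byte order.

```python
import struct


def pcm16_encode(samples):
    """Quantize floats in [-1.0, 1.0] to signed 16-bit little-endian PCM bytes."""
    # Out-of-range inputs are clipped to the 16-bit limits (assumed behaviour).
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def pcm16_decode(data):
    """Convert signed 16-bit little-endian PCM bytes back to floats."""
    count = len(data) // 2
    return [v / 32767 for v in struct.unpack("<%dh" % count, data)]
```

Decoding returns values that differ from the inputs by at most the quantization step, which is the usual trade-off of a fixed word length.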

The embodiment of the present invention has been described above. The digital watermark encoding apparatus and the digital watermark decoding apparatus according to the present invention can be realized with an ordinary computer system rather than a dedicated system.

For example, the digital watermark encoding apparatus E that executes the above-described processing can be configured by installing, on a personal computer, a program for causing it to execute the operations of the audio data input unit E1, the pitch identification unit E2, the BPF E3, the pitch shift unit E4, the switching unit E5, and the addition unit E6 described above, from a recording medium (CD-ROM, flexible disk, etc.) that stores the program.

Likewise, the digital watermark decoding apparatus D that executes the above-described processing can be configured by installing, on a personal computer, a program for causing it to execute the operations of the audio data input unit D1, the pitch identification unit D2, the BPF D3, and the bit string extraction unit D4 described above, from a recording medium (CD-ROM, flexible disk, etc.) that stores the program.

A program that causes a personal computer to perform the functions of the digital watermark encoding apparatus E or the digital watermark decoding apparatus D may, for example, be uploaded to a bulletin board system (BBS) on a communication line and distributed over that line. Alternatively, a carrier wave may be modulated with a signal representing the program, the resulting modulated wave may be transmitted, and the device that receives the modulated wave may demodulate it to restore the program.
The above-described processing can then be executed by starting the program and running it under the control of the OS in the same manner as any other application program.

When the OS takes on part of the processing, or when the OS forms part of one component of the present invention, the recording medium may store the program with that part omitted. In this case too, the recording medium is regarded in the present invention as storing a program for executing each function or step that the computer executes.

A block diagram showing the configuration of the digital watermark encoding apparatus according to the embodiment of the present invention.
A graph showing how the frequency of the pitch component is shifted.
A graph showing how the frequency-shifted pitch component is switched on and off.
A graph showing the change over time of the frequency of the pitch component of the output audio data.
A block diagram showing the configuration of the digital watermark decoding apparatus according to the embodiment of the present invention.

Explanation of symbols

E: Digital watermark encoding apparatus
E1: Audio data input unit
E2: Pitch identification unit
E3: BPF
E4: Pitch shift unit
E5: Switching unit
E6: Addition unit
D: Digital watermark decoding apparatus
D1: Audio data input unit
D2: Pitch identification unit
D3: BPF
D4: Bit string extraction unit

Claims

A digital watermark encoding apparatus comprising:
audio signal acquisition means for acquiring an audio signal representing speech; and
audio signal processing means for generating a digitally watermarked audio signal by processing the acquired audio signal such that the distribution, on the time axis, of the portions multiplexed by superimposing on the acquired audio signal a pitch component obtained by shifting the frequency of the pitch component of the speech represented by that audio signal represents the value of the data to be embedded as a digital watermark.

The digital watermark encoding apparatus according to claim 1, wherein the audio signal processing means comprises:
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal;
pitch component frequency shifting means for shifting the frequency of the pitch component signal;
pitch component processing means for acquiring the frequency-shifted pitch component signal and processing it so that it has an intermittent pattern on the time axis representing the value of the data to be embedded as a digital watermark; and
adding means for generating the digitally watermarked audio signal by adding the processed pitch component signal and the audio signal together.

The digital watermark encoding apparatus according to claim 2, wherein the pitch component extraction means comprises:
pitch frequency specifying means for specifying the frequency of the pitch component of the speech represented by the audio signal; and
a band-pass filter that extracts, as the pitch component, the component of the audio signal in the vicinity of the specified frequency.

A digital watermark decoding apparatus comprising:
audio signal acquisition means for acquiring a digitally watermarked audio signal representing speech;
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
specifying means for specifying the pattern of the distribution, on the time axis, of the portions of the extracted pitch component signal that are multiplexed by superimposing the pitch component of the speech and a pitch component obtained by shifting the frequency of that pitch component, and for specifying, based on the specified pattern, the data embedded as a digital watermark in the audio signal.

A digital watermark encoding method executed in an apparatus having an audio signal acquisition unit and an audio signal processing unit, the method comprising:
an audio signal acquisition step in which the audio signal acquisition unit acquires an audio signal representing speech; and
an audio signal processing step in which the audio signal processing unit generates a digitally watermarked audio signal by processing the acquired audio signal such that the distribution, on the time axis, of the portions multiplexed by superimposing on the acquired audio signal a pitch component obtained by shifting the frequency of the pitch component of the speech represented by that audio signal represents the value of the data to be embedded as a digital watermark.

A digital watermark decoding method executed in an apparatus having an audio signal acquisition unit, a pitch component extraction unit, and a specifying unit, the method comprising:
an audio signal acquisition step in which the audio signal acquisition unit acquires a digitally watermarked audio signal representing speech;
a pitch component extraction step in which the pitch component extraction unit extracts, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
a specifying step in which the specifying unit specifies the pattern of the distribution, on the time axis, of the portions of the extracted pitch component signal that are multiplexed by superimposing the pitch component of the speech and a pitch component obtained by shifting the frequency of that pitch component, and specifies, based on the specified pattern, the data embedded as a digital watermark in the audio signal.

A program for causing a computer to function as:
audio signal acquisition means for acquiring an audio signal representing speech; and
audio signal processing means for generating a digitally watermarked audio signal by processing the acquired audio signal such that the distribution, on the time axis, of the portions multiplexed by superimposing on the acquired audio signal a pitch component obtained by shifting the frequency of the pitch component of the speech represented by that audio signal represents the value of the data to be embedded as a digital watermark.

A program for causing a computer to function as:
audio signal acquisition means for acquiring a digitally watermarked audio signal representing speech;
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
means for specifying the pattern of the distribution, on the time axis, of the portions of the extracted pitch component signal that are multiplexed by superimposing the pitch component of the speech and a pitch component obtained by shifting the frequency of that pitch component, and for specifying, based on the specified pattern, the data embedded as a digital watermark in the audio signal.