JP2013257378A

JP2013257378A - Device for embedding a disturbing sound in an acoustic signal

Info

Publication number: JP2013257378A
Application number: JP2012131980A
Authority: JP
Inventors: Toshio Modegi (茂出木 敏雄)
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2012-06-11
Filing date: 2012-06-11
Publication date: 2013-12-26
Anticipated expiration: 2032-06-11
Also published as: JP6078993B2

Abstract

PROBLEM TO BE SOLVED: To provide a device for embedding a disturbing sound in an acoustic signal so that, when the signal is recorded without permission with a sound recorder, the recorded copy cannot be played back in its original state. SOLUTION: A first acoustic frame of N samples is read from a broadband acoustic signal, and a second acoustic frame of M samples is read from a second acoustic signal (S2, S3); a first spectrum and a second spectrum are then obtained (S4, S5). Signal components of the first spectrum at frequency Ft and above are removed, and the components below Ft are sign-inverted, folded, and added to the high band (S6). Components of the second spectrum at Ft and above are removed, and the components below Ft are folded and combined with the components at the corresponding frequencies of the first spectrum (S7). After frequency-time conversion (S8), the modified acoustic frames of the two channels are corrected: in each group of four samples (two adjacent samples in each of the two channels), two sample values are changed so that, when corresponding samples of both channels are added, the two adjacent sums are equal (S9).
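The correction in step S9 can be illustrated with a minimal sketch. The abstract says only that two of the four samples in each group are changed; which two, and by how much, are assumptions here. This version (the function name `correct_pairs` is hypothetical) modifies only the left channel and splits the mismatch symmetrically, which leaves the mean of each summed pair unchanged.

```python
def correct_pairs(left, right):
    """Sketch of step S9: equalize the L+R sum within each adjacent pair.

    Assumption: the two modified samples of each four-sample group are the
    left channel's samples 2m and 2m+1; the right channel is left untouched.
    """
    if len(left) != len(right) or len(left) % 2 != 0:
        raise ValueError("channels must have the same, even length")
    new_left = list(left)
    for m in range(0, len(left), 2):
        s0 = left[m] + right[m]          # mono sum at sample 2m
        s1 = left[m + 1] + right[m + 1]  # mono sum at sample 2m+1
        d = (s1 - s0) / 2.0              # half the mismatch between the sums
        new_left[m] += d                 # s0 -> (s0 + s1) / 2
        new_left[m + 1] -= d             # s1 -> (s0 + s1) / 2
    return new_left, list(right)


left, right = correct_pairs([0.1, 0.3, -0.2, 0.0], [0.0, 0.1, 0.4, 0.2])
mono = [a + b for a, b in zip(left, right)]
print(mono)  # adjacent pairs are now equal, approximately 0.25, 0.25, 0.2, 0.2
```

Modifying a single channel keeps the sketch short; spreading the change over both channels would be an equally valid reading of "two samples in each set".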

Description

The present invention relates to the field of packaged music for listening in consumer and professional applications using CDs, DVDs, BDs, and the like, and to the field of network music distribution, in which music content providers distribute content for commercial purposes. In particular, it relates to technology for preventing copying of music content.

Various techniques have conventionally been developed to prevent duplication of music content. For example, DRM (Digital Rights Management; see Patent Document 1) prevents duplication by encrypting digital music content. Specifically, it makes it impossible to record from commercial DVD or BD media onto another recording medium such as a DVD-R with a recorder connected by a digital cable, or to insert the disc into a PC drive and rip it to a video file on the PC. For music CDs, however, DRM is currently not applied for various reasons, and digital pirated music files ripped from rental CDs in particular are uploaded to websites and circulated, which has become a problem. DRM can prevent copying of digital content but cannot prevent copying of analog content: content can still be copied by filming the display screen during playback with a video camera, or by recording the playback signal from the speaker output via a line input or a microphone.
At present, the biggest problem is that a small video camera is brought into a movie theater or hall and used to capture the image projected on the screen together with the soundtrack played through the speakers; DVDs are then produced from this recording in unlimited quantities and shipped as (pirated) products, and illegally copied video files circulate via video sites and P2P networks. Because recent consumer video cameras support HDTV and can record at picture quality comparable to BD, DVDs duplicated from such a master can easily reach commercial quality.

Fortunately, in Japan, thanks to the Act on the Prevention of Camcording of Movies (映画の盗撮の防止に関する法律), in force since 2007, screen-camcording incidents ended with the last in-the-act arrest in Aichi Prefecture in 2009. A new problem has emerged, however. Camcording still takes place overseas, and camcorded foreign films reach Japan via the Internet, but they cannot be sold in Japan as they are. To distribute them in Japan, they must be either subtitled or dubbed into Japanese, and demand for the latter is overwhelmingly greater. Consequently, the audio alone is secretly recorded in theaters screening the Japanese-dubbed version of an imported foreign film, dubbed onto the overseas camcorded video, and shipped. Cases have also been found for Japanese films: illegal copies of foreign-language versions of Japanese films that were exported and pirated overseas are brought back into Japan, the audio alone is secretly recorded in Japan, dubbed onto the overseas camcorded film, and shipped domestically. Unlike screen camcording, audio recording needs no tripod and can be done with a recorder hidden in a pocket, so detecting the offense in a dark theater is nearly impossible. In other words, camcording of video has been eradicated by legislation, but the law is powerless against covert audio recording. At present, including the illegal ripping of rental music CDs mentioned above, demand for audio content protection is far greater than demand for video content protection.

Patent Document 1: Published Japanese Translation of PCT Application No. 2003-517767 (特表2003-517767)
Patent Document 2: WO2011/002059

As a technique for preventing duplication of analog content, mainly as a countermeasure against the pirated-DVD production described above, adding an invisible copy-disturbing signal to the video signal has been proposed (see Patent Document 2). Because the method of Patent Document 2 uses infrared light as the copy-disturbing signal, the signal is invisible to humans but is captured by video cameras, which deters illegal copying. However, the copy-disturbing signal cannot be embedded in the content itself: it works only with screens or displays fitted with a special module that emits it, and it can be evaded with a professional video camera or a camera fitted with an infrared-cut filter. Moreover, the method offers no protection at all against illegal copying of a film's soundtrack.

Recording equipment has also advanced: recent voice recorders, though portable, can record in stereo. When recording illegally, however, the recording must be concealed and it is difficult to set up two microphones, so the recording is made in a state close to a monaural signal in which multiple channel signals are mixed. In Japan in particular, illegal recordings are rarely made to produce pirated (commercial) copies; rather, they are made on a volunteer basis for free circulation on online video sites. (Because there is no profit motive, risky screen camcording is no longer carried out.) Picture and sound quality therefore do not matter at all: for a soundtrack it is enough that the dialogue is intelligible, even if the music is somewhat distorted, and monaural is of course sufficient.

Accordingly, an object of the present invention is to provide a device for embedding a disturbing sound in an acoustic signal such that, even when the signal is recorded without permission by a recording device such as a voice recorder, the disturbing sound embedded inaudibly in the original acoustic signal becomes audible when the digitized recorded copy is played back, thereby preventing duplication in the original state.

To solve the above problem, a first aspect of the present invention provides a device for embedding a second acoustic signal, composed of a time-series sample sequence, as a disturbing sound in an inaudible state into an original acoustic signal composed of time-series sample sequences of at least two channels. The device comprises: acoustic frame reading means that selects two channels from the original acoustic signal, reads, for each channel, an acoustic frame composed of a predetermined number of samples as a first acoustic frame, and reads an acoustic frame composed of a predetermined number of samples from the second acoustic signal as a second acoustic frame; time-frequency conversion means that performs time-frequency conversion on the first acoustic frame for each channel to obtain a first spectrum of complex frequency components, and performs time-frequency conversion on the second acoustic frame to obtain a second spectrum of complex frequency components; frequency component modifying means that removes the signal components of the second spectrum at or above frequency Ft, folds the components at or below Ft toward higher frequencies about Ft, and adds the folded components within a predetermined range from Ft to 2Ft, multiplied by predetermined coefficient values (β, βl, βr), to the signal components at the corresponding frequencies of the first spectrum, thereby modifying the frequency components of the first spectrum for each channel; frequency-time conversion means that performs frequency-time conversion on the modified first spectrum to generate a modified acoustic frame; modified acoustic frame correction means that, for the generated modified acoustic frames of the two channels, treats each set of four samples (two adjacent samples in each of the two channels) as a group and changes the values of two samples in each group so that, when corresponding samples of both channels are added, the two adjacent sums are equal; and modified acoustic frame output means that sequentially outputs the modified acoustic frames corrected by the modified acoustic frame correction means.
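The frequency component modifying means of the first aspect can be sketched on a one-sided (half) spectrum. Assumptions: spectra are plain Python lists of complex bins 0..N/2, Ft falls exactly on bin `kt`, and the "predetermined range" is given by hypothetical bounds `lo`..`hi`. Storing the complex conjugate is also an assumption not stated in the text: it is what a real-valued alias produces when it folds back about Ft, so the embedded copy would land on its original bin with its original phase.

```python
def fold_and_add(first, second, kt, beta, lo=None, hi=None):
    """Sketch of the first aspect's spectral embedding (one channel).

    first, second: half-spectra (bins 0..N/2) as lists of complex numbers.
    kt: bin index corresponding to Ft.
    beta: coefficient for this channel (beta, beta_l or beta_r).
    lo, hi: hypothetical bounds of the 'predetermined range' inside kt..2kt.
    """
    lo = kt if lo is None else lo
    hi = 2 * kt if hi is None else hi
    if not (kt <= lo <= hi <= 2 * kt) or hi >= len(first) or kt >= len(second):
        raise ValueError("range must satisfy kt <= lo <= hi <= 2kt within the spectra")
    out = list(first)
    for k in range(lo, hi + 1):
        src = 2 * kt - k  # mirror of bin k about kt, always <= kt
        # Only bins <= kt of `second` are read, which discards its content
        # above Ft. Conjugating matches how a real signal's alias folds.
        out[k] += beta * second[src].conjugate()
    return out


first = [0j] * 9                              # N = 16, bins 0..8
second = [complex(k, 1.0) for k in range(9)]
result = fold_and_add(first, second, kt=4, beta=0.5)
print(result[5], result[8])  # mirrors of bins 3 and 0, conjugated and halved
```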

According to the first aspect, the signal components at or above Ft are removed from the second spectrum obtained from the second acoustic signal to be embedded; the components at or below Ft are folded toward higher frequencies about Ft; the folded components of the second spectrum in the range Ft to 2Ft (twice Ft) are multiplied by predetermined coefficient values (β, βl, βr) and added to the signal components at the corresponding frequencies of each channel's first spectrum to obtain modified acoustic frames; and the modified acoustic frames are corrected by changing the values of two samples of one channel so that, when corresponding samples of both channels are added, adjacent pairs of samples are equal. When a monaural recording is made through a single microphone, the two channels are mixed, and the mixed recording becomes equivalent to a signal pseudo-decimated by 1/2 and resampled. Aliasing therefore occurs with almost no influence from the LPF (low-pass filter) in the copying chain, so that when a copy recorded in monaural by a recording device in a stereo (two or more channels) or surround sound space is played back, the disturbing sound becomes audible. Here, "pseudo 1/2 resampling" means the following: physically, the sampling frequency and the total number of samples are unchanged, but making adjacent samples identical is equivalent, in terms of the Fourier transform, to halving the sampling frequency and the total number of samples. As a result, when the copy is played back, music or a message different from the sound recorded in the original acoustic signal is emitted as a disturbing sound simultaneously with the original sound, which makes the copy worthless and thereby deters duplication.
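The "pseudo 1/2 resampling" effect can be checked numerically with a small model. Assumption: the corrected mono mix is idealized as a signal in which each adjacent pair of samples is replaced by its mean (no microphone, room, or recorder response). A tone placed above N/4 (that is, above Ft = Fs/4) then acquires a mirror component below N/4, which is the aliasing the paragraph above relies on. The naive O(N²) DFT keeps the sketch dependency-free.

```python
import cmath
import math


def dft_magnitudes(x):
    """Naive normalized DFT magnitudes |X[k]|/n for bins 0..n/2 (O(n^2))."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(n // 2 + 1)
    ]


def pair_hold(x):
    """Idealized corrected mono mix: each adjacent pair replaced by its mean."""
    y = []
    for m in range(0, len(x) - 1, 2):
        mean = (x[m] + x[m + 1]) / 2.0
        y.extend([mean, mean])
    return y


N = 64                 # frame length in samples (hypothetical)
K_TONE = 20            # tone bin, above N/4 = 16, i.e. above Ft = Fs/4
tone = [math.cos(2 * math.pi * K_TONE * t / N) for t in range(N)]

before = dft_magnitudes(tone)
after = dft_magnitudes(pair_hold(tone))
mirror = N // 2 - K_TONE   # bin 12: the mirror image of bin 20 about N/4
print(f"bin {mirror}: before={before[mirror]:.3f} after={after[mirror]:.3f}")
```

Before pair-holding, bin 12 is essentially empty; afterwards it carries a clearly non-zero component, i.e. content at Ft + 4 bins has folded to Ft − 4 bins.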

A second aspect of the present invention provides a device for embedding noise, composed of a time-series sample sequence, as a disturbing sound in an inaudible state into an original acoustic signal composed of time-series sample sequences of at least two channels. The device comprises: acoustic frame reading means that selects two channels from the original acoustic signal and reads, for each channel, an acoustic frame composed of a predetermined number of samples as a first acoustic frame; time-frequency conversion means that performs time-frequency conversion on the first acoustic frame for each channel to obtain a first spectrum of complex frequency components; frequency component modifying means that modifies the frequency components of the first spectrum so that, for the signal components within a predetermined range from Ft to 2Ft, the absolute value of the signal increases by predetermined values (γ, γl, γr); frequency-time conversion means that performs frequency-time conversion on the modified first spectrum to generate a modified acoustic frame; modified acoustic frame correction means that, for the generated modified acoustic frames of the two channels, treats each set of four samples (two adjacent samples in each of the two channels) as a group and changes the values of two samples in each group so that, when corresponding samples of both channels are added, the two adjacent sums are equal; and modified acoustic frame output means that sequentially outputs the modified acoustic frames corrected by the modified acoustic frame correction means.
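The second aspect's modification, raising the absolute value of each bin in the Ft..2Ft range by γ, can be sketched as a phase-preserving magnitude offset. Assumptions: bins are Python complex numbers, the range is given by hypothetical bounds `lo`..`hi`, and bins with zero magnitude (which have no defined phase) are given phase 0, a choice the text does not specify.

```python
def boost_band(spectrum, lo, hi, gamma):
    """Sketch of the second aspect: raise |X[k]| by gamma for lo <= k <= hi,
    keeping each bin's phase. Zero bins get phase 0 (an assumption)."""
    out = list(spectrum)
    for k in range(lo, hi + 1):
        mag = abs(out[k])
        if mag == 0.0:
            out[k] = complex(gamma, 0.0)
        else:
            out[k] = out[k] * ((mag + gamma) / mag)  # same phase, |X| + gamma
    return out


boosted = boost_band([3 + 4j, 0j, 1 + 0j], lo=0, hi=1, gamma=0.5)
print([abs(c) for c in boosted])  # magnitudes 5.5, 0.5, 1.0 (up to rounding)
```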

According to the second aspect, the frequency components of the first spectrum obtained from the original acoustic signal are modified so that the absolute values of the components between Ft and 2Ft (twice Ft) increase by predetermined values (γ, γl, γr), yielding modified acoustic frames; these are then corrected by changing the values of two samples of one channel so that, when corresponding samples of both channels are added, two adjacent samples coincide. When a monaural recording is made through a single microphone, the two channels are mixed, and the mixed recording becomes equivalent to a signal pseudo-decimated by 1/2 and resampled. Aliasing therefore occurs with almost no influence from the LPF in the copying chain, and when a copy recorded in monaural by a recording device in a stereo (two or more channels) or surround sound space is played back, the disturbing sound becomes audible. As a result, meaningless noise is emitted as a disturbing sound simultaneously with the sound recorded in the original acoustic signal, which makes the copy worthless and thereby deters duplication.

In a third aspect of the present invention, in the device of the second aspect, the frequency component modifying means processes only the acoustic frames at predetermined intervals among the acoustic frames read by the acoustic frame reading means.

According to the third aspect, because only the acoustic frames at predetermined intervals are modified, the amount of computation needed to modify the original acoustic signal is reduced and the reproduced noise can be made clearer. In particular, modifying only every other acoustic frame, that is, only the odd-numbered or only the even-numbered frames, embeds the disturbing sound efficiently relative to the amount of computation.

In a fourth aspect of the present invention, in the device of any one of the first to third aspects, the frequency component modifying means first removes the signal components at or above Ft from the first spectrum obtained by the time-frequency conversion means, folds the components at or below Ft toward higher frequencies about Ft, and adds the folded components of the first spectrum in the range Ft to 2Ft, sign-inverted and multiplied by a predetermined coefficient value, to the signal components at the corresponding frequencies of the first spectrum, thereby modifying the frequency components of the first spectrum for each channel in advance.

According to the fourth aspect, the signal components at or above the predetermined frequency Ft are removed from the first spectrum of the original acoustic signal, the components at or below Ft are folded toward higher frequencies about Ft, and the folded components of the first spectrum in the range Ft to 2Ft (twice Ft) are sign-inverted, multiplied by a predetermined coefficient value, and added to the signal components at the corresponding frequencies of the first spectrum; only after this are the frequency components modified as in the first and second aspects. When the monaural recording described in the first and second aspects is made, the copy is effectively equivalent to the normally reproduced sound resampled at sampling frequency 2Ft. When such a copy is played back, the phase-inverted components of the first spectrum in the range Ft to 2Ft fold back about Ft into the band below Ft and are superimposed on (in effect, subtracted from) the components of the original acoustic signal below Ft, so the sound based on the original acoustic signal is canceled in proportion to the predetermined coefficient value.

In a fifth aspect of the present invention, in the device of any one of the first to fourth aspects, the original acoustic signal is sampled at sampling frequency Fs; the device further comprises upsampling means that upsamples the original acoustic signal at a sampling frequency Fus (Fus > Fs) to create a broadband acoustic signal, i.e. a wideband version of the original acoustic signal; Ft = Fs/2; and the time-frequency conversion means, frequency component modifying means, frequency-time conversion means, modified acoustic frame correction means, and modified acoustic frame output means operate on the broadband acoustic signal.

According to the fifth aspect, the frequency Ft, which is both the lower limit of signal component removal and the center of folding, is set to 1/2 of the sampling frequency Fs of the original acoustic signal, and the modified acoustic signal is given more samples than the original. The lower limit of the embedded phase-inverted low-band components of the original acoustic signal and of the disturbing sound components is therefore Fs/2; by setting Fs/2 near the upper limit of human hearing, these components lie entirely in the inaudible range and are not heard at all. However, when the modified acoustic signal, whether converted to analog or kept in digital form, undergoes the monaural recording described in the first and second aspects and is thereby effectively resampled at Fs and copied, playing back the copy causes aliasing: the disturbing-sound components at or above Fs/2 fold about Fs/2 into the audible band below Fs/2, the sound based on the original signal is suppressed, and the sound based on the second acoustic signal, or the disturbing sound, is output audibly instead. This prevents the sound recorded in the original acoustic signal alone from being copied at the same sampling frequency as the original acoustic signal.
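Where the folding band sits, and where an embedded component lands after the copy, can be computed directly. The numbers below are illustrative assumptions, not values from the text: Fs = 44.1 kHz, Fus = 88.2 kHz for the fifth aspect, and a 4096-sample frame. `alias_frequency` applies the standard folding rule for sampling at a given rate; `fold_band_bins` is a hypothetical helper that maps Ft and 2Ft to DFT bin indices.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a component at f_hz appears after (pseudo)
    resampling at fs_hz: fold f into the band 0..fs/2."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


def fold_band_bins(fs_hz, ft_ratio, frame_len, rate_hz):
    """DFT bin indices (kt, k2t) of Ft = ft_ratio * fs_hz and 2Ft for a frame
    of frame_len samples processed at rate_hz."""
    bin_hz = rate_hz / frame_len
    ft = ft_ratio * fs_hz
    return round(ft / bin_hz), round(2 * ft / bin_hz)


# Fifth aspect (assumed Fs = 44.1 kHz upsampled to Fus = 88.2 kHz, N = 4096):
print(fold_band_bins(44100, 0.5, 4096, 88200))   # (1024, 2048)
print(alias_frequency(24000, 44100))             # 20100
# Sixth aspect (no upsampling, Ft = Fs/4, copy behaves like Fs/2 = 22.05 kHz):
print(fold_band_bins(44100, 0.25, 4096, 44100))  # (1024, 2048)
print(alias_frequency(13000, 22050))             # 9050
```

In both cases a component Δ above Ft reappears Δ below Ft (24000 Hz = 22050 + 1950 maps to 22050 − 1950; 13000 Hz = 11025 + 1975 maps to 11025 − 1975).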

In a sixth aspect of the present invention, in the device of any one of the first to fourth aspects, the original acoustic signal is sampled at sampling frequency Fs, and Ft = Fs/4.

According to the sixth aspect, the folding center Ft is set to 1/4 of the sampling frequency Fs of the original acoustic signal, so the upper limit of the phase-inverted low-band components of the original acoustic signal and of the second acoustic signal or disturbing sound is Fs/2; these components are superimposed on the components of the original acoustic signal between Fs/4 and Fs/2. When the monaural recording described in the first and second aspects effectively copies the signal at a sampling frequency of Fs/2, playing back the copy causes aliasing: the phase-inverted low-band components of the original acoustic signal and the components of the second acoustic signal or noise between Fs/4 and Fs/2 fold about Fs/4 to below Fs/4, the sound based on the original acoustic signal is suppressed, and the sound based on the second acoustic signal, or the noise, is output audibly instead. This prevents the sound recorded in the original acoustic signal alone from being copied at half the sampling frequency of the original acoustic signal.

In a seventh aspect of the present invention, in the device of the sixth aspect, the predetermined coefficient values (β, βl, βr) by which the frequency component modifying means multiplies, or the predetermined values (γ, γl, γr) by which it increases the signal, are varied based on the average of the signal components of the first spectrum, obtained by the time-frequency conversion means, within a predetermined range from Ft to 2Ft.

According to the seventh aspect, when the original acoustic signal is modified, the degree to which the components of the second acoustic signal or of the disturbing sound act is varied based on the average intensity of the original acoustic signal's components within a predetermined range from Ft to 2Ft. The embedding strength thus rises and falls with the signal intensity of the original acoustic signal, which limits the quality degradation caused by embedding.

In an eighth aspect of the present invention, in the device of any one of the first to seventh aspects, the time-frequency conversion means performs time-frequency conversion using a Hanning window function with a window width of N samples, in which the weight W(i) (0 ≤ W(i) ≤ 1) at sample position i (0 ≤ i ≤ N−1) is defined by W(i) = 0.5 − 0.5cos(2πi/N).

According to the eighth aspect, a Hanning window function is used for the frequency analysis. The time-frequency conversion can therefore be performed without generating pseudo-harmonic components, while reducing the distortion imposed on the original acoustic signal.

According to the present invention, a recording may be made without permission by a recording device such as a voice recorder. When the digitized duplicate acoustic signal obtained from that recording is played back, aliasing occurs. The disturbing sound that was embedded inaudibly in the original acoustic signal then becomes audible, which prevents the content from being duplicated in its original state.

FIG. 1 is a diagram illustrating the principle of aliasing.
FIG. 2 is a diagram illustrating the basic principle of an embodiment of the present invention.
FIG. 3 is a hardware configuration diagram of the apparatus for embedding a disturbing sound into an acoustic signal according to the present invention.
FIG. 4 is a functional block diagram showing the configuration of the apparatus for embedding a disturbing sound into an acoustic signal according to the present invention.
FIG. 5 is a flowchart showing the processing operation of the apparatus according to the first (third) embodiment.
FIG. 6 is a conceptual diagram of the case in which the modified acoustic signal obtained by the processing up to step S8 is duplicated by a practical device equipped with an LPF circuit.
FIG. 7 is a diagram showing the correction processing of step S9 in the first embodiment.
FIG. 8 is a conceptual diagram of the case in which the modified acoustic signal obtained by the processing up to step S9 is duplicated by a practical device equipped with an LPF circuit.
FIG. 9 is a flowchart showing the processing operation of the apparatus according to the second (fourth) embodiment.
FIG. 10 is a diagram illustrating the concept of the disturbing-sound embedding processing in the third embodiment.
FIG. 11 is a diagram showing the correction processing of step S9 in the third embodiment.
FIG. 12 is a conceptual diagram of the case in which the modified acoustic signal obtained by the processing up to step S9 of the third embodiment is duplicated by a practical device equipped with an LPF circuit.
FIG. 13 is a diagram showing the relationship between the speakers and the channels to be processed when the invention is applied to a 5.1-channel surround acoustic signal.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
<1. Basic concept of the present invention>
First, the basic concept of the present invention will be described. The present invention exploits the folding distortion known as aliasing. The original acoustic signal to be protected against duplication is modified by embedding two kinds of disturbing sound. The first is a phase-inverted copy of the original signal's components, which suppresses the level of the original acoustic signal. The second is a new signal: either a second acoustic signal such as a warning message, or noise such as white noise. The result is a modified acoustic signal. During normal playback of the modified acoustic signal, neither the embedded phase-inverted original components nor the second acoustic signal or noise components can be heard. The played-back modified acoustic signal may, however, be recorded, resampled, and duplicated. When the resulting duplicate acoustic signal is played back, the original acoustic signal is suppressed, and the second acoustic signal (such as a warning message) or the noise (such as white noise) is reproduced audibly instead. The duplication may be digital or analog. To prevent duplication at the same sampling frequency as the original acoustic signal, however, the method is limited to analog copying.
However, aliasing is well known in the industry. Recording devices and samplers therefore generally apply anti-aliasing, specifically an LPF circuit placed before the sampler, so that high-frequency noise is not captured during recording. This LPF circuit attenuates the high-frequency components that would otherwise be folded back by aliasing. In effect, almost no folding then occurs, and the above objective cannot be achieved. The present invention also proposes a solution to this problem. It relies on a difference in channel count. The original acoustic signal is played back over two or more channels, in stereo or surround (for example, in cinemas and theaters). A person making an illegal recording, by contrast, typically records in mono with a portable voice recorder or smartphone that can be hidden in a pocket. The signal is therefore controlled so that, when the signals of the multiple channels are mixed, adjacent samples take the same value. The recording is thus effectively made at half the sampling frequency, so aliasing occurs reliably regardless of the voice recorder's sampling frequency. Suppose the illegal recording is made at a sampling frequency equal to or higher than that of the original acoustic signal. The signal still passes through the LPF circuit, but the LPF circuit does not affect it, and it is effectively resampled at half that sampling frequency. The intended folding occurs and the above objective is achieved.
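The adjacent-sample condition can be illustrated with a short Python sketch. The abstract states that the step S9 correction changes two of every four samples (two adjacent samples in each of two channels). The specific rule below is a hypothetical choice, not the one described with FIG. 7. The even-indexed samples are kept, and the two odd-indexed samples absorb the pair's mismatch equally. The function name is also illustrative.

```python
def equalize_mix_pairs(left, right):
    """Return corrected copies of left/right such that the mono mix
    left[n] + right[n] is equal at positions 2k and 2k+1.

    Hypothetical rule: even-indexed samples are untouched, and the mismatch
    of each pair is split equally between left[2k+1] and right[2k+1], so
    only two of every four samples change. An unpaired final sample (odd
    length) is left as-is. Both channels are assumed to have equal length.
    """
    left, right = list(left), list(right)
    for k in range(0, len(left) - 1, 2):
        mix_even = left[k] + right[k]
        mix_odd = left[k + 1] + right[k + 1]
        delta = mix_odd - mix_even
        left[k + 1] -= delta / 2      # half of the correction per channel
        right[k + 1] -= delta / 2
    return left, right
```

After this correction the mono mix repeats each value twice. It is therefore a sample-and-hold version of its own even-indexed samples, which behaves like a signal sampled at half the rate.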

FIG. 1 is a diagram illustrating the principle of aliasing. FIG. 1(a) shows the signal spectrum of the original acoustic signal. FIG. 1(b) shows the signal spectrum of the data obtained by sampling the original acoustic signal of FIG. 1(a) at a sampling frequency Fs. When the original acoustic signal is sampled at Fs, signal components above the Nyquist frequency Fs/2 are folded back toward the low-frequency side about Fs/2. There they are misinterpreted as, and superimposed on, the components at or below Fs/2. This is aliasing. In the example of FIG. 1, sampling at Fs has two effects. The components between Fs/2 and Fs are superimposed with their frequency axis inverted. The components between Fs and 3Fs/2 are inverted twice and are therefore superimposed in their original orientation. FIG. 1(b) draws the three spectral components overlaid, but in reality they add as complex vectors.
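The folding rule of FIG. 1 can be checked numerically. The Python sketch below uses illustrative function names. It computes where a real tone lands after sampling at Fs, and checks that a 30 kHz tone sampled at 48 kHz yields the same samples as an 18 kHz tone.

```python
import math


def alias_frequency(f, fs):
    """Frequency (Hz) at which a real tone of frequency f appears after sampling at fs."""
    f_mod = math.fmod(f, fs)                        # sampling cannot distinguish f from f - k*fs
    return fs - f_mod if f_mod > fs / 2 else f_mod  # the band above fs/2 is mirrored about fs/2


def samples_match(f1, f2, fs, count=100, tol=1e-9):
    """True if cosines at f1 and f2 produce identical samples at rate fs."""
    return all(
        math.isclose(math.cos(2 * math.pi * f1 * n / fs),
                     math.cos(2 * math.pi * f2 * n / fs), abs_tol=tol)
        for n in range(count)
    )


print(alias_frequency(30000, 48000))  # 18000.0 (band Fs/2..Fs: mirrored)
print(alias_frequency(50000, 48000))  # 2000.0 (band Fs..3Fs/2: original orientation)
```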

FIG. 2 is a diagram illustrating the principle of an embodiment of the present invention, described later. FIG. 2(a) shows the signal spectrum of the original acoustic signal sampled at the sampling frequency Fs. FIG. 2(b) shows the signal spectrum of the second acoustic signal sampled at Fs. FIG. 2(c) shows the signal spectrum of the modified acoustic signal obtained by the embedding processing of the present invention. FIG. 2(d) shows the signal spectrum of the duplicate acoustic signal obtained when a duplicating means resamples the modified acoustic signal.

The original acoustic signal shown in FIG. 2(a) and the second acoustic signal shown in FIG. 2(b) are both free of aliasing: components above Fs/2 were removed from each by high-cut filtering before sampling. The modified acoustic signal of FIG. 2(c) is built in two steps. First, the sampling frequency of the original acoustic signal of FIG. 2(a) is raised to 2Fs (twice Fs). Second, the second acoustic signal of FIG. 2(b) is folded toward the high-frequency side about Fs/2 and combined with the original acoustic signal at the raised sampling frequency. The components of the original acoustic signal at or below Fs/2 are handled in the same way. They are amplitude-inverted, folded in the frequency direction about Fs/2, and combined into the modified signal.
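The embedding principle can be reproduced with a time-domain sketch in Python. The sketch replaces the frame-based spectral processing of the embodiment with an equivalent operation. At the doubled rate 2Fs, multiplying a signal by (−1)^n mirrors its spectrum about Fs/2. Adding −(−1)^n times the upsampled original and +(−1)^n times the upsampled second signal therefore places two components above Fs/2: the sign-inverted, folded original and the folded second signal. Keeping every second sample (resampling at Fs with no anti-aliasing filter) then returns only the second signal. Linear interpolation serves as an assumed, simple upsampler. It is not band-limited, so a practical implementation would use a proper low-pass interpolator to keep the folded components strictly above Fs/2. The function names are illustrative.

```python
def upsample2_linear(x):
    """Double the sampling rate; each inserted sample is the mean of its neighbours."""
    out = []
    for m, value in enumerate(x):
        nxt = x[m + 1] if m + 1 < len(x) else value
        out.extend([value, 0.5 * (value + nxt)])
    return out


def embed_at_2fs(original, second):
    """Modified signal at 2*Fs: original - (-1)^n * original + (-1)^n * second."""
    xo = upsample2_linear(original)
    xs = upsample2_linear(second)
    modified = []
    for n, (a, b) in enumerate(zip(xo, xs)):
        mirror = 1.0 if n % 2 == 0 else -1.0  # (-1)^n mirrors the spectrum about Fs/2
        modified.append(a - mirror * a + mirror * b)
    return modified


def resample_at_fs(signal_2fs):
    """Naive resampling to Fs with no anti-aliasing filter: keep the even samples."""
    return signal_2fs[0::2]
```

At the even samples the original and its sign-inverted mirror cancel exactly, so `resample_at_fs(embed_at_2fs(x, s))` returns `s`. The band below Fs/2 of the modified signal still carries the original, which is what normal playback reproduces.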

The modified acoustic signal of FIG. 2(c) obtained in this way can then be played back. If Fs/2 is set above the human audible range (for example, 22.05 kHz), the components derived from the different acoustic signal, which lie above Fs/2, produce sound that is almost inaudible. When the modified acoustic signal of FIG. 2(c) is duplicated and resampled at the sampling frequency Fs, the duplicate acoustic signal shown in FIG. 2(d) is obtained. Aliasing occurs in this duplicate. The amplitude-inverted original components are folded back into the region at or below Fs/2, the human audible range, where they cancel the original acoustic signal. The components derived from the second acoustic signal are likewise folded back to Fs/2 or below. As a result, the duplicate acoustic signal mainly contains components derived from the second acoustic signal. A listener therefore hears only the second acoustic signal, such as the recorded warning message, and duplication of the original acoustic signal is prevented.
In practice, however, components at or above Fs/2 are normally removed by the LPF in the sampler's preprocessing when resampling at Fs. They are then never folded back below Fs/2, and the intended second acoustic signal is not heard. The present application therefore proposes a technique that the LPF does not defeat: the signal is controlled so that monaural recording itself produces an effect equivalent to resampling at the desired frequency.

<2.1. Configuration of the embedding apparatus>
Next, the apparatus for embedding a disturbing sound into an acoustic signal according to the present invention will be described. FIG. 3 is a hardware configuration diagram of this apparatus. The apparatus can be implemented on a general-purpose computer. As shown in FIG. 3, it includes the following:
- a CPU 1 (Central Processing Unit);
- a RAM 2 (Random Access Memory), which is the main memory of the computer;
- a large-capacity storage device 3 (for example, a hard disk or flash memory) that stores the programs executed by the CPU 1 and their data;
- a key input I/F (interface) 4 for a keyboard, mouse, and the like;
- a data input/output I/F (interface) 5 for data communication with external devices (such as data storage media);
- a display output I/F (interface) 6 for sending information to a display device.

These components are connected to one another via a bus.

FIG. 4 is a functional block diagram showing the configuration of the apparatus for embedding a disturbing sound into an acoustic signal according to the present invention. In FIG. 4, reference numeral 8 denotes upsampling means, 10 acoustic frame reading means, 20 time-frequency conversion means, 30 frequency component modifying means, 40 frequency-time conversion means, 45 modified acoustic frame correction means, 50 modified acoustic frame output means, 60 storage means, 61 an acoustic signal storage unit, and 62 a modified acoustic signal storage unit. The apparatus shown in FIG. 4 can handle acoustic signals of two or more channels, such as stereo and 5.1-channel surround acoustic signals. This embodiment, however, describes processing of a two-channel stereo acoustic signal.

The upsampling means 8 resamples the first acoustic signal (the original acoustic signal stored in the acoustic signal storage unit 61) at a sampling frequency higher than its own, thereby obtaining a broadband acoustic signal. The acoustic frame reading means 10 reads a predetermined number of samples from each channel of the broadband acoustic signal as one acoustic frame. When a second acoustic signal is to be embedded, the acoustic frame reading means 10 likewise reads a predetermined number of samples from each channel of the second acoustic signal as one acoustic frame. The time-frequency conversion means 20 applies a time-frequency conversion such as the Fourier transform to each acoustic frame read from the broadband acoustic signal (hereinafter, the first acoustic frame), generating a complex spectrum in the frequency domain. When a second acoustic signal is to be embedded, it also converts each acoustic frame read from the second acoustic signal (hereinafter, the second acoustic frame) into a complex spectrum in the same way.
The frequency component modifying means 30 modifies the spectrum obtained from the first acoustic frame. When a second acoustic signal is to be embedded, it does so while referring to the spectrum obtained from the second acoustic frame. The frequency-time conversion means 40 performs a frequency-time conversion on the complex spectra, including the modified spectrum set, to generate modified acoustic frames in the time domain. The modified acoustic frame correction means 45 corrects the modified acoustic frames obtained by the frequency-time conversion means 40. The modified acoustic frame output means 50 sequentially outputs the corrected modified acoustic frames.

The storage means 60 has two units. The acoustic signal storage unit 61 stores the first acoustic signal, and also the second acoustic signal when one is to be embedded. The modified acoustic signal storage unit 62 stores the modified acoustic signal. The storage means 60 also stores various other information needed for the processing.

In practice, each component shown in FIG. 4 is realized, as shown in FIG. 3, by installing a dedicated program on hardware such as a computer and its peripheral devices. That is, the computer carries out the function of each means in accordance with the dedicated program.

The storage device 3 of FIG. 3 holds a dedicated program that operates the CPU 1 and causes the computer to function as the apparatus for embedding a disturbing sound into an acoustic signal. By executing this program, the CPU 1 implements the functions of the upsampling means 8, the acoustic frame reading means 10, the time-frequency conversion means 20, the frequency component modifying means 30, the frequency-time conversion means 40, the modified acoustic frame correction means 45, and the modified acoustic frame output means 50. Besides serving as the storage means 60, the storage device 3 stores various data required for the processing.

<2.2. Processing operation of the embedding apparatus>
<2.2.1. First embodiment>
The processing operation of the apparatus for embedding a disturbing sound into an acoustic signal according to the present invention is described below. FIG. 5 is a flowchart showing the processing operation of the apparatus according to the first embodiment. The first and second acoustic signals may be sampled at a sampling frequency Fs of 96 kHz, 48 kHz (used in professional video and broadcasting, such as film), or 44.1 kHz (used in consumer media such as CDs). This embodiment uses signals sampled at 48 kHz. The frequency Ft about which the spectrum is folded is set to Ft = Fs/2 in this embodiment.

First, the upsampling means 8 upsamples each of the left and right channels of the stereo first acoustic signal stored in the acoustic signal storage unit 61, that is, resamples them at a sampling frequency higher than the original one (step S1). The amount by which the sampling frequency is raised can be set as appropriate. In this embodiment, the first acoustic signal is upsampled from its original 48 kHz to 96 kHz. Specifically, the upsampling means 8 applies [Formula 1] below to the first acoustic signals xl and xr stored in the acoustic signal storage unit 61, each having S samples per channel.

[Formula 1]
xl′(s) = xl(s × 48/96)
xr′(s) = xr(s × 48/96)

In [Formula 1], s = 0, ..., S′−1, where S′ = S × 96/48. Applying [Formula 1] upsamples the S-sample acoustic signal of each channel into a broadband acoustic signal of S′ samples per channel. Some samples do not exist in the original acoustic signals xl and xr, namely those for which s × 48/96 is not an integer. These are interpolated from the values of neighboring samples.
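[Formula 1] can be sketched in Python as follows. The patent only states that missing samples are interpolated from neighboring samples. This sketch assumes linear interpolation between the two nearest original samples, with the final position repeating the last sample. The function name and default rates are illustrative.

```python
def upsample(x, fs_in=48000, fs_out=96000):
    """x'(s) = x(s * fs_in / fs_out), evaluating non-integer positions by
    linear interpolation (an assumed choice of interpolator)."""
    s_out = len(x) * fs_out // fs_in          # S' = S * fs_out / fs_in
    out = []
    for s in range(s_out):
        pos = s * fs_in / fs_out              # position in the original signal
        i0 = int(pos)
        frac = pos - i0
        i1 = min(i0 + 1, len(x) - 1)          # clamp at the last original sample
        out.append((1 - frac) * x[i0] + frac * x[i1])
    return out
```

For example, `upsample([0, 2, 4])` gives `[0.0, 1.0, 2.0, 3.0, 4.0, 4.0]`. The even outputs reproduce the original samples, and the odd outputs fall halfway between neighbors.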

The acoustic frame reading means 10 reads a predetermined number N of samples from each of the left and right channels of the upsampled stereo broadband acoustic signal as one acoustic frame (step S2). Similarly, it reads a predetermined number M of samples from each of the left and right channels of the stereo second acoustic signal stored in the acoustic signal storage unit 61 as one second acoustic frame (step S3).

The acoustic frame reading means 10 reads N samples per first acoustic frame and M samples per second acoustic frame. Both values can be set as appropriate, and the following describes the case N = 8192 and M = 4096. The acoustic frame reading means 10 therefore sequentially reads 8192 samples per channel (left and right) from the broadband acoustic signal as each first acoustic frame. It likewise reads 4096 samples per channel from the second acoustic signal as each second acoustic frame. Because the broadband signal is at 96 kHz and the second acoustic signal at 48 kHz, both frames span the same duration (about 85.3 ms).

In this embodiment, for both the first and second acoustic frames, the odd-numbered and even-numbered acoustic frames overlap each other by a predetermined number of samples (N/2 = 4096 and M/2 = 2048, respectively, in this embodiment). Take the second acoustic frames as an example, calling the odd-numbered frames A1, A2, A3, ... and the even-numbered frames B1, B2, B3, ... from the beginning:
- A1 covers samples 1 to 4096, A2 covers 4097 to 8192, and A3 covers 8193 to 12288;
- B1 covers samples 2049 to 6144, B2 covers 6145 to 10240, and B3 covers 10241 to 14336.

The number of overlapping samples can be set as appropriate.
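The overlapped framing can be expressed as a small Python helper (a hypothetical name). It lists the 1-based sample ranges of consecutive frames, using a hop of half the frame length. For M = 4096 it reproduces the ranges A1, B1, A2, B2, A3, B3 given above, in time order. How a trailing partial frame is handled is not specified in this passage, so the sketch simply stops at the last complete frame.

```python
def frame_ranges(total_samples, frame_len, hop=None):
    """1-based inclusive (first, last) sample numbers of each overlapped frame."""
    if hop is None:
        hop = frame_len // 2          # 50% overlap, as in this embodiment
    ranges = []
    start = 0                         # 0-based start of the current frame
    while start + frame_len <= total_samples:
        ranges.append((start + 1, start + frame_len))
        start += hop
    return ranges


print(frame_ranges(14336, 4096))
```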

Next, the time-frequency conversion means 20 performs a time-frequency conversion on the first acoustic frame to obtain its complex spectrum (step S4). Similarly, it performs a time-frequency conversion on the second acoustic frame to obtain its complex spectrum (step S5). Steps S4 and S5 both perform this conversion using a window function. Various known methods, such as the Fourier transform or the wavelet transform, can be used, provided that they yield a complex spectrum. This embodiment is described using the Fourier transform as an example.

In general, a Fourier transform must be performed on segments of a signal cut to a predetermined length. If each segment is transformed as-is, that is, with a rectangular window, the segment boundaries give rise to pseudo-harmonic components. The values near the window boundaries are therefore generally attenuated first with a window function called the Hanning window, and the Fourier transform is applied to the result.

This embodiment also uses Hanning window functions, W(i) and W2(i). Each is defined to reach its maximum value of 1 at the center sample position i and its minimum value of 0 at the sample positions i near both ends. The Fourier transform of each first acoustic frame is applied to the frame multiplied by the Hanning window function W(i) defined by [Formula 2] below. The Fourier transform of each second acoustic frame is applied to the frame multiplied by the Hanning window function W2(i) defined by [Formula 3] below.

[Formula 2]
W(i) = 0.5 − 0.5 cos(2πi/N)

[Formula 3]
W2(i) = 0.5 − 0.5 cos(2πi/M)

In this embodiment, for both the first and second acoustic frames, the odd-numbered and even-numbered acoustic frames are read so that they overlap by a predetermined number of samples. After the frequency components have been modified, the signal is restored to the time domain by adding the overlapping samples of a windowed odd-numbered frame and a windowed even-numbered frame. That sum must give back approximately the original values. The window functions W(i) and W2(i) must therefore be defined so that, over the overlapping portion of the odd- and even-numbered frames, they add up to a constant 1 across the entire interval. With 50% overlap, the Hanning window of [Formula 2] satisfies this, because W(i) + W(i + N/2) = 1 for every i.
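The window of [Formula 2] and [Formula 3] and its overlap property can be checked with the Python sketch below (function names are illustrative). Dividing by N rather than N − 1 inside the cosine makes this the "periodic" form of the Hanning window. This form makes W(i) + W(i + N/2) equal exactly 1, because cos(θ + π) = −cos θ.

```python
import math


def hanning(n_len):
    """W(i) = 0.5 - 0.5 cos(2*pi*i / n_len) for i = 0 .. n_len - 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_len) for i in range(n_len)]


def overlap_sum(window):
    """W(i) + W(i + n/2): the total weight where two half-overlapped frames meet."""
    half = len(window) // 2
    return [window[i] + window[i + half] for i in range(half)]


w = hanning(8192)    # first acoustic frames, N = 8192
w2 = hanning(4096)   # second acoustic frames, M = 4096
print(min(overlap_sum(w)), max(overlap_sum(w)))  # both equal to 1 up to rounding
```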

To apply the Fourier transform to a first acoustic frame, the time-frequency conversion means 20 processes the left-channel signal Xl(i) and the right-channel signal Xr(i) (i = 0, ..., N−1) according to [Formula 4] below, using the window function W(i). This yields the real part Al(j) and imaginary part Bl(j) of the transformed data for the left channel, and the real part Ar(j) and imaginary part Br(j) for the right channel.

[Formula 4]
Al(j) = Σ_{i=0}^{N−1} W(i)·Xl(i)·cos(2πij/N)
Bl(j) = Σ_{i=0}^{N−1} W(i)·Xl(i)·sin(2πij/N)
Ar(j) = Σ_{i=0}^{N−1} W(i)·Xr(i)·cos(2πij/N)
Br(j) = Σ_{i=0}^{N−1} W(i)·Xr(i)·sin(2πij/N)

In [Formula 4], i is the serial number assigned to the N samples in each acoustic frame and takes the integer values i = 0, 1, 2, ..., N−1. j is the serial number assigned to the frequency values in ascending order and takes the integer values j = 0, 1, 2, ..., N/2−1. With a sampling frequency of 96 kHz and N = 8192, adjacent values of j differ in frequency by about 11.7 Hz (96000/8192 = 11.71875 Hz).
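[Formula 4] is a windowed discrete Fourier transform that keeps only the bins j = 0 to N/2 − 1. A direct Python transcription is shown below for illustration; the function names are illustrative. It runs in O(N²) time, so a practical implementation would presumably use an FFT. Note that [Formula 4] takes the sine sum with a positive sign, so Bl(j) is the negative of the imaginary part of the conventional e^(−2πiij/N) transform.

```python
import math


def windowed_dft(x, w):
    """Cosine sums A(j) and sine sums B(j) of [Formula 4] for j = 0 .. N/2 - 1."""
    n_len = len(x)
    a, b = [], []
    for j in range(n_len // 2):
        a.append(sum(w[i] * x[i] * math.cos(2 * math.pi * i * j / n_len) for i in range(n_len)))
        b.append(sum(w[i] * x[i] * math.sin(2 * math.pi * i * j / n_len) for i in range(n_len)))
    return a, b


def bin_spacing_hz(fs, n_len):
    """Frequency step between adjacent bins j and j + 1."""
    return fs / n_len


print(bin_spacing_hz(96000, 8192), bin_spacing_hz(48000, 4096))  # 11.71875 11.71875
```

Both spacings are 11.71875 Hz, so a first-spectrum bin (96 kHz, N = 8192) and a second-spectrum bin (48 kHz, M = 4096) with the same index j refer to the same frequency.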

To apply the Fourier transform to a second acoustic frame, the time-frequency conversion means 20 processes the left-channel signal X2l(i) and the right-channel signal X2r(i) (i = 0, ..., M−1) according to [Formula 5] below, using the window function W2(i). This yields the real part A2l(j) and imaginary part B2l(j) of the transformed data for the left channel, and the real part A2r(j) and imaginary part B2r(j) for the right channel.

[Formula 5]
$$
\begin{aligned}
A_{2l}(j) &= \sum_{i=0}^{M-1} W(i)\,X_{2l}(i)\cos\!\left(\frac{2\pi ij}{M}\right)\\
B_{2l}(j) &= \sum_{i=0}^{M-1} W(i)\,X_{2l}(i)\sin\!\left(\frac{2\pi ij}{M}\right)\\
A_{2r}(j) &= \sum_{i=0}^{M-1} W(i)\,X_{2r}(i)\cos\!\left(\frac{2\pi ij}{M}\right)\\
B_{2r}(j) &= \sum_{i=0}^{M-1} W(i)\,X_{2r}(i)\sin\!\left(\frac{2\pi ij}{M}\right)
\end{aligned}
$$

In [Formula 5], i is the index of the M samples in each second acoustic frame and takes the integer values i = 0, 1, 2, ..., M−1; j is the index of the frequency bins in ascending order of frequency and takes the integer values j = 0, 1, 2, ..., M/2−1. With a sampling frequency of 48 kHz and M = 4096, adjacent values of j again differ by about 11.7 Hz.
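A quick check, in Python with NumPy, of why the two spectra can be combined bin by bin: an N = 8192 frame at 96 kHz and an M = 4096 frame at 48 kHz have the same bin spacing, so for j < M/2 bin j denotes the same frequency in both spectra. The helper name is hypothetical.

```python
import numpy as np


def bin_frequencies(fs: float, frame_len: int) -> np.ndarray:
    """Center frequency (Hz) of bins j = 0..frame_len/2 - 1."""
    return np.arange(frame_len // 2) * fs / frame_len


first = bin_frequencies(96_000, 8192)    # first spectrum, j = 0..4095
second = bin_frequencies(48_000, 4096)   # second spectrum, j = 0..2047

# The second spectrum lines up with the lower half of the first one.
print(np.allclose(first[:2048], second))            # True
print(first[1] - first[0], second[1] - second[0])   # 11.71875 11.71875
```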

Executing [Formula 4] and [Formula 5] in steps S4 and S5 yields complex spectra for each first and second acoustic frame. The frequency component modification means 30 then modifies the frequency components in steps S6 and S7. First, using the first spectrum obtained from the first acoustic frame, it adds a low-band cancellation component to the high band (step S6): the components of the first spectrum are sign-inverted and folded about the frequency Ft (Ft = Fs/2 in this embodiment).

After adding the low-band cancellation component in step S6, the frequency component modification means 30 combines the resulting first spectrum with the second spectrum (step S7). The second spectrum is added after being folded about the frequency Fs/2. The first spectrum (now the combined spectrum) then takes the form shown in FIG. 2(c). In this combined spectrum, which is the spectrum of the modified acoustic signal, the range above Fs/2 contains, in folded form, both the amplitude-inverted signal components of the original first spectrum and the signal components of the second spectrum.

FIG. 2(c) draws the two kinds of spectral components in the band Fs/2 to Fs overlaid, but in practice they are added as complex vectors. As FIG. 2(c) shows, in the combined spectrum between Fs/2 and Fs, both the amplitude-inverted component of the first acoustic signal (the low-band cancellation component) and the component derived from the second acoustic signal become noise-like and are masked by the high-energy low-band components of the first acoustic signal. Consequently, even when the modified acoustic signal is played back, neither the amplitude-inverted component of the first acoustic signal nor the component of the second acoustic signal is audible to a human listener.

The processing that the frequency component modification means 30 performs in steps S6 and S7 can be summarized as [Formula 6].

[Formula 6]
$$
\begin{aligned}
A_l'(j) &\leftarrow -A_l(N/2-j)\,\alpha + A_{2l}(M-j)\,\beta\\
B_l'(j) &\leftarrow -B_l(N/2-j)\,\alpha + B_{2l}(M-j)\,\beta\\
A_r'(j) &\leftarrow -A_r(N/2-j)\,\alpha + A_{2r}(M-j)\,\beta\\
B_r'(j) &\leftarrow -B_r(N/2-j)\,\alpha + B_{2r}(M-j)\,\beta
\end{aligned}
$$

The frequency component modification means 30 applies [Formula 6] to the frequency components Al(j), Bl(j), A2l(j), B2l(j) and their right-channel counterparts for each j = N/4+1, ..., N/2−1. When the signal has been upsampled to Fus = 96 kHz (= 2Fs) and each acoustic frame has N = 8192 samples, j = N/4 corresponds to the frequency Fs/2 (= Fus/4) and j = N/2 corresponds to the frequency Fs (= Fus/2). [Formula 6] therefore processes j = 2049, ..., 4095 and changes the frequency components from about 24.0 kHz (roughly the upper limit of human hearing) to just under 48 kHz.

In [Formula 6], α and β are real scaling factors set in the range 0.0 < α, β ≤ 1.0; this embodiment uses α = β = 1.0. In each equation, the first term on the right-hand side (−Al(N/2−j)×α, etc.) is the low-band cancellation component derived from the first spectrum, and the second term (A2l(M−j)×β, etc.) is the second spectrum folded along the frequency axis.
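A minimal NumPy sketch of [Formula 6] for one channel, assuming the spectra are stored as arrays indexed by j (a1, b1 of length N/2 for the first spectrum; a2, b2 of length M/2 with M = N/2 for the second). Following the ← notation, bins j = N/4+1, ..., N/2−1 are overwritten rather than accumulated; this matches "adding" only if those bins carry no energy after upsampling, which is an assumption. The function name is hypothetical, and it would be called once per channel.

```python
import numpy as np


def apply_formula6(a1, b1, a2, b2, alpha=1.0, beta=1.0):
    """Return (A', B') for one channel per Formula 6.

    a1, b1: first spectrum, j = 0..N/2-1
    a2, b2: second spectrum, j = 0..M/2-1, with M = N/2
    """
    n = 2 * len(a1)            # N
    m = 2 * len(a2)            # M
    if m != n // 2:
        raise ValueError("Formula 6 pairs bins only when M = N/2")
    j = np.arange(n // 4 + 1, n // 2)           # j = N/4+1, ..., N/2-1
    a_new = np.array(a1, dtype=float)
    b_new = np.array(b1, dtype=float)
    # First term: sign-inverted first spectrum folded about bin N/4 (= Fs/2).
    # Second term: second spectrum folded so that its bin M-j lands on bin j.
    a_new[j] = -np.asarray(a1)[n // 2 - j] * alpha + np.asarray(a2)[m - j] * beta
    b_new[j] = -np.asarray(b1)[n // 2 - j] * alpha + np.asarray(b2)[m - j] * beta
    return a_new, b_new


# Bin range for N = 8192 at 96 kHz: j = 2049..4095.
j = np.arange(8192 // 4 + 1, 8192 // 2)
print(j[0], j[-1], j[0] * 96_000 / 8192, j[-1] * 96_000 / 8192)
```

Bins at or below N/4 (the original audio band) are returned unchanged.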

In terms of the flowchart in FIG. 5, step S6 (adding the low-band cancellation component to the high band) corresponds to adding the first term on the right-hand side of [Formula 6], and step S7 (combining the first and second spectra) corresponds to adding the second term.

Once the frequency component modification means 30 has finished the combination in step S7, the frequency-time conversion means 40 converts the modified first spectrum back to the time domain to obtain a modified acoustic frame (step S8). This frequency-time conversion must, of course, be the inverse of the method used by the time-frequency conversion means 20. Because the time-frequency conversion means 20 in this embodiment uses a Fourier transform, the frequency-time conversion means 40 performs an inverse Fourier transform.

Specifically, the frequency-time conversion means 40 takes the real part Al′(j) and imaginary part Bl′(j) of the left channel and the real part Ar′(j) and imaginary part Br′(j) of the right channel of the first spectrum produced by the frequency component modification means 30, and computes Xl′(i) and Xr′(i) according to [Formula 7]. For frequency components that the frequency component modification means 30 did not modify, the original components Al(j), Bl(j), Ar(j), Br(j) are used as Al′(j), Bl′(j), Ar′(j), Br′(j).

[Formula 7]
$$
\begin{aligned}
X_l'(i) &= \frac{1}{N}\left\{\sum_j A_l'(j)\cos\!\left(\frac{2\pi ij}{N}\right) + \sum_j B_l'(j)\sin\!\left(\frac{2\pi ij}{N}\right)\right\} + X_{lp}(i+N/2)\\
X_r'(i) &= \frac{1}{N}\left\{\sum_j A_r'(j)\cos\!\left(\frac{2\pi ij}{N}\right) + \sum_j B_r'(j)\sin\!\left(\frac{2\pi ij}{N}\right)\right\} + X_{rp}(i+N/2)
\end{aligned}
$$

To keep [Formula 7] readable, Σ_{j=0,...,N−1} is written as Σ_j. The terms "+Xlp(i+N/2)" in the first equation and "+Xrp(i+N/2)" in the second add the data Xlp(i), Xrp(i) of the immediately preceding modified acoustic frame, when one exists, to account for the N/2-sample overlap between consecutive frames on the time axis. [Formula 7] thus yields the left-channel samples Xl′(i) and right-channel samples Xr′(i) of the modified acoustic frame.
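The sketch below implements the inverse transform and overlap-add in NumPy over the full range j = 0, ..., N−1, taken from np.fft.fft (whose conjugate-symmetric upper half is what the full Σ_j implicitly requires). With the convention of [Formula 4], B(j) = Σ W·X·sin, the inverse that recovers the windowed frame combines the cosine and sine sums with a plus sign, and the round-trip check confirms this. It also assumes a periodic Hann analysis window and a hop of N/2, for which overlapping windows sum to 1, so unmodified spectra reproduce the input exactly away from the signal edges. Function names are hypothetical.

```python
import numpy as np


def inverse_frame(a_full, b_full):
    """x(i) = (1/N) * sum_j [A(j) cos(2*pi*i*j/N) + B(j) sin(2*pi*i*j/N)]."""
    n = len(a_full)
    phase = 2 * np.pi * np.outer(np.arange(n), np.arange(n)) / n   # rows: i, cols: j
    return (np.cos(phase) @ a_full + np.sin(phase) @ b_full) / n


def analyze_resynthesize(x, n):
    """Window, transform (Formula 4 convention), invert, and overlap-add at hop N/2."""
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)   # periodic Hann (assumed)
    hop = n // 2
    out = np.zeros(len(x))
    for start in range(0, len(x) - n + 1, hop):
        spec = np.fft.fft(w * x[start:start + n])
        a, b = spec.real, -spec.imag        # A(j), B(j) for j = 0..N-1
        # Accumulating into `out` plays the role of the +X_p(i + N/2) overlap term.
        out[start:start + n] += inverse_frame(a, b)
    return out


rng = np.random.default_rng(1)
signal = rng.standard_normal(80)
rebuilt = analyze_resynthesize(signal, 16)
# Samples covered by two overlapping frames are reconstructed exactly.
print(np.allclose(rebuilt[8:72], signal[8:72]))   # True
```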

After the frequency-time conversion in step S8, the modified acoustic frames can be output in sequence to obtain a modified acoustic signal. Experiments confirm that even the processing up to step S8 produces the effect described with reference to FIG. 2: when the signal is copied by digital resampling, the original acoustic signal is suppressed and the components of the second acoustic signal become audible instead. However, the desired effect is almost never obtained when the signal is resampled digitally with widely used audio processing tools, or copied with the A/D converters and samplers commonly used for copying. The reason is that commercial audio processing tools, samplers, and recording equipment usually place an anti-aliasing LPF circuit in front of the sampler: frequency components above half the specified sampling frequency are attenuated before sampling, to prevent quality degradation from aliased high-frequency components. As a result, the signal components embedded by the processing up to step S8, both those that cancel the original acoustic signal and those that produce the second acoustic signal, are normally removed.
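An idealized pure-tone illustration in NumPy, not the frame-based processing itself, of why the embedded components defeat naive resampling. At Fus = 96 kHz, a 1 kHz tone stands in for the original signal, a sign-inverted 47 kHz tone is that tone folded about Fs/2 = 24 kHz (the cancellation component), and a 47.5 kHz tone is a 500 Hz stand-in for the second acoustic signal folded the same way. Keeping every other sample, with no anti-aliasing filter, folds 47 kHz back onto 1 kHz, where it cancels the original, and 47.5 kHz onto 500 Hz. The tone frequencies and amplitudes are arbitrary choices.

```python
import numpy as np

FUS = 96_000                      # upsampled rate (2 * Fs)
FS = 48_000                       # sampling rate of the copy
t = np.arange(960) / FUS

original = np.cos(2 * np.pi * 1_000 * t)              # audible content
cancel = -np.cos(2 * np.pi * (FS - 1_000) * t)        # 1 kHz folded about Fs/2, sign-inverted
warning = 0.5 * np.cos(2 * np.pi * (FS - 500) * t)    # 500 Hz stand-in, folded about Fs/2

modified = original + cancel + warning

# Naive 2:1 resampling: keep every other sample, no low-pass filter.
copy = modified[::2]
k = np.arange(len(copy))
print(np.allclose(copy, 0.5 * np.cos(2 * np.pi * 500 * k / FS)))   # True
```

Only the 500 Hz stand-in remains in the copy; the 1 kHz "original" has cancelled out.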

FIG. 6 illustrates what happens when the modified acoustic signal obtained by the processing up to step S8 is copied by a real-world device that includes an LPF circuit. FIG. 6(a) is identical to FIG. 2(c) and shows the spectrum of the modified acoustic signal produced by the processing up to step S8. FIG. 6(b) shows the gain of the LPF (low-pass filter). FIG. 6(c) shows the spectrum of the modified acoustic signal after LPF processing. FIG. 6(d) shows the spectrum of the copied acoustic signal obtained by resampling the modified acoustic signal after LPF processing.

As FIG. 6(b) shows, the LPF gain stays at 0 dB up to Fs/2, drops steeply around Fs/2, and then levels off at a maximum attenuation (for example −30 dB). An LPF stage is normally applied before sampling not only in analog copying but also in digital copying whenever the sampling frequency changes, so the modified acoustic signal of FIG. 6(a) is filtered with a characteristic like that of FIG. 6(b). The filtered signal is shown in FIG. 6(c). Comparing FIG. 6(a) with FIG. 6(c), the part at or below Fs/2 that corresponds to the original acoustic signal remains unchanged, while the cancellation component and the interfering-sound component above Fs/2 are attenuated by the LPF.

Resampling the signal of FIG. 6(c) at the sampling frequency Fs yields the copied acoustic signal of FIG. 6(d). Aliasing occurs in the copied signal, and the attenuated cancellation component and interfering-sound component fold back below Fs/2, into the audible range. As FIG. 6(d) shows, the range at or below Fs/2 then contains the original acoustic signal, only slightly reduced by the attenuated cancellation component, with an attenuated component of the second acoustic signal superimposed on it. The attenuated second acoustic signal is masked by the remaining original components and is nearly inaudible, so the copy-prevention effect is lost. This embodiment therefore applies a correction in the following step S9 so that the second acoustic signal remains audible even when LPF processing is performed during copying.
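The same idealized tones (redefined here so the block runs on its own) show the limiting case of an infinitely steep anti-aliasing LPF at Fs/2 applied before the 2:1 decimation. The 960-sample length puts every tone exactly on an FFT bin, so the FFT-domain brick-wall filter is exact; the filter is an illustrative assumption, not a model of any particular device. Only the 1 kHz original survives. A real LPF with finite attenuation (for example −30 dB) would instead leave attenuated residues that alias below Fs/2, as in FIG. 6(d).

```python
import numpy as np

FUS, FS = 96_000, 48_000
t = np.arange(960) / FUS          # 10 ms: every tone below is bin-centered (100 Hz bins)
modified = (np.cos(2 * np.pi * 1_000 * t)
            - np.cos(2 * np.pi * (FS - 1_000) * t)
            + 0.5 * np.cos(2 * np.pi * (FS - 500) * t))


def ideal_lowpass(x, fs, cutoff):
    """Zero every FFT bin above `cutoff` Hz (idealized anti-aliasing filter)."""
    spec = np.fft.rfft(x)
    spec[np.fft.rfftfreq(len(x), d=1 / fs) > cutoff] = 0
    return np.fft.irfft(spec, n=len(x))


copy = ideal_lowpass(modified, FUS, FS / 2)[::2]
k = np.arange(len(copy))
print(np.allclose(copy, np.cos(2 * np.pi * 1_000 * k / FS)))   # True
```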

FIG. 7 shows how the signal spectrum changes under the correction of step S9. FIGS. 7(a) and 7(b) are identical to FIG. 6(a) and show the L-channel and R-channel spectra of the modified acoustic signal obtained by the processing up to step S8. In a real stereo signal the L- and R-channel spectra usually differ; they are drawn identically here only for convenience. FIGS. 7(c) and 7(d) show the L-channel and R-channel spectra after step S9. Step S9 modifies only the R-channel signal and leaves the L-channel signal unchanged, so FIG. 7(c) is identical to FIG. 7(a).

Returning to the flowchart of FIG. 5: inter-channel and inter-sample arithmetic is applied to the modified acoustic signal of FIGS. 7(a) and 7(b) obtained by the processing up to step S8 (step S9). Specifically, the value of each R-channel sample is corrected so that, when the corresponding components of the two channels are summed into mono, each pair of adjacent samples has the same value (the mean of the four samples, two per channel, multiplied by 2). In this embodiment, the R-channel sample values Xr′ are corrected according to [Formula 8].

[Formula 8]
$$
\begin{aligned}
X_r'(2k) &\leftarrow \frac{X_l'(2k) + X_l'(2k+1) + X_r'(2k) + X_r'(2k+1)}{2} - X_l'(2k)\\
X_r'(2k+1) &\leftarrow \frac{X_l'(2k) + X_l'(2k+1) + X_r'(2k) + X_r'(2k+1)}{2} - X_l'(2k+1)
\end{aligned}
$$

In [Formula 8], k is an integer in the range k = 0, ..., N/2−1; Xl′(2k) and Xl′(2k+1) are the even- and odd-indexed sample values of the L-channel signal, and Xr′(2k) and Xr′(2k+1) are the even- and odd-indexed sample values of the R-channel signal. The first term, common to both equations, is the mean of two adjacent samples in each of the two channels (four samples in total) multiplied by 2; both equations use the sample values before correction. The first equation sets the new even-indexed R-channel sample to this value minus the corresponding even-indexed L-channel sample, and the second sets the new odd-indexed R-channel sample to this value minus the corresponding odd-indexed L-channel sample. FIGS. 7(c) and 7(d) show the result of this correction. Because the L channel is not corrected, FIG. 7(c) is identical to FIG. 7(a). The R channel is changed substantially in the time domain, but in the frequency domain shown in FIG. 7(d) it shows no marked change compared with FIG. 7(b). However, because adjacent samples become closer in value than before the modification, a slight fold-back (aliasing) appears, as indicated by the dotted line on the left side of FIG. 7(d).
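A NumPy sketch of [Formula 8] applied to one modified acoustic frame (any pair of equal, even-length channel arrays). Both right-hand sides are evaluated from the uncorrected samples, and the example confirms the property the text relies on: in the L+R mix, samples 2k and 2k+1 become equal, each taking the mean of the original mono pair. The function name is hypothetical.

```python
import numpy as np


def correct_right_channel(xl, xr):
    """Return the corrected R channel per Formula 8; the L channel is unchanged."""
    xl = np.asarray(xl, dtype=float)
    xr = np.asarray(xr, dtype=float)
    if xl.shape != xr.shape or len(xl) % 2:
        raise ValueError("channels must have equal, even length")
    # (mean of the four samples) * 2, computed from the uncorrected values
    pair_term = (xl[0::2] + xl[1::2] + xr[0::2] + xr[1::2]) / 2
    xr_new = np.empty_like(xr)
    xr_new[0::2] = pair_term - xl[0::2]
    xr_new[1::2] = pair_term - xl[1::2]
    return xr_new


left = np.array([1.0, 2.0, 3.0, 4.0])
right = np.array([0.0, 0.0, 1.0, -1.0])
new_right = correct_right_channel(left, right)
print(new_right.tolist())            # [0.5, -0.5, 0.5, -0.5]
print((left + new_right).tolist())   # [1.5, 1.5, 3.5, 3.5]
```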

As described above, step S9 corrects the value of each R-channel sample so that, when the corresponding components of the two channels are summed into mono, each pair of adjacent samples has the same value (the mean of the four samples, two per channel, multiplied by 2). Step S9 is not limited to [Formula 8]; the amount of change can be increased or decreased as long as adjacent samples become identical after mono mixing. Nor is the correction limited to R-channel samples: the two L-channel samples may be corrected instead, or one odd-indexed R-channel sample and one even-indexed L-channel sample may be corrected.

The modified acoustic frame output means 50 writes the corrected modified acoustic frames produced by the modified acoustic frame correction means 45 to an output file in sequence. The processing in the flowchart of FIG. 5 is executed for every first acoustic frame of the wideband acoustic signal. When there are more first acoustic frames than second acoustic frames, processing returns to the first of the second acoustic frames and repeats. Once all first acoustic frames have been processed, the modified acoustic signal, a set of modified acoustic frames, is stored in the modified acoustic signal storage unit 62. In this embodiment the second acoustic signal is embedded from the beginning to the end of the first acoustic signal, but an embedding range can also be set so that it is embedded only within that range.
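The frame scheduling described above can be sketched as a small Python generator. Cycling through the second acoustic frames with a modulo follows the text; restarting the second signal at the beginning of an embedding range, and the names pair_frames and embed_range, are assumptions made for illustration.

```python
from typing import Iterator, Optional, Tuple


def pair_frames(num_first: int, num_second: int,
                embed_range: Optional[Tuple[int, int]] = None
                ) -> Iterator[Tuple[int, Optional[int]]]:
    """Yield (first_frame, second_frame or None) for every first frame.

    Frames outside embed_range = (start, stop) get None, meaning they are
    passed through without embedding (hypothetical behavior).
    """
    start, stop = embed_range if embed_range is not None else (0, num_first)
    for n in range(num_first):
        if start <= n < stop:
            # Wrap around to the first second frame when they run out.
            yield n, (n - start) % num_second
        else:
            yield n, None


print(list(pair_frames(5, 2)))          # [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
print(list(pair_frames(5, 2, (1, 3))))  # [(0, None), (1, 0), (2, 1), (3, None), (4, None)]
```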

FIG. 8 illustrates what happens when the modified acoustic signal obtained by the processing up to step S9 is copied by a real-world device with an LPF circuit. FIGS. 8(a) and 8(b) are identical to FIGS. 7(c) and 7(d), respectively, and show the L- and R-channel spectra of the corrected modified acoustic signal obtained by the processing up to step S9. FIG. 8(c) shows the spectrum of the copied acoustic signal obtained by resampling after LPF processing.

As shown in FIG. 8, the L-channel and R-channel modified acoustic signals emitted by the L-channel and R-channel speakers are mixed and recorded with a stereo or monaural microphone, and the recording is resampled at the sampling frequency Fs to obtain a copied acoustic signal. (Even a stereo microphone records an essentially identical L+R mix in both channels, so the result differs little from recording with a monaural microphone.) Mixing the stereo signal makes each pair of adjacent samples equal, which is equivalent to decimating by a factor of 2, so aliasing occurs. As shown in FIG. 8(c), the second acoustic signal of FIG. 2(b) is restored in the range at or below Fs/2 of the copied signal, and the sound based on it becomes audible. When a second acoustic signal such as a warning message is heard over the music, the recording is objectively identifiable as an illegal copy, and the music can be neither enjoyed in its original state nor sold. This acts as a deterrent against copying.

<2.2.2. Second Embodiment>
Next, a device for embedding an interfering sound into an acoustic signal according to a second embodiment of the present invention is described. FIG. 9 is a flowchart showing the processing of this device. The first embodiment used a second acoustic signal prepared in advance; the second embodiment instead embeds artificially generated noise, such as white noise, without preparing a sound source in advance. As in the first embodiment, the first acoustic signal is sampled at a sampling frequency Fs of 48 kHz, and the folding frequency is Ft = Fs/2.

First, as in step S1 of the first embodiment, the upsampling means 8 resamples each of the left and right channels of the stereo first acoustic signal stored in the acoustic signal storage unit 61 at a sampling frequency higher than the original (step S11). As in the first embodiment, [Formula 1] is applied to upsample the first acoustic signal from its original 48 kHz to 96 kHz.

The acoustic frame reading means 10 reads a predetermined number N of samples from each of the left and right channels of the upsampled stereo wideband acoustic signal as one acoustic frame (step S12).

The number of samples N in one first acoustic frame read by the acoustic frame reading means 10 can be set as appropriate; the following describes the case N = 4096. The acoustic frame reading means 10 therefore reads 4096 samples at a time from each of the left and right channels of the wideband acoustic signal as successive first acoustic frames.

As in the first embodiment, odd-numbered and even-numbered acoustic frames overlap by a predetermined number of samples (N/2 = 2048 in this embodiment). Labeling the odd-numbered frames A1, A2, A3, ... and the even-numbered frames B1, B2, B3, ... from the beginning, A1 covers samples 1 to 4096, A2 samples 4097 to 8192, A3 samples 8193 to 12288, B1 samples 2049 to 6144, B2 samples 6145 to 10240, and B3 samples 10241 to 14336. The number of overlapping samples can be set as appropriate.
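The 1-based sample ranges listed above follow from a hop of N/2. This short Python helper (a hypothetical name) reproduces them for any N, with odd-numbered frames A starting at sample 1 and even-numbered frames B offset by N/2.

```python
def frame_ranges(n: int, count: int):
    """1-based inclusive (first, last) sample ranges of frames A1.. and B1.."""
    hop = n // 2
    a_frames = [(k * n + 1, (k + 1) * n) for k in range(count)]
    b_frames = [(hop + k * n + 1, hop + (k + 1) * n) for k in range(count)]
    return a_frames, b_frames


a_frames, b_frames = frame_ranges(4096, 3)
print(a_frames)  # [(1, 4096), (4097, 8192), (8193, 12288)]
print(b_frames)  # [(2049, 6144), (6145, 10240), (10241, 14336)]
```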

Next, the time-frequency conversion means 20 applies a time-frequency transform, using a window function, to the first acoustic frame to obtain its complex spectrum (step S13). Any known method that yields a complex spectrum, such as the Fourier transform or the wavelet transform, can be used; as in the first embodiment, the Fourier transform is used here as an example.

In this embodiment, the Fourier transform of each first acoustic frame is applied to the frame multiplied by the Hanning window function W(i) defined in [Formula 2].

When the time-frequency conversion means 20 applies a Fourier transform to a first acoustic frame, it processes the left-channel signal Xl(i) and the right-channel signal Xr(i) (i = 0, ..., N−1) with the window function W(i) according to [Formula 4], obtaining the real part Al(j) and imaginary part Bl(j) for the left channel and the real part Ar(j) and imaginary part Br(j) for the right channel. With a sampling frequency of 96 kHz and N = 4096, adjacent values of j differ by about 23.4 Hz.

Executing [Formula 4] in step S13 yields a complex spectrum for each first acoustic frame. Next, the frequency component modification means 30 uses the first spectrum obtained from the first acoustic frame to add a low-band cancellation component to the high band (step S14). As in step S6 of the first embodiment, the components of the first spectrum are sign-inverted and folded about the frequency Fs/2. In step S14 the cancellation component is added only to odd-numbered acoustic frames, not to even-numbered ones. Switching the cancellation component on and off between odd and even frames halves the proportion of the original acoustic signal that is modified, and the resulting low-frequency on/off alternation makes the reproduced noise more distinct.

After adding the low-band cancellation component in step S14, the frequency component modification means 30 adds white noise to the resulting first spectrum (step S15). White noise is noise whose energy is spread uniformly over all frequency bands. Specifically, [Formula 9] is applied to add white noise, which contains every frequency component equally, to a predetermined high-frequency range.

[Formula 9]
$$A_l'(j) \leftarrow \begin{cases} -A_l(N/2-j)\,\alpha + \gamma, & A_l(j) \ge 0 \\ -A_l(N/2-j)\,\alpha - \gamma, & A_l(j) < 0 \end{cases}$$
$$B_l'(j) \leftarrow \begin{cases} -B_l(N/2-j)\,\alpha + \gamma, & B_l(j) \ge 0 \\ -B_l(N/2-j)\,\alpha - \gamma, & B_l(j) < 0 \end{cases}$$
$$A_r'(j) \leftarrow \begin{cases} -A_r(N/2-j)\,\alpha + \gamma, & A_r(j) \ge 0 \\ -A_r(N/2-j)\,\alpha - \gamma, & A_r(j) < 0 \end{cases}$$
$$B_r'(j) \leftarrow \begin{cases} -B_r(N/2-j)\,\alpha + \gamma, & B_r(j) \ge 0 \\ -B_r(N/2-j)\,\alpha - \gamma, & B_r(j) < 0 \end{cases}$$

The frequency component modifying means 30 applies [Formula 9] to the frequency components Al(j), Bl(j), Ar(j), and Br(j) for each j = 1, …, N/2. When one acoustic frame has N = 4096 samples and the sampling frequency after upsampling is Fus = 96 kHz (= 2Fs), j = N/4 corresponds to the frequency Fs/2 (= Fus/4) and j = N/2 corresponds to Fs (= Fus/2). Frequency components above j = N/2 are not modified. In [Formula 9], α is a real scaling factor set in the range 0.0 < α ≤ 1.0; this embodiment uses α = 1.0. γ is a real signal-level value; in this embodiment, where Al(j), Bl(j), Ar(j), and Br(j) are defined over the 16-bit range (−32768 to +32767), γ = 10240.0. The first term on the right-hand side of each expression in [Formula 9] (−Al(N/2−j)×α, etc.) is the low-band cancellation component, and the second term, ±γ, is the white noise.
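As an illustration only, [Formula 9] for one channel can be sketched in Python as follows, assuming the real and imaginary parts are held in plain lists `A` and `B` indexed by the bin number j. The function name `apply_formula9` is hypothetical; the sign of the ±γ term follows the sign of the original bin value, exactly as the case split in [Formula 9] states.

```python
def apply_formula9(A, B, N, alpha=1.0, gamma=10240.0):
    """Hypothetical sketch of [Formula 9] for one channel.

    A, B  -- real and imaginary parts of the spectrum, indexed by bin j
    N     -- frame length; bins j = 1 .. N/2 are rewritten, the rest are kept
    """
    A_new, B_new = list(A), list(B)  # read from A/B, write to the copies
    for j in range(1, N // 2 + 1):
        m = N // 2 - j  # source bin, mirrored about j = N/4
        # First term: sign-inverted (low-band cancellation) component.
        a = -A[m] * alpha
        b = -B[m] * alpha
        # Second term: white-noise level gamma, signed by the original bin value.
        A_new[j] = a + gamma if A[j] >= 0 else a - gamma
        B_new[j] = b + gamma if B[j] >= 0 else b - gamma
    return A_new, B_new
```

With a toy frame of N = 8 and γ = 100, bin 1 becomes −A(3) + 100 because A(1) is non-negative, while bin 2 becomes −A(2) − 100 because A(2) is negative.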

As [Formula 9] shows, a fixed intensity γ is applied as white noise so as to increase the absolute values of the real and imaginary parts of each complex component. The white noise in step S15 is added only to odd-numbered acoustic frames, not to even-numbered ones. As with the cancellation component, switching the white noise on and off between odd and even frames halves the proportion of the original acoustic signal that is modified, and the low-frequency on/off alternation makes the interfering noise more distinct when played back.

In this embodiment, the frequency component modification of steps S14 and S15 is applied to odd-numbered acoustic frames and not to even-numbered ones. Modifying every other frame is not mandatory, however: if it strengthens the interfering effect, the modification may instead be applied to every third or every fourth acoustic frame, or conversely to every frame. Because "odd" and "even" are only relative here, the processing may equally be applied to the even-numbered frames and omitted from the odd-numbered ones.
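The frame-selection rule can be captured by a small predicate. This is a sketch only; the helper name `is_modified` and its `period`/`offset` parameters are hypothetical and simply encode the options listed above (odd frames, even frames, every third or fourth frame, or all frames), with frames numbered from 1.

```python
def is_modified(frame_number, period=2, offset=1):
    """Hypothetical helper: should 1-based frame `frame_number` receive S14/S15?

    period=2, offset=1 -> odd frames only (this embodiment)
    period=2, offset=0 -> even frames only
    period=3 or 4      -> every third or fourth frame
    period=1           -> every frame
    """
    return (frame_number - offset) % period == 0
```

For example, `[f for f in range(1, 9) if is_modified(f)]` selects frames 1, 3, 5 and 7.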

When the frequency component modifying means 30 has added the white noise in step S15 and completed the modification, the frequency-time conversion means 40 applies a frequency-time conversion to the modified spectrum to obtain a modified acoustic frame (step S16). This conversion must, of course, correspond to the method used by the time-frequency conversion means 20. Because the time-frequency conversion means 20 uses a Fourier transform in this embodiment, the frequency-time conversion means 40 performs an inverse Fourier transform according to [Formula 7].

After the frequency-time conversion in step S16, the modified acoustic frames could simply be output in sequence to form the modified acoustic signal. If a copy were made at this stage, however, then, as described above, the anti-aliasing filter placed in front of the A/D converter or sampler used for copying would normally attenuate both the cancellation component for the embedded original acoustic signal and the signal components that generate the interfering sound. Therefore, as in the first embodiment, this embodiment performs the processing of step S17 below so that the intended interfering sound is still heard when LPF processing is applied during copying.

That is, as in the first embodiment, inter-channel and inter-sample arithmetic is applied to the modified acoustic signal obtained through step S16 (step S17). Specifically, the value of each R-channel sample is corrected so that, when the two channels are added sample by sample, each pair of adjacent samples takes the same value. As in the first embodiment, this is done by executing [Formula 8] to correct the R-channel sample values Xr′.

The modified acoustic frame output means 50 sequentially writes the corrected modified acoustic frames produced by the modified acoustic frame correction means 45 to an output file. The processing shown in the flowchart of FIG. 9 is executed for every first acoustic frame of the wideband acoustic signal. Once all first acoustic frames have been processed, the modified acoustic signal, i.e., the set of modified acoustic frames, is stored in the modified acoustic signal storage unit 62. In this embodiment the interfering sound is embedded from the beginning to the end of the first acoustic signal, but it is also possible to specify an embedding position and embed only within that range.

<2.2.3. Third Embodiment>
Next, an apparatus for embedding an interfering sound in an acoustic signal according to a third embodiment of the present invention will be described. In the first embodiment, the original acoustic signal was upsampled to create a wideband acoustic signal, and the second acoustic signal was embedded in that wideband signal. In the third embodiment, no upsampling is performed; the second acoustic signal is embedded directly in the original acoustic signal. Because much of the processing resembles the first embodiment, the description follows the flowchart of FIG. 5. In this embodiment, the first acoustic signal is sampled at Fs = 44.1 kHz, and the folding center frequency is Ft = Fs/4.

In this embodiment, the acoustic frame reading means 10 reads the original acoustic signal stored in the acoustic signal storage unit 61 directly, without passing it through the upsampling means 8. From each of the left and right channels of the stereo original acoustic signal stored in the acoustic signal storage unit 61, the acoustic frame reading means 10 reads a predetermined number N of samples as one first acoustic frame (step S2). Likewise, from each of the left and right channels of the stereo second acoustic signal stored in the acoustic signal storage unit 61, it reads a predetermined number N of samples as one second acoustic frame (step S3).

The number of samples N per acoustic frame read by the acoustic frame reading means 10 can be set as appropriate, but at a sampling frequency of 44.1 kHz a value of about 4096 samples is known to minimize damage to the original sound, so that value is used below. The acoustic frame reading means 10 therefore reads the acoustic signal sequentially in frames of 4096 samples each for the left and right channels.

As in the first embodiment, odd-numbered and even-numbered acoustic frames overlap each other by a predetermined number of samples (2048 in this embodiment). Denoting the odd-numbered frames A1, A2, A3, … and the even-numbered frames B1, B2, B3, … from the start, A1 covers samples 1–4096, A2 samples 4097–8192, A3 samples 8193–12288, B1 samples 2049–6144, B2 samples 6145–10240, and B3 samples 10241–14336.
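The frame layout above can be reproduced with a short sketch. The function `frame_ranges` is hypothetical; it assumes the A frames tile the signal back to back and the B frames are the same grid shifted by N/2, which is what the listed sample ranges imply. Sorted by start sample, the frames are processed as A1, B1, A2, B2, …, so the A frames are the odd-numbered ones.

```python
def frame_ranges(num_samples, N=4096):
    """Return 1-based, inclusive sample ranges of the A (odd) and B (even) frames.

    A frames start at 0, N, 2N, ...; B frames are shifted by N/2, so each B
    frame overlaps its neighboring A frames by N/2 samples.
    """
    odd = [(s + 1, s + N) for s in range(0, num_samples - N + 1, N)]
    even = [(s + 1, s + N) for s in range(N // 2, num_samples - N + 1, N)]
    return odd, even
```

Calling `frame_ranges(14336)` yields exactly the A1 to A3 and B1 to B3 ranges given in the text.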

Next, the time-frequency conversion means 20 applies a time-frequency conversion to the first acoustic frame to obtain its complex spectrum (step S4), and likewise to the second acoustic frame to obtain its complex spectrum (step S5). In steps S4 and S5, the conversion is performed with a window function. Any of various known methods, such as the Fourier transform or the wavelet transform, may be used, provided it yields a complex spectrum; this embodiment is described using the Fourier transform. As in the first embodiment, steps S4 and S5 use the window function W(i) given in [Formula 2].

When the time-frequency conversion means 20 applies the Fourier transform to a first acoustic frame, it processes the left-channel signal Xl(i) and the right-channel signal Xr(i) (i = 0, …, N−1) with the window function W(i) according to [Formula 4], obtaining the real part Al(j) and imaginary part Bl(j) of the transformed data for the left channel and the real part Ar(j) and imaginary part Br(j) for the right channel. With a sampling frequency of 44.1 kHz and N = 4096, adjacent values of j are about 10.8 Hz apart (44100/4096 ≈ 10.77 Hz). The signal spectrum of the original acoustic signal obtained in step S4 is as shown in FIG. 10(a).
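A naive windowed DFT illustrates how Al(j) and Bl(j) arise from one channel of a frame. This is a sketch under stated assumptions: [Formula 2] and [Formula 4] are not reproduced in this section, so the periodic Hann window, the unnormalized DFT, and the names `hann`, `windowed_dft` and `bin_spacing_hz` are all assumptions made for illustration.

```python
import cmath
import math

def hann(N):
    # Periodic Hann window (assumed; the exact W(i) of [Formula 1]/[Formula 2] is not shown here).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]

def windowed_dft(x):
    """Naive O(N^2) DFT of a Hann-windowed frame, returned as real parts A and imaginary parts B."""
    N = len(x)
    w = hann(N)
    A, B = [], []
    for j in range(N):
        s = sum(w[i] * x[i] * cmath.exp(-1j * 2 * math.pi * i * j / N) for i in range(N))
        A.append(s.real)
        B.append(s.imag)
    return A, B

def bin_spacing_hz(fs, N):
    """Frequency distance between adjacent bins j and j + 1."""
    return fs / N
```

`bin_spacing_hz(44100, 4096)` returns about 10.77, the "about 10.8 Hz" quoted above. A production implementation would use an FFT rather than this O(N²) loop.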

When the time-frequency conversion means 20 applies the Fourier transform to a second acoustic frame, it processes the left-channel signal X2l(i) and the right-channel signal X2r(i) (i = 0, …, M−1) with the window function W(i) according to [Formula 5], obtaining the real part A2l(j) and imaginary part B2l(j) of the transformed data for the left channel and the real part A2r(j) and imaginary part B2r(j) for the right channel. As with [Formula 4], at a sampling frequency of 44.1 kHz and N = 4096, adjacent values of j are about 10.8 Hz apart. The signal spectrum of the second acoustic signal obtained in step S5 is as shown in FIG. 10(b).

Executing [Formula 4] and [Formula 5] in steps S4 and S5 yields the complex spectra of each first and second acoustic frame. Next, the frequency component modifying means 30 uses the first spectrum obtained from the first acoustic frame to add a low-band cancellation component to the high band (step S6). Specifically, the components of the first spectrum are sign-inverted and folded about the frequency Fs/4 (= Ft).

After the low-band cancellation component has been added in step S6, the frequency component modifying means 30 combines the resulting first spectrum with the second spectrum (step S7). The second spectrum is added after being folded about the frequency Fs/4. The first spectrum (now the combined spectrum) then looks as shown in FIG. 10(c). In the combined first spectrum (the spectrum of the modified acoustic signal), the range above Fs/4 contains, in folded form, the amplitude-inverted signal components originally present in the first spectrum together with the signal components of the second spectrum.

In the example of FIG. 10(c), the two kinds of spectral component between Fs/4 and Fs/2 are drawn on top of each other, but in reality they are added as complex vectors. As FIG. 10(c) shows, in the combined spectrum the amplitude-inverted component of the first acoustic signal and the component derived from the second acoustic signal in the Fs/4 to Fs/2 range become noise-like and are masked by the low-band components of the first acoustic signal. Consequently, even when the modified acoustic signal is played back, listeners do not hear the amplitude-inverted component of the first acoustic signal or the component of the second acoustic signal.

The processing performed by the frequency component modifying means 30 in steps S6 and S7 can be summarized as [Formula 10] below.

[Formula 10]
$$A_l'(j) \leftarrow -A_l(N/2-j)\,\alpha + A_{2l}(N/2-j)\,\beta_l$$
$$B_l'(j) \leftarrow -B_l(N/2-j)\,\alpha + B_{2l}(N/2-j)\,\beta_l$$
$$A_r'(j) \leftarrow -A_r(N/2-j)\,\alpha + A_{2r}(N/2-j)\,\beta_r$$
$$B_r'(j) \leftarrow -B_r(N/2-j)\,\alpha + B_{2r}(N/2-j)\,\beta_r$$

The frequency component modifying means 30 applies [Formula 10] to the frequency components Al(j), Bl(j), A2l(j), and B2l(j) (and their right-channel counterparts) for each j = N/4+1, …, N/2−1. With N = 4096 samples per acoustic frame and Fs = 44.1 kHz, j = N/4 corresponds to Fs/4 and j = N/2 to Fs/2. [Formula 10] therefore processes j = 1025, …, 2047, changing the frequency components from about 11.1 kHz to about 22.1 kHz (roughly the upper limit of human hearing).
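The bin-to-frequency arithmetic in this paragraph can be checked with a few lines. The helper `bin_to_hz` is hypothetical and assumes the usual DFT relation f = j·Fs/N.

```python
def bin_to_hz(j, fs, N):
    """Center frequency of bin j for a frame of N samples taken at rate fs."""
    return j * fs / N

N, FS = 4096, 44100.0
j_first, j_last = N // 4 + 1, N // 2 - 1   # bins rewritten by [Formula 10]
print(j_first, j_last)                                        # 1025 2047
print(round(bin_to_hz(j_first, FS, N)), round(bin_to_hz(j_last, FS, N)))  # 11036 22039
```

The same relation confirms the second embodiment's mapping: at Fus = 96 kHz, `bin_to_hz(N // 4, 96000.0, N)` is 24 kHz, i.e. Fs/2 for Fs = 48 kHz.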

In [Formula 10], α, βl, and βr are real scaling factors, each set in the range 0.0 < α, βl, βr ≤ 1.0; this embodiment uses α = βl = βr = 1.0. The first term on the right-hand side of each expression (−Al(N/2−j)×α, etc.) is the low-band cancellation component, and the second term (A2l(N/2−j)×βl, etc.) is the second spectrum folded in the frequency direction.
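A minimal sketch of [Formula 10] for one channel follows, again assuming list-based real and imaginary parts. The name `apply_formula10` is hypothetical; pass βl or βr as `beta` depending on the channel.

```python
def apply_formula10(A, B, A2, B2, N, alpha=1.0, beta=1.0):
    """Hypothetical sketch of [Formula 10] for one channel (pass beta_l or beta_r as beta).

    A, B   -- real/imaginary parts of the first spectrum
    A2, B2 -- real/imaginary parts of the second spectrum
    Bins j = N/4+1 .. N/2-1 are replaced; all other bins keep their original
    values, which is what the frequency-time conversion in step S8 then uses.
    """
    A_new, B_new = list(A), list(B)
    for j in range(N // 4 + 1, N // 2):
        m = N // 2 - j  # mirror of j about N/4, i.e. about Ft = Fs/4
        # First term: low-band cancellation; second term: folded second spectrum.
        A_new[j] = -A[m] * alpha + A2[m] * beta
        B_new[j] = -B[m] * alpha + B2[m] * beta
    return A_new, B_new
```

Because the real and imaginary parts are updated together, the two terms are effectively summed as complex vectors, as noted for FIG. 10(c).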

Embedding components of the second acoustic signal within the audible range requires attention to sound-quality degradation. When embedding in the audible range of about 11.1 kHz to about 22.1 kHz, as in this embodiment, quality loss due to embedding can be limited by raising and lowering βl and βr in step with the signal strength of the original acoustic signal. Specifically, they can be calculated with [Formula 11] below.

[Formula 11]
$$\beta_l = \beta_o\,\frac{\beta_{lp} + 50\,\mathrm{Ave}_l}{2}, \qquad \beta_r = \beta_o\,\frac{\beta_{rp} + 50\,\mathrm{Ave}_r}{2}$$
$$\mathrm{Ave}_l = \left[\sum_{j=1}^{N/4-1} \left\{A_l(j+N/4)^2 + B_l(j+N/4)^2\right\}\frac{16j}{N^2}\right]^{1/2}$$
$$\mathrm{Ave}_r = \left[\sum_{j=1}^{N/4-1} \left\{A_r(j+N/4)^2 + B_r(j+N/4)^2\right\}\frac{16j}{N^2}\right]^{1/2}$$

In [Formula 11], βo = 1.0, and βlp and βrp are the values of βl and βr determined for the immediately preceding acoustic frame. For the first acoustic frame, which has no predecessor, βlp = βrp = βo. The coefficient 50 in the first and second expressions is a predetermined constant that can be set to any integer from 0 to 100, although values close to 50, as in this embodiment, are preferable. Avel and Aver are average levels of the signal components in a predetermined range between the frequencies Ft and 2Ft. Because an unsmoothed gain could change abruptly between frames and cause spike noise, this embodiment averages the newly computed level term with βlp and βrp from the preceding frame, as the division by 2 in [Formula 11] expresses, and uses the results as βl and βr in [Formula 10].
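The gain update of [Formula 11] can be sketched per channel as below. The names `ave_level` and `next_beta` are hypothetical, and the absolute size of Ave (and hence of β) depends on the normalization of [Formula 4], which is not reproduced in this section.

```python
import math

def ave_level(A, B, N):
    """Ave_l or Ave_r of [Formula 11]: weighted level of bins j + N/4, j = 1 .. N/4-1."""
    total = 0.0
    for j in range(1, N // 4):
        k = j + N // 4
        total += (A[k] ** 2 + B[k] ** 2) * 16 * j / N ** 2
    return math.sqrt(total)

def next_beta(A, B, N, beta_prev=None, beta_o=1.0, c=50):
    """beta = beta_o * (beta_prev + c * Ave) / 2, i.e. the new level term averaged
    with the previous frame's beta; the first frame uses beta_prev = beta_o."""
    if beta_prev is None:
        beta_prev = beta_o
    return beta_o * (beta_prev + c * ave_level(A, B, N)) / 2
```

Feeding each frame's result back in as `beta_prev` gives the frame-to-frame smoothing described above: a constant level term is approached gradually rather than in one jump.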

In terms of the flowchart of FIG. 5, adding the low-band cancellation component to the high band in step S6 corresponds to adding the first term on the right-hand side of [Formula 10], and combining the first and second spectra in step S7 corresponds to adding the second term.

Then, as in the first embodiment, the frequency-time conversion means 40 applies [Formula 7] to the left-channel real part Al′(j) and imaginary part Bl′(j) and the right-channel real part Ar′(j) and imaginary part Br′(j) of the first spectrum produced by the frequency component modifying means 30, computing Xl′(i) and Xr′(i) (step S8). For frequency components not modified by the frequency component modifying means 30, the original components Al(j), Bl(j), Ar(j), and Br(j) are used as Al′(j), Bl′(j), Ar′(j), and Br′(j).
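One way to picture the inverse step is a real-valued inverse DFT that takes bins 0 to N/2 and rebuilds the remaining bins by conjugate symmetry. This is an assumption: [Formula 7] is not reproduced here, and windowing and overlap-add are ignored. The names `dft` and `inverse_from_half_spectrum` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT, returning real parts A and imaginary parts B."""
    N = len(x)
    X = [sum(x[i] * cmath.exp(-1j * 2 * math.pi * i * j / N) for i in range(N)) for j in range(N)]
    return [c.real for c in X], [c.imag for c in X]

def inverse_from_half_spectrum(A, B, N):
    """Real inverse DFT that uses bins 0..N/2 and rebuilds bins N/2+1..N-1 by
    conjugate symmetry, so the output frame is real-valued."""
    half = [complex(A[j], B[j]) for j in range(N // 2 + 1)]
    full = half + [half[N - j].conjugate() for j in range(N // 2 + 1, N)]
    return [sum(full[j] * cmath.exp(1j * 2 * math.pi * i * j / N) for j in range(N)).real / N
            for i in range(N)]
```

Without any modification the round trip `inverse_from_half_spectrum(*dft(x), len(x))` returns `x` to within rounding error, which mirrors the rule that unmodified bins keep their original values.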

As in the first embodiment, after the frequency-time conversion in step S8 the modified acoustic frames could be output in sequence to form the modified acoustic signal. However, if the signal obtained through step S8 is copied, the cancellation component for the original signal and the components of the second acoustic signal will normally, as described above, be attenuated by the anti-aliasing filter in front of the A/D converter or sampler used for copying. Therefore this embodiment also performs step S9 below so that the intended effect is preserved even when LPF processing is applied during copying.

FIG. 11 shows how the signal spectrum changes as a result of step S9. FIGS. 11(a) and 11(b) are identical to FIG. 10(a) and show the L-channel and R-channel signal spectra, respectively, of the modified acoustic signal obtained through step S8. In practice, the L- and R-channel spectra of a stereo signal often differ, but for convenience of explanation the two channels are drawn with the same spectrum here, as in FIG. 7. FIGS. 11(c) and 11(d) show the L-channel and R-channel spectra after step S9. Because step S9 modifies only the R-channel signal and leaves the L channel unchanged, FIG. 11(c) is identical to FIG. 11(a).

Returning to the flowchart of FIG. 5: inter-channel and inter-sample arithmetic is applied to the modified acoustic signal of FIGS. 11(a) and 11(b) obtained through step S8 (step S9). Specifically, the value of each R-channel sample is corrected so that, when the two channels are added sample by sample, each pair of adjacent samples takes the same value. As in the first embodiment, this is done by executing [Formula 8] to correct the R-channel sample values Xr′. FIGS. 11(c) and 11(d) show the result. Since no correction is applied to the L channel, FIG. 11(c) is exactly the same as FIG. 11(a). The R channel is changed substantially in the time domain, but in the frequency domain FIG. 11(d) shows no marked difference from FIG. 11(b). However, because adjacent samples are closer in value than before the correction, some folding occurs, as shown by the dotted line on the left side of FIG. 11(d).
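[Formula 8] itself is not reproduced in this section, so the following is only a hypothetical stand-in that satisfies the stated goal: the L channel is untouched, and both R samples of each adjacent pair are rewritten so that L+R takes one common value at the two positions. Choosing half the pair's total as that common value is an assumption made here; `equalize_pairs` is not a name from the source.

```python
def equalize_pairs(xl, xr):
    """Hypothetical stand-in for [Formula 8], whose exact form is not given here.

    The L channel is left untouched. For each sample pair (2k, 2k+1) both R
    samples are rewritten so that xl + xr takes one common value at the two
    positions, chosen here as half the pair's total so the pair sum is preserved.
    """
    xr_new = list(xr)
    for n in range(0, len(xl) - 1, 2):
        target = (xl[n] + xr[n] + xl[n + 1] + xr[n + 1]) / 2
        xr_new[n] = target - xl[n]
        xr_new[n + 1] = target - xl[n + 1]
    return xr_new
```

After this correction, summing the channels sample by sample gives pairs of identical values, which is the condition step S9 aims for.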

The modified acoustic frame output means 50 sequentially writes the corrected modified acoustic frames produced by the modified acoustic frame correction means 45 to an output file. The processing in the flowchart of FIG. 5 (excluding step S1) is executed for every first acoustic frame of the first acoustic signal. If there are more first acoustic frames than second acoustic frames, processing wraps around to the first second acoustic frame and continues. Once all first acoustic frames have been processed, the modified acoustic signal, i.e., the set of modified acoustic frames, is stored in the modified acoustic signal storage unit 62. In this embodiment the second acoustic signal is embedded from the beginning to the end of the first acoustic signal, but it is also possible to specify an embedding position and embed only within that range.
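The wrap-around rule for the second acoustic signal amounts to cycling through its frames. A minimal sketch, assuming one second frame is consumed per first frame; `pair_frames` is a hypothetical name.

```python
from itertools import cycle

def pair_frames(first_frames, second_frames):
    """Pair each first acoustic frame with a second acoustic frame, restarting
    from the first second-signal frame whenever the second signal runs out."""
    return list(zip(first_frames, cycle(second_frames)))
```

With five first frames and two second frames, the second frames are used in the order S1, S2, S1, S2, S1.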

FIG. 12 illustrates what happens when the modified acoustic signal obtained through step S9 is copied by a realistic device that includes an LPF circuit. FIGS. 12(a) and 12(b) are identical to FIGS. 11(c) and 11(d), respectively, and show the L-channel and R-channel signal spectra of the corrected modified acoustic signal obtained through step S9. FIG. 12(c) shows the signal spectrum of the copied acoustic signal obtained by resampling after LPF processing.

As shown in FIG. 12, the L-channel and R-channel modified acoustic signals emitted by the L-channel and R-channel speakers are mixed and recorded with a stereo or monaural microphone and resampled at the sampling frequency Fs to obtain a copied acoustic signal. Mixing the stereo signal makes each pair of adjacent samples equal, which is equivalent to decimating the signal by a factor of 2, so aliasing occurs. As shown in FIG. 12(c), the second acoustic signal shown in FIG. 2(b) is then restored in the range below Fs/2 of the copied signal, and a sound based on the second acoustic signal becomes audible. When a second acoustic signal such as a warning message is heard over the music, the copy is objectively identifiable as illegal, and because the music can no longer be enjoyed in its original state, the copy cannot be sold as a product. This deters copying.
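The aliasing mechanism can be demonstrated numerically with a deliberately simplified model. Here the recorded mix is modeled as a signal whose adjacent sample pairs collapse to their mean; the speakers, microphone and LPF are ignored, and the names `dft_magnitude` and `pair_average` are hypothetical. A component at bin 6 of a 16-point frame (between Fs/4 and Fs/2) reappears at bin 2, its mirror image about Fs/4.

```python
import cmath
import math

def dft_magnitude(x, k):
    """|X(k)| of the naive DFT of x."""
    N = len(x)
    return abs(sum(x[n] * cmath.exp(-1j * 2 * math.pi * k * n / N) for n in range(N)))

def pair_average(mix):
    """Simplified model of the recorded L+R mix after step S9: each adjacent
    pair of samples carries one common value (here, the pair mean)."""
    out = list(mix)
    for n in range(0, len(mix) - 1, 2):
        out[n] = out[n + 1] = (mix[n] + mix[n + 1]) / 2
    return out

N = 16                       # bin 4 ~ Fs/4, bin 8 ~ Fs/2
tone = [math.cos(2 * math.pi * 6 * n / N) for n in range(N)]  # component at bin 6
copied = pair_average(tone)
print(round(dft_magnitude(tone, 2), 6), round(dft_magnitude(copied, 2), 2))  # 0.0 2.83
```

Bin 2 is empty in the original frame but carries a magnitude of 2√2 after pair equalization, which is the folding about Fs/4 that brings the embedded second acoustic signal back into the audible low band.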

<2.2.4. Fourth Embodiment>
Next, an apparatus for embedding an interfering sound in an acoustic signal according to a fourth embodiment of the present invention will be described. In the second embodiment, the original acoustic signal was upsampled to create a wideband acoustic signal, into which white noise was embedded. In the fourth embodiment, no upsampling is performed; white noise is embedded directly in the original acoustic signal. Because much of the processing resembles the second embodiment, the description follows the flowchart of FIG. 9. In this embodiment, the first acoustic signal is sampled at Fs = 44.1 kHz, and the folding center frequency is Ft = Fs/4.

In this embodiment, the acoustic frame reading means 10 reads the acoustic signal stored in the acoustic signal storage unit 61 directly, without passing it through the upsampling means 8. From each of the left and right channels of the stereo acoustic signal stored in the acoustic signal storage unit 61, it reads a predetermined number N of samples as one first acoustic frame (step S12).

As in the first to third embodiments, odd-numbered and even-numbered acoustic frames overlap each other by a predetermined number of samples (N/2 = 2048 in this embodiment). Denoting the odd-numbered frames A1, A2, A3, … and the even-numbered frames B1, B2, B3, … from the start, A1 covers samples 1–4096, A2 samples 4097–8192, A3 samples 8193–12288, B1 samples 2049–6144, B2 samples 6145–10240, and B3 samples 10241–14336. The number of overlapping samples can be set as appropriate.

In this embodiment, the Fourier transform of each first acoustic frame is applied to the frame after it has been multiplied by the Hanning window function W(i) defined in [Formula 1] above.

When the time-frequency conversion means 20 performs the Fourier transform on the first acoustic frame, it applies the processing of [Formula 4] above, using the window function W(i), to the left channel signal Xl(i) and the right channel signal Xr(i) (i = 0, ..., N−1). This yields the real part Al(j) and imaginary part Bl(j) of the transformed data for the left channel, and the real part Ar(j) and imaginary part Br(j) for the right channel. With a sampling frequency of 44.1 kHz and N = 4096, adjacent values of j are about 10.8 Hz apart (44100/4096 ≈ 10.77 Hz).
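[Formula 1] and [Formula 4] are not reproduced in this section. The sketch below therefore takes the Hanning window from the definition in claim 8, W(i) = 0.5 − 0.5 cos(2πi/N), and stops before the transform itself. It also shows how the index j maps to frequency for Fs = 44.1 kHz and N = 4096. The function names are illustrative only.

```python
import math

def hanning(N):
    """Hanning window W(i) = 0.5 - 0.5*cos(2*pi*i/N) for i = 0..N-1 (claim 8)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]

def apply_window(frame, window):
    """Multiply a frame sample by sample by the window before the transform."""
    return [x * w for x, w in zip(frame, window)]

def bin_frequency(j, fs=44100.0, N=4096):
    """Frequency of transform index j: adjacent indices are fs/N apart."""
    return j * fs / N


N = 4096
W = hanning(N)
print(round(bin_frequency(1), 2))   # 10.77 Hz between adjacent j
print(bin_frequency(N // 4))        # 11025.0 Hz, i.e. Fs/4 (= Ft here)
print(bin_frequency(N // 2))        # 22050.0 Hz, i.e. Fs/2
```

The windowed frame would then be passed to whatever DFT implements [Formula 4]. Its scaling convention is not given here.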

Executing the processing of [Formula 4] in step S13 yields a complex spectrum for each first acoustic frame. Next, as in the second embodiment, the frequency component modifying means 30 uses the first spectrum obtained from the first acoustic frame to add a low-frequency cancellation component to the high band (step S14).

After the low-frequency cancellation component has been added in step S14, the frequency component modifying means 30 adds white noise to the resulting first spectrum, as in the second embodiment (step S15). Specifically, it executes the processing of [Formula 12] below, which adds white noise (noise containing all frequency components equally) to a predetermined high-frequency range.

[Formula 12]
$$
A_l'(j) \leftarrow
\begin{cases}
-A_l(N/4 - j)\times\alpha + \gamma_l & \text{if } A_l(j) \ge 0\\
-A_l(N/4 - j)\times\alpha - \gamma_l & \text{if } A_l(j) < 0
\end{cases}
\qquad
B_l'(j) \leftarrow
\begin{cases}
-B_l(N/4 - j)\times\alpha + \gamma_l & \text{if } B_l(j) \ge 0\\
-B_l(N/4 - j)\times\alpha - \gamma_l & \text{if } B_l(j) < 0
\end{cases}
$$
$$
A_r'(j) \leftarrow
\begin{cases}
-A_r(N/4 - j)\times\alpha + \gamma_r & \text{if } A_r(j) \ge 0\\
-A_r(N/4 - j)\times\alpha - \gamma_r & \text{if } A_r(j) < 0
\end{cases}
\qquad
B_r'(j) \leftarrow
\begin{cases}
-B_r(N/4 - j)\times\alpha + \gamma_r & \text{if } B_r(j) \ge 0\\
-B_r(N/4 - j)\times\alpha - \gamma_r & \text{if } B_r(j) < 0
\end{cases}
$$

The frequency component modifying means 30 applies [Formula 12] to the frequency components Al(j), Bl(j), Ar(j), and Br(j) for each j = N/4+1, ..., N/2−1. With N = 4096 samples per acoustic frame and Fs = 44.1 kHz, j = N/4 corresponds to the frequency Fs/4 and j = N/2 corresponds to Fs/2. The components j = 1, ..., N/4 are left unmodified.

In [Formula 12], α is a real scaling value in the range 0.0 < α ≤ 1.0, and this embodiment uses α = 1.0. γl and γr are real signal-level values for the white noise component. In this embodiment, Al(j), Bl(j), Ar(j), and Br(j) are defined on the 16-bit range (−32768 to +32767), and γl = γr = 1024.0.

In each equation of [Formula 12], the first term on the right-hand side (−Al(N/4−j) × α, and so on) is the low-frequency cancellation component. The second term, γl or γr, is the white noise component.
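The following is a minimal Python sketch of [Formula 12] for one channel. The function name `add_cancel_and_noise` and its parameters are hypothetical. The sketch rests on two assumptions:

- The spectrum is held as two lists, A and B, containing bins j = 0, ..., N/2. This is the layout a real-input DFT typically produces.
- The fold about bin N/4 maps bin j to its mirror bin N/2 − j. The printed index N/4 − j would be negative for j > N/4, so this mirror index is an interpretation rather than the formula exactly as written.

```python
def add_cancel_and_noise(A, B, alpha=1.0, gamma=1024.0, add_noise=True):
    """Apply Formula 12 to one channel (sketch).

    A, B: real and imaginary parts for bins 0..N/2 (assumed layout).
    Bins j = N/4+1 .. N/2-1 are replaced by the negated, alpha-scaled
    component of the mirror bin N/2 - j (assumed fold index), plus or
    minus gamma according to the sign of the original component at j.
    All other bins are returned unchanged.
    """
    N = 2 * (len(A) - 1)
    q = N // 4
    A2, B2 = list(A), list(B)
    for j in range(q + 1, N // 2):
        m = N // 2 - j                     # mirror of j about bin N/4 (assumption)
        for src, dst in ((A, A2), (B, B2)):
            value = -src[m] * alpha        # low-frequency cancellation term
            if add_noise:
                # white-noise term: sign follows src[j], as in the printed condition
                value += gamma if src[j] >= 0 else -gamma
            dst[j] = value
    return A2, B2
```

The function would be called once with (Al, Bl, γl) and once with (Ar, Br, γr). Setting `add_noise=False` keeps only the cancellation term of step S14.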

Embedding a white noise component in the audible range requires attention to sound quality degradation. When embedding in the audible band of about 11.1 kHz to about 22.1 kHz, as in this embodiment, the degradation can be suppressed by raising or lowering γl and γr in step with the signal strength of the original sound signal. Specifically, γl and γr can be calculated by [Formula 13] below.

[Formula 13]
$$
\gamma_l = \gamma_o\,[\gamma_{lp} + 50\times\mathrm{Ave}_l]/2,
\qquad
\gamma_r = \gamma_o\,[\gamma_{rp} + 50\times\mathrm{Ave}_r]/2
$$
$$
\mathrm{Ave}_l = \left[\sum_{j=1}^{N/4-1}\left\{A_l(j+N/4)^2 + B_l(j+N/4)^2\right\}\times\frac{16j}{N^2}\right]^{1/2},
\qquad
\mathrm{Ave}_r = \left[\sum_{j=1}^{N/4-1}\left\{A_r(j+N/4)^2 + B_r(j+N/4)^2\right\}\times\frac{16j}{N^2}\right]^{1/2}
$$

In [Formula 13], γo = 1024.0, and γlp and γrp are the values of γl and γr determined for the immediately preceding acoustic frame. The first acoustic frame has no preceding frame, so γlp = γrp = γo is used for it.

The coefficient 50 in the first and second equations is a predetermined constant. It can be set to any integer from 0 to 100, but values close to 50, as in this embodiment, are preferable. As in the third embodiment, Avel and Aver are average values of the signal components in a predetermined range from frequency Ft to frequency 2Ft.

Using γl and γr from [Formula 13] as they are can produce spike noise when γl and γr change abruptly. This embodiment therefore takes the average of the γl and γr calculated by [Formula 13] and the γlp and γrp obtained from the immediately preceding acoustic frame. It uses these averages as the new γl and γr in [Formula 12].
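The band level Ave_l (or Ave_r) in [Formula 13] can be computed as sketched below. The sketch keeps the 16j/N² weight exactly as printed. It assumes the same bins-0..N/2 list layout for A and B as a real-input DFT output. The name `band_level` is hypothetical.

```python
import math

def band_level(A, B):
    """Ave term of Formula 13 for one channel (sketch).

    A, B: real and imaginary parts for bins 0..N/2 (assumed layout).
    Sums {A(j+N/4)^2 + B(j+N/4)^2} * 16j/N^2 over j = 1..N/4-1, i.e. over
    the bins just above Ft = bin N/4, and returns the square root.
    """
    N = 2 * (len(A) - 1)
    q = N // 4
    total = sum((A[j + q] ** 2 + B[j + q] ** 2) * 16 * j / N ** 2
                for j in range(1, q))
    return math.sqrt(total)


# Toy 16-point frame: unit real parts on bins 5..7, zeros elsewhere.
A = [0.0] * 9
for j in (5, 6, 7):
    A[j] = 1.0
print(band_level(A, [0.0] * 9))
```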

As [Formula 12] shows, the predetermined intensities γl and γr are added as white noise components in the direction that increases the absolute values of the real and imaginary parts of each complex component. As in the second embodiment, the white noise addition in step S15 is performed only on odd-numbered acoustic frames, not on even-numbered ones. Switching the white noise on for odd-numbered frames and off for even-numbered frames has two effects. It halves the proportion of the original sound signal that is modified, and the resulting low-frequency on/off alternation makes the reproduced noise more distinct.

In this embodiment, the frequency component modification of steps S14 and S15 is applied to odd-numbered acoustic frames and not to even-numbered ones. As noted above, however, the modification need not be applied to every other acoustic frame. Any scheme that makes the interfering sound effective may be chosen as appropriate.
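Which frames receive the modification can be parameterised as in the sketch below. The helper `is_modified` and its `interval` and `offset` parameters are illustrative assumptions. With the defaults it selects the odd-numbered frames used in this embodiment. Other intervals correspond to the alternatives the text allows.

```python
def is_modified(frame_number, interval=2, offset=1):
    """Decide whether a 1-based frame receives the S14/S15 modification.

    interval=2, offset=1 selects the odd-numbered frames (this embodiment);
    interval=1 selects every frame; interval > 2 leaves longer gaps.
    """
    return frame_number >= offset and (frame_number - offset) % interval == 0


print([n for n in range(1, 9) if is_modified(n)])
```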

After the frequency-time conversion in step S16, the resulting modified acoustic frames could be output in sequence to obtain a modified acoustic signal. As noted above, however, the processing up to step S16 provides no copy prevention effect by itself. Therefore, as in the first to third embodiments, step S17 below is performed so that the desired interfering sound is reproduced even when low-pass filter (LPF) processing is applied during copying.

That is, as in the second embodiment, inter-channel and inter-sample arithmetic is applied to the modified acoustic signal obtained by the processing up to step S16 (step S17). Specifically, the value of each R-channel sample is corrected so that, when corresponding samples of the two channels are added to produce a monaural signal, each two adjacent samples have the same value. That value is twice the average of the four samples involved (two from each channel). As in the first embodiment, this embodiment executes the processing of [Formula 8] above to correct the value Xr′ of each R-channel sample.
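[Formula 8] is defined earlier in the document and is not reproduced here. The sketch below is one hypothetical way to satisfy the constraint stated for step S17. It rewrites each pair of R-channel samples so that L + R is the same for both samples of the pair and equals half the sum of the four samples, which is twice their average. The helper name `correct_pairs` is an assumption. The sketch ignores rounding to 16-bit integers and clipping.

```python
def correct_pairs(left, right):
    """Return a corrected copy of the R channel for step S17 (sketch).

    For each adjacent pair (k, k+1), the mono sums L+R of both samples are
    set to the same value: half the total of the four samples, i.e. twice
    their average. The L channel is untouched, and each set's total is kept.
    """
    if len(left) != len(right) or len(left) % 2:
        raise ValueError("channels must have equal, even lengths")
    r = list(right)
    for k in range(0, len(left), 2):
        target = (left[k] + left[k + 1] + right[k] + right[k + 1]) / 2
        r[k] = target - left[k]
        r[k + 1] = target - left[k + 1]
    return r


left_ch = [1, 2, 3, 4]
right_ch = [5, 6, 7, 8]
right_corr = correct_pairs(left_ch, right_ch)
mono = [l + r for l, r in zip(left_ch, right_corr)]
print(right_corr, mono)   # [6.0, 5.0, 8.0, 7.0] [7.0, 7.0, 11.0, 11.0]
```

Only R changes, and the sum of each four-sample set is preserved. As a result, the mono mix contains pairs of equal values.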

The modified acoustic frame output means 50 sequentially writes the corrected modified acoustic frames produced by the modified acoustic frame correction means 45 to an output file. The processing shown in the flowchart of FIG. 9 above is executed for all first acoustic frames of the first acoustic signal. Once all first acoustic frames have been processed in this way, the modified acoustic signal, which is the set of modified acoustic frames, is stored in the modified acoustic signal storage unit 62. In this embodiment, the interfering sound is embedded from the beginning to the end of the first acoustic signal. It is also possible to set an embedding position and embed only within that range.

<3. Support for 5.1-channel surround sound>
The above embodiments describe two-channel (L/R) stereo sound, but the present invention can also be applied to 5.1-channel surround sound. As shown in FIG. 13, a 5.1-channel surround sound signal consists of six channels: Fl, Fc, and Fr for the front speakers, Bl and Br for the rear speakers, and Lf for the bass speaker.

When the present invention is applied to a 5.1-channel surround sound signal, three channel pairs are processed in the same way as in the above embodiments:

- Front pair Fl and Fc: Fc takes the role of the stereo L channel ("L1" in FIG. 13) and Fl the role of the R channel ("R1" in FIG. 13). The modified acoustic frame correction means 45 corrects the Fl signal.
- Front pair Fr and Fc: Fc takes the role of the L channel ("L2" in FIG. 13) and Fr the role of the R channel ("R2" in FIG. 13). The modified acoustic frame correction means 45 corrects the Fr signal.
- Rear pair Bl and Br: Bl takes the role of the L channel ("L3" in FIG. 13) and Br the role of the R channel ("R3" in FIG. 13). The modified acoustic frame correction means 45 corrects the Br signal.

No embedding is performed on the bass-speaker channel Lf.
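The channel pairing above can be written as a small table of (L role, R role) pairs. The sketch below covers only the correction step. It defines its own minimal pair-correction helper so that the block stands alone, and it omits the spectral modification. The names `PAIRS_5_1`, `correct_pairs`, and `correct_5_1` are hypothetical. Fc appears in the L role in two pairs but is never altered by the correction, so the order of the pairs does not change the result.

```python
def correct_pairs(left, right):
    """Rewrite R so that L+R is equal within each adjacent sample pair.

    A trailing unpaired sample (odd length) is left unchanged.
    """
    r = list(right)
    for k in range(0, len(left) - 1, 2):
        target = (left[k] + left[k + 1] + right[k] + right[k + 1]) / 2
        r[k], r[k + 1] = target - left[k], target - left[k + 1]
    return r

# (channel in the L role, channel in the R role); the R-role channel is corrected.
PAIRS_5_1 = [("Fc", "Fl"), ("Fc", "Fr"), ("Bl", "Br")]

def correct_5_1(channels):
    """Apply the pairwise correction to a dict of 5.1 channels; Lf is left as is."""
    out = {name: list(samples) for name, samples in channels.items()}
    for l_name, r_name in PAIRS_5_1:
        out[r_name] = correct_pairs(out[l_name], out[r_name])
    return out


frame = {"Fl": [1, 2, 3, 4], "Fc": [5, 6, 7, 8], "Fr": [9, 10, 11, 12],
         "Bl": [0, 1, 0, 1], "Br": [2, 2, 2, 2], "Lf": [3, 1, 4, 1]}
result = correct_5_1(frame)
```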

Suppose someone attempts to copy a 5.1-channel surround sound signal with an embedded interfering sound illegally while it is being played back. Aliasing then occurs in any of these cases: the Fl and Fc signals are mixed, the Fr and Fc signals are mixed, the Bl and Br signals are mixed, or several of these mixes occur at the same time. As a result, a component corresponding to the interfering sound is recorded in the copied acoustic signal, and the interfering sound is heard when the copy is played back.

<4. Modifications>
Preferred embodiments of the present invention have been described above, but the present invention is not limited to them, and various modifications are possible. For example, the above embodiments remove the signal components at or above the frequency Ft from the first acoustic signal and fold the signal components at or below Ft back toward higher frequencies about Ft. The specific value of Ft is not limited to those used in the above embodiments and can be set to various values.

In step S6 of the first and third embodiments and step S14 of the second and fourth embodiments, the frequency component modifying means 30 adds a low-frequency cancellation component to the high band of the first spectrum. This processing may be omitted.

In the second and fourth embodiments, the low-frequency cancellation component and white noise are added in steps S14 and S15 to either the odd-numbered or the even-numbered acoustic frames. The addition may instead be performed at intervals of two or more frames, or on all acoustic frames.

1 ... CPU
2 ... RAM
3 ... Storage device
4 ... Key input I/F
5 ... Data input/output I/F
6 ... Display output I/F
8 ... Upsampling means
10 ... Acoustic frame reading means
20 ... Time-frequency conversion means
30 ... Frequency component modifying means
40 ... Frequency-time conversion means
45 ... Modified acoustic frame correction means
50 ... Modified acoustic frame output means
60 ... Storage means
61 ... Acoustic signal storage unit
62 ... Modified acoustic signal storage unit

Claims

A device that embeds a second acoustic signal composed of a time-series sample sequence in an inaudible state as an interfering sound with respect to an original acoustic signal composed of a time-series sample sequence of at least two channels,
Sound frame reading means that selects two channels from the original acoustic signal, reads an acoustic frame composed of a predetermined number of samples from each channel as a first sound frame, and reads an acoustic frame composed of a predetermined number of samples from the second acoustic signal as a second sound frame;
Time-frequency conversion means that performs time-frequency conversion on the first sound frame for each channel to obtain a first spectrum of complex frequency components, and performs time-frequency conversion on the second sound frame to obtain a second spectrum of complex frequency components;
Frequency component modifying means that performs the following on the signal components of the second spectrum: it removes the components at or above a predetermined frequency Ft; it folds the components at or below Ft back toward higher frequencies about Ft; it multiplies the folded components within a predetermined range from Ft to 2Ft by a predetermined coefficient value; and it adds them to the signal components of the corresponding frequencies of the first spectrum, thereby modifying the frequency components of the first spectrum for each channel;
Frequency-time conversion means for generating a modified sound frame by performing frequency-time conversion on the first spectrum in which the frequency component is modified;
Modified acoustic frame correction means that, for the generated modified acoustic frames of the two channels, takes each two adjacent samples of both channels (four samples in total) as one set and changes the values of two samples in each set so that, when corresponding samples of the two channels are added, the two adjacent sums match;
Modified acoustic frame output means for sequentially outputting the modified acoustic frames corrected by the modified acoustic frame correcting means;
A device for embedding a disturbing sound in an acoustic signal, comprising the above means.

An apparatus that embeds noise composed of a time-series sample sequence in an inaudible state as an interfering sound with respect to an original sound signal composed of a time-series sample sequence of at least two channels,
Sound frame reading means for selecting two channels from the original sound signal and reading a sound frame composed of a predetermined number of samples for each channel as a first sound frame;
Time-frequency conversion means for performing time-frequency conversion for each channel on the first acoustic frame to obtain a first spectrum as a complex frequency component;
Frequency component modifying means that modifies the frequency components of the first spectrum so that, for the signal components within a predetermined range from a predetermined frequency Ft to frequency 2Ft, the absolute value of each signal component increases by a predetermined value;
Frequency-time conversion means for generating a modified sound frame by performing frequency-time conversion on the first spectrum in which the frequency component is modified;
Modified acoustic frame correction means that, for the generated modified acoustic frames of the two channels, takes each two adjacent samples of both channels (four samples in total) as one set and changes the values of two samples in each set so that, when corresponding samples of the two channels are added, the two adjacent sums match;
Modified acoustic frame output means for sequentially outputting the modified acoustic frames corrected by the modified acoustic frame correcting means;
A device for embedding a disturbing sound in an acoustic signal, comprising the above means.

In claim 2,
The frequency component modifying means performs its processing on acoustic frames at predetermined intervals among the acoustic frames read by the acoustic frame reading means.

In any one of Claims 1-3,
An interfering-sound embedding device for an acoustic signal, wherein the frequency component modifying means performs the following process in advance: removing the signal components at or above the frequency Ft from the signal components of the first spectrum obtained by the time-frequency conversion means; folding the signal components at or below Ft back toward higher frequencies about Ft; inverting the sign of the folded signal components in the range from Ft to 2Ft and multiplying them by a predetermined coefficient value; and adding them to the signal components of the corresponding frequencies of the first spectrum, thereby modifying the frequency components of the first spectrum for each channel.

In any one of Claims 1-4,
The original sound signal is sampled at a sampling frequency Fs,
Upsampling means for upsampling the original sound signal at a sampling frequency Fus (Fus> Fs) to create a wideband sound signal that is a wideband original sound signal;
Ft = Fs / 2,
A device for embedding a disturbing sound in an acoustic signal, wherein the time-frequency conversion means, the frequency component modifying means, the frequency-time conversion means, the modified sound frame correction means, and the modified sound frame output means perform their processing on the wideband sound signal.

In any one of Claims 1-4,
The original sound signal is sampled at a sampling frequency Fs,
The interference sound embedding device for an acoustic signal, wherein Ft = Fs / 4.

In claim 6,
An interfering-sound embedding device for an acoustic signal, wherein, in the frequency component modifying means, the predetermined coefficient value used for multiplication, or the predetermined value by which the absolute value is increased, is varied based on the average value of the signal components within a predetermined range from frequency Ft to frequency 2Ft among the signal components of the first spectrum obtained by the time-frequency conversion means.

In any one of Claims 1-7,
An interfering-sound embedding device for an acoustic signal, wherein the time-frequency conversion means performs time-frequency conversion using a Hanning window function with a window width of N samples, in which the weight W(i) (0 ≦ W(i) ≦ 1) at sample position i (0 ≦ i ≦ N−1) is defined by W(i) = 0.5 − 0.5 cos(2πi/N).

A non-transitory computer-readable storage medium storing a program for causing a computer to function as an embedding device for interfering sound with respect to an acoustic signal according to any one of claims 1 to 8.