JP2007048340A

JP2007048340A - Device for extracting information from acoustic signal

Info

Publication number: JP2007048340A
Application number: JP2005228878A
Authority: JP
Inventors: Toshio Motegi; 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2005-08-05
Filing date: 2005-08-05
Publication date: 2007-02-22
Anticipated expiration: 2025-08-05
Also published as: JP4531653B2

Abstract

PROBLEM TO BE SOLVED: To provide a device for extracting information from an acoustic signal that can more accurately extract additional information, such as attribute information, embedded inaudibly in the multi-channel acoustic signals supplied on CDs or by broadcast, while keeping the computational load low.
SOLUTION: A group of a predetermined number of samples from the input acoustic signal is set as a reference frame, phase-changed frames are set relative to this reference frame, an optimal phase frame is selected from among them, and a bit value is detected from the optimal phase frame. When the optimal phase frame changes and then returns to its earlier phase, an offset value is added when setting the reference frame and phase-changed frames for the next sample group.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to the following fields: the protection of music copyright (monitoring of illegal copying) and the provision of music attribute information (song title search services) for packaged music intended for consumer and professional listening on CDs, DVDs and the like, and for broadcast and network music distribution carried out commercially by broadcasters and others; services that provide text information linked to exhibit narration at museums and event venues; and contactless Internet gateway services in which information such as a URL is extracted from the audio signal of a broadcast program or a CD/DVD package and a mobile phone is used to access a web site related to the given content in order to retrieve detailed information or answer a questionnaire.

Recently, services that provide music attribute information, letting a listener learn the title and other details of music being played, have been put into practical use. In some, the broadcaster is queried with the date, time, and region in which the music was broadcast; in others, a fragment of the music being played is recorded on a mobile phone and matched against melodies registered in a database (see, for example, Patent Documents 1 and 2).

In the inventions described in Patent Documents 1 and 2, a recorded fragment of music is matched against melodies registered in a database, so as the number of registered songs grows, the processing load increases and similar melodies become more likely to be misidentified. Methods have therefore also been proposed that embed music attribute information, such as the song title and artist information, in the acoustic signal as an inaudible digital watermark (see, for example, Patent Documents 3 to 6).

The methods described in Patent Documents 1 to 6 have several problems: the amount of information that can be embedded is small, the sound quality is noticeably degraded, the watermark information is lost through various kinds of signal processing, and the watermark is difficult to detect in analog copies. The present applicant has therefore proposed a method of embedding attribute information by changing the ratio of the low-frequency components of a multi-channel acoustic signal according to the bit values of the attribute information, together with a method that analyzes the signal while varying the phase in order to extract the attribute information from such a signal with high accuracy (see Patent Documents 7 and 8).
Patent Document 1: JP 2002-259421 A
Patent Document 2: JP 2003-157087 A
Patent Document 3: JP H11-145840 A
Patent Document 4: JP H11-219172 A
Patent Document 5: Japanese Patent No. 3321767
Patent Document 6: JP 2003-99077 A
Patent Document 7: Japanese Patent Application No. 2005-5157
Patent Document 8: Japanese Patent Application No. 2005-51381

However, in the method described in Patent Document 8 the phase is changed in discrete steps. If the start of the signal captured on the extraction side falls near the midpoint between two steps, the choice of optimal phase becomes unstable, and detection accuracy drops sharply or detection fails altogether. Making the phase-correction steps finer is one conceivable countermeasure, but it increases computation time, which makes real-time extraction difficult, and it also increases the calculation error in the optimal-phase decision, so the decision accuracy does not improve.

In addition, the method of Patent Document 8 excludes a frame from bit-value detection when the intensity of its low-frequency components is below a threshold. When the signal level fluctuates abruptly, drops temporarily, or falls silent, this decision can no longer be made appropriately.

An object of the present invention is therefore to provide a device for extracting information from an acoustic signal that can more accurately extract additional information, such as attribute information, embedded inaudibly in the multi-channel acoustic signals supplied on CDs or by broadcast, while keeping the computational load low. A further object is to provide such a device that can appropriately extract the additional information even when the signal level fluctuates abruptly, drops temporarily, or falls silent.

To solve the above problems, the present invention provides a device for extracting information that has been embedded in advance, in an inaudible state, in an acoustic signal composed of a time series of samples, the device comprising: acoustic frame acquisition means for acquiring, from the acoustic signal, an acoustic frame composed of a predetermined number of samples; reference frame setting means for setting a reference frame by shifting the acoustic frame by an offset value of a predetermined number of samples, thereby changing its phase; phase-changed frame setting means for setting, as phase-changed frames, a plurality of acoustic frames obtained by shifting from the reference frame in increments of a step value, the step value being a number of samples larger than the offset value, thereby changing the phase; frequency conversion means for frequency-converting each acoustic frame set as the reference frame or as a phase-changed frame to generate a frame spectrum corresponding to that acoustic frame; code determination parameter calculation means for extracting, from each generated frame spectrum, low-frequency intensity data corresponding to components at or below a predetermined frequency and calculating a code determination parameter from that data; code output means for determining, on the basis of the code determination parameters calculated for past acoustic frames of the same phase belonging to different reference frames, that one acoustic frame among the reference frame and the plurality of phase-changed frames is the optimal phase frame, outputting a predetermined code on the basis of the code determination parameter determined for that optimal phase frame, and changing the offset value when that optimal phase frame differs in phase from the immediately preceding optimal phase frame and has the same phase as the optimal phase frame two frames earlier; and additional information extraction means for converting, according to a predetermined rule, the bit array formed by the codes output for the optimal phase frames, thereby extracting the additional information.
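The condition under which the code output means changes the offset value can be illustrated with a short Python sketch. The function name and the list representation of the phase history are assumptions made for illustration; the text above states only the condition, not how it is implemented.

```python
def should_change_offset(optimal_phases):
    """Return True when the newest optimal phase differs from the previous one
    but equals the one two frames back (the phase moved away and came back).

    optimal_phases is a hypothetical list of the phase indices chosen as
    optimal for successive frames, oldest first.
    """
    if len(optimal_phases) < 3:
        return False  # not enough history to compare
    current = optimal_phases[-1]
    previous = optimal_phases[-2]
    two_back = optimal_phases[-3]
    return current != previous and current == two_back
```

A history such as `[2, 3, 2]` satisfies the condition, whereas a steady history `[2, 2, 2]` or a drifting one `[1, 2, 3]` does not.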

Also, in the present invention, the code output means judges an acoustic frame to be an invalid frame when the sum of the low-frequency intensity data extracted from the generated frame spectrum is below a predetermined lower threshold. The lower threshold used for this judgment is calculated by accumulating the low-frequency intensity data of frames judged valid in the past, and in this accumulation a valid frame contributes less the further back in time it lies.

According to the present invention, when the acquired acoustic signal is analyzed in units of acoustic frames, the position of the reference frame is determined from the changes in optimal phase over past acoustic frames, the optimal phase is determined for each acoustic frame while shifting the phase relative to this reference frame, and the embedded information is determined from the state of the acoustic frame judged to have the optimal phase. The optimal phase can therefore be determined with minimal computational load, and additional information embedded inaudibly can be extracted accurately from the acoustic signal being played back.

Further, in the present invention, when the low-frequency intensity threshold needed for bit-value extraction is calculated, frames further back in time are given less influence. Additional information can therefore be extracted appropriately even when the signal level fluctuates abruptly, drops temporarily, or falls silent.
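One way to picture a threshold in which older frames count for less is an exponentially decayed average. The following Python sketch is only an illustration under that assumption: the class name, the `decay` and `ratio` parameters, and the exponential weighting itself are hypothetical and are not taken from this document, which says only that more distant valid frames have less influence.

```python
class DecayingThreshold:
    """Lower threshold derived from past valid frames, with older frames
    weighted less. Exponential decay is an assumed weighting scheme."""

    def __init__(self, decay=0.9, ratio=0.1):
        self.decay = decay          # weight multiplier applied per frame of age
        self.ratio = ratio          # fraction of the decayed average used as limit
        self.weighted_sum = 0.0
        self.weight_total = 0.0

    def lower_limit(self):
        if self.weight_total == 0.0:
            return 0.0              # no history yet: accept every frame
        return self.ratio * self.weighted_sum / self.weight_total

    def is_valid(self, energy):
        return energy >= self.lower_limit()

    def update(self, energy):
        """Fold in the energy of a frame that was judged valid."""
        self.weighted_sum = self.decay * self.weighted_sum + energy
        self.weight_total = self.decay * self.weight_total + 1.0
```

With this scheme a temporary drop in level pulls the limit down within a few frames, because the loud frames before the drop lose weight quickly.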

Embodiments of the present invention are described in detail below with reference to the drawings.
(1. Embedding device)
First, the embedding device is described. It embeds in an acoustic signal the additional information that is to be extracted by the device for extracting information from an acoustic signal according to the present invention. FIG. 1 is a functional block diagram showing the configuration of the embedding device. In FIG. 1, 10 is acoustic frame reading means, 20 is frequency conversion means, 30 is low-frequency component changing means, 40 is inverse frequency conversion means, 50 is modified acoustic frame output means, 60 is storage means, 61 is an acoustic signal storage unit, 62 is an additional information storage unit, 63 is a modified acoustic signal storage unit, and 70 is additional information reading means.

The acoustic frame reading means 10 reads a predetermined number of samples as one frame from each channel of the original stereo acoustic signal in which additional information is to be embedded. The frequency conversion means 20 frequency-converts each frame read by the acoustic frame reading means 10, by Fourier transform or the like, to generate a frame spectrum. The low-frequency component changing means 30 extracts, from the generated frame spectra, the low-frequency intensity data corresponding to frequencies at or below a predetermined frequency and, based on the additional information read from the additional information storage unit 62, changes the ratio between channels of the corresponding low-frequency intensity data. The inverse frequency conversion means 40 performs inverse frequency conversion on the frame spectra containing the changed low-frequency intensity data to generate modified acoustic frames. The modified acoustic frame output means 50 outputs the generated modified acoustic frames in sequence. The storage means 60 comprises the acoustic signal storage unit 61, which stores the stereo acoustic signal in which additional information is to be embedded; the additional information storage unit 62, which stores the additional information, organized as a bit array, to be embedded in the stereo acoustic signal; and the modified acoustic signal storage unit 63, which stores the modified acoustic signal after embedding. It also stores the other information needed for processing. The additional information reading means 70 reads the additional information from the additional information storage unit 62. Additional information is information to be embedded in addition to the acoustic information; it includes attribute information such as title and artist name, as well as information other than attribute information. In practice, each component shown in FIG. 1 is realized by installing a dedicated program on hardware such as a computer and its peripherals; that is, the computer carries out the function of each means according to the dedicated program.

(2. Processing operation of the embedding device)
Next, the processing operation of the embedding device shown in FIG. 1 is described with reference to the flowchart of FIG. 2. First, the additional information reading means 70 reads the additional information from the additional information storage unit 62 one word at a time (S101); specifically, one word is read into a register. The mode is then set to delimiter mode (S102). There are three modes: delimiter mode, bit mode, and continuation identification mode. Delimiter mode is the mode for processing at the boundary between words, and bit mode is the mode for processing based on the value of each bit of a word. Whenever one byte has been read from the additional information storage unit 62, the mode is always set to delimiter mode immediately afterward.

Next, the acoustic frame reading means 10 reads a predetermined number of samples as one acoustic frame from each of the left and right channels of the stereo acoustic signal stored in the acoustic signal storage unit 61 (S104). The number of samples per acoustic frame can be chosen as appropriate, but about 4096 samples is desirable at a sampling frequency of 44.1 kHz. The acoustic frame reading means 10 thus reads 4096 samples at a time from each of the left and right channels as successive acoustic frames.
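The frame-reading step can be sketched in Python as follows. This is a minimal illustration, not the device's implementation: the function name, the use of plain lists for the two channels, and the decision to drop a trailing partial frame are all assumptions.

```python
FRAME_SIZE = 4096  # samples per acoustic frame, the value suggested for 44.1 kHz


def read_frames(left, right, frame_size=FRAME_SIZE):
    """Yield (left_frame, right_frame) pairs of frame_size samples each.

    Samples at the end that do not fill a whole frame are dropped; the text
    does not say how the tail is handled, so this is an assumption.
    """
    n_frames = min(len(left), len(right)) // frame_size
    for k in range(n_frames):
        start = k * frame_size
        yield left[start:start + frame_size], right[start:start + frame_size]
```

For a 10000-sample stereo signal this yields two frame pairs and ignores the remaining 1808 samples.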

Next, the frequency conversion means 20 frequency-converts each acoustic frame that was read to obtain a frame spectrum, that is, the spectrum of that acoustic frame (S105). Specifically, the conversion is applied to each acoustic frame using three window functions. The Fourier transform, the wavelet transform, or any of various other known techniques can be used for the frequency conversion. This embodiment is described using the Fourier transform as an example.

The ordinary Fourier transform is explained first. To apply a Fourier transform to a signal, the signal must be cut into segments of a given length, but if the transform is applied directly to such a segment, the cut ends are discontinuous. For this reason, a Fourier transform is generally performed by first modifying the signal values with a window function called a Hanning window and then transforming the modified values.

FIG. 3 shows how the signal waveform changes in an ordinary Fourier transform. In FIG. 3 the horizontal axis is the time axis (t), and FIGS. 3(a) to 3(c) are all aligned on it. In FIGS. 3(a) and 3(c) the vertical axis is the amplitude (level) of the signal. In FIG. 3(b) the vertical axis is the value of the window function W(t), where W(t) = 0.5 − 0.5·cos(2πt/T) and the maximum value of W(t) is 1.

In an ordinary Fourier transform, a signal of a given length as shown in FIG. 3(a) is multiplied by the window function W(t) shown in FIG. 3(b) to produce a signal as shown in FIG. 3(c). The Fourier transform is then applied to the signal with the waveform shown in FIG. 3(c).
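The windowing step described above can be sketched in Python. The function names are illustrative assumptions; the window formula is the W(t) = 0.5 − 0.5·cos(2πt/T) given in the text, sampled at integer t with T equal to the segment length.

```python
import math


def hanning(n_samples):
    """Sample W(t) = 0.5 - 0.5*cos(2*pi*t/T) at t = 0..T-1, with T = n_samples."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n_samples)
            for t in range(n_samples)]


def apply_window(signal, window):
    """Multiply the signal by the window sample by sample."""
    return [s * w for s, w in zip(signal, window)]
```

The window is 0 at the start of the segment and rises to 1 at its midpoint, so the multiplied signal tapers smoothly to zero at the cut ends.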

In this embodiment, window functions are used not to remove the discontinuity after the Fourier transform but to create, from a single acoustic frame, several states in which information can be embedded. Several window functions are prepared, and the Fourier transform is applied to one acoustic frame with each of them, yielding several spectra. FIG. 4 shows the signal waveform of one acoustic frame and the window functions W(1, i), W(2, i), and W(3, i). In FIG. 4 the horizontal axis is the time axis (i). As described later, i is a serial number assigned to the N samples in each acoustic frame and is therefore proportional to time t. In FIGS. 4(a), (e), (f), and (g) the vertical axis is the amplitude (level) of the signal. In FIGS. 4(b) to (d) the vertical axis is the value of the window functions W(1, i), W(2, i), and W(3, i), each of which has a maximum value of 1.

The first window function W(1, i) extracts the front part of the acoustic frame. As shown in FIG. 4(b), it is set to take its maximum value of 1 at a given sample number i in the front part and its minimum value of 0 in the rear part. The sample number at which the maximum occurs depends on the design of W(1, i). Multiplying by W(1, i) turns the acoustic-frame waveform of FIG. 4(a) into the waveform of FIG. 4(e), in which the signal component remains in the front part and is removed from the rear part; this becomes the input to the Fourier transform. The second window function W(2, i) extracts the central part of the acoustic frame. As shown in FIG. 4(c), it is set to take its maximum value of 1 at a given sample number i in the central part and its minimum value of 0 in the front and rear parts. Multiplying by W(2, i) turns the waveform of FIG. 4(a) into the waveform of FIG. 4(f), in which the signal component remains in the central part and is removed from the front and rear parts; this becomes the input to the Fourier transform. The third window function W(3, i) extracts the rear part of the acoustic frame. As shown in FIG. 4(d), it is set to take its minimum value of 0 in the front part and its maximum value of 1 at a given sample number i in the rear part. Multiplying by W(3, i) turns the waveform of FIG. 4(a) into the waveform of FIG. 4(g), in which the signal component is removed from the front part and remains in the rear part; this becomes the input to the Fourier transform. Because the Fourier transform is performed after the front, central, and rear parts have been extracted in this way, spectra corresponding to the front, central, and rear parts are obtained. To embed a bit value in one acoustic frame it would in principle be enough to divide the frame into a front part and a rear part. On the extraction side, however, the signal cannot always be read in synchronization, so in the present invention the central part is also extracted in order to distinguish the front and rear parts clearly.

Specifically, a window separation function Wc(i), which divides the Hanning window function W(i) = 0.5 − 0.5·cos(2πi/N) into parts, is defined as shown in [Formula 1] below, and each window function is determined using Wc(i).

[Formula 1]
$$
\mathrm{Wc}(i) =
\begin{cases}
0.5 - 0.5\cos\!\left(\dfrac{8\pi\,(i - 3N/8)}{N}\right), & 3N/8 \le i \le 5N/8 \\[1ex]
0.0, & i < 3N/8 \ \text{or}\ i > 5N/8
\end{cases}
$$

Using this window separation function Wc(i), the first window function W(1, i), the second window function W(2, i), and the third window function W(3, i) are defined as shown in [Formula 2] below.

[Formula 2]
$$
\begin{aligned}
W(2,i) &= W(i)\cdot \mathrm{Wc}(i) \\
\text{for } i \le N/2:\quad W(1,i) &= \{1 - \mathrm{Wc}(i)\}\,W(i), \qquad W(3,i) = 0.0 \\
\text{for } i > N/2:\quad W(1,i) &= 0.0, \qquad W(3,i) = \{1 - \mathrm{Wc}(i)\}\,W(i)
\end{aligned}
$$

As FIG. 4 and [Formula 1] and [Formula 2] above show, the first, second, and third window functions are obtained by dividing the Hanning window function along the time axis within one acoustic frame. The first and second window functions, and likewise the second and third, are set so that there are instants at which both are nonzero, that is, so that they overlap on the time axis. The first and third window functions are set so that whenever one is nonzero the other is always 0, that is, so that they do not overlap on the time axis. In addition, as shown in FIGS. 4(b) and (d), the first and third window functions are each set so that their two sides follow asymmetric cosine curves. This makes the result of processing through the first window function clearly different from the result of processing through the third, and prevents a processed acoustic frame from being analyzed with the opposite window function even when it is shifted on the time axis.
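The window definitions of [Formula 1] and [Formula 2] can be written out directly, which makes the overlap properties easy to check numerically. The following is a minimal Python sketch; the function names are assumptions chosen for illustration.

```python
import math


def hanning(i, n):
    """Hanning window W(i) = 0.5 - 0.5*cos(2*pi*i/N)."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)


def separation(i, n):
    """Window separation function Wc(i) of [Formula 1]."""
    if 3 * n / 8 <= i <= 5 * n / 8:
        return 0.5 - 0.5 * math.cos(8.0 * math.pi * (i - 3 * n / 8) / n)
    return 0.0


def split_windows(n):
    """Return the lists W(1,i), W(2,i), W(3,i) of [Formula 2] for i = 0..n-1."""
    w1, w2, w3 = [], [], []
    for i in range(n):
        w, wc = hanning(i, n), separation(i, n)
        w2.append(w * wc)            # central part
        side = (1.0 - wc) * w        # whatever the central window leaves over
        if i <= n / 2:
            w1.append(side)          # front part
            w3.append(0.0)
        else:
            w1.append(0.0)
            w3.append(side)          # rear part
    return w1, w2, w3
```

By construction the three windows sum to the original Hanning window at every sample, and the first and third are never nonzero at the same sample.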

Specifically, when the Fourier transform is performed in S105, the left-channel signal xl(i) and the right-channel signal xr(i) (i = 0, …, N−1) are processed according to [Formula 3] below using the three window functions of [Formula 2] above, namely the first window function W(1, i), the second window function W(2, i), and the third window function W(3, i). This yields the real parts Al(1, j), Al(2, j), Al(3, j) and imaginary parts Bl(1, j), Bl(2, j), Bl(3, j) of the transformed data for the left channel, and the real parts Ar(1, j), Ar(2, j), Ar(3, j) and imaginary parts Br(1, j), Br(2, j), Br(3, j) of the transformed data for the right channel. The window functions W(1, i), W(2, i), and W(3, i) take large values near the front (start), the center, and the rear of the acoustic frame, respectively.

[Formula 3]
$$
\begin{aligned}
\mathrm{Al}(1,j) &= \sum_{i=0}^{N-1} W(1,i)\cdot \mathrm{xl}(i)\cdot \cos(2\pi i j/N) \\
\mathrm{Bl}(1,j) &= \sum_{i=0}^{N-1} W(1,i)\cdot \mathrm{xl}(i)\cdot \sin(2\pi i j/N) \\
\mathrm{Al}(2,j) &= \sum_{i=0}^{N-1} W(2,i)\cdot \mathrm{xl}(i)\cdot \cos(2\pi i j/N) \\
\mathrm{Bl}(2,j) &= \sum_{i=0}^{N-1} W(2,i)\cdot \mathrm{xl}(i)\cdot \sin(2\pi i j/N) \\
\mathrm{Al}(3,j) &= \sum_{i=0}^{N-1} W(3,i)\cdot \mathrm{xl}(i)\cdot \cos(2\pi i j/N) \\
\mathrm{Bl}(3,j) &= \sum_{i=0}^{N-1} W(3,i)\cdot \mathrm{xl}(i)\cdot \sin(2\pi i j/N) \\
\mathrm{Ar}(1,j) &= \sum_{i=0}^{N-1} W(1,i)\cdot \mathrm{xr}(i)\cdot \cos(2\pi i j/N) \\
\mathrm{Br}(1,j) &= \sum_{i=0}^{N-1} W(1,i)\cdot \mathrm{xr}(i)\cdot \sin(2\pi i j/N) \\
\mathrm{Ar}(2,j) &= \sum_{i=0}^{N-1} W(2,i)\cdot \mathrm{xr}(i)\cdot \cos(2\pi i j/N) \\
\mathrm{Br}(2,j) &= \sum_{i=0}^{N-1} W(2,i)\cdot \mathrm{xr}(i)\cdot \sin(2\pi i j/N) \\
\mathrm{Ar}(3,j) &= \sum_{i=0}^{N-1} W(3,i)\cdot \mathrm{xr}(i)\cdot \cos(2\pi i j/N) \\
\mathrm{Br}(3,j) &= \sum_{i=0}^{N-1} W(3,i)\cdot \mathrm{xr}(i)\cdot \sin(2\pi i j/N)
\end{aligned}
$$

In [Formula 3] above, i is the serial number assigned to the N samples in each acoustic frame and takes the integer values i = 0, 1, 2, …, N−1. j is a serial number assigned to the frequency values in ascending order and, like i, takes the integer values j = 0, 1, 2, …, N−1. With a sampling frequency of 44.1 kHz and N = 4096, a difference of one in j corresponds to a frequency difference of about 10.8 Hz (44100/4096 ≈ 10.77).
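The sums of [Formula 3] can be evaluated directly for the few low-frequency bins that matter. The Python sketch below does this for one channel and one window; the function names and the `max_bin` parameter are assumptions for illustration, and a practical implementation would more likely use an FFT.

```python
import math


def windowed_dft(x, window, max_bin):
    """Return (A, B) for bins j = 0..max_bin, following [Formula 3]:
    A(j) = sum_i W(i)*x(i)*cos(2*pi*i*j/N), B(j) = sum_i W(i)*x(i)*sin(2*pi*i*j/N).
    """
    n = len(x)
    a, b = [], []
    for j in range(max_bin + 1):
        re = 0.0
        im = 0.0
        for i in range(n):
            v = window[i] * x[i]
            angle = 2.0 * math.pi * i * j / n
            re += v * math.cos(angle)
            im += v * math.sin(angle)
        a.append(re)
        b.append(im)
    return a, b


def bin_spacing_hz(sample_rate, n):
    """Frequency difference between adjacent values of j."""
    return sample_rate / n
```

Note that B is defined here with a positive sine term, as in [Formula 3]; the usual complex DFT convention with exp(−2πij/N) gives an imaginary part of the opposite sign. Since only squared magnitudes are used afterward, the sign does not affect the intensity sums.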

Processing according to [Formula 3] above yields a frame spectrum in which the signal components of each acoustic frame are expressed as a spectrum, that is, as components corresponding to frequency. Next, the low-frequency component changing means 30 extracts spectrum sets for three predetermined frequency ranges from the generated frame spectrum. Human hearing is known to have difficulty sensing direction for low-frequency components up to about 200 to 300 Hz (see "Acoustic Engineering Course 1: Basic Acoustic Engineering", edited by the Acoustical Society of Japan, Corona Publishing, October 30, 1990, p. 247, Fig. 9.26). In this embodiment, the low-frequency components are therefore taken to be those at about 200 Hz or below. A frequency of about 200 Hz corresponds to j = 20, so from the real parts Al(1, j), Al(2, j), Al(3, j), imaginary parts Bl(1, j), Bl(2, j), Bl(3, j), real parts Ar(1, j), Ar(2, j), Ar(3, j), and imaginary parts Br(1, j), Br(2, j), Br(3, j) calculated by [Formula 3], those with j ≤ 20 are extracted.

Next, from among the extracted real parts Al(1, j) etc. ("etc." here also covers (2, j) and (3, j), and the same applies to the other real and imaginary parts), imaginary parts Bl(1, j) etc., real parts Ar(1, j) etc., and imaginary parts Br(1, j) etc., the low frequency component changing means 30 uses the left-channel real parts Al(1, j) etc. and imaginary parts Bl(1, j) etc. to calculate the total values E₁ and E₂ according to [Formula 4] below.

[Formula 4]
$$E_1 = \sum_{j=1}^{M-3} \left\{ \mathrm{Al}(1,j)^2 + \mathrm{Bl}(1,j)^2 + \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}$$
$$E_2 = \sum_{j=1}^{M-3} \left\{ \mathrm{Al}(3,j)^2 + \mathrm{Bl}(3,j)^2 + \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}$$

E₁ calculated by [Formula 4] is the sum of the component intensities of the spectrum set near the front of the acoustic frame, and E₂ is the sum of the component intensities of the spectrum set near the rear of the acoustic frame. Next, it is determined whether these total values E₁ and E₂ are at or above the level lower limit Lev. Lev is set to 0.05 when the acoustic signals xl(i) and xr(i) are normalized to a maximum amplitude of 1 and M is set to 20. Under these conditions, the level at which robustness to analog conversion can be maintained is empirically 0.5. In the present invention, however, a margin is needed because, as described later, collation is performed in the digital watermark extraction/collation means 200. In this embodiment, Lev is therefore set to 0.05, about 1/10 of the level at which robustness to analog conversion can be maintained. In [Formula 4], the sum is taken over 1 to M−3, excluding M−2, M−1, and M, in order to make the separation from the non-embedding region clear, as described later.

The reason for checking whether the total values E₁ and E₂ are at or above the level lower limit Lev is that, when the signal intensity is small, a change made to the signal cannot be detected on the extraction side. In this embodiment, a bit value that can take a first value (for example "1") and a second value (for example "0") is embedded in the window 3 component when it is "1" and in the window 1 component when it is "0". Accordingly, if the bit value to be embedded is "1" and the total value E₁ is below Lev, or the bit value to be embedded is "0" and the total value E₂ is below Lev, no recording according to the bit value of the additional information is performed and the mode is set to the delimiter mode (S106). Conversely, if the bit value to be embedded is "1" and E₁ is at or above Lev, or the bit value to be embedded is "0" and E₂ is at or above Lev, the mode is then determined.
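The level check described above can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the spectra are plain lists indexed by j, and the names `band_energy` and `choose_mode` are hypothetical. The pairing of bit "1" with E₁ and bit "0" with E₂ follows the wording of the text.

```python
LEV = 0.05  # level lower limit from the text (amplitude normalized to 1, M = 20)


def band_energy(al, bl, ar, br, m):
    """Total value of [Formula 4] for one window: sum over j = 1 .. m-3."""
    return sum(
        al[j] ** 2 + bl[j] ** 2 + ar[j] ** 2 + br[j] ** 2
        for j in range(1, m - 2)  # range end is exclusive, so j stops at m-3
    )


def choose_mode(bit, e1, e2, lev=LEV):
    """Return "delimiter" when the relevant total is below lev, else "embed".

    As stated in the text, E1 is checked for bit 1 and E2 for bit 0.
    """
    relevant = e1 if bit == 1 else e2
    return "delimiter" if relevant < lev else "embed"


if __name__ == "__main__":
    flat = [0.1] * 21  # indices 0..20, so j = 1..17 is summed for M = 20
    e = band_energy(flat, flat, flat, flat, 20)
    print(choose_mode(1, e, e))       # embed
    print(choose_mode(1, 0.01, 0.5))  # delimiter
```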

When the mode is the delimiter mode, the low frequency component changing means 30 makes the window 1 component and the window 3 component of the left (L) channel signal equal, which includes the case where both are entirely 0 (S108). Specifically, both components on the L side are set to 0 according to [Formula 5] below. In this case, the window 1 component and the window 3 component of the right (R) channel signal are not necessarily equal.

[Formula 5]
For j = 1 to M:
$$\mathrm{Al}'(1,j) = 0$$
$$\mathrm{Bl}'(1,j) = 0$$
$$\mathrm{Al}'(3,j) = 0$$
$$\mathrm{Bl}'(3,j) = 0$$
In the case of stereo, the following values for the right signal are also calculated:
$$E(1,j) = \left\{ \mathrm{Al}(1,j)^2 + \mathrm{Bl}(1,j)^2 + \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$\mathrm{Ar}'(1,j) = \mathrm{Ar}(1,j) \cdot E(1,j) / \left\{ \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$\mathrm{Br}'(1,j) = \mathrm{Br}(1,j) \cdot E(1,j) / \left\{ \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$E(3,j) = \left\{ \mathrm{Al}(3,j)^2 + \mathrm{Bl}(3,j)^2 + \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$
$$\mathrm{Ar}'(3,j) = \mathrm{Ar}(3,j) \cdot E(3,j) / \left\{ \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$
$$\mathrm{Br}'(3,j) = \mathrm{Br}(3,j) \cdot E(3,j) / \left\{ \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$

As a result of the processing according to [Formula 5], the low frequency components of the left-channel frame spectrum are "0", and therefore identical, in both the window 1 component and the window 3 component. This pattern, in which the window 1 component and the window 3 component are equal, serves as information indicating the head position (delimiter) of the additional information. Although [Formula 5] sets Al′(j) = Bl′(j) = 0 for both the window 1 component and the window 3 component, the aim is only to let the extraction side recognize a delimiter, so the values need not be exactly 0 as long as they are sufficiently small. Nor do the window 1 component and the window 3 component have to be identical; it is enough that the difference is small. It is in this sense that the term "equal" is used here.
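As a rough illustration of [Formula 5], the sketch below zeroes the left-channel bin and rescales the right-channel bin to the magnitude E. The function names and the list-of-tuples layout are assumptions made for this example. The formulas are undefined when the right-channel bin is exactly zero; leaving it unchanged in that case is also an assumption.

```python
import math


def move_bin_to_right(al, bl, ar, br):
    """Zero the left bin and scale the right bin to magnitude E.

    Returns (al', bl', ar', br') for one index j of one window.
    """
    e = math.sqrt(al * al + bl * bl + ar * ar + br * br)
    r = math.hypot(ar, br)
    if r == 0.0:
        # Assumption: the formula divides by zero here, so keep the right bin.
        return 0.0, 0.0, ar, br
    return 0.0, 0.0, ar * e / r, br * e / r


def apply_delimiter(window1, window3, m):
    """Apply the delimiter pattern in place for j = 1 .. m.

    window1 and window3 are lists of (al, bl, ar, br) tuples indexed by j.
    """
    for window in (window1, window3):
        for j in range(1, m + 1):
            window[j] = move_bin_to_right(*window[j])


if __name__ == "__main__":
    w1 = [(0.0, 0.0, 0.0, 0.0), (3.0, 0.0, 4.0, 0.0)]
    w3 = [(0.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
    apply_delimiter(w1, w3, 1)
    print(w1[1])  # (0.0, 0.0, 5.0, 0.0)
```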

On the other hand, when the mode is the bit mode or the continuation identification mode, the low frequency component changing means 30 changes the ratio of the spectral intensities of the window 1 component and the window 3 component of the left channel signal so that either the window 1 component or the window 3 component is dominant, according to the bit value in the bit array of the additional information extracted from the additional information storage unit 62 (S107). Here, "dominant" means that the spectral intensity in the spectrum set of one window component is greater than the spectral intensity in the spectrum set of the other window component. In S107, processing according to either [Formula 6] or [Formula 7] below is executed depending on the bit value, which can take a first value or a second value, thereby changing the magnitude relationship between the spectral intensity of the window 1 component and that of the window 3 component so that one of the two is dominant. For example, with the first value being 1 and the second value being 0, when the bit value is 1 the processing according to [Formula 6] below is executed on the window 1 component. In the continuation identification mode, when the state is "new", the magnitude relationship between the spectral intensities of the window 1 component and the window 3 component is changed according to [Formula 6] so that the window 3 component is dominant; when the state is "continuation", it is changed according to [Formula 7] so that the window 1 component is dominant.

[Formula 6]
For j = 1 to M:
$$\mathrm{Al}'(1,j) = 0$$
$$\mathrm{Bl}'(1,j) = 0$$
In the case of stereo, the following values for the right signal are also calculated:
$$E(1,j) = \left\{ \mathrm{Al}(1,j)^2 + \mathrm{Bl}(1,j)^2 + \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$\mathrm{Ar}'(1,j) = \mathrm{Ar}(1,j) \cdot E(1,j) / \left\{ \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$\mathrm{Br}'(1,j) = \mathrm{Br}(1,j) \cdot E(1,j) / \left\{ \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$E(3,j) = \left\{ \mathrm{Al}(3,j)^2 + \mathrm{Bl}(3,j)^2 + \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$
$$\mathrm{Al}'(3,j) = \mathrm{Al}(3,j) \cdot E(3,j) / \left\{ \mathrm{Al}(3,j)^2 + \mathrm{Bl}(3,j)^2 \right\}^{1/2}$$
$$\mathrm{Bl}'(3,j) = \mathrm{Bl}(3,j) \cdot E(3,j) / \left\{ \mathrm{Al}(3,j)^2 + \mathrm{Bl}(3,j)^2 \right\}^{1/2}$$

In the last three expressions of [Formula 6], the corresponding frequency components of the right channel signal are added to the window 3 component of the left channel signal, in which the information is embedded, to obtain Al′(3, j) and Bl′(3, j). This makes the difference in intensity between the window 3 component and the other window components clear, so that the information is easier to extract on the extraction side.
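The effect of [Formula 6] on a single index j can be sketched with complex numbers, treating each component as A + iB. This representation, the function name, and the handling of zero-magnitude bins (left unchanged) are assumptions for illustration only.

```python
import math


def formula6_bin(l1, r1, l3, r3):
    """Apply [Formula 6] to one index j.

    l1, r1: left and right components of window 1 (complex, A + iB)
    l3, r3: left and right components of window 3
    Returns (l1', r1', l3', r3').
    """
    e1 = math.sqrt(abs(l1) ** 2 + abs(r1) ** 2)
    e3 = math.sqrt(abs(l3) ** 2 + abs(r3) ** 2)
    new_l1 = 0j                                      # Al'(1,j) = Bl'(1,j) = 0
    new_r1 = r1 * e1 / abs(r1) if r1 != 0 else r1    # right takes magnitude E(1,j)
    new_l3 = l3 * e3 / abs(l3) if l3 != 0 else l3    # left takes magnitude E(3,j)
    return new_l1, new_r1, new_l3, r3                # r3 is not modified here


if __name__ == "__main__":
    l1, r1, l3, r3 = formula6_bin(3 + 0j, 4 + 0j, 3 + 0j, 4 + 0j)
    # Left channel: window 1 is now empty and window 3 is boosted.
    print(abs(l1) < abs(l3))  # True
```

The phase of each rescaled component is preserved; only its magnitude changes, which is what the division by the original magnitude in [Formula 6] achieves.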

In this case, the processing according to [Formula 7] below is executed on some of the frequency components of the window 3 component.

[Formula 7]
For the three components j = M−2, M−1, M:
$$\mathrm{Al}'(3,j) = 0$$
$$\mathrm{Bl}'(3,j) = 0$$
In the case of stereo, the following values for the right signal are also calculated:
$$E(3,j) = \left\{ \mathrm{Al}(3,j)^2 + \mathrm{Bl}(3,j)^2 + \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$
$$\mathrm{Ar}'(3,j) = \mathrm{Ar}(3,j) \cdot E(3,j) / \left\{ \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$
$$\mathrm{Br}'(3,j) = \mathrm{Br}(3,j) \cdot E(3,j) / \left\{ \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$

When the bit value is 0, the processing according to [Formula 8] below is executed on the window 3 component.

[Formula 8]
For each component j = 1 to M:
$$\mathrm{Al}'(3,j) = 0$$
$$\mathrm{Bl}'(3,j) = 0$$
In the case of stereo, the following values for the right signal are also calculated:
$$E(3,j) = \left\{ \mathrm{Al}(3,j)^2 + \mathrm{Bl}(3,j)^2 + \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$
$$\mathrm{Ar}'(3,j) = \mathrm{Ar}(3,j) \cdot E(3,j) / \left\{ \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$
$$\mathrm{Br}'(3,j) = \mathrm{Br}(3,j) \cdot E(3,j) / \left\{ \mathrm{Ar}(3,j)^2 + \mathrm{Br}(3,j)^2 \right\}^{1/2}$$
$$E(1,j) = \left\{ \mathrm{Al}(1,j)^2 + \mathrm{Bl}(1,j)^2 + \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$\mathrm{Al}'(1,j) = \mathrm{Al}(1,j) \cdot E(1,j) / \left\{ \mathrm{Al}(1,j)^2 + \mathrm{Bl}(1,j)^2 \right\}^{1/2}$$
$$\mathrm{Bl}'(1,j) = \mathrm{Bl}(1,j) \cdot E(1,j) / \left\{ \mathrm{Al}(1,j)^2 + \mathrm{Bl}(1,j)^2 \right\}^{1/2}$$

In [Formula 8], as in [Formula 6], the last three expressions are used to calculate E(1, j), Al′(1, j), and Bl′(1, j). By adding the corresponding frequency components of the right channel signal to the window 1 component of the left channel signal, in which the information is embedded, to obtain Al′(1, j) and Bl′(1, j), the difference in intensity between the window 1 component and the other window components becomes clear, so that the information is easier to extract on the extraction side.

In this case, the processing according to [Formula 9] below is executed on some of the frequency components of the window 1 component.

[Formula 9]
For the three components j = M−2, M−1, M:
$$\mathrm{Al}'(1,j) = 0$$
$$\mathrm{Bl}'(1,j) = 0$$
In the case of stereo, the following values for the right signal are also calculated:
$$E(1,j) = \left\{ \mathrm{Al}(1,j)^2 + \mathrm{Bl}(1,j)^2 + \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$\mathrm{Ar}'(1,j) = \mathrm{Ar}(1,j) \cdot E(1,j) / \left\{ \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$
$$\mathrm{Br}'(1,j) = \mathrm{Br}(1,j) \cdot E(1,j) / \left\{ \mathrm{Ar}(1,j)^2 + \mathrm{Br}(1,j)^2 \right\}^{1/2}$$

As a result of the processing according to [Formula 8] and [Formula 9], the window 1 component has the value "0" at j = M−2, M−1, and M, while signal components at or above a predetermined value are present at the other indices. In this case, therefore, the ratio of the spectral intensities has been changed to a state in which the window 1 component is dominant.
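The combined effect of [Formula 8] and [Formula 9] can be sketched as below: window 3 is emptied on the left channel, window 1 is boosted, and the three highest indices of window 1 are emptied as well. The complex representation, the list layout, and all function names are assumptions made for this example; zero-magnitude bins are left unchanged because the formulas do not define that case.

```python
import math


def to_right(l, r):
    """Zero the left component and give the right one the magnitude E."""
    e = math.sqrt(abs(l) ** 2 + abs(r) ** 2)
    return 0j, (r * e / abs(r) if r != 0 else r)


def to_left(l, r):
    """Give the left component the magnitude E; the right is untouched."""
    e = math.sqrt(abs(l) ** 2 + abs(r) ** 2)
    return (l * e / abs(l) if l != 0 else l), r


def embed_bit_zero(window1, window3, m):
    """Return new (window1, window3) with window 1 dominant on the left.

    Each window is a list of (left, right) complex pairs indexed by j.
    """
    out1, out3 = list(window1), list(window3)
    for j in range(1, m + 1):
        out3[j] = to_right(*window3[j])        # [Formula 8], window 3
        if j >= m - 2:
            out1[j] = to_right(*window1[j])    # [Formula 9], j = M-2, M-1, M
        else:
            out1[j] = to_left(*window1[j])     # [Formula 8], window 1
    return out1, out3


def left_energy(window, m):
    """Sum of squared left-channel magnitudes over j = 1 .. m."""
    return sum(abs(window[j][0]) ** 2 for j in range(1, m + 1))


if __name__ == "__main__":
    flat = [(1 + 0j, 1 + 0j)] * 7  # indices 0..6, M = 6
    w1, w3 = embed_bit_zero(flat, flat, 6)
    print(left_energy(w1, 6) > left_energy(w3, 6))  # True
```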

By executing the processing according to either [Formula 6] and [Formula 7], or [Formula 8] and [Formula 9], the left channel signal is changed, according to each bit value in the bit array of the additional information, to one of two patterns: window 1 component dominant or window 3 component dominant.

In this case, a portion whose signal component is "0" always exists between the high frequency band and the low frequency band, which prevents the signal components of the two bands from mixing. In summary, the low frequency component changing means 30 performs the processing based on [Formula 5] in S108 in the delimiter mode, and performs the processing based on [Formula 6] and [Formula 7], or on [Formula 8] and [Formula 9], in S107 in the bit mode or the continuation identification mode.

In either case, S107 or S108, the low frequency component changing means 30 next deletes the window 2 component (S109). Specifically, the processing according to [Formula 10] below is executed on the window 2 component.

[Formula 10]
For each component j = 1 to M:
$$\mathrm{Al}'(2,j) = 0$$
$$\mathrm{Bl}'(2,j) = 0$$
In the case of stereo, the following values for the right signal are also calculated:
$$E(2,j) = \left\{ \mathrm{Al}(2,j)^2 + \mathrm{Bl}(2,j)^2 + \mathrm{Ar}(2,j)^2 + \mathrm{Br}(2,j)^2 \right\}^{1/2}$$
$$\mathrm{Ar}'(2,j) = \mathrm{Ar}(2,j) \cdot E(2,j) / \left\{ \mathrm{Ar}(2,j)^2 + \mathrm{Br}(2,j)^2 \right\}^{1/2}$$
$$\mathrm{Br}'(2,j) = \mathrm{Br}(2,j) \cdot E(2,j) / \left\{ \mathrm{Ar}(2,j)^2 + \mathrm{Br}(2,j)^2 \right\}^{1/2}$$

Next, the frequency inverse transform means 40 applies an inverse frequency transform to the frame spectrum in which the ratio between the spectrum sets of the window components has been changed by the processing of S107 to S109, and obtains a modified acoustic frame (S110). This inverse transform must, of course, correspond to the method that the frequency transform means 20 executed in S105. In this embodiment, the frequency transform means 20 applies a Fourier transform, so the frequency inverse transform means 40 executes an inverse Fourier transform. Specifically, using the left-channel real parts Al′(1, j) etc. and imaginary parts Bl′(1, j) etc. and the right-channel real parts Ar′(1, j) etc. and imaginary parts Br′(1, j) etc. of the spectrum obtained by any of [Formula 5] to [Formula 10], the processing according to [Formula 11] below is performed to calculate xl′(i) and xr′(i). For frequency components not processed in [Formula 5] to [Formula 10], the original values Al(1, j) etc., Bl(1, j) etc., Ar(1, j) etc., and Br(1, j) etc. are used as Al′(1, j) etc., Bl′(1, j) etc., Ar′(1, j) etc., and Br′(1, j) etc.

[Formula 11]
$$\mathrm{xl}'(i) = \frac{1}{N}\left\{ \sum_j \mathrm{Al}'(1,j)\cos\frac{2\pi i j}{N} - \sum_j \mathrm{Bl}'(1,j)\sin\frac{2\pi i j}{N} \right\} + \frac{1}{N}\left\{ \sum_j \mathrm{Al}'(2,j)\cos\frac{2\pi i j}{N} - \sum_j \mathrm{Bl}'(2,j)\sin\frac{2\pi i j}{N} \right\} + \frac{1}{N}\left\{ \sum_j \mathrm{Al}'(3,j)\cos\frac{2\pi i j}{N} - \sum_j \mathrm{Bl}'(3,j)\sin\frac{2\pi i j}{N} \right\} + \mathrm{xlp}(i + N/2)$$
$$\mathrm{xr}'(i) = \frac{1}{N}\left\{ \sum_j \mathrm{Ar}'(1,j)\cos\frac{2\pi i j}{N} - \sum_j \mathrm{Br}'(1,j)\sin\frac{2\pi i j}{N} \right\} + \frac{1}{N}\left\{ \sum_j \mathrm{Ar}'(2,j)\cos\frac{2\pi i j}{N} - \sum_j \mathrm{Br}'(2,j)\sin\frac{2\pi i j}{N} \right\} + \frac{1}{N}\left\{ \sum_j \mathrm{Ar}'(3,j)\cos\frac{2\pi i j}{N} - \sum_j \mathrm{Br}'(3,j)\sin\frac{2\pi i j}{N} \right\} + \mathrm{xrp}(i + N/2)$$

In [Formula 11], the sum over j = 0, …, N−1 is abbreviated as Σ_j to keep the expressions from becoming cluttered.
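One channel of [Formula 11] can be sketched as a direct (unoptimized) inverse transform. This is an illustration only: the function name and data layout are hypothetical, and adding the previous frame's sample only when the index i + N/2 falls inside that frame is an assumption about how the overlap term is meant to be applied.

```python
import math


def inverse_frame(windows, n, prev=None):
    """Compute x'(i) for i = 0 .. n-1 for one channel.

    windows: three (a, b) pairs, one per window component, where a and b are
             length-n lists holding the real and imaginary parts A'(k, j), B'(k, j).
    prev:    samples of the previously modified frame, or None if there is none.
    """
    out = []
    for i in range(n):
        total = 0.0
        for a, b in windows:
            for j in range(n):
                angle = 2.0 * math.pi * i * j / n
                total += a[j] * math.cos(angle) - b[j] * math.sin(angle)
        x = total / n
        # Assumption: the overlap term applies only where the previous frame
        # actually has a sample at index i + n/2.
        if prev is not None and i + n // 2 < len(prev):
            x += prev[i + n // 2]
        out.append(x)
    return out


if __name__ == "__main__":
    zeros = [0.0, 0.0, 0.0, 0.0]
    dc_only = ([4.0, 0.0, 0.0, 0.0], zeros)  # only j = 0 is non-zero
    empty = (zeros, zeros)
    print(inverse_frame([dc_only, empty, empty], 4))
```

A production implementation would normally use a fast Fourier transform; the double loop here mirrors the summation as written.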

The terms "+xlp(i + N/2)" in the first expression and "+xrp(i + N/2)" in the second expression of [Formula 11] are added, when data xlp(i) and xrp(i) of the immediately preceding modified acoustic frame exist, to account for the overlap of N/2 samples on the time axis. [Formula 11] yields the left-channel samples xl′(i) and the right-channel samples xr′(i) of the modified acoustic frame. The modified acoustic frame output means 50 outputs the resulting modified acoustic frames sequentially to the output file (S111). When the processing for one acoustic frame is finished, the mode is checked (S112). If the mode is the delimiter mode, the mode is set to the continuation identification mode (S113), and the acoustic frame reading means 10 then reads an acoustic frame (S104). If the mode is the bit mode or the continuation identification mode, the mode is set to the bit mode (S114), and the low frequency component changing means 30 then reads the next bit in the bit array of the additional information (S103).
This processing is carried out over all samples of both channels of the acoustic signal. That is, a predetermined number of samples is read as an acoustic frame each time, and the processing ends when no acoustic frame remains to be read from the acoustic signal (S104). When the processing for every bit of the one-word data read in S101 has finished, the flow returns from S103 to S101, and the next word of the additional information is read and processed. When all words of the additional information have been processed, processing returns to the first word of the additional information. As a result, the modified acoustic frames obtained by processing all acoustic frames are recorded in the output file and obtained as the modified acoustic signal. The modified acoustic signal is output to and stored in the modified acoustic signal storage unit 63 in the storage means 60.
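The mode transition at S112 to S114 and the wrap-around over the words of the additional information can be sketched as follows. The mode names, function names, and the most-significant-bit-first order are assumptions for illustration; the text does not state the bit order explicitly.

```python
from itertools import cycle, islice


def next_mode(mode):
    """Mode set after one frame has been output (S112 to S114)."""
    if mode == "delimiter":
        return "continuation"  # S113: the next frame carries new/continuation
    return "bit"               # S114: from bit or continuation mode


def bit_stream(words, bits_per_word=8):
    """Yield the bits of each word in turn, returning to the first word
    after the last one. MSB-first order is an assumption."""
    for word in cycle(words):
        for k in range(bits_per_word - 1, -1, -1):
            yield (word >> k) & 1


if __name__ == "__main__":
    bits = list(islice(bit_stream([0b11011100, 0b11000001]), 17))
    print(bits[:8])   # [1, 1, 0, 1, 1, 1, 0, 0]
    print(bits[16])   # 1, the first bit of the first word again
```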

How the left channel signal changes under the above processing is explained with reference to FIG. 5. In FIG. 5, the horizontal direction of the drawing is the time axis and is proportional to the number of samples. The many rectangles in the figure represent the window 1 components and window 3 components of the modified acoustic frames. The width of a rectangle representing a window component indicates the number of samples and its height indicates intensity; in FIG. 3, however, neither the width nor the height is drawn accurately, and the rectangles only indicate whether a strong signal component lies in the leading part corresponding to the window 1 component or in the trailing part corresponding to the window 3 component. FIG. 5(a) shows the case where no acoustic frame has total values E₁, E₂ calculated by [Formula 4] below the level lower limit Lev, that is, a signal well suited to embedding additional information. FIG. 5(b) shows the case where some acoustic frames have total values E₁, E₂ below Lev, that is, a signal not well suited to embedding additional information.

For example, suppose that a 2-byte bit array is embedded as additional information, the first byte being "11011100" and the second byte "11000001". First, at the head of each byte, the window 1 component and the window 3 component are set equal as information indicating a delimiter. This results from S102 setting the delimiter mode and S108 executing the processing according to [Formula 5]. Next, before the processing for each bit of the additional information, information indicating whether the data is new or a continuation is recorded. This is necessary because, even when an acoustic frame below the level lower limit occurs, the bits processed up to that point are treated as valid and processing continues from there, so whether the following bits are new or a continuation must be recorded. Therefore, information indicating new or continuation is recorded after the information indicating a delimiter. Specifically, the mode check performed in the delimiter mode (S112) sets the continuation identification mode (S113), and an acoustic frame is extracted without reading a bit of the additional information (S104). Then, after the frequency transform (S105), if the data is new, the processing according to [Formula 6] and [Formula 7] changes the distribution of the low frequency components between the window 1 component and the window 3 component so that the window 3 component is dominant (S107).

After the information indicating new or continuation has been recorded in this way, the mode check is performed in the continuation identification mode (S112), so the bit mode is set (S114), the first bit is read from the register (S103), and an acoustic frame is extracted (S104). In the example of FIG. 5(a), no acoustic frame falls below the level lower limit Lev, so one byte is processed consecutively by S107. This means that the loop from S103 through S114 was repeated eight times in succession without ever passing through S106, S108, and S113 on account of a level below Lev.

In the example of FIG. 5(b), the processing according to [Formula 6] and [Formula 7] encounters acoustic frames below the level lower limit Lev. In that case the flow passes through S106 and S108, the processing according to [Formula 5] is executed, and the window 1 component and the window 3 component are set equal. Because S106 sets the delimiter mode, information indicating new or continuation is then recorded via S112. In FIG. 5(b), when the first byte "11011100" is embedded, an acoustic frame below Lev first appears after one bit, the first bit "1", has been processed. Information indicating a delimiter is therefore recorded, followed by information indicating continuation, and processing continues from the second bit "1". An acoustic frame below Lev then appears after the second to fifth bits, "1011", have been processed, so information indicating a delimiter is again recorded, followed by information indicating continuation, and processing continues from the sixth bit "1".
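The frame sequence described for FIG. 5(b) can be reproduced with a small simulation. This is a sketch under stated assumptions: the labels "D" (delimiter), "N" (new), and "C" (continuation) and the function name are invented for this example, the bit that meets a low-level frame is assumed to be retried after the continuation frame (as the narrative suggests), and the level check is assumed to apply only to frames that would carry a bit.

```python
def frame_labels(bits, low_level_frames):
    """Return the label of each output frame while embedding one word.

    bits:             list of 0/1 values of the word
    low_level_frames: set of 0-based frame indices whose level is below Lev
    """
    labels = []
    pending = ["D", "N"]  # a word starts with a delimiter, then "new"
    k = 0                 # index of the next bit to embed
    frame = 0
    while k < len(bits):
        if pending:
            labels.append(pending.pop(0))
        elif frame in low_level_frames:
            labels.append("D")   # S106 / S108: delimiter instead of the bit
            pending = ["C"]      # followed by a continuation marker
        else:
            labels.append(str(bits[k]))
            k += 1
        frame += 1
    return labels


if __name__ == "__main__":
    word = [1, 1, 0, 1, 1, 1, 0, 0]  # first byte of the example
    print(" ".join(frame_labels(word, {3, 9})))
    # D N 1 D C 1 0 1 1 D C 1 0 0
```

With no low-level frames the output is simply "D N" followed by the eight bits, which corresponds to the FIG. 5(a) case.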

Although the example of FIG. 5 describes the case where one word of the additional information is one byte, the processing shown in FIG. 2 records information indicating new or continuation, so the additional information can be recorded in units of any number of bits.

In the above example, information indicating a delimiter is inserted per variable-length word, but it is also possible to insert information indicating a delimiter per bit. In this case, when the acoustic frame reading means 10 extracts acoustic frames, it also extracts an overlapping acoustic frame that overlaps the preceding and following acoustic frames, applies the frequency transform to this overlapping acoustic frame according to [Formula 1], and then makes the window 1 component and the window 3 component equal according to [Formula 5]. The overlapping acoustic frame is set so that it shares half of its samples with each of the preceding and following acoustic frames. For example, when the preceding acoustic frame covers sample numbers 1 to 4096 and the following acoustic frame covers sample numbers 4097 to 8192, the overlapping acoustic frame set between them covers sample numbers 2049 to 6144. In the same way, overlapping acoustic frames are read over the entire acoustic signal, and the window 1 component and the window 3 component are made equal in each of them.
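The sample ranges of the regular and overlapping acoustic frames follow directly from the frame length. The sketch below uses 1-based inclusive sample numbers, as in the text; the function name is hypothetical.

```python
def frame_ranges(num_samples, n=4096):
    """Return (regular, overlapping) lists of 1-based inclusive sample ranges.

    Each overlapping frame starts n/2 samples after a regular frame and so
    shares half of its samples with the frames before and after it.
    """
    regular = [(s, s + n - 1) for s in range(1, num_samples - n + 2, n)]
    overlapping = [(s + n // 2, s + n // 2 + n - 1) for s, _ in regular[:-1]]
    return regular, overlapping


if __name__ == "__main__":
    regular, overlapping = frame_ranges(8192)
    print(regular)      # [(1, 4096), (4097, 8192)]
    print(overlapping)  # [(2049, 6144)]
```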

When overlapping acoustic frames are set and their window 1 and window 3 components are made equal as described above, this must be reflected in the modified acoustic signal. To do so, the inverse frequency transform is applied to the overlapping frame spectrum after its window 1 and window 3 components have been made equal, which yields a modified overlapping acoustic frame, and this frame is then joined with the modified acoustic frames.

In the portions of the left channel of the modified acoustic signal obtained as described above where additional information is embedded, the low frequency components have only three possible distributions: the window 1 component and the window 3 component are equal, the window 1 component is dominant, or the window 3 component is dominant. The high frequency components, however, remain as in the original acoustic signal and therefore have a variety of distributions determined by the producer's settings. In addition, as the above example shows, when a stereo acoustic signal is used, the low frequency components changed in the left channel are always added to the low frequency components of the right channel, as is clear from the processing of [Formula 5] to [Formula 10]. Because the right channel thus compensates for the components deleted from the left channel, there is no degradation of the signal when both channels are considered as a whole. Human hearing senses direction easily for high frequency components but has difficulty sensing it for low frequency components. Therefore, even if the low frequency components are biased toward one channel, the listener hears the result as no different from an ordinary acoustic signal.
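The compensation by the right channel can be checked from the rescaling used wherever the left-channel component is set to zero (for example in [Formula 5]). The following worked step is an added illustration, with the window number and j omitted for brevity. Substituting the definitions of Ar′ and Br′ gives

$$\mathrm{Ar}'^2 + \mathrm{Br}'^2 = \frac{E^2\left(\mathrm{Ar}^2 + \mathrm{Br}^2\right)}{\mathrm{Ar}^2 + \mathrm{Br}^2} = E^2 = \mathrm{Al}^2 + \mathrm{Bl}^2 + \mathrm{Ar}^2 + \mathrm{Br}^2$$

Since Al′ = Bl′ = 0 for such a component, the combined intensity of the two channels at that frequency is the same before and after the change; only its distribution between left and right differs. This holds only when the original right-channel component is non-zero, because the rescaling divides by its magnitude.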

(3. Device for extracting information from acoustic signals)
Next, the device for extracting information from an acoustic signal according to the present invention will be described. FIG. 6 is a block diagram showing one embodiment of the device. In FIG. 6, 100 is an acoustic signal input means, 110 is an acoustic frame acquisition means, 120 is a reference frame setting means, 130 is a phase change frame setting means, 140 is a frequency conversion means, 150 is a code determination parameter calculation means, 160 is a code output means, 170 is an additional information extraction means, and 180 is an acoustic frame holding means.

The acoustic signal input means 100 has a function of capturing sound that is being played and inputting it as a digital acoustic signal. In practice, it is realized by a microphone and an A/D converter. The microphone must be a directional microphone capable of audio input on two channels, left and right. The acoustic frame acquisition means 110 has a function of reading acoustic frames, each composed of a predetermined number of samples, from each channel of the input digital stereo acoustic signal. The reference frame setting means 120 has a function of setting, as a reference frame, the read acoustic frame shifted by the offset value. The phase change frame setting means 130 has a function of setting, as phase change frames, acoustic frames whose phase differs from that of the reference frame by shifts of a predetermined number of samples. The frequency conversion means 140 has the same function as the frequency conversion means 20 shown in FIG. 1. The code determination parameter calculation means 150 has a function of extracting, from the generated frame spectra, the low-frequency intensity data corresponding to frequencies at or below a predetermined frequency, calculating the sums E_c1 and E_c2 of the low-frequency intensity data for the window 1 component and the window 3 component respectively, using these sums as code determination parameters, and determining which predetermined state the frame is in based on the ratio of E_c1 to E_c2.

The code output means 160 has a function of determining which of the acoustic frames corresponding to one reference frame (the reference frame and its phase change frames) has the optimum phase, and outputting the code corresponding to the state of that acoustic frame. It also has a function of determining the offset value used to set subsequent reference frames. The additional information extraction means 170 has a function of converting the ternary array, which is the set of codes output by the code output means 160, according to a predetermined rule and extracting it as meaningful additional information. The acoustic frame holding means 180 is a buffer memory capable of holding two consecutive acoustic frames for each channel. Each component shown in FIG. 6 is in practice realized by installing a dedicated program on hardware such as a small computer having an information processing function and its peripheral devices. In particular, to achieve the object of the present invention more simply, it is desirable to use a portable terminal device as the hardware.

(4. Processing operation of the extraction device)
Next, the processing operation of the extraction device shown in FIG. 6 will be described with reference to the flowchart of FIG. 7. When a user wants to know attribute information, such as the title, of music that is playing, the user first instructs the extraction device to start operating as an extraction device. If the extraction device is realized by a portable terminal such as a mobile phone, for example, this can be done by operating a predetermined button. When the instruction is input, the acoustic signal input means 100 records the music that is playing, digitizes it, and inputs it as a digital acoustic signal. Specifically, the sound input from the left and right of the directional microphone is digitized by respective A/D converters.

Subsequently, five items are initialized: the average code levels HL1 and HL2, the phase determination table S(p), the non-code counter Nn, and the minute offset value (S200). These are explained below. The average code levels HL1 and HL2 are given by the averages of the low-frequency sums E_c1 and E_c2, calculated by [Formula 12] below, over acoustic frames judged to have had one of the two values corresponding to a bit value embedded (hereinafter called valid frames); that is, they are the averages of E_c1 and E_c2 over past valid frames. Their initial value is set to twice the level lower limit Lev that is also used in the embedding device. In this embodiment, therefore, the initial values of HL1 and HL2 are both set to 0.1. The phase determination table S(p) is a table for determining the phase, where p takes integer values from 0 to 5. Its initial value is S(p) = 0. The non-code counter Nn counts the number of frames judged to be non-code (identical to the code indicating a delimiter) because the signal level is low, and is set to Nn = 0 in the initial state. The minute offset value is the offset applied when the reference frame extracted from the acoustic signal is moved. The offset takes eight values, each obtained by further dividing the phase change step of the frame size by 8; that is, in this embodiment it is set in units of 4096 × 1/6 × 1/8.

Subsequently, the acoustic frame acquisition means 110 extracts acoustic frames, each composed of a predetermined number of samples, from each channel of the stereo acoustic signal input from the acoustic signal input means 100 (S201). Specifically, it extracts the acoustic frames and reads them into the acoustic frame holding means 180. The number of samples in one acoustic frame read as a reference frame by the acoustic frame acquisition means 110 must be the same as that set in the acoustic frame reading means 10 shown in FIG. 1. In this embodiment, therefore, the acoustic frame acquisition means 110 sequentially reads 4096 samples at a time as a reference frame for each of the left and right channels. As described above, the acoustic frame holding means 180 can store two reference frames per channel, and discards the older reference frame when a new one is read. The acoustic frame holding means 180 therefore always holds two reference frames (8192 consecutive samples) per channel. When the offset value is 0, the reference frame setting means 120 sets each acoustic frame held in the acoustic frame holding means 180 as a reference frame as it is.

The acoustic frames processed by the embedding device can be divided into reference frames, which are set contiguously from the beginning without gaps when the offset value is 0, and phase change frames, whose phase is shifted relative to a reference frame. For the reference frames, once sample numbers 1 to 4096 are set as the first reference frame, the next reference frame is sample numbers 4097 to 8192, the one after that is sample numbers 8193 to 12288, and so on without gaps as long as the offset value is 0. For each reference frame, five phase change frames are then set, each shifted by a further 1/6 frame (about 683 samples). For the first reference frame, for example, five phase change frames of 4096 samples each are set, starting at sample numbers 683, 1366, 2049, 2732, and 3413.
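As an illustration only, cutting the reference frame and the five phase change frames out of the two-frame buffer might be sketched in Python as follows. The function name `candidate_frames`, the 0-based indexing, and the fixed step of 683 samples are assumptions made for this sketch. The text gives the step only as "about 683 samples" and lists the last start as 3413, so the rounding actually used evidently differs slightly.

```python
FRAME_SIZE = 4096   # samples per acoustic frame (from the text)
NUM_PHASES = 6      # reference frame (p = 0) plus five phase change frames
PHASE_STEP = 683    # "about 683 samples"; the exact rounding is an assumption


def candidate_frames(buffer, offset=0):
    """Return six candidate frames cut from a two-frame buffer of one channel.

    buffer: sequence holding at least 2 * FRAME_SIZE consecutive samples.
    offset: minute offset in samples (0 when no offset is applied).
    """
    if len(buffer) < 2 * FRAME_SIZE:
        raise ValueError("buffer must hold two consecutive frames")
    frames = []
    for p in range(NUM_PHASES):
        start = offset + p * PHASE_STEP  # 0-based start index of phase p
        if start + FRAME_SIZE > len(buffer):
            raise ValueError("frame would run past the end of the buffer")
        frames.append(list(buffer[start:start + FRAME_SIZE]))
    return frames
```

Under these assumptions the latest frame starts at index 3415 and ends at index 7510, so all six candidates fit inside the 8192 samples that the acoustic frame holding means 180 is described as storing.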

Subsequently, the frequency conversion means 140 and the code determination parameter calculation means 150 determine the embedded information from each acoustic frame that has been read and output the corresponding code (S202). The output takes one of three values: the two values corresponding to the bit values on the embedding side, and the value input as a delimiter.

The code output processing of S202 is now described in detail with reference to the flowchart of FIG. 8. First, the code determination parameter calculation means 150 initializes the candidate code table (S304). The candidate code table records a phase number from 0 to 5, which identifies one of the reference frame and the five phase change frames, and the ternary code obtained from the states of these six acoustic frames.

Next, minute offset processing of the address is performed (S300). Specifically, according to the offset number (offset value), the read positions of the reference frame and the phase change frames are set to positions moved by a predetermined number of frames.

Subsequently, the frequency conversion means 140 performs frequency conversion on each acoustic frame read from the offset position to obtain a frame spectrum (S301). This processing is the same as that in the frequency conversion means 20 shown in FIG. 1. However, because only the left channel is used for extraction, the processing follows [Formula 1] above and yields the real parts Al(1, j) etc. and the imaginary parts Bl(1, j) etc. of the transformed data for the left channel.

The processing in the frequency conversion means 140 yields a frame spectrum, that is, a representation as a spectrum of components corresponding to each frequency. Subsequently, the code determination parameter calculation means 150 calculates the average code levels HL1 and HL2 (S302). Specifically, HL1 is calculated by dividing v1, the accumulated value of the sums E_c1 for acoustic frames previously judged to be in the window 1 dominant state, by n1, the number of such frames. Likewise, HL2 is calculated by dividing v2, the accumulated value of the sums E_c2 for acoustic frames previously judged to be in the window 3 dominant state, by n2, the number of such frames. Here, E_c1 and E_c2 are the sums of the window 1 component and the window 3 component, respectively, of the left channel signal described later, and are given by [Formula 12] below.

[Formula 12]
$E_{c1} = \sum_{j=1}^{M-3} \{ Al(1, j)^2 + Bl(1, j)^2 \}$
$E_{c2} = \sum_{j=1}^{M-3} \{ Al(3, j)^2 + Bl(3, j)^2 \}$

The average code levels HL1 and HL2 are therefore close to the average of the summed low-frequency intensity data of the acoustic frames previously judged to be in the state where the corresponding window component is dominant. They are not exact averages because, as described above, the initial values of HL1 and HL2 are both set to 0.1, and the sums E_c1 and E_c2 are added to the numerators on top of this initial value. In addition, n1 and n2, used as the denominators for HL1 and HL2, are each incremented by 1 whenever E_c1 or E_c2 is added to v1 or v2, respectively, but are not allowed to exceed a maximum value nmax: when n1 = nmax, v1 is reset to v1/2 and n1 to n1/2, and when n2 = nmax, v2 is reset to v2/2 and n2 to n2/2. As a result, HL1 and HL2 are always determined almost entirely by the most recent nmax/2 values of E_c1 and E_c2, so that they can quickly follow fluctuations in the signal level.
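The running average with halving described above might be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `AverageCodeLevel`, the choice of seeding the numerator with 0.1 and the count with 1, and the default `nmax=100` are all assumptions, since the text does not state the starting count or the value of nmax.

```python
class AverageCodeLevel:
    """Running average HL = v / n, halving v and n once n reaches nmax."""

    def __init__(self, initial=0.1, nmax=100):
        # Assumption: the initial value 0.1 seeds the numerator with n = 1.
        self.v = initial
        self.n = 1
        self.nmax = nmax  # hypothetical value; the text does not give nmax

    def add(self, e):
        """Accumulate one summed value (E_c1 or E_c2) from a valid frame."""
        self.v += e
        self.n += 1
        if self.n >= self.nmax:
            # Halving both leaves the current average unchanged but lets
            # newer frames outweigh older ones from here on.
            self.v /= 2
            self.n /= 2

    @property
    def level(self):
        """Current average code level HL."""
        return self.v / self.n
```

One instance would be kept for HL1 (fed with E_c1) and another for HL2 (fed with E_c2). Because both v and n are halved together, the level itself does not jump at the moment of halving.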

Furthermore, the code determination parameter calculation means 150 extracts the low-frequency intensity data in a predetermined frequency range from the generated frame spectrum. The frequency range to be extracted must correspond to that of the embedding device. Here, therefore, low-frequency intensity data at 200 Hz or below are extracted; as in the embedding device, of the left-channel real parts Al(j) and imaginary parts Bl(j) calculated by [Formula 1], those with j ≤ 20 are extracted. The code determination parameter calculation means 150 then executes the processing of [Formula 12] to calculate the sum E_c1 of the window 1 component and the sum E_c2 of the window 3 component of the left channel signal. These E_c1 and E_c2 are used as the code determination parameters.
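The summation of [Formula 12] can be illustrated with a short Python sketch. It assumes the real and imaginary parts of the window 1 and window 3 spectra are already available as sequences indexed by j. The function names and the `j_max` parameter are hypothetical, and the upper limit of the sum (M-3 in the formula) is left to the caller because M is defined elsewhere in the document.

```python
def summed_power(real, imag, j_max):
    """Sum real[j]^2 + imag[j]^2 over j = 1 .. j_max (index 0 is skipped)."""
    return sum(real[j] ** 2 + imag[j] ** 2 for j in range(1, j_max + 1))


def code_parameters(al1, bl1, al3, bl3, j_max):
    """Return (E_c1, E_c2) for the window 1 and window 3 spectra.

    al1, bl1: real and imaginary parts for window 1, i.e. Al(1, j), Bl(1, j).
    al3, bl3: real and imaginary parts for window 3, i.e. Al(3, j), Bl(3, j).
    j_max:    upper index of the sum, supplied by the caller (assumption).
    """
    e_c1 = summed_power(al1, bl1, j_max)
    e_c2 = summed_power(al3, bl3, j_max)
    return e_c1, e_c2
```

For example, `code_parameters([9, 1, 2, 3], [9, 0, 0, 0], [9, 1, 1, 1], [9, 1, 1, 1], 3)` gives `(14, 6)`, since the j = 0 entries are excluded from both sums.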

Subsequently, the code determination parameter calculation means 150 determines whether the window 1 sum E_c1 and the window 3 sum E_c2 are each at or below a predetermined value (S305). The code determination levels FL1 and FL2 are used as the predetermined values, and are set to the average code levels HL1 and HL2 multiplied by 0.001, respectively. When E_c1 is less than FL1 and E_c2 is less than FL2, the code determination parameter calculation means 150 determines that the frame carries delimiter information (S309).

On the other hand, when either of the sums E_c1 and E_c2 is at or above the predetermined value, that is, when E_c1 > FL1 or E_c2 > FL2 is satisfied, the code determination parameter calculation means 150 compares the calculated code determination parameters E_c1 and E_c2 according to [Formula 13] below (S306) and outputs the code corresponding to the comparison result.

[Formula 13]
If $E_{c2} > FL2$ and $E_{c2} / E_{c1} > 2$: the window 3 component is dominant.
If $E_{c1} > FL1$ and $E_{c1} / E_{c2} > 2$: the window 1 component is dominant.
Otherwise (($E_{c2} \le FL2$ or $E_{c2} / E_{c1} \le 2$) and ($E_{c1} \le FL1$ or $E_{c1} / E_{c2} \le 2$)): the two window components are equal.

The code determination parameter calculation means 150 outputs a ternary code for each acoustic frame according to the above determination. That is, when the window 3 component is judged dominant, it outputs the first bit value (for example, "1") (S307); when the window 1 component is judged dominant, it outputs the second bit value (for example, "0") (S308); and when the two window components are judged equal, it outputs the code indicating delimiter information (S309). When the first bit value has been output (S307) or the second bit value has been output (S308), the phase determination table S(p) is further updated according to [Formula 14] below (S310).
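The determination of S305 to S309 might be sketched in Python as below. The function name `judge_code` and the constants are hypothetical. The ratio tests of [Formula 13] are written as multiplications (for example E_c2 > 2 × E_c1 instead of E_c2 / E_c1 > 2), which is equivalent for non-negative sums and avoids division by zero; this rewriting is a choice made for the sketch, not something stated in the text.

```python
DELIMITER = "delimiter"  # code for "both window components equal"
BIT_ZERO = 0             # second bit value: window 1 component dominant
BIT_ONE = 1              # first bit value: window 3 component dominant


def judge_code(e_c1, e_c2, hl1, hl2, factor=0.001, ratio=2.0):
    """Return BIT_ONE, BIT_ZERO or DELIMITER for one acoustic frame."""
    fl1 = hl1 * factor  # code determination level FL1
    fl2 = hl2 * factor  # code determination level FL2
    # S305: both sums below their levels means delimiter information.
    if e_c1 < fl1 and e_c2 < fl2:
        return DELIMITER
    # Formula 13, using multiplication in place of the ratio.
    if e_c2 > fl2 and e_c2 > ratio * e_c1:
        return BIT_ONE
    if e_c1 > fl1 and e_c1 > ratio * e_c2:
        return BIT_ZERO
    return DELIMITER
```

With HL1 = HL2 = 0.1, a frame with E_c1 = 1.0 and E_c2 = 3.0 is judged as the first bit value, while E_c1 = 1.0 and E_c2 = 1.5 falls into the "equal" case because neither ratio exceeds 2.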

[Formula 14]
If the window 3 component is dominant: $S(p) \leftarrow S(p) + E_{c2} / E_{c1}$
If the window 1 component is dominant: $S(p) \leftarrow S(p) + E_{c1} / E_{c2}$

Subsequently, the code determination parameter calculation means 150 saves the candidate for the optimum phase in the candidate code table (S311). Specifically, the phase number p for which the value of S(p) recorded in the phase determination table is largest, the one of the three code values determined in S307 to S309, and the values of E_c2 and E_c1 for that acoustic frame are saved in the candidate code table as the optimum phase candidate.

Subsequently, it is determined whether the processing has been completed for all phase numbers p (S312). This checks whether all phase change frames have been processed for a given reference frame. In this embodiment p takes values from 0 to 5, so if six passes have not yet been made, an acoustic frame with a different phase is set by shifting a predetermined number of samples from the acoustic frame just processed, and the processing returns to S305 and repeats. Here p = 0 is the reference frame and p = 1 to 5 are the phase change frames. When the processing has been completed for all phase numbers p, the phase corresponding to the phase number p recorded in the candidate storage table is determined to be the optimum phase, and the code recorded in the candidate storage table is output (S313). At this point, the phase number p is saved in the optimum phase determination table so that the optimum phase can be determined for subsequent acoustic frames.
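One possible reading of the loop over phase numbers (S310 to S313) is sketched below: S(p) is accumulated for each phase whose frame yields a bit value, and the code of the phase with the largest S(p) is output. The function name `select_phase`, the `judge` callback, and the small constant `EPS` guarding against division by zero are assumptions for illustration.

```python
EPS = 1e-12  # guard against division by zero (an assumption of this sketch)


def select_phase(s_table, frame_params, judge):
    """Update S(p) per Formula 14 and pick the optimum phase.

    s_table:      list of six accumulated values S(0) .. S(5), updated in place.
    frame_params: list of (E_c1, E_c2) pairs, one per phase number p.
    judge:        callable (E_c1, E_c2) -> 1, 0 or "delimiter".
    Returns (best_p, code), where code is the judgement at phase best_p.
    """
    codes = []
    for p, (e_c1, e_c2) in enumerate(frame_params):
        code = judge(e_c1, e_c2)
        if code == 1:    # window 3 component dominant
            s_table[p] += e_c2 / max(e_c1, EPS)
        elif code == 0:  # window 1 component dominant
            s_table[p] += e_c1 / max(e_c2, EPS)
        codes.append(code)
    # The phase with the largest accumulated S(p) is taken as optimum.
    best_p = max(range(len(s_table)), key=lambda p: s_table[p])
    return best_p, codes[best_p]
```

Because `s_table` persists across reference frames, a phase that has repeatedly produced strongly one-sided frames keeps a large S(p) and continues to be selected, which matches the role the text gives to the phase determination table.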

Next, offset change processing is performed (S314). The details are shown in the flowchart of FIG. 9. In the offset change processing, it is first determined whether the determined phase has fluctuated (S401). Specifically, the optimum phase determination table is consulted to determine whether the fluctuation condition is met: the optimum phase of the acoustic frame just processed is the same as that of the acoustic frame two frames earlier and differs from that of the immediately preceding acoustic frame.

When the fluctuation condition is met, the minute offset value is updated (S402). Specifically, of the eight offset values provided, the offset is changed to the one that is larger than and closest to the current offset value. After the minute offset value has been updated, or when it is determined in S401 that the fluctuation condition is not met, the mode is determined (S403). Two modes are provided: delimiter mode and bit output mode.
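The fluctuation test of S401 and the offset update of S402 might look like the following sketch. The function names are hypothetical, and two points are assumptions not stated in the text: the offset numbers run from 0 to 7, and the offset wraps back to 0 after the largest value.

```python
FRAME_SIZE = 4096
NUM_PHASES = 6
NUM_OFFSETS = 8


def offset_in_samples(offset_number):
    """Convert an offset number to samples, in units of 4096 * 1/6 * 1/8."""
    return round(offset_number * FRAME_SIZE / NUM_PHASES / NUM_OFFSETS)


def phase_fluctuated(history):
    """True if the latest optimum phase equals the one two frames earlier
    and differs from the immediately preceding one.

    history: optimum phase numbers, oldest first, ending with the frame
    that has just been processed.
    """
    if len(history) < 3:
        return False
    two_before, previous, current = history[-3:]
    return current == two_before and current != previous


def next_offset_number(offset_number, history):
    """Advance to the next larger offset when the phase has fluctuated."""
    if phase_fluctuated(history):
        # Assumption: wrap around to 0 after the largest offset.
        return (offset_number + 1) % NUM_OFFSETS
    return offset_number
```

One unit of offset is 4096 / 48, or roughly 85 samples, so `offset_in_samples(1)` is 85 and `offset_in_samples(7)` is 597.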

In bit output mode, the parameters of the average code levels HL1 and HL2 and the phase determination table S(p) are updated (S404). The parameters of HL1 and HL2 are updated by adding the sums E_c1 and E_c2 to the accumulated values v1 and v2, which are the numerators in calculating HL1 and HL2, and adding 1 to the frame counts n1 and n2, which are the denominators. The phase determination table S(p) is updated in the same way as in S310 and [Formula 14]. In S404 the update counter is also updated: its value is incremented by 1 each time the processing of S404 is executed.

When the number of updates reaches a predetermined number, that is, when the update counter reaches a predetermined value, the parameters of the average code levels HL1 and HL2 and the phase determination table S(p) are refreshed (S405). Specifically, the number of past data items referred to is limited to 50, and the accumulated values v and the values of S(p) are halved each time 100 have been accumulated. The update counter is then reset to 0.

The offset change processing (S314) ends when delimiter mode is determined in S403, when the number of updates in S404 is below the predetermined number, or when the refresh in S405 has been performed.

Because the end of the offset change processing also ends the code determination processing shown in FIG. 8, the description returns to the flowchart of FIG. 7. When the processing of S202 has output a code corresponding to a bit value, the parameters of the average code levels HL1 and HL2 are updated and the non-code counter is initialized (S203). Specifically, as in S404, the parameters are updated by adding the sums E_c1 and E_c2 to the accumulated values v1 and v2, the numerators in calculating the average code levels, and adding 1 to the frame counts n1 and n2, the denominators. The non-code counter is initialized by setting Nn = 0, as in the initialization of S200.

Subsequently, the mode is determined (S204). Specifically, as in S403, it is determined whether the mode is delimiter mode or bit output mode. In bit output mode, the bit value is saved in a buffer (S209), and the bit counter is then incremented (S210). If, on the other hand, S204 finds delimiter mode, it is further determined whether the extracted code means "new" or "continuation" (S205). If it is new, one word has ended immediately before, so the additional information extraction means 170 outputs the one word of data recorded in the buffer (S206). The bit counter is then initialized to 0 (S207), and the mode is set to bit output mode (S208). If S205 finds continuation, values should continue to be written to the bits in the buffer, so only the setting to bit output mode is performed.

Subsequently, the mode is set to delimiter mode so that the new-or-continuation information can be extracted from the next acoustic frame (S212). The non-code counter is then incremented (S213); specifically, 1 is added to the value of the non-code counter Nn. Additional information is extracted by executing the processing shown in FIG. 7 for each reference frame. When it is determined in S201 that all reference frames have been extracted, the processing ends.

In the processing of S206, the additional information extraction means 170 first creates a bit array from the ternary codes output by the code determination parameter calculation means 150: a code indicating that the two window components are equal marks a delimiter position, the code following it is taken as the head, and the window 3 dominant and window 1 dominant codes are mapped to bit values. It then converts this bit array according to a predetermined rule and extracts it as meaningful additional information. Various rules can be used as the predetermined rule, provided that the information intended by the embedder can be recognized by the recipient; in this embodiment, the rule is one for recognition as character information. That is, the additional information extraction means 170 recognizes the codes determined by the code determination parameter calculation means 150 and output from the code output means 160 in units of 1 byte (8 bits), and interprets them as character information according to the configured code system. The character information obtained in this way is displayed on the screen of a display device (not shown).
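The conversion of the extracted bit array into character information in 1-byte units can be illustrated as follows. The text leaves the code system and the bit order open, so the most-significant-bit-first packing and the ASCII default used here, as well as the function name `bits_to_text`, are assumptions of this sketch.

```python
def bits_to_text(bits, encoding="ascii"):
    """Pack bits into bytes and decode them as characters.

    bits:     sequence of 0/1 values in extraction order.
    encoding: code system used for decoding (ASCII is an assumption).
    Bits are packed most significant bit first (also an assumption);
    an incomplete trailing byte is dropped.
    """
    usable = len(bits) - len(bits) % 8
    data = bytearray()
    for i in range(0, usable, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        data.append(value)
    # Bytes that are invalid in the chosen code system are replaced
    # rather than raising an error.
    return data.decode(encoding, errors="replace")
```

For instance, the sixteen bits 01001000 01101001 decode to "Hi" under these assumptions.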

Accordingly, if attribute information such as the title and artist of a piece of music is embedded in the acoustic signal as character information by the embedding device, a user who hears the music playing and wants to know its title or artist can perform a predetermined operation on his or her own portable terminal functioning as the extraction device, and the attribute information such as the title and artist is displayed as character information on the terminal's screen.

In the above processing, to extract the additional information accurately, the extraction device performs three corrections: correcting the phase, correcting the difference in intensity between the left and right low-frequency components, and correcting the lower threshold used to judge that a frame is invalid. These three correction processes are explained further below.

(5. Phase correction processing and offset change processing)
As described above, at extraction time the acoustic signal cannot necessarily be read in alignment with the acoustic frames used at embedding time. The acoustic frame is therefore read at several shifted phases (six in this embodiment), the optimum phase among them is determined, and the code corresponding to the acoustic frame identified by that phase is output. To further improve detection accuracy, the present invention also moves the position of the reference frame in minute steps so that the optimum acoustic frame is determined as the target of code extraction.

As for the phase correction processing, when reading in six ways, for example, the first acoustic frame is nominally samples 1 to 4096, but processing is performed on each of six acoustic frames of 4096 samples that start at sample numbers 1, 683, 1366, 2049, 2732, and 3413, and the code corresponding to the optimum acoustic frame is output. This phase correction processing is carried out mainly by the processing in S304, S310, S311, S312, and S313.
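As a minimal sketch of how the six candidate frames in this example could be cut out of a sample sequence, the following uses the start sample numbers exactly as listed above. The function name `candidate_frames`, the `base` parameter, and the choice to skip candidates that run past the end of the data are assumptions made for illustration; how the optimum candidate is then selected is not shown here.

```python
FRAME_LEN = 4096
# 1-based start sample numbers, copied from the six-phase example in the text.
PHASE_STARTS = (1, 683, 1366, 2049, 2732, 3413)


def candidate_frames(samples, base=0):
    """Return {phase number p: frame} for the six phase-shifted candidates.

    `base` is the 0-based index of the first sample of the reference frame.
    Candidates that would extend past the end of `samples` are skipped.
    """
    frames = {}
    for p, start in enumerate(PHASE_STARTS):
        lo = base + start - 1   # convert the 1-based number to a 0-based index
        hi = lo + FRAME_LEN
        if hi <= len(samples):
            frames[p] = samples[lo:hi]
    return frames
```

Phase number 0 corresponds to the reference frame itself, and phase numbers 1 to 5 correspond to the phase change frames.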

As for the offset change processing, the offset value is changed only when the optimum phase frame has the same phase as the optimum phase frame two frames earlier and a different phase from the immediately preceding optimum phase frame. This offset change processing is carried out mainly by the processing in S304 and S314.

FIG. 10 shows how the acoustic frames targeted for information extraction change as a result of the offset change processing. In FIG. 10, the horizontal direction is the time-series direction, with positions further to the right lying further in the future. FIG. 10(a) schematically shows the digital acoustic signal that the acoustic signal input means 100 acquires as a sample sequence; additional information is assumed to be embedded at the four locations labeled acoustic frame A to acoustic frame D. Each row of FIG. 10(b) schematically shows the digital acoustic signal that the acoustic signal input means 100 acquires as a sample sequence, with the phase shifted, in order to extract the additional information. As described above, the acoustic frame acquisition means 110 reads a predetermined number of samples (a sample group) from each channel of the input digital stereo acoustic signal as a reference frame, and the phase change frame setting means 130 sets, as phase change frames, acoustic frames whose phase is changed by moving from the reference frame by a predetermined number of samples at a time. FIG. 10(b) shows the acoustic frames serving as the reference frame and the phase change frames. In the example of FIG. 10, sample group A is read as the reference frame (acoustic frame A0), and acoustic frames A1 to A5 are read as phase change frames. The same applies to the relationship between sample groups B and C and acoustic frames B0 to B5 and C0 to C5. That is, the letter of an acoustic frame corresponds to the letter of its sample group, and the number of an acoustic frame corresponds to the phase number p.

Suppose that, in this situation, acoustic frame A2 is determined to be the optimum phase frame among acoustic frames A0 to A5, acoustic frame B1 among acoustic frames B0 to B5, and acoustic frame C2 among acoustic frames C0 to C5. In this case, the processing in S313 records the phase numbers p in the optimum phase determination table in the order "2", "1", "2", one for each reference frame. This instability arises because the phase of acoustic frame A lies midway between acoustic frame A1 and acoustic frame A2: neither can be judged optimal, so a different phase number is determined for each sample group, and the determined-phase fluctuation condition of S401 is met. That is, as shown in FIG. 10(b), the optimum phase of the acoustic frame just processed (C2) is the same as that of the acoustic frame two frames earlier (A2) and differs from that of the immediately preceding acoustic frame (B1). The minute offset value is then updated in S402, and in S300 the address minute offset processing is performed; that is, the read positions of the reference frame and the phase change frames are set, according to the updated offset value, to positions moved by a predetermined number of frames. Accordingly, as shown in FIG. 10(b), sample group D is moved by the offset value. All of the candidate acoustic frames D0 to D5 then move by the offset value, and in the example of FIG. 10(b) there is no longer room for any frame other than acoustic frame D1 to be selected, so that phase number p = 1 is determined stably for some time afterward. In practice, however, it is rare for the determination to stabilize after a single minute offset process as in this example, and applying the minute offset process may even make the phase determination more unstable. Even in such cases, in this embodiment the phase determination becomes stable after at most seven minute offset processes.
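The fluctuation condition checked in S401 can be written as a short predicate. This is a minimal sketch, assuming the optimum phase numbers are kept in a list in the order they were determined; the function name `should_change_offset` is hypothetical, and how the offset value itself is updated in S402 is not covered here.

```python
def should_change_offset(phase_history):
    """Return True when the latest optimum phase equals the one two frames
    earlier but differs from the immediately preceding one."""
    if len(phase_history) < 3:
        return False  # not enough history to evaluate the condition
    two_back, one_back, current = phase_history[-3:]
    return current == two_back and current != one_back
```

With the sequence "2", "1", "2" from the example above, the predicate is true after the third frame, which is the point at which the offset value would be updated.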

(7. Lower threshold (average code level) correction processing)
When the signal level is low, the relative magnitude of the two window components cannot be determined, and the extraction side makes more erroneous determinations. For this reason, a frame whose low-frequency intensities E_c1 and E_c2 are at or below the code determination levels FL1 and FL2 is judged in S305 to be an invalid frame (and is excluded from bit value extraction), and the code determination levels FL1 and FL2 are corrected using the accumulated low-frequency intensities E_c1 and E_c2 of past valid frames. Varying the code determination levels FL1 and FL2, which serve as thresholds, in this way makes it possible to judge accurately whether a frame is invalid or valid even when the signal level fluctuates. In particular, in the present invention, when the input signal level remains so low that the signal is judged to have been interrupted, the average code levels HL1 and HL2 are initialized to the level lower limit value Lev, which is their initial value. In addition, so that a change in signal level over the course of the acoustic signal can be handled, the present invention gives less weight to the low-frequency intensities E_c1 and E_c2 of temporally distant frames when calculating the average code levels HL1 and HL2. This lower threshold correction processing is carried out mainly by the processing in S302, S203, and S404.

FIG. 11 shows how code determination proceeds in two cases: the conventional case, in which the low-frequency intensities E_c1 and E_c2 on which the average code levels HL1 and HL2 are based are treated equally for all past frames, and the case of the present invention, in which temporally distant frames are given less weight and the levels are initialized when the signal is extremely small. In FIG. 11, the horizontal axis is the time axis in units of frames and the vertical axis is the signal level. The thick solid lines, each one frame wide, indicate the low-frequency intensities E_c1 and E_c2 (denoted E in the figure), and the thick broken lines, each one frame wide, indicate the code determination levels FL1 and FL2 (denoted FL in the figure). The low-frequency intensity E shown in FIGS. 11(a) and 11(b) is exactly the same in corresponding frames.

When such low-frequency intensities E are obtained, in the conventional method, as shown in FIG. 11(a), the low-frequency intensities E_c1 and E_c2 fall below the code determination levels FL1 and FL2 in the sixth frame from the left, so the frame is determined to be delimiter information in S309. Thus, in the conventional method, the sixth frame from the left is determined to be delimiter information even though it has reasonable low-frequency intensities E_c1 and E_c2 and a bit value could in principle be extracted from it. In the present invention, by contrast, the low-frequency intensities E_c1 and E_c2 of temporally distant frames are given less weight, so the values of the low-frequency intensity E for temporally close frames are strongly reflected in the code determination levels FL1 and FL2. Consequently, when the low-frequency intensity E gradually decreases from the fourth to the sixth frame from the left, this is reflected in the code determination levels FL1 and FL2, which also decrease. Then, as shown in FIG. 11(b), the low-frequency intensities E_c1 and E_c2 are at or above the code determination levels FL1 and FL2 in the sixth frame from the left, so a bit value is set in S306 to S308.

In the 10th and 11th frames from the left, the low-frequency intensities E_c1 and E_c2 are "0". The conventional method performs no special processing on the code determination levels FL1 and FL2 in such a case, so their values do not decrease sharply. As a result, as shown in FIG. 11(a), the low-frequency intensities E_c1 and E_c2 fall below the code determination levels FL1 and FL2 in the 12th and 13th frames from the left, and those frames are determined to be delimiter information in S309. In the present invention, by contrast, when the low-frequency intensity E remains "0" for a predetermined number of consecutive frames or more, the code determination levels FL1 and FL2 are reset to their initial values, so their values become small in the 11th frame from the left. Then, as shown in FIG. 11(b), the low-frequency intensity E is at or above the code determination levels FL1 and FL2 in the 12th and 13th frames from the left, so a bit value is set in S306 to S308. In the example of FIG. 11, the code determination levels FL1 and FL2 are reset to their initial values after the low-frequency intensities E_c1 and E_c2 have been "0" for only two consecutive frames, but this is for convenience of explanation; in practice, the code determination levels FL1 and FL2 are reset to their initial values when the low-frequency intensity E is "0" for many consecutive frames.
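The lower threshold correction discussed in this section can be sketched as follows for a single level. This is an illustrative sketch only: the class name `LowerThreshold`, the use of an exponentially decaying average to down-weight temporally distant frames, the `ratio` that derives the code determination level from the average code level, and the parameter values are all assumptions, because the actual formulas are not given in this passage.

```python
class LowerThreshold:
    """Tracks one average code level and the threshold derived from it."""

    def __init__(self, initial_level, decay=0.5, ratio=0.5,
                 reset_after=2, silence_level=0.0):
        self.initial_level = initial_level   # plays the role of Lev
        self.decay = decay                   # weight kept by older frames (assumed)
        self.ratio = ratio                   # threshold / average level (assumed)
        self.reset_after = reset_after       # silent frames before a reset
        self.silence_level = silence_level   # at or below this counts as silent
        self.level = initial_level           # average code level
        self.silent = 0                      # consecutive silent frames so far

    @property
    def threshold(self):
        """Current code determination level."""
        return self.ratio * self.level

    def update(self, energy):
        """Classify one frame by its low-frequency intensity.

        Returns True for a valid frame and False for an invalid one.
        """
        if energy <= self.silence_level:
            self.silent += 1
            if self.silent >= self.reset_after:
                self.level = self.initial_level   # signal judged interrupted
            return False
        self.silent = 0
        if energy < self.threshold:
            return False                          # too weak: invalid frame
        # Valid frame: older frames lose weight geometrically.
        self.level = self.decay * self.level + (1 - self.decay) * energy
        return True
```

In this sketch a loud passage raises the threshold, a run of silent frames resets it to the initial value, and a quieter passage that follows is again accepted as valid, which mirrors the behavior shown for the 12th and 13th frames in FIG. 11(b).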

FIG. 1 is a functional block diagram of a device for embedding information in an acoustic signal.
FIG. 2 is a flowchart showing an outline of the processing of the device shown in FIG. 1.
FIG. 3 is a diagram showing how the signal waveform changes when a general Fourier transform is performed.
FIG. 4 is a diagram showing the window functions used in the present invention.
FIG. 5 is a diagram showing how the low-frequency components change as a result of the processing according to FIG. 2.
FIG. 6 is a functional block diagram of the device for extracting information from an acoustic signal according to the present invention.
FIG. 7 is a flowchart showing an outline of the processing of the device shown in FIG. 6.
FIG. 8 is a flowchart showing details of the code determination processing of S202 in FIG. 7.
FIG. 9 is a flowchart showing details of the code determination processing of S314 in FIG. 8.
FIG. 10 is a block diagram showing how the acoustic frames targeted for information extraction change as a result of the offset change processing.
FIG. 11 is a diagram showing how code determination proceeds in the conventional method and in the present invention.

Explanation of symbols

10 ... Acoustic frame reading means
20 ... Frequency conversion means
30 ... Low-frequency component changing means
40 ... Inverse frequency conversion means
50 ... Modified acoustic frame output means
60 ... Storage means
61 ... Acoustic signal storage unit
62 ... Additional information storage unit
63 ... Modified acoustic signal storage unit
70 ... Additional information reading means
100 ... Acoustic signal input means
110 ... Acoustic frame acquisition means
120 ... Reference frame setting means
130 ... Phase change frame setting means
140 ... Frequency conversion means
150 ... Code determination parameter calculation means
160 ... Code output means
170 ... Additional information extraction means
180 ... Acoustic frame holding means

Claims

An apparatus for extracting, from an acoustic signal composed of a time-series sample sequence, additional information embedded in advance in an inaudible state, the apparatus comprising:
acoustic frame acquisition means for acquiring an acoustic frame composed of a predetermined number of samples from the acoustic signal;
reference frame setting means for setting a reference frame whose phase is changed by moving the acoustic frame by an offset value corresponding to a predetermined number of samples;
phase change frame setting means for setting, as phase change frames, a plurality of acoustic frames whose phase is changed by moving from the reference frame by a number of samples corresponding to a step value that is larger than the offset value;
frequency conversion means for performing frequency conversion on each acoustic frame set as the reference frame or a phase change frame, and generating a frame spectrum corresponding to each acoustic frame;
code determination parameter calculation means for extracting low-frequency intensity data corresponding to components at or below a predetermined frequency from the generated frame spectrum, and calculating a code determination parameter based on the low-frequency intensity data;
code output means for determining, based on the code determination parameters calculated for past acoustic frames that have the same phase but different reference frames, that one of the reference frame and the plurality of phase change frames is the optimum phase frame having the optimum phase, outputting a predetermined code based on the code determination parameter determined for the optimum phase frame, and changing the offset value when the phase of the optimum phase frame differs from that of the immediately preceding optimum phase frame and is the same as that of the optimum phase frame two frames earlier; and
additional information extraction means for extracting the additional information by converting, according to a predetermined rule, the bit array formed by the codes output for each optimum phase frame.

The apparatus for extracting information from an acoustic signal according to claim 1, wherein
the frequency conversion means performs frequency conversion on the acoustic frame using a first window function and a third window function, respectively, to generate a first window spectrum, which is the spectrum corresponding to the first window function, and a third window spectrum, which is the spectrum corresponding to the third window function, and
the code output means extracts a spectrum set corresponding to a predetermined low-frequency band from each generated window spectrum, calculates the sum of the spectrum intensities for each spectrum set, and outputs a predetermined code based on the ratio of the sums between the spectrum sets.

The apparatus for extracting information from an acoustic signal according to claim 1, wherein
the code output means calculates the code determination parameter for each phase over past acoustic frames having the same phase, and outputs the code corresponding to the state of the acoustic frame whose phase has the maximum value in a phase determination table calculated using the code determination parameters, and, in calculating the phase determination table, the influence of a past acoustic frame is made smaller the farther away it is in time.

The apparatus for extracting information from an acoustic signal according to claim 1 or claim 2, wherein
the code output means determines that the acoustic frame is an invalid frame when the sum of the low-frequency intensity data extracted from the generated frame spectrum is less than a predetermined lower threshold, and
the lower threshold used for the determination is calculated by adding up the low-frequency intensity data of frames determined to be valid in the past, and, in adding up the low-frequency intensity data, the influence of a past valid frame is made smaller the farther away it is in time.

The apparatus for extracting information from an acoustic signal according to claim 4, wherein
the code output means resets the lower threshold to an initial value when the acoustic frame is determined to be an invalid frame and a predetermined number of consecutive preceding acoustic frames have also been determined to be invalid frames.