JP2008225234A

JP2008225234A - Audio signal processor

Info

Publication number: JP2008225234A
Application number: JP2007065601A
Authority: JP
Inventors: Hideo Koide; 英生小出
Original assignee: Crimson Tech Inc
Current assignee: Crimson Tech Inc
Priority date: 2007-03-14
Filing date: 2007-03-14
Publication date: 2008-09-25

Abstract

PROBLEM TO BE SOLVED: To add information to audio data so that the start position of an information addition processing frame can be identified relatively easily, and to make that identification relatively easy even when the audio data is input in stream data format.

SOLUTION: The audio signal processor includes a synchronization information adding means 304 for adding, to input audio data, synchronization information indicating the time position of each frame, a frame being a section of fixed length, and a related information adding means 305 for embedding related information associated with the audio data in the frames of the audio data to which the synchronization information has been added. The audio signal processor further includes a synchronization information detecting means 702 for receiving the information-embedded audio data in stream data format and detecting the synchronization information in order to detect the frame start positions, and a related information extracting means 704 for processing the information-embedded audio data frame by frame, based on the result of the synchronization information detection, and extracting the related information embedded in the frames.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an audio signal processing apparatus, and more particularly to an apparatus that embeds or extracts synchronization information indicating the time position of processing frames of audio data, together with related information associated with the audio data. It is used, for example, in a distribution system for music content in stream data format.

Digitized music content, for example CD-quality digital audio data, consists of 44,100 16-bit samples per second for each channel. When digital watermark information such as copyright information, or other information, is added to (embedded in) such uncompressed digital audio data, a basic digital signal processing technique is often used in which samples are extracted in sections of a fixed length and each extracted set of samples is processed individually. Each fixed-length extraction section is called a frame. The fast Fourier transform (FFT) is a typical technique that uses frame processing.

Conventionally, an information adding device that adds information to CD-quality digital audio data divides the audio data into frames of, for example, 2048 samples, extracts the samples in each frame, performs predetermined information addition processing, and outputs the result as information-added audio data. An information extraction device that receives such information-added audio data extracts the samples in each frame section of 2048 samples, the same length used by the information adding device, and performs predetermined signal processing to extract the information embedded in the audio data.

In a conventional information extraction device, when audio data is read from a predetermined recording medium or the like, the frame start positions determined by the operation of the information adding device can be reproduced exactly. However, when audio data is input to the information extraction device in stream data format, it is generally difficult to identify the start position of the information addition processing frames, regardless of whether the input is analog or digital.

To identify the start position of the information addition processing frames, methods have been proposed that extract a specific code while shifting the frame start position one sample at a time, or that perform extraction based on identifying the frequency of the signal within the frame (see Patent Document 1). However, these methods have the problem that the amount of computation required for extraction becomes enormous.

Patent Document 1: JP 2000-172282 A

The present invention has been made to solve the conventional problems described above, and its object is to provide an audio signal processing apparatus that can add information in such a way that the start position of the information addition processing frames of audio data can be identified relatively easily.

Another object of the present invention is to provide an audio signal processing apparatus that can identify the start position of the information addition processing frames of audio data relatively easily even when the audio data is input in stream data format.

A first aspect of the audio signal processing apparatus of the present invention comprises: synchronization information adding means for adding, to input audio data, synchronization information indicating the time position of each frame, a frame being a section of fixed length; and related information adding means for embedding related information associated with the audio data in the frames of the audio data to which the synchronization information has been added by the synchronization information adding means.

A second aspect of the audio signal processing apparatus of the present invention comprises: synchronization information detecting means for receiving, in stream data format, information-embedded audio data and detecting the synchronization information, the information-embedded audio data being audio data on which predetermined processing has been performed for each frame (a section of fixed length), to which synchronization information indicating the time position of the frames has been added, and in whose frames related information associated with the audio data has been embedded; frame extracting means having a function of detecting the frame start positions of the information-embedded audio data based on the synchronization information detected by the synchronization information detecting means; and related information extracting means for performing predetermined processing on the information-embedded audio data for each frame, based on the frame information detected by the frame extracting means, and extracting the related information embedded in the frames.

As the synchronization information, it is desirable to use a triangular wave signal that is synchronized with the frames, with two adjacent frames of the audio data forming one period, whose amplitude changes in a triangular shape, and that lies in a predetermined low-frequency band having little effect on audibility.

According to the audio signal processing apparatus of the present invention, information can be added in such a way that the start position of the information addition processing frames of audio data can be identified relatively easily. Furthermore, according to the audio signal processing apparatus of the present invention, the start position of the information addition processing frames can be identified relatively easily even when the audio data is input in stream data format.

Embodiments of the present invention will now be described with reference to the drawings. In this description, parts common to all the drawings are given common reference numerals.

<First Embodiment>
FIG. 1 schematically shows the configuration of an audio content distribution system according to a first embodiment of the audio signal processing apparatus of the present invention. The transmission side 10 of this audio content distribution system comprises an information adding device 30, which receives the audio data of the audio content distribution system and adds predetermined information to it, and a transmission circuit 20, which sends the output of the information adding device 30 to a transmission path 40 of some kind. The reception side 50 of the audio content distribution system comprises a receiving circuit 60, which receives the input from the transmission path 40, and an information extraction device 70, which extracts the predetermined information from the output of the receiving circuit 60 and extracts the audio data. The information adding device 30 and the information extraction device 70 can be implemented in software or in hardware; in this example they are implemented using a CPU and a program.

FIG. 2 is a block diagram that focuses on the processing functions of the information adding device 30 in FIG. 1 and shows them following an example of the processing flow (steps). The information adding device 30 comprises frame position specifying means 302, frame extracting means 303, synchronization information adding means 304, and related information adding means 305.

The frame position specifying means 302 receives the audio data of the audio content distribution system, for example CD-quality audio data, and has a function of specifying frame positions (for example, by adding frame synchronization information) by dividing the audio data into frames, each a section of fixed length (for example, 2048 samples). The frame extracting means 303 extracts the samples within each frame delimited by the frame position specifying means 302.
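The framing step described above can be sketched as follows. This is only an illustrative Python sketch, not the disclosed implementation: the function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions, while the 2048-sample frame length and the 44,100 Hz rate are the example values from the text.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 44100  # CD-quality sampling rate (from the text)
FRAME_LEN = 2048     # example frame length in samples (from the text)


def split_into_frames(samples: Sequence[int],
                      frame_len: int = FRAME_LEN) -> Iterator[List[int]]:
    """Yield consecutive fixed-length frames.

    Assumption: a trailing partial frame is simply dropped.
    """
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield list(samples[start:start + frame_len])


one_second = [0] * SAMPLE_RATE            # one second of silent mono audio
frames = list(split_into_frames(one_second))
print(len(frames))                         # 21 full frames
print(round(1000 * FRAME_LEN / SAMPLE_RATE, 1))  # 46.4 (ms per frame)
```

At these example values each frame lasts roughly 46 ms, so about 21 complete frames fit into one second of audio.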

The synchronization information adding means 304 has a function of adding, for each frame of the audio data input, synchronization information indicating the time position of the frame to the samples in the frame. In this example, it has a synchronization signal generation function that generates a synchronization signal synchronized with the frames of the audio data input, and an addition function that adds (superimposes) the synchronization signal onto the audio data input in an analog manner.

The related information adding means 305 has a function of embedding (adding), according to a predetermined rule, related information associated with the audio data in the frames of the audio data to which the synchronization information has been added by the synchronization information adding means 304, and outputting the result as information-added audio data. The related information associated with the audio data is, for example, text information such as lyrics, copyright information, performance information such as a score or rhythm, or performance control information for a performing robot or toy, and is added, for example, in a MIDI-standard data format.

Next, an example of the processing in the synchronization information adding means 304 in FIG. 2 will be described with reference to FIGS. 3, 4, and 5. FIGS. 3(a) and 3(b) show an example of the signal waveforms of the audio data input to the frame position specifying means 302 in FIG. 2. Here, CD-quality audio data, in which the sampled amplitude values of the audio signal are represented digitally, is taken as an example, and the sampled values are drawn as a continuous waveform. This example shows the left channel L signal and the right channel R signal of stereo data. FIGS. 4(a) and 4(b) show an example of the signal waveforms of the frame synchronization signal added to the original audio data as synchronization information. FIGS. 5(a) and 5(b) show an example of the signal waveforms of the information-added audio data obtained by adding the frame synchronization signal to the original audio data.

The conditions required of the frame synchronization signal are as follows.

(1) The frame synchronization signal is an alternating wave synchronized with the frames, with two adjacent frames of the audio data forming one period. In other words, the amplitude of the frame synchronization signal is at the reference level (0) at the positions corresponding to both ends of a frame, reaches its maximum or minimum at the center of the frame, and inverts alternately so that the amplitude direction is symmetric between temporally adjacent frames. This alternating wave may be a triangular wave whose amplitude changes in a triangular shape, a rectangular wave, a sine wave, a triangular pulsating wave, or the like, but a triangular wave signal is desirable so that frame boundaries can be detected clearly when the synchronization signal is detected, as described later.

(2) The frame synchronization signal is desirably a signal in a predetermined low-frequency band that has little effect on audibility when the audio data is played back, for example a signal of a few tens of Hz or less, in a band close to the DC component.

(3) When the frame synchronization signal is added to stereo data, it is added so that its amplitude direction is symmetric between the left channel frame and the right channel frame at the same time. Condition (3) is unnecessary when the frame synchronization signal is added to monaural data.
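Conditions (1) to (3) can be illustrated with a short Python sketch. It is a hedged example under stated assumptions rather than the embodiment itself: the function names `triangle_sync` and `add_sync` and the `step` parameter (amplitude change per sample, which fixes the peak level) are hypothetical, since the text does not specify the amplitude of the synchronization signal, and clipping to the 16-bit range is not handled.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 44100   # CD-quality sampling rate (from the text)
FRAME_LEN = 2048      # example frame length (from the text)


def triangle_sync(num_samples: int, frame_len: int = FRAME_LEN,
                  step: int = 1) -> List[int]:
    """Condition (1): zero at frame boundaries, extreme at the frame centre,
    polarity flipped on every frame, so one period spans two frames.
    `step` is an illustrative assumption, not a value from the text."""
    half = frame_len // 2
    sync = []
    for n in range(num_samples):
        frame_index, pos = divmod(n, frame_len)
        magnitude = step * (half - abs(pos - half))
        sync.append(magnitude if frame_index % 2 == 0 else -magnitude)
    return sync


def add_sync(left: Sequence[int], right: Sequence[int],
             frame_len: int = FRAME_LEN,
             step: int = 1) -> Tuple[List[int], List[int]]:
    """Condition (3): superimpose the wave on L and its inverse on R."""
    sync = triangle_sync(len(left), frame_len, step)
    out_left = [a + s for a, s in zip(left, sync)]
    out_right = [b - s for b, s in zip(right, sync)]
    return out_left, out_right


# Condition (2): with these example values the fundamental is about 10.77 Hz.
print(round(SAMPLE_RATE / (2 * FRAME_LEN), 2))   # 10.77
```

With a 2048-sample frame at 44,100 Hz, the two-frame period corresponds to roughly 10.8 Hz, which is consistent with the "few tens of Hz or less" band named in condition (2).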

According to the information adding device 30 described above, information can be added in such a way that the start position of the information addition processing frames of the audio data can be identified relatively easily. The information adding device 30 is not limited to the transmission side of an audio content distribution system as described above and can be provided in any audio signal processing system.

FIG. 6 is a block diagram that focuses on the processing functions of the information extraction device 70 in FIG. 1 and shows them following an example of the processing flow (steps). The information extraction device comprises synchronization information detecting means 702, frame extracting means 703, and related information extracting means 704.

The synchronization information detecting means 702 receives, in stream data format, the information-embedded audio data described above, that is, audio data on which predetermined processing has been performed for each frame (a section of fixed length), to which synchronization information indicating the time position of the frames has been added, and in whose frames related information associated with the audio data has been embedded. It has a function of detecting the synchronization information from this audio data input.

The frame extracting means 703 has a function of detecting, based on the synchronization information detected by the synchronization information detecting means 702, the synchronization positions (frame start positions) at intervals of 2048 samples, the same interval used by the information adding device 30.

The related information extracting means 704 has a function of extracting the samples within each frame section of the information-embedded audio data, based on the frame information detected by the frame extracting means 703, performing predetermined processing, and extracting the related information embedded in the frames.

Next, an example of the processing in the synchronization information detecting means 702 in FIG. 6 will be described with reference to FIGS. 7 and 8. The synchronization information detecting means 702 first computes, for each channel of the information-embedded audio data input, the running sum of the signal over one frame length. With x(n) as the input signal and N as the frame length to be detected, the result y(n) of this computation is given by (Equation 1):

y(n) = x(n) + x(n-1) + … + x(n-N+1) = Σ_{k=0}^{N-1} x(n-k) … (Equation 1)

FIGS. 7(a) and 7(b) show an example of the running frame-length sums for the left channel L signal and the right channel R signal of stereo data.

Next, the absolute value of the difference between the running sum yL(n) of the left channel L and the running sum yR(n) of the right channel R is computed. The result z(n) of this difference computation is given by (Equation 2):

z(n) = |yL(n) - yR(n)| … (Equation 2)

FIG. 8 shows an example of the inter-channel difference of the running frame sums.

Next, the boundary of the frame to be detected (the frame start position) is found by detecting the position at which the difference z(n) takes its maximum value. When a triangular wave signal as described above is used as the synchronization signal, the frame boundary can be detected very clearly.
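The detection principle (running frame-length sum per channel, inter-channel difference, then the position of the maximum) can be sketched in Python as below. This is an illustrative sketch under assumptions, not the disclosed implementation: the function names are hypothetical, the search is limited to samples N to 2N-1, and ties are resolved in favour of the later sample. That tie-break matters for a noise-free triangular wave, because the window covering a frame exactly and the window shifted by one sample give the same sum (the wave is zero at the frame boundary).

```python
from typing import List, Sequence


def running_sum(x: Sequence[int], N: int) -> List[int]:
    """Equation 1: y(n) = x(n) + x(n-1) + ... + x(n-N+1).
    Entries for n < N-1 are partial sums and are not used for detection."""
    y, acc = [], 0
    for n, v in enumerate(x):
        acc += v
        if n >= N:
            acc -= x[n - N]      # drop the sample that left the window
        y.append(acc)
    return y


def detect_frame_start(left: Sequence[int], right: Sequence[int], N: int) -> int:
    """Return the n in [N, 2N) that maximises z(n) = |yL(n) - yR(n)|.
    Assumption: on a tie, the later sample is chosen."""
    yl, yr = running_sum(left, N), running_sum(right, N)
    z = [abs(a - b) for a, b in zip(yl, yr)]        # Equation 2
    return max(range(N, 2 * N), key=lambda n: (z[n], n))


# Demo: sync-only stereo stream (silent audio) with frame length N = 8.
N = 8
one_period = [0, 1, 2, 3, 4, 3, 2, 1, 0, -1, -2, -3, -4, -3, -2, -1]
sync = (one_period * 3)[3:]      # the receiver joins 3 samples into a frame
left = sync
right = [-s for s in sync]       # opposite polarity on the right channel
nf = detect_frame_start(left, right, N)
print(nf, nf % N)                # 13 5
```

In this toy stream the true frame starts fall on receiver samples 5, 13, 21, and so on, so the detected position 13 coincides with a frame start. With real audio superimposed, the estimate would be expected to land near, but not necessarily exactly on, the boundary.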

Next, an example of the processing procedure of the synchronization information detecting means 702 in FIG. 6 will be described in detail. With N as the frame length to be detected, the processing is divided into three states: (A) processing of the 0th sample (the extraction start point) through the (N-1)th sample, (B) processing of the Nth sample through the (2N-1)th sample, and (C) processing of the 2Nth sample onward.

FIG. 9 is a flowchart showing an example of the processing (A) of the 0th sample (the extraction start point) through the (N-1)th sample in the synchronization information detecting means 702 in FIG. 6. In step S1000, the variables are initialized: the counter n = 0, the running frame-length sum of the left channel L, YL = 0, and the running sum of the right channel R, YR = 0.

In this state, fewer than N samples are available after the extraction start point, so (Equation 1) cannot yet be evaluated. Therefore, so that the result of (Equation 1) can be evaluated from the Nth sample onward, in step S1001 the input samples XL(n) and XR(n) are stored in the delay buffers x'L(n) and x'R(n). Next, in step S1002, the latest input samples are added to the accumulated values: YL = YL + XL(n) and YR = YR + XR(n). Then n is incremented in step S1003, a test is made in step S1004, and if n < N the process returns to step S1001.

FIG. 10 is a flowchart showing an example of the processing (B) of the Nth sample through the (2N-1)th sample in the synchronization information detecting means 702 in FIG. 6. In step S1101, the input samples XL(n) and XR(n) are stored in the delay buffers x'L(n) and x'R(n). In step S1102, the value from N samples earlier, XL(n-N), is removed from the accumulated value YL and the latest input sample XL(n) is added: YL = YL - XL(n-N) + XL(n). Likewise, the value from N samples earlier, XR(n-N), is removed from the accumulated value YR and the latest input sample XR(n) is added: YR = YR - XR(n-N) + XR(n).

As a result, the values obtained by performing the computation in step S1103 are equal to the values of (Equation 1) at sample n, computed for the left channel L and the right channel R respectively. In this way, (Equation 1) requires only one addition and one subtraction per sample, so the processing load can be kept low.
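The equivalence between the incremental update and the direct sum of (Equation 1) can be checked with a small Python sketch. The class name `SlidingSum` and the use of `collections.deque` as the delay buffer are assumptions made for illustration; the text only states that a delay buffer is used.

```python
from collections import deque
from typing import Deque


class SlidingSum:
    """Running sum over the last N samples with O(1) work per sample."""

    def __init__(self, N: int) -> None:
        self.N = N
        self.delay: Deque[int] = deque()   # plays the role of the delay buffer
        self.total = 0

    def push(self, x: int) -> int:
        self.delay.append(x)
        self.total += x                    # add the latest sample
        if len(self.delay) > self.N:
            self.total -= self.delay.popleft()   # remove the sample N steps back
        return self.total


samples = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
N = 4
acc = SlidingSum(N)
for n, x in enumerate(samples):
    y = acc.push(x)
    if n >= N - 1:
        # Direct evaluation of Equation 1 for comparison.
        assert y == sum(samples[n - N + 1:n + 1])
print(acc.total)   # 4
```

The direct sum costs N additions per sample, whereas the incremental form costs one addition and one subtraction regardless of N, which is the saving the passage above refers to.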

Next, (Equation 2) is computed in step S1104, and in step S1105 the value of n at which z(n) takes its maximum within this state, that is, among the Nth through (2N-1)th samples, is identified and stored in the variable nf. Then n is incremented in step S1106, a test is made in step S1107, and if n < 2N the process returns to step S1101.

The sample position stored in nf when this state completes indicates the start position of the detected frame. Specifically, nf + N, nf + 2N, nf + 3N, and so on can be used as the frame start position samples.
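Once nf is known, slicing the stream into frames is straightforward. The following Python sketch is illustrative only; the helper name `frame_starts` and the decision to list only frames that fit completely inside the received data are assumptions.

```python
from typing import List


def frame_starts(nf: int, N: int, total_samples: int) -> List[int]:
    """Frame start positions nf + N, nf + 2N, ... for frames that fit
    entirely within `total_samples` received samples (assumption)."""
    return list(range(nf + N, total_samples - N + 1, N))


stream = list(range(37))             # stand-in for 37 received samples
starts = frame_starts(nf=13, N=8, total_samples=len(stream))
print(starts)                        # [21, 29]

# Each frame can then be handed to the related-information extraction step.
frames = [stream[s:s + 8] for s in starts]
print([len(f) for f in frames])      # [8, 8]
```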

FIG. 11 is a flowchart showing an example of the processing (C) of the 2Nth sample onward in the synchronization information detecting means 702 in FIG. 6. This state corrects the frame start position obtained in the previous state. The processing in steps S1201 through S1204 is the same as the processing in steps S1101 through S1104 described above. In step S1205, as in step S1105, the value of n at which z(n) takes its maximum from the 2Nth sample onward is identified and stored in the variable nf'. In step S1206, it is determined whether nf' deviates from the previously obtained nf, and if so, nf is updated in step S1207. Then n is incremented in step S1208, and the process returns to step S1201.
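States (A) to (C) can be combined into one streaming routine, sketched below in Python. Several details are assumptions rather than statements from the text: the class name `FrameSyncTracker`, evaluating the maximum of z(n) once per block of N samples, treating a "deviation" as a change in the position modulo N, and keeping the later sample on a tie.

```python
from collections import deque
from typing import Deque, Optional


class FrameSyncTracker:
    """Feed one stereo sample at a time; returns the current estimate nf."""

    def __init__(self, frame_len: int) -> None:
        self.N = frame_len
        self.n = 0                                    # index of the next sample
        self.delay_l: Deque[int] = deque(maxlen=frame_len)
        self.delay_r: Deque[int] = deque(maxlen=frame_len)
        self.yl = 0
        self.yr = 0
        self.nf: Optional[int] = None
        self._best_z = -1
        self._best_n = 0

    def push(self, xl: int, xr: int) -> Optional[int]:
        N = self.N
        # Sliding form of Equation 1: drop the sample N steps back, add the new one.
        if len(self.delay_l) == N:
            self.yl -= self.delay_l[0]
            self.yr -= self.delay_r[0]
        self.delay_l.append(xl)       # maxlen discards the oldest entry
        self.delay_r.append(xr)
        self.yl += xl
        self.yr += xr

        if self.n >= N:                               # states (B) and (C)
            z = abs(self.yl - self.yr)                # Equation 2
            if z >= self._best_z:                     # later sample wins a tie
                self._best_z, self._best_n = z, self.n
            if (self.n + 1) % N == 0:                 # end of an N-sample block
                if self.nf is None or self._best_n % N != self.nf % N:
                    self.nf = self._best_n            # first estimate or correction
                self._best_z = -1
        self.n += 1
        return self.nf


# Demo: sync-only stream, N = 8, with one sample lost at index 24.
N = 8
one_period = [0, 1, 2, 3, 4, 3, 2, 1, 0, -1, -2, -3, -4, -3, -2, -1]
stream = (one_period * 4)[3:]
stream = stream[:24] + stream[25:]
tracker = FrameSyncTracker(N)
estimates = [tracker.push(s, -s) for s in stream[:40]]
print(estimates[15], estimates[39])   # 13 28
```

In this toy run the first estimate (13) is obtained at the end of state (B). After the lost sample shifts the frame grid by one position, the estimate is corrected to 28, which is again a true frame start in the shifted stream.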

According to the information extraction device 70 described above, the start position of the information addition processing frames of the audio data can be identified relatively easily even when the audio data is input in stream data format.

The present invention is applicable to fields in which the audio portion of a video signal, such as a DVD or a movie file, is used and an auxiliary signal such as a control signal is embedded in that audio portion. It is also applicable to fields in which audio signals are transmitted and distributed, including audio data recorded on CDs, uncompressed audio such as the wave format handled on personal computers, packages and file formats that use audio compression and decompression technologies such as MP3 and AAC (for example DVDs, music distribution, portable digital music players, and Chaku-Uta ringtone songs), and file distribution and streaming. Example applications include the sale of music content such as Chaku-Uta with displayed lyrics (or commentary and artist messages), karaoke services using audio compression and decompression technologies such as MP3 and AAC, music distribution services with lyrics display, the commercialization of instrument-playing robot toys, the sale of music content with performance information, and the sale of music content with beat timing.

FIG. 1 is a block diagram schematically showing the configuration of an audio content distribution system according to a first embodiment of the audio signal processing apparatus of the present invention.
FIG. 2 is a block diagram that focuses on the processing functions of the information adding device in FIG. 1 and shows them following an example of the processing flow.
FIG. 3 is a waveform diagram showing an example of the audio data input processed by the synchronization information adding means in FIG. 2.
FIG. 4 is a waveform diagram showing an example of the frame synchronization signal added to the original audio data as synchronization information by the synchronization information adding means in FIG. 2.
FIG. 5 is a waveform diagram showing an example of the information-added audio data processed by the synchronization information adding means in FIG. 2.
FIG. 6 is a block diagram that focuses on the processing functions of the information extraction device in FIG. 1 and shows them following an example of the processing flow.
FIG. 7 is a characteristic diagram showing an example of the result of computing the running frame-length sum for each channel when the information-embedded audio data input to the synchronization information detecting means in FIG. 6 is stereo data.
FIG. 8 is a characteristic diagram showing an example of the result of computing the absolute value of the difference between the per-channel running frame-length sums when the information-embedded audio data input to the synchronization information detecting means in FIG. 6 is stereo data.
FIG. 9 is a flowchart showing an example of the processing of the 0th sample (the extraction start point) through the (N-1)th sample in the synchronization information detecting means in FIG. 6.
FIG. 10 is a flowchart showing an example of the processing of the Nth sample through the (2N-1)th sample in the synchronization information detecting means in FIG. 6.
FIG. 11 is a flowchart showing an example of the processing of the 2Nth sample onward in the synchronization information detecting means in FIG. 6.

Explanation of symbols

10: transmission side of the audio content distribution system; 20: transmission circuit; 40: transmission path; 30: information adding device; 50: reception side of the audio content distribution system; 60: receiving circuit; 70: information extraction device; 302: frame position specifying means; 303: frame extracting means; 304: synchronization information adding means; 305: related information adding means.

Claims

1. An audio signal processing apparatus comprising:
synchronization information adding means for adding, to input audio data, synchronization information indicating the time position of each frame, a frame being a section of fixed length; and
related information adding means for embedding related information associated with the audio data in the frames of the audio data to which the synchronization information has been added by the synchronization information adding means.

2. An audio signal processing apparatus comprising:
synchronization information detecting means for receiving, in stream data format, information-embedded audio data on which predetermined processing has been performed for each frame, a frame being a section of fixed length, to which synchronization information indicating the time position of the frames has been added, and in whose frames related information associated with the audio data has been embedded, and for detecting the synchronization information;
frame extracting means having a function of detecting the frame start positions of the information-embedded audio data based on the synchronization information detected by the synchronization information detecting means; and
related information extracting means for performing predetermined processing on the information-embedded audio data for each frame, based on the frame information detected by the frame extracting means, and extracting the related information embedded in the frames.

3. The audio signal processing apparatus according to claim 1, wherein the synchronization information is a triangular wave signal that is synchronized with the frames, with two adjacent frames of the audio data forming one period, whose amplitude changes in a triangular shape, and that lies in a predetermined low-frequency band having little effect on audibility.