JP2007065497A - Signal processing apparatus

Info

Publication number: JP2007065497A
Application number: JP2005253913A
Authority: JP
Inventors: Shuji Miyasaka; 修二宮阪; Yoshiaki Takagi; 良明高木; Takeshi Norimatsu; 武志則松; Akihisa Kawamura; 明久川村; Koshiro Ono; 耕司郎小野
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-09-01
Filing date: 2005-09-01
Publication date: 2007-03-15

Abstract

PROBLEM TO BE SOLVED: To solve the problem that, when a decorrelated signal is created by a delay means and an all-pass filter, the sharpness of input audio that is sharp in its temporal or frequency characteristics is lost because of an unnecessary reverberation signal.

SOLUTION: A signal processing apparatus comprises a generating means for generating a second signal from a first signal; a mixing coefficient determination means for determining a mixing rate; and a mixing means for mixing the first signal with the second signal on the basis of the mixing rate determined by the mixing coefficient determination means. The generating means comprises a delay means for delaying the first signal by N (N>0) unit times; a filter means for processing the output signal of the delay means; and a processing means for processing the first signal. The generating means generates the second signal from the output signal of the filter means and the output signal of the processing means.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a signal processing apparatus for decoding an encoded signal in which a downmix of a plurality of signals is encoded together with information for separating the downmix back into the original signals. In particular, it relates to a technique for decoding an encoded signal in which the multi-channel sense of presence is encoded with a small amount of information by encoding the phase differences and level ratios between the signals.

In recent years, a technology called Spatial Codec (spatial coding) has been under development. Its aim is to compress and encode the multi-channel sense of presence with a very small amount of information. For example, whereas the AAC system, a multi-channel codec already widely used as the audio format for digital television, requires bit rates such as 512 kbps or 384 kbps for 5.1 channels, Spatial Codec aims to compress and encode multi-channel signals at very low bit rates such as 128 kbps, 64 kbps, or even 48 kbps. One technique for this purpose is Parametric Coding for High Quality Audio (Non-Patent Document 1), standardized in MPEG audio.
That document describes the process of decoding a signal in which the sense of presence has been compressed and encoded with a small amount of information by encoding the phase differences and level ratios between channels. FIG. 5 shows the process. First, the input signal S is a monaural signal obtained by downmixing what was originally a 2ch signal. The input signal S is fed to a processing module called decorrelation, which yields an output signal D. The decorrelation process is described in detail in Section 8.6.4.5.2, "Calculate decorrelated signal", of Non-Patent Document 1, so a detailed description is omitted here, but it consists broadly of two steps. The first is a delay, which delays the input signal by a predetermined time. The delayed signal is then passed through an All Pass Filter, which adds a reverberation component to the signal. The signal D generated in this way and the input signal S are then subjected to a process called Mixing. This process is also described in detail in Section 8.6.4.6.2, "Mixing", of Non-Patent Document 1, so a detailed description is omitted, but the two signals S and D are multiplied by the coefficients h11, h12, h21, and h22 and summed to obtain the output Lch and Rch signals. The equations are as shown in the figure. The coefficients h11, h12, h21, and h22 are determined by the level ratio and phase difference between the original 2ch signals from which the input monaural signal was derived. The method of obtaining the coefficients from the level ratio and phase difference information is also described in Non-Patent Document 1 and is omitted here.

By performing such processing, the delay and the added reverberation component in the decorrelation give a sense of spatial spread when a 2ch signal is generated from the monaural signal, so that a good stereo signal is obtained.
Non-Patent Document 1: ISO/IEC 14496-3:2001/FDAM 2:2004(E)

However, the above method has the following problems. When the input signal varies very sharply in time (for example, at the moment of attack of a metallic percussion instrument), the delay and the added reverberation component in the decorrelation process cause the decorrelated signal to lose that sharpness. Because the decorrelated signal is then summed with the input signal S in the subsequent Mixing process, the output signal loses the sharpness of the input signal.

Similarly, when the frequency components of the input signal are concentrated in a specific frequency band (for example, when the tone of a single instrument is sustained), a very firmly localized sound image should be formed. However, the delay and the added reverberation component in the decorrelation process blur that firmly localized sound image in the decorrelated signal. Because the decorrelated signal is then summed with the input signal S in the subsequent Mixing process, the sound image of the output signal is blurred.

In addition, because the generated 2ch signal is separated from the monaural signal using only the level ratio and phase difference information as clues, the separation performance is often insufficient.

The present invention has been made in view of these conventional problems. Its object is to provide a signal processing apparatus that, when generating a 2ch signal from a monaural signal, gives a sense of spatial spread and yields a good stereo signal while also preserving the sharpness of temporal variations in the sound and the firm localization of the sound image.

Another object is to provide a signal processing apparatus that compensates for an insufficient sense of separation.

To solve the above problems, the signal processing apparatus according to claim 1 generates two signals by mixing a first signal and a second signal generated from the first signal at two different degrees of mixing, and comprises: generating means for generating the second signal from the first signal; mixing coefficient determining means for determining the degrees of mixing; and mixing means for mixing the first signal and the second signal based on the degrees of mixing determined by the mixing coefficient determining means. The generating means comprises delay means for delaying the first signal by N (N>0) unit times, filter means for processing the output signal of the delay means, and processing means for processing the first signal, and generates the second signal from the output signal of the filter means and the output signal of the processing means.

In the signal processing apparatus according to claim 2, the generating means has synthesizing means that synthesizes the second signal from the output signal of the filter means and the output signal of the processing means according to an acoustic feature quantity of the first signal, and the acoustic feature quantity is one that becomes large when the first signal varies sharply.

In the signal processing apparatus according to claim 3, the generating means has synthesizing means that synthesizes the second signal from the output signal of the filter means and the output signal of the processing means according to an acoustic feature quantity of the first signal, and the acoustic feature quantity is one that becomes large when strong energy in the first signal is concentrated in a specific frequency band.

In the signal processing apparatus according to claim 4, the synthesizing means outputs the output signal of the filter means when the feature quantity is small, and outputs the output signal of the processing means when the feature quantity is large.

In the signal processing apparatus according to claim 5, the synthesizing means has second mixing means that mixes the output signal of the filter means and the output signal of the processing means, mixing in more of the output signal of the filter means when the feature quantity is small and more of the output signal of the processing means when the feature quantity is large.

In the signal processing apparatus according to claim 6, the processing means has second filter means, and the second filter means has a lower filter order than the first filter means.

In the signal processing apparatus according to claim 7, the processing means has second delay means, and the second delay means has a smaller delay amount than the first delay means.

The signal processing apparatus according to claim 8 comprises third filter means, and the third filter means rotates the phase of its input signal by 90 degrees or -90 degrees.

In the signal processing apparatus according to claim 9, the generating means is configured so that it can process the signal independently for each of a plurality of frequency components, outputting the signal of the filter means for signals in low frequency bands and the signal of the processing means for signals in high frequency bands.

In the signal processing apparatus according to claim 10, the first signal is a signal obtained by downmixing two signals, and the mixing coefficient determining means determines the degree of mixing from values determined according to the level ratio L and the phase difference θ between the two original signals.

In the signal processing apparatus according to claim 11, consider a parallelogram whose two adjacent sides form the angle θ and have the length ratio L, let A and B be the angles into which θ is divided by the diagonal of the parallelogram, and let d1 and d2 be values determined according to the level ratio L. The mixing coefficient determining means obtains d1*cos(A), d1*sin(A), d2*cos(-B), and d2*sin(-B). With r1 and i1 denoting the real and imaginary parts of the first signal expressed as a complex number, and r2 and i2 the real and imaginary parts of the second signal expressed as a complex number, the mixing means takes d1*cos(A)*r1 + d1*sin(A)*r2 as the real part of the first output signal, d1*cos(A)*i1 + d1*sin(A)*i2 as the imaginary part of the first output signal, d2*cos(-B)*r1 + d2*sin(-B)*r2 as the real part of the second output signal, and d2*cos(-B)*i1 + d2*sin(-B)*i2 as the imaginary part of the second output signal.

In the signal processing apparatus according to claim 12, the mixing coefficient determining means obtains the values of d1 and d2 as d1 = L/((1+2*L*cos(θ)+L*L)^0.5) and d2 = 1/((1+2*L*cos(θ)+L*L)^0.5).

In the signal processing apparatus according to claim 13, the mixing coefficient determining means obtains the values of d1 and d2 as d1 = L/((1+L*L)^0.5) and d2 = 1/((1+L*L)^0.5).
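The coefficient computation and mixing recited in claims 11 to 13 can be sketched as follows. This is only an illustrative reading, not the claimed implementation: the function names are hypothetical, and the assignment of angle A to the side of length L (and B to the side of length 1) is an assumption, since the text does not state which part of θ is A and which is B. The signals are taken as single complex subband samples.

```python
import math

def split_angle(theta, level_ratio):
    # Assumption: sides of length L and 1 enclose the angle theta; A is the
    # angle between the side of length L and the diagonal, and B = theta - A.
    a = math.atan2(math.sin(theta), level_ratio + math.cos(theta))
    return a, theta - a

def gains_exact(theta, level_ratio):
    # d1, d2 as in claim 12. Undefined when the diagonal has zero length
    # (for example theta = pi with L = 1); that case is not handled here.
    norm = math.sqrt(1.0 + 2.0 * level_ratio * math.cos(theta) + level_ratio ** 2)
    return level_ratio / norm, 1.0 / norm

def gains_simple(level_ratio):
    # d1, d2 as in claim 13 (theta is ignored for the gain distribution).
    norm = math.sqrt(1.0 + level_ratio ** 2)
    return level_ratio / norm, 1.0 / norm

def mix(first, second, theta, level_ratio, exact=True):
    # Claim 11: real coefficients applied to complex samples, which scales
    # the real and imaginary parts identically.
    a, b = split_angle(theta, level_ratio)
    d1, d2 = gains_exact(theta, level_ratio) if exact else gains_simple(level_ratio)
    out1 = d1 * math.cos(a) * first + d1 * math.sin(a) * second
    out2 = d2 * math.cos(-b) * first + d2 * math.sin(-b) * second
    return out1, out2
```

Under the angle assignment assumed here, the two outputs computed with the claim 12 gains add up to the first signal again, which is consistent with the first signal being a downmix of the two original signals.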

The signal processing apparatus according to claim 14 further has feature quantity receiving means that receives data in which the acoustic feature quantity is encoded, and the generating means generates the signal according to the data in which the acoustic feature quantity is encoded.

In the signal processing apparatus according to claim 15, the data in which the acoustic feature quantity is encoded is 1-bit data, and the generating means outputs the output signal of the processing means when the data is true and the output signal of the filter means when it is false.

The signal processing apparatus according to claim 16 generates two signals by mixing a first signal and a second signal generated from the first signal at two different degrees of mixing, and has: generating means for generating the second signal from the first signal; mixing coefficient determining means for determining the degrees of mixing; mixing means for mixing the first signal and the second signal based on the degrees of mixing determined by the mixing coefficient determining means; and sound image moving means for performing signal processing that moves the reproduced sound images of the two signals generated by the mixing means to positions farther apart.

In the signal processing apparatus according to claim 17, the sound image moving means receives the two signals generated by the mixing means and processes them so as to move the reproduced sound images of the two signals to positions farther apart.

In the signal processing apparatus according to claim 18, the sound image moving means modifies the degrees of mixing determined by the mixing coefficient determining means so that the reproduced sound images of the two signals generated by the mixing means move to positions farther apart.

In the signal processing apparatus according to claim 19, the sound image moving means comprises first processing means that applies, to one of the two signals generated by the mixing means, signal processing for moving the reproduced sound image of that signal in a first direction, and second processing means that applies, to the other signal, signal processing for moving the reproduced sound image of that signal in a second direction opposite to the first direction.

In the signal processing apparatus according to claim 20, the first processing means changes the amplitude of the signal in a predetermined frequency band, and the second processing means changes the amplitude of the signal in a frequency band different from that frequency band.

The signal processing apparatus according to claim 21 outputs the signals generated by the sound image moving means when the two signals generated by the mixing means are similar to each other, and otherwise outputs the signals generated by the mixing means.

According to the invention of claim 1, when a stereo signal is generated from a monaural signal, an appropriate stereo signal can be generated both when the reverberation component is useful and when it is unnecessary.

According to the invention of claim 2, an appropriate stereo signal can be generated for an input signal with sharp temporal variation.

According to the invention of claim 3, an appropriate stereo signal can be generated for an input signal with sharp frequency characteristics.

According to the invention of claim 4, the output can be switched appropriately according to the state of the input signal.

According to the invention of claim 5, an appropriate signal can be generated even when the input signal is in an intermediate state.

According to the invention of claim 6, an appropriate reverberation signal can be generated by a filter of low order.

According to the invention of claim 7, an appropriate reverberation signal can be generated by delay means with a small delay amount.

According to the invention of claim 8, an appropriate reverberation signal can be generated by a filter that involves no filter delay.

According to the invention of claim 9, the reverberation component can be controlled independently for each frequency component.

According to the invention of claim 10, the degree of mixing can be determined from the level ratio and the phase difference information.

According to the invention of claim 11, when the degree of mixing is obtained from the level ratio and the phase difference information, the distribution of phase can be determined in a mathematically correct way.

According to the invention of claim 12, when the degree of mixing is obtained from the level ratio and the phase difference information, the distribution of gain can be determined in a mathematically correct way.

According to the invention of claim 13, when the degree of mixing is obtained from the level ratio and the phase difference information, the distribution of gain can be determined in a simplified way.

According to the invention of claim 14, a signal indicating whether the reverberation component is useful or unnecessary is encoded and supplied in advance, so an appropriate stereo signal can be generated easily.

According to the invention of claim 15, the signal indicating whether the reverberation component is useful or unnecessary is 1 bit, so an appropriate stereo signal can be generated by simple switching.

According to the inventions of claims 16 and 17, it is possible to compensate for the insufficient separation performance that results from the generated 2ch signal being separated from the monaural signal using only the level ratio and phase difference information as clues.

According to the invention of claim 18, only the coefficients of the mixing means are changed, so the separation performance can be improved with almost no increase in the amount of computation.

According to the invention of claim 19, insufficient separation performance can be compensated for by moving the sound images of the two signals in opposite directions.

According to the invention of claim 20, insufficient separation performance can be compensated for with a very small amount of computation.

According to the invention of claim 21, the compensation is applied only in cases where the separation performance would otherwise be insufficient.

(Embodiment 1)
A signal processing apparatus according to Embodiment 1 of the present invention is described below with reference to the drawings.
FIG. 1 shows the configuration of the signal processing apparatus according to Embodiment 1. This signal processing apparatus decodes a bitstream consisting of a first encoded signal in which a downmix of two audio signals is encoded, a second encoded signal in which a value determined according to the level ratio L between the two audio signals is encoded, and a third encoded signal in which a value determined according to the phase difference θ between the two audio signals is encoded.

In FIG. 1, 100 is a decoding means that decodes the first encoded signal to generate a first signal; 101 is a generating means that generates a second signal from the first signal; 102 is a mixing coefficient determining means that determines mixing coefficients from the second encoded signal and the third encoded signal; 103 is a mixing means that mixes the first signal and the second signal based on the degree of mixing determined by the mixing coefficient determining means 102; 104 is a delay means that delays the first signal by N (N>0) unit times; 105 is a first filter means that processes the output signal of the delay means 104; 106 is a second filter means that processes the output signal of the delay means 104; 107 is a feature quantity detecting means that detects an acoustic feature quantity of the first signal; and 108 is a synthesizing means that synthesizes the second signal from the output signal of the first filter means 105 and the output signal of the second filter means 106 according to the acoustic feature quantity.

The operation of the signal processing apparatus configured as above is described below.
First, the decoding means 100 decodes the first encoded signal to generate the first signal. The first encoded signal is an encoded monaural signal obtained by downmixing two audio signals, encoded for example by an encoder conforming to the MPEG AAC standard. Here, the decoding means 100 is assumed to handle everything up to converting the PCM signal, obtained by decoding such an AAC encoded signal, into a frequency signal consisting of a plurality of frequency bands. The following description covers the processing applied to the signal of one particular band among those frequency bands.

The generating means 101 generates the second signal from the first signal as follows. First, the delay means 104 delays the first signal by N (N>0) unit times. Next, the first filter means 105 filters the output signal of the delay means 104, for example with an All Pass Filter of order P. Any conventionally known All Pass Filter may be used, for example the one described in Section 8.6.4.5.2 of Non-Patent Document 1 cited above. The second filter means 106, on the other hand, applies an All Pass Filter of order lower than P to the output signal of the delay means 104.
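The delay followed by two All Pass Filters of different order can be sketched as below. This is a minimal illustration under stated assumptions rather than the filter of Non-Patent Document 1: it swaps in a cascade of Schroeder all-pass sections, treats the number of cascaded sections as the "order", and uses hypothetical function names and coefficient values.

```python
def delay(x, n):
    # Delay by n samples with zero padding, keeping the original length.
    if n <= 0:
        return list(x)
    return ([0.0] * n + list(x))[:len(x)]

def allpass_section(x, g, m):
    # One Schroeder all-pass section (m >= 1):
    # y[i] = -g*x[i] + x[i-m] + g*y[i-m]
    y = []
    for i, xi in enumerate(x):
        x_d = x[i - m] if i >= m else 0.0
        y_d = y[i - m] if i >= m else 0.0
        y.append(-g * xi + x_d + g * y_d)
    return y

def allpass(x, sections):
    # Cascade of sections; more sections give a longer, denser tail.
    for g, m in sections:
        x = allpass_section(x, g, m)
    return x

def generate_candidates(first_signal, n_delay, long_sections, short_sections):
    # Delay means 104 feeds both the first filter means 105 (higher order)
    # and the second filter means 106 (lower order).
    delayed = delay(first_signal, n_delay)
    return allpass(delayed, long_sections), allpass(delayed, short_sections)
```

For example, `generate_candidates(x, 2, [(0.5, 1), (0.4, 2), (0.3, 3)], [(0.5, 1)])` returns a more reverberant candidate and a sharper candidate derived from the same delayed signal; the synthesizing means 108 then chooses or blends between them.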

The output signal of the first filter means 105 and the output signal of the second filter means 106 generated in this way are processed by the synthesizing means 108 to generate the second signal. The process is as follows: the feature quantity detecting means 107 detects the acoustic feature quantity of the first signal, and the ratio at which the output signal of the first filter means 105 and the output signal of the second filter means 106 are mixed is determined according to that feature quantity.

For example, the acoustic feature quantity is one that becomes large when the first signal varies sharply. When the acoustic feature quantity is small, the synthesizing means outputs the output signal of the first filter means 105, or outputs a mixture containing more of the output signal of the first filter means 105 and less of the output signal of the second filter means 106. Conversely, when the acoustic feature quantity is large, it outputs the output signal of the second filter means 106, or outputs a mixture containing less of the output signal of the first filter means 105 and more of the output signal of the second filter means 106.

The acoustic feature quantity may instead be one that becomes large when strong energy in the first signal is concentrated in a specific frequency band, or it may be a combination of such feature quantities.

What matters here is that the acoustic feature quantity represents the sharpness of temporal variation in the sound or the firmness of the sense of localization of the sound image. The filter means 105 is an All Pass Filter of order P and gives reverberation to the sound, so when such reverberation is not wanted, that is, when sharp temporal variation or a firm sense of localization is needed, the reverberation must be reduced by lowering the order of the All Pass Filter.
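The selection or blending performed by the synthesizing means 108 can be sketched as follows. The text does not specify how the feature quantity is computed, so the energy-rise measure below is purely an illustrative assumption, as are the function names, the mapping of the feature into the range 0 to 1, and the optional threshold.

```python
def transient_feature(frame, prev_energy, eps=1e-12):
    # Hypothetical feature: grows toward 1 when the frame energy rises
    # sharply relative to the previous frame, and is 0 when it does not rise.
    energy = sum(abs(v) ** 2 for v in frame)
    ratio = energy / (prev_energy + eps)
    if ratio <= 1.0:
        return 0.0, energy
    return 1.0 - 1.0 / ratio, energy

def synthesize(reverberant, sharp, feature, threshold=None):
    # reverberant: output of the first filter means 105
    # sharp:       output of the second filter means 106
    if threshold is not None:
        # Hard switch between the two outputs.
        return list(sharp) if feature >= threshold else list(reverberant)
    # Otherwise blend: a larger feature mixes in more of the sharp output.
    w = min(max(feature, 0.0), 1.0)
    return [(1.0 - w) * r + w * s for r, s in zip(reverberant, sharp)]
```

Passing a threshold corresponds to the switching behaviour (small feature: first filter output, large feature: second filter output), while omitting it corresponds to the "more of one, less of the other" mixing described above.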

From this point of view, the generation means 101 may be configured as shown in FIG. 2. In FIG. 2, the delay means 104, the first filter means 105, and the synthesis means 108 are the same as those shown in FIG. 1. Reference numeral 200 denotes second delay means for delaying the first signal by n unit times (N > n ≥ 0), and 201 denotes third filter means for rotating the phase of its input signal by 90 degrees or -90 degrees.

The delay means 104 and the filter means 105 give the sound a sense of spatial spread and reverberation. When these are unwanted, that is, when sharp temporal variation or firm localization of the sound image is required, the amount of delay and the amount of reverberation must be reduced. In that case, the second delay means 200, whose delay is smaller than that of the delay means 104, is used together with the third filter means 201, which adds little reverberation. The delay of the second delay means 200 may be zero; in other words, the second delay means 200 may be omitted. The third filter means 201 rotates the phase of the input signal by 90 degrees or -90 degrees. With a very small amount of computation, this produces a signal that is uncorrelated with the input signal and involves no delay, which makes it a convenient way to generate a signal that is both uncorrelated with the input and sharp. It is very important that the generated signal be uncorrelated with the input signal (the first signal). If the generated signal were highly correlated with the first signal, mixing the two in the subsequent mixing means would simply yield a monaural sound (a sound with no stereo impression).
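The 90-degree phase rotation can be sketched as follows, assuming the first signal is available as complex subband samples (the complex representation is used later in the description for the mixing step). Under that assumption the rotation is a multiplication by j or -j, and "uncorrelated" is checked here in the sense that the real part of the inner product with the input is zero. The function names are hypothetical.

```python
def rotate_90(band_signal, sign=+1):
    """Rotate the phase of each complex sample by +90 (sign >= 0) or -90 degrees."""
    factor = 1j if sign >= 0 else -1j
    return [factor * s for s in band_signal]


def real_correlation(a, b):
    """Real part of the inner product of two complex sequences."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b))


x = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
y = rotate_90(x)
# The rotation costs one multiplication per sample, adds no delay,
# and leaves the magnitude of every sample unchanged.
```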

The output signal of the filter means 105 and the output signal of the third filter means 201 obtained in this way are combined in the synthesis means 108 according to the acoustic feature quantity, in the same manner as described above. As a result, when reverberation and a sense of spread are unwanted, a sharp and firmly localized sound can be generated.
The second signal generated in this way by the generation means 101 and the first signal are then mixed by the mixing means 103. This operation is described below.

First, the mixing coefficient determination means 102 determines the mixing coefficients from the second encoded signal and the third encoded signal. The second encoded signal encodes a value determined according to the level ratio L between the two original audio signals, and the third encoded signal encodes a value determined according to the phase difference θ between the two original audio signals. The mixing coefficients h11, h12, h21, h22 may be obtained from this level ratio information and phase difference information by, for example, the method detailed in Section 8.6.4.6.2 "Mixing" of Non-Patent Document 1 cited above, or by the following method.

Specifically, consider a parallelogram in which two adjacent sides form the angle θ and have the length ratio L. Let A and B be the two angles into which θ is divided by the diagonal of the parallelogram, and let d1 and d2 be values determined according to the level ratio L. Then h11 = d1*cos(A), h21 = d1*sin(A), h12 = d2*cos(-B), h22 = d2*sin(-B), where d1 = L/((1+2*L*cos(θ)+L*L)^0.5) and d2 = 1/((1+2*L*cos(θ)+L*L)^0.5). In this way, the downmixed monaural signal can be separated, mathematically exactly, into the original two signals according to their phase difference and level ratio. The reason is shown in FIG. 3. In the parallelogram XYZW, whose adjacent sides form the angle θ and have the length ratio L, the angle YXZ obtained by dividing θ with the diagonal is A and the angle WXZ is B. The length of the diagonal XZ is (1+2*L*cos(θ)+L*L)^0.5. Therefore d1 and d2 are obtained as d1 = L/((1+2*L*cos(θ)+L*L)^0.5) and d2 = 1/((1+2*L*cos(θ)+L*L)^0.5).

Alternatively, d1 and d2 may be obtained in simplified form as d1 = L/((1+L*L)^0.5) and d2 = 1/((1+L*L)^0.5).
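The parallelogram construction can be worked through numerically. The sketch below is one possible reading, not a definitive implementation: it assumes that side XY has length L and side XW has length 1, so that with X at the origin and XY along the horizontal axis the diagonal XZ ends at (L + cos θ, sin θ). The function name and the `simplified` flag are hypothetical.

```python
import math


def mixing_coefficients(L, theta, simplified=False):
    """Return (h11, h21, h12, h22) from level ratio L and phase difference theta.

    Assumption: side XY has length L, side XW has length 1, angle YXW = theta.
    The full formula is undefined when the diagonal has zero length
    (L = 1 and theta = pi).
    """
    A = math.atan2(math.sin(theta), L + math.cos(theta))  # angle YXZ
    B = theta - A                                         # angle WXZ
    if simplified:
        diag = math.sqrt(1.0 + L * L)
    else:
        diag = math.sqrt(1.0 + 2.0 * L * math.cos(theta) + L * L)
    d1 = L / diag
    d2 = 1.0 / diag
    h11 = d1 * math.cos(A)
    h21 = d1 * math.sin(A)
    h12 = d2 * math.cos(-B)
    h22 = d2 * math.sin(-B)
    return h11, h21, h12, h22
```

For equal levels (L = 1) and θ = 90 degrees, the diagonal bisects θ, so A = B = 45 degrees and the four coefficients have magnitude 0.5, with h22 negative.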

The first signal and the second signal are then mixed by the mixing means 103 using the mixing coefficients h11, h21, h12, h22 generated in this way. The method is as follows. Let r1 and i1 be the real part and the imaginary part of the first signal expressed as a complex number, and let r2 and i2 be the real part and the imaginary part of the second signal expressed as a complex number. Then h11*r1 + h21*r2 is the real part of the first output signal, h11*i1 + h21*i2 is the imaginary part of the first output signal, h12*r1 + h22*r2 is the real part of the second output signal, and h12*i1 + h22*i2 is the imaginary part of the second output signal.
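The mixing step can be written compactly with complex arithmetic. Because the mixing coefficients are real, multiplying a complex sample by a coefficient scales its real and imaginary parts separately, which reproduces the four expressions given for the real and imaginary parts of the two outputs. The function name is hypothetical.

```python
def mix(s1, s2, h11, h21, h12, h22):
    """Mix the first signal s1 and the second signal s2 into two outputs.

    s1, s2: equal-length lists of complex samples
    h11, h21, h12, h22: real mixing coefficients
    """
    out1 = [h11 * a + h21 * b for a, b in zip(s1, s2)]  # first output signal
    out2 = [h12 * a + h22 * b for a, b in zip(s1, s2)]  # second output signal
    return out1, out2


out1, out2 = mix([1 + 2j], [3 - 1j], 0.5, 0.5, 0.5, -0.5)
```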
As described above, the present embodiment provides a signal processing apparatus that generates two signals by mixing a first signal and a second signal generated from the first signal with two degrees of mixing (one using the combination of h11 and h21, the other using the combination of h12 and h22). The apparatus has generation means for generating the second signal from the first signal, mixing coefficient determination means for determining the degree of mixing, and mixing means for mixing the first signal and the second signal on the basis of the degree of mixing determined by the mixing coefficient determination means. The generation means includes delay means for delaying the first signal by N unit times (N > 0), an all-pass filter for processing the output signal of the delay means, and processing means for processing the first signal. The processing means generates a signal with less sense of spread and less reverberation than the signal generated by the delay means and the all-pass filter. When the first signal fluctuates sharply, or when strong energy is concentrated in a specific frequency band, more of the output signal of the processing means is mixed into the second signal. Consequently, when a two-channel signal is generated from the monaural signal, a sense of spatial spread is provided and a good stereo signal is obtained, while sharp temporal variation of the sound and firm localization of the sound image are also achieved.

In the present embodiment the acoustic feature quantity is detected by the feature quantity detection means 107, but this is not essential; data in which the acoustic feature quantity has been encoded in advance may be received instead. The configuration in that case is shown in FIG. 6. The only difference between FIG. 1 and FIG. 6 is that feature quantity reception means 109 is provided in place of the feature quantity detection means 107. The feature quantity reception means 109 receives, as a fourth encoded signal, data encoding the acoustic feature quantity of the input signal. For example, the fourth encoded signal is true when strong energy is concentrated in a specific frequency band and false otherwise. When the fourth encoded signal is true, the generation means 101 generates a signal with little reverberation (that is, either a signal with little or no delay processed by a filter with a short tap length, or a signal whose phase has been rotated by 90 degrees). Otherwise it generates a signal with much reverberation (that is, a signal with a large delay processed by a filter with a long tap length). The processing intended on the encoder side can thus be carried out, so a high-quality signal can be generated. Needless to say, in this case the synthesis means 108 only needs to function as a selector.

(Embodiment 2)
A signal processing apparatus according to Embodiment 2 of the present invention is described below with reference to the drawings. Embodiment 2 differs substantially from Embodiment 1 in the following respect. Embodiment 1 adapts the method of generating the second signal successively, according to the successively input signal. Embodiment 2 instead changes the generation means between the low band and the high band, on the grounds that low-frequency signals contribute strongly to the sense of reverberation and spread, whereas high-frequency signals contribute strongly to the sharpness of the sound.

FIG. 4 shows the configuration of the signal processing apparatus according to Embodiment 2. This signal processing apparatus decodes a bitstream consisting of a first encoded signal obtained by encoding a downmix of two audio signals, a second encoded signal obtained by encoding a value determined according to the level ratio L between the two audio signals, and a third encoded signal obtained by encoding a value determined according to the phase difference θ between the two audio signals.

In FIG. 4, reference numeral 400 denotes decoding means for decoding the first encoded signal to generate a first signal; 401 denotes generation means for generating a second signal from the first signal; 402 denotes mixing coefficient determination means for determining mixing coefficients from the second encoded signal and the third encoded signal; and 403 denotes mixing means for mixing the first signal and the second signal on the basis of the degree of mixing determined by the mixing coefficient determination means 402.

Here, the first signal is a frequency signal composed of a plurality of frequency bands, and the generation means 401 generates the second signal by processing the signal of each frequency band independently, as shown in FIG. 4. For example, the signal of a low frequency band may be processed by delay means and filter means, while the signal of a high frequency band is processed by filter means only. The delay applied to the signal of a lower frequency band may be equal to or larger than that applied to a higher band. Likewise, the filter order of the filter means for a lower frequency band may be equal to or larger than that for a higher band. The filter means for bands above a predetermined band may be a process that rotates the phase of the input signal by 90 degrees or -90 degrees.

The operation of the signal processing apparatus configured as described above is explained below.
First, the decoding means 400 decodes the first encoded signal to generate the first signal. The first encoded signal encodes a monaural signal obtained by downmixing two audio signals, and is encoded by, for example, an encoder conforming to the MPEG AAC standard. Here, the decoding means 400 is assumed to carry out the processing up to and including the conversion of the PCM signal, obtained by decoding such an AAC encoded signal, into a frequency signal composed of a plurality of frequency bands. The generation means 401 generates the second signal from the first signal as follows. For the low frequency bands among the plurality of frequency bands constituting the first signal, the signal is delayed by a preset N unit times, and the delayed signal is processed by an all-pass filter of order P. Any conventionally known all-pass filter may be used, for example the all-pass filter described in Section 8.6.4.5.2 of Non-Patent Document 1 cited above.

For signals in frequency bands higher than those mentioned above, the signal is delayed by n unit times, where n is equal to or smaller than N (N ≥ n ≥ 0), and the delayed signal is processed by an all-pass filter of order p, where p is equal to or smaller than P (P ≥ p ≥ 0). Alternatively, instead of all-pass filtering, the phase of the input signal may be rotated by 90 degrees or -90 degrees.
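The band-dependent generation of the second signal can be sketched as below. This is only an illustration under stated assumptions: the all-pass filter is a cascade of simple first-order sections with a hypothetical coefficient g, not the filter of Non-Patent Document 1, and the parameters `threshold_band`, `N`, `P`, and `g` are placeholder values. Bands above the threshold use the 90-degree rotation with no delay.

```python
def delay(x, n):
    """Delay a sequence by n samples, zero-filling the start and keeping the length."""
    return ([0j] * n + list(x))[:len(x)]


def allpass(x, g, order):
    """Cascade `order` first-order all-pass sections H(z) = (-g + z^-1) / (1 - g*z^-1)."""
    y = list(x)
    for _ in range(order):
        out, x_prev, y_prev = [], 0j, 0j
        for s in y:
            cur = -g * s + x_prev + g * y_prev
            out.append(cur)
            x_prev, y_prev = s, cur
        y = out
    return y


def generate_second_signal(band_signal, band, threshold_band=10, N=2, P=3, g=0.5):
    """Low bands: delay by N, then order-P all-pass. High bands: 90-degree rotation."""
    if band <= threshold_band:
        return allpass(delay(band_signal, N), g, P)
    return [1j * s for s in band_signal]
```

Choosing a smaller N and P for higher bands, rather than a single threshold, would follow the same pattern with a per-band parameter table.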

In short, signals in lower frequency bands are given more sense of spread and reverberation by using a larger delay and a filter with more taps, and signals in higher frequency bands are given less by using a smaller delay and a filter with fewer taps. The reason is that, in general, low-frequency signals contribute strongly to the sense of reverberation and spread, whereas high-frequency signals contribute strongly to the sharpness of the sound. Of course, if the perceptual characteristics of hearing are analyzed precisely for each fine frequency band and the parameters are based on that analysis, the method need not be limited to one in which the values decrease monotonically from low to high frequencies. What matters here is that the values are controlled independently for each frequency band.

The second signal generated in this way and the first signal are mixed by the mixing means 403 using the mixing coefficients determined by the mixing coefficient determination means 402. This operation may be the same as that described in Embodiment 1.

As described above, the present embodiment provides a signal processing apparatus that generates two signals by mixing a first signal and a second signal generated from the first signal with two degrees of mixing (one using the combination of h11 and h21, the other using the combination of h12 and h22). The apparatus has generation means for generating the second signal from the first signal, mixing coefficient determination means for determining the degree of mixing, and mixing means for mixing the first signal and the second signal on the basis of the degree of mixing determined by the mixing coefficient determination means. For the low-frequency-band part of the first signal, the generation means generates a signal using delay means with a relatively large delay of N unit times (N > 0) and an all-pass filter of relatively large order P. For the high-frequency-band part, it generates a signal using delay means with a relatively small delay of n unit times (or no delay at all) and an all-pass filter of relatively small order p (or merely a rotation of the input signal by 90 degrees or -90 degrees). Consequently, when a two-channel signal is generated from the monaural signal, a sense of spatial spread is provided and a good stereo signal is obtained, while sharp temporal variation of the sound and firm localization of the sound image are also achieved.

In Embodiment 2 the processing method for each frequency band signal (delay amount and filter order) is fixed regardless of the nature of the input signal, but there is of course no need for this limitation; the method may be switched as appropriate according to the input signal. For example, bands at or below a frequency band T may be processed with delay and all-pass filtering, while bands above T are processed with zero delay and a filter that merely rotates the input signal by 90 degrees or -90 degrees, and the value of T may be switched as appropriate according to the input signal.

(Embodiment 3)
A signal processing apparatus according to Embodiment 3 of the present invention is described below with reference to the drawings.

FIG. 7 shows the configuration of the signal processing apparatus according to Embodiment 3. In FIG. 7, reference numeral 700 denotes decoding means, 701 denotes generation means, 702 denotes mixing coefficient determination means, and 703 denotes mixing means. These are the same as the decoding means 400, the generation means 401, the mixing coefficient determination means 402, and the mixing means 403 of Embodiment 2. The difference from Embodiment 2 is that sound image moving means 704 is placed after the mixing means 703.

The operation of the signal processing apparatus configured as described above is explained below.
In FIG. 7, the operations of the decoding means 700, the generation means 701, the mixing coefficient determination means 702, and the mixing means 703 are the same as those described in Embodiment 2, and their description is omitted.

The two signals generated by the mixing means 703 are processed by the sound image moving means 704. This processing applies a so-called head-related transfer function, and makes the listener perceive the sound as coming from a space wider than the space enclosed by the speakers that are actually installed.

First, the purpose of this processing is explained with reference to FIGS. 8 and 9.
The signal processing addressed by the present application takes a signal in which originally multi-channel signals have been downmixed to fewer channels, and separates it back into the original multi-channel signals using only the phase difference information and level ratio information between the original channels. With only this phase difference and level ratio information, however, the signal is not fully restored: what was originally a stereo signal does not return completely to the original stereo signal and becomes somewhat monaural. That is, the sound image produced by the two speakers resembles a sound image produced by speakers placed close together. FIG. 8 illustrates this: although the speakers drawn with solid lines are the ones actually installed, the sound is heard as if it came from speakers placed in the narrower space drawn with dotted lines.

The purpose of the present application is therefore to bring the result closer to the image of the original stereo signal by introducing a technique that widens the perceived sound image.

FIG. 9 shows the four-channel case. When the front left channel signal and the rear left channel signal are downmixed together, and the front right channel signal and the rear right channel signal are downmixed together, the left-right separation is not impaired but the front-rear separation is. The aim of the present application is to recover the lost sense of separation by generating a sound image as if the front channel speakers were placed further forward and the rear channel speakers further rearward.

We now return to the description of the operation of Embodiment 3.
FIG. 10 illustrates the concept of the head-related transfer function. When a sound image is to be localized as if a speaker existed at a position different from that of the speakers actually installed (at S and S' in FIG. 10), it is known that the sound image can be localized at the desired position by faithfully reproducing the acoustic transfer function from that position to the listener's ears, convolving it with the source signal, and presenting the result to the listener. The transfer functions along the paths shown by the curved arrows in FIG. 10 (such as Hl(f, φ)) are the head-related transfer functions.

FIG. 11 shows the structural features of the amplitude-frequency characteristic of such a head-related transfer function, and FIG. 12 shows the interaural time difference and the interaural level difference. It is already known that the cues for front-rear and up-down localization of a sound image lie in the peaks and dips of the amplitude-frequency characteristic of the head-related transfer function shown in FIG. 11. It is also already known that the cues for left-right localization lie in the interaural time difference (ITD) and interaural level difference (ILD) of the head-related transfer function shown in FIG. 12. (See Japanese Patent Application No. 2005-161602.)

The operation of the sound image moving means 704 is to apply head-related transfer function processing having these features. As one example, the processing related to front-rear localization of the sound image is described below.
As stated above, the cues for front-rear localization of a sound image are known to lie in the peaks and dips of the amplitude-frequency characteristic of the head-related transfer function shown in FIG. 11. Meanwhile, the mixing means 703 shown in FIG. 7 sends a plurality of frequency band signals to the sound image moving means 704. The sound image moving means 704 therefore adjusts the amplitude level of each of the frequency band signals input from the mixing means 703 so that they match the amplitude-frequency characteristic of the head-related transfer function, as shown in FIG. 13.

The curve in FIG. 13 is the same as the amplitude-frequency characteristic of the head-related transfer function shown in FIG. 11. Each hatched rectangle indicates that the gain of the corresponding frequency band signal is increased by the height of the rectangle, and each cross-hatched rectangle indicates that the gain of the corresponding band signal is decreased by the height of the rectangle.
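The per-band level adjustment can be sketched as a table of gains applied to the frequency band signals. The interpretation of the rectangle heights as gains in decibels, and the specific values used in the usage line, are assumptions for illustration only; the description does not state the unit or any numerical values.

```python
def apply_band_gains(bands, gains_db):
    """Scale each frequency band signal by its own gain.

    bands:    list of per-band sample lists (real or complex samples)
    gains_db: hypothetical per-band gains in dB; positive raises the band
              toward a peak, negative lowers it toward a dip
    """
    adjusted = []
    for samples, g_db in zip(bands, gains_db):
        g = 10.0 ** (g_db / 20.0)  # convert dB to a linear amplitude factor
        adjusted.append([g * s for s in samples])
    return adjusted


# Three bands: unchanged, boosted, attenuated (illustrative values only).
result = apply_band_gains([[1.0, 2.0], [1.0], [1 + 1j]], [0.0, 20.0, -20.0])
```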

Here, the amplitude levels need not be adjusted so as to reproduce the amplitude-frequency characteristic of the head-related transfer function completely. It suffices to reproduce only the few peaks and dips that are perceptually important, or only the few dips that are perceptually most important. The sound image can thereby be moved efficiently with a small amount of computation.
FIG. 14 shows an example in which the bandwidth of each frequency band signal is wide relative to the width of a dip. In this case, reducing the amplitude of the band signal as a whole cannot represent the shape of the dip properly, so a filter with a predetermined frequency characteristic is applied to the signal of that frequency band to form a dip such as the one shown in FIG. 14. For example, by setting the value of A in the filter F(z) = 1 + A*z^-1 + z^-2 appropriately between -2 and 2 and applying this filter to the frequency band signal, a dip can be formed at a predetermined position within that frequency band. The curve drawn with a red dotted line in FIG. 14 corresponds to this.
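The choice of A can be made explicit. On the unit circle, F(e^{jω}) = e^{-jω}(2 cos ω + A), so the magnitude is zero at the normalized frequency ω0 for which A = -2 cos ω0, and the range -2 ≤ A ≤ 2 covers every ω0 between 0 and π. The sketch below assumes ω is expressed in radians per sample of the band signal; the function names are hypothetical.

```python
import cmath
import math


def notch_coefficient(omega0):
    """Return A so that F(z) = 1 + A*z^-1 + z^-2 has its dip at omega0 (rad/sample)."""
    return -2.0 * math.cos(omega0)


def magnitude(A, omega):
    """Magnitude response |F(e^{j*omega})| of the filter."""
    z_inv = cmath.exp(-1j * omega)
    return abs(1.0 + A * z_inv + z_inv * z_inv)


def apply_filter(x, A):
    """Time-domain form: y[n] = x[n] + A*x[n-1] + x[n-2], with zero initial state."""
    y = []
    for n, s in enumerate(x):
        x1 = x[n - 1] if n >= 1 else 0
        x2 = x[n - 2] if n >= 2 else 0
        y.append(s + A * x1 + x2)
    return y
```

Because the zero lies exactly on the unit circle, this filter forms a full null rather than a shallow dip; a shallower dip would need a different filter, which the description does not specify.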

The description of the sound image moving means using FIGS. 13 and 14 has covered a method of efficiently realizing the amplitude-frequency characteristic of the head-related transfer function for one particular channel. In practice, head-related transfer function processing is performed for each channel, and the sound image is actually moved by adding together the outputs of the relevant transfer functions. A widely known method may be used for this. (See Japanese Patent Application No. 2005-161602.)

Needless to say, if the amplitude-frequency characteristic of the head-related transfer function shown in FIGS. 13 and 14 moves the sound image forward, then choosing for the other channel an amplitude-frequency characteristic that moves the sound image rearward increases the sense of spread of the acoustic space.

As described above, this embodiment provides a signal processing device that generates two signals by mixing a first signal and a second signal generated from the first signal at two degrees of mixing (one using the combination of h11 and h21, the other using the combination of h12 and h22). The device has generating means for generating the second signal from the first signal; mixing coefficient determining means for determining the degrees of mixing; mixing means for mixing the first signal and the second signal based on the degrees of mixing determined by the mixing coefficient determining means; and sound image moving means for performing signal processing that moves the reproduced sound images of the two signals generated by the mixing means to positions separated from each other. As a result, when a two-channel signal is generated from a monaural downmixed signal, a sense of spatial spread is imparted and a good stereo signal is obtained, while the sharpness of temporal variations in the sound and firm localization of the sound image are also achieved, and the sense of channel separation is further improved.
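The two degrees of mixing can be sketched as follows, using the coefficient forms given in the claims: d1*cos(A), d1*sin(A), d2*cos(-B), d2*sin(-B), with d1 = L / sqrt(1 + 2*L*cos(θ) + L*L) and d2 = 1 / sqrt(1 + 2*L*cos(θ) + L*L). This is only an illustrative sketch. In particular, taking A as the angle between the parallelogram's diagonal and the side of length L (so that B = θ - A), and the function and parameter names, are assumptions not stated in the original text.

```python
import math


def mixing_coefficients(level_ratio, phase_diff_rad):
    """Return (h11, h21, h12, h22) for level ratio L and phase difference theta."""
    # Length of the parallelogram's diagonal for sides L and 1 at angle theta.
    diag_sq = (1.0 + 2.0 * level_ratio * math.cos(phase_diff_rad)
               + level_ratio * level_ratio)
    diag = math.sqrt(max(0.0, diag_sq))
    if diag == 0.0:
        raise ValueError("degenerate case: the two sides cancel (L = 1, theta = pi)")
    d1 = level_ratio / diag
    d2 = 1.0 / diag
    # Assumption: A is measured from the side of length L to the diagonal.
    angle_a = math.atan2(math.sin(phase_diff_rad),
                         level_ratio + math.cos(phase_diff_rad))
    angle_b = phase_diff_rad - angle_a
    return (d1 * math.cos(angle_a), d1 * math.sin(angle_a),
            d2 * math.cos(-angle_b), d2 * math.sin(-angle_b))


def mix(first, second, coeffs):
    """Mix complex band samples of the first and second signals into two outputs."""
    h11, h21, h12, h22 = coeffs
    out1 = [h11 * s1 + h21 * s2 for s1, s2 in zip(first, second)]
    out2 = [h12 * s1 + h22 * s2 for s1, s2 in zip(first, second)]
    return out1, out2
```

Because the coefficients are real, multiplying complex samples by them applies the same weights to the real and imaginary parts, which matches the per-part formulas in the claims.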

In this embodiment, the sound image moving means has been described as changing the amplitude of the frequency band signals output from the mixing means; in that case, the configuration shown in FIG. 15 may be used instead. The means shown in FIG. 15 are the same as those in FIG. 7, except that mixing coefficient changing means 1504 is provided to change the degrees of mixing determined by the mixing coefficient determining means 1502. The mixing coefficient changing means 1504 increases or decreases the mixing coefficients in advance for predetermined frequency bands, as shown in FIG. 13. Consequently, although the mixing means 1503 operates in exactly the same way as the mixing means 703 shown in FIG. 7, the two generated signals have improved separation.
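The reason the FIG. 15 arrangement gives the same result can be sketched in a few lines: since mixing is linear, scaling the pair of coefficients that produces one output by a per-band gain is equivalent to scaling that output after mixing. The gain values and function names below are hypothetical and serve only to illustrate this equivalence for a single frequency band.

```python
def mix_band(s1, s2, coeffs):
    """Mix one band sample of the first (s1) and second (s2) signals."""
    h11, h21, h12, h22 = coeffs
    return h11 * s1 + h21 * s2, h12 * s1 + h22 * s2


def change_coefficients(coeffs, gain_out1, gain_out2):
    """Pre-scale the mixing coefficients of one band.

    gain_out1 and gain_out2 are hypothetical per-band amplitude gains that
    the sound image moving processing would otherwise apply to the first
    and second output signals after mixing.
    """
    h11, h21, h12, h22 = coeffs
    return (gain_out1 * h11, gain_out1 * h21,
            gain_out2 * h12, gain_out2 * h22)
```

With this arrangement the mixing step itself is untouched; only the coefficients fed to it differ from band to band.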

In Embodiment 3, the generating means 701 has been described as the same as that of Embodiment 2. In view of the gist of the invention described in Embodiment 3, however, the generating means 701 may operate in any manner. For example, it may consist of a delay unit and an all-pass filter as in the conventional technique described with reference to FIG. 5, or it may be like the generating means 101 shown in Embodiment 1 (FIG. 1; internal details in FIG. 2). It may also consist only of the third filter means 201 in FIG. 2. This is because the gist of the invention in Embodiment 3 is that the sound image moving means 704 compensates for the insufficient separation that results from the multi-channel signals being separated from a monaural downmixed signal using only level ratio and phase difference information as clues.

The sound image moving means 704 may also be controlled so that it operates only when its function is needed, namely when the two signals to be separated are similar. In particular, separation performance deteriorates when the level ratio information indicates 0 dB or a value close to it; the sound image moving means 704 may therefore be used in such cases and bypassed otherwise.
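This control can be sketched as a simple switch on the level ratio. The text only says "0 dB or a value close to it", so the 1 dB threshold below is an arbitrary assumption, as are the function names and the idea of passing the sound image moving processing in as a callable.

```python
def sound_image_moving_needed(level_ratio_db, threshold_db=1.0):
    """Return True when the level ratio is at or near 0 dB.

    threshold_db is a hypothetical tuning value, not taken from the source.
    """
    return abs(level_ratio_db) <= threshold_db


def process_pair(mixed_pair, move_sound_image, level_ratio_db, threshold_db=1.0):
    """Apply the sound image moving processing only when it is needed.

    mixed_pair is the pair of signals from the mixing means;
    move_sound_image is any callable standing in for the sound image
    moving means.
    """
    if sound_image_moving_needed(level_ratio_db, threshold_db):
        return move_sound_image(mixed_pair)
    return mixed_pair
```

Passing the processing as a callable means it is not evaluated at all when the two signals are already well separated.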

The signal processing device according to the present invention can decode, while preserving acoustic characteristics, an encoded signal that represents the phase differences and level ratios between multiple channels with a very small number of bits. It is therefore applicable to low-bit-rate music broadcasting services, music distribution services, and receivers for them.

FIG. 1 is a diagram showing the configuration of the signal processing device in Embodiment 1.
FIG. 2 is a diagram showing an example of the configuration of the generating means.
FIG. 3 is a diagram explaining level ratio information and phase difference information using a parallelogram.
FIG. 4 is a diagram showing the configuration of the signal processing device in Embodiment 1.
FIG. 5 is a diagram showing the basic configuration of the conventional technique.
FIG. 6 is a diagram showing the configuration of the signal processing device in an embodiment configured to receive encoded data indicating an acoustic feature amount.
FIG. 7 is a diagram showing the configuration of the signal processing device in Embodiment 3.
FIG. 8 is a diagram showing an example of the problem solved by the present application in Embodiment 3.
FIG. 9 is a diagram showing another example of the problem solved by the present application in Embodiment 3.
FIG. 10 is a diagram showing the basic concept of signal processing for moving a sound image.
FIG. 11 is a diagram showing structural features of the amplitude-frequency characteristic of a head-related transfer function.
FIG. 12 is a diagram showing the interaural time difference and interaural level difference of a head-related transfer function.
FIG. 13 is a diagram showing an example of the operation of the sound image moving means.
FIG. 14 is a diagram showing another example of the operation of the sound image moving means.
FIG. 15 is a diagram showing another configuration of the signal processing device in Embodiment 3.

Explanation of symbols

100 Decoding means
101 Generating means
102 Mixing coefficient determining means
103 Mixing means
104 Delay means
105 First filter means
106 Second filter means
107 Feature amount detecting means
108 Synthesizing means
200 Second delay means
201 Third filter means
400 Decoding means
401 Generating means
402 Mixing coefficient determining means
403 Mixing means
109 Feature amount receiving means
700 Decoding means
701 Generating means
702 Mixing coefficient determining means
703 Mixing means
704 Sound image moving means
1500 Decoding means
1501 Generating means
1502 Mixing coefficient determining means
1503 Mixing means
1504 Mixing coefficient changing means (sound image moving means)

Claims

A signal processing device that generates two signals by mixing a first signal and a second signal generated from the first signal at two degrees of mixing,
Generating means for generating the second signal from the first signal;
Mixing coefficient determining means for determining the degree of mixing;
Mixing means for mixing the first signal and the second signal based on the degree of mixing determined by the mixing coefficient determining means;
wherein the generating means includes:
delay means for delaying the first signal by N unit times (N > 0);
filter means for processing the output signal of the delay means; and
processing means for processing the first signal,
and the generating means generates the second signal from an output signal of the filter means and an output signal of the processing means.

The signal processing device according to claim 1, wherein the generating means includes synthesizing means for synthesizing the second signal from the output signal of the filter means and the output signal of the processing means in accordance with an acoustic feature amount of the first signal, and the acoustic feature amount is a feature amount that becomes large when the first signal fluctuates abruptly.

The signal processing device according to claim 1, wherein the generating means includes synthesizing means for synthesizing the second signal from the output signal of the filter means and the output signal of the processing means in accordance with an acoustic feature amount of the first signal, and the acoustic feature amount is a feature amount that becomes large when strong energy is concentrated in a specific frequency band of the first signal.

The signal processing device according to any one of claims 1 to 3, wherein the synthesizing means outputs the output signal of the filter means when the feature amount is small, and outputs the output signal of the processing means when the feature amount is large.

The signal processing device according to claim 1, wherein the synthesizing means includes second mixing means for mixing the output signal of the filter means and the output signal of the processing means, the second mixing means mixing in a larger proportion of the output signal of the filter means when the feature amount is small and a larger proportion of the output signal of the processing means when the feature amount is large.

The signal processing device according to any one of claims 1 to 5, wherein the processing means has second filter means, and the second filter means has a lower filter order than the first filter means.

The signal processing device according to claim 1, wherein the processing means includes second delay means, and the second delay means has a smaller delay amount than the first delay means.

The signal processing device according to claim 1, wherein the processing means includes third filter means, and the third filter means performs processing that rotates the phase of its input signal by 90 degrees or -90 degrees.

The signal processing device according to claim 1, wherein the generating means is configured to process signals independently for each of a plurality of frequency components, outputs the signal of the filter means for signals in a low frequency band, and outputs the signal of the processing means for signals in a high frequency band.

The signal processing device according to claim 1, wherein the first signal is a signal obtained by downmixing two signals, and
the mixing coefficient determining means determines the degrees of mixing from values determined according to a level ratio L and a phase difference θ between the two original signals.

The signal processing device according to claim 1, wherein the mixing coefficient determining means determines the values d1 * cos(A), d1 * sin(A), d2 * cos(-B), and d2 * sin(-B), using angles A and B, which are the angles into which the diagonal of a parallelogram divides the angle θ, the parallelogram having two adjacent sides that form the angle θ and have the length ratio L, and using values d1 and d2 determined according to the level ratio L, and
the mixing means, where r1 and i1 are the real part and the imaginary part of the first signal expressed as a complex number and r2 and i2 are the real part and the imaginary part of the second signal expressed as a complex number, sets
d1 * cos(A) * r1 + d1 * sin(A) * r2 as the real part of the first output signal,
d1 * cos(A) * i1 + d1 * sin(A) * i2 as the imaginary part of the first output signal,
d2 * cos(-B) * r1 + d2 * sin(-B) * r2 as the real part of the second output signal, and
d2 * cos(-B) * i1 + d2 * sin(-B) * i2 as the imaginary part of the second output signal.

The signal processing device according to claim 1, wherein the mixing coefficient determining means obtains the values of d1 and d2 as
d1 = L / ((1 + 2 * L * cos(θ) + L * L) ^ 0.5), d2 = 1 / ((1 + 2 * L * cos(θ) + L * L) ^ 0.5).

The signal processing device according to claim 1, wherein the mixing coefficient determining means obtains the values of d1 and d2 as d1 = L / ((1 + L * L) ^ 0.5), d2 = 1 / ((1 + L * L) ^ 0.5).

The signal processing device according to any one of claims 1 to 13, further comprising feature amount receiving means for receiving data in which the acoustic feature amount is encoded, wherein the generating means generates a signal according to the data in which the acoustic feature amount is encoded.

The signal processing device according to claim 14, wherein the data in which the acoustic feature amount is encoded is 1-bit data, and the generating means outputs the output signal of the processing means when the data is true and the output signal of the filter means when the data is false.

A signal processing device that generates two signals by mixing a first signal and a second signal generated from the first signal at two degrees of mixing,
Generating means for generating the second signal from the first signal;
Mixing coefficient determining means for determining the degree of mixing;
Mixing means for mixing the first signal and the second signal based on the degree of mixing determined by the mixing coefficient determining means;
Sound image moving means for performing processing for moving the reproduced sound images of the two signals generated from the mixing means to positions separated from each other;
A signal processing apparatus comprising:

The signal processing device, wherein the sound image moving means receives the two signals generated by the mixing means and processes them so that the reproduced sound images of the two signals move to positions separated from each other.

The signal processing device according to claim 16, wherein the sound image moving means modifies the degrees of mixing determined by the mixing coefficient determining means so that the reproduced sound images of the two signals generated by the mixing means move to positions separated from each other.

The signal processing device according to claim 16, wherein the sound image moving means includes first processing means for performing, on one of the two signals generated by the mixing means, processing that moves the reproduced sound image of that signal in a first direction, and second processing means for performing, on the other of the two signals, processing that moves the reproduced sound image of that signal in a second direction opposite to the first direction.

The signal processing device according to claim 17, wherein the first processing means changes the amplitude of a signal in a predetermined frequency band, and the second processing means changes the amplitude of a signal in a frequency band different from that frequency band.

The signal processing device according to any one of claims 16 to 18, wherein a signal generated by the sound image moving means is output when the two signals generated by the mixing means are similar, and a signal generated by the mixing means is output otherwise.