JP2007264431A

JP2007264431A - Sound source separation system, encoder and decoder

Info

Publication number: JP2007264431A
Application number: JP2006091327A
Authority: JP
Inventors: Fumitada Itakura; 文忠板倉; Hideki Sakano; 秀樹坂野; Fukuji Kawakami; 福司川上; Takao Nakatani; 隆雄中谷; Akiyoshi Sato; 明善佐藤
Original assignee: Yamaha Corp; Meijo University
Current assignee: Yamaha Corp; Meijo University
Priority date: 2006-03-29
Filing date: 2006-03-29
Publication date: 2007-10-11

Abstract

PROBLEM TO BE SOLVED: To provide a sound source separation system with a wide range of application. SOLUTION: The sound source separation system comprises an encoder and a decoder. The encoder generates a mixed signal by adding together, at a predetermined volume ratio, the sound signals output from a plurality of sound sources; generates an envelope signal indicating the envelope of each sound signal in each of a plurality of mutually different frequency bands; and outputs the envelope signal and the mixed signal. The decoder receives the mixed signal and the envelope signal from the encoder, uses each frequency band component of the mixed signal as the carrier signal for that band, adjusts the amplitude of each carrier signal according to the value of the envelope signal for the sound source designated for separation and for that band, and then adds the carrier signals together and outputs the result. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a sound source separation technique for separating the sound of each sound source from an acoustic signal in which sounds output from a plurality of sound sources are mixed.

Sound source separation is a core technology for automatic music transcription, which produces the score of a piece of music from its acoustic signal; for extracting only the speech from a speech signal contaminated with noise and reverberation, as preprocessing for speech recognition or voice authentication; and for putting karaoke, MMO, and automatic minutes creation to practical use.

In general, sound source separation techniques are classified into methods that require no information at all about the sound sources to be separated (for example, the position or the type of a sound source; hereinafter, "sound source information") and methods that perform separation with reference to sound source information.

An example of the former is a technique based on harmonic structure analysis, which separates sources by the following procedure: extraction of the fundamental frequency of the target sound (step 1) → extraction of a spectrogram by applying a window function to the target sound and performing a short-time Fourier transform (step 2) → emphasis (or extraction) of only the harmonic components at integer multiples of the fundamental frequency (step 3) → resynthesis of the sound from the emphasized (or extracted) harmonic components (step 4).
On the other hand, Non-Patent Document 1 and Patent Document 1 disclose techniques that perform sound source separation without the processing of step 3, by referring to sound source information and to event information indicating how the driving state of each sound source changes over time.
Non-Patent Document 1: Masataka Goto, Yoichi Muraoka, "Sound source separation system for percussion instruments", [online], Information Technology Division, National Institute of Advanced Industrial Science and Technology, Internet <URL:http://www.staff_aist.go.jp/am-goto/PROJ/sss-j.html>
Patent Document 1: JP 2004-258442 A

However, sound source separation based on harmonic structure analysis has the problem that its range of application is limited even though the processing is complex and large in scale. This stems, for example, from the fact that separation is difficult when the sound sources to be separated have nearly identical spectra. The techniques disclosed in Non-Patent Document 1 and Patent Document 1, for their part, cannot work without sound source information and event information, so they too lack generality and have a limited range of application.
The present invention has been made in view of the above problems, and its object is to provide a sound source separation technique with a wide range of application.

To solve the above problems, the present invention provides a sound source separation system including an encoder that outputs a mixed signal obtained by mixing, at a predetermined volume ratio, the acoustic signals output from a plurality of mutually different sound sources, and a decoder that separates the acoustic signal corresponding to each of the sound sources from the mixed signal and outputs it. The encoder has envelope signal generation means that, for each of the acoustic signals output from the sound sources, identifies the envelope in each of a plurality of mutually different frequency bands and generates an envelope signal indicating those envelopes, and first output means that outputs the envelope signal. The decoder has carrier signal generation means that generates each subband signal of the mixed signal output from the encoder, in each of the frequency bands, as the carrier signal for that band, and second output means that generates the acoustic signal corresponding to each sound source by adjusting the amplitude of the carrier signal of each frequency band according to the value, in that band, of the envelope signal corresponding to that sound source and then adding the amplitude-adjusted carrier signals together, and that outputs the result.
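The decoder-side step described above (scale each band's carrier, then sum) can be sketched as follows. This is only an illustration under stated assumptions: the text here says the amplitude is adjusted "according to" the envelope value without giving the exact rule, so the rule used below (scale each carrier so that its RMS level equals the envelope value) and the names `rms` and `reconstruct` are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def reconstruct(carriers, envelope):
    """Scale each subband carrier to its target level and sum the bands.

    carriers: list of J equal-length sample lists (one per frequency band)
    envelope: list of J target levels for the source being separated
    Assumed rule: gain = target level / current RMS of the carrier.
    """
    out = [0.0] * len(carriers[0])
    for carrier, level in zip(carriers, envelope):
        current = rms(carrier)
        gain = level / current if current > 0.0 else 0.0
        for i, x in enumerate(carrier):
            out[i] += gain * x
    return out

# Two toy "subband" carriers and the envelope values of one source.
band_1 = [1.0, -1.0, 1.0, -1.0]   # RMS 1.0
band_2 = [2.0, 2.0, -2.0, -2.0]   # RMS 2.0
restored = reconstruct([band_1, band_2], [0.5, 1.0])
```

Each band ends up at the level the envelope signal prescribes, while its fine structure still comes from the mixed signal.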

In a more preferred aspect, the first output means applies to the envelope signal generated by the envelope signal generation means a modulation that shifts its frequency to a frequency outside the audible band, and then adds it to the mixed signal and outputs the sum. The decoder has a first filter whose passband upper limit coincides with the upper limit of the audible band, a second filter whose stopband upper limit coincides with the upper limit of the audible band, and separation means that splits the signal output from the encoder into two branches and passes one through the first filter and the other through the second filter, thereby separating the mixed signal and the envelope signal from the signal output from the encoder.
Instead of the modulation that shifts to a frequency outside the audible band, a shift to a frequency band that is not perceptually important may be used.

In another preferred aspect, the envelope signal generation means has a first filter group consisting of a plurality of bandpass filters, each of whose passbands coincides with one of the frequency bands, and identifies the envelope of each acoustic signal in each frequency band by calculating the effective (RMS) value of the signal obtained by passing that acoustic signal through each of the bandpass filters. The carrier signal generation means has a second filter group with the same configuration as the first filter group, and generates the carrier signal of each frequency band by passing the mixed signal output from the encoder through each bandpass filter of the second filter group.

In another preferred aspect, the envelope signal generation means identifies the envelope of each acoustic signal in each of the frequency bands by applying a wavelet transform to each of the acoustic signals output from the sound sources.

In another preferred aspect, the carrier signal generation means has an oscillator that generates, for each of the frequency bands, a signal representing band noise belonging to that band or a sine wave at a frequency within that band. For each of the sound sources, it determines for every frequency band whether the value of the envelope signal corresponding to that sound source is smaller than a predetermined threshold. For a frequency band in which the value is determined to be smaller than the threshold, the signal generated by the oscillator is used as the carrier signal of that band; for a frequency band in which the value is equal to or greater than the threshold, the component of the mixed signal in that band is used as the carrier signal of that band.
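The per-band selection rule in this aspect can be written down in a few lines. The sketch below is illustrative only; the function name `choose_carriers`, the threshold value, and the toy band contents are assumptions, not taken from the patent.

```python
def choose_carriers(envelope, mix_subbands, oscillator_subbands, threshold):
    """Pick the carrier for each band of one sound source.

    envelope:            J envelope values for the source
    mix_subbands:        J subband signals taken from the mixed signal
    oscillator_subbands: J locally generated signals (band noise or sine)
    Below the threshold the oscillator output is used; at or above it,
    the subband of the mixed signal is used.
    """
    carriers = []
    for level, from_mix, from_osc in zip(envelope, mix_subbands,
                                         oscillator_subbands):
        carriers.append(from_osc if level < threshold else from_mix)
    return carriers

# Toy example with J = 3 bands, each band represented by a short label list.
envelope_a = [0.40, 0.01, 0.05]
mix_bands = [["mix1"], ["mix2"], ["mix3"]]
osc_bands = [["osc1"], ["osc2"], ["osc3"]]
selected = choose_carriers(envelope_a, mix_bands, osc_bands, threshold=0.05)
```

Note that a value exactly equal to the threshold keeps the mixed-signal subband, matching the "equal to or greater than" wording.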

To solve the above problems, the present invention also provides an encoder having: mixed signal generation means that generates a mixed signal by adding together, at a predetermined volume ratio, the acoustic signals output from a plurality of sound sources; envelope signal generation means that, for each of the acoustic signals output from the sound sources, identifies the envelope in each of a plurality of mutually different frequency bands and generates an envelope signal indicating those envelopes; and output means that outputs the mixed signal and the envelope signal.
In another aspect of the present invention, a program may be provided that causes a computer to function as: mixed signal generation means that generates a mixed signal by adding together, at a predetermined volume ratio, the acoustic signals output from a plurality of sound sources; envelope signal generation means that, for each of those acoustic signals, identifies the envelope in each of a plurality of mutually different frequency bands and generates an envelope signal indicating those envelopes; and output means that outputs the mixed signal and the envelope signal.

To solve the above problems, the present invention also provides a decoder having: carrier signal generation means that receives a mixed signal obtained by mixing, at a predetermined volume ratio, the acoustic signals output from a plurality of mutually different sound sources, and generates the subband signal of the mixed signal in each of a plurality of mutually different frequency bands as the carrier signal of that band; and output means that receives envelope signals indicating the envelopes of the acoustic signals in the frequency bands, generates the acoustic signal corresponding to each sound source by adjusting the amplitude of the carrier signal of each frequency band according to the value, in that band, of the envelope signal corresponding to that sound source and then adding the amplitude-adjusted carrier signals together, and outputs the result.
In another aspect of the present invention, a program may be provided that causes a computer to function as: carrier signal generation means that receives a mixed signal obtained by mixing, at a predetermined volume ratio, the acoustic signals output from a plurality of mutually different sound sources, and generates the subband signal of the mixed signal in each of a plurality of mutually different frequency bands as the carrier signal of that band; and output means that receives envelope signals indicating the envelopes of the acoustic signals in the frequency bands, generates the acoustic signal corresponding to each sound source by adjusting the amplitude of the carrier signal of each frequency band according to the value, in that band, of the envelope signal corresponding to that sound source and then adding the amplitude-adjusted carrier signals together, and outputs the result.

According to the present invention, a sound source separation technique with a wide range of application can be provided.

The present invention is based on the idea that "human hearing is governed more by articulation than by the waveform of the source signal itself" and that, therefore, "a signal can be restored effectively if only the event information and the changes in the frequency spectrum are reproduced faithfully."
The best mode for carrying out the present invention is described below with reference to the drawings.
(A: Configuration)
(A-1: Configuration of the sound source separation system 10)
FIG. 1 is a block diagram showing a configuration example of a sound source separation system 10 according to one embodiment of the present invention. As shown in FIG. 1, the sound source separation system 10 includes an encoder 100 connected to a communication network 300 such as the Internet, and a decoder 200 likewise connected to the communication network 300. The encoder 100 and the decoder 200 are configured so that they can communicate via the communication network 300. This embodiment describes the case where the communication network 300 is the Internet, but any communication network may be used as long as it can mediate communication between the encoder 100 and the decoder 200.

In the sound source separation system 10 shown in FIG. 1, the encoder 100 is supplied with an acoustic signal from each of two mutually different sound sources A and B. In the following, the sound source A supplies the acoustic signal A(t) and the sound source B supplies the acoustic signal B(t). The encoder 100 of FIG. 1 is configured so that it can mix the two at a predetermined volume ratio to generate a mixed signal X(t). For example, when the acoustic signal A(t) is the accompaniment of a piece of music and the acoustic signal B(t) is the singing voice of that piece, the mixed signal X(t) output from the encoder 100 represents the singing voice with accompaniment. In addition to the mixed signal X(t), the encoder 100 generates a control signal characteristic of the present invention, adds this control signal to the mixed signal X(t) to obtain an integrated signal Y(t), and transmits the integrated signal Y(t) to the decoder 200 via the communication network 300.

The decoder 200 of FIG. 1, in turn, receives the integrated signal Y(t) sent from the encoder 100 via the communication network 300, separates the mixed signal X(t) and the control signal from the integrated signal Y(t), and uses the control signal to separate the acoustic signals corresponding to the two sound sources from the mixed signal X(t) and output them.
The following description focuses on the configurations of the encoder 100 and the decoder 200.

(A-2: Configuration of the encoder 100)
FIG. 2 is a block diagram showing a configuration example of the encoder 100.
As shown in FIG. 2, the encoder 100 includes volumes 110A and 110B, envelope signal generation means 120A and 120B, compression/conversion circuits 130A and 130B, adders 140 and 150, and a modulation processing circuit 160.
The acoustic signal A(t) is input to the volume 110A of FIG. 2, and the acoustic signal B(t) is input to the volume 110B. The volumes 110A and 110B adjust the levels of the acoustic signals A(t) and B(t) so that they satisfy the volume ratio that the user specifies for A(t) and B(t) by operating an operation unit (not shown), and output the adjusted signals. This embodiment describes the case where the levels of A(t) and B(t) are adjusted by the volumes 110A and 110B when the two are mixed at the user-specified volume ratio to generate the mixed signal X(t). Needless to say, in a configuration that generates the mixed signal X(t) without such level adjustment, the volumes 110A and 110B need not be provided.

As shown in FIG. 2, the acoustic signal A(t) output from the volume 110A is split into two branches, one of which is input to the envelope signal generation means 120A and the other to the adder 140. Similarly, the acoustic signal B(t) output from the volume 110B is split into two branches, one of which is input to the envelope signal generation means 120B and the other to the adder 140. The acoustic signals A(t) and B(t) input to the adder 140 are added by the adder 140, which outputs their sum as the mixed signal X(t).
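In discrete time, the volumes and the adder 140 amount to a weighted sample-by-sample sum. The following is a minimal sketch; the function name `mix`, the gain values, and the sample data are illustrative assumptions.

```python
def mix(a, b, gain_a=1.0, gain_b=1.0):
    """Scale two equal-length sample sequences and add them sample by sample.

    gain_a and gain_b play the role of the volumes 110A and 110B; the sum
    corresponds to the output X(t) of the adder 140.
    """
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return [gain_a * x + gain_b * y for x, y in zip(a, b)]

accompaniment = [0.5, -0.5, 0.25, 0.0]   # stands in for A(t)
vocal = [0.1, 0.2, -0.1, 0.4]            # stands in for B(t)
mixed = mix(accompaniment, vocal, gain_a=0.5, gain_b=1.0)
```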

The envelope signal generation means 120A and 120B have the same configuration, as shown in FIG. 2. Where the two need not be distinguished, they are referred to below as the "envelope signal generation means 120". As shown in FIG. 2, the envelope signal generation means 120 has J bandpass filters 121-j (J is a natural number of 2 or more; j = 1 to J, and likewise below), each with a different passband, and an effective value detection circuit 122-j (labeled "rms" in FIG. 2) is connected to the output side of each bandpass filter 121-j. The number of frequency bands J may be any value for which, with reference to the critical bandwidth of human hearing, the bandwidth of each passband is a fraction of an octave (for example, 1/5 octave to 1/2 octave).
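To get a feel for how the band count J follows from the chosen fractional-octave width, the sketch below computes band edges for a given frequency range. The range of 100 Hz to 6400 Hz, the 1/3-octave width, and the function name `band_edges` are assumptions chosen for illustration; the text only asks for widths of roughly 1/5 to 1/2 octave.

```python
import math

def band_edges(f_low, f_high, bands_per_octave):
    """Return (lower, upper) edge pairs of fractional-octave bands.

    Each band spans 1/bands_per_octave of an octave, so its upper edge is
    its lower edge times 2 ** (1 / bands_per_octave).
    """
    octaves = math.log2(f_high / f_low)
    # Small guard so that rounding noise does not add a spurious band.
    count = math.ceil(octaves * bands_per_octave - 1e-9)
    edges = [f_low * 2.0 ** (k / bands_per_octave) for k in range(count + 1)]
    return list(zip(edges[:-1], edges[1:]))

bands = band_edges(100.0, 6400.0, 3)   # 6 octaves in 1/3-octave steps
J = len(bands)
```

With these assumed values the six-octave range divides into J = 18 bands.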

As shown in FIG. 2, the acoustic signal input to the envelope signal generation means 120 is split into as many branches as there are bandpass filters 121-j and input to each of them. The subband signal output from each bandpass filter 121-j (a signal representing the passband component of the acoustic signal) is input to the effective value detection circuit 122-j, as shown in FIG. 2. The effective value detection circuit 122-j has the circuit configuration shown in FIG. 3, and calculates and outputs the effective value (envelope) of its input signal. In other words, the envelope signal generation means 120 outputs an envelope signal indicating the envelope, in each of the passbands, of the acoustic signal input to it. A suitable time constant for the effective value detection circuit 122-j shown in FIG. 3 may be determined by experiment, but a value of several tens of milliseconds, in line with the detection limit of the temporal resolution of human hearing, is desirable. The reason is that, with a time constant of several tens of milliseconds, the frequency of the envelope signal output from the effective value detection circuit 122-j becomes very low, close to direct current, and an envelope signal that is close to direct current carries far less information than the original acoustic signal.
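The effect of the time constant can be seen with a simple running RMS detector. The circuit of FIG. 3 is not reproduced here, so the structure below (a one-pole smoother applied to the squared signal, followed by a square root) is an assumption standing in for it, as are the 30 ms time constant, the 8 kHz sample rate, and the function name `rms_envelope`.

```python
import math

def rms_envelope(samples, sample_rate, time_constant=0.03):
    """Running RMS: exponentially smoothed mean square, then square root.

    time_constant is in seconds; about 0.03 matches the "several tens of
    milliseconds" suggested in the text.
    """
    alpha = 1.0 - math.exp(-1.0 / (time_constant * sample_rate))
    mean_square = 0.0
    envelope = []
    for x in samples:
        mean_square += alpha * (x * x - mean_square)
        envelope.append(math.sqrt(mean_square))
    return envelope

# A unit-amplitude 1 kHz tone, 0.5 s long: the envelope settles near 1/sqrt(2).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE)
        for i in range(SAMPLE_RATE // 2)]
envelope = rms_envelope(tone, SAMPLE_RATE)
```

The output varies slowly compared with the tone itself, which is why it can be transmitted with far less information than the waveform.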

In this embodiment, as shown in FIG. 2, the envelope signal generation means 120A passes the acoustic signal A(t) through each of the bandpass filters 121-j to obtain the subband signals A_j(t) (j = 1 to J), and inputs these subband signals A_j(t) to the effective value detection circuits 122-j to output the envelope signal (a_1, a_2, …, a_J) for the acoustic signal A(t). Similarly, the envelope signal generation means 120B outputs the envelope signal (b_1, b_2, …, b_J) for the acoustic signal B(t). Here, the relationship shown in Equation 1 holds between A_j(t) and a_j and between B_j(t) and b_j.

As shown in FIG. 2, the envelope signal (a_1, a_2, …, a_J) output from the envelope signal generation means 120A is input to the compression/conversion circuit 130A, and the envelope signal (b_1, b_2, …, b_J) output from the envelope signal generation means 120B is input to the compression/conversion circuit 130B. The compression/conversion circuits 130A and 130B apply compression according to a predetermined compression algorithm, such as a logarithmic compression algorithm, to their input signals and output the result to the modulation processing circuit 160. The compression is applied to the envelope signals in order to suppress noise in the transmission system (not shown) that carries them over the communication network 300 and to reduce their amount of information. Such compression is not essential, however, and may of course be omitted, in which case the compression/conversion circuits 130A and 130B can be omitted as well.
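The text names logarithmic compression only as an example and does not specify a curve. As one possible instance, the sketch below uses a mu-law style curve for envelope values normalized to the range 0 to 1, together with its exact inverse. The choice of mu-law, the constant `MU`, and the function names are all assumptions.

```python
import math

MU = 255.0  # assumed compression parameter

def compress(value, mu=MU):
    """Map an envelope value in [0, 1] to [0, 1] on a logarithmic scale."""
    return math.log1p(mu * value) / math.log1p(mu)

def expand(code, mu=MU):
    """Inverse of compress: recover the envelope value from its code."""
    return math.expm1(code * math.log1p(mu)) / mu

levels = [0.0, 0.001, 0.01, 0.1, 1.0]
codes = [compress(v) for v in levels]
restored = [expand(c) for c in codes]
```

Small envelope values are spread over a larger share of the code range, which makes them less sensitive to noise added in transmission.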

Next, the modulation processing circuit 160 of FIG. 2 is described in detail with reference to FIG. 4. As shown in FIG. 4, the modulation processing circuit 160 has oscillators 161-n (n = 1 to 2J), whose number equals the number of sound sources to be separated (two in this embodiment) multiplied by the number of frequency bands J; voltage-controlled amplifiers 162-n (labeled "VCA" in FIG. 4), one connected to each oscillator 161-n; and an adder 163 connected to the voltage-controlled amplifiers 162-n.

The oscillator 161-n generates a sine wave signal at a predetermined frequency f_n above the audible range and outputs it to the voltage-controlled amplifier 162-n. This embodiment describes the case where the oscillator 161-n outputs a sine wave signal, but it may instead output a narrowband noise signal whose center frequency is f_n.

The voltage-controlled amplifier 162-n receives the sine wave signal of frequency f_n together with a control voltage corresponding to the value of a predetermined component of the envelope signals described above, amplifies the sine wave signal with a gain corresponding to that control voltage, and outputs it to the adder 163. More specifically, as shown in FIG. 4, a control voltage corresponding to the value of a_j is applied to each voltage-controlled amplifier 162-j (j = 1 to J), and a control voltage corresponding to the value of b_j is applied to each voltage-controlled amplifier 162-(J+j) (j = 1 to J).

The adder 163 adds the signals output from the voltage-controlled amplifiers 162-n (n = 1 to 2J) to generate the control signal F(t), and outputs the control signal F(t) to the adder 150.
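The oscillators, amplifiers, and adder 163 together form a bank of amplitude-modulated sine waves. The sketch below generates a short block of F(t) for J = 2. The sample rate, the carrier frequencies, and the function name `control_signal` are assumptions, and the envelope values are held constant over the block because they change slowly compared with the carriers.

```python
import math

def control_signal(envelope_a, envelope_b, carrier_freqs, sample_rate, length):
    """Sum of 2J sine carriers, each scaled by one envelope value.

    Carriers 1..J take their gains from envelope_a and carriers J+1..2J
    from envelope_b, mirroring amplifiers 162-j and 162-(J+j).
    """
    gains = list(envelope_a) + list(envelope_b)
    if len(gains) != len(carrier_freqs):
        raise ValueError("need one carrier frequency per envelope value")
    block = []
    for i in range(length):
        t = i / sample_rate
        block.append(sum(g * math.sin(2 * math.pi * f * t)
                         for g, f in zip(gains, carrier_freqs)))
    return block

SAMPLE_RATE = 96_000                               # assumed
CARRIERS_HZ = [22_000, 23_000, 24_000, 25_000]     # assumed f_1 .. f_2J, J = 2
f_block = control_signal([0.8, 0.1], [0.3, 0.6], CARRIERS_HZ, SAMPLE_RATE, 96)
```

Adding such a block to the corresponding block of X(t) gives the integrated signal that the adder 150 produces.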

Returning to FIG. 2, the adder 150 adds the mixed signal X(t) supplied from the adder 140 and the control signal F(t) supplied from the modulation processing circuit 160 to generate the integrated signal Y(t), and the integrated signal Y(t) is transmitted to the decoder 200 via the communication network 300 by a transmission system (not shown).
The above is the configuration of the encoder 100.

(A-3: Configuration of the decoder 200)
Next, the configuration of the decoder 200 is described with reference to FIGS. 5 and 6.
As shown in FIG. 5, the decoder 200 has separation means 210, carrier signal generation means 220A and 220B, and restored signal output means 230A and 230B.

The separation means 210 separates the mixed signal X(t) and the envelope signals from the integrated signal Y(t) and outputs them, and has the configuration shown in FIG. 6. As shown in FIG. 6, the separation means 210 splits the integrated signal Y(t) into two branches, inputting one to a low-pass filter 211 and the other to a high-pass filter 212.

In this embodiment, the upper limit of the passband of the low-pass filter 211 coincides with the upper limit frequency of the audible range. The low-pass filter 211 therefore outputs only the audible-range components of the integrated signal Y(t). As described above, the only audible-range components contained in the integrated signal Y(t) are those of the mixed signal X(t), so passing the integrated signal Y(t) through the low-pass filter 211 yields only the mixed signal X(t). Likewise, in this embodiment the upper limit of the stopband of the high-pass filter 212 coincides with the upper limit frequency of the audible range. The high-pass filter 212 therefore outputs only the components of the integrated signal Y(t) in the frequency band above the audible range (that is, the super-audible range), so passing the integrated signal Y(t) through the high-pass filter 212 yields only the control signal F(t). The control signal F(t) output from the high-pass filter 212 is split into two branches, one of which is input to an envelope signal demodulation unit 213A and the other to an envelope signal demodulation unit 213B.
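The split performed by the two filters can be illustrated on a short block of samples. The sketch below does not implement the filters 211 and 212 themselves; as a stand-in it uses an idealized brick-wall split in the frequency domain with a naive DFT. The 20 kHz cutoff, the 96 kHz sample rate, the test tones, and the function names are assumptions.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short blocks)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft_real(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)).real / n
            for k in range(n)]

def split_at(signal, sample_rate, cutoff_hz):
    """Split a block into components at or below and above cutoff_hz."""
    n = len(signal)
    low, high = [], []
    for m, value in enumerate(dft(signal)):
        freq = min(m, n - m) * sample_rate / n   # frequency of bin m
        low.append(value if freq <= cutoff_hz else 0)
        high.append(0 if freq <= cutoff_hz else value)
    return idft_real(low), idft_real(high)

SAMPLE_RATE, N = 96_000, 96
x = [math.sin(2 * math.pi * 1_000 * i / SAMPLE_RATE) for i in range(N)]         # audible part
f = [0.5 * math.sin(2 * math.pi * 30_000 * i / SAMPLE_RATE) for i in range(N)]  # control part
y = [xi + fi for xi, fi in zip(x, f)]                                           # integrated signal
x_out, f_out = split_at(y, SAMPLE_RATE, 20_000)
```

Because the two parts occupy disjoint frequency ranges, each branch recovers its part of the sum.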

The envelope signal demodulator 213A in FIG. 6 demodulates the envelope signal for the acoustic signal A(t) from the control signal F(t) and supplies it to the restoration signal output means 230A. Similarly, the envelope signal demodulator 213B demodulates the envelope signal for the acoustic signal B(t) from the control signal F(t) and supplies it to the restoration signal output means 230B.
As shown in FIG. 6, each of the envelope signal demodulators 213A and 213B consists of J narrowband bandpass filters, an effective value (RMS) detection circuit connected to each of the J filters, and an expansion/conversion circuit. The expansion/conversion circuit performs the inverse of the operation performed by the compression/conversion circuit described above.

The point to note here is that the narrowband filter group of the envelope signal demodulator 213A and that of the envelope signal demodulator 213B have different passbands. Specifically, the center frequencies of the passbands of the J narrowband filters in the envelope signal demodulator 213A are, in ascending order, the super-audible frequencies f_1, f_2, ..., f_J, whereas those of the J narrowband filters in the envelope signal demodulator 213B are, in ascending order, the super-audible frequencies f_{J+1}, f_{J+2}, ..., f_{2J}. As described above, the control signal F(t) is generated by adjusting the amplitudes of sine waves at the super-audible frequencies f_1, f_2, ..., f_J according to the values of the envelope signals a_1, a_2, ..., a_J, adjusting the amplitudes of sine waves at the super-audible frequencies f_{J+1}, f_{J+2}, ..., f_{2J} according to the values of the envelope signals b_1, b_2, ..., b_J, and then adding all of these sine waves together. Consequently, the envelope signal demodulator 213A extracts only the envelope signals a_1, a_2, ..., a_J, and the envelope signal demodulator 213B extracts only the envelope signals b_1, b_2, ..., b_J.
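The way distinct carrier frequencies keep the two sets of envelope values apart can be sketched as follows. This is an illustration under stated assumptions, not the circuit of FIG. 6: quadrature correlation over one window stands in for the narrowband filter plus effective value detector, the compression/expansion step is omitted, and the carrier positions and envelope values are hypothetical. The function returns peak amplitude, whereas an effective value detector would report that amplitude divided by the square root of 2.

```python
import math

def tone_amplitude(signal, cycles):
    """Peak amplitude of the component completing `cycles` periods per window."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * cycles * t / n)
             for t, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * cycles * t / n)
             for t, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

N = 256                     # hypothetical window length
carriers_a = [40, 44, 48]   # hypothetical f_1 .. f_J (cycles per window), source A
carriers_b = [52, 56, 60]   # hypothetical f_{J+1} .. f_{2J}, source B
a = [0.9, 0.2, 0.5]         # envelope values a_1 .. a_J (made-up numbers)
b = [0.1, 0.7, 0.3]         # envelope values b_1 .. b_J (made-up numbers)

# Control signal F(t): one amplitude-scaled sine per envelope value, summed.
control = [
    sum(amp * math.sin(2 * math.pi * c * t / N)
        for amp, c in zip(a + b, carriers_a + carriers_b))
    for t in range(N)
]

a_hat = [tone_amplitude(control, c) for c in carriers_a]  # role of demodulator 213A
b_hat = [tone_amplitude(control, c) for c in carriers_b]  # role of demodulator 213B
```

Each demodulator only looks at its own J carrier positions, so `a_hat` recovers the values in `a` and `b_hat` those in `b` even though all 2J sine waves share one signal.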

Returning to FIG. 5, the mixed signal X(t) output from the separation means 210 is split into two branches, one fed to the carrier signal generating means 220A and the other to the carrier signal generating means 220B. As shown in FIG. 5, the carrier signal generating means 220A and 220B have the same configuration, so they are referred to collectively as "carrier signal generating means 220" where there is no need to distinguish them. The carrier signal generating means 220 consists of a filter group (bandpass filters 221-j, j = 1 to J) having the same configuration as the filter group of the envelope signal generating means 120. Inside the carrier signal generating means 220, the mixed signal X(t) received from the separation means 210 is split into as many branches as there are bandpass filters 221-j and fed to each of them, so that each bandpass filter 221-j outputs a subband signal X_j(t). The carrier signal generating means 220A supplies the subband signal X_j(t) output from each bandpass filter 221-j to the restoration signal output means 230A as the carrier signal for the passband of that filter, and the carrier signal generating means 220B likewise supplies the subband signal X_j(t) output from each bandpass filter 221-j to the restoration signal output means 230B as the carrier signal for the passband of that filter.
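The subband division performed by the bandpass filters 221-j can be sketched as below. The sketch assumes ideal, non-overlapping bands selected in the DFT domain, which is a simplification of real bandpass filters; the band edges and test tones are hypothetical.

```python
import cmath
import math

def band_split(x, edges):
    """Split x into subbands; band j keeps DFT bins k with edges[j] <= k < edges[j+1]."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    bands = []
    for lo, hi in zip(edges, edges[1:]):
        # Keep each positive-frequency bin together with its mirrored twin.
        kept = [c if lo <= min(k, n - k) < hi else 0 for k, c in enumerate(spec)]
        bands.append([
            sum(kept[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)
        ])
    return bands

N = 32
tone_low = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
tone_high = [0.5 * math.sin(2 * math.pi * 9 * t / N) for t in range(N)]
mixed = [u + v for u, v in zip(tone_low, tone_high)]   # stand-in for X(t)

subbands = band_split(mixed, [0, 4, 8, 16])            # X_1(t), X_2(t), X_3(t)
```

Here the first subband contains only the low tone, the second is empty, and the third contains only the high tone; each would serve as the carrier signal for its band.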

As is apparent from FIG. 5, the restoration signal output means 230A and 230B have the same configuration, so they are referred to collectively as "restoration signal output means 230" where there is no need to distinguish them. As shown in FIG. 5, the restoration signal output means 230 has J voltage-controlled amplifiers 231-j (j = 1 to J) and an adder 232.
More specifically, the voltage-controlled amplifier 231-j of the restoration signal output means 230A is connected to the bandpass filter 221-j of the carrier signal generating means 220A, and a control voltage corresponding to the value of the envelope signal a_j output from the separation means 210 is applied to it. The carrier signal X_j(t) input to the voltage-controlled amplifier 231-j is therefore amplified by a gain that depends on the value of the envelope signal a_j. As a result, the voltage-controlled amplifier 231-j of the restoration signal output means 230A outputs a signal A'_j(t) that approximates the spectrum of the original acoustic signal A(t) in the j-th frequency band. The signals A'_j(t) output from the voltage-controlled amplifiers 231-j are summed by the adder 232, which outputs a restored signal A'(t) that approximates the original acoustic signal A(t).

In the restoration signal output means 230B, as in the restoration signal output means 230A, the carrier signal X_j(t) is input to the voltage-controlled amplifier 231-j and a control voltage corresponding to the value of the envelope signal b_j output from the separation means 210 is applied, so that the voltage-controlled amplifier 231-j outputs a signal B'_j(t) that approximates the spectrum of the original acoustic signal B(t) in the j-th frequency band. These signals B'_j(t) are summed by the adder 232 of the restoration signal output means 230B, which outputs a restored signal B'(t) that approximates the spectrum of the original acoustic signal B(t).
The above is the configuration of the decoder 200.
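The restoration stage (voltage-controlled amplifiers followed by an adder) can be sketched as follows. The source only states that each carrier is amplified according to the envelope value; the specific gain law used here, which scales band j so that its RMS equals the envelope value, is an assumption made for illustration, as are all numeric values.

```python
import math

def rms(x):
    """Root-mean-square (effective) value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def restore(carriers, envelope, eps=1e-12):
    """Sum the carriers after scaling band j so that its RMS equals envelope[j].

    The gain law is an ASSUMPTION of this sketch; the source does not
    specify how the control voltage maps to amplifier gain.
    """
    n = len(carriers[0])
    out = [0.0] * n
    for x_j, e_j in zip(carriers, envelope):
        gain = e_j / max(rms(x_j), eps)   # role of amplifier 231-j
        for t in range(n):
            out[t] += gain * x_j[t]       # role of adder 232
    return out

N = 64
carriers = [
    [1.0 * math.sin(2 * math.pi * 3 * t / N) for t in range(N)],  # X_1(t)
    [2.0 * math.sin(2 * math.pi * 7 * t / N) for t in range(N)],  # X_2(t)
]
envelope_a = [0.5, 0.0]   # hypothetical a_1, a_2: source A is silent in band 2

restored_a = restore(carriers, envelope_a)   # stand-in for A'(t)
```

With these values band 2 is muted entirely and band 1 is scaled to an RMS of 0.5, so the restored signal follows the envelope of source A rather than the level of the mixed signal.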

As described above, the decoder 200 outputs, from the mixed signal X(t), a restored signal A'(t) that approximates the original acoustic signal A(t) and a restored signal B'(t) that approximates the spectrum of the original acoustic signal B(t). One point needs examination, however: when separating sound source A, for example, the acoustic signal B(t) can be expected to be superimposed as masking noise in each frequency band of the restored signal A'(t) obtained as described above.
However, consider the following three cases.
(1) When a_j is sufficiently larger than b_j
In this case, the acoustic signal A(t) to be separated is dominant, so the acoustic signal B(t) does not act as masking noise.
(2) When a_j and b_j are approximately equal
In this case, the acoustic signals A(t) and B(t) remain mixed, but since both are signals in the same narrow band with approximately equal spectra, neither is likely to be a large noise source or masker for the other. Moreover, both vary over time, and the probability that their amplitudes are exactly equal at any given time is generally not large.
(3) When a_j is sufficiently smaller than b_j
In this case, the effect of the present invention is most pronounced. Without the sound source separation according to the present invention, even if the amplitude of the acoustic signal A(t) is approximately zero at some time t (that is, when sound source A is silent), the sound of sound source B would be heard as a masking sound and greatly impair the sense of separation. According to the present invention, however, the amplitude of the restored signal A'(t) is pulled down to a predetermined level (in this case, silence), preserving the natural condition that "a source that is not sounding, or is not sounding at the moment, is not heard."
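The three cases above can be made concrete with a short worked calculation. It rests on assumptions that the source does not state: the two sources are mixed at a 1:1 ratio, their components in band j are uncorrelated, and the amplifier gain g_j normalizes the carrier so that the output level in band j equals a_j. Here x_j denotes the effective value of the carrier X_j(t) and r_j the level of the residual B component in the output.

```latex
\begin{align*}
x_j &= \sqrt{a_j^2 + b_j^2}, \qquad
g_j = \frac{a_j}{x_j} = \frac{a_j}{\sqrt{a_j^2 + b_j^2}}, \qquad
r_j = g_j\, b_j = \frac{a_j b_j}{\sqrt{a_j^2 + b_j^2}} \\[4pt]
\text{(1) } a_j \gg b_j &: \quad g_j \approx 1, \quad r_j \approx b_j \ll a_j \\
\text{(2) } a_j \approx b_j &: \quad g_j \approx \tfrac{1}{\sqrt{2}}, \quad r_j \approx \tfrac{a_j}{\sqrt{2}} \\
\text{(3) } a_j \ll b_j &: \quad g_j \approx \tfrac{a_j}{b_j} \to 0, \quad r_j \approx a_j \to 0
\end{align*}
```

Under these assumptions the total output level in band j is g_j x_j = a_j in every case, so in case (3) the band falls silent as a_j approaches zero regardless of how loud source B is.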

Note that the restored signal output from the decoder 200 is not necessarily identical to the corresponding original acoustic signal. However, if the pass bandwidth of each bandpass filter 121-j is set appropriately according to the critical bandwidth of human hearing, and the time constant of each effective value detection circuit 122-j is set appropriately in line with the detection limit of human hearing, then within the range of human hearing the restored signal reproduces the spectrum of the original signal and the temporal variation of the components in each frequency band almost faithfully, and is perceptually close to the original signal.
In addition, the sound source separation in the sound source separation system 10 according to the present embodiment requires neither sound source information nor event information, does not require observation signals from a plurality of sound receiving points, and places no limit on the number of sound sources to be separated.
Thus, according to the present embodiment, by performing in advance the pre-processing of generating, for each of the plurality of acoustic signals to be mixed, envelope signals indicating its envelopes in mutually different frequency bands, it becomes possible to provide a sound source separation technique with a wider range of application than conventional techniques.

(B: Modifications)
Although one embodiment of the present invention has been described above, the embodiment may of course be modified as described below.

(1) In the embodiment described above, the integrated signal Y(t), obtained by adding the control signal F(t) to the mixed signal X(t) (itself obtained by mixing two acoustic signals at a predetermined volume ratio), is transmitted from the encoder 100 to the decoder 200 via the communication network 300. However, the integrated signal Y(t) may of course instead be conveyed from the encoder to the decoder by having the encoder write Y(t) to a recording medium such as a CD or DVD and having the decoder read Y(t) from that medium.

(2) The embodiment described above mixes and separates two acoustic signals, but the present invention can obviously also be applied to the mixing and separation of three or more acoustic signals. Specifically, it suffices to provide the encoder with as many envelope signal generating means as there are acoustic signals. Also, in the embodiment described above, the encoder is provided with one envelope signal generating means for each acoustic signal input to it. However, a single envelope signal generating means may of course be used in turn to generate the envelope signal for each acoustic signal.

(3) In the embodiment described above, the mixed signal X(t) is divided into subbands by a plurality of bandpass filters with different passbands, and each subband signal is used as the carrier signal for its passband. However, as shown in FIG. 7, the carrier signal generating means 220 may be provided with a changeover switch SW_j on the path from the bandpass filter 221-j to the voltage-controlled amplifier 231-j, together with an oscillator G_j (j = 1 to J) that generates band noise (for example, pink noise) belonging to the passband of the bandpass filter 221-j or a sine wave at a frequency belonging to that band. The changeover switch SW_j, which selects whether the bandpass filter 221-j or the oscillator G_j is connected to the voltage-controlled amplifier 231-j, may then be switched according to the signal value of the envelope signal corresponding to that frequency band (that is, the passband of the bandpass filter 221-j).
Specifically, it may be determined for each frequency band whether the signal value is smaller than a predetermined threshold; for a frequency band in which it is smaller than the threshold the connection is switched to the oscillator G_j, and for a frequency band in which it is equal to or greater than the threshold the connection is switched to the bandpass filter 221-j. When band noise or a sine wave is used as the carrier signal, however, it is desirable to make the pass bandwidth of the filters used in generating the envelope signals sufficiently narrow. When a sine wave is used as the carrier signal, the sine wave may be frequency-modulated as appropriate so that the carrier signals of the respective frequency bands are uncorrelated with one another.
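The per-band switching rule of this modification can be sketched as follows. The sketch uses a plain sine wave as the oscillator output (the band-noise option and the optional frequency modulation are omitted), and the function name, the threshold, and the sample values are hypothetical.

```python
import math

def select_carriers(subbands, envelope, center_cycles, threshold):
    """Choose the carrier for each band, mimicking changeover switch SW_j.

    If envelope[j] is below the threshold, a sine from a stand-in for
    oscillator G_j is used; otherwise the subband of the mixed signal is used.
    center_cycles[j] is the (hypothetical) oscillator frequency for band j,
    expressed in cycles per window.
    """
    n = len(subbands[0])
    chosen = []
    for x_j, e_j, c_j in zip(subbands, envelope, center_cycles):
        if e_j < threshold:
            chosen.append([math.sin(2 * math.pi * c_j * t / n) for t in range(n)])
        else:
            chosen.append(list(x_j))
    return chosen

# Toy data: two bands of eight samples each (made-up values).
subbands = [[0.3] * 8, [0.4] * 8]
envelope = [0.8, 0.01]     # band 2 is nearly silent for the target source
carriers = select_carriers(subbands, envelope, center_cycles=[1, 2], threshold=0.05)
```

In this example band 1 keeps the subband of the mixed signal, while band 2, whose envelope value is below the threshold, is driven by the synthetic sine instead.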

(4) In the embodiment described above, the integrated signal Y(t), obtained by adding the control signal F(t) and the mixed signal X(t), is transmitted from the encoder 100 to the decoder 200. However, the control signal F(t) and the mixed signal X(t) may instead be transmitted separately from the encoder to the decoder, and if the decoder already holds the mixed signal X(t), only the control signal F(t) need be transmitted from the encoder to the decoder.
Furthermore, combining the transmission of only the control signal (or envelope signal) described in this modification with the use of band noise or a sine wave as the carrier signal described in modification (3) is expected to make it possible to greatly reduce the amount of information transmitted when sending an acoustic signal and to remove noise effectively. Specifically, the transmitting side identifies the envelopes of the acoustic signal in a plurality of mutually different frequency bands and transmits an envelope signal indicating those envelopes to the receiving side. The receiving side uses the band noise or sine wave described in modification (3) as the carrier signal for each frequency band, adjusts the amplitude of each carrier signal according to the signal value of the envelope signal for that band, and then adds the carrier signals together to generate a restored signal corresponding to the acoustic signal.
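The receiver side of this envelope-only scheme can be sketched as below. It assumes the envelope signal arrives as one effective (RMS) value per band per frame and that each band is rebuilt with a sine carrier; the frame length, carrier positions, and envelope values are hypothetical.

```python
import math

def rms(x):
    """Root-mean-square (effective) value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def synthesize(envelope_frames, center_cycles, frame_len):
    """Rebuild a signal from per-frame, per-band RMS values using sine carriers.

    center_cycles[j] is the carrier frequency of band j in cycles per frame;
    integer values keep the carriers phase-continuous across frame boundaries.
    """
    out = []
    for f_idx, frame in enumerate(envelope_frames):
        for t in range(frame_len):
            n = f_idx * frame_len + t
            # A sine with peak amplitude sqrt(2) * e has an RMS value of e.
            out.append(sum(
                math.sqrt(2) * e * math.sin(2 * math.pi * c * n / frame_len)
                for e, c in zip(frame, center_cycles)
            ))
    return out

FRAME = 64
envelope_frames = [[0.5, 0.0],    # frame 0: only band 1 active (made-up values)
                   [0.0, 0.25]]   # frame 1: only band 2 active
restored = synthesize(envelope_frames, center_cycles=[4, 9], frame_len=FRAME)
```

Only two numbers per frame are transmitted in this toy case, yet the level of each band in the restored signal follows the transmitted envelope, which is the source of the expected reduction in transmitted information.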

(5) In the embodiment described above, the envelope signal generating means consists of a plurality of bandpass filters and an effective value calculation circuit connected to each of them. However, the envelope signal generating means may of course instead be implemented by a circuit that applies a wavelet transform to the input signal. This has the advantage that the frequency bandwidth and time resolution used in extracting the envelope signals can be set rationally according to the characteristics of each acoustic signal to be mixed and separated.

(6) The embodiment described above does not specify the signal format of the acoustic signals mixed and separated in the sound source separation system 10, but sound source separation can also be combined with a musical sound coding scheme such as MPEG-1 Layer III (hereinafter "MP3"). MP3 is a musical sound coding scheme implemented by a sequence of steps such as filter bank analysis of the acoustic signal, followed by an orthogonal transform (in MP3, the MDCT, or Modified Discrete Cosine Transform, is typically applied), and so on (see, for example, the encoder and decoder configuration shown in FIG. 8). To apply the sound source separation according to the present invention to an acoustic signal encoded according to MP3, the control signal described above may be embedded in the DCT coefficients (MDCT coefficients) obtained by the orthogonal transform. In most musical sound coding schemes, the bit allocation for the nonlinear quantization process of FIG. 3 can be performed dynamically; if the encoder is made to allocate the minimum number of bits needed to decode the control signal and the control signal is embedded in the bits so allocated, the control signal can be encoded more efficiently. The band in which the control signal is embedded and the number of bits allocated to it may be determined as appropriate by experiment.
If the decoder does not support the embedded control signal (for example, a decoder having the configuration shown in FIG. 8), the embedded control signal may distort the reproduced sound. However, this kind of distortion is similar to that which occurs at low bit rates and is relatively familiar to users of MP3 terminals, so even if such distortion occurs it will not cause the user extreme discomfort.

(7) In the embodiment described above, the encoder 100 and the decoder 200 according to the present invention are configured by combining hardware modules that each perform a specific function. However, a program that causes a CPU (Central Processing Unit) to implement the same functions as the hardware modules of the encoder 100 may of course be installed on a general-purpose computer and the CPU operated according to that program, thereby giving the computer the same functions as the encoder according to the present invention. Similarly, a program that causes a CPU to implement the same functions as the hardware modules of the decoder 200 may be installed on a general-purpose computer and the CPU operated according to that program, thereby giving the computer the same functions as the decoder according to the present invention. Such programs may be distributed by writing them to a CD-ROM (Compact Disc Read-Only Memory) or by download via a telecommunication line such as the Internet.

FIG. 1 is a diagram showing a configuration example of the sound source separation system 10 according to an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of the encoder 100.
FIG. 3 is a diagram showing a configuration example of the effective value detection circuit 122.
FIG. 4 is a diagram showing a configuration example of the modulation processing circuit 160.
FIG. 5 is a diagram showing a configuration example of the decoder 200.
FIG. 6 is a diagram showing a configuration example of the separation means 210.
FIG. 7 is a diagram showing a configuration example of the decoder according to modification (3).
FIG. 8 is a diagram showing a configuration example of an MP3 encoder and decoder, relating to modification (6).

Explanation of symbols

100: encoder; 200: decoder; 300: communication network.

Claims

A sound source separation system comprising an encoder that outputs a mixed signal obtained by mixing acoustic signals output from each of a plurality of different sound sources at a predetermined volume ratio, and a decoder that separates, from the mixed signal, the acoustic signal corresponding to each of the plurality of sound sources and outputs it, wherein
the encoder comprises:
envelope signal generating means for identifying, for each of the acoustic signals output from the plurality of sound sources, its envelope in each of a plurality of mutually different frequency bands, and generating an envelope signal indicating the envelope; and
first output means for outputting the envelope signal, and
the decoder comprises:
carrier signal generating means for generating each subband signal of the mixed signal output from the encoder in the plurality of frequency bands as the carrier signal for that frequency band; and
second output means for generating and outputting the acoustic signal corresponding to each of the plurality of sound sources by adjusting the amplitude of the carrier signal for each frequency band according to the signal value, in that frequency band, of the envelope signal corresponding to the sound source, and then adding the amplitude-adjusted carrier signals together.

The sound source separation system according to claim 1, wherein
the first output means applies, to the envelope signal generated by the envelope signal generating means, a modulation that shifts its frequency to a frequency outside the audible band, and then adds the result to the mixed signal and outputs it, and
the decoder comprises:
a first filter whose passband upper limit coincides with the upper limit of the audible band;
a second filter whose stopband upper limit coincides with the upper limit of the audible band; and
separation means for separating the mixed signal and the envelope signal from the signal output from the encoder by splitting that signal into two branches and passing one through the first filter and the other through the second filter.

The sound source separation system according to claim 1, wherein
the envelope signal generating means includes
a first filter group consisting of a plurality of bandpass filters each having a passband that coincides with one of the plurality of frequency bands,
and identifies the envelope of each acoustic signal in each frequency band by calculating the effective value of the signal obtained by passing each of the acoustic signals output from the plurality of sound sources through each of the plurality of bandpass filters, and
the carrier signal generating means includes
a second filter group having the same configuration as the first filter group,
and generates the carrier signal for each frequency band by passing the mixed signal output from the encoder through each of the bandpass filters constituting the second filter group.

The sound source separation system according to claim 1, wherein
the envelope signal generating means identifies the envelope of each acoustic signal in each of the plurality of frequency bands by applying a wavelet transform to each of the acoustic signals output from the plurality of sound sources.

The sound source separation system according to claim 1, wherein
the carrier signal generating means includes an oscillator that generates band noise belonging to each of the plurality of frequency bands or a sine wave at a frequency belonging to that frequency band, and
for each of the plurality of sound sources, determines for each frequency band whether the signal value of the envelope signal corresponding to the sound source is smaller than a predetermined threshold, uses the signal generated by the oscillator as the carrier signal for a frequency band in which the value is determined to be smaller than the threshold, and uses the component of the mixed signal in that frequency band as the carrier signal for a frequency band in which the value is equal to or greater than the threshold.

An encoder comprising:
mixed signal generating means for generating a mixed signal by adding together, at a predetermined volume ratio, the acoustic signals output from a plurality of sound sources;
envelope signal generating means for identifying, for each of the acoustic signals output from the plurality of sound sources, its envelope in each of a plurality of mutually different frequency bands, and generating an envelope signal indicating the envelope; and
output means for outputting the mixed signal and the envelope signal.

A decoder comprising:
carrier signal generating means for receiving a mixed signal obtained by mixing acoustic signals output from each of a plurality of different sound sources at a predetermined volume ratio, and generating the subband signal of the mixed signal in each of a plurality of mutually different frequency bands as the carrier signal for that frequency band; and
output means for receiving an envelope signal indicating the envelope of each of the acoustic signals in the plurality of frequency bands, and generating and outputting the acoustic signal corresponding to each of the plurality of sound sources by adjusting the amplitude of the carrier signal for each frequency band according to the signal value, in that frequency band, of the envelope signal corresponding to the sound source, and then adding the amplitude-adjusted carrier signals together.