JP2014137389A - Acoustic analyzer

Info

Publication number: JP2014137389A
Application number: JP2013004375A
Authority: JP
Inventors: Daichi Kitamura; 大地北村; Hiroshi Saruwatari; 洋猿渡; Yu Takahashi; 祐高橋
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2013-01-15
Filing date: 2013-01-15
Publication date: 2014-07-28

Abstract

PROBLEM TO BE SOLVED: To separate a specific acoustic component of an acoustic signal with high accuracy. SOLUTION: A storage device 24 stores a known teacher matrix F including a plurality of basis vectors, each representing a spectrum of a teacher acoustic component whose acoustic characteristics are close to those of a target sound component. A matrix analysis unit 34 performs non-negative matrix factorization on an acoustic signal SA(t) to calculate: a basis matrix B, which is the sum of the teacher matrix F and a compensation matrix D and includes a plurality of basis vectors each representing a spectrum of the target sound component of the acoustic signal SA(t); a coefficient matrix G including a plurality of coefficient vectors each representing the temporal change of the weight for a basis vector of the basis matrix B; a basis matrix H including a plurality of basis vectors each representing a spectrum of a non-target sound component other than the target sound component of the acoustic signal SA(t); and a coefficient matrix U including a plurality of coefficient vectors each representing the temporal change of the weight for a basis vector of the basis matrix H.

Description

The present invention relates to a technique for separating (for example, extracting or suppressing) a specific acoustic component from an acoustic signal representing a mixed sound of a plurality of acoustic components.

Sound source separation techniques for extracting or suppressing a specific acoustic component from a mixed sound of a plurality of acoustic components having different acoustic characteristics have been proposed. For example, Non-Patent Document 1 and Non-Patent Document 2 disclose unsupervised sound source separation using non-negative matrix factorization (NMF).

In the techniques of Non-Patent Document 1 and Non-Patent Document 2, as shown in FIG. 9, an observation matrix Y representing the amplitude spectrogram of an observed sound in which a plurality of acoustic components are mixed is decomposed by non-negative matrix factorization into a basis matrix H and a coefficient matrix (activation matrix) U. The basis matrix H is composed of a plurality of basis vectors h, each representing the spectrum of an acoustic component included in the observed sound, and the coefficient matrix U is composed of a plurality of coefficient vectors u, each representing the temporal change of the weight for a basis vector. The basis vectors h of the basis matrix H and the coefficient vectors u of the coefficient matrix U are sorted by acoustic component (by sound source), and the basis vectors h and coefficient vectors u of a desired acoustic component are extracted and multiplied to generate the amplitude spectrogram of that acoustic component.
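The factorization of FIG. 9 can be sketched as follows. This is a minimal illustration that assumes the standard multiplicative updates for the Euclidean cost; the cited documents define their own algorithms and cost functions. The function name `nmf`, the number of bases, the iteration count, and the index list `idx` are hypothetical.

```python
import numpy as np

def nmf(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factor a non-negative M x N matrix Y into H (M x n_bases) and U (n_bases x N)."""
    rng = np.random.default_rng(seed)
    M, N = Y.shape
    H = rng.random((M, n_bases)) + eps   # basis vectors h (columns)
    U = rng.random((n_bases, N)) + eps   # coefficient vectors u (rows)
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        U *= (H.T @ Y) / (H.T @ H @ U + eps)
        H *= (Y @ U.T) / (H @ U @ U.T + eps)
    return H, U

# Toy observation matrix standing in for an amplitude spectrogram.
rng = np.random.default_rng(1)
Y = rng.random((16, 3)) @ rng.random((3, 40))
H, U = nmf(Y, n_bases=3)
rel_err = np.linalg.norm(Y - H @ U) / np.linalg.norm(Y)

# Hypothetical clustering result: bases 0 and 2 are assigned to the desired source.
idx = [0, 2]
Y_src = H[:, idx] @ U[idx, :]   # amplitude spectrogram of that source
```

Choosing `idx` correctly is the clustering step that the unsupervised approach leaves open.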

Non-Patent Document 1: A. Cichocki et al., "New Algorithms for Non-Negative Matrix Factorization in Applications to Blind Source Separation," ICASSP 2006
Non-Patent Document 2: Tuomas Virtanen, "Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria," IEEE Trans. Audio, Speech, and Language Processing, vol. 15, pp. 1066-1074, 2007

However, with the techniques of Non-Patent Document 1 and Non-Patent Document 2, it is difficult to accurately sort (cluster) the basis vectors h of the basis matrix H and the coefficient vectors u of the coefficient matrix U by acoustic component of the acoustic signal. In practice, therefore, it is not easy to extract or suppress only a specific acoustic component of the acoustic signal with high accuracy. In view of the above circumstances, an object of the present invention is to separate a specific acoustic component of an acoustic signal with high accuracy.

In order to solve the above problem, the acoustic analysis device of the present invention comprises matrix analysis means for calculating, by non-negative matrix factorization of an acoustic signal: a first basis matrix (for example, basis matrix B), which is a matrix obtained by adding a compensation matrix (for example, compensation matrix D) to a known teacher matrix (for example, teacher matrix F) including a plurality of basis vectors representing the spectrum of a teacher acoustic component, and which includes a plurality of basis vectors representing the spectrum of a target sound component of the acoustic signal; a first coefficient matrix (for example, coefficient matrix G) including a plurality of coefficient vectors representing the temporal change of the weight for each basis vector of the first basis matrix; a second basis matrix (for example, basis matrix H) including a plurality of basis vectors representing the spectrum of a non-target sound component other than the target sound component of the acoustic signal; and a second coefficient matrix (for example, coefficient matrix U) including a plurality of coefficient vectors representing the temporal change of the weight for each basis vector of the second basis matrix. In the above configuration, non-negative matrix factorization is executed using an unknown compensation matrix that shares the first coefficient matrix with the known teacher matrix representing the teacher acoustic component. This has the advantage that the target sound component can be separated with high accuracy even when the acoustic characteristics of the target sound component of the acoustic signal do not completely match those of the teacher acoustic component.

An acoustic analysis device according to a preferred aspect of the present invention comprises sound source separation means for generating at least one of a target sound component corresponding to the first basis matrix and the first coefficient matrix and a non-target sound component corresponding to the second basis matrix and the second coefficient matrix. In this aspect, the target sound component and the non-target sound component of the acoustic signal can be separated (extracted or suppressed).

In a preferred aspect of the present invention, the matrix analysis means executes the non-negative matrix factorization under a constraint condition that the similarity between the teacher matrix and the compensation matrix decreases (for example, constraint condition C1 described later). In this aspect, because the non-negative matrix factorization is executed so that the similarity between the teacher matrix and the compensation matrix decreases, it is possible to prevent the teacher matrix and the compensation matrix from coinciding (that is, the part of the target sound component that differs from the teacher acoustic component can be extracted as the compensation matrix).

In a preferred aspect of the present invention, the matrix analysis means executes the non-negative matrix factorization under a constraint condition that the similarity between the teacher matrix and the second basis matrix decreases (for example, constraint condition C2 described later). This aspect has the advantage of preventing the teacher matrix and the second basis matrix from coinciding (that is, preventing components of the target sound component that approximate the teacher acoustic component from being included in the second basis matrix).

In a preferred aspect of the present invention, the matrix analysis means executes the non-negative matrix factorization under a constraint condition that the similarity between the compensation matrix and the second basis matrix decreases (for example, constraint condition C3 described later). This aspect has the advantage of preventing the compensation matrix and the second basis matrix from coinciding (that is, preventing components of the target sound component that differ from the teacher acoustic component from being included in the second basis matrix).

In a preferred aspect of the present invention, the matrix analysis means executes the non-negative matrix factorization under a constraint condition that the similarity between the first basis matrix and the second basis matrix decreases (for example, constraint condition C4 described later). This aspect has the advantage of preventing the first basis matrix and the second basis matrix from coinciding (that is, the target sound component and the non-target sound component can be clearly distinguished).

In the constraint conditions of the above aspects, "the similarity between matrices decreases" covers both a decrease (ideally, minimization) in the correlation between the matrices and an increase (ideally, maximization) in the distance between the matrices.
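The two readings of "similarity" can be illustrated numerically. The measures below (sum of squared inner products between basis vectors as the correlation, squared Frobenius norm of the difference as the distance) are assumptions chosen for illustration; the concrete form of constraint conditions C1 to C4 is described later in the document and may differ. The function names and matrices are hypothetical.

```python
import numpy as np

def correlation(A, B):
    """Sum of squared inner products between the columns of A and the columns of B."""
    return float(np.sum((A.T @ B) ** 2))

def distance(A, B):
    """Squared Frobenius norm of the element-wise difference."""
    return float(np.sum((A - B) ** 2))

# Two basis vectors (columns) over three frequency bins.
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
D_similar = F.copy()                      # identical spectra
D_dissimilar = np.array([[0.0, 0.0],
                         [0.0, 0.0],
                         [1.0, 1.0]])     # energy in a bin that F does not use

corr_similar = correlation(F, D_similar)
corr_dissimilar = correlation(F, D_dissimilar)
dist_similar = distance(F, D_similar)
dist_dissimilar = distance(F, D_dissimilar)
```

Under either measure, the dissimilar pair is the one a similarity-reducing constraint would favor: its correlation is lower and its distance is larger.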

In a preferred aspect of the present invention, the compensation matrix includes negative elements, and the matrix analysis means executes the non-negative matrix factorization under a constraint condition that the first basis matrix is a non-negative matrix (for example, constraint condition C5 described later). In this aspect, because the compensation matrix may include negative elements, the target sound component can be separated with high accuracy even when the intensity (for example, amplitude) of the spectrum of the target sound component is lower than that of the spectrum of the teacher acoustic component. In addition, because the constraint that the first basis matrix is a non-negative matrix is taken into account, the non-negative matrix factorization of the observation matrix can be executed appropriately. A specific example of this aspect is described later, for example as the second embodiment.
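One simple way to picture a compensation matrix with negative elements while keeping the first basis matrix non-negative is an element-wise projection. This is only an illustrative assumption; how constraint condition C5 is actually imposed is described later with the second embodiment. All values below are made up.

```python
import numpy as np

F = np.array([[1.0, 0.5],
              [0.2, 0.0]])           # known teacher matrix (non-negative)
D = np.array([[-0.4, 0.3],
              [-0.5, -0.1]])         # candidate compensation matrix with negative entries

# Project D so that B = F + D has no negative element: D[m, k] >= -F[m, k].
D_proj = np.maximum(D, -F)
B = F + D_proj
```

Entries such as `D[0, 0]`, which only lower the teacher spectrum without making it negative, pass through unchanged; entries that would push `B` below zero are clipped.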

The acoustic analysis device according to each of the above aspects can be realized by hardware (an electronic circuit) such as a DSP (Digital Signal Processor) dedicated to the analysis of acoustic signals, or by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. The program of the present invention causes a computer to execute matrix analysis processing that calculates, by non-negative matrix factorization of an acoustic signal: a first basis matrix, which is a matrix obtained by adding a compensation matrix to a non-negative teacher matrix including a plurality of basis vectors representing the spectrum of a teacher acoustic component, and which includes a plurality of basis vectors representing the spectrum of a target sound component of the acoustic signal; a first coefficient matrix including a plurality of coefficient vectors representing the temporal change of the weight for each basis vector of the first basis matrix; a second basis matrix including a plurality of basis vectors representing the spectrum of a non-target sound component other than the target sound component of the acoustic signal; and a second coefficient matrix including a plurality of coefficient vectors representing the temporal change of the weight for each basis vector of the second basis matrix. The program of the present invention can be provided in a form stored in a computer-readable recording medium and installed in a computer. The recording medium is, for example, a non-transitory recording medium, of which an optical recording medium (optical disc) such as a CD-ROM is a good example, but it may be any known type of recording medium such as a semiconductor recording medium or a magnetic recording medium. The program of the present invention can also be provided, for example, in the form of distribution via a communication network and installed in a computer.

FIG. 1 is a block diagram of an acoustic analysis device according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram of a teacher matrix.
FIG. 3 is an explanatory diagram of the operation of a matrix analysis unit.
FIG. 4 is an explanatory diagram of the relationship among a teacher matrix, a compensation matrix, and a basis matrix.
FIG. 5 is an explanatory diagram of the problem solved by a second embodiment.
FIG. 6 is a chart of experimental results of the second embodiment.
FIG. 7 is an explanatory diagram of the effect of the second embodiment.
FIG. 8 is an explanatory diagram of a teacher matrix and a compensation matrix according to a modification.
FIG. 9 is an explanatory diagram of non-negative matrix factorization in the background art.

<First Embodiment>
FIG. 1 is a block diagram of an acoustic analysis device 100 according to the first embodiment of the present invention. As shown in FIG. 1, a signal supply device 12 and a sound emitting device 14 are connected to the acoustic analysis device 100. The signal supply device 12 supplies an acoustic signal SA(t) to the acoustic analysis device 100. The acoustic signal SA(t) is a time-domain signal representing the waveform of a mixed sound of a plurality of acoustic components (for example, musical tones and voices) having different acoustic characteristics (t: time). For example, an acoustic signal SA(t) representing a mixed sound of performance sounds of a plurality of types of musical instruments (including voices such as singing) is supplied to the acoustic analysis device 100. A sound collection device that picks up ambient sound and generates the acoustic signal SA(t), a playback device that acquires the acoustic signal SA(t) from a portable or built-in recording medium and supplies it to the acoustic analysis device 100, or a communication device that receives the acoustic signal SA(t) from a communication network and supplies it to the acoustic analysis device 100 can be employed as the signal supply device 12.

The acoustic analysis device 100 of the first embodiment is an acoustic processing device (sound source separation device) that generates an acoustic signal SB(t) by acoustic processing of the acoustic signal SA(t) supplied from the signal supply device 12. The acoustic signal SB(t) is a time-domain signal representing the waveform of a sound in which a specific acoustic component (hereinafter referred to as the "target sound component") among the plurality of acoustic components included in the acoustic signal SA(t) has been extracted (that is, a sound in which the non-target sound components other than the target sound component have been suppressed). The sound emitting device 14 (for example, a speaker or headphones) emits sound waves corresponding to the acoustic signal SB(t) supplied from the acoustic analysis device 100. The D/A converter that converts the acoustic signal SB(t) from digital to analog is omitted from the figure for convenience.

As shown in FIG. 1, the acoustic analysis device 100 is realized by a computer system comprising an arithmetic processing device 22 and a storage device 24. The storage device 24 stores a program PGM executed by the arithmetic processing device 22 and various data used by the arithmetic processing device 22. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of a plurality of types of recording media, can be freely employed as the storage device 24. A configuration in which the acoustic signal SA(t) is stored in the storage device 24 (so that the signal supply device 12 can be omitted) is also suitable.

The storage device 24 of the first embodiment stores a basis matrix F (hereinafter referred to as the "teacher matrix") that represents acoustic characteristics corresponding to (for example, approximating or matching) the target sound component. The acoustic analysis device 100 generates the acoustic signal SB(t) from the acoustic signal SA(t) by supervised sound source separation that uses the teacher matrix F stored in the storage device 24 as prior information (teacher information).

The teacher matrix F is generated in advance from an acoustic component produced by a known sound source (hereinafter referred to as the "teacher acoustic component") and stored in the storage device 24. The teacher acoustic component is an acoustic component whose acoustic characteristics (for example, timbre) approximate or are common to those of the target sound component. For example, the teacher acoustic component and the target sound component correspond to performance sounds of musical instruments that are of the same type (for example, share the same principle of sound production) but are different individual instruments (for example, performance sounds of different models of flute). Accordingly, the acoustic characteristics of the teacher acoustic component and those of the target sound component of the acoustic signal SA(t) approximate each other to the extent that a listener perceives them as performance sounds of the same type of instrument, but they do not completely match. Put differently, the target sound component is the acoustic component whose acoustic characteristics approximate those of the teacher acoustic component represented by the teacher matrix F prepared in advance. The non-target sound component, on the other hand, differs in acoustic characteristics from the teacher acoustic component. For example, the teacher acoustic component and the non-target sound component correspond to performance sounds of different types of musical instruments, and their acoustic characteristics differ to the extent that a listener perceives them as performance sounds of different types of instruments. For example, the target sound component and the teacher acoustic component correspond to performance sounds of a flute, and the non-target sound component corresponds to the performance sound of a clarinet.

FIG. 2 is an explanatory diagram of the process of generating the teacher matrix F from the teacher acoustic component. The observation matrix X in FIG. 2 is a non-negative matrix of M rows and N columns representing the time series of amplitude spectra (the amplitude spectrogram) of N frames obtained by dividing a pre-recorded teacher acoustic component on the time axis (M and N are natural numbers). That is, the n-th column (n = 1 to N) of the observation matrix X corresponds to the amplitude spectrum x[n] of the n-th frame of the teacher acoustic component. The element in the m-th row (m = 1 to M) of the amplitude spectrum x[n] is the amplitude value at the m-th of M frequencies set on the frequency axis.

The observation matrix X in FIG. 2 is decomposed by non-negative matrix factorization (NMF) into a teacher matrix F and a coefficient matrix (activation matrix) Q, as expressed by the following Equation (1).

$$X \approx FQ \tag{1}$$

As shown in FIG. 2, the teacher matrix F of Equation (1) is a non-negative matrix (basis matrix) of M rows and K columns in which K basis vectors f[1] to f[K], corresponding to the components constituting the teacher acoustic component, are arranged in the horizontal direction. The basis vector f[k] in the k-th column (k = 1 to K) of the teacher matrix F corresponds to the amplitude spectrum of the k-th of the K components (bases) constituting the teacher acoustic component. That is, the element in the m-th row of the basis vector f[k] (the element in the m-th row and k-th column of the teacher matrix F) is the amplitude value at the m-th frequency on the frequency axis in the amplitude spectrum of the k-th component of the teacher acoustic component.

As shown in FIG. 2, the coefficient matrix Q of Equation (1) is a non-negative matrix of K rows and N columns in which K coefficient vectors q[1] to q[K], corresponding to the respective basis vectors f[k] of the teacher matrix F, are arranged in the vertical direction. The coefficient vector q[k] in the k-th row of the coefficient matrix Q corresponds to the time series of weights (activations) for the basis vector f[k] of the teacher matrix F. As understood from the above description, each amplitude spectrum x[n] of the observation matrix X is expressed as a weighted sum of the K basis vectors f[1] to f[K] of the teacher matrix F.
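The weighted-sum reading of the product FQ can be checked on a small example. The matrix values below are made up for illustration, and indices are zero-based in the code whereas the text counts from 1.

```python
import numpy as np

# Teacher matrix F: M = 5 frequency bins, K = 2 basis vectors (columns f[1], f[2]).
F = np.array([[1.0, 0.0],
              [0.5, 0.0],
              [0.0, 1.0],
              [0.0, 0.5],
              [0.0, 0.0]])
# Coefficient matrix Q: K = 2 coefficient vectors (rows q[1], q[2]) over N = 3 frames.
Q = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])

X_hat = F @ Q   # M x N approximation of the observation matrix X

# Column n of F Q is the sum of the basis vectors weighted by column n of Q.
n = 2
x_n = sum(Q[k, n] * F[:, k] for k in range(F.shape[1]))
```

In the third frame both bases are active, so `x_n` is twice the first basis vector plus once the second.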

The teacher matrix F and the coefficient matrix Q are calculated so that the matrix FQ obtained by multiplying them approximates the observation matrix X (that is, so that the similarity between the matrix FQ and the observation matrix X increases), and the teacher matrix F is then stored in the storage device 24. Each of the K basis vectors f[1] to f[K] of the teacher matrix F corresponds, roughly speaking, to a different pitch. That is, the teacher acoustic component used to generate the teacher matrix F is produced so as to include all pitches that can be expected in the target sound component of the acoustic signal SA(t), and the total number K of basis vectors f[k] of the teacher matrix F (the number of bases) is set to a value equal to or greater than the total number of pitches that can be expected in the target sound component of the acoustic signal SA(t). The above is the procedure for generating the teacher matrix F.
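The training step can be sketched as follows. The cost function is not specified in this passage, so the sketch assumes the generalized Kullback-Leibler divergence with its standard multiplicative updates; the function name `train_teacher_matrix`, the iteration count, and the toy data are hypothetical.

```python
import numpy as np

def train_teacher_matrix(X, n_bases, n_iter=300, eps=1e-12, seed=0):
    """Fit X ~ F Q with non-negative F (M x K) and Q (K x N); return both."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    F = rng.random((M, n_bases)) + eps
    Q = rng.random((n_bases, N)) + eps
    for _ in range(n_iter):
        R = X / (F @ Q + eps)                              # element-wise ratio X / (F Q)
        F *= (R @ Q.T) / (Q.sum(axis=1)[None, :] + eps)
        R = X / (F @ Q + eps)
        Q *= (F.T @ R) / (F.sum(axis=0)[:, None] + eps)
    return F, Q

# Toy amplitude spectrogram of a "teacher" recording containing K = 4 components.
rng = np.random.default_rng(3)
X = rng.random((24, 4)) @ rng.random((4, 60))
F, Q = train_teacher_matrix(X, n_bases=4)
rel_err = np.linalg.norm(X - F @ Q) / np.linalg.norm(X)
```

Only `F` would be kept as prior information; `Q` describes the training recording and is not needed afterwards.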

By executing the program PGM stored in the storage device 24, the arithmetic processing device 22 in FIG. 1 realizes a plurality of functions (a frequency analysis unit 32, a matrix analysis unit 34, and a sound source separation unit 36) for generating the acoustic signal SB(t) from the acoustic signal SA(t). The processing by each element of the arithmetic processing device 22 is repeated sequentially in units of N frames obtained by dividing the acoustic signal SA(t) on the time axis. A configuration in which the functions of the arithmetic processing device 22 are distributed over a plurality of integrated circuits, or a configuration in which a dedicated electronic circuit (for example, a DSP) realizes some of the functions, can also be employed.

FIG. 3 is an explanatory diagram of the processing by the frequency analysis unit 32 and the matrix analysis unit 34. The frequency analysis unit 32 sequentially generates the observation matrix Y in FIG. 3 in units of N frames of the acoustic signal SA(t). As shown in FIG. 3, the observation matrix Y is a non-negative matrix of M rows and N columns representing the time series (amplitude spectrogram) of the amplitude spectra y[1] to y[N] of N frames obtained by dividing the acoustic signal SA(t) on the time axis. That is, the n-th column of the observation matrix Y corresponds to the amplitude spectrum y[n] (the series of amplitude values at each of the M frequencies) of the n-th frame of the acoustic signal SA(t). A known frequency analysis such as the short-time Fourier transform is used to generate the observation matrix Y. The time series of the power spectrum of each frame of the acoustic signal SA(t) can also be used as the observation matrix Y.
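A minimal sketch of the frequency analysis, assuming a Hann window and a plain NumPy short-time Fourier transform. The frame length, hop size, sampling rate, and test signal are hypothetical choices that the text does not specify.

```python
import numpy as np

def amplitude_spectrogram(s, frame_len=1024, hop=512):
    """Return the M x N amplitude spectrogram of signal s (M = frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(s) - frame_len) // hop
    frames = np.stack(
        [s[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )                                            # frame_len x N
    return np.abs(np.fft.rfft(frames, axis=0))   # one amplitude spectrum per column

# Toy mixture: one second of two sinusoids at a 16 kHz sampling rate.
sr = 16000
t = np.arange(sr) / sr
s_a = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

Y = amplitude_spectrogram(s_a)                   # observation matrix Y
peak_bin = int(np.argmax(Y.mean(axis=1)))        # strongest frequency bin on average
```

Squaring the entries (`Y ** 2`) would give the power-spectrum variant mentioned above.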

The matrix analysis unit 34 in FIG. 1 executes non-negative matrix factorization (NMF) on the observation matrix Y using the known teacher matrix F stored in the storage device 24 as prior information. The matrix analysis unit 34 of the first embodiment decomposes the observation matrix Y generated by the frequency analysis unit 32 into a basis matrix B, a coefficient matrix G, a basis matrix H, and a coefficient matrix U, as expressed by the following Equation (2).

$$Y \approx BG + HU = (F + D)G + HU \tag{2}$$

As shown in FIG. 3, the basis matrix B of Equation (2) is a non-negative matrix of M rows and K columns in which K basis vectors b[1] to b[K] are arranged in the horizontal direction. The coefficient matrix G of Equation (2) is, as shown in FIG. 3, a non-negative matrix of K rows and N columns in which K coefficient vectors g[1] to g[K], corresponding to the respective basis vectors b[k] of the basis matrix B, are arranged in the vertical direction. The coefficient vector g[k] in the k-th row of the coefficient matrix G corresponds to the time series of weights (activations) for the basis vector b[k] of the basis matrix B. That is, the element in the n-th column of the coefficient vector g[k] is the weight of the basis vector b[k] (= f[k] + d[k]) in the n-th of the N frames of the acoustic signal SA(t).

As expressed by Equation (2), the basis matrix B is the sum of the known teacher matrix F stored in the storage device 24 and an unknown compensation matrix D. As described above, the teacher matrix F is a non-negative matrix of M rows and K columns in which K basis vectors f[1] to f[K], corresponding to the components constituting the teacher acoustic component, are arranged in the horizontal direction. The compensation matrix D, as shown in FIG. 3, is a non-negative matrix of M rows and K columns in which K basis vectors d[1] to d[K] are arranged in the horizontal direction. As shown in Equation (2) and FIG. 3, the basis vector b[k] in the k-th column of the basis matrix B is the sum of the basis vector f[k] in the k-th column of the teacher matrix F and the basis vector d[k] in the k-th column of the compensation matrix D.

As understood from equation (2), the teacher matrix F and the compensation matrix D share a common coefficient matrix G. That is, the basis vector d[k] of the compensation matrix D corresponds to an amplitude spectrum that is excited at the same time points and to the same degree (one weight value in the coefficient vector g[k]) as the basis vector f[k] of the teacher matrix F. It follows that the basis vector f[k] of the teacher matrix F and the basis vector d[k] of the compensation matrix D correspond to amplitude spectra of acoustic components from a common sound source. As described above, the teacher matrix F is generated from a teacher acoustic component whose acoustic characteristics approximate those of the target sound component (for example, an acoustic component whose type of sound source is the same as that of the target sound component). Therefore, as shown in FIG. 4, the basis vector b[k] of the basis matrix B, obtained by adding the basis vector f[k] of the teacher matrix F and the basis vector d[k] of the compensation matrix D, corresponds to the amplitude spectrum Ab of the target sound component, that is, the component of the acoustic signal SA(t) whose acoustic characteristics approximate those of the teacher acoustic component (for example, an acoustic component whose type of sound source is the same as that of the teacher acoustic component). In other words, as FIG. 4 also shows, the basis vector d[k] of the compensation matrix D can be understood as the amplitude spectrum Ad of the difference (residual) between the amplitude spectrum Ab of the target sound component and the amplitude spectrum Af of the teacher acoustic component. Consequently, when the acoustic characteristics of the target sound component in the acoustic signal SA(t) completely match those of the teacher acoustic component, the compensation matrix D is a zero matrix. The basis vector d[k] of the compensation matrix D can also be described as an element that compensates for (absorbs) the difference between the amplitude spectrum Ab of the target sound component and the amplitude spectrum Af of the teacher acoustic component, by deforming the amplitude spectrum Af of the basis vector f[k] of the teacher matrix F so that it approximates (ideally matches) the amplitude spectrum Ab of the target sound component in the acoustic signal SA(t).

As described above, each basis vector b[k] of the basis matrix B corresponds to the amplitude spectrum Ab of the target sound component, so the matrix BG in equation (2), the product of the basis matrix B and the coefficient matrix G, represents the amplitude spectrogram of the target sound component in the acoustic signal SA(t). Accordingly, the matrix HU in equation (2), the product of the basis matrix H and the coefficient matrix U, is a non-negative matrix of M rows and N columns representing the amplitude spectrogram of the non-target sound component, that is, the part of the acoustic signal SA(t) other than the target sound component.
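The dimensions involved in equation (2) can be checked with a short NumPy sketch. The sizes M, N, K and R and the random matrix contents are illustrative assumptions only; they are not values from this description.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K, R = 6, 10, 3, 2  # frequencies, frames, target bases, non-target bases (illustrative)

F = rng.random((M, K))  # known teacher matrix (non-negative)
D = rng.random((M, K))  # compensation matrix (non-negative in the first embodiment)
G = rng.random((K, N))  # coefficient matrix shared by F and D
H = rng.random((M, R))  # basis matrix of the non-target sound component
U = rng.random((R, N))  # coefficient matrix for H

B = F + D                        # basis matrix of the target sound component
target = B @ G                   # amplitude spectrogram of the target component (M x N)
non_target = H @ U               # amplitude spectrogram of the non-target component (M x N)
Y_model = target + non_target    # right-hand side of equation (2), approximating Y
```

Because F and D share G, the target spectrogram equals F @ G + D @ G, which is why each d[k] is activated at the same frames as the corresponding f[k].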

As shown in FIG. 3, the basis matrix H is a non-negative matrix of M rows and R columns in which R basis vectors h[1] to h[R], corresponding to the components that make up the non-target sound component of the acoustic signal SA(t), are arranged in the horizontal direction. The basis vector h[r] in the r-th column (r = 1 to R) of the basis matrix H corresponds to the amplitude spectrum of the r-th of the R components that make up the non-target sound component of the acoustic signal SA(t). That is, the element in the m-th row of the basis vector h[r] is the amplitude value at the m-th frequency on the frequency axis in the amplitude spectrum of the r-th component of the non-target sound component of the acoustic signal SA(t). The number of columns K of the basis matrix B and the number of columns R of the basis matrix H may be equal or different.

As shown in FIG. 3, the coefficient matrix U in equation (2) is a non-negative matrix of R rows and N columns in which R coefficient vectors u[1] to u[R], one for each basis vector h[r] of the basis matrix H, are arranged in the vertical direction. The coefficient vector u[r] in the r-th row of the coefficient matrix U corresponds to a time series of weight values for the basis vector h[r] of the basis matrix H. That is, the element in the n-th column of the coefficient vector u[r] is the magnitude (weight value) of the basis vector h[r] in the n-th of the N frames of the acoustic signal SA(t). Therefore, as described above, the matrix HU in equation (2), the product of the basis matrix H and the coefficient matrix U, represents the amplitude spectrogram of the non-target sound component in the acoustic signal SA(t). Note that the non-target sound component may include plural acoustic components produced by different sound sources (sound sources of a different type from the sound source of the target sound component).

The matrix analysis unit 34 in FIG. 1 calculates the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U by non-negative matrix factorization using the observation matrix Y, which the frequency analysis unit 32 generates from the acoustic signal SA(t), and the known teacher matrix F. As in equation (2), the factorization is performed so that the sum of the target sound component matrix (F+D)G and the non-target sound component matrix HU approximates the observation matrix Y of the acoustic signal SA(t) (that is, so that the difference between the two is minimized). The non-negative matrix factorization of the first embodiment introduces the following constraint conditions C1 to C4, each of which requires the similarity between a pair of matrices to decrease (ideally, to be minimized).
C1: The similarity between the teacher matrix F and the compensation matrix D decreases.
C2: The similarity between the teacher matrix F and the basis matrix H decreases.
C3: The similarity between the compensation matrix D and the basis matrix H decreases.
C4: The similarity between the basis matrix B (B = F + D) and the basis matrix H decreases.

The constraint condition C1 is a condition for extracting, as the compensation matrix D, the part of the target sound component whose acoustic characteristics differ from those of the teacher acoustic component (teacher matrix F). That is, introducing the constraint condition C1 avoids a situation in which equation (2) holds while the teacher matrix F and the compensation matrix D are alike (a state in which the basis matrix B reflects only the teacher acoustic component). The first embodiment uses, as an example of the constraint condition C1, minimization of the correlation λ(F|D) between the teacher matrix F and the compensation matrix D (minimize λ(F|D)). The Frobenius norm ‖F^T D‖_Fr of the correlation matrix F^T D of the teacher matrix F and the compensation matrix D (the symbol T denotes matrix transposition) is suitable as the correlation λ(F|D).

The constraint condition C2 is a condition for extracting, as the basis matrix H, a non-target sound component whose acoustic characteristics differ from those of the teacher acoustic component represented by the teacher matrix F. That is, introducing the constraint condition C2 avoids a situation in which equation (2) holds while the teacher matrix F and the basis matrix H are alike. The first embodiment uses, as an example of the constraint condition C2, minimization of the correlation λ(F|H) between the teacher matrix F and the basis matrix H (for example, λ(F|H) = ‖F^T H‖_Fr).

The constraint condition C3 is a condition for making the acoustic characteristics of the acoustic component represented by the compensation matrix D (the difference between the target sound component and the teacher acoustic component) differ from those of the non-target sound component represented by the basis matrix H. That is, introducing the constraint condition C3 avoids a situation in which equation (2) holds while the compensation matrix D and the basis matrix H are alike. The first embodiment uses, as an example of the constraint condition C3, minimization of the correlation λ(D|H) between the compensation matrix D and the basis matrix H (for example, λ(D|H) = ‖D^T H‖_Fr).

The constraint condition C4 is a condition for making the acoustic characteristics of the target sound component represented by the basis matrix B (B = F + D) differ from those of the non-target sound component represented by the basis matrix H. That is, introducing the constraint condition C4 avoids a situation in which equation (2) holds while the basis matrix B and the basis matrix H are alike (a state in which the target sound component and the non-target sound component cannot be distinguished). The first embodiment uses, as an example of the constraint condition C4, minimization of the correlation λ(F+D|H) between the basis matrix B (B = F + D) and the basis matrix H (for example, λ(F+D|H) = ‖(F+D)^T H‖_Fr).
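The correlation measure used in the constraint conditions C1 to C4 can be illustrated as follows. This is a minimal sketch; the function name `correlation` and the small example matrices are hypothetical and serve only to show how the Frobenius norm of A^T C behaves.

```python
import numpy as np

def correlation(A, C):
    """Frobenius norm of the correlation matrix A^T C, i.e. lambda(A|C)."""
    return float(np.linalg.norm(A.T @ C, ord="fro"))

# Hypothetical 3-frequency example with two basis vectors per matrix.
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
D_same = F.copy()                    # spectra identical to the teacher bases
D_orth = np.array([[0.0, 0.0],
                   [0.0, 0.0],
                   [1.0, 2.0]])      # energy only where F has none

lam_same = correlation(F, D_same)    # large: the matrices overlap
lam_orth = correlation(F, D_orth)    # zero: no spectral overlap
```

For non-negative matrices the measure is zero only when no column of one matrix shares a non-zero frequency bin with any column of the other, which is the sense in which minimizing it pushes the matrices apart.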

To evaluate how well equation (2) holds under the constraint conditions C1 to C4 described above, the evaluation function Z of equation (3) below is introduced.
The symbol δ(Y|(F+D)G+HU) in equation (3) denotes the distance between the observation matrix Y and the matrix {(F+D)G+HU}. For example, the generalized KL (Kullback-Leibler) divergence (I-divergence) is suitable as the distance δ(Y|(F+D)G+HU). The coefficients μ1 to μ4 in equation (3) are coefficients for matching the value ranges of the terms (correlations λ) in equation (3) with one another, and are set to predetermined non-negative values (μ1, μ2, μ3, μ4 ≥ 0).
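The following sketch shows one way such an evaluation function could be computed. It assumes that Z is the distance term plus the four correlation terms weighted by μ1 to μ4, with the generalized KL divergence as δ and unsquared Frobenius norms as λ; the exact form of equation (3) is not reproduced in this text, so the additive form, the function names, and the `eps` guard are assumptions.

```python
import numpy as np

def generalized_kl(Y, V, eps=1e-12):
    """Generalized KL divergence (I-divergence) between non-negative Y and V."""
    Y = np.asarray(Y, dtype=float)
    V = np.asarray(V, dtype=float)
    return float(np.sum(Y * np.log((Y + eps) / (V + eps)) - Y + V))

def correlation(A, C):
    """Frobenius norm of A^T C."""
    return float(np.linalg.norm(A.T @ C, ord="fro"))

def evaluation_z(Y, F, D, G, H, U, mu=(1.0, 1.0, 1.0, 1.0)):
    """Assumed form: distance term plus weighted penalties for C1 to C4."""
    B = F + D
    V = B @ G + H @ U                      # model of the observation matrix
    return (generalized_kl(Y, V)
            + mu[0] * correlation(F, D)    # C1
            + mu[1] * correlation(F, H)    # C2
            + mu[2] * correlation(D, H)    # C3
            + mu[3] * correlation(B, H))   # C4
```

With all μ set to zero the function reduces to the plain distance between Y and the model, which is zero when equation (2) holds exactly.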

The non-negative matrix factorization by the matrix analysis unit 34 of the first embodiment corresponds to processing that minimizes the evaluation function Z of equation (3) (that is, processing that makes equation (2) hold under the constraint conditions C1 to C4). Equations (4) to (7) below are derived from the condition that the evaluation function Z is minimized.

Equation (4) is an update rule that iteratively updates the element D_mk (D_mk ≥ 0) in the m-th row and k-th column of the compensation matrix D (M rows, K columns), and equation (5) is an update rule that iteratively updates the element G_kn (G_kn ≥ 0) in the k-th row and n-th column of the coefficient matrix G (K rows, N columns). Equation (6) is an update rule that iteratively updates the element H_mr (H_mr ≥ 0) in the m-th row and r-th column of the basis matrix H (M rows, R columns), and equation (7) is an update rule that iteratively updates the element U_rn (U_rn ≥ 0) in the r-th row and n-th column of the coefficient matrix U (R rows, N columns).

The matrix analysis unit 34 in FIG. 1 sets the initial values of the unknown matrices (D, G, H, U) in the evaluation function Z to random numbers, repeats the computation of equations (4) to (7), and adopts the results (D_mk, G_kn, H_mr, U_rn) obtained when the number of iterations reaches a predetermined count as the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U. The number of iterations of the update rules is selected experimentally or statistically so that the evaluation function Z converges to a predetermined value (for example, zero). As understood from the above description, the matrix analysis unit 34 of the first embodiment generates the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U so that the relationship of equation (2) holds, under the constraint conditions C1 to C4, for the observation matrix Y of the acoustic signal SA(t) and the known teacher matrix F.
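The iteration scheme (random initialization followed by a fixed number of update passes) can be sketched as below. Equations (4) to (7) are not reproduced in this text, so the sketch substitutes the standard multiplicative updates for the generalized KL divergence with the penalty coefficients μ1 to μ4 taken as zero. It is therefore not the update rule of this embodiment; the names `model`, `gkl` and `factorize` are hypothetical.

```python
import numpy as np

def model(F, D, G, H, U):
    return (F + D) @ G + H @ U

def gkl(Y, V, eps=1e-12):
    return float(np.sum(Y * np.log((Y + eps) / (V + eps)) - Y + V))

def factorize(Y, F, R, n_iter=200, seed=0, eps=1e-12):
    """Estimate D, G, H, U with F fixed (plain KL updates, no C1-C4 penalties)."""
    rng = np.random.default_rng(seed)
    M, N = Y.shape
    K = F.shape[1]
    # Unknown matrices start from random non-negative values.
    D = rng.random((M, K)) + eps
    G = rng.random((K, N)) + eps
    H = rng.random((M, R)) + eps
    U = rng.random((R, N)) + eps
    ones = np.ones((M, N))
    history = [gkl(Y, model(F, D, G, H, U))]
    for _ in range(n_iter):  # fixed, predetermined number of iterations
        ratio = Y / (model(F, D, G, H, U) + eps)
        D *= (ratio @ G.T) / (ones @ G.T + eps)
        B = F + D
        ratio = Y / (model(F, D, G, H, U) + eps)
        G *= (B.T @ ratio) / (B.T @ ones + eps)
        ratio = Y / (model(F, D, G, H, U) + eps)
        H *= (ratio @ U.T) / (ones @ U.T + eps)
        ratio = Y / (model(F, D, G, H, U) + eps)
        U *= (H.T @ ratio) / (H.T @ ones + eps)
        history.append(gkl(Y, model(F, D, G, H, U)))
    return D, G, H, U, history
```

Multiplicative updates keep every element non-negative as long as the initial values are non-negative, which is why no explicit projection appears in the loop. Adding the penalty terms would change the numerators and denominators of the D and H updates.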

The sound source separation unit 36 in FIG. 1 generates the acoustic signal SB(t) using the analysis results (D, G, H, U) of the matrix analysis unit 34. Specifically, the sound source separation unit 36 calculates the amplitude spectrogram ((F+D)G) of the target sound component in the acoustic signal SA(t) by multiplying the basis matrix B, which is the sum of the teacher matrix F stored in the storage device 24 and the compensation matrix D calculated by the matrix analysis unit 34, by the coefficient matrix G calculated by the matrix analysis unit 34. It then generates the time-domain acoustic signal SB(t) by an inverse Fourier transform that combines the amplitude spectrum of each frame with the phase spectrum of the acoustic signal SA(t) in that frame. The acoustic signal SB(t) generated by the sound source separation unit 36 is supplied to the sound emitting device 14 and reproduced as sound waves.
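The resynthesis step can be sketched as follows, assuming for simplicity non-overlapping rectangular frames and a real FFT. Windowing and overlap-add, which a practical implementation would likely use, are omitted, and the function name `resynthesize` is hypothetical.

```python
import numpy as np

def resynthesize(frames, target_amplitude):
    """Combine a separated amplitude spectrogram with the phase of SA(t).

    frames:            N x L array, time-domain frames of the input signal SA(t)
    target_amplitude:  M x N array with M = L // 2 + 1, e.g. (F + D) @ G
    returns:           N x L array, time-domain frames of the output signal SB(t)
    """
    frame_len = frames.shape[1]
    spectrum = np.fft.rfft(frames, axis=1).T             # M x N complex spectra of SA(t)
    phase = np.angle(spectrum)                            # phase spectrum of each frame
    separated = target_amplitude * np.exp(1j * phase)     # new amplitude, original phase
    return np.fft.irfft(separated.T, n=frame_len, axis=1)
```

Passing the amplitude spectrogram of SA(t) itself as `target_amplitude` returns the input frames unchanged, which is a convenient sanity check before substituting (F + D) @ G.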

In the first embodiment described above, non-negative matrix factorization is performed using an unknown compensation matrix D that shares the coefficient matrix G with the known teacher matrix F representing the teacher acoustic component. This has the advantage that the target sound component can be separated with high accuracy even when the acoustic characteristics of the target sound component of the acoustic signal SA(t) do not completely match those of the teacher acoustic component. Moreover, because the non-negative matrix factorization in the first embodiment is performed under the constraint conditions C1 to C4 on the teacher matrix F, the compensation matrix D, and the basis matrix H, the effect of separating the target sound component and the non-target sound component with high accuracy is especially pronounced.

<Second Embodiment>
A second embodiment of the present invention is described below. In each of the configurations illustrated below, elements whose operation and function are the same as in the first embodiment are denoted by the reference signs used in the description of the first embodiment, and their detailed descriptions are omitted as appropriate.

In the first embodiment, the compensation matrix D was assumed to be a non-negative matrix (D_mk ≥ 0). As described with reference to FIG. 4, when the amplitude spectrum Ab of the target sound component represented by the basis vector b[k] of the basis matrix B equals the amplitude spectrum Af of the teacher acoustic component represented by the basis vector f[k] of the teacher matrix F plus positive values (the basis vector d[k]), that is, when the amplitude values of the amplitude spectrum Ab exceed those of the amplitude spectrum Af at all frequencies, the target sound component can be separated with high accuracy even in the first embodiment, in which the compensation matrix D is a non-negative matrix. However, as illustrated in FIG. 5, when the amplitude value of the amplitude spectrum Ab of the target sound component of the acoustic signal SA(t) falls below the amplitude spectrum Af of the teacher matrix F, the sum of the non-negative basis vector f[k] of the teacher matrix F and the non-negative basis vector d[k] of the compensation matrix D cannot properly represent the amplitude spectrum Ab of the target sound component.

To solve this problem, the second embodiment removes the first embodiment's condition that the compensation matrix D be a non-negative matrix. That is, the compensation matrix D is not restricted to a non-negative matrix, and each element D_mk of the compensation matrix D can be set not only to a non-negative number (a positive number or zero) but also to a negative number. Therefore, at frequencies where the amplitude value of the amplitude spectrum Ab of the target sound component of the acoustic signal SA(t) falls below the amplitude spectrum Af of the teacher acoustic component, setting the amplitude values of the amplitude spectrum Ad of the compensation matrix D (the elements of the basis vector d[k]) to negative numbers allows the sum of the basis vector f[k] of the teacher matrix F and the basis vector d[k] of the compensation matrix D to represent the amplitude spectrum Ab of the target sound component properly. As understood from the above description, positive numbers in the basis vector d[k] of the compensation matrix D function as elements that bring the amplitude spectrum Af indicated by the basis vector f[k] of the teacher matrix F closer to the amplitude spectrum Ab of the target sound component by adding components to it, and negative numbers in the basis vector d[k] function as elements that bring it closer to the amplitude spectrum Ab by removing components from the amplitude spectrum Af.

However, the non-negative matrix factorization executed by the matrix analysis unit 34 presupposes that the basis matrices (B, H) and the coefficient matrices (G, U) are non-negative matrices. The second embodiment therefore allows the elements D_mk of the compensation matrix D to be set to negative numbers while adding, to the same constraint conditions C1 to C4 as in the first embodiment, a constraint condition C5 that the basis matrix B obtained by adding the teacher matrix F and the compensation matrix D is a non-negative matrix. Specifically, the constraint condition C5 requires that the sum (B_mk) of the element F_mk (non-negative) in the m-th row and k-th column of the teacher matrix F and the element D_mk (non-negative or negative) in the m-th row and k-th column of the compensation matrix D be non-negative. For example, the second embodiment applies the constraint condition C5 expressed by equation (8) below.

The coefficient η in equation (8) is a coefficient for adjusting the range of values allowed for the element B_mk of the basis matrix B (the degree to which the element B_mk may approach a negative number), and is set to a positive number not exceeding 1 (0 < η ≤ 1). For example, the coefficient η is preferably set to about 0.3. From the condition that the evaluation function Z of equation (3) above is minimized under the constraint condition C5 of equation (8), equations (9) to (12) below are derived for iteratively updating the elements (D_mk, G_kn, H_mr, U_rn) of the unknown matrices. The symbol E in equations (9) to (12) denotes a matrix of M rows and N columns whose elements are all 1. The operator ( )p keeps the positive elements of the matrix in parentheses and replaces the negative elements with zero, and the operator ( )n keeps the negative elements of the matrix in parentheses and replaces the positive elements with zero. The operator .− denotes element-wise division of matrices.
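The auxiliary operators described for equations (9) to (12) can be written directly in NumPy. The sketch below covers only the operators ( )p and ( )n, the all-ones matrix E, and element-wise division; it does not reproduce the update equations themselves, and the function names and example values are hypothetical.

```python
import numpy as np

def positive_part(X):
    """( )p: keep positive elements, replace negative elements with zero."""
    return np.maximum(X, 0.0)

def negative_part(X):
    """( )n: keep negative elements, replace positive elements with zero."""
    return np.minimum(X, 0.0)

X = np.array([[1.5, -2.0],
              [0.0,  3.0]])
Xp = positive_part(X)
Xn = negative_part(X)

M, N = 2, 3
E = np.ones((M, N))                  # M x N matrix whose elements are all 1

W = np.array([[2.0, 4.0],
              [1.0, 3.0]])
quotient = X / W                     # element-wise division of two matrices
```

The two parts always sum back to the original matrix, which is what lets a matrix with mixed signs be split into separately handled positive and negative contributions.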

The matrix analysis unit 34 calculates the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U by repeating the computation of equations (9) to (12) instead of equations (4) to (7) of the first embodiment. That is, the matrix analysis unit 34 of the second embodiment generates the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U so that the relationship of equation (2) holds, under the constraint conditions C1 to C5, for the observation matrix Y of the acoustic signal SA(t) and the known teacher matrix F.

The second embodiment achieves the same effects as the first embodiment. In addition, because the compensation matrix D is not limited to a non-negative matrix in the second embodiment (the elements D_mk of the compensation matrix D can be set to negative as well as non-negative numbers), the target sound component can be separated with high accuracy even when the amplitude value of the amplitude spectrum Ab of the target sound component of the acoustic signal SA(t) falls below the amplitude spectrum Af of the teacher acoustic component. On the other hand, because the basis matrix B obtained by adding the teacher matrix F and the compensation matrix D is restricted to a non-negative matrix (constraint condition C5), the non-negative matrix factorization by the matrix analysis unit 34 is executed properly, as in the first embodiment, even though the compensation matrix D itself is not constrained to be non-negative.
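The requirement that F + D stay non-negative while D itself may contain negative elements can be illustrated with a simple check and clipping step. This sketch enforces only the plain condition F_mk + D_mk ≥ 0; it does not use the coefficient η and is not equation (8), whose exact form is not reproduced in this text. The function names are hypothetical.

```python
import numpy as np

def satisfies_c5(F, D):
    """True when every element of the basis matrix B = F + D is non-negative."""
    return bool(np.all(F + D >= 0.0))

def clip_to_c5(F, D):
    """Raise elements of D just enough that F + D >= 0 (a simple projection)."""
    return np.maximum(D, -F)

F = np.array([[1.0, 2.0],
              [0.5, 0.0]])
D = np.array([[-0.4, -3.0],
              [ 0.2, -0.1]])         # D may hold negative elements

D_clipped = clip_to_c5(F, D)
B = F + D_clipped                    # non-negative basis matrix
```

Elements of D that already satisfy the condition (such as -0.4 against 1.0) are left untouched, so negative compensation remains available wherever the teacher spectrum is large enough to absorb it.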

FIG. 6 shows experimental results of sound source separation according to the second embodiment. In the experiment of FIG. 6, the teacher matrix F was generated using, as the teacher acoustic component, a mixture of flute and clarinet tones generated by a MIDI (Musical Instrument Digital Interface) sound source, and the flute tones were extracted as the target sound component from an acoustic signal SA(t) in which a mixture of flute and clarinet tones played on acoustic instruments was recorded. The comparative example shown alongside in FIG. 6 is a configuration that does not use the compensation matrix D (a configuration in which the compensation matrix D is removed from equation (2) of the first embodiment and the constraint conditions C1, C3, and C4 are not taken into account). FIG. 6 lists the signal-to-distortion ratio (SDR: Signal to Distortion Ratio), the signal-to-interference ratio (SIR: Signal to Interference Ratio), and the nonlinear distortion (SAR: Sources to Artifacts Ratio) as evaluation measures of the separation results for the second embodiment and for the comparative example. The signal-to-distortion ratio is a measure of both the accuracy of sound source separation and the quality of the separated signal, the signal-to-interference ratio is a measure of the accuracy of sound source separation only, and the nonlinear distortion is a measure of the signal distortion introduced between the input and output of sound source separation. The experimental results of FIG. 6 confirm that the second embodiment achieves good sound source separation in terms of both separation accuracy and separated-signal quality.

FIG. 7 shows spectrograms of the results of sound source separation by the second embodiment and by the comparative example. The flute tones were extracted as the target sound component from an acoustic signal SA(t) in which a mixture of flute and clarinet tones was recorded (the spectrogram in part (A) of FIG. 7). Part (B) of FIG. 7 shows the spectrogram of an acoustic signal SA(t) of tones played by the flute alone (that is, the ideal result of sound source separation). Part (C) of FIG. 7 is the spectrogram of the acoustic signal SB(t) generated by the second embodiment, and part (D) of FIG. 7 is the spectrogram of the acoustic signal SB(t) generated by the comparative example. Part (C) of FIG. 7 is closer to part (B) than part (D) is. That is, FIG. 7 also confirms that the second embodiment achieves more accurate sound source separation than the comparative example.

<Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are illustrated below. Two or more modifications freely selected from the following examples may be combined as appropriate.

(1) In each of the above embodiments, the teacher matrix F was generated by non-negative matrix factorization of the observation matrix X of the teacher acoustic component (FIG. 2), but any method may be used to generate the teacher matrix F. Since the teacher matrix F consists of K amplitude spectra assumed for the teacher acoustic component, the teacher matrix F can also be generated, for example, by calculating an average amplitude spectrum for each of K pitches of the teacher acoustic component and arranging the K averaged amplitude spectra as the basis vectors f[k]. That is, any technique for specifying the amplitude spectrum of a sound can be applied to the generation of the teacher matrix F.
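The per-pitch averaging described in modification (1) can be sketched as follows. The input layout (one amplitude spectrogram per pitch, each with M rows) and the function name are assumptions made for illustration.

```python
import numpy as np

def teacher_matrix_from_pitches(spectrograms):
    """Build F (M x K) from K amplitude spectrograms, one per pitch.

    spectrograms: sequence of K arrays, each M x N_k (frequencies x frames);
                  the number of frames N_k may differ between pitches.
    """
    # Average each pitch over its frames to get one amplitude spectrum f[k].
    columns = [np.mean(S, axis=1) for S in spectrograms]
    return np.stack(columns, axis=1)

# Hypothetical data: M = 3 frequency bins, K = 2 pitches.
pitch_a = np.array([[1.0, 3.0],
                    [0.0, 0.0],
                    [2.0, 2.0]])
pitch_b = np.array([[0.0, 0.0, 0.0],
                    [3.0, 6.0, 9.0],
                    [1.0, 1.0, 1.0]])
F = teacher_matrix_from_pitches([pitch_a, pitch_b])
```

Because amplitude spectra are non-negative, their averages are non-negative too, so a matrix built this way satisfies the non-negativity assumed for F.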

(2) In each of the embodiments above, the constraint conditions C1 to C4 are defined in terms of the correlation λ between matrices (λ(F|D), λ(F|H), λ(D|H), λ(F+D|H)), but they can also be defined in terms of the distance δ between matrices (δ(F|D), δ(F|H), δ(D|H), δ(F+D|H)). For example, constraint condition C1 corresponds to maximizing the distance δ(F|D) between the teacher matrix F and the compensation matrix D, constraint condition C2 to maximizing the distance δ(F|H) between the teacher matrix F and the basis matrix H, constraint condition C3 to maximizing the distance δ(D|H) between the compensation matrix D and the basis matrix H, and constraint condition C4 to maximizing the distance δ(B|H) between the basis matrix B and the basis matrix H. Any known distance criterion, such as the generalized KL divergence, the Itakura-Saito divergence, or the β-divergence, may be adopted as the distance δ. When the distance δ between matrices is used, the elements of the unknown matrices (D_mk, G_kn, H_mr, U_rn) are calculated so that the evaluation function Z of equation (13) below is minimized.
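The distance criteria named in (2) can be illustrated with a single β-divergence function, of which the generalized KL divergence (β = 1) and the Itakura-Saito divergence (β = 0) are special cases. The sketch below is an assumption-laden illustration: it evaluates the divergence element by element between two strictly positive matrices of equal shape, whereas how δ is evaluated between matrices of different sizes (or with negative elements) is not specified in this passage. The function name `beta_divergence` and the data are hypothetical.

```python
import numpy as np

def beta_divergence(P, Q, beta):
    """Sum of element-wise beta-divergences between positive arrays P and Q.

    beta = 1 gives the generalized KL divergence, beta = 0 the Itakura-Saito
    divergence, and beta = 2 half the squared Euclidean distance.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if beta == 1:
        return float(np.sum(P * np.log(P / Q) - P + Q))
    if beta == 0:
        ratio = P / Q
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    num = P**beta + (beta - 1) * Q**beta - beta * P * Q**(beta - 1)
    return float(np.sum(num / (beta * (beta - 1))))

# Synthetic stand-ins of equal shape (an assumption made for this sketch).
rng = np.random.default_rng(0)
F = rng.uniform(0.1, 1.0, size=(5, 3))  # stand-in teacher matrix
H = rng.uniform(0.1, 1.0, size=(5, 3))  # stand-in non-target basis matrix
d_kl = beta_divergence(F, H, 1)
d_is = beta_divergence(F, H, 0)
d_eu = beta_divergence(F, H, 2)
```

Each of these values is zero only when the two matrices coincide, so a larger value corresponds to lower similarity.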

As the above description shows, a constraint under which the similarity between matrices decreases (ideally, is minimized) can cover both the condition that the correlation λ between the matrices decreases (ideally, is minimized) and the condition that the distance δ between the matrices increases (ideally, is maximized).

(3) In each of the embodiments above, the acoustic signal SB(t) is generated by extracting the target sound component of the acoustic signal SA(t) (suppressing the non-target sound component), but it is also possible to generate an acoustic signal SB(t) in which the target sound component of the acoustic signal SA(t) is suppressed (the non-target sound component is extracted). For example, the sound source separation unit 36 multiplies the basis matrix H by the coefficient matrix U, both calculated by the matrix analysis unit 34, to generate an acoustic signal SB(t) in which the non-target sound component of the acoustic signal SA(t) is extracted. As the above description shows, the sound source separation unit 36 is encompassed as an element that uses the analysis result of the matrix analysis unit 34 to separate (extract or suppress) one of the target sound component and the non-target sound component.

The method by which the sound source separation unit 36 generates the acoustic signal SB(t) from the analysis result of the matrix analysis unit 34 is not limited to the above examples. For example, it is also possible to calculate a matrix E representing the amplitude spectrogram of the non-target sound component (noise component) (hereinafter the "estimated noise matrix") by multiplying the basis matrix H by the coefficient matrix U, and to generate the acoustic signal SB(t) by suppressing the estimated noise matrix E in the observation matrix Y of the acoustic signal SA(t). Any known technique for suppressing a noise component (the estimated noise matrix E) in the acoustic signal SA(t) (the observation matrix Y), such as spectral subtraction, Wiener filtering, or MMSE-STSA, may be used to suppress the estimated noise matrix E.
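As one possible reading of the noise-suppression variant above, the sketch below forms the estimated noise matrix E = HU and removes it from the observation matrix Y by plain spectral subtraction. The function name `suppress_estimated_noise`, the `floor` parameter, and the synthetic data are assumptions introduced for illustration; Wiener filtering or MMSE-STSA could take the place of the subtraction step.

```python
import numpy as np

def suppress_estimated_noise(Y, H, U, floor=0.0):
    """Spectral subtraction of the estimated noise matrix E = HU from Y.

    floor: fraction of Y kept as a lower bound (hypothetical parameter) so
    that the resulting amplitude spectrogram stays non-negative.
    Returns the suppressed spectrogram and E.
    """
    E = H @ U
    return np.maximum(Y - E, floor * Y), E

# Synthetic stand-in data: M bins, N frames, R non-target basis vectors.
rng = np.random.default_rng(0)
M, N, R = 8, 12, 2
H = rng.uniform(size=(M, R))
U = rng.uniform(size=(R, N))
target = rng.uniform(size=(M, N))
Y = target + H @ U            # observation = target + non-target component
S, E = suppress_estimated_noise(Y, H, U)
```

The result S is an amplitude spectrogram; converting it back to a time-domain signal SB(t) would additionally require phase information, which is outside this sketch.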

(4) The embodiments above illustrate a teacher matrix F for a teacher acoustic component produced by a single type of sound source, but the teacher matrix F can also be generated using, as the teacher acoustic component, a mixture of sounds produced by a plurality (J) of sound sources of different types. As shown in FIG. 8, the teacher matrix F is then a large matrix in which J matrices F[1] to F[J], corresponding to the different sound sources of the teacher acoustic component, are arranged horizontally. Each matrix F[j] (j = 1 to J) is a matrix of M rows (with any number of columns) in which one or more basis vectors, each representing the amplitude spectrum of a component of the sound of the j-th of the J sound sources, are arranged horizontally. The compensation matrix D is a matrix in which J matrices D[1] to D[J], corresponding to the respective matrices F[j], are arranged horizontally. Each matrix D[j] has the same numbers of rows and columns as the matrix F[j]. The method of calculating the elements of the unknown matrices (D_mk, G_kn, H_mr, U_rn), including the update formulas and constraint conditions, is the same as in the embodiments above.

With this configuration, the target sound component (matrix BG), which is a mixture of the J acoustic components of the acoustic signal SA(t) whose acoustic characteristics (types of sound source) approximate the teacher acoustic component, is separated from the non-target sound component (matrix HU) other than the target sound component. In addition, multiplying the sum of the matrix F[j] in the teacher matrix F and the matrix D[j] in the compensation matrix D by the coefficient vectors g[k] of the coefficient matrix G that correspond to that matrix yields the amplitude spectrogram of the acoustic component of the j-th of the J sound sources. As these examples show, the total number of sound sources of the teacher acoustic component or the target sound component is arbitrary.
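The block structure of (4) can be sketched as below: each source j contributes (F[j] + D[j]) multiplied by the rows of G that belong to its columns, and the contributions add up to the target component BG. The function name `per_source_spectrograms`, the assumption that the rows of G follow the column order of F, and the synthetic data are all introduced for illustration only.

```python
import numpy as np

def per_source_spectrograms(F_blocks, D_blocks, G):
    """Return the list [(F[j] + D[j]) @ G_j for j = 1..J].

    G_j is the block of rows of G whose coefficient vectors g[k] correspond
    to the columns of F[j]; rows of G are assumed to follow the column order
    of the concatenated teacher matrix F.
    """
    result, start = [], 0
    for Fj, Dj in zip(F_blocks, D_blocks):
        Kj = Fj.shape[1]
        result.append((Fj + Dj) @ G[start:start + Kj, :])
        start += Kj
    return result

# Synthetic stand-in data: J = 2 sources with 2 and 3 basis vectors.
rng = np.random.default_rng(0)
M, N = 6, 10
F_blocks = [rng.uniform(size=(M, 2)), rng.uniform(size=(M, 3))]
D_blocks = [0.1 * rng.normal(size=Fj.shape) for Fj in F_blocks]
G = rng.uniform(size=(5, N))
parts = per_source_spectrograms(F_blocks, D_blocks, G)
B = np.hstack(F_blocks) + np.hstack(D_blocks)  # basis matrix B = F + D
```

Summing the entries of `parts` reproduces the product of B and G, which is the mixed target sound component described above.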

(5) Sound separation using spatial information (the direction of arrival of an acoustic component) can also be combined with each of the embodiments above. For example, an acoustic signal SA(t) of sound arriving from a specific direction is extracted by analyzing multichannel acoustic signals recorded with sound collecting devices spaced apart from one another (a microphone array), and the sound source separation of each embodiment above is then performed on the extracted acoustic signal SA(t). Any known technique may be adopted for sound source separation using spatial information; one suitable example is Shigeki Miyabe, et al., "Temporal quantization of spatial information using directional clustering for multichannel audio coding", IEEE WASPAA 2009, pp. 261-264, 2009. This configuration has the advantage that a target sound component arriving from a predetermined direction relative to the sound receiving point (the plurality of sound collecting devices) can be separated with high accuracy.

(6) The operations that the matrix analysis unit 34 executes iteratively are not limited to those illustrated in the embodiments above (equations (4) to (7) and (9) to (12)). For example, equation (9) above can be replaced with equation (9A) below.
That is, in equation (9) the sign of each element is determined separately for each term of the matrix V (2μ₁(FF^T D) and 2μ₃(HH^T D)), whereas in equation (9A) the sign of each element is determined for the matrix V as a whole (that is, after the terms have been added).
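The difference between determining signs per term and for V as a whole can be illustrated as follows. Equations (9) and (9A) are not reproduced in this passage, so the sketch only shows two ways of splitting V = 2μ₁(FF^T D) + 2μ₃(HH^T D) into non-negative positive and negative parts; how those parts enter the update is not shown here, and the helper names, the values of μ₁ and μ₃, and the data are assumptions.

```python
import numpy as np

def pos(A):
    """Element-wise positive part of A."""
    return np.maximum(A, 0.0)

def neg(A):
    """Element-wise negative part of A, returned as non-negative values."""
    return np.maximum(-A, 0.0)

def split_per_term(F, H, D, mu1, mu3):
    """Determine signs separately for each term, then add the parts."""
    T1 = 2.0 * mu1 * (F @ F.T @ D)
    T3 = 2.0 * mu3 * (H @ H.T @ D)
    return pos(T1) + pos(T3), neg(T1) + neg(T3)

def split_whole(F, H, D, mu1, mu3):
    """Add the terms first, then determine the sign of each element of V."""
    V = 2.0 * mu1 * (F @ F.T @ D) + 2.0 * mu3 * (H @ H.T @ D)
    return pos(V), neg(V)

# Synthetic stand-in data; D may contain negative elements.
rng = np.random.default_rng(0)
M, K, R = 5, 3, 2
F = rng.uniform(size=(M, K))
H = rng.uniform(size=(M, R))
D = rng.normal(size=(M, K))
mu1, mu3 = 0.5, 0.25
V = 2.0 * mu1 * (F @ F.T @ D) + 2.0 * mu3 * (H @ H.T @ D)
Vp_term, Vn_term = split_per_term(F, H, D, mu1, mu3)
Vp_all, Vn_all = split_whole(F, H, D, mu1, mu3)
```

Both splits satisfy V = V⁺ − V⁻. With the whole-matrix split, at most one of the two parts is non-zero for each element, whereas the per-term split can leave both parts non-zero when the two terms have opposite signs.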

(7) The constraint conditions applied to the non-negative matrix factorization are not limited to those illustrated in the embodiments above. For example, when constraint condition C2 (reduced similarity between the teacher matrix F and the basis matrix H) and constraint condition C3 (reduced similarity between the compensation matrix D and the basis matrix H) are both applied as in the embodiments above, constraint condition C4 can also be satisfied as a result, so constraint condition C4 may be omitted.

(8) In the embodiments above, the entire band of the acoustic signal SA(t) is processed, but a specific band of the acoustic signal SA(t) can also be processed selectively. Processing only the band components expected of the desired sound source can improve the separation accuracy for that sound source.
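Band-limited processing as in (8) might be sketched by keeping only the rows of the observation matrix Y whose frequency bins fall inside the band of interest. The function name `select_band`, the sampling rate, the FFT size, and the band edges below are hypothetical values chosen for illustration, and Y is assumed to hold the non-negative half of the spectrum (n_fft // 2 + 1 rows).

```python
import numpy as np

def select_band(Y, sample_rate, n_fft, f_lo, f_hi):
    """Keep only the rows of Y whose bin frequency lies in [f_lo, f_hi].

    Returns the band-limited matrix and the boolean row mask, so that the
    separated rows can later be written back to their original positions.
    """
    freqs = np.fft.rfftfreq(n_fft) * sample_rate  # bin centre frequencies in Hz
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return Y[mask, :], mask

# Synthetic stand-in observation matrix: 9 bins (0 to 8000 Hz), 20 frames.
sample_rate, n_fft = 16000, 16
Y = np.random.default_rng(0).uniform(size=(n_fft // 2 + 1, 20))
Y_band, mask = select_band(Y, sample_rate, n_fft, 1500.0, 5500.0)
```

The factorization would then be run on `Y_band` instead of Y, with the teacher matrix restricted to the same rows.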

(9) The generation of the acoustic signal SB(t) from the analysis result of the matrix analysis unit 34 (the sound source separation unit 36) may be omitted. For example, the present invention can also be implemented as an acoustic analysis device 100 (matrix analysis unit 34) that calculates the unknown matrices (D, G, H, U) by non-negative matrix factorization of the observation matrix Y of the acoustic signal SA(t). The acoustic analysis device 100 can also be realized as a server device that communicates with a terminal device such as a mobile phone. For example, the acoustic analysis device 100 generates the acoustic signal SB(t) from an acoustic signal SA(t) received from the terminal device and transmits it to the terminal device. In a configuration in which the observation matrix Y of the acoustic signal SA(t) is received from the terminal device (for example, one in which the terminal device includes the frequency analysis unit 32), the frequency analysis unit 32 is omitted from the acoustic analysis device 100; in a configuration in which the analysis result of the matrix analysis unit 34 is transmitted to the terminal device (for example, one in which the terminal device includes the sound source separation unit 36), the sound source separation unit 36 is omitted from the acoustic analysis device 100.

DESCRIPTION OF SYMBOLS: 100 ... acoustic analysis device, 12 ... signal supply device, 14 ... sound emitting device, 22 ... arithmetic processing device, 24 ... storage device, 32 ... frequency analysis unit, 34 ... matrix analysis unit, 36 ... sound source separation unit, F ... teacher matrix, D ... compensation matrix, B, H ... basis matrices, G, U ... coefficient matrices, X, Y ... observation matrices.

Claims

An acoustic analysis device comprising matrix analysis means for calculating, by non-negative matrix factorization of an acoustic signal: a first basis matrix that is obtained by adding a compensation matrix to a known teacher matrix including a plurality of basis vectors each indicating a spectrum of a teacher acoustic component, and that includes a plurality of basis vectors each indicating a spectrum of a target sound component of the acoustic signal; a first coefficient matrix including a plurality of coefficient vectors each indicating a temporal change in a weight value for a basis vector of the first basis matrix; a second basis matrix including a plurality of basis vectors each indicating a spectrum of a non-target sound component of the acoustic signal other than the target sound component; and a second coefficient matrix including a plurality of coefficient vectors each indicating a temporal change in a weight value for a basis vector of the second basis matrix.

The acoustic analysis device according to claim 1, further comprising sound source separation means for generating at least one of the target sound component, according to the first basis matrix and the first coefficient matrix, and the non-target sound component, according to the second basis matrix and the second coefficient matrix.

The acoustic analysis device according to claim 1, wherein the matrix analysis means performs the non-negative matrix factorization under a constraint that a similarity between the teacher matrix and the compensation matrix decreases.

The acoustic analysis device according to any one of claims 1 to 3, wherein the matrix analysis means performs the non-negative matrix factorization under a constraint that a similarity between the compensation matrix and the second basis matrix decreases.

The acoustic analysis device according to claim 1, wherein the compensation matrix includes negative elements, and the matrix analysis means performs the non-negative matrix factorization under a constraint that the first basis matrix is a non-negative matrix.