JP2014137389A - Acoustic analyzer - Google Patents

Publication number
JP2014137389A
Authority
JP
Japan
Prior art keywords
matrix
acoustic
teacher
component
basis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2013004375A
Other languages
Japanese (ja)
Inventor
Daichi Kitamura
大地 北村
Hiroshi Saruwatari
洋 猿渡
Yu Takahashi
祐 高橋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yamaha Corp
Original Assignee
Yamaha Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corp
Priority to JP2013004375A
Publication of JP2014137389A

Landscapes

  • Auxiliary Devices For Music (AREA)

Abstract

PROBLEM TO BE SOLVED: To separate a specific acoustic component of an acoustic signal with high accuracy. SOLUTION: A storage device 24 stores a known teacher matrix F including a plurality of basis vectors, each representing a spectrum of a teacher acoustic component whose acoustic characteristics approximate those of a target sound component. A matrix analysis unit 34 performs non-negative matrix factorization on an acoustic signal SA(t) to calculate: a basis matrix B, which is the sum of the teacher matrix F and a compensation matrix D and which includes a plurality of basis vectors each representing a spectrum of the target sound component of the acoustic signal SA(t); a coefficient matrix G including a plurality of coefficient vectors each representing the temporal change of the weight for a basis vector of the basis matrix B; a basis matrix H including a plurality of basis vectors each representing a spectrum of a non-target sound component other than the target sound component of the acoustic signal SA(t); and a coefficient matrix U including a plurality of coefficient vectors each representing the temporal change of the weight for a basis vector of the basis matrix H.

Description

The present invention relates to a technique for separating (for example, extracting or suppressing) a specific acoustic component from an acoustic signal of a mixed sound of a plurality of acoustic components.

Sound source separation techniques that extract or suppress a specific acoustic component from a mixed sound of a plurality of acoustic components having different acoustic characteristics have been proposed. For example, Non-Patent Document 1 and Non-Patent Document 2 disclose unsupervised sound source separation using non-negative matrix factorization (NMF).

In the techniques of Non-Patent Documents 1 and 2, as shown in FIG. 9, an observation matrix Y representing the amplitude spectrogram of an observed sound in which a plurality of acoustic components are mixed is decomposed by non-negative matrix factorization into a basis matrix H and a coefficient matrix (activation matrix) U. The basis matrix H consists of a plurality of basis vectors h, each representing the spectrum of an acoustic component contained in the observed sound, and the coefficient matrix U consists of a plurality of coefficient vectors u, each representing the temporal change of the weight for a basis vector. The basis vectors h and the coefficient vectors u are sorted by acoustic component (by sound source), and the amplitude spectrogram of a desired acoustic component is generated by extracting and multiplying the basis vectors h and coefficient vectors u of that component.
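The unsupervised decomposition described above can be sketched as follows. This is a minimal illustration that assumes a Euclidean cost with the standard multiplicative update rules; the cited documents may use other cost functions. The function names `nmf` and `component_spectrogram` and all parameters are hypothetical and do not appear in the source.

```python
import numpy as np

def nmf(Y, R, n_iter=200, eps=1e-12, seed=0):
    """Factor a non-negative M x N matrix Y into H (M x R) and U (R x N)."""
    rng = np.random.default_rng(seed)
    M, N = Y.shape
    H = rng.random((M, R)) + eps   # basis vectors h as columns
    U = rng.random((R, N)) + eps   # coefficient vectors u as rows
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        U *= (H.T @ Y) / (H.T @ H @ U + eps)
        H *= (Y @ U.T) / (H @ U @ U.T + eps)
    return H, U

def component_spectrogram(H, U, idx):
    """Multiply only the basis and coefficient vectors assigned to one source."""
    return H[:, idx] @ U[idx, :]
```

Deciding which indices belong to which sound source (the `idx` argument) is exactly the clustering step that the background art leaves open.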

Non-Patent Document 1: A. Cichocki, et al., "New Algorithms for Non-Negative Matrix Factorization in Applications to Blind Source Separation," ICASSP 2006
Non-Patent Document 2: Tuomas Virtanen, "Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria," IEEE Trans. Audio, Speech and Language Processing, volume 15, pp. 1066-1074, 2007

However, with the techniques of Non-Patent Documents 1 and 2, it is difficult to accurately sort (cluster) the basis vectors h of the basis matrix H and the coefficient vectors u of the coefficient matrix U by acoustic component of the acoustic signal. In practice, therefore, it is not easy to extract or suppress only a specific acoustic component of the acoustic signal with high accuracy. In view of these circumstances, an object of the present invention is to separate a specific acoustic component of an acoustic signal with high accuracy.

To solve the above problem, the acoustic analysis device of the present invention includes matrix analysis means that calculates, by non-negative matrix factorization of an acoustic signal: a first basis matrix (for example, basis matrix B), which is the sum of a known teacher matrix (for example, teacher matrix F) including a plurality of basis vectors representing the spectrum of a teacher acoustic component and a compensation matrix (for example, compensation matrix D), and which includes a plurality of basis vectors representing the spectrum of a target sound component of the acoustic signal; a first coefficient matrix (for example, coefficient matrix G) including a plurality of coefficient vectors representing the temporal change of the weight for each basis vector of the first basis matrix; a second basis matrix (for example, basis matrix H) including a plurality of basis vectors representing the spectrum of a non-target sound component other than the target sound component of the acoustic signal; and a second coefficient matrix (for example, coefficient matrix U) including a plurality of coefficient vectors representing the temporal change of the weight for each basis vector of the second basis matrix. In this configuration, the non-negative matrix factorization uses an unknown compensation matrix that shares the first coefficient matrix with the known teacher matrix representing the teacher acoustic component. The target sound component can therefore be separated with high accuracy even when its acoustic characteristics do not completely match those of the teacher acoustic component.

An acoustic analysis device according to a preferred aspect of the present invention includes sound source separation means that generates at least one of the target sound component, corresponding to the first basis matrix and the first coefficient matrix, and the non-target sound component, corresponding to the second basis matrix and the second coefficient matrix. In this aspect, the target sound component and the non-target sound component of the acoustic signal can be separated (extracted or suppressed).
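One way to picture how the two components are generated from the four matrices is sketched below. This section does not specify how the sound source separation means forms its output, so the soft mask used here is an assumption swapped in for illustration; the function name `separate` is hypothetical.

```python
import numpy as np

def separate(Y, B, G, H, U, eps=1e-12):
    """Split the observed spectrogram Y into target and non-target parts.

    Assumption: a ratio mask built from the two model spectrograms
    B @ G (target) and H @ U (non-target) is applied to Y.
    """
    target_model = B @ G
    other_model = H @ U
    mask = target_model / (target_model + other_model + eps)
    return mask * Y, (1.0 - mask) * Y
```

With this choice the two outputs always sum to the observed spectrogram, so extracting one component is equivalent to suppressing the other.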

In a preferred aspect of the present invention, the matrix analysis means performs the non-negative matrix factorization under a constraint that the similarity between the teacher matrix and the compensation matrix decreases (for example, constraint C1 described later). Because the factorization is performed so that this similarity decreases, the teacher matrix and the compensation matrix are prevented from sharing components; that is, the acoustic components of the target sound component that differ from the teacher acoustic component can be extracted as the compensation matrix.

In a preferred aspect of the present invention, the matrix analysis means performs the non-negative matrix factorization under a constraint that the similarity between the teacher matrix and the second basis matrix decreases (for example, constraint C2 described later). This prevents the teacher matrix and the second basis matrix from sharing components, that is, it prevents components of the target sound component that approximate the teacher acoustic component from being included in the second basis matrix.

In a preferred aspect of the present invention, the matrix analysis means performs the non-negative matrix factorization under a constraint that the similarity between the compensation matrix and the second basis matrix decreases (for example, constraint C3 described later). This prevents the compensation matrix and the second basis matrix from sharing components, that is, it prevents components of the target sound component that differ from the teacher acoustic component from being included in the second basis matrix.

In a preferred aspect of the present invention, the matrix analysis means performs the non-negative matrix factorization under a constraint that the similarity between the first basis matrix and the second basis matrix decreases (for example, constraint C4 described later). This prevents the first basis matrix and the second basis matrix from sharing components, so that the target sound component and the non-target sound component can be clearly distinguished.

In the constraints of the above aspects, "the similarity between matrices decreases" covers both a decrease (ideally, minimization) in the correlation between the matrices and an increase (ideally, maximization) in the distance between them.
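The two readings of "similarity" can be illustrated with simple matrix measures. The concrete formulas of constraints C1 to C4 are described later in the specification and are not reproduced in this section, so the measures below (squared Frobenius norms) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import numpy as np

def correlation(A, B):
    """Sum of squared inner products between every column of A and of B.

    Zero when all columns of A are orthogonal to all columns of B.
    """
    return float(np.sum((A.T @ B) ** 2))

def distance(A, B):
    """Squared Frobenius distance between two matrices of equal shape."""
    return float(np.sum((A - B) ** 2))
```

Under these assumed measures, a penalty term would reward a small `correlation` value or a large `distance` value between the matrices named in each constraint.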

In a preferred aspect of the present invention, the compensation matrix includes negative elements, and the matrix analysis means performs the non-negative matrix factorization under a constraint that the first basis matrix is a non-negative matrix (for example, constraint C5 described later). Because the compensation matrix may include negative elements, the target sound component can be separated with high accuracy even when the intensity (for example, amplitude) of its spectrum is lower than that of the spectrum of the teacher acoustic component. Because the constraint that the first basis matrix is non-negative is also imposed, the non-negative matrix factorization of the observation matrix can be performed appropriately. A specific example of this aspect is described later as the second embodiment.
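The interplay between a signed compensation matrix and a non-negative first basis matrix can be sketched as a projection step. How constraint C5 is actually enforced is described in the second embodiment and is not part of this section; the element-wise clipping below is an assumption, and `project_compensation` is a hypothetical name.

```python
import numpy as np

def project_compensation(F, D):
    """Clip D from below at -F so that B = F + D has no negative entries.

    Entries of D may remain negative as long as they do not exceed the
    corresponding entry of the teacher matrix F in magnitude.
    """
    return np.maximum(D, -F)
```

A negative entry of D lowers the corresponding spectral amplitude of the teacher basis vector, which corresponds to the case where the target sound is weaker than the teacher sound at that frequency.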

The acoustic analysis device according to each of the above aspects may be realized by hardware (electronic circuitry) such as a DSP (Digital Signal Processor) dedicated to the analysis of acoustic signals, or by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. The program of the present invention causes a computer to execute matrix analysis processing that calculates, by non-negative matrix factorization of an acoustic signal: a first basis matrix, which is the sum of a non-negative teacher matrix including a plurality of basis vectors representing the spectrum of a teacher acoustic component and a compensation matrix, and which includes a plurality of basis vectors representing the spectrum of a target sound component of the acoustic signal; a first coefficient matrix including a plurality of coefficient vectors representing the temporal change of the weight for each basis vector of the first basis matrix; a second basis matrix including a plurality of basis vectors representing the spectrum of a non-target sound component other than the target sound component of the acoustic signal; and a second coefficient matrix including a plurality of coefficient vectors representing the temporal change of the weight for each basis vector of the second basis matrix. The program of the present invention may be provided in a form stored in a computer-readable recording medium and installed in a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any known type of recording medium, such as a semiconductor recording medium or a magnetic recording medium, may be used. The program may also be provided by distribution over a communication network and installed in a computer.

FIG. 1 is a block diagram of an acoustic analysis device according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram of a teacher matrix.
FIG. 3 is an explanatory diagram of the operation of a matrix analysis unit.
FIG. 4 is an explanatory diagram of the relationship among the teacher matrix, the compensation matrix, and the basis matrix.
FIG. 5 is an explanatory diagram of the problem solved by a second embodiment.
FIG. 6 is a chart of experimental results for the second embodiment.
FIG. 7 is an explanatory diagram of the effect of the second embodiment.
FIG. 8 is an explanatory diagram of a teacher matrix and a compensation matrix according to a modification.
FIG. 9 is an explanatory diagram of non-negative matrix factorization in the background art.

<First Embodiment>
FIG. 1 is a block diagram of an acoustic analysis device 100 according to the first embodiment of the present invention. As shown in FIG. 1, a signal supply device 12 and a sound emitting device 14 are connected to the acoustic analysis device 100. The signal supply device 12 supplies an acoustic signal SA(t) to the acoustic analysis device 100. The acoustic signal SA(t) is a time-domain signal representing the waveform of a mixed sound of a plurality of acoustic components (for example, musical tones and voices) having different acoustic characteristics (t: time). For example, an acoustic signal SA(t) representing a mixed sound of the performance sounds of a plurality of types of musical instruments (including voices such as singing) is supplied to the acoustic analysis device 100. The signal supply device 12 may be a sound collection device that picks up ambient sound and generates the acoustic signal SA(t), a playback device that acquires the acoustic signal SA(t) from a portable or built-in recording medium and supplies it to the acoustic analysis device 100, or a communication device that receives the acoustic signal SA(t) from a communication network and supplies it to the acoustic analysis device 100.

The acoustic analysis device 100 of the first embodiment is an acoustic processing device (sound source separation device) that generates an acoustic signal SB(t) by applying acoustic processing to the acoustic signal SA(t) supplied from the signal supply device 12. The acoustic signal SB(t) is a time-domain signal representing the waveform of a sound in which a specific acoustic component (hereinafter "target sound component") among the plurality of acoustic components contained in the acoustic signal SA(t) has been extracted, that is, a sound in which the non-target sound components other than the target sound component have been suppressed. The sound emitting device 14 (for example, a speaker or headphones) emits sound waves corresponding to the acoustic signal SB(t) supplied from the acoustic analysis device 100. A D/A converter that converts the acoustic signal SB(t) from digital to analog is omitted from the figure for convenience.

As shown in FIG. 1, the acoustic analysis device 100 is realized by a computer system including an arithmetic processing device 22 and a storage device 24. The storage device 24 stores a program PGM executed by the arithmetic processing device 22 and various data used by the arithmetic processing device 22. Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of a plurality of types of recording media may be used as the storage device 24. A configuration in which the acoustic signal SA(t) is stored in the storage device 24 (so that the signal supply device 12 can be omitted) is also suitable.

The storage device 24 of the first embodiment stores a basis matrix (hereinafter "teacher matrix") F that represents acoustic characteristics corresponding to (for example, approximating or matching) those of the target sound component. The acoustic analysis device 100 generates the acoustic signal SB(t) from the acoustic signal SA(t) by supervised sound source separation that uses the teacher matrix F stored in the storage device 24 as prior information (teacher information).

The teacher matrix F is generated in advance from an acoustic component produced by a known sound source (hereinafter "teacher acoustic component") and stored in the storage device 24. The teacher acoustic component is an acoustic component whose acoustic characteristics (for example, timbre) approximate or are shared with those of the target sound component. For example, the teacher acoustic component and the target sound component correspond to the performance sounds of instruments of the same type (for example, the same sound production principle) but of different individual instruments (for example, flutes of different models). Accordingly, the acoustic characteristics of the teacher acoustic component and those of the target sound component of the acoustic signal SA(t) are close enough that a listener perceives both as performance sounds of the same type of instrument, but they do not match completely. Put differently, the target sound component is the acoustic component whose acoustic characteristics approximate those of the teacher acoustic component represented by the teacher matrix F prepared in advance. The non-target sound component, on the other hand, differs from the teacher acoustic component in acoustic characteristics. For example, the teacher acoustic component and the non-target sound component correspond to the performance sounds of different types of instruments, and their acoustic characteristics differ to the extent that a listener perceives them as performance sounds of different types of instruments. For example, the target sound component and the teacher acoustic component correspond to flute performance sounds, and the non-target sound component corresponds to a clarinet performance sound.

FIG. 2 is an explanatory diagram of the process of generating the teacher matrix F from the teacher acoustic component. The observation matrix X of FIG. 2 is a non-negative matrix of M rows and N columns representing the time series of amplitude spectra (amplitude spectrogram) of N frames obtained by dividing a pre-recorded teacher acoustic component on the time axis (M and N are natural numbers). That is, the n-th column (n = 1 to N) of the observation matrix X corresponds to the amplitude spectrum x[n] of the n-th frame of the teacher acoustic component. The element in the m-th row (m = 1 to M) of the amplitude spectrum x[n] is the amplitude value at the m-th of M frequencies set on the frequency axis.

The observation matrix X of FIG. 2 is decomposed by non-negative matrix factorization (NMF) into a teacher matrix F and a coefficient matrix (activation matrix) Q, as expressed by equation (1) below.

X ≈ FQ   ... (1)

As shown in FIG. 2, the teacher matrix F of equation (1) is a non-negative matrix (basis matrix) of M rows and K columns in which K basis vectors f[1] to f[K], corresponding to the components constituting the teacher acoustic component, are arranged in the horizontal direction. The basis vector f[k] in the k-th column (k = 1 to K) of the teacher matrix F corresponds to the amplitude spectrum of the k-th of the K components (bases) constituting the teacher acoustic component. That is, the element in the m-th row of the basis vector f[k] (the element in the m-th row and k-th column of the teacher matrix F) is the amplitude value at the m-th frequency on the frequency axis in the amplitude spectrum of the k-th component of the teacher acoustic component.

As shown in FIG. 2, the coefficient matrix Q of equation (1) is a non-negative matrix of K rows and N columns in which K coefficient vectors q[1] to q[K], corresponding to the basis vectors f[k] of the teacher matrix F, are arranged in the vertical direction. The coefficient vector q[k] in the k-th row of the coefficient matrix Q corresponds to the time series of the weight (activity) for the basis vector f[k] of the teacher matrix F. As understood from the above description, each amplitude spectrum x[n] of the observation matrix X is expressed as a weighted sum of the K basis vectors f[1] to f[K] of the teacher matrix F.
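The weighted-sum relationship stated above can be checked numerically with a toy example. The matrix values below are made up purely for illustration, and `frame_spectrum` is a hypothetical helper name.

```python
import numpy as np

# Toy teacher matrix F (M = 3 frequencies, K = 2 basis vectors as columns).
F = np.array([[1.0, 0.0],
              [0.5, 0.2],
              [0.0, 1.0]])
# Toy coefficient matrix Q (K = 2 coefficient vectors as rows, N = 3 frames).
Q = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0]])

def frame_spectrum(F, Q, n):
    """Spectrum of frame n as the weighted sum of the basis vectors f[k]."""
    return sum(Q[k, n] * F[:, k] for k in range(F.shape[1]))
```

Column n of the product FQ is the same vector, which is why the factorization of the whole spectrogram can be written as a single matrix product.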

The teacher matrix F and the coefficient matrix Q are calculated so that their product FQ approximates the observation matrix X (that is, so that the similarity between the matrix FQ and the observation matrix X increases), and the teacher matrix F is then stored in the storage device 24. Each of the K basis vectors f[1] to f[K] of the teacher matrix F corresponds, roughly speaking, to a different pitch. That is, the teacher acoustic component used to generate the teacher matrix F is produced so as to include every pitch that can be expected in the target sound component of the acoustic signal SA(t), and the total number K of basis vectors f[k] of the teacher matrix F (the number of bases) is set to a value equal to or greater than the total number of pitches that can be expected in the target sound component of the acoustic signal SA(t). This completes the procedure for generating the teacher matrix F.

By executing the program PGM stored in the storage device 24, the arithmetic processing device 22 of FIG. 1 realizes a plurality of functions (a frequency analysis unit 32, a matrix analysis unit 34, and a sound source separation unit 36) for generating the acoustic signal SB(t) from the acoustic signal SA(t). The processing by each element of the arithmetic processing device 22 is repeated sequentially in units of N frames obtained by dividing the acoustic signal SA(t) on the time axis. A configuration in which the functions of the arithmetic processing device 22 are distributed over a plurality of integrated circuits, or a configuration in which a dedicated electronic circuit (for example, a DSP) realizes some of the functions, may also be adopted.

FIG. 3 is an explanatory diagram of the processing by the frequency analysis unit 32 and the matrix analysis unit 34. The frequency analysis unit 32 sequentially generates the observation matrix Y of FIG. 3 in units of N frames of the acoustic signal SA(t). As shown in FIG. 3, the observation matrix Y is a non-negative matrix of M rows and N columns representing the time series (amplitude spectrogram) of the amplitude spectra y[1] to y[N] of N frames obtained by dividing the acoustic signal SA(t) on the time axis. That is, the n-th column of the observation matrix Y corresponds to the amplitude spectrum y[n] (the series of amplitude values at each of the M frequencies) of the n-th frame of the acoustic signal SA(t). A known frequency analysis such as the short-time Fourier transform is used to generate the observation matrix Y. The time series of the power spectrum of each frame of the acoustic signal SA(t) may also be used as the observation matrix Y.
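The generation of the observation matrix Y by a short-time Fourier transform can be sketched as follows. The frame length, hop size, and Hann window are assumptions not given in the source, and `amplitude_spectrogram` is a hypothetical name.

```python
import numpy as np

def amplitude_spectrogram(signal, frame_len=1024, hop=256):
    """Return an M x N non-negative matrix of per-frame amplitude spectra.

    M = frame_len // 2 + 1 frequency bins, N = number of full frames.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Each column holds one windowed frame of the time-domain signal.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))
```

Squaring the returned matrix element-wise would give the power-spectrum variant mentioned at the end of the paragraph above.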

The matrix analysis unit 34 of FIG. 1 performs non-negative matrix factorization (NMF) on the observation matrix Y using the known teacher matrix F stored in the storage device 24 as prior information. As expressed by equation (2) below, the matrix analysis unit 34 of the first embodiment decomposes the observation matrix Y generated by the frequency analysis unit 32 into a basis matrix B, a coefficient matrix G, a basis matrix H, and a coefficient matrix U.

Y ≈ BG + HU = (F + D)G + HU   ... (2)

As shown in FIG. 3, the basis matrix B of equation (2) is a non-negative matrix of M rows and K columns in which K basis vectors b[1] to b[K] are arranged in the horizontal direction. The coefficient matrix G of equation (2) is, as shown in FIG. 3, a non-negative matrix of K rows and N columns in which K coefficient vectors g[1] to g[K], corresponding to the basis vectors b[k] of the basis matrix B, are arranged in the vertical direction. The coefficient vector g[k] in the k-th row of the coefficient matrix G corresponds to the time series of the weight (activity) for the basis vector b[k] of the basis matrix B. That is, the element in the n-th column of the coefficient vector g[k] is the weight of the basis vector b[k] in the n-th of the N frames of the acoustic signal SA(t).

As expressed by equation (2), the basis matrix B is the sum of the known teacher matrix F stored in the storage device 24 and an unknown compensation matrix D. As described above, the teacher matrix F is a non-negative matrix of M rows and K columns in which K basis vectors f[1] to f[K], corresponding to the components that make up the teacher acoustic component, are arranged horizontally. The compensation matrix D is, as shown in FIG. 3, a non-negative matrix of M rows and K columns in which K basis vectors d[1] to d[K] are arranged horizontally. As shown in equation (2) and FIG. 3, the basis vector b[k] in the k-th column of the basis matrix B is the sum of the basis vector f[k] in the k-th column of the teacher matrix F and the basis vector d[k] in the k-th column of the compensation matrix D.

As understood from equation (2), the teacher matrix F and the compensation matrix D share a common coefficient matrix G. That is, the basis vector d[k] of the compensation matrix D corresponds to an amplitude spectrum that is excited at the same time points and to the same degree (one weight in the coefficient vector g[k]) as the basis vector f[k] of the teacher matrix F. It follows that the basis vector f[k] of the teacher matrix F and the basis vector d[k] of the compensation matrix D correspond to amplitude spectra of acoustic components from a common sound source. As described above, the teacher matrix F is generated from a teacher acoustic component whose acoustic characteristics approximate those of the target sound component (for example, an acoustic component produced by the same type of sound source as the target sound component). Therefore, as shown in FIG. 4, the basis vector b[k] of the basis matrix B, obtained by adding the basis vector f[k] of the teacher matrix F and the basis vector d[k] of the compensation matrix D, corresponds to the amplitude spectrum Ab of the target sound component, that is, the component of the acoustic signal SA(t) whose acoustic characteristics approximate those of the teacher acoustic component (for example, an acoustic component produced by the same type of sound source as the teacher acoustic component). In other words, as FIG. 4 also shows, the basis vector d[k] of the compensation matrix D can be understood as the amplitude spectrum Ad of the difference (residual) between the amplitude spectrum Ab of the target sound component and the amplitude spectrum Af of the teacher acoustic component. Accordingly, when the acoustic characteristics of the target sound component in the acoustic signal SA(t) match those of the teacher acoustic component exactly, the compensation matrix D is a zero matrix. Put another way, the basis vector d[k] of the compensation matrix D is an element that compensates for (absorbs) the difference between the amplitude spectrum Ab of the target sound component and the amplitude spectrum Af of the teacher acoustic component by deforming the amplitude spectrum Af of the basis vector f[k] so that it approximates (ideally matches) the amplitude spectrum Ab of the target sound component in the acoustic signal SA(t).

As described above, each basis vector b[k] of the basis matrix B corresponds to the amplitude spectrum Ab of the target sound component, so the matrix BG of equation (2), the product of the basis matrix B and the coefficient matrix G, represents the amplitude spectrogram of the target sound component in the acoustic signal SA(t). Accordingly, the matrix HU of equation (2), the product of the basis matrix H and the coefficient matrix U, is a non-negative matrix of M rows and N columns that represents the amplitude spectrogram of the non-target sound components of the acoustic signal SA(t), that is, everything other than the target sound component.

As shown in FIG. 3, the basis matrix H is a non-negative matrix of M rows and R columns in which R basis vectors h[1] to h[R], corresponding to the components that make up the non-target sound component of the acoustic signal SA(t), are arranged horizontally. The basis vector h[r] in the r-th column (r = 1 to R) of the basis matrix H corresponds to the amplitude spectrum of the r-th of the R components that make up the non-target sound component of the acoustic signal SA(t). That is, the m-th element of the basis vector h[r] is the amplitude value at the m-th frequency on the frequency axis in the amplitude spectrum of the r-th component of the non-target sound component. The number of columns K of the basis matrix B and the number of columns R of the basis matrix H may be equal or different.

As shown in FIG. 3, the coefficient matrix U of equation (2) is a non-negative matrix of R rows and N columns in which R coefficient vectors u[1] to u[R], one for each basis vector h[r] of the basis matrix H, are arranged vertically. The coefficient vector u[r] in the r-th row of the coefficient matrix U corresponds to the time series of weights applied to the basis vector h[r] of the basis matrix H. That is, the n-th element of the coefficient vector u[r] is the magnitude (weight) of the basis vector h[r] in the n-th of the N frames of the acoustic signal SA(t). Therefore, as described above, the matrix HU of equation (2), the product of the basis matrix H and the coefficient matrix U, represents the amplitude spectrogram of the non-target sound component in the acoustic signal SA(t). The non-target sound component may include a plurality of acoustic components produced by different sound sources (sound sources of a different type from that of the target sound component).
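
The shapes and roles of the five matrices can be checked with a small numerical sketch. The sizes and the random values below are purely illustrative assumptions; in the analyzer, F is known and D, G, H, and U are estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K, R = 6, 8, 3, 2          # illustrative sizes; K and R need not be equal

F = rng.random((M, K))           # teacher bases f[1..K] as columns (known)
D = rng.random((M, K))           # compensation bases d[1..K] as columns
G = rng.random((K, N))           # activations g[1..K] as rows, shared by F and D
H = rng.random((M, R))           # non-target bases h[1..R] as columns
U = rng.random((R, N))           # activations u[1..R] as rows

B = F + D                        # b[k] = f[k] + d[k], column by column
target = B @ G                   # M x N spectrogram of the target component
non_target = H @ U               # M x N spectrogram of the non-target component
Y_model = target + non_target    # model of the observation matrix Y

# The target spectrogram is the sum over k of the rank-one terms b[k] g[k].
rank_one_sum = sum(np.outer(B[:, k], G[k]) for k in range(K))
```

Because F and D share G, `target` equals `F @ G + D @ G`: each compensation basis is active exactly when, and as strongly as, its teacher basis.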

As in equation (2), the matrix analysis unit 34 of FIG. 1 calculates the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U by non-negative matrix factorization using the observation matrix Y, which the frequency analysis unit 32 generates from the acoustic signal SA(t), and the known teacher matrix F, so that the sum of the target sound component matrix (F + D)G and the non-target sound component matrix HU approximates the observation matrix Y (that is, so that the difference between the two is minimized). The non-negative matrix factorization of the first embodiment introduces the following constraint conditions C1 to C4, each of which requires the similarity between two matrices to decrease (ideally, to be minimized).
C1: The similarity between the teacher matrix F and the compensation matrix D decreases.
C2: The similarity between the teacher matrix F and the basis matrix H decreases.
C3: The similarity between the compensation matrix D and the basis matrix H decreases.
C4: The similarity between the basis matrix B (B = F + D) and the basis matrix H decreases.

The constraint condition C1 is a condition for extracting, as the compensation matrix D, the part of the target sound component whose acoustic characteristics differ from those of the teacher acoustic component (teacher matrix F). Introducing the constraint condition C1 avoids the situation in which equation (2) holds while the teacher matrix F and the compensation matrix D have components in common (a state in which the basis matrix B reflects only the teacher acoustic component). In the first embodiment, minimization of the correlation λ(F|D) between the teacher matrix F and the compensation matrix D (minimize λ(F|D)) is given as an example of the constraint condition C1. The Frobenius norm ‖F^T D‖_Fr of the correlation matrix F^T D between the teacher matrix F and the compensation matrix D (the symbol T denotes matrix transposition) is suitable as the correlation λ(F|D).

The constraint condition C2 is a condition for extracting, as the basis matrix H, a non-target sound component whose acoustic characteristics differ from those of the teacher acoustic component represented by the teacher matrix F. Introducing the constraint condition C2 avoids the situation in which equation (2) holds while the teacher matrix F and the basis matrix H have components in common. In the first embodiment, minimization of the correlation λ(F|H) between the teacher matrix F and the basis matrix H (for example, λ(F|H) = ‖F^T H‖_Fr) is given as an example of the constraint condition C2.

The constraint condition C3 is a condition for making the acoustic characteristics of the acoustic component represented by the compensation matrix D (the difference between the target sound component and the teacher acoustic component) differ from those of the non-target sound component represented by the basis matrix H. Introducing the constraint condition C3 avoids the situation in which equation (2) holds while the compensation matrix D and the basis matrix H have components in common. In the first embodiment, minimization of the correlation λ(D|H) between the compensation matrix D and the basis matrix H (for example, λ(D|H) = ‖D^T H‖_Fr) is given as an example of the constraint condition C3.

The constraint condition C4 is a condition for making the acoustic characteristics of the target sound component represented by the basis matrix B (B = F + D) differ from those of the non-target sound component represented by the basis matrix H. Introducing the constraint condition C4 avoids the situation in which equation (2) holds while the basis matrix B and the basis matrix H have components in common (a state in which the target sound component and the non-target sound component cannot be distinguished). In the first embodiment, minimization of the correlation λ(F+D|H) between the basis matrix B (B = F + D) and the basis matrix H (for example, λ(F+D|H) = ‖(F+D)^T H‖_Fr) is given as an example of the constraint condition C4.
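
The four correlation measures can be computed directly from their definition as Frobenius norms. The sketch below is only an illustration; the function names `correlation` and `penalties` and the toy matrices are assumptions introduced for the example.

```python
import numpy as np

def correlation(A, B):
    """Frobenius norm of the correlation matrix A^T B."""
    return np.linalg.norm(A.T @ B, ord="fro")

def penalties(F, D, H):
    """Correlation terms associated with constraint conditions C1 to C4."""
    return {
        "C1": correlation(F, D),
        "C2": correlation(F, H),
        "C3": correlation(D, H),
        "C4": correlation(F + D, H),
    }

# Toy bases (hypothetical values): each matrix occupies a different frequency bin,
# so every pair is uncorrelated and all four terms vanish.
F_toy = np.array([[1.0], [0.0], [0.0]])
D_toy = np.array([[0.0], [1.0], [0.0]])
H_toy = np.array([[0.0], [0.0], [1.0]])
disjoint = penalties(F_toy, D_toy, H_toy)

# If H reuses the teacher spectrum, the C2 and C4 terms become positive.
overlapping = penalties(F_toy, D_toy, F_toy)
```

For non-negative matrices the norm is zero only when no column of one matrix overlaps in frequency with any column of the other, which is why reducing it pushes the matrices toward describing different spectral content.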

To evaluate how well equation (2) holds under the constraint conditions C1 to C4 described above, the evaluation function Z of equation (3) is introduced.

Z = δ(Y|(F+D)G+HU) + μ1·λ(F|D) + μ2·λ(F|H) + μ3·λ(D|H) + μ4·λ(F+D|H)   (3)

The symbol δ(Y|(F+D)G+HU) in equation (3) denotes the distance between the observation matrix Y and the matrix {(F+D)G+HU}. For example, the generalized KL (Kullback-Leibler) pseudo-distance (I-divergence) is suitable as the distance δ(Y|(F+D)G+HU). The coefficients μ1 to μ4 in equation (3) bring the value ranges of the individual terms (correlations λ) of equation (3) into line with one another, and are set to predetermined non-negative values (μ1, μ2, μ3, μ4 ≥ 0).
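
An evaluation function of this kind can be sketched numerically. The code below assumes that Z is the distance term plus the four correlation terms weighted by μ1 to μ4, as the description above suggests, and that each correlation is the unsquared Frobenius norm; the exact form used in the patent's formula image may differ. The names `i_divergence` and `evaluation_z` are hypothetical.

```python
import numpy as np

def i_divergence(Y, X, eps=1e-12):
    """Generalized KL pseudo-distance (I-divergence) between non-negative matrices."""
    Y = Y + eps
    X = X + eps
    return float(np.sum(Y * np.log(Y / X) - Y + X))

def evaluation_z(Y, F, D, G, H, U, mu=(1.0, 1.0, 1.0, 1.0)):
    """Assumed form: distance term plus four weighted correlation terms."""
    def corr(A, B):
        return float(np.linalg.norm(A.T @ B, ord="fro"))

    model = (F + D) @ G + H @ U
    return (
        i_divergence(Y, model)
        + mu[0] * corr(F, D)        # term for C1
        + mu[1] * corr(F, H)        # term for C2
        + mu[2] * corr(D, H)        # term for C3
        + mu[3] * corr(F + D, H)    # term for C4
    )
```

With all μ set to zero the function reduces to the plain approximation error, which is zero when the model reproduces Y exactly.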

The non-negative matrix factorization performed by the matrix analysis unit 34 of the first embodiment corresponds to minimizing the evaluation function Z of equation (3) (that is, making equation (2) hold under the constraint conditions C1 to C4). Equations (4) to (7) are derived from the condition that the evaluation function Z is minimized.

Equation (4) is an update rule that iteratively updates the element Dmk (Dmk ≥ 0) in the m-th row and k-th column of the compensation matrix D (M rows, K columns), and equation (5) is an update rule that iteratively updates the element Gkn (Gkn ≥ 0) in the k-th row and n-th column of the coefficient matrix G (K rows, N columns). Equation (6) is an update rule that iteratively updates the element Hmr (Hmr ≥ 0) in the m-th row and r-th column of the basis matrix H (M rows, R columns), and equation (7) is an update rule that iteratively updates the element Urn (Urn ≥ 0) in the r-th row and n-th column of the coefficient matrix U (R rows, N columns).

The matrix analysis unit 34 of FIG. 1 initializes the unknown matrices in the evaluation function Z (D, G, H, U) with random numbers, repeats the computation of equations (4) to (7), and adopts the results at the point when the number of iterations reaches a predetermined count (Dmk, Gkn, Hmr, Urn) as the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U. The number of iterations of the update rules is chosen experimentally or statistically so that the evaluation function Z converges to a predetermined value (for example, zero). As understood from the above, the matrix analysis unit 34 of the first embodiment generates the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U so that the relationship of equation (2) holds for the observation matrix Y of the acoustic signal SA(t) and the known teacher matrix F under the constraint conditions C1 to C4.
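
The overall procedure (random initialization followed by a fixed number of multiplicative updates) can be illustrated with a deliberately simplified stand-in. The sketch below does NOT implement equations (4) to (7), which are not reproduced in this text: it omits the compensation matrix D and the correlation terms and applies the standard multiplicative updates for KL-divergence NMF to the model Y ≈ FG + HU with F held fixed. The function name and defaults are assumptions.

```python
import numpy as np

def supervised_kl_nmf(Y, F, R, n_iter=200, seed=0, eps=1e-12):
    """Fixed-basis KL-NMF, Y ~ F G + H U, with F known.
    Simplified stand-in: no compensation matrix D, no penalty terms."""
    rng = np.random.default_rng(seed)
    M, N = Y.shape
    K = F.shape[1]
    # Unknown matrices start from random non-negative values.
    G = rng.random((K, N)) + eps
    H = rng.random((M, R)) + eps
    U = rng.random((R, N)) + eps
    ones = np.ones_like(Y)
    for _ in range(n_iter):          # fixed, predetermined iteration count
        ratio = Y / (F @ G + H @ U + eps)
        G *= (F.T @ ratio) / (F.T @ ones + eps)
        ratio = Y / (F @ G + H @ U + eps)
        H *= (ratio @ U.T) / (ones @ U.T + eps)
        ratio = Y / (F @ G + H @ U + eps)
        U *= (H.T @ ratio) / (H.T @ ones + eps)
    return G, H, U
```

Multiplicative updates keep every element non-negative as long as the initial values are non-negative, which is why random positive initialization and a simple loop suffice in this setting.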

The sound source separation unit 36 of FIG. 1 generates the acoustic signal SB(t) using the analysis results (D, G, H, U) of the matrix analysis unit 34. Specifically, the sound source separation unit 36 adds the teacher matrix F stored in the storage device 24 and the compensation matrix D calculated by the matrix analysis unit 34 to obtain the basis matrix B, and multiplies it by the coefficient matrix G calculated by the matrix analysis unit 34 to obtain the amplitude spectrogram ((F+D)G) of the target sound component in the acoustic signal SA(t). It then generates the time-domain acoustic signal SB(t) by an inverse Fourier transform that combines the amplitude spectrum of each frame with the phase spectrum of the acoustic signal SA(t) in that frame. The acoustic signal SB(t) generated by the sound source separation unit 36 is supplied to the sound emitting device 14 and reproduced as sound waves.
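
The resynthesis step can be sketched as follows, assuming the complex spectrum of each frame of SA(t) is available from the earlier frequency analysis. The function name `target_frames` is hypothetical, and the final overlap-add of the returned frames into one continuous signal is omitted for brevity.

```python
import numpy as np

def target_frames(F, D, G, mixture_spectrum, frame_len):
    """Combine the estimated target amplitude (F + D) G with the phase of the
    mixture and return time-domain frames (one frame per column).
    `mixture_spectrum` is the M x N complex spectrum of SA(t)."""
    amplitude = (F + D) @ G                      # M x N target amplitude spectrogram
    phase = np.angle(mixture_spectrum)           # phase spectrum of SA(t), per frame
    spectrum = amplitude * np.exp(1j * phase)    # complex target spectrum
    return np.fft.irfft(spectrum, n=frame_len, axis=0)
```

Reusing the mixture phase is a common simplification in amplitude-domain separation: the factorization only estimates magnitudes, so the phase must come from elsewhere.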

In the first embodiment described above, the non-negative matrix factorization uses an unknown compensation matrix D that shares the coefficient matrix G with the known teacher matrix F representing the teacher acoustic component. This has the advantage that the target sound component can be separated with high accuracy even when the acoustic characteristics of the target sound component of the acoustic signal SA(t) do not match those of the teacher acoustic component exactly. Moreover, because the first embodiment performs the non-negative matrix factorization under the constraint conditions C1 to C4 on the teacher matrix F, the compensation matrix D, and the basis matrix H, the effect of separating the target sound component from the non-target sound component with high accuracy is especially pronounced.

<Second Embodiment>
A second embodiment of the present invention is described below. For elements whose operation and function are the same as in the first embodiment, the reference signs used in the description of the first embodiment are reused in each of the configurations below, and detailed descriptions are omitted as appropriate.

The first embodiment assumed that the compensation matrix D is a non-negative matrix (Dmk ≥ 0). As described with reference to FIG. 4, when the amplitude spectrum Ab of the target sound component, represented by the basis vector b[k] of the basis matrix B, is obtained by adding positive numbers (the basis vector d[k]) to the amplitude spectrum Af of the teacher acoustic component, represented by the basis vector f[k] of the teacher matrix F (that is, when the amplitude values of the amplitude spectrum Ab exceed those of the amplitude spectrum Af at all frequencies), the first embodiment, in which the compensation matrix D is non-negative, can still separate the target sound component with high accuracy. However, as illustrated in FIG. 5, when the amplitude values of the amplitude spectrum Ab of the target sound component of the acoustic signal SA(t) fall below those of the amplitude spectrum Af of the teacher matrix F, the sum of the non-negative basis vector f[k] of the teacher matrix F and a non-negative basis vector d[k] of the compensation matrix D cannot represent the amplitude spectrum Ab of the target sound component properly.

To solve this problem, the second embodiment removes the first embodiment's condition that the compensation matrix D be a non-negative matrix. That is, the compensation matrix D is not restricted to a non-negative matrix, and each element Dmk of the compensation matrix D can take not only non-negative values (positive numbers and zero) but also negative values. Therefore, at frequencies where the amplitude value of the amplitude spectrum Ab of the target sound component of the acoustic signal SA(t) falls below that of the amplitude spectrum Af of the teacher acoustic component, setting the amplitude value of the amplitude spectrum Ad of the compensation matrix D (the corresponding element of the basis vector d[k]) to a negative number allows the sum of the basis vector f[k] of the teacher matrix F and the basis vector d[k] of the compensation matrix D to represent the amplitude spectrum Ab of the target sound component properly. As understood from the above, positive numbers in the basis vector d[k] of the compensation matrix D add components to the amplitude spectrum Af indicated by the basis vector f[k] of the teacher matrix F to bring it closer to the amplitude spectrum Ab of the target sound component, while negative numbers in the basis vector d[k] remove components from the amplitude spectrum Af to bring it closer to the amplitude spectrum Ab.
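
A small numerical illustration makes the difference concrete. The spectra below are hypothetical four-bin examples, not data from the patent; the comparison shows the best per-bin fit available with and without negative entries in d[k].

```python
import numpy as np

Af = np.array([1.0, 0.8, 0.5, 0.2])   # teacher spectrum f[k] (illustrative values)
Ab = np.array([1.2, 0.5, 0.6, 0.1])   # target spectrum b[k] (illustrative values)

d_signed = Ab - Af                     # second embodiment: entries may be negative
d_nonneg = np.maximum(d_signed, 0.0)   # closest non-negative d[k], bin by bin

# Largest per-bin error when representing Ab as Af + d[k].
err_signed = np.abs(Af + d_signed - Ab).max()
err_nonneg = np.abs(Af + d_nonneg - Ab).max()

# The sum f[k] + d[k] is still an amplitude spectrum, so it stays non-negative.
basis_ok = bool(np.all(Af + d_signed >= 0))
```

In bins where the target lies below the teacher, the non-negative variant is stuck at the teacher's value, whereas the signed variant reaches the target exactly.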

However, the non-negative matrix factorization performed by the matrix analysis unit 34 presupposes that the basis matrices (B, H) and the coefficient matrices (G, U) are non-negative matrices. The second embodiment therefore allows the elements Dmk of the compensation matrix D to be negative, while adding to the constraint conditions C1 to C4 of the first embodiment a constraint condition C5 that the basis matrix B, the sum of the teacher matrix F and the compensation matrix D, is a non-negative matrix. Specifically, the constraint condition C5 requires that the sum (Bmk) of the element Fmk (non-negative) in the m-th row and k-th column of the teacher matrix F and the element Dmk (non-negative or negative) in the m-th row and k-th column of the compensation matrix D be non-negative. For example, the second embodiment applies the constraint condition C5 expressed by equation (8).

The coefficient η in equation (8) adjusts the range of values permitted for the element Bmk of the basis matrix B (the degree to which the element Bmk may approach a negative number), and is set to a positive number not exceeding 1 (0 < η ≤ 1). For example, the coefficient η is suitably set to about 0.3. From the condition that the evaluation function Z of equation (3) is minimized under the constraint condition C5 of equation (8), equations (9) to (12) are derived for iteratively updating the elements of the unknown matrices (Dmk, Gkn, Hmr, Urn). The symbol E in equations (9) to (12) denotes a matrix of M rows and N columns whose elements are all 1. The operator ( )p keeps the positive elements of the matrix in parentheses and replaces the negative elements with zero, and the operator ( )n keeps the negative elements of the matrix in parentheses and replaces the positive elements with zero. The operator .− denotes element-wise division of matrices.
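
The auxiliary operators defined above have direct NumPy counterparts. The sketch below only illustrates those definitions; the function names `pos_part` and `neg_part` are hypothetical, and the update equations (9) to (12) themselves are not reproduced.

```python
import numpy as np

def pos_part(X):
    """( )p: keep positive elements, replace negative elements with zero."""
    return np.maximum(X, 0.0)

def neg_part(X):
    """( )n: keep negative elements, replace positive elements with zero."""
    return np.minimum(X, 0.0)

X = np.array([[1.5, -2.0],
              [0.0,  3.0]])
E = np.ones((2, 2))                # all-ones matrix (here 2 x 2 for illustration)

halves = X / (E + E)               # element-wise division of two matrices
recombined = pos_part(X) + neg_part(X)
```

The two operators split a signed matrix into a non-negative and a non-positive part whose sum is the original matrix, a standard device for keeping multiplicative updates well defined when a factor may change sign.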

The matrix analysis unit 34 calculates the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U by repeating the computation of equations (9) to (12) in place of equations (4) to (7) of the first embodiment. That is, the matrix analysis unit 34 of the second embodiment generates the compensation matrix D, the coefficient matrix G, the basis matrix H, and the coefficient matrix U so that the relationship of equation (2) holds for the observation matrix Y of the acoustic signal SA(t) and the known teacher matrix F under the constraint conditions C1 to C5.

The second embodiment achieves the same effects as the first embodiment. In addition, because the compensation matrix D is not limited to a non-negative matrix in the second embodiment (the elements Dmk of the compensation matrix D can be negative as well as non-negative), the target sound component can be separated with high accuracy even when the amplitude values of the amplitude spectrum Ab of the target sound component of the acoustic signal SA(t) fall below those of the amplitude spectrum Af of the teacher acoustic component. At the same time, because the basis matrix B, the sum of the teacher matrix F and the compensation matrix D, is restricted to a non-negative matrix (constraint condition C5), the non-negative matrix factorization by the matrix analysis unit 34 is carried out properly, as in the first embodiment, even though the compensation matrix D itself is not constrained to be non-negative.

FIG. 6 shows experimental results of sound source separation according to the second embodiment. In the experiment of FIG. 6, the teacher matrix F was generated using, as the teacher acoustic component, a mixture of flute and clarinet tones generated by a MIDI (Musical Instrument Digital Interface) sound source, and the flute tone was extracted as the target sound component from an acoustic signal SA(t) in which a mixture of flute and clarinet tones played on natural instruments was recorded. The comparative example shown alongside in FIG. 6 is a configuration that does not use the compensation matrix D (a configuration in which the compensation matrix D is removed from equation (2) of the first embodiment and the constraint conditions C1, C3, and C4 are not taken into account). FIG. 6 lists, for the second embodiment and for the comparative example, the signal-to-distortion ratio (SDR), the signal-to-interference ratio (SIR), and the sources-to-artifacts ratio (SAR, a measure of nonlinear distortion) as evaluation measures of the separation results. The SDR evaluates both the accuracy of the separation and the quality of the separated signal, the SIR evaluates only the accuracy of the separation, and the SAR evaluates the signal distortion introduced by the separation. The experimental results of FIG. 6 confirm that the second embodiment achieves good sound source separation in terms of both separation accuracy and the quality of the separated signal.

FIG. 7 shows spectrograms of the results of sound source separation by the second embodiment and by the comparative example. The flute tone was extracted as the target sound component from an acoustic signal SA(t) in which a mixture of flute and clarinet tones was recorded (the spectrogram in part (A) of FIG. 7). Part (B) of FIG. 7 shows the spectrogram of the acoustic signal SA(t) of the flute played alone (that is, the ideal result of sound source separation). Part (C) of FIG. 7 is the spectrogram of the acoustic signal SB(t) generated by the second embodiment, and part (D) of FIG. 7 is the spectrogram of the acoustic signal SB(t) generated by the comparative example. Part (C) of FIG. 7 is closer to part (B) than part (D) is. FIG. 7 thus also confirms that the second embodiment achieves more accurate sound source separation than the comparative example.

<Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are given below as examples. Two or more configurations freely selected from the following examples may be combined as appropriate.

(1) In each of the above embodiments, the teacher matrix F was generated by non-negative matrix factorization of the observation matrix X of the teacher acoustic component (FIG. 2), but any method may be used to generate the teacher matrix F. Because the teacher matrix F consists of K amplitude spectra expected of the teacher acoustic component, it can also be generated, for example, by calculating an average amplitude spectrum for each of K pitches of the teacher acoustic component and arranging the K averaged amplitude spectra as the basis vectors f[k]. That is, any technique that identifies the amplitude spectrum of a sound can be applied to the generation of the teacher matrix F.
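
The averaging approach in modification (1) can be sketched in a few lines. The function name `teacher_matrix_from_pitches` and the input layout (one amplitude spectrogram per pitch) are assumptions made for this illustration.

```python
import numpy as np

def teacher_matrix_from_pitches(spectrograms):
    """Build an M x K teacher matrix from K amplitude spectrograms, one per pitch.
    Each input is an M x N_k array of frames recorded at a single pitch; its
    time-averaged spectrum becomes the basis vector f[k]."""
    return np.stack([S.mean(axis=1) for S in spectrograms], axis=1)

# Toy input (hypothetical values): two pitches, two frequency bins each.
S1 = np.array([[1.0, 3.0],
               [2.0, 2.0]])
S2 = np.array([[0.0, 4.0, 2.0],
               [1.0, 1.0, 1.0]])
F_avg = teacher_matrix_from_pitches([S1, S2])
```

The number of frames may differ from pitch to pitch, since each spectrogram is reduced to a single averaged column before the columns are stacked.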

(2) In each of the embodiments above, the constraint conditions C1 to C4 are defined in terms of the correlations λ between matrices (λ(F|D), λ(F|H), λ(D|H), λ(F+D|H)), but they can also be defined in terms of the distances δ between the matrices (δ(F|D), δ(F|H), δ(D|H), δ(F+D|H)). For example, constraint condition C1 corresponds to maximizing the distance δ(F|D) between the teacher matrix F and the compensation matrix D, constraint condition C2 corresponds to maximizing the distance δ(F|H) between the teacher matrix F and the basis matrix H, constraint condition C3 corresponds to maximizing the distance δ(D|H) between the compensation matrix D and the basis matrix H, and constraint condition C4 corresponds to maximizing the distance δ(B|H) between the basis matrix B and the basis matrix H. Any known distance measure, such as the generalized KL pseudo-distance, the Itakura-Saito distance (Itakura-Saito divergence), or the β-divergence, may be adopted as the distance δ. When the distances δ between matrices are used, the elements (Dmk, Gkn, Hmr, Urn) of the unknown matrices are calculated so that the evaluation function Z of equation (13) below is minimized.

As understood from the above description, the constraint that the similarity between matrices decreases (ideally, is minimized) can encompass both the condition that the correlation λ between the matrices decreases (ideally, is minimized) and the condition that the distance δ between the matrices increases (ideally, is maximized).
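The distance measures named in modification (2) can be written in one function using the standard definition of the β-divergence, of which the generalized KL divergence (β = 1) and the Itakura-Saito divergence (β = 0) are special cases. The sketch below assumes two matrices of equal shape with non-negative entries; the function name, the `eps` guard, and the toy matrices are hypothetical, and the sketch does not reproduce equation (13).

```python
import numpy as np

def beta_divergence(A, B, beta, eps=1e-12):
    """Element-wise beta-divergence between non-negative matrices, summed.

    beta = 1: generalized KL divergence; beta = 0: Itakura-Saito divergence;
    beta = 2: half the squared Euclidean distance.
    """
    A = np.asarray(A, dtype=float) + eps  # eps avoids log(0) and division by 0
    B = np.asarray(B, dtype=float) + eps
    if beta == 1:
        return float(np.sum(A * np.log(A / B) - A + B))
    if beta == 0:
        R = A / B
        return float(np.sum(R - np.log(R) - 1.0))
    num = A**beta + (beta - 1.0) * B**beta - beta * A * B**(beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))

# Toy matrices (hypothetical values).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.5, 1.0], [2.0, 5.0]])
d_kl = beta_divergence(A, B, beta=1)
d_is = beta_divergence(A, B, beta=0)
d_eu = beta_divergence(A, B, beta=2)
```

A larger value means the two matrices are less similar, which is the direction the constraint conditions favor when they are expressed through δ.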

(3) In each of the embodiments above, the acoustic signal SB(t) is generated by extracting the target sound component of the acoustic signal SA(t) (suppressing the non-target sound component), but it is also possible to generate an acoustic signal SB(t) in which the target sound component is suppressed (the non-target sound component is extracted). For example, when the sound source separation unit 36 multiplies the basis matrix H calculated by the matrix analysis unit 34 by the coefficient matrix U, an acoustic signal SB(t) in which the non-target sound component of the acoustic signal SA(t) is extracted is generated. As understood from the above description, the sound source separation unit 36 is comprehensively expressed as an element that separates (extracts or suppresses) one of the target sound component and the non-target sound component by using the analysis result of the matrix analysis unit 34.

The method by which the sound source separation unit 36 generates the acoustic signal SB(t) from the analysis result of the matrix analysis unit 34 is not limited to the above examples. For example, it is also possible to calculate a matrix E representing the amplitude spectrogram of the non-target sound component (noise component) by multiplying the basis matrix H by the coefficient matrix U (hereinafter, the "estimated noise matrix"), and to generate the acoustic signal SB(t) by suppressing the estimated noise matrix E from the observation matrix Y of the acoustic signal SA(t). Any known technique for suppressing a noise component (estimated noise matrix E) from the acoustic signal SA(t) (observation matrix Y), such as spectral subtraction, a Wiener filter, or MMSE-STSA, may be adopted to suppress the estimated noise matrix E.
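The noise-suppression variant described above can be sketched on amplitude spectrograms as follows. This is a minimal sketch under stated assumptions: the function names, the power-domain form of the Wiener-type gain, and the toy matrices are illustrative choices, not taken from the specification, and resynthesis of the time-domain signal SB(t) (which needs phase information) is left out.

```python
import numpy as np

def estimated_noise(H, U):
    """Estimated noise matrix E = HU (amplitude spectrogram of the non-target part)."""
    return H @ U

def spectral_subtraction(Y, E):
    """Subtract E from the observation Y and clip negative results to zero."""
    return np.maximum(Y - E, 0.0)

def wiener_type_suppression(Y, E, eps=1e-12):
    """Apply a power-domain gain 1 - E^2 / Y^2, clipped to [0, 1], to Y."""
    gain = np.clip(1.0 - E**2 / (Y**2 + eps), 0.0, 1.0)
    return gain * Y

# Toy example (hypothetical values): M = 2 bins, N = 2 frames, R = 1 noise basis.
H = np.array([[1.0], [0.5]])
U = np.array([[2.0, 1.0]])
Y = np.array([[4.0, 1.0], [2.0, 3.0]])
E = estimated_noise(H, U)
S_sub = spectral_subtraction(Y, E)
S_wie = wiener_type_suppression(Y, E)
```

Both outputs stay between zero and Y element by element, so either can be used as the amplitude spectrogram of the suppressed signal.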

(4) Each of the embodiments above illustrates a teacher matrix F for a teacher sound component produced by one type of sound source, but the teacher matrix F can also be generated using, as the teacher sound component, a mixture of sounds produced by a plurality of (J) sound sources of different types. As shown in FIG. 8, the teacher matrix F is then a large matrix in which J matrices F[1] to F[J], corresponding to the different sound sources of the teacher sound component, are arranged horizontally. Each matrix F[j] (j = 1 to J) is a matrix of M rows (with an arbitrary number of columns) in which one or more basis vectors, each representing the amplitude spectrum of a component of the sound of the j-th of the J sound sources, are arranged horizontally. The compensation matrix D is a matrix in which J matrices D[1] to D[J], corresponding to the respective matrices F[j], are arranged horizontally. Each matrix D[j] has the same number of rows and columns as the matrix F[j]. The method of calculating the elements (Dmk, Gkn, Hmr, Urn) of the unknown matrices (update formulas and constraint conditions) is the same as in the embodiments above.

With the above configuration, the target sound component (matrix BG), which is a mixture of the J sound components of the acoustic signal SA(t) whose acoustic characteristics (types of sound source) are close to the teacher sound component, is separated from the non-target sound component (matrix HU) other than the target sound component. In addition, multiplying the matrix obtained by adding the matrix F[j] in the teacher matrix F and the matrix D[j] in the compensation matrix D by the coefficient vectors g[k] of the coefficient matrix G that correspond to that matrix yields the amplitude spectrogram of the sound component of the j-th of the J sound sources. As understood from the above examples, the total number of sound sources of the teacher sound component or the target sound component is arbitrary.
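The block structure of modification (4) can be illustrated as follows. This is a sketch under the assumption that the rows of G are ordered in the same way as the columns of F and D; the function name, the 0-based source index, and the toy dimensions are hypothetical.

```python
import numpy as np

def source_spectrogram(F_blocks, D_blocks, G, j):
    """Amplitude spectrogram of the j-th source (0-based): (F[j] + D[j]) times
    the rows of G that correspond to the columns of that block."""
    widths = [Fj.shape[1] for Fj in F_blocks]
    start = sum(widths[:j])       # first row of G belonging to block j
    stop = start + widths[j]      # one past the last row of block j
    return (F_blocks[j] + D_blocks[j]) @ G[start:stop, :]

# Toy example (hypothetical): J = 2 sources, M = 4 bins, N = 6 frames.
rng = np.random.default_rng(0)
F_blocks = [rng.random((4, 2)), rng.random((4, 3))]
D_blocks = [0.1 * rng.random((4, 2)), 0.1 * rng.random((4, 3))]
G = rng.random((5, 6))

# The large matrices are the horizontal concatenations of the blocks.
F = np.hstack(F_blocks)
D = np.hstack(D_blocks)
per_source = [source_spectrogram(F_blocks, D_blocks, G, j) for j in range(2)]
```

Summing the per-source spectrograms recovers the full target spectrogram (F + D)G, which is why the blocks can be read out individually after a single factorization.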

(5) Sound separation using spatial information (the direction of arrival of a sound component) can also be combined with each of the embodiments above. For example, acoustic signals of a plurality of channels recorded by mutually spaced sound collecting devices (a microphone array) are analyzed to extract the acoustic signal SA(t) of the sound arriving from a specific direction, and the sound source separation of each of the embodiments above is then performed on the extracted acoustic signal SA(t). Any known technique may be adopted for sound source separation using spatial information; for example, Shigeki Miyabe, et al., "Temporal quantization of spatial information using directional clustering for multichannel audio coding", IEEE WASPAA2009, p. 261-264, 2009 is suitable. The above configuration has the advantage that a target sound component arriving from a predetermined direction at the sound receiving point (the plurality of sound collecting devices) can be separated with high accuracy.
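As a rough illustration of such spatial pre-processing, the sketch below uses a simple delay-and-sum beamformer with integer sample delays. This is a deliberately simpler stand-in and is not the directional clustering method of the cited reference; the function name, the delay convention, and the toy signals are all assumptions.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each channel by advancing it by its integer delay, then average.

    channels: array of shape (C, T); delays: C integers with |delay| < T, the
    arrival delay (in samples) of the wanted direction at each microphone.
    """
    channels = np.asarray(channels, dtype=float)
    C, T = channels.shape
    out = np.zeros(T)
    for x, d in zip(channels, delays):
        shifted = np.zeros(T)
        if d >= 0:
            shifted[: T - d] = x[d:]      # advance by d samples, zero-pad the end
        else:
            shifted[-d:] = x[: T + d]     # delay by |d| samples, zero-pad the start
        out += shifted
    return out / C

# Toy example (hypothetical): a pulse reaches microphone 1 two samples later.
s = np.zeros(10)
s[3] = 1.0
mic0 = s.copy()
mic1 = np.zeros(10)
mic1[5] = 1.0
SA = delay_and_sum(np.stack([mic0, mic1]), delays=[0, 2])
```

The aligned output SA would then be passed to the frequency analysis and factorization steps in place of a single-channel recording.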

(6) The operations that the matrix analysis unit 34 executes iteratively are not limited to those exemplified in the embodiments above (equations (4) to (7) and (9) to (12)). For example, equation (9) above can be replaced with the following equation (9A).
That is, in equation (9) above, the sign of each element is determined separately for each term of the matrix V (2μ1(FF^T D) and 2μ3(HH^T D)), whereas in equation (9A) the sign of each element is determined for the matrix V as a whole (that is, after the terms have been added).
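Equations (9) and (9A) are not reproduced in this text, so the following sketch only illustrates the distinction drawn above. It assumes that "determining the sign" means splitting a matrix into non-negative positive and negative parts, as is common in multiplicative updates with mixed-sign terms; the helper names, the weights `mu1` and `mu3`, and the toy matrices are hypothetical.

```python
import numpy as np

def pos(A):
    """Positive part: keeps positive elements, zero elsewhere."""
    return (np.abs(A) + A) / 2.0

def neg(A):
    """Negative part as a non-negative matrix, so that A = pos(A) - neg(A)."""
    return (np.abs(A) - A) / 2.0

def split_per_term(terms):
    """Determine the sign term by term, then add (the equation (9) style)."""
    return sum(pos(t) for t in terms), sum(neg(t) for t in terms)

def split_whole(terms):
    """Add the terms first, then determine the sign (the equation (9A) style)."""
    V = sum(terms)
    return pos(V), neg(V)

# Toy example (hypothetical): D contains negative elements, so terms can change sign.
rng = np.random.default_rng(2)
F, H = rng.random((5, 3)), rng.random((5, 2))
D = rng.random((5, 3)) - 0.5
mu1, mu3 = 0.1, 0.2
terms = [2 * mu1 * (F @ F.T @ D), 2 * mu3 * (H @ H.T @ D)]
Vp_term, Vn_term = split_per_term(terms)
Vp_whole, Vn_whole = split_whole(terms)
```

Both splits reconstruct the same matrix V, but the whole-matrix split lets terms of opposite sign cancel first, so its positive and negative parts are never larger than the per-term ones.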

(7) The constraint conditions applied to the non-negative matrix factorization are not limited to those exemplified in the embodiments above. For example, when constraint condition C2 (reduction of the similarity between the teacher matrix F and the basis matrix H) and constraint condition C3 (reduction of the similarity between the compensation matrix D and the basis matrix H) are both taken into account, as in the embodiments above, constraint condition C4 can also be satisfied, so constraint condition C4 may be omitted.
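One way to see why C4 may follow from C2 and C3 is a numerical check of the triangle inequality. The sketch assumes that the correlation is measured as the squared Frobenius norm of A^T B, which is consistent with terms of the form 2μ1(FF^T D) but is not stated in this section; the function name and the random toy matrices are hypothetical.

```python
import numpy as np

def correlation(A, B):
    """Assumed correlation measure: squared Frobenius norm of A^T B."""
    return float(np.sum((A.T @ B) ** 2))

# Toy matrices (hypothetical): M = 6 bins, K = 3 target bases, R = 2 noise bases.
rng = np.random.default_rng(1)
F = rng.random((6, 3))
D = rng.random((6, 3)) - 0.5
H = rng.random((6, 2))

# Since (F + D)^T H = F^T H + D^T H, the Frobenius norm obeys the triangle inequality.
lhs = np.sqrt(correlation(F + D, H))
rhs = np.sqrt(correlation(F, H)) + np.sqrt(correlation(D, H))
```

Under this assumption, driving λ(F|H) and λ(D|H) toward zero also bounds λ(F+D|H), which matches the statement that C4 can be omitted.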

(8) In each of the embodiments above, the entire band of the acoustic signal SA(t) is processed, but a specific band of the acoustic signal SA(t) can also be selectively processed. If only the band components expected for the desired sound source are processed, the separation accuracy for that sound source can be improved.
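Band-limited processing can be sketched as a row selection on the observation matrix. The sketch assumes that the M rows of Y are evenly spaced frequency bins from 0 Hz to half the sampling rate; the function name, the sampling rate, and the band edges are hypothetical.

```python
import numpy as np

def select_band(Y, sample_rate, f_low, f_high):
    """Keep only the rows of Y whose bin frequency lies in [f_low, f_high].

    Assumes the M rows are evenly spaced bins covering 0 .. sample_rate / 2.
    Returns the reduced matrix and the boolean row mask that was applied.
    """
    M = Y.shape[0]
    freqs = np.linspace(0.0, sample_rate / 2.0, M)
    keep = (freqs >= f_low) & (freqs <= f_high)
    return Y[keep, :], keep

# Toy example (hypothetical): 5 bins at 0, 1000, 2000, 3000, 4000 Hz.
Y = np.arange(10, dtype=float).reshape(5, 2)
Y_band, keep = select_band(Y, sample_rate=8000, f_low=1000, f_high=3000)
```

The same row mask would have to be applied to the teacher matrix F so that its basis vectors keep the same number of rows as the reduced observation matrix.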

(9) The generation of the acoustic signal SB(t) from the analysis result of the matrix analysis unit 34 (the sound source separation unit 36) may be omitted. For example, the present invention can also be implemented as an acoustic analysis device 100 (matrix analysis unit 34) that calculates the unknown matrices (D, G, H, U) by non-negative matrix factorization of the observation matrix Y of the acoustic signal SA(t). The acoustic analysis device 100 can also be realized as a server device that communicates with a terminal device such as a mobile phone. For example, the acoustic analysis device 100 generates the acoustic signal SB(t) from the acoustic signal SA(t) received from the terminal device and transmits it to the terminal device. In a configuration in which the observation matrix Y of the acoustic signal SA(t) is received from the terminal device (for example, a configuration in which the terminal device includes the frequency analysis unit 32), the frequency analysis unit 32 is omitted from the acoustic analysis device 100. In a configuration in which the analysis result of the matrix analysis unit 34 is transmitted to the terminal device (for example, a configuration in which the terminal device includes the sound source separation unit 36), the sound source separation unit 36 is omitted from the acoustic analysis device 100.

Description of symbols: 100: acoustic analysis device, 12: signal supply device, 14: sound emitting device, 22: arithmetic processing device, 24: storage device, 32: frequency analysis unit, 34: matrix analysis unit, 36: sound source separation unit, F: teacher matrix, D: compensation matrix, B, H: basis matrices, G, U: coefficient matrices, X, Y: observation matrices.

Claims (5)

1. An acoustic analysis device comprising matrix analysis means for calculating, by non-negative matrix factorization of an acoustic signal: a first basis matrix that is obtained by adding a compensation matrix to a known teacher matrix including a plurality of basis vectors representing spectra of a teacher sound component, and that includes a plurality of basis vectors representing spectra of a target sound component of the acoustic signal; a first coefficient matrix including a plurality of coefficient vectors representing temporal changes of weights for the respective basis vectors of the first basis matrix; a second basis matrix including a plurality of basis vectors representing spectra of a non-target sound component of the acoustic signal other than the target sound component; and a second coefficient matrix including a plurality of coefficient vectors representing temporal changes of weights for the respective basis vectors of the second basis matrix.
2. The acoustic analysis device according to claim 1, further comprising sound source separation means for generating at least one of the target sound component corresponding to the first basis matrix and the first coefficient matrix, and the non-target sound component corresponding to the second basis matrix and the second coefficient matrix.
3. The acoustic analysis device according to claim 1 or claim 2, wherein the matrix analysis means performs the non-negative matrix factorization under a constraint condition that the similarity between the teacher matrix and the compensation matrix decreases.
4. The acoustic analysis device according to any one of claims 1 to 3, wherein the matrix analysis means performs the non-negative matrix factorization under a constraint condition that the similarity between the compensation matrix and the second basis matrix decreases.
5. The acoustic analysis device according to any one of claims 1 to 4, wherein the compensation matrix includes negative elements, and the matrix analysis means performs the non-negative matrix factorization under a constraint condition that the first basis matrix is a non-negative matrix.
JP2013004375A 2013-01-15 2013-01-15 Acoustic analyzer Pending JP2014137389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2013004375A JP2014137389A (en) 2013-01-15 2013-01-15 Acoustic analyzer

Publications (1)

Publication Number Publication Date
JP2014137389A true JP2014137389A (en) 2014-07-28

Family

ID=51414946

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2013004375A Pending JP2014137389A (en) 2013-01-15 2013-01-15 Acoustic analyzer

Country Status (1)

Country Link
JP (1) JP2014137389A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017046976A1 (en) * 2015-09-16 2017-03-23 日本電気株式会社 Signal detection device, signal detection method, and signal detection program
JPWO2017046976A1 (en) * 2015-09-16 2018-07-05 日本電気株式会社 Signal detection device, signal detection method, and signal detection program
US10650842B2 (en) 2015-09-16 2020-05-12 Nec Corporation Signal detection device, signal detection method, and signal detection program

Similar Documents

Publication Publication Date Title
US20210089967A1 (en) Data training in multi-sensor setups
KR101564151B1 (en) Decomposition of music signals using basis functions with time-evolution information
KR101521368B1 (en) Method, apparatus and machine-readable storage medium for decomposing a multichannel audio signal
JP6807029B2 (en) Sound source separators and methods, and programs
JP5942420B2 (en) Sound processing apparatus and sound processing method
Ozerov et al. Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation
EP3201917B1 (en) Method, apparatus and system for blind source separation
WO2009110574A1 (en) Signal emphasis device, method thereof, program, and recording medium
JP6485711B2 (en) Sound field reproduction apparatus and method, and program
Prätzlich et al. Kernel additive modeling for interference reduction in multi-channel music recordings
Muñoz-Montoro et al. Multichannel blind music source separation using directivity-aware MNMF with harmonicity constraints
JP5454330B2 (en) Sound processor
Chen et al. A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation
WO2021161543A1 (en) Signal processing device, signal processing method, and signal processing program
EP4131250A1 (en) Method and system for instrument separating and reproducing for mixture audio source
Mu et al. A timbre matching approach to enhance audio quality of psychoacoustic bass enhancement system
JP2014137389A (en) Acoustic analyzer
JP2014134688A (en) Acoustic analyzer
JP2014215544A (en) Sound processing device
JP2018049228A (en) Acoustic processing device and acoustic processing method
JP2006180392A (en) Sound source separation learning method, apparatus and program, sound source separation method, apparatus and program, and recording medium
CN115136234A (en) Sound processing method, estimation model training method, sound processing system, and program
JP2014222281A (en) Acoustic processing device
JP2016001235A (en) Information processor, terminal device and program

Legal Events

Date Code Title Description
RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20150410