JP2011133780A

JP2011133780A - Signal analyzing device, signal analyzing method and signal analyzing program

Info

Publication number: JP2011133780A
Application number: JP2009294892A
Authority: JP
Inventors: Hirokazu Kameoka (亀岡 弘和); Jonathan Le Roux (ジョナトン ルルー); Yasunori Ohishi (大石 康智); Kunio Kashino (柏野 邦夫)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-12-25
Filing date: 2009-12-25
Publication date: 2011-07-07
Anticipated expiration: 2029-12-25
Also published as: JP5580585B2

Abstract

PROBLEM TO BE SOLVED: To provide a signal analysis device that achieves hierarchical factorization.

SOLUTION: The signal analysis device includes: a means that performs time-frequency analysis on acoustic signal data that has been read in and thereby obtains a data matrix Y storing the time-frequency components of that data; a means that sets initial values of the spectral basis parameters, power envelope basis parameters, and power envelope basis activity values to be determined when the data matrix Y is approximated by the product of a basis matrix H and a coefficient matrix U using non-negative matrix factorization; a means that uses the initialized parameters to calculate the value of a spectrogram model in which the spectral basis activity time series corresponding to each row of the coefficient matrix U is expressed as a convolutive mixture; and a means that keeps updating the parameters until they converge and outputs their values at convergence.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a signal analysis device, a signal analysis method, and a signal analysis program that analyze an acoustic signal in order to decompose it into a separate signal for each note (a note as written in a musical score).

Separating and extracting the individual acoustic signals from a mixture in which several acoustic signals are superimposed is not easy. This problem is known as sound source separation. Separation of a monaural signal in particular is a typical ill-posed problem and is difficult to solve without making some assumptions. Many approaches to monaural signal separation have been studied, and one that has attracted attention in recent years as being effective applies the principle of non-negative matrix factorization (NMF) (see, for example, Patent Document 1). In this approach, a non-negative data matrix Y, whose columns are the spectra (magnitudes of the frequency components) of the observed signal at each time, is approximated by the product of a non-negative basis matrix H and a non-negative coefficient matrix U.

As a result, the spectral basis functions that make up all of the observed spectra are stored in the columns of the basis matrix H, and the time series of spectral basis activity values, which indicate how strongly a particular spectral basis function is activated at each time, is stored in the corresponding row of the coefficient matrix U. A decomposed representation of the signal is obtained in this way. The distinguishing feature of this method is that it exploits, for the source separation problem, the assumption that the observed signal consists only of sounds having a limited variety of spectra; it is an effective solution for signals that fit this assumption.

The decomposition of a spectrogram by non-negative matrix factorization is obtained by regarding the spectrogram as a matrix

$$Y = (Y_{\omega,t})_{\Omega \times T}$$

and determining

$$H = (H_{\omega,i})_{\Omega \times I}, \qquad U = (U_{i,t})_{I \times T}$$

such that

$$Y_{\omega,t} \approx \sum_{i} H_{\omega,i}\, U_{i,t}. \tag{2}$$

Here $\omega$ and $t$ are indices corresponding to frequency and time, respectively.

With $y_t := (Y_{1,t}, \ldots, Y_{\Omega,t})^{\mathsf T}$ and $h_i := (H_{1,i}, \ldots, H_{\Omega,i})^{\mathsf T}$, Equation (2) can be rewritten as

$$y_t \approx \sum_{i} h_i\, U_{i,t}.$$

As this form shows, the observed data $y_t$ at every $t$ is regarded as being composed of at most $I$ kinds of "parts" $h_1, \ldots, h_I$, and the problem can be understood as one of deciding how each part is most plausibly placed. What each of the resulting matrices represents is easier to see in FIG. 2. Spectra that appear repeatedly in the piece of music are regarded as typical component parts and emerge in the column vectors of H. Therefore, if a music spectrogram consists only of a limited set of spectral patterns determined by the instrument type and the pitch, each column vector of H becomes a spectrum that roughly corresponds to a particular pitch of a particular instrument. Each row vector of U, on the other hand, indicates at which times and how strongly the corresponding spectral part is activated.
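The factorization of Y into H and U described above can be sketched in a few lines of NumPy. The text does not say which NMF algorithm is used, so this sketch assumes the widely used multiplicative updates for the squared-error criterion; the function name `nmf`, the toy matrices, and the small constant `eps` are illustrative choices, not part of the source.

```python
import numpy as np

def nmf(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative matrix Y (frequency x time) by H @ U."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    H = rng.random((n_freq, n_bases)) + eps   # columns: spectral bases h_i
    U = rng.random((n_bases, n_time)) + eps   # rows: activation time series
    for _ in range(n_iter):
        # Multiplicative updates (assumed algorithm) keep H and U non-negative.
        U *= (H.T @ Y) / (H.T @ H @ U + eps)
        H *= (Y @ U.T) / (H @ U @ U.T + eps)
    return H, U

# Toy "spectrogram" built from two known parts (hypothetical data).
H_true = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0]])
U_true = np.array([[1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0]])
Y = H_true @ U_true

H, U = nmf(Y, n_bases=2)
rel_err = np.linalg.norm(Y - H @ U) / np.linalg.norm(Y)
print("relative reconstruction error:", rel_err)
```

Each column of `H` plays the role of one part $h_i$, and each row of `U` shows when and how strongly that part is active.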

P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003, pp. 177-180.

As described above, non-negative matrix factorization approximates a data matrix storing the observed spectra by the product of a basis matrix, whose columns store the spectral bases, and a coefficient matrix, whose rows store the time series of spectral basis activities. It thereby acquires the spectral basis functions automatically from the set of observed spectra and decomposes the observed spectra by spectral basis function.

However, to decompose a music signal into a separate signal for each note (a note in a musical score), the spectral basis activity time series itself must in turn be decomposed into the spectral basis activity time series belonging to each note event. Conventional non-negative matrix factorization clearly has no capability for obtaining such a hierarchical decomposition.

The present invention has been made in view of these circumstances, and its object is to provide a signal analysis device, a signal analysis method, and a signal analysis program that can obtain the hierarchical decomposition described above.

The present invention is characterized by comprising: signal data storage means in which acoustic signal data is stored; time-frequency analysis means for obtaining, by time-frequency analysis of the acoustic signal data read from the signal data storage means, a data matrix Y storing the time-frequency components of the acoustic signal data; initial value setting means for setting initial values of the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values that are to be determined when the data matrix Y is approximated by the product of a basis matrix H and a coefficient matrix U using a non-negative matrix factorization technique; model calculation means for calculating, using the spectral basis parameters, power envelope basis parameters, and power envelope basis activity values to which the initial values have been set, the value of a spectrogram model in which the spectral basis activity time series corresponding to each row of the coefficient matrix U is expressed in the form of a convolutive mixture; update means for updating the initialized spectral basis parameters, power envelope basis parameters, and power envelope basis activity values using the value of the spectrogram model, the data matrix, and the current values of the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values; and output means for continuing the updates by the update means until the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values converge and, upon convergence, outputting the spectral basis parameters, the power envelope basis parameters, the power envelope basis activity values, and the value of the spectrogram model.

The present invention is a signal analysis method that causes the computer of a signal analysis device, which has signal data storage means in which acoustic signal data is stored, to perform signal analysis processing, the method being characterized by comprising: a time-frequency analysis step of obtaining, by time-frequency analysis of the acoustic signal data read from the signal data storage means, a data matrix Y storing the time-frequency components of the acoustic signal data; an initial value setting step of setting initial values of the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values that are to be determined when the data matrix Y is approximated by the product of a basis matrix H and a coefficient matrix U using a non-negative matrix factorization technique; a model calculation step of calculating, using the spectral basis parameters, power envelope basis parameters, and power envelope basis activity values to which the initial values have been set, the value of a spectrogram model in which the spectral basis activity time series corresponding to each row of the coefficient matrix U is expressed in the form of a convolutive mixture; an update step of updating the initialized spectral basis parameters, power envelope basis parameters, and power envelope basis activity values using the value of the spectrogram model, the data matrix, and the current values of the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values; and an output step of continuing the updates by the update step until the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values converge and, upon convergence, outputting the spectral basis parameters, the power envelope basis parameters, the power envelope basis activity values, and the value of the spectrogram model.

The present invention is a signal analysis program that causes the computer of a signal analysis device, which has signal data storage means in which acoustic signal data is stored, to perform signal analysis processing, the program being characterized by causing the computer to perform: a time-frequency analysis step of obtaining, by time-frequency analysis of the acoustic signal data read from the signal data storage means, a data matrix Y storing the time-frequency components of the acoustic signal data; an initial value setting step of setting initial values of the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values that are to be determined when the data matrix Y is approximated by the product of a basis matrix H and a coefficient matrix U using a non-negative matrix factorization technique; a model calculation step of calculating, using the spectral basis parameters, power envelope basis parameters, and power envelope basis activity values to which the initial values have been set, the value of a spectrogram model in which the spectral basis activity time series corresponding to each row of the coefficient matrix U is expressed in the form of a convolutive mixture; an update step of updating the initialized spectral basis parameters, power envelope basis parameters, and power envelope basis activity values using the value of the spectrogram model, the data matrix, and the current values of the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values; and an output step of continuing the updates by the update step until the spectral basis parameters, the power envelope basis parameters, and the power envelope basis activity values converge and, upon convergence, outputting the spectral basis parameters, the power envelope basis parameters, the power envelope basis activity values, and the value of the spectrogram model.

According to the present invention, the parameters obtained as the result of the signal analysis can be used for purposes such as detecting a specific sound in an acoustic signal in which multiple sounds are mixed, extracting a specific sound from such a signal, and processing a specific sound within such a signal.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the decomposition of a music spectrogram by non-negative matrix factorization (NMF).

A signal analysis device according to an embodiment of the present invention is described below with reference to the drawings. First, the basic principle of the signal analysis device is explained. When a data matrix is obtained from a music signal, the spectral basis activity time series stored in each row of the coefficient matrix obtained by non-negative matrix factorization can often be expressed as a combination of a limited variety of local patterns. This is because music has a unit for the duration of a note, called the note value, and notes with the same note value often rise and decay in similar ways.

The present invention exploits this property of music. In addition to the conventional function of non-negative matrix factorization, which decomposes the observed spectra by spectral basis function while automatically acquiring those basis functions from the set of observed spectra, it provides a function that decomposes each spectral basis activity time series by power envelope basis while automatically acquiring the basis functions of the local patterns mixed within those time series (hereinafter called power envelope bases).

Specifically, the spectral basis activity time series corresponding to each row of the coefficient matrix U is expressed in the form of a convolutive mixture. That is, denoting the elements of the i-th row of U by $U_{i,0}, U_{i,1}, \ldots$, we write

$$U_{i,t} = \sum_{j} \sum_{\tau} G_{j,\tau}\, O_{i,j,t-\tau}.$$

Here $G_{j,0}, G_{j,1}, \ldots$ is the j-th power envelope basis and $O_{i,j,0}, O_{i,j,1}, \ldots$ represents its activity, so $O_{i,j,t}$ is hereinafter called the power envelope basis activity. A convolutive mixture is widely known as a model of the mixing process that arises when signals arriving from multiple sources are observed with multiple microphones. In the present case, the spectral basis activity time series in the i-th row of U corresponds to the signal observed at the i-th microphone, $G_{j,0}, G_{j,1}, \ldots$ corresponds to the signal of the j-th source, and $O_{i,j,0}, O_{i,j,1}, \ldots$ corresponds to the impulse response from the j-th source to the i-th microphone.
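As a rough illustration of the convolutive mixture, the sketch below builds one activity time series from a single decaying envelope pattern placed at two onsets. The array layout (`G[j, tau]`, `O[i, j, t]`), the function name `activations`, and the truncation of the convolution to the original number of frames are assumptions made for this example; the source does not specify boundary handling.

```python
import numpy as np

def activations(G, O):
    """U[i, t] = sum_j sum_tau G[j, tau] * O[i, j, t - tau]."""
    n_bases, n_patterns, n_frames = O.shape
    U = np.zeros((n_bases, n_frames))
    for i in range(n_bases):
        for j in range(n_patterns):
            # The full convolution has n_frames + len(G[j]) - 1 samples;
            # keeping the first n_frames is an assumed boundary treatment.
            U[i] += np.convolve(O[i, j], G[j])[:n_frames]
    return U

G = np.array([[1.0, 0.5, 0.25]])   # one decaying envelope pattern (J = 1)
O = np.zeros((1, 1, 6))
O[0, 0, 0] = 1.0                   # onset at t = 0
O[0, 0, 3] = 2.0                   # louder onset at t = 3
U = activations(G, O)
print(U[0])                        # the pattern appears at each onset, scaled by O
```

A sparse `O` with peaks at note onsets therefore reproduces the repeated rise-and-decay shapes seen in the rows of U.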

Next, the spectrogram model is described. Looking at the row vectors of U shown in FIG. 2, one notices that the temporal envelope of the activation of each spectral basis $h_i$ is made up of only a limited variety of local patterns. This is because individual notes are played with a limited variety of durations, so the variety of rise and decay patterns is limited as well. We therefore consider a further decomposition of U in the form of a convolutive mixture,

$$U_{i,t} = \sum_{j} \sum_{\tau} G_{j,\tau}\, O_{i,j,t-\tau}. \tag{5}$$

$G_{j,\tau}$ represents the j-th local pattern of the temporal envelope, while $O_{i,j,t}$ represents its activation and ideally peaks at the onset time of each note. Substituting Equation (5) into the right-hand side of Equation (2) gives

$$Y_{\omega,t} \approx \sum_{i} H_{\omega,i} \sum_{j} \sum_{\tau} G_{j,\tau}\, O_{i,j,t-\tau},$$

a spectrogram model that extends the NMF-type model of Equation (2). To remove the scale indeterminacy of the decomposition, the bases are assumed to be normalized, for example as

$$\sum_{\omega} H_{\omega,i} = 1, \qquad \sum_{\tau} G_{j,\tau} = 1.$$
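The extended model can be evaluated directly from H, G, and O. The following is a minimal sketch under stated assumptions: the array shapes, the name `spectrogram_model`, the toy values, and the truncation of each convolution to the number of frames are all illustrative and not taken from the source.

```python
import numpy as np

def spectrogram_model(H, G, O):
    """Return (X, U): U is the convolutive mixture of G and O, and X = H @ U."""
    n_bases, n_patterns, n_frames = O.shape
    U = np.zeros((n_bases, n_frames))
    for i in range(n_bases):
        for j in range(n_patterns):
            # Truncating to n_frames is an assumed boundary treatment.
            U[i] += np.convolve(O[i, j], G[j])[:n_frames]
    return H @ U, U

# Toy parameters (hypothetical); columns of H and rows of G each sum to one.
H = np.array([[0.75, 0.0], [0.25, 0.5], [0.0, 0.5]])
G = np.array([[0.5, 0.3, 0.2]])
O = np.zeros((2, 1, 5))
O[0, 0, 0] = 4.0    # basis 0 starts a note at t = 0
O[1, 0, 2] = 2.0    # basis 1 starts a note at t = 2
X, U = spectrogram_model(H, G, O)
print(X)
```

With H normalized in this way, the total energy in each frame of `X` equals the sum of the activities in `U` for that frame, which is one reason fixing the scale of the bases is convenient.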

Next, the optimization algorithm is described. Just as NMF obtains H and U from Y as in FIG. 2, we wish to obtain H, G, and O from Y under the proposed model. An optimization algorithm for obtaining a hierarchical sparse representation of music with the proposed model is described below. First, the squared-error criterion is described. We consider an optimization problem, set up for the observed spectrogram $Y := (Y_{\omega,t})_{\Omega \times T}$, in which an objective function F(H, G, O) consisting of the squared error between Y and the model plus a regularization term S(G, O) is minimized. S(G, O) steers G and O toward sparse solutions and is defined here as

$$S(G, O) = \lambda_g \sum_{j,\tau} |G_{j,\tau}|^{p_g} + \lambda_o \sum_{i,j,t} |O_{i,j,t}|^{p_o},$$

where $0 < p_g \le 2$ and $0 < p_o \le 2$.
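To make the structure of this objective concrete, here is a small sketch. The exact formula is not legible in the text, so two things are assumed here: that the penalty has the power-law form $\lambda_g \sum |G|^{p_g} + \lambda_o \sum |O|^{p_o}$, and that the squared error enters with unit weight. The function names and default values are hypothetical.

```python
import numpy as np

def sparsity_penalty(G, O, lam_g, lam_o, p_g, p_o):
    """Assumed form: lam_g * sum |G|^p_g + lam_o * sum |O|^p_o."""
    return lam_g * np.sum(np.abs(G) ** p_g) + lam_o * np.sum(np.abs(O) ** p_o)

def squared_error_objective(Y, X, G, O, lam_g=0.1, lam_o=0.1, p_g=1.0, p_o=1.0):
    """Assumed form: sum (Y - X)^2 + S(G, O), with unit weight on the error."""
    return np.sum((Y - X) ** 2) + sparsity_penalty(G, O, lam_g, lam_o, p_g, p_o)

Y = np.array([[1.0, 2.0], [0.0, 1.0]])   # observed values (toy data)
X = np.array([[1.0, 1.0], [0.0, 1.0]])   # model values (toy data)
G = np.array([[0.5, 0.5]])
O = np.array([[[2.0, 0.0]]])
F = squared_error_objective(Y, X, G, O)  # 1.0 + 0.1 * 1.0 + 0.1 * 2.0
print(F)
```

With an exponent below one, concentrating the same total activity in fewer entries lowers the penalty, which is the sense in which the term favors sparse solutions.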

First, an update rule for H that decreases F is derived. Let H', G', and O' denote the values of H, G, and O obtained one update step earlier. Then an inequality of the form

$$F(H, G', O') \le F_H(H, G', O')$$

holds, where $F_H$ is an auxiliary function constructed from the previous values. Although a detailed explanation is omitted, the H that minimizes $F_H(H, G', O')$ can be found analytically, and updating H in this way guarantees that $F(H, G', O')$ does not increase. Furthermore, if H' and U' are both non-negative, H is always non-negative as well.

Next, an update rule for G that decreases F is derived. As before, $F(H', G, O')$ is bounded from above by an auxiliary function $F_G(H', G, O')$. The second term of $F_G(H', G, O')$ follows from the inequality

$$|x|^p \le \frac{p}{2}\,|x'|^{p-2} x^2 + \left(1 - \frac{p}{2}\right)|x'|^p \qquad (0 < p \le 2),$$

which is evident because the right-hand side is a parabola that touches $|x|^p$ at the points $\pm x'$. The update rule for G is then derived from $F_G(H', G, O')$.
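The parabola bound used for the regularization term can be checked numerically. The sketch below assumes the parabola is the symmetric one, $a x^2 + b$, fixed by requiring it to touch $|x|^p$ at $\pm x'$; the function name `parabola_bound` and the test grid are illustrative.

```python
import numpy as np

def parabola_bound(x, x_ref, p):
    """Parabola a*x^2 + b that touches |x|^p at x = +/- x_ref (x_ref != 0)."""
    a = 0.5 * p * np.abs(x_ref) ** (p - 2)      # matches the slope at x_ref
    b = (1.0 - 0.5 * p) * np.abs(x_ref) ** p    # matches the value at x_ref
    return a * x ** 2 + b

x = np.linspace(-3.0, 3.0, 601)
min_gap = {}
for p in (0.5, 1.0, 1.5, 2.0):
    # The gap should never be negative if the parabola is an upper bound.
    min_gap[p] = float(np.min(parabola_bound(x, 1.2, p) - np.abs(x) ** p))
print(min_gap)
```

Because the bound is quadratic in x, replacing the penalty with it leaves a problem that can be minimized in closed form, which is what makes the resulting update analytic.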

Finally, an update rule for O that decreases F is derived. As before, a corresponding upper bound holds, and it is used to derive the update rule for O.

Next, the I-divergence criterion is described. We also consider the optimization problem obtained when the modeling error is measured by the I-divergence.
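For reference, the I-divergence (generalized Kullback-Leibler divergence) between an observed matrix Y and a model X is commonly defined as $\sum (Y \log(Y/X) - Y + X)$. The sketch below computes only this data term; the function name `i_divergence`, the `eps` guard, and the handling of zero entries are choices made for this example, and any regularization term would be added separately.

```python
import numpy as np

def i_divergence(Y, X, eps=1e-12):
    """Generalized KL (I-) divergence: sum(Y * log(Y / X) - Y + X)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Where Y == 0, the term Y * log(Y / X) is taken as its limit, 0.
    ratio = np.where(Y > 0, Y / (X + eps), 1.0)
    return float(np.sum(Y * np.log(ratio) - Y + X))

d_same = i_divergence([1.0, 2.0], [1.0, 2.0])   # close to 0 for identical inputs
d_swap = i_divergence([1.0, 2.0], [2.0, 1.0])   # close to log(2)
print(d_same, d_swap)
```

Unlike the squared error, this measure is not symmetric in its arguments and penalizes relative rather than absolute deviations.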

First, for the update rule for H, it suffices to find the H that minimizes an auxiliary function derived from an inequality, and this H is obtained analytically. In the same way, for the update rules for G and O, it suffices to find the G and O that minimize the respective auxiliary functions derived from the corresponding inequalities, and these are also obtained analytically. The second term of Equation (21) and the third term of Equation (22) follow from

$$\forall x > 0, \quad |x|^p \le p\,|x'|^{p-1}(x - x') + |x'|^p \qquad (0 < p \le 2), \tag{25}$$

which is evident because the right-hand side is the tangent line to $|x|^p$ at the point of tangency $x'$.

Next, a signal analysis device that uses the basic principle described above is explained. FIG. 1 is a block diagram showing the configuration of the signal analysis device according to the first embodiment. The signal analysis device is implemented on a computer. In the figure, reference numeral 1 denotes a signal data storage unit that receives and stores acoustic signal data obtained by sampling and quantizing an acoustic signal. Reference numeral 2 denotes a time-frequency analysis unit that performs time-frequency analysis. Reference numeral 3 denotes an initial setting unit that sets initial values. Reference numeral 4 denotes a spectrogram model calculation unit that calculates the spectrogram model. Reference numeral 5 denotes a spectral basis update unit that updates the spectral bases. Reference numeral 6 denotes a power envelope basis update unit that updates the power envelope bases. Reference numeral 7 denotes a power envelope basis activity update unit that updates the power envelope basis activities. Reference numeral 8 denotes a parameter normalization unit that normalizes the parameters. Reference numeral 9 denotes a convergence determination unit that determines whether the processing has converged. Reference numeral 10 denotes a parameter output unit that outputs the parameters. Reference numeral 11 denotes a parameter storage unit that stores the output parameters.

Next, the operation of the signal analysis device shown in FIG. 1 is described with reference to that figure. First, the time-frequency analysis unit 2 reads the signal data to be analyzed from the signal data storage unit 1 and performs time-frequency analysis, using the short-time Fourier transform (STFT), a wavelet transform, or the like, to compute the non-negative time-frequency components $\{Y_{\omega,t}\}_{0 \le \omega \le \Omega-1,\ 0 \le t \le T-1}$. Here $\omega = 0, \ldots, \Omega-1$ and $t = 0, \ldots, T-1$ are indices corresponding to frequency and time, respectively. The time-frequency analysis unit 2 outputs the matrix $Y = (Y_{\omega,t})_{\Omega \times T}$ storing the time-frequency components $Y_{\omega,t}$.
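One possible way to obtain such a non-negative matrix Y is a magnitude STFT. The sketch below is only an illustration: the Hann window, the frame length and hop size, the use of magnitudes rather than powers, and the name `magnitude_spectrogram` are all assumptions, and the test tone is synthetic.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return a non-negative matrix Y of shape (frame_len // 2 + 1, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Each column holds one windowed frame of the signal.
    frames = np.stack(
        [signal[k * hop : k * hop + frame_len] * window for k in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

sr = 8000                                   # sampling rate in Hz (toy value)
t = np.arange(sr) / sr                      # one second of samples
signal = np.sin(2 * np.pi * 1000 * t)       # 1 kHz test tone
Y = magnitude_spectrogram(signal)
print(Y.shape, Y.argmax(axis=0)[:3])
```

Rows of `Y` index frequency and columns index time, matching the layout $Y = (Y_{\omega,t})_{\Omega \times T}$ used in the text.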

Next, the initial setting unit 3 determines the number of spectral bases I, the number of power envelope bases J, and the regularization parameters $\lambda_g$, $\lambda_o$, $p_g$, and $p_o$. The initial setting unit 3 then applies non-negative matrix factorization (NMF) to Y and outputs the resulting factor matrices, whose product approximates Y. From these, it outputs the initial values of the spectral basis parameters $H_{\omega,i}$, the power envelope basis parameters $G_{j,t}$, and the power envelope basis activity values $O_{i,j,t}$. Here $i = 1, \ldots, I$ is the index of the spectral bases and $j = 1, \ldots, J$ is the index of the power envelope bases, and $[\cdot]_{a,b}$ denotes the element in row a and column b of a matrix.

Next, the spectrogram model calculation unit 4 calculates and outputs the spectrogram model $X_{\omega,t}$ from the $H_{\omega,i}$, $G_{j,t}$, and $O_{i,j,t}$ obtained in the preceding stage, by the following procedure. First, it calculates the spectral basis activity values $U_{i,t}$ from $G_{j,t}$ and $O_{i,j,t}$ by the convolutive mixing operation

$$U_{i,t} = \sum_{j} \sum_{\tau} G_{j,\tau}\, O_{i,j,t-\tau}.$$

This operation is computed quickly using the fast Fourier transform.
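The use of the fast Fourier transform for the convolution can be illustrated as follows. This is a minimal sketch of the standard technique (zero-pad, multiply the spectra, transform back); the function name `fft_convolve` and the toy sequences are assumptions, and keeping only the first `len(o)` samples is an assumed boundary treatment.

```python
import numpy as np

def fft_convolve(o, g):
    """Linear convolution of two 1-D sequences via zero-padded FFTs."""
    n = len(o) + len(g) - 1                 # length of the full linear convolution
    return np.fft.irfft(np.fft.rfft(o, n) * np.fft.rfft(g, n), n)

o = np.array([1.0, 0.0, 0.0, 2.0, 0.0, 0.0])   # activities of one pattern
g = np.array([1.0, 0.5, 0.25])                 # one power envelope pattern
u_fft = fft_convolve(o, g)[: len(o)]
u_direct = np.convolve(o, g)[: len(o)]
print(np.allclose(u_fft, u_direct))
```

Padding to the full output length is what prevents the circular wrap-around that a plain FFT product would otherwise introduce.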

Next, the spectrogram model calculation unit 4 calculates $X_{\omega,t}$ from $H_{\omega,i}$ and the $U_{i,t}$ just obtained by the multiply-accumulate operation

$$X_{\omega,t} = \sum_{i} H_{\omega,i}\, U_{i,t}.$$

Next, the spectral basis update unit 5 updates and outputs $H_{\omega,i}$ using $Y_{\omega,t}$ together with the $X_{\omega,t}$, $U_{i,t}$, and $H_{\omega,i}$ obtained in the preceding stages.

Next, the power envelope basis update unit 6 calculates $G_{j,t}$ by the following procedure, using $Y_{\omega,t}$ together with the $X_{\omega,t}$, $H_{\omega,i}$, and $O_{i,j,t}$ obtained in the preceding stages. First, from the i-th spectral basis $H_{0,i}, \ldots, H_{\Omega-1,i}$ and the observed spectrum $Y_{0,t}, \ldots, Y_{\Omega-1,t}$ at time t, it calculates an intermediate quantity for each i and t. Similarly, it calculates the corresponding intermediate quantity from $H_{0,i}, \ldots, H_{\Omega-1,i}$ and $X_{0,t}, \ldots, X_{\Omega-1,t}$.

Next, the power envelope basis update unit 6 calculates a further quantity from the first intermediate quantity by a cross-correlation operation, which is computed quickly using the fast Fourier transform. Similarly, it calculates a second such quantity from the other intermediate quantity; this cross-correlation is likewise computed quickly using the fast Fourier transform.
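The FFT-based cross-correlation mentioned above can be sketched as follows. Which sequences are correlated is not legible in the text, so this example assumes a generic per-frame sequence `a` correlated with an activity sequence `o` at non-negative lags; the names `xcorr_fft` and `xcorr_direct` and the toy data are hypothetical.

```python
import numpy as np

def xcorr_fft(a, o, n_lags):
    """c[tau] = sum_t a[t] * o[t - tau] for tau = 0 .. n_lags - 1, via FFT."""
    n = len(a) + n_lags - 1                 # zero padding that avoids wrap-around
    spec = np.fft.rfft(a, n) * np.conj(np.fft.rfft(o, n))
    return np.fft.irfft(spec, n)[:n_lags]

def xcorr_direct(a, o, n_lags):
    """Same quantity computed by the defining sum, for comparison."""
    n_frames = len(a)
    return np.array([
        sum(a[t] * o[t - tau] for t in range(tau, n_frames))
        for tau in range(n_lags)
    ])

a = np.array([1.0, 0.5, 0.25, 2.0, 1.0, 0.5])   # per-frame sequence (toy data)
o = np.array([1.0, 0.0, 0.0, 2.0, 0.0, 0.0])    # activity sequence (toy data)
c_fft = xcorr_fft(a, o, n_lags=3)
c_direct = xcorr_direct(a, o, n_lags=3)
print(c_fft, c_direct)
```

Multiplying one spectrum by the complex conjugate of the other is what turns the FFT product from a convolution into a correlation.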

Finally, the power envelope basis update unit 6 updates $G_{j,t}$ using the quantities calculated above and the $G_{j,t}$ obtained in the preceding stage. The term $\lambda_g p_g |G_{j,\tau}|^{p_g-1}$ appearing in this update is related to the sparse regularization term; it may be replaced by a term of another form as long as that term has the effect of steering the elements of $G_{j,t}$ toward sparsity.

Next, the power envelope basis activity update unit 7 calculates and outputs $O_{i,j,t}$ by the following procedure, using $Y_{\omega,t}$ together with the $X_{\omega,t}$, $H_{\omega,i}$, and $O_{i,j,t}$ obtained in the preceding stages. First, from the i-th spectral basis $H_{0,i}, \ldots, H_{\Omega-1,i}$ and the observed spectrum $Y_{0,t}, \ldots, Y_{\Omega-1,t}$ at time t, it calculates an intermediate quantity for each i and t. Similarly, it calculates the corresponding intermediate quantity from $H_{0,i}, \ldots, H_{\Omega-1,i}$ and $X_{0,t}, \ldots, X_{\Omega-1,t}$.

Next, the power envelope basis activity update unit 7 calculates a further quantity from the first intermediate quantity by a cross-correlation operation, which is computed quickly using the fast Fourier transform. Similarly, it calculates a second such quantity from the other intermediate quantity; this cross-correlation is likewise computed quickly using the fast Fourier transform.

Finally, the power envelope base activity update unit 7 updates [symbol] by [equation]. The term +λ_o p_o |O′_{i,j,τ}|^{p_o−1} derives from the sparse regularization term; it may be replaced by a term of any other form that has the effect of driving the elements of O_{i,j,t} toward sparsity.
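The role of such a term can be seen in a generic multiplicative update. The sketch below is not the patent's update expression (which is not reproduced in this text); it assumes the common sparse-NMF pattern in which the gradient of an assumed penalty λ Σ |·|^p is added to the denominator of the multiplicative factor. The function name and the `eps` safeguard are hypothetical.

```python
import numpy as np

def sparse_multiplicative_update(param, numerator, denominator, lam, p, eps=1e-12):
    """Multiply param elementwise by numerator / (denominator + sparsity term).

    The sparsity term lam * p * |param|**(p - 1) is the gradient of the
    assumed penalty lam * sum(|param|**p) for nonnegative param.
    """
    param = np.asarray(param, dtype=float)
    sparsity = lam * p * np.power(np.abs(param) + eps, p - 1.0)
    return param * numerator / (denominator + sparsity + eps)

O = np.array([[1.0, 2.0]])
num = np.array([[2.0, 2.0]])
den = np.array([[1.0, 4.0]])
plain = sparse_multiplicative_update(O, num, den, lam=0.0, p=1.0)   # no regularization
sparse = sparse_multiplicative_update(O, num, den, lam=0.5, p=1.0)  # with regularization
```

Because the sparsity term only enlarges the denominator, every element shrinks relative to the unregularized update while remaining nonnegative.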

Next, the parameter normalization unit 8 normalizes and outputs the H_{ω,i} and G_{j,t} obtained in the preceding stages. For example, to normalize both so that each sums to 1, H_{ω,i} and G_{j,t} are each updated by [equation].
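A minimal sketch of sum-to-one normalization follows. It assumes that each spectral basis is normalized over frequency and each power envelope basis over time; the patent's normalization expression is not reproduced in this text, so the choice of axes, the function name, and the returned scale factors are assumptions.

```python
import numpy as np

def normalize_sum_to_one(H, G, eps=1e-12):
    """Scale each column of H and each row of G to sum to 1.

    H: (n_freq, n_bases), column i is the i-th spectral basis.
    G: (n_envelopes, n_lags), row j is the j-th power envelope basis.
    Returns the normalized arrays and the scale factors that were divided out.
    """
    h_scale = H.sum(axis=0, keepdims=True) + eps   # one scale per basis i
    g_scale = G.sum(axis=1, keepdims=True) + eps   # one scale per envelope j
    return H / h_scale, G / g_scale, h_scale, g_scale

H = np.array([[1.0, 2.0], [3.0, 6.0]])
G = np.array([[2.0, 1.0, 1.0]])
Hn, Gn, h_scale, g_scale = normalize_sum_to_one(H, G)
```

Returning the scale factors lets a caller absorb them into another factor if the overall model value should stay unchanged; the text does not state whether this is done.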

Next, the convergence determination unit 9 determines whether the preceding iterative calculation has reached a predetermined number of iterations, whether the rate of change of the parameter updates in the iterative calculation has fallen to or below a predetermined value, or whether the rate of change of the objective function value has fallen to or below a predetermined value. For example, the objective function is calculated by [equation], where S(G, O) is a regularization term that guides G and O toward sparse solutions and is defined as [equation]. If the iterative calculation has not converged, the convergence determination unit 9 instructs the spectrogram model calculation unit 4 to recalculate the spectrogram model. In response, the spectrogram model calculation unit 4, the spectrum base update unit 5, the power envelope base update unit 6, the power envelope base activity update unit 7, and the parameter normalization unit 8 repeat the processing described above until the iterative calculation converges.
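The three stopping criteria named in this step can be sketched as follows. The function names, the use of a relative norm as the "rate of change", and the tolerance values are assumptions chosen for illustration; the patent only states that predetermined values are used.

```python
import numpy as np

def relative_change(old, new, eps=1e-12):
    """Norm of the change divided by the norm of the old value."""
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    return float(np.linalg.norm((new - old).ravel()) / (np.linalg.norm(old.ravel()) + eps))

def should_stop(iteration, max_iter, old_params, new_params, old_obj, new_obj,
                param_tol=1e-4, obj_tol=1e-5, eps=1e-12):
    """Return True if any one of the three criteria is met."""
    if iteration >= max_iter:                       # predetermined iteration count
        return True
    if all(relative_change(o, n) <= param_tol      # parameter change rate
           for o, n in zip(old_params, new_params)):
        return True
    obj_change = abs(new_obj - old_obj) / (abs(old_obj) + eps)
    return obj_change <= obj_tol                    # objective change rate

ones = np.ones(3)
still_moving = should_stop(1, 100, [ones], [2 * ones], 10.0, 5.0)
hit_limit = should_stop(100, 100, [ones], [2 * ones], 10.0, 5.0)
flat_objective = should_stop(1, 100, [ones], [2 * ones], 10.0, 10.0)
```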

Next, the parameter output unit 10 stores the parameters for which the iterative calculation is deemed to have converged, such as H_{ω,i}, G_{j,t}, O_{i,j,t}, and X_{ω,t}, in the parameter storage unit 11.

Next, a signal analysis apparatus according to the second embodiment will be described. First, the processing of the spectrum base update unit 5 in the second embodiment is described. The spectrum base update unit 5 uses Y_{ω,t} together with the X_{ω,t}, H_{ω,i}, and U_{i,t} obtained in the preceding stages to update H_{ω,i} by the following procedure. First, it calculates the spectrogram ratio R_{ω,t} between the observed spectrogram Y_{ω,t} and the spectrogram model X_{ω,t} by [equation].

Next, the spectrum base update unit 5 uses the previously obtained R_{ω,t}, H_{ω,i}, and U_{i,t} to calculate the updated H_{ω,i} by [equation].
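An update driven by a ratio between the observed and model spectrograms is characteristic of NMF under the generalized Kullback-Leibler (I-) divergence. The sketch below shows the standard multiplicative update for that divergence, assuming R_{ω,t} = Y_{ω,t} / X_{ω,t}. The patent's own expressions are not reproduced in this text, so both the direction of the ratio and the update rule are assumptions, and `update_spectral_bases` is a hypothetical name.

```python
import numpy as np

def update_spectral_bases(Y, H, U, eps=1e-12):
    """One multiplicative update of H for the model X = H @ U.

    Y: (n_freq, n_frames) observed spectrogram
    H: (n_freq, n_bases) spectral bases
    U: (n_bases, n_frames) basis activities
    """
    X = H @ U + eps                               # spectrogram model X[w, t]
    R = Y / X                                     # assumed spectrogram ratio R[w, t]
    numer = R @ U.T                               # sum_t R[w, t] * U[i, t]
    denom = U.sum(axis=1)[np.newaxis, :] + eps    # sum_t U[i, t]
    return H * numer / denom

rng = np.random.default_rng(0)
H_true = rng.random((6, 2)) + 0.1
U = rng.random((2, 8)) + 0.1
Y_exact = H_true @ U
H_fixed = update_spectral_bases(Y_exact, H_true, U)   # model already matches Y
H_next = update_spectral_bases(rng.random((6, 8)), H_true, U)
```

When the model already reproduces the observation, the ratio is 1 everywhere and the update leaves H unchanged, which is a quick sanity check for an implementation.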

Next, the processing of the power envelope base update unit 6 in the second embodiment is described. The power envelope base update unit 6 uses Y_{ω,t} together with the X_{ω,t}, H_{ω,i}, G_{j,t}, and O_{i,j,t} obtained in the preceding stages to update G_{j,t} by the following procedure. First, the power envelope base update unit 6 calculates the spectrogram ratio R_{ω,t} between the observed spectrogram Y_{ω,t} and the spectrogram model X_{ω,t} by [equation]. Next, the power envelope base update unit 6 computes the quantity [symbol] from H_{0,i}, …, H_{Ω−1,i} and R_{0,t}, …, R_{Ω−1,t} by [equation]. Next, the power envelope base update unit 6 computes [equation]. This cross-correlation is computed at high speed using the fast Fourier transform.

Finally, the power envelope base update unit 6 updates G_{j,t} by [equation]. The term 2λ_g p_g |G_{j,τ}|^{p_g−1} derives from the sparse regularization term; it may be replaced by a term of any other form that has the effect of driving the elements of G_{j,t} toward sparsity.

Next, the processing of the power envelope base activity update unit 7 in the second embodiment is described. The power envelope base activity update unit 7 uses Y_{ω,t} together with the X_{ω,t}, H_{ω,i}, G_{j,t}, and O_{i,j,t} obtained in the preceding stages to update O_{i,j,t} by the following procedure. First, the power envelope base activity update unit 7 calculates the spectrogram ratio R_{ω,t} between the observed spectrogram Y_{ω,t} and the spectrogram model X_{ω,t} by [equation]. Next, the power envelope base activity update unit 7 computes the quantity [symbol] from H_{0,i}, …, H_{Ω−1,i} and R_{0,t}, …, R_{Ω−1,t} by [equation]. Next, the power envelope base activity update unit 7 computes [symbol] by [equation]. This cross-correlation is computed at high speed using the fast Fourier transform.

Finally, the power envelope base activity update unit 7 updates O_{i,j,t} by [equation]. The term 2λ_o p_o |O_{i,j,τ}|^{p_o−1} derives from the sparse regularization term; it may be replaced by a term of any other form that has the effect of driving the elements of O_{i,j,t} toward sparsity.

Next, the processing of the convergence determination unit 9 in the second embodiment is described. The convergence determination unit 9 determines whether the iterative calculation has reached a predetermined number of iterations, whether the rate of change of the parameter updates in the iterative calculation has fallen to or below a predetermined value, or whether the rate of change of the objective function value has fallen to or below a predetermined value. For example, the objective function is calculated by [equation], where S(G, O) is a regularization term that guides G and O toward sparse solutions and is here defined as [equation].
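As a sketch of what an objective of this kind may look like, the code below combines a data-fit term with a sparsity regularizer S(G, O). It assumes the generalized Kullback-Leibler (I-) divergence as the data-fit term and an ℓp-type form for S(G, O); neither expression is reproduced in this text, so both choices, the function names, and the symbols `lam_g`, `p_g`, `lam_o`, `p_o` are assumptions.

```python
import numpy as np

def i_divergence(Y, X, eps=1e-12):
    """Generalized KL divergence sum(Y * log(Y / X) - Y + X)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

def sparsity_penalty(G, O, lam_g, p_g, lam_o, p_o):
    """Assumed regularizer S(G, O): weighted sums of |.|**p over G and O."""
    return float(lam_g * np.sum(np.abs(G) ** p_g) + lam_o * np.sum(np.abs(O) ** p_o))

def objective(Y, X, G, O, lam_g, p_g, lam_o, p_o):
    """Data-fit term plus sparsity regularizer."""
    return i_divergence(Y, X) + sparsity_penalty(G, O, lam_g, p_g, lam_o, p_o)

Y = np.array([[1.0, 2.0], [0.0, 3.0]])
G = np.ones((2, 3))
O = np.ones((2, 2, 3))
value = objective(Y, Y, G, O, lam_g=0.5, p_g=1.0, lam_o=0.1, p_o=1.0)
```

With a perfect fit (X equal to Y) the data-fit term vanishes, so the value above consists of the penalty alone.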

As described above, to decompose an acoustic signal into per-note signals, a new model is used in which the decomposition element of conventional non-negative matrix factorization (NMF), the matrix U, is further decomposed (Expressions (4) and (5)) so that it can represent information on the onset and decay patterns of sounds. By estimating each parameter of this model, the invention can be used for detecting a specific sound in an acoustic signal in which multiple sounds are mixed, extracting a specific sound from such a signal, and processing a specific sound in such a signal.
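The idea of expressing each row of U as a convolutive mixture can be sketched as follows. Expressions (4) and (5) are not reproduced in this text, so the index convention U_{i,t} = Σ_j Σ_τ G_{j,τ} O_{i,j,t−τ}, the array shapes, and the function names are assumptions for illustration.

```python
import numpy as np

def activity_from_envelopes(G, O):
    """U[i, t] = sum_j sum_tau G[j, tau] * O[i, j, t - tau].

    G: (n_envelopes, n_lags) power envelope bases (onset/decay shapes)
    O: (n_bases, n_envelopes, n_frames) envelope activities (onset times)
    """
    n_bases, n_env, n_frames = O.shape
    U = np.zeros((n_bases, n_frames))
    for i in range(n_bases):
        for j in range(n_env):
            # Causal convolution, truncated to the number of frames.
            U[i] += np.convolve(O[i, j], G[j])[:n_frames]
    return U

def spectrogram_model(H, G, O):
    """X = H @ U with U built from the convolutive mixture."""
    return H @ activity_from_envelopes(G, O)

G = np.array([[1.0, 0.5, 0.25]])     # one decaying envelope
O = np.zeros((1, 1, 6))
O[0, 0, 2] = 1.0                     # a single onset at frame 2
H = np.array([[1.0], [2.0]])         # one spectral basis over two frequency bins
U = activity_from_envelopes(G, O)
X = spectrogram_model(H, G, O)
```

In this toy case a single onset at frame 2 places one copy of the envelope in the activity row, which is the sense in which sparse O and a shared envelope G describe the rise and decay of each note.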

Note that a program for realizing the functions of the time-frequency analysis unit 2, the initial setting unit 3, the spectrogram model calculation unit 4, the spectrum base update unit 5, the power envelope base update unit 6, the power envelope base activity update unit 7, the parameter normalization unit 8, the convergence determination unit 9, and the parameter output unit 10 shown in FIG. 1 may be recorded on a computer-readable recording medium, and the signal analysis processing may be performed by causing a computer system to read and execute the program recorded on the recording medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into the computer system. The "computer-readable recording medium" further includes a medium that holds the program for a certain period of time, such as a volatile memory (RAM) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may realize only part of the functions described above. Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

The present invention is applicable to uses in which it is essential to detect a specific sound in an acoustic signal in which multiple sounds are mixed, to extract a specific sound from such a signal, or to process a specific sound in such a signal.

DESCRIPTION OF SYMBOLS: 1: signal data storage unit, 2: time-frequency analysis unit, 3: initial setting unit, 4: spectrogram model calculation unit, 5: spectrum base update unit, 6: power envelope base update unit, 7: power envelope base activity update unit, 8: parameter normalization unit, 9: convergence determination unit, 10: parameter output unit, 11: parameter storage unit

Claims

A signal analysis apparatus comprising: signal data storage means in which acoustic signal data is stored;
time-frequency analysis means for obtaining a data matrix Y storing time-frequency components of the acoustic signal data by performing time-frequency analysis on the acoustic signal data read from the signal data storage means;
initial value setting means for setting initial values of a spectrum basis parameter, a power envelope basis parameter, and a power envelope basis activity value that are to be obtained when the data matrix Y is approximated by the product of a basis matrix H and a coefficient matrix U using non-negative matrix factorization;
model calculation means for calculating, using the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value for which the initial values have been set, a value of a spectrogram model in which the spectrum basis activity time series corresponding to each row of the coefficient matrix U is expressed in the form of a convolutive mixture;
updating means for updating the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value for which the initial values have been set, using the value of the spectrogram model, the data matrix Y, the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value; and
output means for causing the updating means to continue updating until the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value converge, and for outputting, upon convergence, the values of the spectrum basis parameter, the power envelope basis parameter, the power envelope basis activity value, and the spectrogram model.

A signal analysis method for causing a computer of a signal analysis apparatus provided with signal data storage means, in which acoustic signal data is stored, to perform signal analysis processing, the method comprising:
a time-frequency analysis step of obtaining a data matrix Y storing time-frequency components of the acoustic signal data by performing time-frequency analysis on the acoustic signal data read from the signal data storage means;
an initial value setting step of setting initial values of a spectrum basis parameter, a power envelope basis parameter, and a power envelope basis activity value that are to be obtained when the data matrix Y is approximated by the product of a basis matrix H and a coefficient matrix U using non-negative matrix factorization;
a model calculation step of calculating, using the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value for which the initial values have been set, a value of a spectrogram model in which the spectrum basis activity time series corresponding to each row of the coefficient matrix U is expressed in the form of a convolutive mixture;
an updating step of updating the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value for which the initial values have been set, using the value of the spectrogram model, the data matrix Y, the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value; and
an output step of causing the updating step to continue until the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value converge, and of outputting, upon convergence, the values of the spectrum basis parameter, the power envelope basis parameter, the power envelope basis activity value, and the spectrogram model.

A signal analysis program for causing a computer of a signal analysis apparatus provided with signal data storage means, in which acoustic signal data is stored, to perform signal analysis processing, the program causing the computer to execute:
a time-frequency analysis step of obtaining a data matrix Y storing time-frequency components of the acoustic signal data by performing time-frequency analysis on the acoustic signal data read from the signal data storage means;
an initial value setting step of setting initial values of a spectrum basis parameter, a power envelope basis parameter, and a power envelope basis activity value that are to be obtained when the data matrix Y is approximated by the product of a basis matrix H and a coefficient matrix U using non-negative matrix factorization;
a model calculation step of calculating, using the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value for which the initial values have been set, a value of a spectrogram model in which the spectrum basis activity time series corresponding to each row of the coefficient matrix U is expressed in the form of a convolutive mixture;
an updating step of updating the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value for which the initial values have been set, using the value of the spectrogram model, the data matrix Y, the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value; and
an output step of causing the updating step to continue until the spectrum basis parameter, the power envelope basis parameter, and the power envelope basis activity value converge, and of outputting, upon convergence, the values of the spectrum basis parameter, the power envelope basis parameter, the power envelope basis activity value, and the spectrogram model.