JP6498141B2

JP6498141B2 - Acoustic signal analyzing apparatus, method, and program

Info

Publication number: JP6498141B2
Application number: JP2016052633A
Authority: JP
Inventors: 允裕中野; 邦夫柏野; 知子松井; 大地持橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-03-16
Filing date: 2016-03-16
Publication date: 2019-04-10
Anticipated expiration: 2036-03-16
Also published as: JP2017167347A

Description

The present invention relates to an acoustic signal analyzing apparatus, method, and program, and in particular to an acoustic signal analyzing apparatus, method, and program that analyze observation data of the fundamental frequency trajectory of an acoustic signal representing a singing voice.

Two fields of machine learning, known as nonparametric Bayesian methods and kernel methods, have conventionally been used as analysis techniques.

<Kernel mean method>
A probability distribution over complex data can in principle be constructed by assigning a probability to every possible data value, but this becomes harder as the number of possible values grows. When the number of possible values is infinite, representing the distribution on a computer is extremely difficult. The kernel mean method of Non-Patent Document 1 provides a way to represent a distribution on a computer while retaining the same expressive power as the original distribution, and is known to give a good approximation from a small, finite amount of data.
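The idea can be illustrated with a minimal sketch, assuming a Gaussian kernel. The kernel choice and the names `gaussian_gram` and `mmd2` are illustrative and are not taken from Non-Patent Document 1. A distribution is represented by the mean of the kernel functions centred on its samples, and the squared Hilbert-space distance between two such kernel means is computed from Gram matrices alone.

```python
import numpy as np

def gaussian_gram(a, b, gamma=1.0):
    """Gram matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2) (assumed kernel)."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=1.0):
    """Squared RKHS distance between the empirical kernel means of x and y."""
    return (gaussian_gram(x, x, gamma).mean()
            - 2.0 * gaussian_gram(x, y, gamma).mean()
            + gaussian_gram(y, y, gamma).mean())

rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(200, 2))   # samples from N(0, I)
same_b = rng.normal(0.0, 1.0, size=(200, 2))   # more samples from N(0, I)
shifted = rng.normal(3.0, 1.0, size=(200, 2))  # samples from a shifted distribution

# The distance is small for samples of the same distribution and larger otherwise.
print(mmd2(same_a, same_b), mmd2(same_a, shifted))
```

Only a few hundred samples are needed here to separate the two distributions, which reflects the point made above about approximating a distribution from a small, finite data set.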

<Signal decomposition by gamma process>
When observation data can be regarded as a combination of latent, meaningful dictionary elements, constructing a dictionary model based on the gamma process described in Non-Patent Document 2 makes it possible to build an analysis algorithm that, although the dictionary size is infinite in principle, behaves so as to learn only as few dictionary elements as possible.

Non-Patent Document 1: A. Smola, A. Gretton, L. Song, B. Scholkopf. (2007). A Hilbert Space Embedding for Distributions. Algorithmic Learning Theory: 18th International Conference. Springer: 13-31.
Non-Patent Document 2: Matthew D. Hoffman, David M. Blei, Perry R. Cook, Bayesian Nonparametric Matrix Factorization for Recorded Music, International Conference on Machine Learning, 2011.

Research has been conducted on extracting latent, meaningful features from highly information-rich data such as singing voices, but it has not been possible to go further and extract from those features the ones that correspond to each person's individuality.

The present invention has been made in view of the above circumstances, and its object is to provide an acoustic signal analyzing apparatus, method, and program capable of extracting, from singing voices, features corresponding to the individuality of each singer.

To achieve the above object, the acoustic signal analyzing apparatus of the present invention analyzes observation data of acoustic signals representing the singing voices recorded when each of N singers sang each of L kinds of musical scores at least once. The apparatus includes a parameter estimation unit that estimates the weight a_m of each of M kernels K_m, the weight b_{d,n} of each of D dictionaries q_d for each of the N singers n, common across scores, and the weight c_{l,d} of each of the D dictionaries q_d for each of the L kinds of scores s_l, common across singers, so as to minimize an objective function expressed using the distance between two expected values in a Hilbert space. The first is the expected value μ_{n,l} of the pair (x, s_l) when singer n sang score s_l. It is obtained from the M kernels K_m, which are predetermined as criteria for measuring the similarity between pairs (x, s) of a fundamental frequency trajectory x, representing the fundamental frequency of the singing-voice acoustic signal at each time, and a score vector s, representing the pitch of the score at each time; from the weights a_m of the M kernels K_m; and from each pair (x, s) of the fundamental frequency trajectory x and the score vector s obtained from the observation data recorded when each of the N singers n sang each of the L kinds of scores s_l at least once. The second is the expected value μ*_{n,l} of the pair (x, s_l) when singer n sang score s_l. It is obtained from the probability distribution of each pair (x, s) expressed using D predetermined dictionaries q_d of singing-voice distributions, from the weight b_{d,n} of each of the D dictionaries q_d for singer n, common across scores, and from the weight c_{l,d} of each of the D dictionaries q_d for score s_l, common across singers.

The acoustic signal analyzing method of the present invention is a method performed in an acoustic signal analyzing apparatus that analyzes observation data of acoustic signals representing the singing voices recorded when each of N singers sang each of L kinds of musical scores at least once. In the method, a parameter estimation unit estimates the weight a_m of each of M kernels K_m, the weight b_{d,n} of each of D dictionaries q_d for each of the N singers n, common across scores, and the weight c_{l,d} of each of the D dictionaries q_d for each of the L kinds of scores s_l, common across singers, so as to minimize an objective function expressed using the distance between two expected values in a Hilbert space. The first is the expected value μ_{n,l} of the pair (x, s_l) when singer n sang score s_l. It is obtained from the M kernels K_m, which are predetermined as criteria for measuring the similarity between pairs (x, s) of a fundamental frequency trajectory x, representing the fundamental frequency of the singing-voice acoustic signal at each time, and a score vector s, representing the pitch of the score at each time; from the weights a_m of the M kernels K_m; and from each pair (x, s) of the fundamental frequency trajectory x and the score vector s obtained from the observation data recorded when each of the N singers n sang each of the L kinds of scores s_l at least once. The second is the expected value μ*_{n,l} of the pair (x, s_l) when singer n sang score s_l. It is obtained from the probability distribution of each pair (x, s) expressed using D predetermined dictionaries q_d of singing-voice distributions, from the weight b_{d,n} of each of the D dictionaries q_d for singer n, common across scores, and from the weight c_{l,d} of each of the D dictionaries q_d for score s_l, common across singers.

The program of the present invention is a program for causing a computer to function as each unit of the acoustic signal analyzing apparatus.

As described above, according to the acoustic signal analyzing apparatus, method, and program of the present invention, features corresponding to the individuality of each singer can be extracted from singing voices by estimating the weight a_m of each kernel K_m, the weight b_{d,n} of each dictionary q_d for each singer n, common across scores, and the weight c_{l,d} of each dictionary q_d for each score s_l, common across singers, so as to minimize an objective function expressed using the distance between two expected values in a Hilbert space. The first is the expected value μ_{n,l} of the pair (x, s_l) when singer n sang score s_l, obtained from the M predetermined kernels K_m, the weights a_m of the kernels K_m, and each pair (x, s) of the fundamental frequency trajectory x and the score vector s obtained from the observation data. The second is the expected value μ*_{n,l} of the pair (x, s_l) when singer n sang score s_l, obtained from the probability distribution of each pair (x, s) expressed using D predetermined dictionaries q_d of singing-voice distributions, the weights b_{d,n} of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} of the dictionaries q_d for score s_l, common across singers.

FIG. 1 is a block diagram showing the functional configuration of the acoustic signal analyzing apparatus of the present embodiment.
FIG. 2 is a flowchart showing the parameter estimation processing routine in the acoustic signal analyzing apparatus of the present embodiment.
FIG. 3 is a diagram showing experimental results.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Principle of the embodiment of the present invention>
The fundamental frequency trajectory x of a singing voice can be expressed as a vector of the fundamental frequency values (real numbers) at each discretized time. Here, all fundamental frequency trajectories x are assumed to have the same length V in the time direction. The score s can likewise be expressed as a vector of the pitch at each time. A function φ that maps a fundamental frequency trajectory x and a score s into a reproducing kernel Hilbert space is introduced, and the pair of x and s is regarded as the single point φ(x, s) in that space. This Hilbert space can be characterized by a kernel K(·, (x, s)), which serves as a criterion for measuring the similarity between pairs (x, s) and (x', s') of a fundamental frequency trajectory and a score.

As M elementary criteria, consider the kernels K_m((x, s), (x', s')) (m = 1, 2, ..., M) given by equation (1) below.

(1)

Here, λ_1 and λ_2 are positive real numbers set in advance (for example, λ_1 = λ_2 = 1). The production of a singing voice is considered to be strongly constrained by the physical limitations of the human body. The distance criteria for measuring the similarity of singing voices can therefore be assumed to fall into a small number of typical templates. Using real-valued weights a_m ~ Gamma(·, ·) (m = 1, 2, ..., M), the kernel K is expressed as

(2)
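A weighted combination of base kernels of this kind can be sketched as follows. Equation (1) is not reproduced in this text, so the base kernel here is an assumption: a Gaussian in the trajectory and score differences, scaled by λ_1 and λ_2, with a hypothetical bandwidth `sigma` that differs for each m. The function names and the bandwidth values are illustrative only.

```python
import numpy as np

def base_kernel(x, s, x2, s2, sigma, lam1=1.0, lam2=1.0):
    """Hypothetical base kernel K_m: Gaussian in trajectory and score differences."""
    dist = lam1 * np.sum((x - x2) ** 2) + lam2 * np.sum((s - s2) ** 2)
    return np.exp(-dist / (2.0 * sigma ** 2))

def combined_kernel(x, s, x2, s2, a, sigmas, lam1=1.0, lam2=1.0):
    """K = sum_m a_m * K_m with non-negative weights a_m."""
    return sum(a_m * base_kernel(x, s, x2, s2, sig, lam1, lam2)
               for a_m, sig in zip(a, sigmas))

rng = np.random.default_rng(0)
M, V = 4, 50
sigmas = np.array([0.5, 1.0, 2.0, 4.0])          # assumed bandwidths, one per base kernel
a = rng.gamma(shape=1.0 / M, scale=1.0, size=M)  # gamma-distributed weights a_m

x, s = rng.normal(size=V), rng.normal(size=V)    # one (trajectory, score) pair
x2, s2 = rng.normal(size=V), rng.normal(size=V)  # another pair

print(combined_kernel(x, s, x2, s2, a, sigmas))
```

Because every a_m is non-negative and each base kernel is positive semidefinite, the sum is again a valid kernel, so learning the weights amounts to choosing a similarity criterion from the span of the templates.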

Consider the case where N singers sing L kinds of scores. The fundamental frequency of the singing voice when the n-th singer sings the l-th score is assumed to follow a probability distribution P_{n,l}. This distribution is treated through its expected value in the reproducing kernel Hilbert space introduced above. Using real numbers b_{n,l,d}, the expected value μ_{n,l} in the Hilbert space of the pair (x, s) of a fundamental frequency trajectory and a score following P_{n,l} can be expressed as

(3)

Consider representing this singing-voice probability distribution by a tractable model. Suppose that D dictionaries q_1, q_2, ..., q_D of singing-voice distributions are given in advance. The expected value μ_{n,l}, which is equivalent to the singing probability distribution of the n-th singer for the l-th score, can be approximated by the model μ*_{n,l} as follows, using the weight b_{d,n} (> 0) of dictionary q_d for the n-th singer, common across songs, and the weight c_{l,d} (> 0) of dictionary q_d for the l-th song, common across singers.

(4)

Let x_1, x_2, ..., x_{o_l} be the o_l observations (fundamental frequency trajectories) actually sung by the n-th singer for the l-th score s_l. Then μ_{n,l} can be calculated as follows.

(5)

Next, μ*_{n,l} can be approximated as follows.

(6)

Here, any parametric probability distribution prepared in advance may be used for Q_d (d = 1, 2, ..., D). A gamma process prior can be placed on each of a_m, b_{d,n}, and c_{l,d}; in the simplest case, they can be set as

(7)

(8)

(9)

Model fitting is performed by approximating μ_{n,l} by μ*_{n,l}, and the objective function W is

(10)

The task thus becomes an optimization problem of finding the a_m, b_{d,n}, and c_{l,d} (m = 1, 2, ..., M; d = 1, 2, ..., D; l = 1, 2, ..., L) that minimize W. It can be solved by, for example, the steepest descent method or the stochastic steepest descent method.
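Equation (10) is not reproduced in this text. One form consistent with the description, given here only as an assumption (in particular, whether the distance is squared is not stated), is the sum of squared Hilbert-space distances over all singer and score combinations:

```latex
W(a, b, c) = \sum_{n=1}^{N} \sum_{l=1}^{L}
  \left\| \mu_{n,l} - \mu^{*}_{n,l} \right\|_{\mathcal{H}}^{2},
\qquad
\left\| \mu_{n,l} - \mu^{*}_{n,l} \right\|_{\mathcal{H}}^{2}
  = \langle \mu_{n,l}, \mu_{n,l} \rangle_{\mathcal{H}}
  - 2 \langle \mu_{n,l}, \mu^{*}_{n,l} \rangle_{\mathcal{H}}
  + \langle \mu^{*}_{n,l}, \mu^{*}_{n,l} \rangle_{\mathcal{H}}
```

When both expected values are finite sums of kernel functions, each inner product reduces to sums of kernel evaluations K((x, s), (x', s')) by the reproducing property, so W can be evaluated without constructing φ explicitly.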

In the simplest case, the following updates are iterated, for example a number of times specified in advance.

(11)

(12)

(13)
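Equations (11) to (13) are not reproduced in this text. As a rough illustration of such an iteration, the sketch below applies projected gradient descent to a finite-dimensional stand-in for the objective. `F[n, l]` plays the role of μ_{n,l} and `E[l, d]` the embedding of dictionary q_d for score s_l, both as plain feature vectors. These arrays, the product model Σ_d b_{d,n} c_{l,d} E[l, d], the step-halving rule, and the omission of the kernel weights a_m are all assumptions made for brevity; they are not the update rules of the patent.

```python
import numpy as np

def objective(F, E, b, c):
    """W = sum_{n,l} || F[n,l] - sum_d b[d,n] * c[l,d] * E[l,d] ||^2."""
    model = np.einsum("dn,ld,ldp->nlp", b, c, E)
    return float(np.sum((F - model) ** 2))

def gradients(F, E, b, c):
    """Analytic gradients of W with respect to b and c."""
    resid = F - np.einsum("dn,ld,ldp->nlp", b, c, E)  # shape (N, L, P)
    proj = np.einsum("nlp,ldp->nld", resid, E)         # <resid[n,l], E[l,d]>
    grad_b = -2.0 * np.einsum("ld,nld->dn", c, proj)
    grad_c = -2.0 * np.einsum("dn,nld->ld", b, proj)
    return grad_b, grad_c

def fit(F, E, b, c, n_iter=200, step=1e-2, floor=1e-8):
    """Projected gradient descent; the step is halved whenever W would increase."""
    history = [objective(F, E, b, c)]
    for _ in range(n_iter):
        grad_b, grad_c = gradients(F, E, b, c)
        while step > 1e-12:
            b_new = np.maximum(b - step * grad_b, floor)  # keep weights positive
            c_new = np.maximum(c - step * grad_c, floor)
            w_new = objective(F, E, b_new, c_new)
            if w_new <= history[-1]:
                b, c = b_new, c_new
                history.append(w_new)
                break
            step *= 0.5
        else:
            break  # no step size reduces W any further
    return b, c, history

# Synthetic data generated from known weights, purely for illustration.
rng = np.random.default_rng(0)
N, L, D, P = 5, 4, 3, 6
E = rng.normal(size=(L, D, P))
b_true = rng.uniform(0.5, 1.5, size=(D, N))
c_true = rng.uniform(0.5, 1.5, size=(L, D))
F = np.einsum("dn,ld,ldp->nlp", b_true, c_true, E)

b0 = rng.uniform(0.5, 1.5, size=(D, N))
c0 = rng.uniform(0.5, 1.5, size=(L, D))
b_hat, c_hat, history = fit(F, E, b0, c0)
print(history[0], history[-1])
```

Clipping at a small positive floor keeps b and c positive, matching the constraints b_{d,n} > 0 and c_{l,d} > 0 stated above. Note that b and c are only identified up to a common rescaling, because the model depends on their product.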

<System configuration>
Next, the configuration of the acoustic signal analyzing apparatus according to the embodiment of the present invention is described with reference to FIG. 1. The acoustic signal analyzing apparatus 10 according to the embodiment analyzes the observation data recorded when each of N singers sang each of L kinds of scores at least once. As shown in FIG. 1, the acoustic signal analyzing apparatus 10 includes an input unit 12, a calculation unit 14, and an output unit 16.

Through the input unit 12, the singing-voice acoustic signals recorded when each of the N singers sang each of the L kinds of scores at least once, together with the scores, are input to the calculation unit 14.

The calculation unit 14 can be represented as a configuration including a fundamental frequency extraction unit 20, a data storage unit 22, and a parameter estimation unit 24.

The fundamental frequency extraction unit 20 estimates and outputs a fundamental frequency trajectory x for each of the singing-voice acoustic signals recorded when each of the N singers sang each of the L kinds of scores at least once. This processing can be realized with well-known techniques; for example, it uses the fundamental frequency estimation method YIN proposed in A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930, 2002. This method estimates the fundamental frequency using the autocorrelation function, and introduces processing such as a difference function, normalization, and parabolic interpolation as post-processing in order to reduce double-pitch errors, half-pitch errors, and other estimation errors caused by noise. Previous studies have shown it to be effective for estimating the fundamental frequency of high-pitched music and singing voices. In this embodiment, YIN is used to estimate the fundamental frequency from the singing-voice acoustic signal every 5 ms, and the resulting fundamental frequency trajectory is output.
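A minimal sketch of the core YIN steps (difference function, cumulative mean normalization, absolute threshold, and parabolic interpolation) is shown below. It is a simplified illustration and not the reference implementation: the function names, the 1024-sample frame, the search range, and the threshold of 0.1 are assumptions, and the final best-local-estimate refinement of the original paper is omitted.

```python
import numpy as np

def yin_f0(frame, sr, fmin=80.0, fmax=800.0, threshold=0.1):
    """Estimate the fundamental frequency (Hz) of one frame; NaN if none is found."""
    frame = np.asarray(frame, dtype=float)
    tau_min = max(1, int(sr / fmax))
    tau_max = int(sr / fmin)
    w = len(frame) - tau_max                  # integration window length
    if w <= 0:
        raise ValueError("frame too short for the requested fmin")
    # Difference function d(tau) = sum_j (x_j - x_{j+tau})^2
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])
    # Cumulative mean normalized difference (defined as 1 at tau = 0)
    cmnd = np.ones_like(d)
    running = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(running, 1e-12)
    # Absolute threshold: first lag below the threshold, followed to its local minimum
    tau = None
    for t in range(tau_min, tau_max):
        if cmnd[t] < threshold:
            while t + 1 < tau_max and cmnd[t + 1] < cmnd[t]:
                t += 1
            tau = float(t)
            break
    if tau is None:
        return np.nan
    # Parabolic interpolation around the selected lag
    i = int(tau)
    left, mid, right = cmnd[i - 1], cmnd[i], cmnd[i + 1]
    curvature = left - 2.0 * mid + right
    if curvature > 0:
        tau = i + 0.5 * (left - right) / curvature
    return sr / tau

def f0_trajectory(signal, sr, frame_len=1024, hop_s=0.005, **kwargs):
    """Fundamental frequency every hop_s seconds (5 ms by default)."""
    hop = max(1, int(round(hop_s * sr)))
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([yin_f0(signal[i:i + frame_len], sr, **kwargs) for i in starts])

# Usage on a synthetic 220 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(int(0.2 * sr)) / sr
tone = np.sin(2.0 * np.pi * 220.0 * t)
trajectory = f0_trajectory(tone, sr)
print(len(trajectory), float(np.nanmean(trajectory)))
```

Frames in which no lag falls below the threshold are returned as NaN, so a caller would need to decide how to handle unvoiced frames before building the fixed-length vector x.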

The data storage unit 22 stores, for each of the singing-voice acoustic signals recorded when each of the N singers sang each of the L kinds of scores at least once, the pair (x, s) of the vector x representing the extracted fundamental frequency trajectory and the score vector s representing the pitch of the corresponding score at each time.

Based on each pair (x, s) of the vector x representing the fundamental frequency trajectory and the score vector s stored in the data storage unit 22, the parameter estimation unit 24 estimates the weight a_m of each kernel K_m, the weight b_{d,n} of each dictionary q_d for each singer n, common across scores, and the weight c_{l,d} of each dictionary q_d for each score s_l, common across singers, so as to minimize the objective function of equation (10) above. That objective function is expressed using the distance between two expected values in the Hilbert space. The first is the expected value μ_{n,l} of the pair (x, s_l) when singer n sang score s_l, obtained from the M predetermined kernels K_m, the weights a_m of the kernels K_m, and each pair (x, s) of the observation data. The second is the expected value μ*_{n,l} of the pair (x, s_l) when singer n sang score s_l, obtained from the probability distribution of each pair (x, s) expressed using the D predetermined dictionaries q_d of singing-voice distributions, the weights b_{d,n} of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} of the dictionaries q_d for score s_l, common across singers.

Specifically, the parameter estimation unit 24 includes a parameter initialization unit 30, a kernel weight update unit 32, a singer dictionary weight update unit 34, a song dictionary weight update unit 36, and an end determination unit 38.

The parameter initialization unit 30 sets initial values for the weight a_m of each of the M kernels K_m, the weight b_{d,n} of each of the D dictionaries q_d for each of the N singers n, common across scores, and the weight c_{l,d} of each of the D dictionaries q_d for each of the L kinds of scores s_l, common across singers.

For example, a_m, b_{d,n}, and c_{l,d} (m = 1, 2, ..., M; d = 1, 2, ..., D; l = 1, 2, ..., L) are initialized as follows.

The initial values of a_m (m = 1, 2, ..., M) are drawn from the gamma distribution Gamma(1/M, 1). The initial values of b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) are drawn from the gamma distribution Gamma(1/D, 1). The initial values of c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) are drawn from the gamma distribution Gamma(1/D, 1).
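This initialization can be sketched with NumPy as follows. The sketch assumes that the first argument of Gamma(·, ·) is the shape parameter; since the second argument is 1, the scale and rate parameterizations coincide. The function name `init_parameters` and the array layouts are illustrative.

```python
import numpy as np

def init_parameters(M, D, N, L, seed=None):
    """Draw initial weights from gamma distributions (shape 1/M or 1/D, scale 1)."""
    rng = np.random.default_rng(seed)
    a = rng.gamma(shape=1.0 / M, scale=1.0, size=M)        # a_m, one per kernel
    b = rng.gamma(shape=1.0 / D, scale=1.0, size=(D, N))   # b_{d,n}, dictionary x singer
    c = rng.gamma(shape=1.0 / D, scale=1.0, size=(L, D))   # c_{l,d}, score x dictionary
    return a, b, c

# Example sizes: 4 kernels, 10 dictionaries, 10 singers, 20 scores.
a, b, c = init_parameters(M=4, D=10, N=10, L=20, seed=0)
print(a.shape, b.shape, c.shape)
```

A gamma distribution with a small shape parameter puts most of its mass near zero, so most initial weights are small and only a few are large, which suits a model intended to use few kernels and dictionaries.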

The kernel weight update unit 32 updates the weights a_m (m = 1, 2, ..., M) of the kernels K_m according to equation (11) above so as to minimize the objective function of equation (10) above. The update is based on each pair (x, s) of the fundamental frequency trajectory x and the score vector s stored in the data storage unit 22, and on the initialized or previously updated values of the weights a_m (m = 1, 2, ..., M) of the kernels K_m, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers.

The singer dictionary weight update unit 34 updates the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, according to equation (12) above so as to minimize the objective function of equation (10) above. The update is based on each pair (x, s) of the fundamental frequency trajectory x and the score vector s stored in the data storage unit 22, and on the initialized or previously updated values of the weights a_m (m = 1, 2, ..., M) of the kernels K_m, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers.

The song dictionary weight update unit 36 updates the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers, according to equation (13) above so as to minimize the objective function of equation (10) above. The update is based on each pair (x, s) of the fundamental frequency trajectory x and the score vector s stored in the data storage unit 22, and on the initialized or previously updated values of the weights a_m (m = 1, 2, ..., M) of the kernels K_m, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers.

The end determination unit 38 determines whether a predetermined end condition has been satisfied, and causes the updates by the kernel weight update unit 32, the singer dictionary weight update unit 34, and the song dictionary weight update unit 36 to be repeated until the end condition is satisfied.

For example, reaching a number of iterations specified in advance may be used as the end condition. Alternatively, the end condition may be that the difference between the value of the objective function computed with the parameters before the update and the value computed with the parameters after the update is at or below a predetermined threshold.
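Both end conditions can be combined in one control loop, as in the following minimal sketch. The function `run_until_done`, its arguments, and the toy quadratic objective are hypothetical; in the apparatus, `update` would stand for one pass of the three weight updates and `objective` for the objective function W.

```python
def run_until_done(update, objective, params, max_iter=100, tol=1e-6):
    """Repeat `update` until the objective changes by at most `tol` or `max_iter` is reached."""
    prev = objective(params)
    for iteration in range(1, max_iter + 1):
        params = update(params)
        cur = objective(params)
        if abs(prev - cur) <= tol:          # change in objective at or below threshold
            return params, iteration, "converged"
        prev = cur
    return params, max_iter, "max_iter"     # iteration budget exhausted

# Toy problem: minimise (p - 3)^2 by gradient steps with step size 0.2.
toy_objective = lambda p: (p - 3.0) ** 2
toy_update = lambda p: p - 0.2 * 2.0 * (p - 3.0)

p_final, n_used, reason = run_until_done(toy_update, toy_objective, 0.0)
print(p_final, n_used, reason)
```

The iteration cap guarantees termination even when the objective keeps changing, while the threshold test avoids spending iterations once the updates no longer make progress.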

When the end determination unit 38 determines that the end condition has been satisfied, the output unit 16 outputs the final values, as updated by the kernel weight update unit 32, the singer dictionary weight update unit 34, and the song dictionary weight update unit 36, of the weights a_m (m = 1, 2, ..., M) of the kernels K_m, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers.

<Operation of the acoustic signal analyzing apparatus>
Next, the operation of the acoustic signal analyzing apparatus 10 according to the present embodiment is described. First, when the singing-voice acoustic signals recorded when each of the N singers sang each of the L kinds of scores at least once, together with the scores, are input to the acoustic signal analyzing apparatus 10 through the input unit 12, the parameter estimation processing routine shown in FIG. 2 is executed.

In step S100, the fundamental frequency extraction unit 20 uses the fundamental frequency estimation method YIN to estimate the fundamental frequency every 5 ms for each of the input singing-voice acoustic signals, thereby estimating the fundamental frequency trajectory x, and stores the pair (x, s) of the trajectory and the score vector s in the data storage unit 22.

In step S102, initial values are set for the weights a_m (m = 1, 2, ..., M) of the kernels K_m, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers.

In step S104, the weights a_m (m = 1, 2, ..., M) of the kernels K_m are updated according to equation (11) above. The update is based on each pair (x, s) of the fundamental frequency trajectory x and the score vector s stored in the data storage unit 22, and on the values, initialized in step S102 or previously updated, of the weights a_m (m = 1, 2, ..., M) of the kernels K_m, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers.

In step S106, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, are updated according to equation (12) above. The update is based on each pair (x, s) of the fundamental frequency trajectory x and the score vector s stored in the data storage unit 22, and on the values, initialized in step S102 or previously updated, of the weights a_m (m = 1, 2, ..., M) of the kernels K_m, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers.

In step S108, the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers, are updated according to equation (13) above. The update is based on each pair (x, s) of the fundamental frequency trajectory x and the score vector s stored in the data storage unit 22, and on the values, initialized in step S102 or previously updated, of the weights a_m (m = 1, 2, ..., M) of the kernels K_m, the weights b_{d,n} (d = 1, 2, ..., D; n = 1, 2, ..., N) of the dictionaries q_d for singer n, common across scores, and the weights c_{l,d} (l = 1, 2, ..., L; d = 1, 2, ..., D) of the dictionaries q_d for score s_l, common across singers.

In step S110, it is determined whether a predetermined end condition is satisfied. If the end condition is satisfied, the finally obtained parameters are output by the output unit 16 and the process ends. If the end condition is not satisfied, the process returns to step S104.
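The loop formed by steps S104 to S110 can be sketched as a generic alternating update. This is only an illustrative sketch: equations (11) to (13) are not reproduced here, so the update rules are passed in as caller-supplied functions, and the names `alternating_estimate`, `update_a`, `update_b`, `update_c`, `objective`, `max_iters`, and `tol` are hypothetical. The end condition (a small change in the objective, or an iteration limit) is likewise an assumption, since the text only says "predetermined". The toy quadratic objective at the bottom is made up purely to exercise the loop.

```python
def alternating_estimate(a, b, c, update_a, update_b, update_c,
                         objective, max_iters=1000, tol=1e-12):
    """Alternate three block updates until an assumed end condition holds.

    update_a / update_b / update_c stand in for equations (11)-(13);
    each receives the current (a, b, c) and returns the new block.
    """
    prev = objective(a, b, c)
    for _ in range(max_iters):
        a = update_a(a, b, c)  # step S104: kernel weights a_m
        b = update_b(a, b, c)  # step S106: singer-side dictionary weights b_{d,n}
        c = update_c(a, b, c)  # step S108: score-side dictionary weights c_{l,d}
        cur = objective(a, b, c)
        if abs(prev - cur) < tol:  # step S110: assumed end condition
            break
        prev = cur
    return a, b, c


# Toy stand-in problem (NOT the patent's objective): scalar blocks and
# f(a, b, c) = (a - 1)^2 + (b - a)^2 + (c - b)^2, minimized at a = b = c = 1.
def toy_objective(a, b, c):
    return (a - 1.0) ** 2 + (b - a) ** 2 + (c - b) ** 2


a_hat, b_hat, c_hat = alternating_estimate(
    0.0, 0.0, 0.0,
    update_a=lambda a, b, c: (1.0 + b) / 2.0,  # exact minimizer over a
    update_b=lambda a, b, c: (a + c) / 2.0,    # exact minimizer over b
    update_c=lambda a, b, c: b,                # exact minimizer over c
    objective=toy_objective,
)
```

Each update uses the most recent values of the other two blocks, which matches the "initialized in step S102 or last updated" wording of the steps.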

<Example>
Next, to demonstrate the effects and operation of the present invention, an example using the acoustic signal analysis apparatus according to the embodiment of the present invention is described below.

As an example, we show the result of applying the proposed algorithm of the embodiment of the present invention to singing data in which five female and five male singers each sang 20 song phrases once.

First, as a quantitative evaluation, the mean squared error on test data was compared between the proposed algorithm of the embodiment of the present invention and a fit by plain Gaussian process regression. Two, four, or six songs were randomly selected as training data, and each algorithm was run for 10 trials from randomly determined initial values. The resulting means and standard deviations are summarized in Table 1 below.
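The evaluation protocol described above (mean squared error on test data, aggregated over repeated randomly initialized trials) could be computed along the following lines. This is a sketch under stated assumptions: `fit_predict` is a hypothetical callable standing in for either algorithm, the per-trial seeding scheme is invented for illustration, and the sample standard deviation is used because the text does not say which variant was reported.

```python
import random
import statistics


def mean_squared_error(pred, target):
    """Average of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def evaluate_trials(fit_predict, train, test_inputs, test_targets,
                    n_trials=10, seed=0):
    """Run n_trials randomly initialized fits; return (mean, std) of test MSE.

    fit_predict(train, test_inputs, rng) is a hypothetical callable that
    trains from a random initial value drawn from rng and returns
    predictions for test_inputs.
    """
    errors = []
    for trial in range(n_trials):
        rng = random.Random(seed + trial)  # one independent stream per trial
        pred = fit_predict(train, test_inputs, rng)
        errors.append(mean_squared_error(pred, test_targets))
    # Sample standard deviation (assumption; the source does not specify).
    return statistics.mean(errors), statistics.stdev(errors)
```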

Next, as a qualitative evaluation, FIG. 3 shows the features, corresponding to the individuality of each singer, that were actually extracted by the decomposition. Here, all songs were used as training data, and a single trial with randomly determined initial parameter values is shown. In FIG. 3, the rows correspond to the decomposed singing-style dictionaries and the columns correspond to the singers. It can be seen, for example, that the fifth singer tends to use the fifth singing-style dictionary and that the third singer tends to use the tenth singing-style dictionary. Features that reflect singing individuality in this way have various potential applications, such as singer identification.
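Reading off which singing-style dictionary each singer tends to use amounts to taking, for each column of the weight matrix, the row with the largest weight. The following minimal sketch assumes the weights b_{d,n} are stored as a list of rows (dictionaries) by columns (singers), matching the layout described for FIG. 3. The function name `preferred_dictionary` and the numeric values are hypothetical and are not taken from the patent's results.

```python
def preferred_dictionary(b):
    """For each singer (column), return the 0-based index of the
    dictionary (row) that carries the largest weight."""
    n_dicts = len(b)
    n_singers = len(b[0])
    return [max(range(n_dicts), key=lambda d: b[d][n])
            for n in range(n_singers)]


# Made-up weights: 3 dictionaries (rows) x 2 singers (columns).
toy_b = [
    [0.1, 0.7],
    [0.8, 0.2],
    [0.1, 0.1],
]
toy_preferences = preferred_dictionary(toy_b)
```

A vector of such weights per singer could also be used directly as a feature for applications such as singer identification, as the text suggests.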

As described above, the acoustic signal analysis apparatus 10 according to the embodiment of the present invention estimates the weight a_m of each kernel K_m, the weight b_{d,n} of each dictionary q_d shared across scores for each singer n, and the weight c_{l,d} of each dictionary q_d shared across singers for each score s_l, so as to minimize an objective function expressed using the distance between two expected values on the Hilbert space of the pair (x, s_l) obtained when singer n sings score s_l. The first expected value, μ_{n,l}, is obtained from the M kernels K_m, the weight a_m of each kernel K_m, and each pair (x, s) of a fundamental frequency trajectory x obtained from the observation data and a score vector s. The second expected value, μ_{n,l}^*, is obtained from the probability distributions of the pairs (x, s) expressed using the D dictionaries q_d of singing-voice distributions, the weights b_{d,n}, and the weights c_{l,d}. Features corresponding to the individuality of each singer can thereby be extracted from singing voices.
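To make the notion of a "distance between expected values on the Hilbert space" concrete, the sketch below computes the squared distance between two empirical kernel means under a weighted sum of kernels. It is an illustration only, not the patent's objective function: it assumes Gaussian base kernels, represents both distributions by finite samples (whereas the patent's μ_{n,l}^* is derived from the dictionaries q_d and the weights b_{d,n} and c_{l,d}), and all function names are hypothetical.

```python
import math


def gaussian_kernel(width):
    """Gaussian (RBF) kernel on equal-length numeric tuples (assumed form)."""
    def k(u, v):
        sq = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
        return math.exp(-sq / (2.0 * width ** 2))
    return k


def combined_kernel(weights, kernels):
    """Weighted sum of base kernels, K = sum_m a_m * K_m."""
    def k(u, v):
        return sum(a * km(u, v) for a, km in zip(weights, kernels))
    return k


def squared_mean_distance(k, xs, ys):
    """Squared RKHS distance between the empirical kernel means of xs and ys:
    mean k(x, x') - 2 * mean k(x, y) + mean k(y, y')."""
    def avg(p, q):
        return sum(k(u, v) for u in p for v in q) / (len(p) * len(q))
    return avg(xs, xs) - 2.0 * avg(xs, ys) + avg(ys, ys)


# Two base kernels with made-up widths and weights.
toy_kernel = combined_kernel([0.5, 0.5],
                             [gaussian_kernel(1.0), gaussian_kernel(2.0)])
same = squared_mean_distance(toy_kernel, [(0.0,), (1.0,)], [(0.0,), (1.0,)])
different = squared_mean_distance(toy_kernel, [(0.0,)], [(1.0,)])
```

The distance is zero when the two sample sets coincide and grows as they move apart, which is the property an objective of this kind relies on.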

Furthermore, to model the singing voice and its latent individuality, the kernel averaging method and the nonparametric Bayesian method (specifically, dictionary decomposition of signals by a gamma process), which have conventionally been used independently as separate techniques, are combined. As an acoustic signal (in particular, singing voice data) analysis technique, this makes it possible to extract feature quantities reflecting the individuality of each singer from singing data in which a plurality of singers sang a plurality of songs.

The present invention is not limited to the above embodiment, and various modifications and applications are possible without departing from the gist of the invention.

For example, although the acoustic signal analysis apparatus 10 described above contains a computer system, the term "computer system" also includes a website providing environment (or display environment) when a WWW system is used.

In addition, although the present specification describes an embodiment in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium or provided via a network. Each unit of the acoustic signal analysis apparatus 10 of the present embodiment may also be implemented in hardware. The database in which the initial parameter values are stored can be realized by storage means such as a hard disk device or a file server, and may be provided either inside the acoustic signal analysis apparatus 10 or in an external device.

DESCRIPTION OF SYMBOLS
10 Acoustic signal analysis apparatus
12 Input unit
14 Calculation unit
16 Output unit
20 Fundamental frequency extraction unit
22 Data storage unit
24 Parameter estimation unit
30 Parameter initialization unit
32 Kernel weight update unit
34 Singer dictionary weight update unit
36 Music dictionary weight update unit
38 End determination unit

Claims

An acoustic signal analysis apparatus for analyzing observation data of acoustic signals representing singing voices obtained when each of N singers sang each of L kinds of scores at least once, the apparatus comprising:
a parameter estimation unit that estimates a weight a_m of each of M kernels K_m, a weight b_{d,n} of each of D dictionaries q_d shared across scores for each of the N singers n, and a weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l, so as to minimize an objective function expressed using a distance between
an expected value μ_{n,l} on a Hilbert space of the pair (x, s_l) obtained when singer n sings score s_l, the expected value μ_{n,l} being obtained from the M kernels K_m, which are predetermined as a criterion for measuring the similarity between pairs (x, s) of a fundamental frequency trajectory x representing the fundamental frequency at each time of an acoustic signal representing a singing voice and a score vector s representing the pitch at each time of a score, from the weight a_m of each of the M kernels K_m, and from each pair (x, s) of the fundamental frequency trajectory x obtained from the observation data when each of the N singers n sang each of the L kinds of scores s_l at least once and the score vector s, and
an expected value μ_{n,l}^* on the Hilbert space of the pair (x, s_l) obtained when singer n sings score s_l, the expected value μ_{n,l}^* being obtained from probability distributions of the pairs (x, s) expressed using the D predetermined dictionaries q_d of singing-voice distributions, from the weight b_{d,n} of each of the D dictionaries q_d shared across scores for singer n, and from the weight c_{l,d} of each of the D dictionaries q_d shared across singers for score s_l.

The acoustic signal analysis apparatus according to claim 1, wherein the parameter estimation unit includes:
a parameter initialization unit that sets an initial value for each of the weight a_m of each of the M kernels K_m, the weight b_{d,n} of each of the D dictionaries q_d shared across scores for each of the N singers n, and the weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l;
a kernel weight update unit that updates the weight a_m of each of the M kernels K_m so as to minimize the objective function, based on each pair (x, s) of the fundamental frequency trajectory x obtained from the observation data and the score vector s, the weight b_{d,n} of each of the D dictionaries q_d shared across scores for each of the N singers n, and the weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l;
a singer dictionary weight update unit that updates the weight b_{d,n} of each of the D dictionaries q_d shared across scores for each of the N singers n so as to minimize the objective function, based on each pair (x, s) of the fundamental frequency trajectory x obtained from the observation data and the score vector s, the weight a_m of each of the M kernels K_m, and the weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l;
a music dictionary weight update unit that updates the weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l so as to minimize the objective function, based on each pair (x, s) of the fundamental frequency trajectory x obtained from the observation data and the score vector s, the weight a_m of each of the M kernels K_m, and the weight b_{d,n} of each of the D dictionaries q_d shared across scores for each of the N singers n; and
an end determination unit that causes the update by the kernel weight update unit, the update by the singer dictionary weight update unit, and the update by the music dictionary weight update unit to be repeated until a predetermined end condition is satisfied.

The acoustic signal analyzing apparatus according to claim 1, wherein the objective function is represented by the following expression.

Here, Q_d is the probability distribution of the pairs (x, s) expressed using the dictionary q_d, and O_l represents the number of all pairs (x, s) obtained from the observation data.

An acoustic signal analysis method in an acoustic signal analysis apparatus for analyzing observation data of acoustic signals representing singing voices obtained when each of N singers sang each of L kinds of scores at least once, the method comprising:
estimating, by a parameter estimation unit, a weight a_m of each of M kernels K_m, a weight b_{d,n} of each of D dictionaries q_d shared across scores for each of the N singers n, and a weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l, so as to minimize an objective function expressed using a distance between
an expected value μ_{n,l} on a Hilbert space of the pair (x, s_l) obtained when singer n sings score s_l, the expected value μ_{n,l} being obtained from the M kernels K_m, which are predetermined as a criterion for measuring the similarity between pairs (x, s) of a fundamental frequency trajectory x representing the fundamental frequency at each time of an acoustic signal representing a singing voice and a score vector s representing the pitch at each time of a score, from the weight a_m of each of the M kernels K_m, and from each pair (x, s) of the fundamental frequency trajectory x obtained from the observation data when each of the N singers n sang each of the L kinds of scores s_l at least once and the score vector s, and
an expected value μ_{n,l}^* on the Hilbert space of the pair (x, s_l) obtained when singer n sings score s_l, the expected value μ_{n,l}^* being obtained from probability distributions of the pairs (x, s) expressed using the D predetermined dictionaries q_d of singing-voice distributions, from the weight b_{d,n} of each of the D dictionaries q_d shared across scores for singer n, and from the weight c_{l,d} of each of the D dictionaries q_d shared across singers for score s_l.

The acoustic signal analysis method according to claim 4, wherein the estimating by the parameter estimation unit includes:
setting, by a parameter initialization unit, an initial value for each of the weight a_m of each of the M kernels K_m, the weight b_{d,n} of each of the D dictionaries q_d shared across scores for each of the N singers n, and the weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l;
updating, by a kernel weight update unit, the weight a_m of each of the M kernels K_m so as to minimize the objective function, based on each pair (x, s) of the fundamental frequency trajectory x obtained from the observation data and the score vector s, the weight b_{d,n} of each of the D dictionaries q_d shared across scores for each of the N singers n, and the weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l;
updating, by a singer dictionary weight update unit, the weight b_{d,n} of each of the D dictionaries q_d shared across scores for each of the N singers n so as to minimize the objective function, based on each pair (x, s) of the fundamental frequency trajectory x obtained from the observation data and the score vector s, the weight a_m of each of the M kernels K_m, and the weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l;
updating, by a music dictionary weight update unit, the weight c_{l,d} of each of the D dictionaries q_d shared across singers for each of the L kinds of scores s_l so as to minimize the objective function, based on each pair (x, s) of the fundamental frequency trajectory x obtained from the observation data and the score vector s, the weight a_m of each of the M kernels K_m, and the weight b_{d,n} of each of the D dictionaries q_d shared across scores for each of the N singers n; and
repeating, under control of an end determination unit, the update by the kernel weight update unit, the update by the singer dictionary weight update unit, and the update by the music dictionary weight update unit until a predetermined end condition is satisfied.

The acoustic signal analysis method according to claim 4 or 5, wherein the objective function is expressed by the following equation.

Here, Q_d is the probability distribution of the pairs (x, s) expressed using the dictionary q_d, and O_l represents the number of all pairs (x, s) obtained from the observation data.

A program for causing a computer to function as each unit of the acoustic signal analysis apparatus according to any one of claims 1 to 3.