JP6187132B2

JP6187132B2 - Score alignment apparatus and score alignment program

Info

Publication number: JP6187132B2
Application number: JP2013217168A
Authority: JP
Inventors: 陽前澤; 吉就中村
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2013-10-18
Filing date: 2013-10-18
Publication date: 2017-08-30
Anticipated expiration: 2033-10-18
Also published as: JP2015079183A

Description

The present invention relates to a score alignment apparatus that captures an acoustic signal representing the performance sound of a piece of music and, while capturing it, analyzes the captured signal to estimate in real time the portion of the score that is currently being played (hereinafter, the score position). It also relates to a computer program (score alignment program) applied to a computer included in the score alignment apparatus.

Score alignment apparatuses (automatic accompaniment apparatuses) are known in the art, as shown, for example, in Non-Patent Documents 1 and 2 below. A performer rarely plays a piece exactly as written in the score; the performer may repeat the same part or skip a part that he or she cannot play. To handle such arbitrary transitions of the score position, the score alignment apparatuses of Non-Patent Documents 1 and 2 describe the performance process (the transitions of the score position) as a probabilistic model.

If a transition from the current score position to every other score position is allowed, the amount of computation needed to estimate the score position after the transition becomes very large. Non-Patent Document 1 therefore limits the growth in computation by imposing suitable assumptions on the transitions of the score position.

In addition, a score position estimated in real time is less accurate than one estimated by batch (non-real-time) processing. Non-Patent Document 2 therefore estimates the score position at a time a predetermined interval before the present, estimates the tempo trajectory, and uses both results to estimate the current score position.

Non-Patent Document 1: Eita Nakamura, Haruto Takeda, Ryuichi Yamamoto, Yasuyuki Saito, Shinji Sako, Shigeki Sagayama, "Score tracking and automatic accompaniment that can follow performances including replaying from and skipping to arbitrary positions", Transactions of the Information Processing Society of Japan, April 2013, vol. 54, no. 4, pp. 1338-1349.

Non-Patent Document 2: Ryuichi Yamamoto, Shinji Sako, Tadashi Kitamura, "Ryry: an audio-input automatic accompaniment system capable of handling polyphonic instruments", Information Processing Society of Japan, Interaction, March 2, 2013, 3EXB-13.

In Non-Patent Document 1, the growth in computation is limited, but not sufficiently. In Non-Patent Document 2, the confidence of each state in the state sequence is not taken into account when the tempo is estimated, so the accuracy of the estimated score position may be degraded. Consequently, when such a score alignment apparatus is applied to a media player (an automatic accompaniment apparatus, an image display apparatus, or the like; see, for example, Japanese Patent No. 4399961 and Japanese Patent No. 4534926), the position reached by the performer may drift away from the playback position of the other media (accompaniment, images, and so on). In other words, the playback of the other media may feel unnatural relative to the performance.

The present invention has been made to address the above problems. Its object is to provide a score alignment apparatus that limits the growth in computation more effectively and improves the accuracy of the estimated score position. In the description of the constituent features of the invention below, the reference signs of the corresponding parts of the embodiment are given in parentheses to aid understanding; the constituent features should not, however, be construed as limited to the configurations of the corresponding parts indicated by those reference signs.

To achieve the above object, the invention is characterized as a score alignment apparatus (10) that captures an acoustic signal representing the performance sound of a piece of music and, while capturing it, analyzes the captured signal to estimate in real time the tempo and the score position, i.e., the portion of the score currently being played. The apparatus comprises: score position probability density / tempo probability density calculation means (S151 to S155) for calculating the probability density of the score position and the probability density of the tempo based on a probabilistic model (HSMM) expressed as a sequence of states each representing a score position, the model having the property that the current state depends on the immediately preceding state and the property that a transition from the current state to any state is possible; and score position / tempo determination means (S162 to S165) for determining the current score position and tempo from the sequence of calculated score position probability densities, based on an autoregressive process expressed using the true score position (x_t), the transition velocity of the true score position (v_t), and the transition acceleration of the true score position (a_t). In a state transition of the probabilistic model (HSMM), the state before the transition and the state after the transition may be the same.

In this case, the state (S_{i,n,T}(t)) is preferably specified by the section (i), among a plurality of sections obtained by dividing the score, that contains the current score position; the time (n) taken to play from the beginning of that section to the current score position; and the time (T) taken to play the entire section. The probabilistic model is then a hidden semi-Markov model (HSMM) expressed as a sequence of such states, and the score position probability density / tempo probability density calculation means calculates the probability density of the score position and the probability density of the tempo by applying the forward algorithm to the hidden semi-Markov model.

In the score alignment apparatus configured as above, the score position probability density and the tempo probability density are calculated first. The score position and tempo are then determined from the sequence of score position probability densities and the sequence of tempo probability densities, based on a higher-order autoregressive process. This makes it possible to express a property of music acoustic signals: the time derivative of the tempo (that is, the acceleration of the score position) is continuous and tends to return to 0. Moreover, in intervals where the variance of the score position probability density and the tempo probability density is large, the path through the two sequences is smoothed. As a result, the accuracy of the estimated score position and tempo can be improved.

Another feature of the invention is that the score position probability density / tempo probability density calculation means includes section search means (S153) for searching for the plurality of sections of the hidden semi-Markov model to which the forward algorithm is to be applied. The search is based on a hidden Markov model (HMM) expressed as a sequence of states specified by the section, among a plurality of sections obtained by dividing the score, that contains the current score position; the time taken to play from the beginning of that section to the current score position; and the average time taken to play the entire section.

With this feature, the forward algorithm is applied to an ordinary hidden Markov model to calculate the forward variables, and the state (section) with the largest forward variable is found. In the hidden semi-Markov model, the forward algorithm is then applied to a plurality of sections including the section that corresponds to the state (section) found, and the score position probability density and the tempo probability density are calculated. The growth in computation can therefore be limited compared with applying the forward algorithm to every section of the hidden semi-Markov model.

In this case, the number of sections of the hidden Markov model (HMM) is preferably larger than the number of sections of the hidden semi-Markov model (HSMM). The sections of the HSMM to which the forward algorithm is applied can then be found more appropriately than when the HMM and the HSMM have the same number of sections.

The present invention can also be implemented as a computer program applied to a computer included in the score alignment apparatus.

FIG. 1 is a block diagram showing the configuration of a score alignment apparatus according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of the score alignment apparatus.
FIG. 3 is a state transition diagram of the hidden semi-Markov model and the hidden Markov model.
FIG. 4 is a graph showing an example of the sound models.
FIG. 5 is a conceptual diagram showing the process of estimating the tempo trajectory model for a sequence of score position observation densities.
FIG. 6 is a flowchart showing the score alignment process.
FIG. 7 is a flowchart showing the score position probability density / tempo probability density calculation process.
FIG. 8 is a flowchart showing the score position / tempo determination process.

A score alignment apparatus 10 according to an embodiment of the present invention will now be described. The score alignment apparatus 10 captures an acoustic signal representing the performance of a piece of music and, while capturing it, analyzes the captured signal to estimate which part of the score is currently being played. In this embodiment, data in the standard MIDI file format is used as the score data representing the score.

As shown in FIG. 1, the score alignment apparatus 10 includes an input operator 11, a computer unit 12, a display 13, a storage device 14, an external interface circuit 15, and a sound system 16, which are connected to one another via a bus BS.

The input operator 11 consists of switches for on/off operations (for example, a numeric keypad for entering numerical values), a potentiometer or rotary encoder for rotary operations, a potentiometer or linear encoder for slide operations, a mouse, a touch panel, and the like. These operators are operated by the performer's hand and are used to start or stop the score alignment process, to set various parameters of the score alignment process, and so on. When the input operator 11 is operated, operation information representing the operation is supplied via the bus BS to the computer unit 12 described below.

The computer unit 12 consists of a CPU 12a, a ROM 12b, and a RAM 12c, each connected to the bus BS. The CPU 12a reads from the ROM 12b, and executes, a score alignment program that describes the procedure of the score alignment process described later. In addition to the program, the ROM 12b stores various data such as initial setting parameters and the graphic data and character data used to generate display data representing the images shown on the display 13. The RAM 12c temporarily stores data needed while the program is running.

The display 13 is a liquid crystal display (LCD). The computer unit 12 uses the graphic data, character data, and the like to generate display data representing the content to be displayed, and supplies it to the display 13. For example, the computer unit 12 supplies the display 13 with display data representing the score position estimated by the score alignment process described later. The display 13 displays an image based on the display data supplied from the computer unit 12.

The storage device 14 consists of large-capacity nonvolatile recording media such as an HDD, FDD, CD, and DVD, together with the drive units for those media. The storage device 14 stores score data (a standard MIDI file) representing the score. The score data may be stored in the storage device 14 in advance, or may be loaded from outside via the external interface circuit 15 described below.

The external interface circuit 15 has connection terminals that allow the score alignment apparatus 10 to be connected to external devices such as electronic music devices and personal computers. Via the external interface circuit 15, the score alignment apparatus 10 can also be connected to communication networks such as a LAN (Local Area Network) or the Internet.

The sound system 16 includes a tone generator circuit that generates a digital sound signal, a D/A converter that converts the generated digital sound signal into an analog sound signal, an amplifier that amplifies the converted analog sound signal, and a speaker that converts the amplified analog sound signal into an acoustic signal and outputs it. The sound system 16 also includes a microphone for picking up the musical sound emitted by the performance, an A/D converter that converts the analog sound signal representing the picked-up sound into a digital sound signal, and a buffer that temporarily stores sample data representing the converted digital sound signal. That is, the sound system 16 samples the musical sound at a predetermined sampling period (for example, 1/44100 sec) and stores the resulting sample data in the buffer.

Next, the method of estimating the score position and tempo is described. As shown in FIG. 2, the score alignment apparatus 10 first captures, via the microphone, an acoustic signal representing the performance and, while capturing it, analyzes the captured signal to calculate the probability density of the score position currently being played and the probability density of the current tempo. It then uses the sequences of calculated probability densities to determine the optimal score position and tempo. The determined score position and tempo are used to control the controlled objects (the display 13, the sound system 16, and so on).

Next, the method of calculating the probability densities of the score position and tempo is described. In this embodiment, as explained below, the sequence of sections is modeled as a hidden semi-Markov model HSMM (see FIG. 3 and Equation (2)). First, as shown in FIG. 3, the score of the piece is divided into a plurality of sections i (= 1, 2, ..., I). All sections have the same length; for example, each section is one quarter note long. The index "i" indicates the position of the section counted from the beginning of the piece. The score as actually performed can be expressed as a sequence of the sections obtained by this division.

The time taken to play one section (the time the performance stays in one section) depends on the tempo. For example, at a tempo of 60 BPM (beats per minute) it takes 1 second to play one section, and at 120 BPM it takes 0.5 seconds. Expressed as a number of frames with a unit time of, say, 0.1 seconds, one section takes 10 frames at 60 BPM and 5 frames at 120 BPM.
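The conversion from tempo to frame count described above can be sketched as follows. This is an illustrative helper only; the function name `frames_per_section` and the `beats_per_section` parameter are assumptions introduced here, and the 0.1-second frame length is the example value from the text.

```python
def frames_per_section(tempo_bpm, frame_sec=0.1, beats_per_section=1.0):
    """Number of analysis frames spent in one section at a given tempo.

    Assumes one section is `beats_per_section` beats long (one quarter note
    by default) and that a frame lasts `frame_sec` seconds.
    """
    seconds = beats_per_section * 60.0 / tempo_bpm
    return round(seconds / frame_sec)
```

With the example values from the text, 60 BPM corresponds to T = 10 frames and 120 BPM to T = 5 frames.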

The state S_{i,n,T}(t) denotes the following situation: at time t (the t-th frame counted from the beginning of the piece) section i is being played, it has been fixed that playing section i takes a time corresponding to T frames, and the performance has completed the n-th frame counted from the beginning of section i (that is, it has taken n frames to play from the beginning of section i to the current score position). Each circle in FIG. 3 corresponds to one state S_{i,n,T}(t). The tempo is assumed not to change within a section. In each chain of circles joined by arrows in FIG. 3, the state therefore moves in order from the circle on the left to the circle on the right, and the number of circles in a chain corresponds to the tempo: a chain with fewer circles represents a faster tempo, and a chain with more circles a slower tempo. Selecting one initial state in a section thus fixes how long the performance stays in that section (the number of frames T).

Normally, a performance proceeds in order from the beginning of the score to the end, so when the performance of one section ends, only the transition to the immediately following section is allowed. The probability of a transition from section i to section j is denoted τ_{i,j}. When the last state of one section transitions to an initial state of the next section, any initial state may be chosen; that is, the tempo may change at the boundary between sections. The probability of a transition from the tempo corresponding to T′ frames to the tempo corresponding to T frames is denoted τ_{T′,T}. The state transition probability τ_{(i′,n′,T′)~(i,n,T)}, i.e., the probability of a transition from state S_{i′,n′,T′}(t) to state S_{i,n,T}(t+1), is then given by Equation (1). To simplify the description, this embodiment assumes that the score contains no performance symbols (da capo, repeat signs, and the like) that move the score position to a distant section.
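The structure of the allowed transitions (ignoring their probabilities, since Equation (1) is not reproduced in this text) can be sketched as below. The sketch assumes that n runs from 1 to T, that sections are numbered 1 to `num_sections`, and that the last section has no successor; the function name and these conventions are illustrative assumptions, not taken from the patent.

```python
def successors(i, n, T, num_sections, durations):
    """Enumerate the states reachable in one frame from state (i, n, T).

    Inside a section the tempo is fixed, so the only move is n -> n + 1.
    At the last frame of a section (n == T) the performance moves to the
    first frame of section i + 1, where any duration T_next may be chosen.
    """
    if n < T:
        return [(i, n + 1, T)]
    if i >= num_sections:
        return []  # assumption: nothing follows the final section
    return [(i + 1, 1, T_next) for T_next in durations]
```

For example, from state (4, 6, 12) the only successor is (4, 7, 12), whereas from (4, 12, 12) every initial state of section 5 is reachable, one per candidate duration.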

When a performer actually plays a piece, however, the performer may repeat a part where the score specifies no repeat, or skip a part that he or she cannot play. The score position may therefore move (jump) not to the adjacent section but to a distant one. Accordingly, using the probability γ that the sections transition according to the hidden Markov model HMM or the hidden semi-Markov model HSMM, the probability π_{i,n,T} of observing state S_{i,n,T}, and the observation likelihood O_{i,n,T}(t) of state S_{i,n,T}, a model such as that shown in Equation (2) is defined. The observation likelihood O_{i,n,T}(t) is described later.

Next, for each frame t of the musical sound sampled by the sound system 16, the power y_m(t) of each pitch m and the increase in power Δy_m(t) from the preceding frame are calculated as features of the acoustic signal. The observation likelihood of the power y_m(t) and the observation likelihood of the power increase Δy_m(t) are each assumed to follow a von Mises-Fisher distribution, as shown in Equations (3) and (4).

The observation likelihood O_{i,n,T}(t) in Equation (2) is then expressed as in Equation (5).

In Equations (3) and (4), "κ" represents the concentration of the von Mises-Fisher distribution: the larger κ is, the sharper the peak around the mean in the distribution of the observation likelihoods of the power y_m(t) and the power increase Δy_m(t). The value of κ is set to 100, for example. "w(k)" is a template of the acoustic signal features (hereinafter, a sound model), and "k" is an index identifying the sound model. Each sound model is data obtained by sounding a single note of a given pitch on a given instrument and recording the features calculated from that sound. For example, w(k=1) is data obtained by playing the note corresponding to MIDI note number 69 on a piano and recording its features (power), and w(k=2) is data obtained by playing the note corresponding to MIDI note number 69 on a violin and recording its features (power). "h" represents the strength of each sound model. For pieces in which many notes sound at once, the configured strength of each sound model may differ considerably from the strength of the notes actually played. In that case, the value of κ may be reduced to increase the variance.
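Because Equations (3) to (5) are not reproduced in this text, the following is only a sketch of the general idea under stated assumptions: it uses the standard von Mises-Fisher form, whose log-density is κ·⟨μ, x⟩ plus a constant for unit vectors μ and x, and it assumes that the mean direction μ is the normalized sum of the sound models w(k) weighted by their strengths h. The function names and the dictionary layout are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def mixed_template(h, w):
    """Assumed mean direction: normalized sum over k of h[k] * w[k]."""
    dim = len(next(iter(w.values())))
    mix = [0.0] * dim
    for k, strength in h.items():
        for m in range(dim):
            mix[m] += strength * w[k][m]
    return unit(mix)

def vmf_log_likelihood(y, mean_dir, kappa=100.0):
    """Log of the von Mises-Fisher density, omitting the normalizing constant."""
    y_hat = unit(y)
    return kappa * sum(a * b for a, b in zip(mean_dir, y_hat))
```

A larger `kappa` rewards only frames whose power spectrum points almost exactly along the template, which matches the remark that κ should be lowered when the configured strengths are unreliable.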

The observation likelihood of the power y_m(t) (Equation (3)) is now explained concretely. To simplify the explanation, assume that the piece to be analyzed is played on a single instrument and that the sound model index k coincides with the MIDI note number NN. Suppose the current state is S_{i=4,n=6,T=12}, and consider the observation likelihood of the power y_m(t) in that state. Since i + n/T = 4 + 6/12 = 4.5, the strength h(4.5) corresponding to score position 4.5 is extracted. FIG. 4 shows the strength of each sound model (that is, the magnitude of h) as a grayscale plot, in which darker elements have greater strength. In this example the element for k = 69 is strong, so the observation likelihood of the power y_m(t) is distributed around a mean dominated by w(k=69).
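The mapping from a state to a score position used in this example is simple enough to state as code. The helper names below are assumptions for illustration, and the strength table passed to `dominant_model` is made-up example data, not values from FIG. 4.

```python
def score_position(i, n, T):
    """Score position of state S_{i,n,T}: section index plus the fraction played."""
    return i + n / T

def dominant_model(strengths):
    """Index k of the sound model with the largest strength h at one position."""
    return max(strengths, key=strengths.get)
```

For the state in the text, `score_position(4, 6, 12)` gives 4.5, and a strength table such as `{67: 0.1, 69: 0.9}` at that position would make k = 69 the dominant sound model.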

The forward variable α_{i,n,T}(t) in the hidden semi-Markov model HSMM is expressed as in Equation (6).

Rearranging Equation (6) yields the recurrence shown in Equation (7).

To simplify the explanation, consider a model in which a transition to any score position is equally likely. In this case, the probability π of observing a state S is expressed in terms of the number of states |S| as in Equation (8), that is, π = 1/|S|.

If the probability (1 − γ) of a transition to an initial state is set to 0.01, the update equation for the forward variable α is expressed as in Equation (9), using the observation likelihood O_i(t) of state i, the transition probability τ_{i,j} from state i to state j, and the forward variable α_i(t) of state i.

The "τ_{i,j} × 0.99" part and the "0.01/|S|" part of Equation (9) can be computed in advance when the score data is loaded. If, on the other hand, the value of γ in Equation (7) is set to 1, the recurrence for the forward variable of an ordinary hidden Markov model HMM is obtained, as shown in Equation (10).

The only difference (overhead) between updating the forward variable in the hidden semi-Markov model HSMM and updating it in an ordinary hidden Markov model HMM is therefore the addition of "0.01/|S|". Although this example allows a uniform transition to any score position, the overhead is the same even when the state transitions are restricted.
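The update described above can be sketched as follows. Equation (9) itself is not reproduced in this text, so the code is a reconstruction from the surrounding description rather than the patent's formula: it assumes that the previous forward variables sum to 1, so that a uniform jump from any state contributes the constant 0.01/|S|, and it renormalizes after each step for numerical stability (also an assumption). All names are illustrative.

```python
def forward_step(alpha_prev, tau, obs, stay_prob=0.99):
    """One forward update with a uniform jump term.

    alpha_prev : forward variables at time t-1, assumed to sum to 1
    tau        : tau[i][j], ordinary transition probability from i to j
    obs        : obs[j], observation likelihood of state j at time t
    stay_prob  : gamma, probability of following the ordinary transitions
    """
    size = len(alpha_prev)
    jump = (1.0 - stay_prob) / size  # the constant 0.01 / |S| term
    alpha = []
    for j in range(size):
        moved = sum(alpha_prev[i] * tau[i][j] for i in range(size))
        alpha.append(obs[j] * (stay_prob * moved + jump))
    total = sum(alpha)
    if total == 0.0:
        raise ValueError("all observation likelihoods are zero")
    return [a / total for a in alpha]
```

Setting `stay_prob=1.0` removes the jump term and leaves the ordinary HMM recurrence, which is the sense in which the extra cost is a single addition per state. In practice the product `stay_prob * tau[i][j]` and the constant `jump` could be computed once when the score is loaded.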

In this embodiment, the time series of the divided sections is modeled as an HSMM, so the number of states is far larger than it would be under an ordinary HMM, and the number of combinations of section i, frame count n, and frame count T is likewise enormous. Computing the score position probability density with the forward algorithm over all of them would therefore be prohibitively expensive. For this reason, as described below, the score alignment apparatus 10 includes section search means that uses an ordinary HMM to narrow down the HSMM sections to which the forward algorithm is applied.

The ordinary HMM is defined as follows. As with the HSMM described above, the score is divided into sections and a state variable is assigned to each section. However, the score is divided so that the HMM has more sections than the HSMM. For example, in the HSMM the score is divided so that each section is one quarter note long, whereas in the HMM each section is one 32nd note long. Each state (section) is also allowed to transition to itself. That is, in the HMM, the probability of a state transitioning to itself is τ^(HMM), and the probability of transitioning to the next state is 1−τ^(HMM). The forward algorithm is applied to this HMM in real time to find the state whose forward variable is largest in each frame t. The forward algorithm is then applied only to a predetermined number of sections ΔS (for example, 16 sections, that is, four bars of a piece in quadruple time) adjacent to the HSMM section corresponding to the state found.
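A minimal sketch of this narrowing step is shown below. It assumes that HMM states are indexed in score order, that eight HMM sections (32nd notes) make up one HSMM section (quarter note), and that the 16 candidate sections are centred on the best section. The text only says "adjacent", so the centring is an assumption, as are all names.

```python
from typing import List


def candidate_hsmm_sections(hmm_alpha: List[float],
                            hmm_per_hsmm: int = 8,
                            window: int = 16) -> List[int]:
    """Return the HSMM section indices to which the forward algorithm is applied."""
    # HMM state whose forward variable is largest in the current frame.
    best_state = max(range(len(hmm_alpha)), key=hmm_alpha.__getitem__)
    # HSMM section that contains this finer-grained HMM state.
    centre = best_state // hmm_per_hsmm
    # Number of HSMM sections (ceiling division).
    n_sections = -(-len(hmm_alpha) // hmm_per_hsmm)
    # Centre the window on the best section and clip it to the score.
    start = max(0, min(centre - window // 2, n_sections - window))
    end = min(n_sections, start + window)
    return list(range(start, end))
```

Only the returned sections need to be visited by the HSMM forward recursion in that frame; every other section is skipped.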

Note that τ^(HMM) can be regarded as the number of sections traversed per frame, where "section" means an HMM section. Consequently, if τ^(HMM), which represents the self-transition probability, is inconsistent with the currently estimated tempo (that is, the frame count T), the sections ΔS may not be obtained properly. Therefore, the distribution over the current frame count T (= Σ_{i,n} α_{i,n,T}(t)) is computed from the probability density calculated with the HSMM. τ^(HMM) is then determined by using the current frame count T to compute the expected number of sections traversed per frame. This keeps the tempos of the HMM and the HSMM consistent.

Next, a method for determining the current score position from the calculated sequences of score position probability density and tempo probability density is described. In Non-Patent Document 2, tempo continuity was modeled as a first-order autoregressive process. That is, with ν_t denoting the tempo at frame t and ε_t an independent tempo change drawn from a normal distribution with mean 0 and variance σ² greater than 0, the model ν_t = ν_{t−1} + ε_t was assumed. In a music audio signal, however, while the tempo is being increased (decreased), ε_t takes positive (negative) values over a more or less continuous span, and the time derivative of the tempo (that is, the acceleration of the score position) tends to return to 0. In other words, the tempo change ε_t in one frame depends on the tempo changes in the adjacent frames.

Therefore, this embodiment incorporates higher-order information. The probability density of the score position for frame t is written as the score position probability density U_q(t), and the probability density of the tempo as the tempo probability density V_T(t). Here, q is a variable defined, for an arbitrary M, by q = round(M(i + n/T)). That is, V_T(t) is the probability that q advances by M/T per frame.
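As an illustration of these definitions, the sketch below accumulates U_q(t) and V_T(t) from a set of HSMM forward variables for one frame. The dictionary layout keyed by (i, n, T) is an assumption made for the example, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for the rounding in the text.

```python
from collections import defaultdict
from typing import Dict, Tuple


def position_and_tempo_densities(
        alpha: Dict[Tuple[int, int, int], float],
        M: int) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Marginalise forward variables alpha[(i, n, T)] into U_q and V_T."""
    U: Dict[int, float] = defaultdict(float)   # score position density, indexed by q
    V: Dict[int, float] = defaultdict(float)   # tempo density, indexed by T
    for (i, n, T), a in alpha.items():
        q = round(M * (i + n / T))   # quantised score position
        U[q] += a
        V[T] += a                    # sum over i and n for each T
    return dict(U), dict(V)
```

A larger M gives a finer position grid: with M = 4 and quarter-note sections, one step of q corresponds to a sixteenth note.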

Here, the true score position at frame t is written as score position x_t, the rate of change of the true score position as tempo v_t, and the acceleration of the true score position as acceleration a_t. That is, tempo v_t corresponds to the first derivative of score position x_t, and acceleration a_t to its second derivative. A state-space model (a higher-order autoregressive process) defined by equations (11) to (13) below is then set up. It consists of a score position trajectory model representing the evolution of the score position, a tempo trajectory model representing the evolution of the tempo, and an acceleration trajectory model representing the evolution of the acceleration.

In equation (13), r is the damping coefficient of acceleration a_t. Owing to this coefficient, acceleration a_t varies continuously and tends to return to 0. A large r makes tempo changes gradual, whereas a small r makes them abrupt. r is set to 0.5, for example. It may also be set to an optimal value based on tempo data from actual performances.

If the observations generated by the above state-space model (that is, the score position probability density U_q(t) and the tempo probability density V_T(t)) can be modeled, the state variables can be inferred by considering the state transitions and the observation likelihood together. Accordingly, the mean μ(U_q(t)) and variance σ²(U_q(t)) of the score position probability density U_q(t), and the mean μ(V_T(t)) and variance σ²(V_T(t)) of the tempo probability density V_T(t), are calculated using equations (14) to (17) below.
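Equations (14) to (17) are not reproduced in this text. The sketch below assumes they are the usual weighted mean and variance of a histogram, applied once to U_q(t) (keys q) and once to V_T(t) (keys T); the function name and the dictionary representation are illustrative.

```python
from typing import Dict, Tuple


def histogram_mean_var(density: Dict[int, float]) -> Tuple[float, float]:
    """Weighted mean and variance of a discrete density (need not be normalised)."""
    total = sum(density.values())
    mean = sum(k * p for k, p in density.items()) / total
    var = sum(p * (k - mean) ** 2 for k, p in density.items()) / total
    return mean, var
```

A sharply peaked density yields a small variance, and a flat or multimodal one yields a large variance, so the variance can serve as a measure of how much the observation should be trusted.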

That is, the means and variances of the score position probability density U_q(t) and the tempo probability density V_T(t) are calculated for the frames around the estimated current score position x_t. The observation likelihood is then defined as shown in equation (18) below.

Specifically, from the sequences of score position probability density and tempo probability density between the frame N frames before the current frame and the current frame, the sequences of probability densities for the frames around score position x_t are first extracted. The probability density calculated in the frame ΔT frames earlier is treated as a normal distribution; that is, the mean and variance of the histogram of the probability density calculated in that frame are taken as the mean and variance of the normal distribution. Then, using score position x_t, tempo v_t, and acceleration a_t, the likelihoods of the tempo trajectory model and the acceleration trajectory model at the frame ΔT frames earlier are calculated. FIG. 5 is a conceptual diagram showing the process of estimating the tempo trajectory model from the calculated sequence of score position observation densities. In practice, the acceleration trajectory model is also estimated from the sequence of tempo probability densities. With a Kalman filter, the score position trajectory model, tempo trajectory model, and acceleration trajectory model described above can be estimated in real time. The update step of the Kalman filter is executed, and the updated state estimate is used to calculate the mean <x_t> of score position x_t and the mean <v_t> of tempo v_t. These means are then adopted as the current score position and tempo.
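The following is a minimal Kalman filter sketch for the three-component state (x_t, v_t, a_t). Because equations (11) to (13) and (18) are not reproduced in this text, several assumptions are made and should be read as such: the position update x_t = x_{t−1} + v_{t−1} + a_{t−1}/2 is taken from the description of the silent case, the tempo update v_t = v_{t−1} + a_{t−1} and the damped acceleration a_t = r·a_{t−1} + noise are assumed forms, and the position and tempo observations are treated as independent Gaussians whose means and variances come from the histogram statistics. With independent observation noise, the two scalar observations can be fused one after the other.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def predict(x: Vec, P: Mat, r: float, q_acc: float) -> Tuple[Vec, Mat]:
    """Prediction step with the assumed transition matrix F."""
    F = [[1.0, 1.0, 0.5],   # x_t = x + v + a/2
         [0.0, 1.0, 1.0],   # v_t = v + a        (assumed)
         [0.0, 0.0, r]]     # a_t = r * a + noise (assumed)
    x_new = [sum(F[i][k] * x[k] for k in range(3)) for i in range(3)]
    FP = [[sum(F[i][k] * P[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    P_new = [[sum(FP[i][k] * F[j][k] for k in range(3)) for j in range(3)]
             for i in range(3)]          # F P F^T
    P_new[2][2] += q_acc                 # process noise on the acceleration only (assumed)
    return x_new, P_new


def update_scalar(x: Vec, P: Mat, idx: int, z: float, var: float) -> Tuple[Vec, Mat]:
    """Fuse one scalar observation z of state component idx with variance var."""
    s = P[idx][idx] + var                      # innovation variance
    gain = [P[i][idx] / s for i in range(3)]   # Kalman gain column
    resid = z - x[idx]
    x_new = [x[i] + gain[i] * resid for i in range(3)]
    P_new = [[P[i][j] - gain[i] * P[idx][j] for j in range(3)] for i in range(3)]
    return x_new, P_new


def kalman_step(x: Vec, P: Mat,
                pos_mean: float, pos_var: float,
                tempo_mean: float, tempo_var: float,
                r: float = 0.5, q_acc: float = 1e-3) -> Tuple[Vec, Mat]:
    """One frame: predict, then fuse the position and tempo observations."""
    x, P = predict(x, P, r, q_acc)
    x, P = update_scalar(x, P, 0, pos_mean, pos_var)
    x, P = update_scalar(x, P, 1, tempo_mean, tempo_var)
    return x, P
```

After each call, `x[0]` and `x[1]` play the role of <x_t> and <v_t>. When an observation variance is large, its gain is small and the estimate follows the trajectory model more closely.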

Next, the operation of the score alignment apparatus 10 is described in detail. As shown in FIG. 6A, in step S10 the CPU 12a reads the score alignment program from the ROM 12b and starts the score alignment process. Next, in step S11, the CPU 12a displays a list of score data on the display 13. Using the input operator 11, the user selects from the displayed list the score data of the piece to be subjected to the score alignment process (that is, the piece to be performed). Next, in step S12, the CPU 12a reads the selected score data from the storage device 14 and divides it into a plurality of sections i (= 1, 2, ..., I).

Next, in step S13, the CPU 12a causes the sound system 16 to start sampling the musical sound. Next, in step S14, the CPU 12a sets the frame to be processed to the first frame. That is, the frame index t is set to 1.

Next, in step S15, the CPU 12a executes the score position probability density / tempo probability density calculation process. As shown in FIG. 6B, the CPU 12a starts this process in step S150. Next, in step S151, the CPU 12a reads the acoustic signal (sample data) contained in frame t from the buffer of the sound system 16. Next, in step S152, the CPU 12a calculates the observation likelihood O_{i,n,T}(t) based on equations (3) to (5) above. Next, in step S153, the CPU 12a uses the calculated observation likelihood O_{i,n,T}(t) to apply the forward algorithm to the ordinary HMM and detects the state whose forward variable is largest in frame t. This determines the sections of the HSMM to which the forward algorithm is to be applied. Next, in step S154, the CPU 12a applies the forward algorithm to the determined sections among the plurality of sections constituting the HSMM (see equation (7)). Next, in step S155, the CPU 12a calculates the score position probability density U_q(t) and the tempo probability density V_T(t) using the forward variables obtained by applying the forward algorithm to the HSMM. Then, in step S156, the CPU 12a ends the score position probability density / tempo probability density calculation process and proceeds to step S16 of the alignment calculation process.

Next, in step S16, the CPU 12a executes the score position / tempo determination process. As shown in FIG. 6C, the CPU 12a starts this process in step S160. Next, in step S161, the CPU 12a determines whether the score position has jumped. Specifically, the determination is based on the difference between the score position probability density U_q(t) calculated for the current frame and the score position probability density U_q(t−1) calculated for the previous frame. For example, the most likely score position (the state whose HSMM forward variable is largest) is detected for each of the two frames from U_q(t) and U_q(t−1), and the score position is judged to have jumped when the two detected positions are four or more bars apart. If the score position has not jumped, the CPU 12a determines "No" and, in step S162, calculates the means and variances of the score position probability density U_q(t) and the tempo probability density V_T(t) for the frames around score position x_t, based on equations (14) to (17) above. Next, in step S163, the CPU 12a calculates the observation likelihood based on equation (18) above and estimates the tempo transition model and the acceleration transition model using a Kalman filter.

On the other hand, if the score position has jumped, the CPU 12a determines "Yes" in step S161. Then, in step S164, the CPU 12a sets the values of score position x_t, tempo v_t, and acceleration a_t as follows. For example, the most likely score position is detected from the score position probability density U_q(t) of the current frame, and the detected position is set as score position x_t. Tempo v_t is set to a specified value (for example, 120 BPM), and acceleration a_t is set to a specified value (for example, 0).
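The jump test of step S161 and the reset of step S164 can be sketched as follows. The densities are assumed to be dictionaries keyed by the quantised position q, `units_per_bar` (the number of q steps in one bar) is a hypothetical parameter, and the default tempo is passed in already converted to score units per frame, since that conversion depends on the frame rate.

```python
from typing import Dict, List


def most_likely_position(U: Dict[int, float]) -> int:
    """Quantised score position with the highest probability density."""
    return max(U, key=U.__getitem__)


def has_jumped(U_prev: Dict[int, float], U_curr: Dict[int, float],
               units_per_bar: int, bars: int = 4) -> bool:
    """True when the most likely positions of two consecutive frames are >= `bars` bars apart."""
    distance = abs(most_likely_position(U_curr) - most_likely_position(U_prev))
    return distance >= bars * units_per_bar


def reset_state(U_curr: Dict[int, float], default_tempo: float) -> List[float]:
    """State [x_t, v_t, a_t] after a jump: detected position, default tempo, zero acceleration."""
    return [float(most_likely_position(U_curr)), default_tempo, 0.0]
```

When `has_jumped` returns `False`, the normal path (statistics and Kalman filtering) is taken; when it returns `True`, the state is re-initialised with `reset_state`.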

Then, in step S165, the CPU 12a determines the score position and tempo using the estimation results of the score position transition model, the tempo transition model, and the acceleration transition model, and controls the controlled target according to the determined score position and tempo.

For example, the score of the piece is displayed on the display 13, and the current score position is indicated by rendering the determined score position (note) in a color different from the rest of the score. As another example, a still image or a moving image corresponding to the estimated score position is displayed. The file names of still image data may be associated with score positions in advance, and the still image corresponding to the estimated score position may be displayed on the display 13. Likewise, playback positions (for example, frame numbers) of moving image data may be associated with score positions in advance, and the portion of the moving image corresponding to the estimated score position may be displayed on the display 13. Further, playback positions (for example, bar numbers) of accompaniment data may be associated with score positions in advance, and the data for the portion corresponding to the estimated score position may be sent to the tone generator circuit of the sound system 16 so that the accompaniment is sounded. When the accompaniment is played back, its tempo may be set to the determined tempo.

Next, in step S166, the CPU 12a updates the state transition probability of the HMM using the calculated forward variables α_{i,n,T}(t). Specifically, the expected value <T> of the current frame count T is first calculated from the forward variables of the HSMM. In the HSMM, when the frame count T equals its mean <T>, the number of sections traversed per frame is 1/<T>. As noted above, τ^(HMM) can be regarded as the number of HMM sections traversed per frame. Therefore, letting φ be the ratio of the HSMM section length to the HMM section length (= HSMM section length / HMM section length), τ^(HMM) = φ/<T>. Because τ^(HMM) is defined as a value not less than 0 and not more than 1, it is updated based on equation (19) below.
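A minimal sketch of this update is given below. Equation (19) is not reproduced in this text, so the sketch assumes it simply clamps φ/<T> to the interval [0, 1]. The tempo density V_T(t) (the sum of the forward variables over i and n for each T) is assumed to be available as a dictionary keyed by T.

```python
from typing import Dict


def update_tau_hmm(V: Dict[int, float], phi: float) -> float:
    """Return the updated tau^(HMM) from the distribution over the frame count T."""
    total = sum(V.values())
    expected_T = sum(T * p for T, p in V.items()) / total   # <T>
    # Assumed form of eq. (19): keep the value inside [0, 1].
    return min(1.0, max(0.0, phi / expected_T))
```

With quarter-note HSMM sections and 32nd-note HMM sections, φ is 8, so an expected duration of 30 frames per quarter note gives 8/30 ≈ 0.27.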

Then, in step S167, the CPU 12a ends the score position / tempo determination process and proceeds to step S17 of the alignment calculation process.

Next, in step S17, the CPU 12a sets the frame to be processed to the next frame. That is, the frame index t is incremented. Thereafter, the CPU 12a repeatedly executes steps S15 to S17. When the user instructs the end of the score alignment process using the input operator 11, the CPU 12a stops the operation of the controlled target and ends the score alignment process.

In the score alignment apparatus 10 configured as described above, the score position probability density and the tempo probability density are calculated first. The score position and tempo are then determined from the sequence of score position probability densities U_q(t) and the sequence of tempo probability densities V_T(t), based on a higher-order autoregressive process. This makes it possible to express the property of music audio signals that the time derivative of the tempo (that is, the acceleration of the score position) is continuous and tends to return to 0. In addition, in spans where the variances of U_q(t) and V_T(t) calculated with the HSMM are large, the paths through the score position and tempo probability density sequences are smoothed. As a result, the accuracy of score position estimation can be improved. Consequently, when the score alignment apparatus 10 is applied to a media player, the discrepancy between the position of the performer's performance and the playback position of other media (automatic accompaniment, images, and so on) can be reduced compared with conventional approaches. In other words, playback of the other media is less likely to feel unnatural relative to the performance.

In addition, the forward algorithm is applied to an ordinary HMM to calculate its forward variables, and the state (section) whose forward variable is largest is found. Then, in the HSMM, the forward algorithm is applied to a plurality of sections ΔS including the section corresponding to the state (section) found, and the score position probability density U_q(t) and the tempo probability density V_T(t) are calculated. The increase in computation is therefore smaller than when the forward algorithm is applied to all sections constituting the HSMM.

Furthermore, the score is divided so that the HMM has more sections than the HSMM. As a result, the HSMM sections to which the forward algorithm is applied can be found more appropriately than when the HMM and the HSMM have the same number of sections.

The present invention is not limited to the above embodiment, and various modifications are possible without departing from the object of the invention.

For example, the above embodiment assumes that the score contains no da capo, repeat signs, or similar markings. If the score does contain such markings, the transition probabilities between sections may be set appropriately according to them. For example, when a repeat sign is present, the probability of transitioning from the last section of the repeated passage to its first section may be set to 50%, and the probability of transitioning from that last section to the section immediately following it may be set to 50%.
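The 50/50 rule for a repeat sign can be sketched as a section-level transition table. The sketch assumes a plain left-to-right score in which every other section moves to its successor with probability 1 and the final section stays where it is; both are simplifications made for the example, and the function name is hypothetical.

```python
from typing import List


def transitions_with_repeat(n_sections: int, repeat_start: int, repeat_end: int,
                            p_repeat: float = 0.5) -> List[List[float]]:
    """Section-to-section transition probabilities for a score with one repeat."""
    if not 0 <= repeat_start <= repeat_end < n_sections - 1:
        raise ValueError("the repeated passage must end before the last section")
    tau = [[0.0] * n_sections for _ in range(n_sections)]
    for i in range(n_sections - 1):
        tau[i][i + 1] = 1.0                      # default: go to the next section
    tau[n_sections - 1][n_sections - 1] = 1.0    # final section: stay (assumed)
    # At the end of the repeated passage, either go back or continue.
    tau[repeat_end][repeat_end + 1] = 1.0 - p_repeat
    tau[repeat_end][repeat_start] += p_repeat
    return tau
```

Because the table has no memory of how many times the passage has been played, it allows the passage to be repeated more than once, which is consistent with a model that tolerates performers deviating from the score.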

As another example, a step of determining whether the signal is silent may be added when the acoustic signal data is read in step S151. If it is silent, score position x_t may be updated using the state-space model alone, that is, using x_t = x_{t−1} + v_{t−1} + a_{t−1}/2. In this case, for the forward variables of the HMM and the HSMM, only the variables at the locations corresponding to score position x_t are set to a uniform distribution, and all others are set to 0.
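The silent-frame handling can be sketched as below. The position update is the formula given in the text; how states are matched to score position x_t is not specified, so the sketch assumes each state carries a quantised position and that states whose position equals the rounded x_t are the "corresponding" ones. All names are illustrative.

```python
from typing import List


def advance_on_silence(x_prev: float, v_prev: float, a_prev: float) -> float:
    """x_t = x_{t-1} + v_{t-1} + a_{t-1} / 2, using the state-space model only."""
    return x_prev + v_prev + a_prev / 2.0


def reset_forward_variables(state_positions: List[int], x_t: float) -> List[float]:
    """Uniform mass on the states located at x_t, zero everywhere else."""
    target = round(x_t)
    hits = [pos == target for pos in state_positions]
    count = sum(hits)
    if count == 0:
        return [0.0] * len(state_positions)   # no state at this position
    return [1.0 / count if hit else 0.0 for hit in hits]
```

This keeps the probabilistic models anchored to the extrapolated position until sound is detected again.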

When the score contains a fermata, the HSMM may be configured to allow self-transitions in the section where the fermata is written. That is, when section i contains a fermata, the probability τ_{i,i} may be set to ρ and the probability τ_{i,j} to 1−ρ. In this case, the number of self-transitions in section i may be counted, and the length of time the performance stayed in section i may be evaluated according to the count. For example, the time spent in section i may be rated on three levels, "too short", "normal", and "too long", and the rating may be output as performance evaluation information.
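A minimal sketch of the counting and three-level rating follows. It assumes a per-frame sequence of estimated section indices is available, and the two thresholds are hypothetical parameters, since the text gives no values for them.

```python
from typing import List


def count_self_transitions(section_per_frame: List[int], section: int) -> int:
    """Number of frame-to-frame transitions that stay inside `section`."""
    return sum(1 for prev, curr in zip(section_per_frame, section_per_frame[1:])
               if prev == section and curr == section)


def rate_fermata(self_transitions: int, short_below: int, long_above: int) -> str:
    """Three-level rating of the time spent on the fermata section."""
    if self_transitions < short_below:
        return "too short"
    if self_transitions > long_above:
        return "too long"
    return "normal"
```

For instance, `rate_fermata(count_self_transitions(path, i), short_below, long_above)` yields the label to be output as performance evaluation information for section i.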

The performance speed of the piece may also be evaluated using the estimated tempo and its variance. For example, the performance speed may be rated on three levels, "too slow", "normal", and "too fast", and the rating may be output as performance evaluation information.

10: score alignment apparatus; HMM: hidden Markov model; HSMM: hidden semi-Markov model; x_t: score position; v_t: tempo; a_t: acceleration

Claims

A score alignment apparatus that captures an acoustic signal representing the performance sound of a musical piece and, by analyzing the captured acoustic signal, estimates in real time a score position, which represents the currently played portion of the score of the musical piece, and a tempo, the apparatus comprising:
score position probability density / tempo probability density calculation means for calculating a score position probability density and a tempo probability density based on a probabilistic model expressed as a sequence of states each representing a score position, the model having the property that the probability of the current state depends on the immediately preceding state and the property that a transition from the current state to any state is possible; and
score position / tempo determination means for determining the score position and the tempo from the calculated sequence of score position probability densities, based on an autoregressive process expressed using the true score position, the transition speed of the true score position, and the transition acceleration of the true score position.

The score alignment apparatus according to claim 1, wherein
the state is specified using the section, among a plurality of sections obtained by dividing the score, that includes the current score position, the time taken to play from the beginning of that section to the current score position, and the time taken to play the entire section,
the probabilistic model is a hidden semi-Markov model expressed as a sequence of the states, and
the score position probability density / tempo probability density calculation means calculates the score position probability density and the tempo probability density by applying a forward algorithm to the hidden semi-Markov model.

The score alignment apparatus according to claim 2, wherein
the score position probability density / tempo probability density calculation means comprises
section search means for searching, among the sections of the hidden semi-Markov model, for a plurality of sections to which the forward algorithm is applied, based on a hidden Markov model expressed as a sequence of states specified using the section, among the plurality of sections obtained by dividing the score, that includes the current score position, the time taken to play from the beginning of that section to the current score position, and the average time taken to play the entire section.

A computer program causing a computer provided in a score alignment apparatus, which captures an acoustic signal representing the performance sound of a musical piece and, by analyzing the captured acoustic signal, estimates in real time a score position, which represents the currently played portion of the score of the musical piece, and a tempo, to execute:
a score position probability density / tempo probability density calculation step of calculating a score position probability density and a tempo probability density based on a probabilistic model expressed as a sequence of states each representing a score position, the model having the property that the probability of the current state depends on the immediately preceding state and the property that a transition from the current state to any other state is possible; and
a score position / tempo determination step of determining the score position and the tempo from the calculated sequence of score position probability densities, based on an autoregressive process expressed using the true score position, the transition speed of the true score position, and the transition acceleration of the true score position.