JP2018025644A

JP2018025644A - Music key estimation device, and music code progression estimation device

Info

Publication number: JP2018025644A
Application number: JP2016156821A
Authority: JP
Inventors: Shinichi Ito; Minoru Fukumi; Momoyo Ito; Shu Tamura
Original assignee: University of Tokushima NUC
Current assignee: University of Tokushima NUC
Priority date: 2016-08-09
Filing date: 2016-08-09
Publication date: 2018-02-15

Abstract

PROBLEM TO BE SOLVED: To provide a music chord progression estimation device that estimates the key and the chord progression of a piece of music. SOLUTION: A music chord progression estimation device comprises: an audio data acquisition unit (21 and 17) that acquires audio data of a piece of music; and a chord progression estimation unit (11) that analyzes the audio data to estimate the chord progression of the music. The chord progression estimation unit (11) is configured to: perform frequency analysis of the audio data to obtain a chroma vector; determine the key of the music from the chroma vector; obtain a chord in each of a plurality of predetermined small sections from the chroma vector; convert the chord obtained for each small section on the basis of the determined key; and determine the chord progression of the music on the basis of combinations of a predetermined number of chords extracted from the converted chord group. SELECTED DRAWING: Figure 1

Description

The present invention relates to a device for estimating the key of a piece of music and a device for estimating its chord progression.

In recent years, automatic music analysis has been actively researched (see, for example, Patent Document 1), and many applications of practical quality have been developed. Examples include similar-music search software, software with a chord tracking function, and transcription software. Automatic analysis of music can lead to quantitative evaluation and transcription of the music. The main information obtained by automatic analysis includes melody analysis and chorus section estimation, but it is the chord progression that contributes most to the atmosphere of a piece. By analyzing the chord progression appropriately, something as ambiguous as the atmosphere of a piece of music can be evaluated quantitatively.

JP 2009-015535 A
JP 2009-186944 A
JP 2008-096844 A
JP 2007-248610 A

However, pitch analysis by frequency analysis alone cannot be expected to yield a chord progression with high accuracy for complex signals. In addition, the key of the music must be determined in order to evaluate the characteristics of the chords.

The present invention provides a device for estimating the key of a piece of music and a device for estimating its chord progression.

The music key estimation device according to the present invention includes an audio data acquisition unit that acquires audio data of a piece of music, a storage unit that stores key information indicating the correspondence between keys and combinations of notes, and a key estimation unit that analyzes the audio data to identify the key of the music. The key estimation unit performs frequency analysis on the audio data of a predetermined section to obtain a chroma vector, selects a plurality of notes from among the notes contained in the chroma vector, determines key candidates based on the combination of the selected notes and the key information, and identifies the key of the music from among the determined key candidates.

The music chord progression estimation device according to the present invention includes an audio data acquisition unit that acquires audio data of a piece of music, and a chord progression estimation unit that analyzes the audio data to estimate the chord progression of the music.
The chord progression estimation unit performs frequency analysis on the audio data to obtain a chroma vector, identifies the key of the music from the chroma vector, obtains a chord in each of a plurality of predetermined small sections from the chroma vector, converts the chord obtained for each small section based on the identified key, and identifies the chord progression of the music based on combinations of a predetermined number of chords extracted from the converted chord group.

A first program according to the present invention causes an information processing device to operate as a device that identifies the key of a piece of music. The first program causes the control device of the information processing device to execute: a function of performing frequency analysis on the audio data of a predetermined section to obtain a chroma vector; a function of selecting a plurality of notes from among the notes contained in the chroma vector; a function of determining key candidates based on the combination of the selected notes and the key information; and a function of identifying the key of the music from among the determined key candidates.

A second program according to the present invention causes an information processing device to operate as a device that identifies the chord progression of a piece of music. The second program causes the control device of the information processing device to execute: a function of performing frequency analysis on the audio data to obtain a chroma vector; a function of identifying the key of the music from the chroma vector; a function of obtaining a chord in each of a plurality of predetermined small sections from the chroma vector; a function of converting the chord obtained for each small section based on the identified key; and a function of identifying the chord progression of the music based on combinations of a predetermined number of chords extracted from the converted chord group.

According to the present invention, analyzing the music data makes it possible to automatically extract the key in a predetermined section of the music and, further, to automatically extract the chord progression pattern of the music.

A diagram showing the configuration of a music analysis device that is one embodiment of the music chord progression estimation device and the music key estimation device according to the present invention
A flowchart showing the processing in the music analysis device
A diagram explaining the calculation of the chroma vector in the analysis section
A flowchart showing the key estimation processing in the music analysis device
A diagram explaining the correspondence between notes and the labels that identify them
A diagram explaining the selection of the five low values (the notes with the lowest chroma vector values)
A diagram showing a configuration example of the key information, which indicates the correspondence between each key and a combination of five low values
A diagram explaining the circular model of notes
A diagram showing the constituent notes of the keys "C" and "G"
A flowchart showing the chord progression estimation processing in the music analysis device
A diagram showing the shift amounts used when converting to the key "C"
A diagram explaining the extraction of chord progression pattern candidates (sequences of four chords) in the analysis section
A diagram showing the Markov model based on chord identification
A diagram explaining the sequential extraction of six chords from the chords in the analysis section

Embodiments of the music chord progression estimation device and the music key estimation device according to the present invention are described below with reference to the drawings as appropriate.

(Embodiment 1)
1. Configuration of the Music Analysis Device
FIG. 1 is a diagram showing the configuration of a music analysis device that is one embodiment of the music chord progression estimation device and the music key estimation device according to the present invention. The music analysis device estimates (that is, automatically identifies) the key and the chord progression of a piece of music. The music analysis device estimates a chroma vector for an arbitrary section of the music, identifies the key based on the estimated chroma vector, and then identifies the chord progression.

FIG. 1 shows the configuration of the music analysis device 10. The music analysis device 10 is implemented by an information processing device such as a personal computer. The music analysis device 10 includes a controller 11 that controls its overall operation, a display unit 13 that displays screens, an operation unit 15 with which the user performs operations, and a storage unit 17 that stores data and programs.

The display unit 13 is, for example, a liquid crystal display or an organic EL display. The operation unit 15 is a device with which the user gives instructions, and consists of a keyboard, a mouse, a touch panel, or the like.

The storage unit 17 is a recording medium that stores the parameters, data, and programs needed to implement the functions, including the control program executed by the controller 11 and various data. The storage unit 17 is, for example, a hard disk drive (HDD), a solid-state drive (SSD), or flash memory.

The controller 11 consists of a CPU or an MPU and implements predetermined functions by executing a predetermined control program 17a stored in the storage unit 17. That is, by executing the control program 17a, the controller 11 functions as the key estimation unit and the chord progression estimation unit. The control program 17a executed by the controller 11 may be provided via a network or on a recording medium such as a CD-ROM. The functions of the controller 11 may be implemented by hardware and software working together, or by a hardware circuit alone. That is, the controller 11 may be built not only from a CPU or an MPU but also from a DSP, an FPGA, an ASIC, or the like.

The music analysis device 10 includes a communication interface 19 for connecting to an external device such as a printer. The communication interface 19 is an interface circuit that exchanges data with external devices in accordance with USB, HDMI (registered trademark), IEEE 1394, or the like. The music analysis device 10 may further include an interface circuit that communicates in accordance with standards such as IEEE 802.11 and Wi-Fi for connecting to a network. The music analysis device 10 also includes an audio input interface 21 that receives the audio signal from a microphone 20, which converts sound into an audio signal. The audio signal received through the audio input interface 21 is converted into audio data by an AD converter (not shown) and input to the controller 11.

2. Operation of the Music Analysis Device
The operation of the music analysis device 10 configured as above is described next. FIG. 2 is a flowchart showing the processing by which the music analysis device 10 estimates the key and the chord progression, and the processing is described with reference to it. The processing shown in FIG. 2 is executed by the controller 11 in accordance with the control program 17a.

In FIG. 2, the controller 11 of the music analysis device 10 first takes a part of the music as the analysis section and extracts the audio data of that section (S11). Music is in most cases a stereo signal with left and right channels. In this embodiment, so that the waveforms of both channels are used as useful information, the analysis uses the sum of the left and right channel signals. The analysis section may be set to any section of the music. For example, it is set to the chorus section, where the characteristics of the music appear most clearly. As shown in FIG. 3, the analysis section contains a plurality of sampling intervals.

The controller 11 performs frequency analysis on the audio data of the analysis section to obtain a chroma vector, and estimates (that is, identifies) the key of the music based on that chroma vector (S12). In this step, the audio signal of the music is frequency-analyzed, the chroma vector for the analysis section is obtained, and the key is estimated from the chroma vector. The controller 11 then estimates (identifies) the chord progression of the music based on the estimated key (S13).

The key estimation processing (S12) and the chord progression estimation processing (S13) are described in more detail below.

2-1. Key Estimation
FIG. 4 is a flowchart showing the details of the key estimation processing (S12) in the flowchart of FIG. 2. The controller 11 first calculates the chroma vector for the analysis section (S21). More specifically, it first performs frequency analysis for each sampling interval in the analysis section and calculates a chroma vector CHi (i = 1, 2, ...). The continuous wavelet transform, for example, is used as the frequency analysis method. The continuous wavelet transform represents arbitrary waveforms by scaling and translating a basic waveform called the mother wavelet, and it preserves the time-axis information of the analyzed waveform. The components of a chroma vector are the signal intensities of the individual notes (C, C#, D, ...). A chroma vector is generated by summing, within each sampling interval, the intensities frequency by frequency (note by note).
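The note-by-note summation can be illustrated with a minimal sketch. It assumes a magnitude spectrum is already available as (frequency, magnitude) pairs; the description itself uses a continuous wavelet transform, which is not reproduced here. The function names, the A4 = 440 Hz reference, and the label convention C = 0 are assumptions made for this illustration.

```python
import math

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a label 0..11 (C = 0), assuming equal temperament."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12

def chroma_from_spectrum(freqs_hz, magnitudes):
    """Fold spectral magnitudes into a 12-component chroma vector."""
    chroma = [0.0] * 12
    for freq, mag in zip(freqs_hz, magnitudes):
        if freq > 0:  # skip the DC component, which has no pitch
            chroma[pitch_class(freq)] += mag
    return chroma

# C4, C5 and A4 contribute to labels 0, 0 and 9 respectively.
example = chroma_from_spectrum([261.63, 523.25, 440.0], [1.0, 0.5, 2.0])
```

Octaves fold onto the same label, so C4 and C5 both add to component 0.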

After a chroma vector has been obtained for each sampling interval, a chroma vector summed over the entire analysis section is obtained. That is, the chroma vector values are added up for each note (each component of the chroma vector) to give the chroma vector for the whole analysis section. FIG. 5 shows an example of a chroma vector obtained in this way. In this embodiment, each note is given a label that identifies it. The numbers in parentheses in FIG. 5 are these labels. For example, the note "C" is written as note "0" and the note "F" as note "5".
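A minimal sketch of this aggregation follows. It assumes each per-interval chroma vector is a list of 12 numbers indexed by label (C = 0, ..., B = 11); the names `NOTE_NAMES` and `sum_chroma` are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def sum_chroma(chroma_per_interval):
    """Add per-interval chroma vectors component by component."""
    total = [0.0] * 12
    for chroma in chroma_per_interval:
        for label, value in enumerate(chroma):
            total[label] += value
    return total

# Two sampling intervals, each with energy on C (label 0) and F (label 5).
interval_1 = [1.0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0]
interval_2 = [2.0, 0, 0, 0, 0, 0.25, 0, 0, 0, 0, 0, 0]
section_chroma = sum_chroma([interval_1, interval_2])
```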

Next, the controller 11 refers to the chroma vector and selects, from the 12 notes "C" to "B", the five notes with the weakest signal intensity (hereinafter "low-value notes") (S22). In the chroma vector example of FIG. 5, the five notes C#, D#, F#, G#, and A# are selected as the low-value notes, as shown in FIG. 6.
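The selection in S22 amounts to taking the five smallest components of the chroma vector. The sketch below uses made-up chroma values, not the values of FIG. 5, chosen so that the same five notes come out; `lowest_notes` is a hypothetical name.

```python
def lowest_notes(chroma, count=5):
    """Return the labels of the `count` weakest notes, in ascending label order."""
    by_strength = sorted(range(12), key=lambda label: chroma[label])
    return sorted(by_strength[:count])

# Illustrative values only: the five sharps are the weakest components.
chroma = [0.90, 0.10, 0.80, 0.12, 0.70, 0.85, 0.05, 0.95, 0.11, 0.75, 0.08, 0.60]
low = lowest_notes(chroma)  # labels of C#, D#, F#, G#, A#
```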

Once the low-value notes have been selected, key candidates are set from their combination (S23). FIG. 7 shows the key information 17b, which indicates the correspondence between each key and the combination of five low-value notes associated with it. The key information 17b represents notes by their labels. In the key information 17b of FIG. 7, for example, the key "C" is associated with the combination of notes labeled 1, 3, 6, 8, and 10. It would seem usual to identify a key from the notes with high signal intensity, but this embodiment identifies the key from the notes with low signal intensity. The reason is that the inventors' experiments showed that this improves the accuracy of key identification. Five notes are used because this embodiment assumes popular music in particular, and popular music is often built on seven notes (a diatonic scale), which leaves five of the twelve notes unused. The controller 11 refers to the key information 17b and sets the key candidates from the combination of the selected low-value notes.

The key candidates are set as follows. Among the combinations defined in the key information 17b, the controller 11 selects as the key candidate the key whose combination shares the most notes with the five selected low-value notes. For example, if the five selected low-value notes are (1, 3, 6, 8, 10), the key "C" is uniquely identified by referring to the key information 17b of FIG. 7.

If, on the other hand, the key cannot be uniquely determined from the combination of five low-value notes, the key is determined under a relaxed condition. Specifically, four-note combinations are formed from the five low-value notes, and each is checked against the low-value note combinations defined in the key information 17b. For example, suppose the music is actually in the key "C" (= (1, 3, 6, 8, 10)) but the five low-value notes are detected as (1, 3, 5, 6, 8). The keys in the key information 17b whose combinations contain one of the four-note combinations are then taken as key candidates. In FIG. 7, three key candidates are obtained: "C" and "F", which match (1, 3, 6, 8), and "G", which matches (1, 3, 5, 8). The number of key candidates obtained from the four-note combinations is at most 3 and at least 1.
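The lookup can be sketched as follows. FIG. 7 is not reproduced in this text, so the table here is an assumption: each key's five low-value notes are taken to be the five labels outside that key's major scale. This agrees with the combinations the description gives for "C" and "G", but it is not confirmed for the other keys. All names are hypothetical.

```python
from itertools import combinations

MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)

def build_key_info():
    """Assumed stand-in for key information 17b: key label -> five low-value labels."""
    info = {}
    for tonic in range(12):
        members = {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}
        info[tonic] = frozenset(range(12)) - members
    return info

KEY_INFO = build_key_info()

def key_candidates(low_notes):
    """Exact five-note match first; otherwise match any four-note subset."""
    low = frozenset(low_notes)
    exact = [key for key, combo in KEY_INFO.items() if combo == low]
    if exact:
        return exact
    found = set()
    for four in combinations(sorted(low), 4):
        for key, combo in KEY_INFO.items():
            if set(four) <= combo:
                found.add(key)
    return sorted(found)

unique = key_candidates([1, 3, 6, 8, 10])    # key C only
ambiguous = key_candidates([1, 3, 5, 6, 8])  # keys C, F and G
```

With the detected notes (1, 3, 5, 6, 8) the sketch returns the labels 0, 5, and 7, that is, "C", "F", and "G", as in the example above.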

In some cases five low-value notes cannot be selected. For example, when several notes have the same chroma vector value, taking the five weakest values may select more than five notes. In that case the notes are selected so that their number is five or fewer. For example, if the values taken from the weakest upward are 0.09, 0.117, 0.147, 0.191, 0.23, and 0.23, six notes qualify. The four weakest are then selected as the low-value notes. Similarly, only two or three low-value notes may be selected. In these cases the key candidates are set by referring to the key information 17b with the combination of two to four selected notes. That is, each key whose combination in the key information 17b includes the two to four low-value notes is set as a key candidate. If there is only one low-value note, the case is handled as an error.
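One way to realize this tie handling is sketched below. It assumes that a tie is detected by exact equality of chroma values and that the whole tied group at the cut-off is dropped, which reproduces the six-value example above; a real implementation might compare with a tolerance instead. `select_low_notes` is a hypothetical name.

```python
def select_low_notes(chroma, limit=5):
    """Pick up to `limit` weakest notes, dropping a tied group at the cut-off."""
    order = sorted(range(12), key=lambda label: chroma[label])
    selected = order[:limit]
    cut_value = chroma[order[limit - 1]]
    # If the next note ties with the last selected one, more than `limit`
    # notes would qualify, so remove every note with the tied value.
    if limit < 12 and chroma[order[limit]] == cut_value:
        selected = [label for label in selected if chroma[label] != cut_value]
    if len(selected) < 2:
        raise ValueError("fewer than two low-value notes: treated as an error")
    return sorted(selected)

# Labels 5 and 10 tie at 0.23, so only the four weaker notes remain.
chroma = [0.90, 0.09, 0.80, 0.117, 0.70, 0.23, 0.147, 0.95, 0.191, 0.75, 0.23, 0.60]
low = select_low_notes(chroma)
```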

Once the key candidates have been set in this way, the controller 11 determines the key of the music (hereinafter the "estimated key") from the key candidates (S24). The estimated key is determined from the key candidates as follows.

The method of determining the estimated key depends on the number of key candidates. Each case is described below.

(1) When the number of key candidates is 1
That key candidate is determined to be the estimated key.

(2) When the number of key candidates is 3
The inventors found that when three key candidates are obtained and the correct key is N, the candidates are N, N-7, and N+7. Therefore, if one of the candidates lies at a distance of 7 from both of the other two, that candidate is determined to be the estimated key. If there is no such candidate, the case is handled as an error. FIG. 8 illustrates the circular model of notes. Addition and subtraction on notes follow this model. For example, D-2 moves 2 steps counterclockwise from "D" and gives "C". D+2 moves 2 steps clockwise from "D" and gives "E". So if, for example, the three key candidates are "C", "F", and "G", then F+7 = C and G-7 = C, and the estimated key is "C".
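On the circular model, note arithmetic is arithmetic modulo 12 on the labels. A minimal sketch of the three-candidate rule under that reading follows; `resolve_three` is a hypothetical name, and returning `None` stands in for the error case.

```python
def resolve_three(candidates):
    """Return the candidate N whose companions are N+7 and N-7 (mod 12)."""
    cands = set(candidates)
    for key in sorted(cands):
        others = cands - {key}
        if others == {(key + 7) % 12, (key - 7) % 12}:
            return key
    return None  # no common key: the description treats this as an error

# Candidates C (0), F (5), G (7): F + 7 = C and G - 7 = C.
estimated = resolve_three([0, 5, 7])
```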

(3) When the number of key candidates is 2
The processing depends on whether the distance between the two key candidates is 2. Each case is described below.

a) When the distance between the two key candidates is 2
The key located at a distance of 7 from both key candidates is set as the estimated key.
b) When the distance between the two key candidates is not 2
Three types of votes are cast for the two key candidates according to predetermined criteria, and the key candidate with the larger vote count is determined to be the estimated key. The three types of votes are described below.

<Vote 1>
The intensities of the notes that name the two key candidates are compared, and a vote is cast for the key candidate whose note is stronger.

For example, suppose "C" and "G" have been obtained as key candidates, and in the chroma vector for the analysis section the value of "C" is 0.5 and the value of "G" is 0.3. The vote then goes to "C", the stronger of the two.

<Vote 2>
Among the notes that make up each candidate's combination, the signal intensities of the notes that differ between the two key candidates are compared, and a vote is cast for the key candidate whose note has the lower signal intensity.

The low-value note combinations of two key candidates may differ in only one of their five notes. For example, when "C" and "G" are obtained as key candidates, the low-value note combination for "C" is (1, 3, 6, 8, 10) and that for "G" is (1, 3, 5, 8, 10). The only notes that differ are F (5) and F# (6). The signal intensities of F (5) and F# (6) are therefore compared, and a vote is cast for the key candidate whose combination contains the note with the lower intensity. For example, if in the chroma vector for the analysis section the value of "F" is 0.6 and the value of "F#" is 0.3, "F#" is the smaller, so the vote goes to "C", whose combination contains "F#" (6).

<Vote 3>
For each key candidate, the intensity of the note immediately before the note that names the key (that is, the note one step lower) is compared, and a vote is cast for the key candidate for which that note is stronger.

For example, suppose "C" and "G" have been obtained as the two key candidates. FIG. 9 shows the constituent notes of the keys "C" and "G". In the figure, the notes marked with a circle are the constituent notes of each key. The notes without a circle are the low-value notes described above.

The constituent notes of a key include the note obtained by subtracting 1 from the note that names the key. For example, for the key "C", the note obtained by subtracting 1 from "C" is "B", and "B" is one of the constituent notes of the key "C", as shown in FIG. 9. Because it is a constituent note of the key, the note obtained by subtracting 1 from the key note does not have a low signal intensity. So if the correct key is "C", the signal intensity of "B", the note one below "C", is not low. If "G" is also a key candidate, the note obtained by subtracting 1 from "G" is F#, which is not a constituent note of "C". In other words, when the signal intensities of the notes one below each of the two key candidates are compared, the candidate with the higher intensity is considered more likely to be the key. Accordingly, with the key candidates written as N1 and N2, the music analysis device 10 compares the signal intensities of the notes N1-1 and N2-1 and casts a vote for the key candidate with the stronger one. For example, if in the chroma vector the signal intensity of "B", one below "C", is 0.4 and that of F#, one below "G", is 0.1, "B" is the stronger, so the vote goes to "C".

In this way, three types of votes are cast for the two key candidates, and the key candidate with the larger total is determined to be the estimated key.
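The two-candidate case, including the three votes, can be sketched as below. Several points are assumptions rather than statements of the description: the low-value table is again taken to be the complement of each major scale (FIG. 7 is not reproduced here), "distance" is read as the shorter way around the circular model, and ties within a vote or in the total go to the first candidate, since the text does not say how ties are broken. All names are hypothetical.

```python
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)
# Assumed stand-in for key information 17b (complement of each major scale).
KEY_INFO = {
    tonic: frozenset(range(12)) - {(tonic + s) % 12 for s in MAJOR_SCALE_STEPS}
    for tonic in range(12)
}

def circular_distance(a, b):
    """Shorter distance between two labels on the 12-note circle."""
    d = (a - b) % 12
    return min(d, 12 - d)

def resolve_two(k1, k2, chroma):
    # Case a): distance 2 -> the key at distance 7 from both candidates.
    if circular_distance(k1, k2) == 2:
        for key in range(12):
            if {(key + 7) % 12, (key - 7) % 12} == {k1, k2}:
                return key
    votes = {k1: 0, k2: 0}
    # Vote 1: the candidate whose own note is stronger.
    votes[k1 if chroma[k1] >= chroma[k2] else k2] += 1
    # Vote 2: the candidate whose differing low-value note is weaker.
    only1, only2 = KEY_INFO[k1] - KEY_INFO[k2], KEY_INFO[k2] - KEY_INFO[k1]
    if len(only1) == 1 and len(only2) == 1:
        n1, n2 = next(iter(only1)), next(iter(only2))
        votes[k1 if chroma[n1] <= chroma[n2] else k2] += 1
    # Vote 3: the candidate whose note one step below is stronger.
    below1, below2 = (k1 - 1) % 12, (k2 - 1) % 12
    votes[k1 if chroma[below1] >= chroma[below2] else k2] += 1
    return max(votes, key=votes.get)

# Illustrative chroma: C = 0.5, F = 0.6, F# = 0.1, G = 0.3, B = 0.4.
chroma = [0.5, 0.05, 0.4, 0.05, 0.45, 0.6, 0.1, 0.3, 0.05, 0.4, 0.05, 0.4]
by_votes = resolve_two(0, 7, chroma)     # C vs G, decided by the votes
by_distance = resolve_two(5, 7, chroma)  # F vs G, distance 2
```

For "C" versus "G" all three votes go to "C" with these values; for "F" versus "G" the distance is 2, so the key at distance 7 from both, "C", is returned without voting.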

In this way, the key of a piece of music can be determined from the chroma vector of a predetermined section of the music.

2-2. Chord Progression Estimation
The chord progression estimation processing (S13) is described next. FIG. 10 is a flowchart showing the details of the chord progression estimation processing (S13) in the flowchart of FIG. 2, and the processing is described with reference to it.

The controller 11 obtains a chord for each predetermined small section in the analysis section (S31). Here, a small section is set to one beat (one quarter note). The music analysis device 10 performs BPM (beats per minute) tracking during the frequency analysis of the analysis section. This makes it possible to calculate the number of samples in a small section, that is, in one beat (one quarter note). The controller 11 obtains the chord of each sampling interval from the chroma vector and from these obtains the chord for one beat (one small section). That is, a chord candidate is calculated from the chroma vector for each sampling interval, and a chord candidate that occurs within the one-beat section (small section) with a frequency at or above a threshold is determined to be the chord of that small section.
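The per-beat decision can be sketched as follows. The description does not give the threshold or say how the frequency is measured, so this sketch assumes a relative frequency (share of sampling intervals in the beat) with an arbitrary default of 0.5, and it returns `None` when no candidate reaches the threshold. The function names and parameters are hypothetical.

```python
from collections import Counter

def intervals_per_beat(bpm, intervals_per_second):
    """Number of sampling intervals in one beat at the tracked tempo."""
    return round(intervals_per_second * 60.0 / bpm)

def chord_for_beat(candidates, threshold=0.5):
    """Pick the most frequent chord candidate if its share meets the threshold."""
    if not candidates:
        return None
    chord, count = Counter(candidates).most_common(1)[0]
    return chord if count / len(candidates) >= threshold else None

beat_length = intervals_per_beat(bpm=120, intervals_per_second=100)
beat_chord = chord_for_beat(["C", "C", "G", "C"])  # "C" fills 3 of 4 intervals
```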

Next, the controller 11 shifts the pitch of the chord of each small section in the analysis section so that the key becomes “C” (S32). Specifically, the chord of each small section is shifted by the difference between “C” and the key obtained in advance according to the flowchart of FIG. 4. FIG. 11 shows the shift amounts used when converting a key to “C”. According to FIG. 11, for example, when the key obtained in advance is “F”, the shift amount is −5. In that case, the chord “E” of a small section is shifted by −5 and converted to “B”. Similarly, the chord “G” of a small section is shifted by −5 and converted to “D”.
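The shift in S32 amounts to arithmetic modulo 12 on pitch classes. The sketch below assumes chords are represented by their root note only and computes the shift as the negative index of the key. This reproduces the −5 given for “F”, but the values listed in FIG. 11 for other keys may be written differently (for example, as the equivalent positive shift).

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def shift_to_c(key):
    """Shift (in semitones) that maps `key` onto C; F -> -5."""
    return -NOTE_NAMES.index(key)

def transpose(chord_root, shift):
    """Shift a chord root by `shift` semitones, wrapping around the octave."""
    return NOTE_NAMES[(NOTE_NAMES.index(chord_root) + shift) % 12]

shift = shift_to_c("F")                                       # -5
converted = [transpose(root, shift) for root in ["E", "G"]]   # ["B", "D"]
```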

Next, the controller 11 extracts a combination of six chords from the head of the chord group in the analysis section shifted as described above (S33). For example, as shown in FIG. 12, when the chords in the shifted analysis section are “FGEFAGEAFGEA...”, the first six chords “FGEFAG” are extracted first.

Next, the controller 11 extracts, from the extracted six-chord combination, all possible combinations of four chords without changing their order of appearance, and determines them to be chord progression pattern candidates (S34). In the example shown in FIG. 12, “FGEF”, “FGEA”, “FGEG”, and so on are extracted from “FGEFAG” as chord progression pattern candidates.
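Selecting four chords out of six while preserving their order is what `itertools.combinations` does, so S34 can be sketched as below. Whether the device keeps or merges identical patterns produced from different positions is not stated; this sketch simply keeps every combination.

```python
from itertools import combinations

def pattern_candidates(window, size=4):
    """All order-preserving selections of `size` chords from `window`."""
    return list(combinations(window, size))

window = ["F", "G", "E", "F", "A", "G"]
candidates = pattern_candidates(window)
first_three = ["".join(c) for c in candidates[:3]]  # FGEF, FGEA, FGEG
```

A six-chord window always yields 15 such combinations (6 choose 4).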

Next, the controller 11 calculates the occurrence probability of each extracted chord progression pattern candidate based on a Markov model (S35).

FIG. 13 is a diagram showing the Markov model used to calculate the occurrence probability of a chord progression pattern. In general, once the key is known, the characteristic of the chord at a given moment can be determined. Chord characteristics are comparable to the "introduction, development, turn, conclusion" structure of a story and are widely known in music theory. The present invention handles three types of chord characteristic: “tonic (T)”, “subdominant (SD)”, and “dominant (D)”. Commonly used chord progressions are constructed so as to sound pleasant in light of these chord characteristics. The inventors of the present application considered that using these chord characteristics as an evaluation criterion could help in estimating an appropriate chord progression, and arrived at the use of the Markov model shown in FIG. 13.

For example, when the chord progression pattern candidate is “FGEF”, the Markov model is used to obtain the probability of a transition from F to G (0.5), the probability of a transition from G to E (0.5), and the probability of a transition from E to F (0.3), and the occurrence probability of the candidate “FGEF” is obtained as their product, 0.075 (= 0.5 × 0.5 × 0.3).
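The product of transition probabilities can be sketched as follows. The model of FIG. 13 is defined over the three chord characteristics (T, SD, D); for brevity this sketch instead keys the table directly on chord names and contains only the three values quoted in the example, so the table layout is an assumption rather than the model itself.

```python
from math import prod  # available in Python 3.8 and later

def occurrence_probability(pattern, transition):
    """Multiply the transition probabilities of consecutive chord pairs."""
    return prod(transition[(a, b)] for a, b in zip(pattern, pattern[1:]))

# Only the three transitions quoted in the text; not the full model.
transition = {("F", "G"): 0.5, ("G", "E"): 0.5, ("E", "F"): 0.3}
p = occurrence_probability("FGEF", transition)  # 0.5 * 0.5 * 0.3
```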

In calculating the occurrence probability, weights may be applied when the chord progression begins with a dominant chord and when it ends with a tonic chord. This is because a typical chord progression begins with a chord other than a dominant chord and ends with a tonic chord. For example, the weights may be set so that the obtained occurrence probability is reduced when the first chord is a dominant chord and doubled when the last chord is a tonic chord.
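One possible form of this weighting is sketched below. The text specifies only "reduce" and "double", so the reduction factor of 0.5 is a placeholder. Treating only G as dominant and only C as tonic (after conversion to the key of C) is likewise an assumption; the actual assignment of chords to characteristics follows the device's model.

```python
def apply_weights(probability, pattern, dominant=("G",), tonic=("C",),
                  dominant_start_weight=0.5, tonic_end_weight=2.0):
    """Adjust an occurrence probability for the first and last chords.

    dominant_start_weight is an assumed value; the text says only that
    the probability is reduced. tonic_end_weight doubles it, as stated.
    """
    weighted = probability
    if pattern[0] in dominant:      # progression starts on a dominant chord
        weighted *= dominant_start_weight
    if pattern[-1] in tonic:        # progression ends on a tonic chord
        weighted *= tonic_end_weight
    return weighted

unchanged = apply_weights(0.075, "FGEF")  # neither condition applies
reduced = apply_weights(0.2, "GFEF")      # starts on G: halved
```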

Next, the controller 11 obtains the appearance frequency of each chord progression pattern candidate (S36).

When the above processing (S34 to S36) has been completed for one set of chords (six chords) extracted from the chord group of the entire analysis section, the controller 11 determines whether all six-chord combinations have been extracted from the chords in the analysis section (S37).

If not all six-chord combinations have been extracted, the controller 11 sets a new set of six chords by shifting the extraction start position by one (S40), sets a plurality of chord progression pattern candidates from that set (S34), and obtains the occurrence probability and appearance frequency of each chord progression pattern (S35 to S36). For example, as shown in FIG. 14, the six chords are shifted successively, as in “FGEFAG”, “GEFAGE”, “EFAGEA”, and so on. For each six-chord combination, the possible four-chord chord progression patterns are obtained, together with the occurrence probability and appearance frequency of each pattern.
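The loop over S33, S34, S36, S37, and S40 can be sketched as a sliding window. The counting rule used here (a pattern is counted once for each window in which it occurs) is an assumption, since the text does not define exactly how the appearance frequency is tallied.

```python
from collections import Counter
from itertools import combinations

def sliding_windows(chords, width=6):
    """Six-chord windows, with the start position advanced by one each time."""
    return [tuple(chords[i:i + width]) for i in range(len(chords) - width + 1)]

def pattern_frequencies(chords, width=6, size=4):
    """Count, per four-chord pattern, the number of windows containing it."""
    counts = Counter()
    for window in sliding_windows(chords, width):
        # set(): count each distinct pattern once per window (assumed rule)
        counts.update(set(combinations(window, size)))
    return counts

chords = list("FGEFAGEAFGEA")          # example sequence from the text
windows = sliding_windows(chords)      # FGEFAG, GEFAGE, EFAGEA, ...
counts = pattern_frequencies(chords)
```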

When a plurality of four-chord chord progression pattern candidates have been obtained as described above, an evaluation value is calculated for each candidate from its occurrence probability and appearance frequency (S38). The evaluation value of each chord progression pattern candidate is calculated by the following equations.
Evaluation value = (occurrence probability of the chord progression pattern candidate) × 10^α
α = (appearance frequency of the chord progression pattern candidate) − 1
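As a small worked sketch of the evaluation value and of choosing the highest-scoring candidate: the probability 0.075 for “FGEF” comes from the earlier example, while the appearance frequencies and the second candidate's probability are hypothetical values chosen for illustration.

```python
def evaluation_value(probability, frequency):
    """Evaluation value = probability * 10**alpha, with alpha = frequency - 1."""
    return probability * 10 ** (frequency - 1)

def best_pattern(candidates):
    """candidates maps a pattern to (occurrence probability, appearance frequency)."""
    return max(candidates, key=lambda pat: evaluation_value(*candidates[pat]))

# Hypothetical inputs, apart from the 0.075 probability quoted in the text.
candidates = {"FGEF": (0.075, 2), "FGEA": (0.1, 1)}
best = best_pattern(candidates)   # "FGEF": 0.075 * 10 beats 0.1 * 1
```

Because the frequency enters as an exponent of 10, each additional appearance multiplies the evaluation value by ten, so a frequently recurring pattern can outweigh a rarer pattern with a higher occurrence probability.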

The controller 11 sets the chord progression pattern candidate with the highest evaluation value among all chord progression pattern candidates as the estimated chord progression pattern (S39).

In this way, the music analysis device 10 can identify the chord progression of a piece of music.

3. Summary
As described above, the music analysis device 10 of the present embodiment operates as a chord progression estimation device. As a chord progression estimation device, the music analysis device 10 includes an audio input interface 21 (an example of a data acquisition unit) that acquires audio data of a piece of music, and a controller 11 (an example of a chord progression estimation unit) that analyzes the audio data and estimates the chord progression of the piece. The controller 11 performs frequency analysis on the audio data to obtain chroma vectors, identifies the key of the piece from the chroma vectors, obtains a chord in each of a plurality of predetermined small sections from the chroma vectors (S31), converts the chord obtained for each small section based on the identified key (S32), and identifies the chord progression of the piece based on combinations of a predetermined number of chords extracted from the converted chord group (S39).

The music analysis device 10 of the present embodiment also operates as a key estimation device. As a key estimation device, the music analysis device 10 includes the audio input interface 21 that acquires audio data of a piece of music, a storage unit 17 that stores key information 17b indicating the correspondence between keys and combinations of scale notes, and the controller 11 (an example of a key estimation unit) that analyzes the audio data and identifies the key of the piece. The controller 11 performs frequency analysis on the audio data of an analysis section (an example of a predetermined section) to obtain chroma vectors (S21), selects a plurality of scale notes from those included in the chroma vectors (S22), determines key candidates based on the combination of the selected scale notes and the key information 17b (S23), and identifies the key of the piece from among the determined key candidates (S24).

With the music analysis device 10 configured as described above, the key and chord progression of a piece of music can be identified automatically by analyzing its audio data. This technology can be applied to a music providing system that uses chord progression as an index for calculating the similarity between pieces and selects pieces with a similar atmosphere as similar music. It can also be applied to a system that automatically generates musical scores.

In the above example, the controller 11 of the music analysis device 10 acquires the audio data (audio signal) to be analyzed via the audio input interface 21, but the source of the audio data is not limited to the audio input interface 21. When the audio data of the piece to be analyzed is stored in the storage unit 17, the controller 11 may read the audio data from the storage unit 17. Alternatively, the audio data of the piece to be analyzed may be acquired from a network via the communication interface 19. In other words, the data acquisition unit of the present invention can be configured by the audio input interface 21, the communication interface 19, the storage unit 17, and the like.

Embodiment 1 has been described above as one embodiment of the present invention. However, the technology of the present invention is not limited to this embodiment, and various changes, replacements, additions, omissions, and the like can be made within the scope of the claims or their equivalents.

DESCRIPTION OF SYMBOLS
10 Music analysis device
11 Controller
13 Display unit
15 Operation unit
17 Storage unit
17a Control program
17b Key information
19 Communication interface
20 Microphone
21 Audio input interface

Claims

An audio data acquisition unit for acquiring audio data of the music;
A storage unit for storing key information indicating a correspondence between a key and a combination of scales;
A key estimation unit that analyzes voice data and identifies the key of the music,
The key estimation unit
Obtain the chroma vector by frequency analysis of the audio data of the predetermined section,
Selecting a plurality of scales from the scales included in the chroma vector;
A key candidate is determined based on the selected combination of scales and the key information,
The key of the music is specified from the determined key candidates.
Music key estimation device.

The key estimation unit selects a predetermined number of scales, starting from the lowest signal strength, from the scales included in the chroma vector.
The music key estimation apparatus according to claim 1.

The key estimation unit
If there is one key candidate, that key candidate is identified as the key of the song,
When there are two key candidates, one key is selected from the key candidates based on a predetermined determination condition, the selected key is specified as the key of the music,
When there are three key candidates, one key is selected from the key candidates using a scale circulation model, and the selected key is specified as the key of the music,
The music key estimation apparatus according to claim 1.

The predetermined determination condition is:
A first condition that the intensity of the same scale between two key candidates is higher;
A second condition in which the intensities of the different scales between the two key candidates are lower;
A third condition in which the intensity of the scale one level lower than the scale indicated by each Key candidate is higher,
The music key estimation apparatus according to claim 3.

An audio data acquisition unit for acquiring audio data of the music;
A chord progression estimation unit that analyzes voice data and estimates the chord progression of the music,
The chord progression estimation unit
Frequency analysis of the audio data to obtain a chroma vector;
The key of the music is specified from the chroma vector,
From the chroma vector, a chord is obtained in each of a plurality of predetermined small sections, the chord obtained for each small section is converted based on the specified key of the music, and the chord progression of the music is specified based on a combination of a predetermined number of chords extracted from the converted chord group. Music chord progression estimation device.

The chord progression estimation unit converts the chord obtained for each small section based on the difference between the identified key and “C”.
The music chord progression estimation apparatus according to claim 5.

The chord progression estimation unit
Using a predetermined number of chord combinations as chord progression patterns, extracting a plurality of chord progression patterns from a chord group obtained in a plurality of small sections,
One chord progression pattern is identified based on the occurrence probability and appearance frequency of each chord progression pattern, and the chord progression of the music is identified based on the identified chord progression pattern.
The music chord progression estimation apparatus according to claim 5.

The chord progression estimation unit calculates the occurrence probability of the chord progression pattern using a Markov model indicating a transition probability between three types of sound characteristics of tonic, dominant, and subdominant.
The music chord progression estimation apparatus according to claim 7.

The chord progression estimation unit
Obtain the chroma vector by frequency analysis of the audio data of the predetermined section,
Selecting a plurality of scales from the scales included in the chroma vector;
A key candidate is determined based on the key information indicating the correspondence between the selected combination of scales and the combination of key and scale,
The key of the music is specified from the determined key candidates.
The music chord progression estimation apparatus according to any one of claims 5 to 8.

A program for operating an information processing device as a device for specifying the key of music,
In the control device of the information processing device,
A function for obtaining a chroma vector by performing frequency analysis on audio data in a predetermined section;
A function of selecting a plurality of scales from the scales included in the chroma vector;
A function for determining a key candidate based on a combination of a plurality of selected scales and the key information;
A function for specifying the key of the music from the determined key candidates;
program.

A program for operating the information processing device as a device for specifying the chord progression of music,
In the control device of the information processing device,
A function that obtains chroma vectors by frequency analysis of audio data;
A function for specifying the key of the music from the chroma vector, and a function for obtaining a chord in each of a plurality of predetermined small sections from the chroma vector;
A function of converting the chord obtained for each of the small sections based on the specified key of the music;
A function of specifying the chord progression of the music based on a combination of a predetermined number of chords extracted from the converted chord group,
program.