JP2011164497A

JP2011164497A - Tempo value detecting device and tempo value detection method

Info

Publication number: JP2011164497A
Application number: JP2010029567A
Authority: JP
Inventors: Masanobu Miura; 雅展三浦; Yuichiro Yamakaji; 雄一郎山梶; Kohei Enoki; 孝平榎; Junichi Sakagami; 淳一阪上; Masayuki Igusa; 雅幸伊草
Original assignee: SOCKETS Inc; Ryukoku University
Current assignee: SOCKETS Inc; Ryukoku University
Priority date: 2010-02-13
Filing date: 2010-02-13
Publication date: 2011-08-25
Anticipated expiration: 2030-02-13
Also published as: JP5203404B2

Abstract

PROBLEM TO BE SOLVED: To estimate an accurate tempo value for a piece of music in a short processing time while taking double/half tempo values into account.

SOLUTION: An amplitude envelope is calculated from the amplitude waveform of a received piece of music, and the envelope is frequency-analyzed to extract a power spectrum. For all frequencies over the whole piece, the power spectrum of each frequency is multiplied by the power spectra of the frequencies in a double/half relationship with it, and the frequencies are sorted in descending order of power. A zero-cross value, at which the amplitude waveform of the piece crosses the time axis, is calculated, and the correlation between zero-cross values and tempo values is stored. A tempo value is derived from the calculated zero-cross value using that correlation, and the tempo value closest to it is selected from the sorted tempo values.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a tempo value detection device capable of detecting a tempo value from a piece of music. Here, "tempo value" means the number of periodically occurring beats contained in one minute of the piece (bpm = beats per minute).

In the field of music information processing, there has been extensive research on estimating a tempo value from the waveform information of a piece of music.

For example, Patent Document 1 below (Japanese Unexamined Patent Publication No. 5-027751) discloses a method for estimating the tempo value of a piece of music in which the input music is frequency-analyzed to calculate a power spectrum and its derivative, the measure duration is calculated by finding the maximum of the autocorrelation function of that derivative, and the tempo value is calculated from the measure duration and the number of beats per measure.

Patent Document 2 below (Japanese Unexamined Patent Publication No. 7-064544) also discloses a method for estimating the tempo value of a piece of music. This method extracts the level of a specific frequency band of the input acoustic signal, sequentially stores the peak values of the extracted level together with their occurrence times, calculates the average of the stored peak values, and selects two of the stored peak values that exceed the calculated average. It then checks whether other stored levels exist at each repetition of the interval between the two selected levels. If they do, tempo data is generated from that interval; if they do not, other stored levels are selected and the same search is repeated to estimate the tempo value.

Besides these patent documents, there is also the tempo value estimation method described in Non-Patent Document 1 below. In this method, as advance preparation, classification is learned with a GMM (Gaussian mixture model) using 13-dimensional parameters. When tempo value candidates are selected, the music in each unit section is classified based on the result of that training, and the tempo value candidate best suited to the resulting class is selected as the tempo value for that unit section. The median of the tempo values obtained for the unit sections over the whole piece is then computed and output as the estimated tempo value.

Patent Document 1: Japanese Unexamined Patent Publication No. 5-027751
Patent Document 2: Japanese Unexamined Patent Publication No. 7-064544

Non-Patent Document 1: Xiao et al., "Using a statistic model to capture the association between timbre and perceived tempo", Proc. of 11th ISMIR, pp. 659-662, 2008.

However, estimating the tempo value with these methods has the following problems.

The method of Patent Document 1 has the user input the beats in one measure in advance, determines the duration of that measure, and derives the tempo value from the beats. The user therefore has to enter the beats within a measure beforehand, which makes it very laborious to obtain tempo values automatically for a large number of pieces.

The method of Patent Document 2 can estimate the tempo value without such beat input. However, because it derives the tempo value from the time interval between beats detected as peak values, the estimate can be far off depending on how a piece with strong and weak beats is performed. For example, if a 120 bpm piece has strong beats on the first and third beats (that is, a performance in 4/4 time), peaks occur at the first and third beats of each measure, and depending on how the peaks are picked the piece may be judged to be 60 bpm, half of 120 bpm. Conversely, if beats of another instrument fall between adjacent beats of a 120 bpm piece, it may be judged to be 240 bpm. This is the so-called double/half tempo value problem.

In contrast, the method described in Non-Patent Document 1 enumerates tempo value candidates and, to address the double/half tempo value problem, estimates the tempo value from those candidates on the basis of the average MFCC. However, this method finds multiple peaks from an autocorrelation function and derives the tempo value from the reciprocal of the time between peaks, so the amount of autocorrelation computation becomes enormous when the piece is long.

The present invention was made in view of the above problems, and its object is to provide a tempo value estimation method that can estimate an accurate tempo value in a short processing time while taking double/half tempo values into account.

To solve the above problems, the present invention provides a tempo value estimation device that estimates the tempo of an input piece of music, comprising: receiving means for receiving input of the music; frequency analysis means for frequency-analyzing the amplitude waveform of the received music to extract a power spectrum; double/half processing means for adding, to the power spectrum of each frequency, a value that takes into account the power spectra of the frequencies in a double/half relationship with it, for all frequencies over the whole piece; zero-cross calculation means for calculating a zero-cross value, at which the amplitude waveform of the music received by the receiving means crosses the time axis; correlation storage means in which the correlation between zero-cross values and tempo values is stored in advance; and estimation means for extracting, as tempo value candidates, the frequencies corresponding to the power spectra obtained by the double/half processing means in descending order from the maximum power spectrum, and for estimating one tempo value from among those candidates by referring to the stored correlation using the zero-cross value calculated by the zero-cross calculation means.

In general, the zero-cross value of an amplitude waveform correlates with the tempo value: the higher the tempo value, the higher the zero-cross value, and the lower the tempo value, the lower the zero-cross value. The present invention uses this correlation to pick out the correct tempo value, so the tempo value can be estimated accurately with a reduced amount of computation.

In this invention, envelope calculation means is also provided that calculates the amplitude envelope from the amplitude waveform of the received music, and the calculated envelope waveform information is frequency-analyzed to extract the power spectrum of the whole piece.

In this way, rather than frequency-analyzing the received waveform itself, the low-frequency envelope waveform is analyzed. The power spectrum at the beat positions, which form the broad flow of the music, can therefore be extracted without being affected by the high-frequency waveforms of instruments and voices.

Furthermore, when the envelope calculation means calculates the amplitude envelope, it calculates the average of the squared amplitude waveform over each predetermined smoothing section.

In this way, downsampling can be performed while the same operation also serves as the calculation of the amplitude envelope.

According to the present invention, the tempo value estimation device that estimates the tempo of an input piece of music is provided with: receiving means for receiving input of the music; frequency analysis means for frequency-analyzing the amplitude waveform of the received music to extract a power spectrum; double/half processing means for adding, to the power spectrum of each frequency, a value that takes into account the power spectra of the frequencies in a double/half relationship with it, for all frequencies over the whole piece; zero-cross calculation means for calculating a zero-cross value, at which the amplitude waveform of the received music crosses the time axis; correlation storage means in which the correlation between zero-cross values and tempo values is stored in advance; and estimation means for extracting, as tempo value candidates, the frequencies corresponding to the power spectra obtained by the double/half processing means in descending order from the maximum power spectrum, and for estimating one tempo value from among those candidates by referring to the stored correlation using the calculated zero-cross value. An accurate tempo value can therefore be estimated with a reduced amount of computation.

FIG. 1: Functional block diagram of a tempo value estimation device according to an embodiment of the present invention
FIG. 2: Diagram outlining smoothed downsampling in the embodiment
FIG. 3: Diagram showing the power spectrum and double/half processing in the embodiment
FIG. 4: Diagram showing the distribution of tempo values of the pieces used to calculate the acoustic parameters
FIG. 5: Diagram showing the zero-cross value in the embodiment
FIG. 6: Diagram showing the correlation between zero cross and tempo value in the embodiment
FIG. 7: Flowchart of tempo value estimation in the embodiment

An embodiment of the present invention is described below with reference to the drawings. The tempo value estimation device 1 of this embodiment is implemented on an information processing device such as a personal computer, a portable music player, or an electronic musical instrument. It receives amplitude waveform information for a piece of music and electronically estimates from it the tempo value, that is, how many beat sounds are output per minute.

In more detail, the tempo value estimation device 1 first receives input of a piece of music and processes the amplitude waveform of the received music in the amplitude-time domain to calculate an amplitude envelope. It then frequency-analyzes the calculated amplitude envelope to extract the power spectrum of the whole piece, multiplies the power spectrum of each frequency by the power spectra of the frequencies in a double/half relationship with it, and sorts the corresponding frequencies in descending order from the maximum power spectrum. In parallel, it calculates the zero-cross value, at which the amplitude waveform in the amplitude-time domain crosses the time axis. Information indicating the correlation between zero-cross values and tempo values is stored in advance, and, referring to that information, the device uses the zero-cross value to extract the optimum tempo value from the sorted tempo value candidates.

The functional means that perform these processes are described in detail below. They are realized by a program working together with the CPU, storage medium, input device, display/output device, communication device, and so on of the information processing device.

First, the receiving means 2 receives input of music owned by the user. The music may be received as a WAVE-format digital signal, which is widely used for CD audio, as a signal obtained by A/D conversion of an analog signal captured through a microphone or the like, as information distributed by Internet radio, or as mp3-format information. As shown in FIG. 2, the received music information is stored as an amplitude waveform in the amplitude-time domain, with amplitude on the vertical axis and time on the horizontal axis.

A human listener generally identifies the tempo value by counting beats per unit time. To extract the tempo value from the amplitude waveform, the period at which beats recur must be found. Normally a strong, accented sound is output at the time of a beat, so times with large amplitude occur periodically in the amplitude waveform. On the other hand, the waveform of a piece varies greatly with the type of instrument and contains many components that change at high frequency. To estimate the tempo value from the amplitude waveform, the low-frequency resolution is therefore made high. Here, "low-frequency resolution" means the frequency resolution of the frequency analysis in the low-frequency band of the waveform information. Since tempo values realistically fall within a range of about 50 bpm to 220 bpm, the corresponding low-frequency band is about 1 Hz to 3 Hz. To secure this low-frequency resolution, envelope information and the like are calculated so that a fine, accurate tempo value can be extracted.

The envelope calculation means 3 calculates the amplitude envelope from the amplitude-time-domain waveform of the music received by the receiving means 2 and downsamples it to secure low-frequency resolution. There are two existing ways to secure low-frequency resolution: increasing the number of sample points used in the frequency analysis, or downsampling, that is, lowering the sampling frequency as preprocessing.

With the first approach, if the required tempo estimation resolution is 1 bpm, Fourier analysis is used for the frequency analysis, and the waveform information is sampled at 44100 Hz, about 2.6 million or more sample points are needed (required sample points 2,646,000 = sampling frequency 44100 Hz / (1 bpm / 60)). With downsampling, the analyzed frequency band is restricted to low frequencies, and the number of sample points needed for frequency analysis decreases according to the downsampling factor. However, aliasing occurs unless the band is limited first.

Comparing the two, increasing the number of sample points requires an impractically large number of points, so downsampling is used. As for the band limiting needed to prevent aliasing, the frequencies required for tempo value estimation lie in a low band of 4 Hz or less, so calculating the amplitude envelope of the waveform information has nearly the same effect as a low-pass filter that passes that band. The amplitude envelope is therefore calculated first, before downsampling.
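The sample-count arithmetic above can be checked with a short calculation. This is a minimal sketch: the function name `required_fft_points` and the downsampling factor of 441 are illustrative assumptions, not values taken from the patent.

```python
def required_fft_points(sampling_rate_hz: float, tempo_resolution_bpm: float) -> int:
    """Number of samples an FFT needs so that adjacent bins are
    tempo_resolution_bpm apart (bin spacing = sampling rate / points)."""
    # 1 bpm corresponds to 1/60 Hz, so points = fs / (resolution / 60).
    return round(sampling_rate_hz * 60.0 / tempo_resolution_bpm)


# Full-rate audio: matches the 2,646,000 points quoted in the text.
full_rate_points = required_fft_points(44100, 1.0)

# Hypothetical downsampling factor of 441 (44100 Hz -> 100 Hz).
downsampled_points = required_fft_points(44100 / 441, 1.0)
```

Under that assumed factor, the same 1 bpm resolution needs only 6,000 points instead of about 2.6 million, which is the saving that motivates downsampling.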

Smoothed downsampling is used to calculate the amplitude envelope. As shown in FIG. 2, in each smoothing section (a section whose length equals the downsampling factor), the average of the acoustic power spectrum, that is, the squared amplitude, is calculated, and that value is extracted as the waveform information after downsampling. The smoothing section corresponds to the sampling period of the amplitude waveform after smoothed downsampling, and taking the acoustic power spectrum corresponds to full-wave rectification. FIG. 2(a) shows the amplitude waveform received by the receiving means 2, and FIG. 2(b) shows the result of smoothed downsampling. The waveform in FIG. 2(b) is the squared amplitude waveform; within each smoothing section the average of that waveform is calculated and stored as the value of the section. These values are calculated by Equation 1.

As shown in FIG. 2(b) and Equation 1, smoothed downsampling takes the average of the acoustic power spectrum over each fixed section, so it downsamples while also serving as the calculation of the amplitude envelope. As for aliasing, averaging the acoustic power spectrum over the smoothing section is equivalent to applying a strong low-pass filter, so the necessary band limiting is performed. Note that after smoothed downsampling all values of the waveform information are positive and the DC component (0 Hz) becomes large. When the smoothed, downsampled waveform information is frequency-analyzed, a normalization that brings the mean to 0, such as Z-score transformation, is therefore applied.
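The smoothed downsampling and normalization described above can be sketched as follows. Equation 1 is not reproduced in this text, so the code assumes the plain reading of the prose: the mean of the squared amplitude over consecutive, non-overlapping sections. The function names and the choice to drop a trailing partial section are assumptions made for illustration.

```python
import numpy as np


def smoothed_downsample(waveform, section_len):
    """Mean of the squared amplitude over consecutive sections.

    Each section of `section_len` input samples yields one output sample,
    so the output rate is the input rate divided by `section_len`.
    """
    x = np.asarray(waveform, dtype=float)
    n_sections = len(x) // section_len
    # Assumption: a trailing partial section is discarded.
    trimmed = x[: n_sections * section_len]
    power = trimmed ** 2  # squaring makes every value non-negative
    return power.reshape(n_sections, section_len).mean(axis=1)


def z_normalize(envelope):
    """Shift the envelope to zero mean (and unit variance when possible)
    so the DC component does not dominate the later frequency analysis."""
    env = np.asarray(envelope, dtype=float)
    centered = env - env.mean()
    std = env.std()
    return centered / std if std > 0 else centered
```

For example, `smoothed_downsample(samples, 441)` would turn 44100 Hz audio into a 100 Hz envelope, where 441 is a hypothetical section length.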

The frequency analysis means 4 applies a Fourier transform (FFT) to the envelope information calculated by the envelope calculation means 3 to obtain a power spectrum in the power spectrum-frequency domain. Because this Fourier transform is performed for each analysis time frame, multiple power spectra are calculated over the whole piece. In such power spectra, a large power spectrum basically appears at the frequency corresponding to the strong beats. In practice, however, the frequency with the maximum power spectrum often does not correspond to the tempo value, for example in 4/4 time when strong beats fall on the first and third beats of a measure. The double/half processing means 5 described next is therefore used to extract tempo value candidates for tempo value estimation.
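A minimal sketch of the per-frame analysis described above, using NumPy's real FFT. The frame length, hop length, and per-frame mean removal are assumptions for illustration; the patent text here does not state them.

```python
import numpy as np


def frame_power_spectra(envelope, envelope_rate_hz, frame_len, hop_len):
    """Return (freqs, spectra): bin frequencies in Hz and one power
    spectrum per analysis frame of the downsampled envelope."""
    env = np.asarray(envelope, dtype=float)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / envelope_rate_hz)
    spectra = []
    for start in range(0, len(env) - frame_len + 1, hop_len):
        frame = env[start:start + frame_len]
        frame = frame - frame.mean()  # assumed: remove DC in each frame
        spectra.append(np.abs(np.fft.rfft(frame)) ** 2)
    return freqs, np.array(spectra)
```

Multiplying a bin frequency by 60 converts it to bpm, so a peak at 2 Hz corresponds to a candidate of 120 bpm.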

The double/half processing means 5 first finds the frequencies that are highly dominant for tempo value estimation and derives tempo values from those frequencies. Here, a "highly dominant frequency" is a frequency whose power spectrum is large over the whole piece, that is, a frequency whose power spectrum exceeds a predetermined threshold. Because of the double/half tempo value problem, frequencies that are double or half a frequency with a large power spectrum are also considered highly dominant. When enumerating tempo value candidates that may be correct, the tempo values corresponding to highly dominant frequencies are therefore listed with the double/half relationship taken into account. Higher dominance is taken to indicate a stronger candidate, so the candidates are ranked using, for example, the power spectrum corresponding to each tempo value (frequency) as the index.

The flow of this processing is shown in FIG. 3. First, the amplitude waveform of the music is subjected to short-time frequency analysis, and at every frequency in the analysis result the power spectrum is multiplied by the power spectrum values at the frequencies in a double/half relationship with it (FIG. 3(a)). That is, the 100 Hz power spectrum is multiplied by the 50 Hz and 200 Hz power spectra, the 101 Hz power spectrum is multiplied by the 50.5 Hz and 202 Hz power spectra, and likewise for all frequencies. Next, these power spectra are summed along the time axis (FIG. 3(b)), and the reference frequencies corresponding to the large power spectrum peaks are obtained (FIG. 3(c)) and taken as the reference tempo values. This mutually emphasizes the power spectra of tempo values in a double/half relationship and yields tempo value candidates whose power spectrum is large over the whole piece. Then, if the enumerated candidates include tempo values in a double/half relationship with one another, the power spectra of the corresponding tempo values are added, and the candidates are sorted in descending order of power spectrum (FIG. 3(d)). Because this increases the power spectrum of candidates that have double/half relations, tempo values that have a large power spectrum and a double/half relation are ranked high.
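The multiply, sum, and rank steps of FIG. 3(a) to (c) can be sketched as below. This is only an illustration under stated assumptions: half and double frequencies are read off by linear interpolation, bins whose double frequency lies outside the analyzed band score zero, candidates are limited to the 50 to 220 bpm range mentioned earlier, and the peak picking and candidate-power addition of FIG. 3(d) are omitted. The function name and parameters are hypothetical.

```python
import numpy as np


def double_half_candidates(freqs, spectra, bpm_range=(50.0, 220.0), top_k=5):
    """Rank tempo candidates after emphasizing double/half relations.

    freqs:   bin frequencies in Hz, ascending
    spectra: array of shape (frames, bins) with per-frame power spectra
    Returns a list of (bpm, score) pairs, highest score first.
    """
    freqs = np.asarray(freqs, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    # Assumption: only bins whose half and double frequencies are inside
    # the analyzed band can be scored.
    valid = (freqs / 2 >= freqs[0]) & (freqs * 2 <= freqs[-1])
    emphasized = np.zeros_like(spectra)
    for i, frame in enumerate(spectra):
        half = np.interp(freqs / 2, freqs, frame)    # power at f/2
        double = np.interp(freqs * 2, freqs, frame)  # power at 2f
        emphasized[i] = np.where(valid, frame * half * double, 0.0)
    summed = emphasized.sum(axis=0)  # sum along the time axis
    bpm = freqs * 60.0
    in_range = (bpm >= bpm_range[0]) & (bpm <= bpm_range[1])
    order = np.argsort(summed)[::-1]  # descending by score
    ranked = [(float(bpm[j]), float(summed[j]))
              for j in order if in_range[j] and summed[j] > 0]
    return ranked[:top_k]
```

A frequency with strong power at its half and double frequencies keeps a high score, while an isolated peak with nothing at its half or double frequency is suppressed by the multiplication.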

After the tempo value candidates have been ranked in this way, another index is used to extract the correct tempo value from among them. In general, the indices of how fast a piece is are considered to be the tempo value and the "acoustic parameters" calculated from the waveform information of the piece. The correlation between each acoustic parameter and the tempo value is therefore determined, and an acoustic parameter with a high correlation is used as the index of the speed of the piece. The acoustic parameters were calculated from the waveform information of a group of 100 pieces, and the tempo values used were hand-labeled for each piece by a person with musical experience. FIG. 4 shows the distribution of the tempo values of the pieces used, with tempo value classes on the horizontal axis and the number of pieces on the vertical axis. As FIG. 4 shows, the tempo values of the pieces are normally distributed around 100 bpm, an average tempo value, which confirms that the tempo values are not biased. The sound sources used were WAVE-format waveform information: monaural, with a sampling frequency of 44100 Hz and a duration of 4 to 6 minutes. Table 1 shows the acoustic parameters calculated and their correlation with the tempo value.

The acoustic parameters shown in Table 1 are calculated as follows. The mean Γ_ave1 and standard deviation Γ_1SD of the acoustic power spectrum are obtained from the acoustic power spectrum Γ_1[x] at time x (x = 0, 1, 2, ..., X). Γ_1[x] is given by Equation 2, where f[x] is the original waveform information at time x (x = 0, 1, 2, ..., X).

The mean Γ_ave2 and standard deviation Γ_2SD of Flux are obtained from the Flux Γ_2[x"] at time x" (x" = 0, 1, 2, ..., X"). Γ_2[x"] is obtained by short-time frequency analysis of the original waveform information f[x] and is given by Equation 3, where F_x"[y] is the frequency analysis result of the original waveform information at time x" and frequency y. SFFT was used for the short-time frequency analysis, with Y = 8192 samples, and the analysis window was shifted by that number of samples each time.

The mean Γ_ave3 and standard deviation Γ_3SD of Centroid are obtained from the Centroid Γ₃[x"] at time x" (x" = 0, 1, 2, ..., X"). Γ₃[x"] is obtained by short-time frequency analysis of the original waveform information f[x] and is given by Equation 4, where Fx"[y] is the frequency analysis result of the original waveform information at time x" and frequency y. As for the Flux Γ₂[x"], SFFT was used for the short-time frequency analysis, with the number of samples Y set to 8192, and the analysis window was shifted by this same number of samples at each step.

ZeroCrossings Γ₄ (hereinafter, "zero cross") is, as shown in FIG. 5, the number of times the original amplitude waveform information f[x] changes from positive to negative or from negative to positive (that is, the number of times it crosses the time axis), and is obtained by counting these changes. Comparing the correlations shown in Table 1, ZeroCrossings shows a high correlation with the tempo value. ZeroCrossings calculated from the music piece is therefore used as the index of the speed of the music, and the zero-cross calculation means 6 calculates its value. Because the zero-cross value serves as the index for selecting the most likely tempo value, a regression analysis is performed between the tempo value and the zero-cross value, and the tempo value obtained from the zero-cross value is stored as the "Z tempo value", the final index of the speed of the music.
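The zero-cross count and the regression that maps it to a Z tempo value can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names are hypothetical, the regression is assumed to be a simple linear least-squares fit (the text says only "regression analysis"), and samples that are exactly zero are assumed not to count as crossings on their own.

```python
def count_zero_crossings(samples):
    """Count sign changes (positive to negative or negative to positive)."""
    crossings = 0
    prev_sign = 0
    for s in samples:
        sign = (s > 0) - (s < 0)
        if sign == 0:
            continue  # assumption: an exact zero is skipped, not counted
        if prev_sign != 0 and sign != prev_sign:
            crossings += 1
        prev_sign = sign
    return crossings


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (assumed linear model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def z_tempo(zero_cross_value, slope, intercept):
    """Map a zero-cross value to a Z tempo value with the fitted line."""
    return slope * zero_cross_value + intercept
```

In such a sketch, `fit_line` would be run once on the hand-labeled pieces (zero-cross value against labeled tempo), and the resulting slope and intercept would play the role of the stored correlation.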

The correlation storage means 7 stores the correlation between the zero-cross value and the Z tempo value. In general, as shown in FIG. 6, the Z tempo value increases as the zero-cross value increases. This correspondence is therefore stored in the correlation storage means 7 in advance.

From the frequencies sorted in order of power spectrum by the double-half processing means 5, the estimation means 8 adds together the power spectra at peak positions whose frequencies are in a double-half relation, and extracts the top two tempo-value candidates. It then calculates the Z tempo value from the zero-cross value using the regression equation, and outputs, as the finally estimated tempo value, whichever of the two listed candidates is closer to the Z tempo value.
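The final selection described above amounts to a nearest-value choice among the candidates. A minimal sketch, with a hypothetical function name and illustrative candidate values in bpm:

```python
def select_tempo(candidates, z_tempo_value):
    """Return the tempo-value candidate closest to the Z tempo value."""
    return min(candidates, key=lambda c: abs(c - z_tempo_value))


# Illustrative values only: two candidates in a double-half relation.
estimated = select_tempo([60.0, 120.0], 110.0)
```

With candidates of 60 and 120 bpm and a Z tempo value of 110 bpm, the sketch picks 120 bpm, which is how the zero-cross index can resolve a double-half ambiguity.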

Next, the flow of processing for estimating the tempo value of a music piece using the function-realizing means configured as described above is explained with reference to the flowchart of FIG. 7.

First, to estimate the tempo value of a music piece, music data in WAVE format is received (step S1), and smoothed downsampling is performed: the amplitude waveform is squared, and the average value is stored for each smoothing section. This performs downsampling while also serving as the calculation of the amplitude envelope (step S2). The downsampled amplitude envelope is then normalized (step S3).
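Steps S2 and S3 can be sketched as below. The names are hypothetical, the smoothing sections are assumed to be non-overlapping with any trailing partial section discarded, and the normalization is assumed to be division by the peak value, since the text does not specify the normalization method.

```python
def smoothed_downsample(samples, section_len):
    """Square the waveform and store the mean of each smoothing section."""
    envelope = []
    # Assumption: non-overlapping sections; a trailing partial one is dropped.
    for start in range(0, len(samples) - section_len + 1, section_len):
        section = samples[start:start + section_len]
        envelope.append(sum(s * s for s in section) / section_len)
    return envelope


def normalize(envelope):
    """Scale the envelope so that its maximum is 1 (assumed normalization)."""
    peak = max(envelope, default=0.0)
    if peak == 0:
        return list(envelope)
    return [v / peak for v in envelope]
```

Each section of `section_len` input samples yields one envelope sample, so the envelope has roughly `1 / section_len` of the original sampling rate, which is what keeps the later frequency analysis inexpensive.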

Next, short-time frequency analysis is performed on this smoothed, downsampled waveform information (step S4), and, as shown in FIG. 3(a), the power spectrum of every frequency is multiplied by the power spectra of the frequencies in a double-half relation to it (step S5). The results of the short-time frequency analysis after this multiplication are then summed along the time axis for each frequency (FIG. 3(b)), and the tempo values having a predetermined number of the largest power spectra, starting from the maximum, are sorted and listed as candidates. Finally, the power spectra in a double-half relation are added together, and a predetermined number of top candidates (for example, two) are extracted (FIGS. 3(c) and 3(d)) (step S6).
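The double-half multiplication and time-axis summation of steps S5 and S6 can be sketched on an already computed power spectrogram. This is one possible reading, not the embodiment itself: the names are hypothetical, each frame is assumed to be a list of power values indexed by frequency bin, the "half" of bin k is taken to be bin k/2 (even k only) and the "double" bin 2k, and a missing neighbor is simply skipped. The final merging of double-half related candidates is not shown.

```python
def double_half_emphasis(frame):
    """Multiply each bin's power by the power at half and double its frequency."""
    n = len(frame)
    out = []
    for k, power in enumerate(frame):
        value = power
        if k > 0 and k % 2 == 0:
            value *= frame[k // 2]   # half-frequency bin
        if 0 < 2 * k < n:
            value *= frame[2 * k]    # double-frequency bin
        out.append(value)
    return out


def rank_bins(frames, top_n):
    """Sum emphasized power over time and return the strongest bin indices."""
    n = len(frames[0])
    totals = [0.0] * n
    for frame in frames:
        for k, v in enumerate(double_half_emphasis(frame)):
            totals[k] += v
    order = sorted(range(n), key=lambda k: totals[k], reverse=True)
    return order[:top_n]


def bin_to_bpm(k, envelope_rate_hz, fft_size):
    """Convert a DFT bin index of the envelope spectrum to beats per minute."""
    return 60.0 * k * envelope_rate_hz / fft_size
```

Multiplying rather than adding at this stage means a bin is emphasized only when its double-half partners also carry power, which favors periodicities supported at several metrical levels.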

In parallel with this, the zero-cross value is calculated from the amplitude waveform received in step S1 (step S11), and the Z tempo value is extracted by referring to the information stored in the correlation storage means 7 (step S12). The tempo-value candidate closest to the extracted Z tempo value is then selected (step S13) and output as the finally estimated tempo value (step S14).

As described above, the embodiment provides: receiving means 2 for receiving input of a music piece; envelope calculation means 3 for calculating the amplitude envelope from the time-amplitude waveform of the received piece; frequency analysis means 4 for extracting a power spectrum by frequency analysis of the amplitude envelope; double-half processing means 5 for multiplying the power spectrum of each frequency by the power spectra of the frequencies in a double-half relation to it, for all frequencies over the whole piece; zero-cross calculation means 6 for calculating the zero-cross value, that is, the number of times the time-amplitude waveform of the piece crosses the time axis; correlation storage means 7 in which the correlation between the zero-cross value and the tempo value is stored in advance; and estimation means 8 for extracting a plurality of tempo-value candidates in descending order from the maximum power spectrum among the power spectra obtained by the double-half processing means 5, and for estimating, from among those candidates, the tempo value closest to the Z tempo value by referring to the correlation using the zero-cross value calculated by the zero-cross calculation means 6. As a result, the tempo value can be estimated accurately with a reduced amount of computation.

The evaluation of tempo estimation in this embodiment is described below. The evaluation follows the evaluation of existing tempo estimation methods carried out by Xiao (Non-Patent Document 1 above). Following Xiao's evaluation method, an estimated tempo value was judged correct when it satisfied Equation 5.

In Equation 5, the ratio of the estimated tempo value to the correct tempo value is calculated, and the estimate is judged correct when the error of this ratio is 0.04 or less. The evaluation index is the accuracy rate, that is, the proportion of correctly estimated pieces among all pieces evaluated. The group of pieces used for the evaluation was the same group of 100 pieces, with unbiased tempo values, that was used to investigate the index of the speed of the music.
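The correctness criterion and the accuracy rate can be sketched as follows. Equation 5 itself is not reproduced in this text, so the sketch assumes one natural reading of the description: the estimate is correct when the ratio of estimated to correct tempo deviates from 1 by at most 0.04. The function names are hypothetical.

```python
def is_correct(estimated_bpm, true_bpm, tolerance=0.04):
    """Assumed reading of Equation 5: |estimated / true - 1| <= tolerance."""
    return abs(estimated_bpm / true_bpm - 1.0) <= tolerance


def accuracy_rate(estimates, truths, tolerance=0.04):
    """Proportion of pieces whose estimated tempo is judged correct."""
    hits = sum(is_correct(e, t, tolerance) for e, t in zip(estimates, truths))
    return hits / len(truths)
```

Under this reading, an estimate of 103 bpm for a 100 bpm piece counts as correct, while 105 bpm or a double-tempo estimate of 200 bpm does not.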

Table 2 shows the evaluation results of tempo estimation in the above embodiment.

In Table 2, "accuracy rate 1" is the accuracy rate of the tempo value for each candidate in the proposed tempo estimation, and "accuracy rate 2" is the accuracy rate of tempo values in a double-half relation, that is, tempo values that are double or half the tempo value of each candidate. Table 2 confirms that the proposed method achieves an accuracy rate of 83%, that the accuracy rate attributable to a difference in candidate is 10%, and that the accuracy rate attributable to a difference in double-half relation is 11%. Accordingly, the accuracy rate is 93% if candidate errors are counted as correct, and 94% if double-half errors are counted as correct.

Compared with the existing tempo estimation methods shown in Table 1, the highest accuracy rate among the existing methods is 61%, whereas the above method achieves a high accuracy rate of 83%. The groups of pieces used for the evaluations differ, but the group used here has unbiased tempo values, as shown in FIG. 4, so the effect of this difference on the accuracy rate is considered to be slight. It can therefore be confirmed that a higher accuracy rate was obtained than with the existing tempo estimation methods.

The present invention is not limited to the above embodiment and can be implemented in various forms.

For example, the envelope calculation means 3 in the above embodiment squares the amplitude and uses smoothed downsampling, but the envelope information may instead be calculated from the amplitude information as it is. Alternatively, smoothed downsampling may be omitted and the number of sample points increased, or the sampling frequency may be lowered as preprocessing.

In the above embodiment, double-half processing is applied to all frequencies obtained by the frequency analysis and their power spectra are multiplied, but double-half processing may instead be applied only to frequencies whose power spectrum over the whole piece is stronger than a predetermined threshold.

Description of symbols
1 ... Tempo value estimation device
2 ... Receiving means
3 ... Envelope calculation means
4 ... Frequency analysis means
5 ... Double-half processing means
6 ... Zero-cross calculation means
7 ... Correlation storage means
8 ... Estimation means

Claims

1. A tempo value estimation device for estimating the tempo of a music piece from the input music piece, comprising:
receiving means for receiving input of the music piece;
frequency analysis means for extracting a power spectrum by performing frequency analysis on the amplitude waveform of the received music piece;
double-half processing means for adding, to the power spectrum of each frequency, a value that takes into account the power spectra of the frequencies in a double-half relation to it, for all frequencies over the whole music piece;
zero-cross calculation means for calculating a zero-cross value representing the crossings of the time axis by the amplitude waveform of the music piece received by the receiving means;
correlation storage means for storing in advance the correlation between the zero-cross value and the tempo value; and
estimation means for extracting a plurality of frequencies as tempo value candidates in descending order from the maximum power spectrum among the power spectra obtained by the double-half processing means, and for estimating one tempo value from the tempo value candidates by taking the correlation into account based on the zero-cross value calculated by the zero-cross calculation means.

2. The tempo value estimation device according to claim 1, further comprising envelope calculation means for calculating an amplitude envelope from the amplitude waveform of the music piece received by the receiving means, wherein the calculated envelope waveform information is subjected to frequency analysis to extract the power spectrum of the whole music piece.

3. The tempo value estimation device according to claim 2, wherein, when calculating the amplitude envelope from the amplitude waveform, the envelope calculation means calculates, for each predetermined smoothing section, the average value of the waveform obtained by squaring the amplitude waveform.

4. A tempo value estimation method for estimating the tempo of a music piece from the input music piece, comprising:
a step of receiving input of the music piece;
a step of extracting a power spectrum by performing frequency analysis on the amplitude waveform of the received music piece;
a step of adding, to the power spectrum of each frequency, a value that takes into account the power spectra of the frequencies in a double-half relation to it, for all frequencies over the whole music piece;
a step of calculating a zero-cross value representing the crossings of the time axis by the amplitude waveform of the received music piece;
a step of storing in advance the correlation between the zero-cross value and the tempo value; and
a step of extracting a plurality of frequencies as tempo value candidates in descending order from the maximum power spectrum among the power spectra obtained by the double-half processing, and estimating one tempo value from the tempo value candidates by taking the correlation into account based on the calculated zero-cross value.