JP2012118417A

JP2012118417A - Feature waveform extraction system and feature waveform extraction method

Info

Publication number: JP2012118417A
Application number: JP2010269827A
Authority: JP
Inventors: Masanobu Miura; 雅展三浦; Kazuyuki Aino; 和之相野; Tadashi Shoji; 正庄司; Masatake Ono; 昌剛大野
Original assignee: Ryukoku University
Current assignee: Ryukoku University
Priority date: 2010-12-02
Filing date: 2010-12-02
Publication date: 2012-06-21

Abstract

PROBLEM TO BE SOLVED: To accurately extract the chorus from a piece of music. SOLUTION: Frequency analysis is performed on the waveform information of a piece of music, and frequency fluctuation values along the time axis are determined. The fluctuation times with the highest frequency fluctuation values are extracted, and the waveform information of a fixed-length phrase continuing from each fluctuation time is extracted. The correlation coefficients between the extracted phrase waveforms are calculated to determine their repetition frequency, and the average sound pressure of each phrase waveform is determined. The fluctuation time of the phrase waveform that repeats a plurality of times and has the maximum average sound pressure is taken as the start time of the chorus, and the waveform information up to the next fluctuation time is extracted as the feature waveform. The choruses are then processed to unify their tempo values or are rearranged by tempo value, their beat positions are aligned, and they are connected.

Description

The present invention relates to a feature waveform extraction system capable of extracting the chorus from a piece of music.

In general, the chorus of a song often modulates to a different key from the preceding phrase, shows a large change in sound pressure after a long interlude or long notes and rests, or shifts to a higher register. Various methods have therefore been proposed that exploit these characteristics to extract the chorus portion from a piece of music.

For example, Patent Document 1 below discloses a technique that extracts the position of maximum sound pressure in a piece of music as the chorus. Patent Document 2 below discloses a technique that judges a portion to be the chorus when a large sound pressure continues for a predetermined time or longer and the frequency spectrum tends to shift toward higher frequencies for a predetermined time or longer.

Furthermore, Patent Document 3 below discloses a technique in which a section extracted using sound pressure or the like is treated as the chorus when that section appears repeatedly in the piece.

JP 2007-080304 A; JP 2008-159252 A; JP 2008-262043 A

However, the chorus extraction methods described above have the following problems.

Specifically, a method that extracts the chorus region from sound pressure alone, as in Patent Document 1, can exclude an interlude or a long rest from the chorus candidates as "not chorus". However, when instruments with high sound pressure, such as bass or beat instruments, are played intermittently, the sound pressure is high throughout the piece, which makes it difficult to judge accurately which part is the chorus.

Likewise, with the method of Patent Document 2, which treats as the chorus a portion where a large sound pressure continues for a predetermined time or longer and which contains many high-frequency components, it is difficult to extract the chorus when instruments with high sound pressure, such as bass or beat instruments, are played intermittently. Moreover, even if the extracted sections are narrowed down by their high-frequency components, a wrong candidate selected on the basis of sound pressure leads to a wrong chorus being extracted.

Furthermore, even when the repetition frequency is calculated as in Patent Document 3, in a piece where the A-melody and B-melody sections are also played repeatedly, the chorus is not the only section that repeats prominently, so it is difficult to extract the chorus accurately.

The present invention was made in view of the above problems, and its object is to provide a system that can accurately extract the chorus portion from a piece of music.

To solve the above problems, the present invention comprises: input receiving means that receives the waveform information of a piece of music; fluctuation calculation means that performs frequency analysis on the received waveform information and obtains frequency fluctuation values along the time axis; fluctuation time extraction means that extracts, from the obtained frequency fluctuation values, the fluctuation times whose values are at or above a predetermined value; section waveform extraction means that extracts the waveform information within a predetermined section continuing from each fluctuation time; frequency calculation means that obtains the repetition frequency among the extracted section waveforms; sound pressure calculation means that obtains the sound pressure of each section waveform; and feature waveform extraction means that extracts, as the feature start time, the fluctuation time of the section waveform that repeats a plurality of times and has the maximum sound pressure, and extracts the waveform information up to a predetermined end time as the feature waveform.

By first listing positions with large frequency fluctuation as candidate chorus start times, and then selecting chorus candidates based on the sound pressure and repetition frequency within a predetermined section from each start time, the chorus can be extracted accurately with priority given to parts where the character of the music changes substantially (that is, parts where the key modulates, or where there is a long interlude, a long note, or a rest). If chorus candidates were instead extracted first on the basis of sound pressure, a chorus with little change in sound pressure would be difficult to extract. Because the character of the music always changes at the point where the chorus begins, the chorus can be extracted accurately on the basis of that change.

In this invention, the predetermined section continuing from each fluctuation time is extracted as a section of fixed length.

If only the start time is determined from the fixed-length sections extracted at the fluctuation times, misjudging the chorus because of a wrongly estimated end time can be avoided, in contrast to an approach that estimates the chorus end time in advance and then calculates the repetition frequency and so on.

Furthermore, the end time is determined as the fluctuation time that follows the extracted start time.

In this way, the end time is determined from the frequency fluctuation alone, so it can be extracted accurately even when the music moves to the next melody without an extreme change in sound pressure.

The system may also include connecting means that matches the tempo value of the extracted feature waveform to that of the feature waveforms of other pieces and connects them.

In this way, when choruses are connected to create a chorus medley, making the tempo values of the choruses uniform allows a smooth transition from one chorus to the next.

Alternatively, the system may include connecting means that sorts the extracted feature waveforms by tempo value and connects them in that order.

This also allows a smooth transition from one chorus to the next.

The extracted feature waveform may also be connected to the feature waveforms of other pieces with the beat positions aligned.

In this way, when choruses are connected to create a chorus medley, aligning the beats of the choruses allows a smooth transition from one chorus to the next.

In the present invention, positions with large frequency fluctuation are first listed as candidate chorus start times, and chorus candidates are then selected based on the sound pressure and repetition frequency within a predetermined section from each start time. The chorus can therefore be extracted accurately with priority given to parts where the character of the music changes substantially (that is, parts where the key modulates, or where there is a long interlude, a long note, or a rest). If chorus candidates were instead extracted first on the basis of sound pressure, a chorus with little change in sound pressure would be difficult to extract. Because the character of the music always changes at the point where the chorus begins, the chorus can be extracted accurately on the basis of that change.

FIG. 1: Functional block diagram of the music feature waveform extraction system in one embodiment of the present invention.
FIG. 2: Waveforms extracted by each functional means in the embodiment.
FIG. 3: Processing for estimating the tempo value in the embodiment.
FIG. 4: Relationship between zero crossings and tempo value in the embodiment.
FIG. 5: Processing for connecting choruses in the embodiment.
FIG. 6: Flowchart for extracting the chorus in the embodiment.
FIG. 7: Flowchart for connecting choruses in the embodiment.

An embodiment of the present invention is described below with reference to the drawings. The feature waveform extraction system 100 of this embodiment extracts the chorus portion from a single piece of music and is implemented on a personal computer or on a system in which multiple computers and related devices are connected. The functions shown in FIG. 1 are realized through the cooperation of the CPU, programs stored in storage devices such as ROM and RAM, disk drives, and so on.

Specifically, the feature waveform extraction system 100 has two main functions: receiving music data and extracting the waveform of the chorus portion, and connecting the extracted choruses to create a chorus medley.

In this configuration, the feature waveform extraction system 100 comprises: analysis means 2, which performs frequency analysis on the waveform information received by input receiving means 1; fluctuation calculation means 3, which obtains frequency fluctuation values along the time axis from the analyzed frequency information; fluctuation time extraction means 4, which extracts fluctuation times whose frequency fluctuation values are at or above a predetermined value or rank within a predetermined number (for example, the top 50); section waveform extraction means 5, which extracts the waveform information within a predetermined section continuing from each fluctuation time; frequency calculation means 6, which obtains the repetition frequency among the extracted section waveforms; and sound pressure calculation means 7, which obtains the sound pressure of each section waveform. Feature waveform extraction means 8 then extracts, as the feature start time, the fluctuation time of the section waveform that repeats a plurality of times and has the maximum sound pressure, and extracts the waveform information of the section up to the next fluctuation time as the feature waveform of the chorus. Each of these means is described in detail below.

First, input receiving means 1 receives the piece of music selected by the user. The music data may be read from a CD inserted in the device or may be data downloaded via the Internet or the like. Music data exists in various formats; in this embodiment it is received in WAVE format, and data received in any other format is converted to WAVE format. FIG. 2 shows the waveform information of the received piece. FIG. 2(a) is the amplitude waveform of the received piece, with time on the horizontal axis and sound pressure on the vertical axis.

Analysis means 2 performs frequency analysis on the received piece by fast Fourier transform (FFT), shifting a predetermined analysis frame along the signal. Here the FFT uses 512 sample points with a shift width of 512 points (about 11.6 msec). The analysis yields information expressed with frequency on the horizontal axis, analysis-frame time on the vertical axis, and spectrum on the height axis.
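
The framing and FFT step can be sketched as follows. This is a minimal illustration, not the patented implementation: the 44.1 kHz sampling rate is an assumption (it is consistent with 512 points being about 11.6 msec), and the function names `fft` and `magnitude_spectrogram` are hypothetical.

```python
import cmath
import math

FRAME = 512    # samples per analysis frame (from the text)
HOP = 512      # shift width in samples (from the text)
RATE = 44100   # assumed sampling rate; 512 / 44100 s is about 11.6 msec


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrogram(samples, frame=FRAME, hop=HOP):
    """Return one magnitude spectrum (frame // 2 + 1 bins) per analysis frame."""
    frames = []
    for start in range(0, len(samples) - frame + 1, hop):
        spectrum = fft(samples[start:start + frame])
        frames.append([abs(c) for c in spectrum[:frame // 2 + 1]])
    return frames


# A sine wave with exactly 8 cycles per frame peaks in frequency bin 8.
tone = [math.sin(2 * math.pi * 8 * n / FRAME) for n in range(2 * FRAME)]
spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
```

Because the shift width equals the frame length, consecutive frames do not overlap and each row of the result covers about 11.6 msec of audio.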

Fluctuation calculation means 3 calculates the magnitude of the frequency fluctuation for each analysis frame from the frequency-analyzed waveform information. It first computes the flux function as the sum (a scalar) of the absolute values of the differences in each frequency bin between adjacent analysis frames. This captures the difference in spectral magnitude between adjacent analysis frames. As shown in FIG. 2(b), the flux function is plotted with time on the horizontal axis and the sum of the absolute bin-wise differences on the vertical axis. A large value in FIG. 2(b) marks a point where the frequency spectrum changed greatly from the adjacent analysis frame. However, the sum of differences between adjacent frames alone may lead to a wrong judgment when noise causes a large fluctuation, so the extracted information is smoothed.
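
As a minimal sketch of the flux calculation described here, assuming the spectrogram is a list of per-frame magnitude lists (the name `spectral_flux` is hypothetical):

```python
def spectral_flux(spectrogram):
    """Sum of absolute bin-wise differences between adjacent frames."""
    flux = []
    for prev, curr in zip(spectrogram, spectrogram[1:]):
        flux.append(sum(abs(c - p) for p, c in zip(prev, curr)))
    return flux


# Three frames with three frequency bins each.
spectrogram = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],   # identical to the previous frame: flux 0
    [0.0, 2.0, 1.0],   # |0-1| + |2-0| + |1-0| = 4
]
flux = spectral_flux(spectrogram)
print(flux)  # [0.0, 4.0]
```

The result has one value fewer than the number of frames, since each value describes the change between two neighbouring frames.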

In the smoothing process, a moving average filter is used to obtain the amplitude envelope of the flux time function. This embodiment uses a 60-point moving average filter (about 696.6 msec). The flux function shown in FIG. 2(b) is thereby replaced by the smoothed function shown in FIG. 2(c).
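
A moving average of this kind might look like the sketch below. The text does not say how the filter treats the first samples or whether the window is centred, so the trailing window that is shortened at the start is an assumption, as is the name `moving_average`.

```python
def moving_average(values, window=60):
    """Average each sample with up to window - 1 preceding samples."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


smoothed = moving_average([1, 2, 3, 4], window=2)
print(smoothed)  # [1.0, 1.5, 2.5, 3.5]
```

With 512-point frames at an assumed 44.1 kHz, 60 frames span 60 × 512 / 44100 s, which is about 696.6 msec and matches the figure in the text.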

Next, the smoothed signal is processed so that it can be judged as a broader trend in frequency fluctuation. A regression line is obtained by the least squares method, and its derivative (that is, the slope in FIG. 2(c)) is obtained (FIG. 2(d)). The derivative can be positive or negative, but only the magnitude of the fluctuation matters here, so its absolute value is taken. This yields the signal in FIG. 2(e), in which all values of FIG. 2(d) have become positive. In FIG. 2(c) and FIG. 2(d), the horizontal axis is time and the vertical axis is the slope of the frequency fluctuation of the smoothed flux function. With this processing, whether a portion is the chorus can be judged even when noise is present, or when the frequency changes abruptly between adjacent analysis frames but the overall trend shows little fluctuation.
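
One way to read this step is a least-squares line fitted over a short sliding window, with the absolute slope kept for each window. The window length is not given in the text, so the default of 5 points below is purely illustrative, and `local_slope_magnitude` is a hypothetical name.

```python
def local_slope_magnitude(values, window=5):
    """Absolute slope of the least-squares line fitted to each window.

    window must be at least 2; the x values are 0, 1, ..., window - 1.
    """
    n = window
    xs = range(n)
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    out = []
    for start in range(len(values) - n + 1):
        ys = values[start:start + n]
        y_mean = sum(ys) / n
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        out.append(abs(sxy / sxx))   # slope = Sxy / Sxx, sign discarded
    return out


# Rising, peak, and falling parts of a triangle.
slopes = local_slope_magnitude([0, 1, 2, 3, 2, 1, 0], window=3)
```

Here the windows on the rising and falling flanks give a slope magnitude of 1, while the window centred on the peak gives 0 because its rise and fall cancel.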

Fluctuation time extraction means 4 extracts a predetermined number of times with the largest slopes from this signal. The circled points in FIG. 2(f) indicate the signal at the extracted times. Here a predetermined number (for example, 150) is extracted from the top, but a threshold on the slope may instead be set in advance and the times of signals whose slope exceeds it extracted. The extracted times are stored in the storage unit as fluctuation times that are likely to correspond to the chorus start time.
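
Both selection rules mentioned here (a fixed number from the top, or a threshold) can be sketched in a few lines. The function names are hypothetical, and the inputs are frame indices rather than times in seconds.

```python
def top_fluctuation_frames(slopes, count=150):
    """Frame indices of the `count` largest slope values, in time order."""
    ranked = sorted(range(len(slopes)), key=lambda i: slopes[i], reverse=True)
    return sorted(ranked[:count])


def frames_above(slopes, threshold):
    """Frame indices whose slope value exceeds the threshold."""
    return [i for i, s in enumerate(slopes) if s > threshold]


slopes = [0.1, 0.9, 0.3, 0.8, 0.2]
print(top_fluctuation_frames(slopes, count=2))  # [1, 3]
print(frames_above(slopes, 0.25))               # [1, 2, 3]
```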

Next, section waveform extraction means 5 extracts, starting from each fluctuation time, the waveform information of the piece for a predetermined duration. The information extracted is the WAVE waveform. FIG. 2(g) shows the extracted waveform, with time on the horizontal axis and sound pressure on the vertical axis; it is a part of the signal in FIG. 2(a). The extracted duration is a fixed length defined in advance, and a signal of this length is extracted from each fluctuation time. Here a section waveform of 2048 points (about 23.7 sec) is extracted from each fluctuation time. At this stage, section waveforms may overlap when their fluctuation times are close together.
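
The fixed-length extraction can be sketched as a slice of the sample list. The sketch assumes that the 2048 points are analysis frames of 512 samples at 44.1 kHz, which gives roughly the stated duration; `extract_section` is a hypothetical name.

```python
HOP = 512               # samples per analysis frame (from the FFT settings)
SECTION_FRAMES = 2048   # section length in analysis frames (from the text)
RATE = 44100            # assumed sampling rate


def extract_section(samples, start_frame, section_frames=SECTION_FRAMES, hop=HOP):
    """Samples of the fixed-length section that starts at the given frame."""
    start = start_frame * hop
    return samples[start:start + section_frames * hop]


# Small numbers for illustration: frames of 4 samples, sections of 3 frames.
samples = list(range(40))
section = extract_section(samples, start_frame=2, section_frames=3, hop=4)
section_seconds = SECTION_FRAMES * HOP / RATE   # a little under 24 seconds
```

A section that starts close to the end of the piece comes out shorter than the nominal length, so a real implementation would need to decide whether to drop or pad such sections.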

Frequency calculation means 6 calculates how often the extracted section waveforms resemble one another. For this, the correlation coefficient between each pair of section waveforms is calculated: pairs of highly similar section waveforms give a large coefficient, and pairs with low similarity give a small one. The pairs of section waveforms are then sorted in descending order of correlation coefficient. Because some sorted pairs have fluctuation times so close together that most of the two sections overlap, as noted above, pairs whose fluctuation times lie within a predetermined time of each other are excluded. Here, pairs whose fluctuation times are within 15 seconds of each other are excluded, together with the two fluctuation times that form each such pair. Pairs whose correlation coefficient is at or above a predetermined value, or within a predetermined number from the top (specifically, those with a repetition frequency of two or more), are stored as chorus candidates.
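
The pairwise comparison can be sketched with the Pearson correlation coefficient. The text says only "correlation coefficient", so Pearson is an assumption, as are the names `pearson` and `ranked_pairs` and the choice to key the sections by start time in seconds. The sketch skips pairs that start within 15 seconds of each other but does not go on to remove their start times from other pairs.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two sequences, truncated to the shorter length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def ranked_pairs(sections, min_gap=15.0):
    """Pairs (correlation, t1, t2), highest correlation first.

    sections maps a start time in seconds to the section's samples.
    Pairs whose start times are within min_gap seconds are skipped.
    """
    pairs = []
    for t1, t2 in combinations(sorted(sections), 2):
        if t2 - t1 <= min_gap:
            continue
        pairs.append((pearson(sections[t1], sections[t2]), t1, t2))
    return sorted(pairs, reverse=True)


sections = {
    0.0: [0, 1, 0, -1],
    5.0: [0, 1, 0, -1],    # too close to 0.0, so that pair is skipped
    30.0: [0, 2, 0, -2],   # same shape as the first, louder
    60.0: [1, 0, -1, 0],   # shifted by a quarter cycle: uncorrelated
}
pairs = ranked_pairs(sections)
```

In this toy input the sections at 0.0 and 30.0 correlate perfectly despite the difference in level, because the correlation coefficient is insensitive to scale.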

Sound pressure calculation means 7 calculates the average sound pressure of each section waveform extracted as a chorus candidate, using the extracted WAVE waveform information.

From the section waveforms thus processed, feature waveform extraction means 8 extracts the one with the maximum average sound pressure among the pairs whose correlation coefficient is at or above the predetermined value. Because the resulting section waveform lies between two fluctuation times, these are taken as the start time and end time of the chorus, and the chorus is extracted (as a WAVE file).
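
The final selection can be sketched as follows. The text does not define how the average sound pressure is computed from the waveform, so the mean absolute amplitude used here is an assumption; the function names are hypothetical. The end time is taken as the next fluctuation time after the chosen start, as stated earlier in the description.

```python
def mean_abs_amplitude(samples):
    """Stand-in for average sound pressure: mean of the absolute sample values."""
    return sum(abs(s) for s in samples) / len(samples)


def pick_chorus_start(candidates):
    """Start time of the loudest candidate.

    candidates maps a start time to the samples of its section.
    """
    return max(candidates, key=lambda t: mean_abs_amplitude(candidates[t]))


def next_fluctuation_time(start, fluctuation_times):
    """First fluctuation time after start, or None if there is none."""
    later = [t for t in fluctuation_times if t > start]
    return min(later) if later else None


candidates = {
    12.0: [0.1, -0.2, 0.1, -0.1],
    48.0: [0.6, -0.7, 0.5, -0.6],   # loudest candidate
    95.0: [0.3, -0.3, 0.2, -0.2],
}
start = pick_chorus_start(candidates)
end = next_fluctuation_time(start, [12.0, 30.0, 48.0, 71.5, 95.0])
```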

Thus, by calculating the frequency fluctuation, extracting a predetermined number of fluctuation times with the largest fluctuation values, and then extracting the chorus by considering the repetition frequency and sound pressure of the waveform in a fixed section from each, the chorus can be extracted accurately by focusing on the parts where the character of the music changes.

Next, a use of the extracted choruses is described. In this embodiment, the extracted choruses are connected to one another to create a continuous piece of music, a chorus medley. The configuration of connecting means 9, which performs this connection, is described below.

The extracted choruses differ from one another in tempo value, beat, and so on, so connecting them at random does not give natural transitions. The choruses are therefore processed so that their tempo values and beats match before they are connected.

To match tempo values, the tempo value of each piece is estimated and time stretching is applied so that every piece has the same tempo value.

Various methods can be used to estimate the tempo value of a piece; this embodiment uses the following process.

First, an amplitude envelope is calculated from the amplitude waveform of the received piece, and the envelope is frequency-analyzed to extract its frequency spectrum (FIG. 3(a)). The results of short-time frequency analysis after multiplication by this frequency spectrum are then summed along the time axis for each frequency (FIG. 3(b)), and the tempo values corresponding to a predetermined number of the largest spectral values are sorted and listed as candidates. Finally, the spectral values of tempo values in a double or half relationship are added together, and a predetermined number of top candidates (for example, two) is extracted (FIG. 3(c), (d)). The double and half relationships are considered for the following reason. In a 120 bpm piece with strong beats on the first and third beats (that is, in 4/4 time), each bar has peaks on the first and third beats, so depending on how the peaks are detected the tempo may be judged to be 60 bpm, half of 120 bpm. Conversely, when beats of another instrument fall between adjacent beats of a 120 bpm piece, the tempo may be judged to be 240 bpm. The double/half processing is performed to allow for both cases.
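
The double/half step can be illustrated with a small scoring function. The text does not spell out exactly how the related spectral values are added, so the rule below (each candidate's strength plus the strengths at half and double its tempo) is one possible reading, and `combine_double_half` and the example strengths are hypothetical.

```python
def combine_double_half(strengths, keep=2):
    """Top `keep` tempo candidates after adding double/half-related strengths.

    strengths maps a tempo in bpm to its spectral strength.
    """
    scores = {}
    for bpm, value in strengths.items():
        related = strengths.get(bpm / 2, 0.0) + strengths.get(bpm * 2, 0.0)
        scores[bpm] = value + related
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# 120 bpm is supported by its half (60) and double (240); 100 bpm stands alone.
strengths = {60: 0.5, 100: 0.8, 120: 0.9, 240: 0.3}
best = combine_double_half(strengths)
print(best)  # [120, 60]
```

Without the double/half support, 100 bpm would rank second on its raw strength; with it, 60 bpm moves ahead because it is reinforced by 120 bpm.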

Next, the number of zero crossings, where the amplitude waveform of the piece crosses the time axis, is calculated (the number of circles in the upper part of FIG. 4). The correlation between the zero-crossing value and the tempo value is stored in advance; a tempo value is derived from the calculated zero-crossing value using this correlation, and the candidate closest to it is selected from the sorted tempo values above. In general, the zero-crossing value of the amplitude waveform and the tempo value are related as shown in the lower part of FIG. 4: the higher the tempo value, the higher the zero-crossing value, and the lower the tempo value, the lower the zero-crossing value. This correlation is used to select the correct tempo value. The chorus is assumed to be played at the tempo value thus estimated, which is taken as the tempo value of the chorus, and all pieces are time-stretched to a common tempo. Either the whole piece or only the chorus may be time-stretched. Time-stretching the whole piece has the advantage that, when beat times are extracted later, the loud portions corresponding to beats can be found in the complete data, so beat times can be extracted accurately. Time-stretching only the chorus has the advantage of shortening the overall processing time.
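
The zero-crossing check can be sketched as below. The stored correlation between zero-crossing value and tempo is not given in the text, so it is modelled here as a straight line with made-up coefficients `slope` and `intercept`; the function names are hypothetical as well.

```python
def zero_crossings(samples):
    """Number of sign changes between consecutive samples (zero counts as positive)."""
    count = 0
    for prev, curr in zip(samples, samples[1:]):
        if (prev < 0 <= curr) or (prev >= 0 > curr):
            count += 1
    return count


def pick_tempo(candidates, zero_cross_value, slope, intercept):
    """Candidate tempo closest to the tempo predicted from the zero-crossing value."""
    predicted = slope * zero_cross_value + intercept
    return min(candidates, key=lambda bpm: abs(bpm - predicted))


crossings = zero_crossings([1, 1, -1, -1, 1])
# Hypothetical stored relation: tempo = 0.1 * zero_cross_value + 10.
tempo = pick_tempo([60, 120], zero_cross_value=1000, slope=0.1, intercept=10)
```

With these illustrative coefficients the predicted tempo is 110 bpm, so the 120 bpm candidate is chosen over 60 bpm.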

Next, the method for estimating beat positions is described. Various methods exist; the following one is used here.

In this embodiment, the previously estimated tempo value is used, and the portion with the highest sound pressure is extracted from the whole piece. This portion is extracted from the amplitude waveform of the piece after its tempo value has been unified by time stretching. Alternatively, an amplitude envelope may be calculated from the amplitude waveform of the time-stretched piece and the portion with the highest sound pressure extracted from that envelope. With this portion as the reference, the times corresponding to the tempo value are determined as the beat positions.
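
A minimal sketch of this beat placement, assuming the loudest portion is represented by the single sample with the largest absolute amplitude and that beats are spaced 60 / bpm seconds apart in both directions from it (`beat_times` is a hypothetical name):

```python
def beat_times(samples, rate, bpm):
    """Beat times in seconds, on a grid anchored at the loudest sample."""
    period = 60.0 / bpm
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    anchor = peak_index / rate
    duration = len(samples) / rate
    first = anchor % period          # earliest grid point at or after time 0
    beats = []
    k = 0
    while first + k * period < duration:
        beats.append(first + k * period)
        k += 1
    return beats


# 4 seconds at 10 samples per second, loudest sample at 1.5 s, tempo 60 bpm.
samples = [0.1] * 40
samples[15] = 1.0
beats = beat_times(samples, rate=10, bpm=60)
```

For this input the beats fall at 0.5, 1.5, 2.5, and 3.5 seconds, so the loudest point at 1.5 seconds lies exactly on the grid.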

The connecting means 9 then takes the choruses whose tempo values have been unified in this way and, as shown in FIG. 5, aligns their beat positions and transitions to the next chorus while fading the volume.
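One way to picture the faded transition is a linear crossfade. The sketch below assumes both choruses are plain lists of samples that have already been trimmed so that the overlap starts on a beat; the linear fade shape and the `overlap` length are assumptions, since the text only says that the volume is faded.

```python
def crossfade(first, second, overlap):
    """Join two sample lists, fading `first` out and `second` in over `overlap` samples."""
    if overlap <= 0 or overlap > min(len(first), len(second)):
        raise ValueError("overlap must be between 1 and the shorter clip length")
    head = first[:len(first) - overlap]
    mixed = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)          # rises from near 0 to near 1
        tail_sample = first[len(first) - overlap + i]
        mixed.append(tail_sample * (1.0 - weight) + second[i] * weight)
    return head + mixed + second[overlap:]


joined = crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=2)
```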

Next, a method of extracting the chorus of a piece of music in the feature waveform extraction system 100 configured as described above will be described with reference to the flowchart of FIG. 6.

<Chorus extraction process>
When a chorus is to be extracted from a piece of music, the music is accepted as input in the form of a WAVE file (step S1). The amplitude waveform of the accepted music is then frequency-analyzed by fast Fourier transform, using a 512-point analysis frame shifted by 512 points at a time (step S2), to obtain frequency information expressed in terms of frequency, spectrum, and analysis-frame time.
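Step S2 can be sketched with NumPy as follows. The frame and hop sizes of 512 points come from the text; the absence of a window function and the use of magnitude spectra are assumptions, since the text does not specify either.

```python
import numpy as np


def magnitude_spectrogram(samples, frame_size=512, hop=512):
    """Split the waveform into frames and return one magnitude spectrum per frame."""
    samples = np.asarray(samples, dtype=float)
    n_frames = (len(samples) - frame_size) // hop + 1
    if n_frames < 1:
        return np.empty((0, frame_size // 2 + 1))
    frames = np.stack([samples[i * hop:i * hop + frame_size] for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))


# A sinusoid with exactly 8 cycles per 512-point frame peaks in bin 8.
tone = np.sin(2 * np.pi * 8 * np.arange(1024) / 512)
spec = magnitude_spectrogram(tone)
```

With a hop equal to the frame size the frames do not overlap, so frame `i` starts at time `i * 512 / sample_rate`.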

Next, based on this frequency-analyzed information, the magnitude of the frequency fluctuation along the time axis is calculated. To calculate it, the flux function is first obtained by summing the absolute values of the differences in each frequency bin between adjacent analysis frames (step S3), and the result is smoothed. In the smoothing process, a 60-point moving average filter is applied to obtain the amplitude envelope of the flux function (step S4). Then, to capture the local slope trend of the amplitude envelope, a regression line is obtained by the least squares method (step S5), and the absolute value of the derivative of that regression line is obtained (step S6).
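Steps S3 to S6 might look like the sketch below. The 60-point moving average follows the text; the sliding-window length used for the least-squares line is not stated in the text, so the `window` parameter is an assumption, as are the function names.

```python
import numpy as np


def spectral_flux(spec):
    """Sum of absolute bin-wise differences between adjacent frames (step S3)."""
    return np.abs(np.diff(spec, axis=0)).sum(axis=1)


def moving_average(x, width=60):
    """Moving-average envelope (step S4); assumes len(x) >= width."""
    return np.convolve(x, np.ones(width) / width, mode="same")


def local_slope_magnitude(envelope, window=60):
    """Absolute slope of a least-squares line fitted in each sliding window (S5, S6)."""
    envelope = np.asarray(envelope, dtype=float)
    t = np.arange(window) - (window - 1) / 2.0   # centred time axis, sums to zero
    denom = np.dot(t, t)
    slopes = np.zeros(len(envelope))
    for i in range(len(envelope) - window + 1):
        segment = envelope[i:i + window]
        slopes[i + window // 2] = np.dot(t, segment) / denom
    return np.abs(slopes)


flux = spectral_flux(np.array([[0.0, 0.0], [1.0, 2.0], [1.0, 2.0]]))
ramp_slopes = local_slope_magnitude(2.0 * np.arange(100), window=10)
```

Because the centred time axis sums to zero, the least-squares slope reduces to a single dot product per window.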

Then, the times of the 150 largest values of this absolute derivative are extracted and used as the start times of chorus candidates (step S7).
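Selecting the top-ranked times in step S7 can be sketched as below. The default of 150 comes from the text; converting a frame index to seconds through `hop` and `sample_rate` is an assumption about how the times are represented.

```python
def top_fluctuation_times(values, n=150, hop=512, sample_rate=44100):
    """Return, in chronological order, the times of the n largest values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:n]
    return sorted(i * hop / sample_rate for i in order)


candidate_times = top_fluctuation_times([0.1, 0.9, 0.3, 0.8], n=2, hop=1, sample_rate=1)
```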

In addition, a section waveform of a fixed length is extracted from each chorus candidate start time (step S8), and correlation coefficients are obtained for arbitrary pairs of section waveforms (step S9). Combinations of section waveforms whose correlation coefficient is equal to or greater than a predetermined value and whose fluctuation times are 15 seconds or less are removed; after that, a predetermined number (for example, 50 pairs) of the combinations with the highest correlation coefficients are kept and the rest are deleted (step S10).
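Steps S8 to S10 can be sketched as follows. The 15-second condition is read here as "the two start times lie within 15 seconds of each other", which is an interpretation rather than something the text states outright; the threshold `corr_threshold` and the function name are likewise hypothetical.

```python
import itertools
import numpy as np


def rank_section_pairs(samples, start_times, sample_rate, section_sec,
                       corr_threshold=0.5, min_gap_sec=15.0, keep=50):
    """Correlate fixed-length sections and keep the most similar distant pairs."""
    samples = np.asarray(samples, dtype=float)
    length = int(section_sec * sample_rate)
    sections = {}
    for t in start_times:
        begin = int(round(t * sample_rate))
        section = samples[begin:begin + length]
        if len(section) == length:               # skip sections cut off by the end
            sections[t] = section
    pairs = []
    for ta, tb in itertools.combinations(sorted(sections), 2):
        r = float(np.corrcoef(sections[ta], sections[tb])[0, 1])
        # Assumed reading: drop highly correlated pairs that start close together.
        if r >= corr_threshold and (tb - ta) <= min_gap_sec:
            continue
        pairs.append((r, ta, tb))
    pairs.sort(reverse=True)
    return pairs[:keep]


sample_rate = 10
t = np.arange(600) / sample_rate
signal = np.sin(2 * np.pi * t / 20.0)            # repeats every 20 seconds
ranked = rank_section_pairs(signal, [0.0, 0.1, 20.0, 40.0], sample_rate, section_sec=10.0)
```

In the example, the pair starting at 0.0 s and 0.1 s is almost identical but trivially so, and it is discarded; the pairs one or two periods apart remain.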

Finally, among the remaining section waveforms, the fluctuation time of the section waveform with the highest sound pressure is taken as the start time (step S11), and the waveform information from that start time to the next fluctuation time is extracted as the chorus (step S12).
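Steps S11 and S12 can be illustrated as below. The sketch uses the root-mean-square level of each section as a stand-in for sound pressure, which is an assumption, and the function and parameter names are hypothetical.

```python
import math


def rms(section):
    """Root-mean-square level, used here as a proxy for sound pressure."""
    return math.sqrt(sum(s * s for s in section) / len(section))


def select_chorus(samples, sample_rate, candidate_times, fluctuation_times, section_sec):
    """Return (start, end): loudest candidate start and the next fluctuation time."""
    length = int(section_sec * sample_rate)

    def level(t):
        begin = int(t * sample_rate)
        section = samples[begin:begin + length]
        return rms(section) if len(section) > 0 else 0.0

    start = max(candidate_times, key=level)
    later = [t for t in fluctuation_times if t > start]
    end = min(later) if later else len(samples) / sample_rate
    return start, end


waveform = [0.1] * 10 + [1.0] * 10 + [0.1] * 10
chorus_span = select_chorus(waveform, 1, [0, 10], [0, 10, 20], section_sec=5)
```

If no later fluctuation time exists, this sketch falls back to the end of the waveform; the text does not say what happens in that case.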

<Chorus concatenation process>
Next, the flow for concatenating the choruses extracted in this way to create a chorus medley will be described with reference to FIG. 7.

First, in parallel with the chorus extraction process, the tempo value of the music is estimated. To estimate the tempo value, based on the frequency-analyzed information (step T1), each frequency spectrum is multiplied by the frequency spectra that stand in a double or half relationship to it, for all frequencies (step T2). The results of the short-time frequency analysis after this multiplication are summed along the time axis for each frequency, and the tempo values having a predetermined number of the largest frequency spectra are sorted and listed as candidates. The frequency spectra in double or half relationships are then added together, and a predetermined number (for example, two) of the top candidates are extracted (step T3).

Along with this, the zero-crossing value, that is, the number of times the amplitude waveform of the music crosses the time axis, is calculated (step T4), and a tempo value is derived by referring to the previously stored correlation between zero-crossing values and tempo values (step T5). The candidate closest to this tempo value is then selected from the previously sorted tempo values (step T6) and taken as the estimated tempo value of the music containing the chorus.

After the tempo value has been estimated in this way, time-stretch processing to a constant tempo value is performed (step T7).

Next, the parts with strong sound pressure corresponding to beats are extracted from the time-stretched music, and the times corresponding to the tempo value, counted from the time of strong sound pressure, are estimated as the beat times (step T8). Time-stretched choruses are then selected at random and connected so that their beat times coincide (step T9).

Thus, according to the above embodiment, positions with large frequency fluctuation are first listed as candidates for the chorus start time, and chorus candidates are then extracted based on the sound pressure and repetition frequency within a predetermined section from each start time. The chorus can therefore be extracted accurately, with priority given to the parts where the musical character changes greatly. In other words, if chorus candidates were first extracted on the basis of sound pressure, it would be difficult to extract a chorus with little change in sound pressure; however, because the musical character always changes where the music switches to the chorus, the chorus can be extracted accurately on the basis of that change.

Also, because the predetermined section continuing from the fluctuation time is extracted as a fixed-length section, erroneous chorus detection caused by misestimating the end time can be prevented, in contrast to the case where the chorus end time is estimated in advance and the repetition frequency and the like are calculated from it.

Furthermore, because the end time is set to the fluctuation time following the start time, the end time is determined by frequency fluctuation alone, so it can be extracted accurately even when the music moves on to the next melody without an extreme change in sound pressure.

The present invention is not limited to the above embodiment and can be implemented in various forms.

For example, in the above embodiment, the frequency fluctuation value is obtained using smoothing and a regression line obtained by the least squares method, but the frequency fluctuation may instead be obtained from the sum of the differences between adjacent frequency bins.

Also, in the above embodiment, the fluctuation time following the start time is used as the end time, but the time at which the sound pressure falls below a threshold may be used as the end time instead.

Furthermore, in the above embodiment, a section waveform of a fixed length is extracted from the fluctuation time to calculate the repetition frequency, sound pressure, and so on, but this section may instead extend to the next fluctuation time.

Also, the above embodiment has been described using as an example an application that connects the extracted choruses to create a chorus medley, but each chorus may instead be cut off a fixed time (for example, 10 seconds) after its start time to create a digest version of the choruses. In this case, the choruses need not be connected; the next chorus may be started after an interval of several seconds.
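The digest variant can be sketched as below. The 10-second cut-off follows the example in the text; the 2-second silent gap is an arbitrary stand-in for the "interval of several seconds", and the function name is hypothetical.

```python
def build_digest(choruses, sample_rate, clip_sec=10.0, gap_sec=2.0):
    """Truncate each chorus to clip_sec and separate the clips with silence."""
    clip_len = int(clip_sec * sample_rate)
    gap = [0.0] * int(gap_sec * sample_rate)
    digest = []
    for index, chorus in enumerate(choruses):
        if index > 0:
            digest.extend(gap)                   # silent interval between clips
        digest.extend(chorus[:clip_len])
    return digest


digest = build_digest([[1.0] * 30, [2.0] * 5], sample_rate=1)
```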

Also, in the above embodiment, time stretching is performed when creating the chorus medley, but the feature waveforms (choruses) may instead be sorted in order of tempo value without time stretching and connected in the sorted order. Connecting in order of tempo value in this way can also produce smooth transitions.
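Ordering by tempo value is a plain sort. The sketch below assumes each chorus is represented as a dictionary with a `tempo_bpm` field, which is a hypothetical data layout, and it sorts in ascending order although the text does not say which direction is used.

```python
def order_by_tempo(choruses):
    """Return the choruses sorted by ascending tempo value."""
    return sorted(choruses, key=lambda chorus: chorus["tempo_bpm"])


medley_order = order_by_tempo([
    {"title": "B", "tempo_bpm": 128.0},
    {"title": "A", "tempo_bpm": 96.0},
    {"title": "C", "tempo_bpm": 110.0},
])
```

Neighbouring choruses in the sorted list differ least in tempo, which is what makes the transitions comparatively smooth without time stretching.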

DESCRIPTION OF SYMBOLS
100 ... Feature waveform extraction system
1 ... Input receiving means
2 ... Analysis means
3 ... Fluctuation calculating means
4 ... Fluctuation time extraction means
5 ... Section waveform extraction means
6 ... Frequency calculating means
7 ... Sound pressure calculating means
8 ... Feature waveform extraction means
9 ... Connecting means

Claims

Input receiving means for receiving input of waveform information of the music;
Fluctuation calculating means for analyzing the frequency of the waveform information received by the input receiving means and obtaining a frequency fluctuation value in the time axis direction;
Fluctuation time extraction means for extracting, from among the frequency fluctuation values obtained by the fluctuation calculating means, a fluctuation time having a frequency fluctuation value greater than or equal to a predetermined value;
Section waveform extraction means for extracting waveform information in a predetermined section continuous from the fluctuation time;
A frequency calculating means for obtaining a repetition frequency between the extracted section waveforms;
A sound pressure calculating means for obtaining a sound pressure in the section waveform;
A feature waveform extraction means for extracting a fluctuation time of a section waveform in which the repetition frequency is a plurality of times or more and the sound pressure is maximum as a feature start time, and extracting waveform information up to a predetermined end time as a feature waveform;
A feature waveform extraction system for music, characterized by comprising:

The feature waveform extraction system according to claim 1, wherein the predetermined section continuing from the fluctuation time is a fixed section.

The feature waveform extraction system according to claim 1, wherein the end time is the fluctuation time following the extracted start time.

The feature waveform extraction system according to claim 1, further comprising connecting means for matching the tempo value of the extracted feature waveform with that of a feature waveform of another piece of music.

The feature waveform extraction system according to claim 1, further comprising connecting means for rearranging the extracted feature waveforms in order of tempo value and connecting them.

The feature waveform extraction system according to claim 1, further comprising connecting means for matching beat positions of the extracted feature waveforms with the feature waveforms of other music pieces.

Receiving the input of the waveform information of the music;
A frequency analysis of the received waveform information to obtain a frequency fluctuation value in the time axis direction;
Extracting a fluctuation time having a frequency fluctuation value equal to or greater than a predetermined value from the obtained frequency fluctuation values;
Extracting waveform information in a predetermined section continuous from the fluctuation time;
Obtaining a repetition frequency between the extracted section waveforms;
Obtaining a sound pressure in the section waveform;
Extracting the fluctuation time of the section waveform in which the repetition frequency is a plurality of times or more and the sound pressure is maximum as a feature start time, and extracting waveform information up to a predetermined end time as a feature waveform;
A feature waveform extraction method for music, characterized by comprising: