JP2007052394A

JP2007052394A - Tempo detector, chord name detector and program

Info

Publication number: JP2007052394A
Application number: JP2006001194A
Authority: JP
Inventors: Ren Sumida; 錬澄田
Original assignee: Kawai Musical Instrument Manufacturing Co Ltd
Current assignee: Kawai Musical Instrument Manufacturing Co Ltd
Priority date: 2005-07-19
Filing date: 2006-01-06
Publication date: 2007-03-01
Anticipated expiration: 2026-01-06
Also published as: JP4767691B2

Abstract

PROBLEM TO BE SOLVED: To provide a tempo detector capable of detecting, from the sound signal of a performance with a fluctuating tempo, the average tempo of the entire piece, accurate beat positions, the meter of the piece, and the position of the first beat.

SOLUTION: The tempo detector comprises: an input part 1 for inputting a sound signal; a scale note level detecting part 2 for determining the level of each scale note at every prescribed time interval by performing an FFT operation on the sound signal; a beat detecting part 3 for summing the increments in level of all scale notes at each prescribed time interval to obtain a total level increment indicating the degree of change in the overall sound at that interval, and detecting the average beat interval and the position of each beat from this total; and a bar detecting part 4 for calculating the average level of each scale note for every beat, summing the increments in these average levels over all scale notes to obtain a value indicating the degree of change in the overall sound for every beat, and detecting the meter and the bar line positions from this value.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a tempo detection device, a chord name detection device, and a program.

In a conventional automatic accompaniment device, the user sets the performance tempo in advance, and the automatic performance follows that tempo. A performer playing along must therefore keep to the tempo of the automatic accompaniment, which is very difficult, especially for beginners. For this reason, there has been demand for an automatic accompaniment device that detects the tempo automatically from the performer's playing and accompanies at that tempo.

Further, in a music transcription device that detects chord names and note information from a sound source such as a music CD containing recorded performances, a function for detecting the tempo from the performance sound is indispensable as a preliminary processing stage.

One example of such a tempo detection device is the one disclosed in Patent Document 1 below.

The tempo detection device of Patent Document 1 detects accents attributable to volume and accents attributable to musical elements other than volume, based on performance information representing the pitch, volume, and sounding timing of each note of externally input performance sound. It predicts changes in the tempo of the performance information from both kinds of accent, and includes tempo changing means that makes an internally generated tempo follow the predicted tempo. Tempo detection therefore requires that note information already be available. When the performance is on an instrument that can output note information such as MIDI, this information is easily obtained; for an ordinary instrument without that capability, a transcription technique for detecting note information from the performance sound is needed.

An example of a tempo detection device that takes as input the performance sound of an ordinary instrument without a function for outputting note information such as MIDI, that is, an acoustic signal, is the configuration shown in Patent Document 2 below.

In the configuration shown in Patent Document 2, the input acoustic signal is subjected to time-division digital filtering to extract scale notes, the period at which each scale note occurs is detected from the envelope value of the detected scale note, and the tempo is detected from that period and a meter of the input acoustic signal specified in advance. Because this tempo detection device does not detect note information, it can also be used as preprocessing for a transcription device that detects chord names and note information.

A similar tempo detection device is described in Non-Patent Document 1, cited below.

Chords, on the other hand, are a very important element of popular music. When a small band plays music of this genre, it usually does not use a score in which every note to be played is written out; instead it uses a score called a chord chart or lead sheet, which gives only the melody and the chord progression. To play a song from a commercial CD or the like in a band, the chord progression of the song must therefore be transcribed, but this work can be done only by specialists with particular musical knowledge and has been out of reach for ordinary people. Accordingly, there has been demand for an automatic transcription device that detects chord names from a music acoustic signal using a commercially available personal computer or the like.

One device for detecting chords from such a music acoustic signal is the configuration of Patent Document 3 below. In that configuration, fundamental frequency candidates are extracted from the calculated power spectrum, candidates judged to be harmonics are removed to detect note information, and chords are detected from the note information.

However, in the configuration of Patent Document 3, removing the harmonics is known to be very difficult because of differences in harmonic structure between instrument types, differences in how harmonics appear depending on key-striking strength, changes in harmonic power over time, and phase interference between notes that share the same frequency as a harmonic component. In other words, the note detection step cannot be expected to work reliably on sound sources such as ordinary music CDs, in which many instruments and vocals are mixed.

Similarly, as a device for detecting chords from a music acoustic signal, there is the configuration of Patent Document 4, cited below. In that configuration, digital filtering with different characteristics is applied to the input acoustic signal in a time-division manner to detect the level of each scale note; the detected levels that stand in the same scale relationship within the octave are summed; and chords are detected using a predetermined number of the largest summed levels. Because this method does not detect the individual notes contained in the acoustic signal, the problem described for Patent Document 3 does not arise.

Japanese Patent No. 3231482
Japanese Patent No. 3127406
Masataka Goto, "Real-time beat tracking system" (Kyoritsu Shuppan, computer science magazine bit, Vol. 28, No. 3, 1996)
Japanese Patent No. 2876861
Japanese Patent No. 3156299

However, in the tempo detection device of Patent Document 2, the part that detects the occurrence period of a scale note from its envelope does so by finding the maximum envelope value and detecting the portions that are at or above a predetermined proportion of that maximum. If the proportion is fixed at a single value in this way, sounding timings are detected or missed depending on how loud the notes are, and this greatly affects the final tempo determination.
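The weakness of a fixed proportion can be illustrated with a small sketch. The function name, the envelope values, and the 0.5 ratio are all hypothetical and are not taken from Patent Document 2; the code only shows the general behaviour of thresholding against a global maximum.

```python
def onsets_by_fixed_ratio(envelope, ratio):
    """Return indices whose envelope value is at least `ratio` of the maximum.

    Illustrative only: a stand-in for "detect the portions at or above a
    predetermined proportion of the maximum envelope value".
    """
    threshold = ratio * max(envelope)
    return [i for i, value in enumerate(envelope) if value >= threshold]


# Hypothetical envelope: loud notes at indices 1 and 5, a quiet note at index 3.
envelope = [0.0, 1.0, 0.1, 0.3, 0.05, 0.9]
detected = onsets_by_fixed_ratio(envelope, 0.5)
```

With these made-up values the quiet note at index 3 falls below the threshold and is not reported, so the apparent note period doubles.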

The beat tracking system of Non-Patent Document 1 likewise extracts the rising (onset) components of sound from the frequency spectrum obtained by applying an FFT to the acoustic signal. As with the tempo detection device of Patent Document 2, whether these onsets can be detected greatly affects the final tempo determination.

A question common to both tempo detection devices is which scale note, or frequency, is used to detect the onsets. If a song happens to carry a fine rhythm at the scale note (frequency) being examined, a tempo that is too fast is detected by mistake.

In the configuration of Patent Document 4 for detecting chords from a music acoustic signal, the scale note levels are summed for notes in the same scale relationship within the octave, that is, for each of the 12 pitch names. As a result, chords consisting of the same constituent notes cannot be told apart; for example, Am7 (A, C, E, G; la, do, mi, sol) and C6 (C, E, G, A; do, mi, sol, la) cannot be distinguished.
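The ambiguity can be made concrete with a short sketch. The MIDI note numbers, the two voicings, and the function name are assumptions chosen for illustration; only the folding of levels into 12 pitch names comes from the description of Patent Document 4.

```python
def fold_to_pitch_classes(levels_by_midi):
    """Sum scale note levels into 12 bins, one per pitch name (C = 0 ... B = 11)."""
    folded = [0.0] * 12
    for midi_note, level in levels_by_midi.items():
        folded[midi_note % 12] += level
    return folded


# Hypothetical voicings, keyed by MIDI note number with equal levels.
am7 = {45: 1.0, 60: 1.0, 64: 1.0, 67: 1.0}  # A2, C4, E4, G4
c6 = {48: 1.0, 64: 1.0, 67: 1.0, 69: 1.0}   # C3, E4, G4, A4

same_after_folding = fold_to_pitch_classes(am7) == fold_to_pitch_classes(c6)
lowest_am7 = min(am7) % 12  # pitch class of the lowest note: 9 (A)
lowest_c6 = min(c6) % 12    # pitch class of the lowest note: 0 (C)
```

After folding, the two chords are identical, while the lowest notes still differ. This is why the chord name detection device described later in this document also detects a bass note.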

In addition, the chord detection device of Patent Document 4 has no tempo or bar detection function; chords are detected at predetermined timings. It assumes a case in which the tempo of the song is set in advance and the performer plays along with a metronome sounding at that tempo. When the device is applied to an already-recorded acoustic signal such as a music CD, it can detect a chord name at fixed time intervals, but because it detects neither tempo nor bars, it cannot output the result in the form of a chord chart or lead sheet, in which the chord name of each bar is written.

Even if the tempo of the song were supplied, the tempo of a performance recorded on a music CD is generally not constant but fluctuates somewhat, so the chord of each bar cannot be detected correctly.

Moreover, playing at an exact tempo in time with a metronome or the like sounding at a constant tempo is very difficult for a beginner, and the tempo of a performance generally fluctuates.

Furthermore, Patent Document 4 adopts time-division digital filtering with different characteristics for the input acoustic signal, giving as the reason that an FFT has poor frequency resolution in the low range. However, a reasonable frequency resolution can be obtained even in the low range by downsampling the input acoustic signal before the FFT. Digital filtering also requires an envelope extraction unit to obtain the level of the filter output signal, whereas with an FFT the power after the transform itself represents the level at each frequency, so no such unit is needed. An FFT has the further advantage that frequency resolution and time resolution can be set freely by choosing the number of FFT points and the shift amount appropriately.
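The relationship between downsampling, the number of FFT points, the shift amount, and the resulting resolutions follows standard FFT arithmetic and can be sketched as below. The function name and the numeric values (44.1 kHz input, 4096 points, shift of 512 samples, downsampling by 16) are assumptions for illustration and do not appear in this section.

```python
def fft_resolution(sample_rate_hz, fft_points, shift_samples, downsample=1):
    """Return (frequency resolution in Hz, time resolution in seconds).

    Frequency resolution is the spacing between FFT bins; time resolution is
    the interval between successive FFT frames after downsampling.
    """
    effective_rate = sample_rate_hz / downsample
    frequency_resolution = effective_rate / fft_points
    time_resolution = shift_samples / effective_rate
    return frequency_resolution, time_resolution


# Hypothetical parameter choices.
full_rate = fft_resolution(44100, 4096, 512)
downsampled = fft_resolution(44100, 4096, 512, downsample=16)
```

Under these assumed values, downsampling by 16 makes the bin spacing 16 times finer (about 0.67 Hz instead of about 10.8 Hz) at the cost of a 16 times coarser frame interval, which is the trade-off the FFT point count and shift amount let the designer adjust.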

The present invention was devised in view of the above problems. Its object is to provide a tempo detection device capable of detecting, from the acoustic signal of a human performance whose tempo fluctuates, the average tempo of the whole song and the accurate position of each beat, as well as the meter of the song and the position of the first beat.

Another object of the present invention is to provide a chord name detection device with which even a person who is not a specialist with particular musical knowledge can detect chord names from a music acoustic signal (audio signal), such as a music CD, in which the sounds of several instruments are mixed.

More specifically, the object is to provide a chord name detection device that can determine chords from the overall sound of the input acoustic signal without detecting individual note information.

In addition, the object is to provide a chord name detection device that can distinguish chords having the same constituent notes and can detect the chord of each bar even when the tempo of the performance fluctuates or, conversely, when the performer deliberately varies the tempo.

Thus, the object is to provide a chord name detection device that, with only a simple configuration, can perform at the same time both beat detection, which requires time resolution (the same configuration as the tempo detection device above), and chord detection, which requires frequency resolution (a configuration that builds on the tempo detection device to detect chords as well).

Computer programs for tempo detection and chord name detection that implement these devices on a computer are also provided.

To this end, the tempo detection device according to the present invention is basically characterized by comprising:
input means for inputting an acoustic signal;
scale note level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals and obtaining the level of each scale note at each predetermined time;
beat detection means for summing, over all scale notes, the increment in the level of each scale note at each predetermined time to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and for detecting the average beat interval and the position of each beat from that total; and
bar detection means for calculating the average level of each scale note for each beat, summing over all scale notes the increment in that per-beat average level to obtain a value indicating the degree of change in the overall sound for each beat, and detecting the meter and the bar line positions from that value.
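The scale note level detection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the invention: the pure-Python radix-2 FFT stands in for whatever FFT implementation is used, no window function is applied, and the rule that a note's level is the peak power among bins within 50 cents of its nominal frequency is an assumption, as are the function names and the MIDI note range.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even, odd = fft(samples[0::2]), fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def scale_note_levels(frame, sample_rate, low_midi=48, high_midi=83):
    """Map one frame to {MIDI note: level} (assumed rule: peak power within 50 cents)."""
    n = len(frame)
    power = [abs(c) ** 2 for c in fft(frame)[: n // 2]]
    levels = {}
    for midi in range(low_midi, high_midi + 1):
        center = 440.0 * 2 ** ((midi - 69) / 12)           # nominal note frequency
        low, high = center * 2 ** (-1 / 24), center * 2 ** (1 / 24)
        bins = [k for k in range(1, n // 2) if low <= k * sample_rate / n < high]
        levels[midi] = max((power[k] for k in bins), default=0.0)
    return levels
```

Calling `scale_note_levels` once per frame, with frames taken at the predetermined time interval, would give the per-time scale note levels that the beat detection means consumes.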

With this configuration, the scale note level detection means obtains the level of each scale note at each predetermined time from the acoustic signal supplied to the input means. The beat detection means sums the increments in the level of each scale note over all scale notes to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and from this total detects the average beat interval (that is, the tempo) and the position of each beat. The bar detection means then calculates the average level of each scale note for each beat, sums the increments in these per-beat average levels over all scale notes to obtain a value indicating the degree of change in the overall sound for each beat, and from this value detects the meter and the bar line positions (the positions of the first beat).
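The beat detection step can be sketched as below. Two points are assumptions rather than statements from this section: only positive level changes are counted as increments, and the average beat interval is taken as the lag that maximizes the autocorrelation of the total increment. The function names and the lag bounds are likewise hypothetical.

```python
def total_level_increments(frames):
    """frames[t][i] is the level of scale note i at time step t.

    Returns one value per time step: the sum over all notes of the increase in
    level since the previous step (decreases are counted as zero, by assumption).
    """
    totals = [0.0]
    for previous, current in zip(frames, frames[1:]):
        totals.append(sum(max(c - p, 0.0) for p, c in zip(previous, current)))
    return totals


def average_beat_interval(totals, min_lag, max_lag):
    """Assumed method: return the lag (in time steps) with the largest autocorrelation."""
    def autocorrelation(lag):
        return sum(a * b for a, b in zip(totals, totals[lag:]))

    return max(range(min_lag, max_lag + 1), key=autocorrelation)
```

Because the increments are summed over every scale note, the result does not depend on picking one particular note or frequency band in which to look for onsets.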

In short, the level of each scale note at each predetermined time is obtained from the input acoustic signal; the average beat interval (that is, the tempo) and the position of each beat are detected from the changes in these levels over time; and the meter and the bar line positions (the positions of the first beat) are then detected from the changes in the scale note levels from beat to beat.
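The per-beat quantity used for bar detection can be sketched in the same style. The function names and the data layout are hypothetical, and counting only positive changes is an assumption; how the meter and bar lines are then derived from the per-beat values is not covered by this sketch.

```python
def beat_average_levels(frames, beat_starts):
    """Average each scale note's level over the time steps belonging to each beat.

    frames[t][i] is the level of note i at step t; beat_starts lists the first
    step of each beat in increasing order.
    """
    bounds = list(beat_starts) + [len(frames)]
    averages = []
    for start, end in zip(bounds, bounds[1:]):
        span = frames[start:end]
        averages.append([sum(column) / len(span) for column in zip(*span)])
    return averages


def per_beat_change(averages):
    """Sum over all notes of the increase in average level from the previous beat."""
    changes = [0.0]
    for previous, current in zip(averages, averages[1:]):
        changes.append(sum(max(c - p, 0.0) for p, c in zip(previous, current)))
    return changes
```

A large per-beat change suggests that the overall sound changed at that beat, which is the kind of evidence the bar detection means uses to locate first beats.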

The chord name detection device is characterized by comprising:
input means for inputting an acoustic signal;
first scale note level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals, using parameters suited to beat detection, and obtaining the level of each scale note at each predetermined time;
beat detection means for summing, over all scale notes, the increment in the level of each scale note at each predetermined time to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and for detecting the average beat interval and the position of each beat from that total;
bar detection means for calculating the average level of each scale note for each beat, summing over all scale notes the increment in that per-beat average level to obtain a value indicating the degree of change in the overall sound for each beat, and detecting the meter and the bar line positions from that value;
second scale note level detection means for performing an FFT operation on the input acoustic signal at a predetermined time interval different from the one used for beat detection, using parameters suited to chord detection, and obtaining the level of each scale note at each predetermined time;
bass note detection means for detecting a bass note from the levels of the low-range scale notes within each bar, among the detected scale note levels; and
chord name determination means for determining the chord name of each bar from the detected bass note and the level of each scale note.

When the bass note detection means detects more than one bass note within a bar, the chord name determination means divides the bar into several chord detection ranges according to the bass note detection result, and determines the chord name of each chord detection range from the bass note and the level of each scale note in that range.

With this configuration, the first scale note level detection means first performs an FFT operation on the acoustic signal supplied from the input means, at predetermined time intervals and with parameters suited to beat detection, and thereby obtains the level of each scale note at each predetermined time. The beat detection means detects the average beat interval and the position of each beat from the changes in these levels. Next, the bar detection means detects the meter and the bar line positions from the changes in the scale note levels from beat to beat. The second scale note level detection means then performs an FFT operation on the input acoustic signal at a predetermined time interval different from the one used for beat detection, this time with parameters suited to chord detection, and thereby obtains the level of each scale note at each predetermined time. The bass note detection means detects the bass note of each bar from the levels of the low-range scale notes, and the chord name determination means determines the chord name of each bar from the detected bass note and the level of each scale note.
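Determining a chord name from a bass note and scale note levels can be sketched with simple template matching. Everything specific here is an assumption for illustration: the five chord templates, the scoring rule (level inside the template minus level outside it), the simplification that the bass note is the chord root, and the function name. The section itself states only that the chord name is determined from the bass note and the scale note levels.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical templates: semitone offsets from the root for a few chord types.
CHORD_TEMPLATES = {
    "": (0, 4, 7),
    "m": (0, 3, 7),
    "7": (0, 4, 7, 10),
    "m7": (0, 3, 7, 10),
    "6": (0, 4, 7, 9),
}


def chord_name(bass_pitch_class, pitch_class_levels):
    """Pick the template rooted on the bass note that best explains the levels.

    pitch_class_levels is a list of 12 levels indexed by pitch class (C = 0).
    """
    total = sum(pitch_class_levels)

    def score(quality):
        inside = sum(pitch_class_levels[(bass_pitch_class + offset) % 12]
                     for offset in CHORD_TEMPLATES[quality])
        return inside - (total - inside)  # reward explained level, penalize the rest

    return NOTE_NAMES[bass_pitch_class] + max(CHORD_TEMPLATES, key=score)


# The notes A, C, E, G at equal level, folded into 12 pitch classes.
levels = [1.0 if pc in (9, 0, 4, 7) else 0.0 for pc in range(12)]
```

With these levels, `chord_name(9, levels)` gives `"Am7"` and `chord_name(0, levels)` gives `"C6"`, showing how a detected bass note separates chords that share the same constituent notes.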

As noted above, when the bass note detection means detects more than one bass note within a bar, the chord name determination means divides the bar into several chord detection ranges according to the bass note detection result, and determines the chord name of each range from the bass note and the level of each scale note in that range.
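Dividing a bar into chord detection ranges can be sketched as follows. The granularity is an assumption: this sketch supposes one detected bass pitch class per beat and starts a new range whenever the bass changes, whereas the section does not state at what resolution the bass note is examined. The function name is hypothetical.

```python
def chord_detection_ranges(beat_basses):
    """Split one bar into runs of beats that share the same bass note.

    beat_basses holds the bass pitch class detected for each beat of the bar
    (an assumed representation). Returns (first_beat, end_beat_exclusive, bass)
    tuples in order.
    """
    ranges = []
    start = 0
    for i in range(1, len(beat_basses) + 1):
        if i == len(beat_basses) or beat_basses[i] != beat_basses[start]:
            ranges.append((start, i, beat_basses[start]))
            start = i
    return ranges
```

For a four-beat bar with bass notes `[0, 0, 7, 7]` this yields two ranges, `(0, 2, 0)` and `(2, 4, 7)`, each of which would then get its own chord name; a bar with a single bass note stays as one range.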

Further, claim 9 defines a computer-executable program for causing a computer to carry out the configuration of claim 1. That is, as a configuration for solving the problems described above, it is a program that can be read and executed by a computer and that realizes each of the above means by using the configuration of the computer. Here, the computer may be a general-purpose computer that includes a central processing unit, or a dedicated machine directed to specific processing; there is no particular limitation as long as it includes a central processing unit.

When the program for realizing each of the above means is read by the computer, function realizing means equivalent to those defined in claim 1 are achieved.

More specifically, claim 9 is a tempo detection program characterized by causing a computer to function as:
input means for inputting an acoustic signal;
scale note level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals and obtaining the level of each scale note at each predetermined time;
beat detection means for summing, over all scale notes, the increment in the level of each scale note at each predetermined time to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and for detecting the average beat interval and the position of each beat from that total; and
bar detection means for calculating the average level of each scale note for each beat, summing over all scale notes the increment in that per-beat average level to obtain a value indicating the degree of change in the overall sound for each beat, and detecting the meter and the bar line positions from that value.

Further, claim 10 defines a computer-executable program for causing a computer to carry out the configuration of claim 7. That is, when the program for causing a computer to realize each of the above means is read by the computer, function realizing means equivalent to those defined in claim 7 are achieved.

More specifically, claim 10 is a chord name detection program characterized by causing a computer to function as:
input means for inputting an acoustic signal;
first scale note level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals, using parameters suited to beat detection, and obtaining the level of each scale note at each predetermined time;
beat detection means for summing, over all scale notes, the increment in the level of each scale note at each predetermined time to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and for detecting the average beat interval and the position of each beat from that total;
bar detection means for calculating the average level of each scale note for each beat, summing over all scale notes the increment in that per-beat average level to obtain a value indicating the degree of change in the overall sound for each beat, and detecting the meter and the bar line positions from that value;
second scale note level detection means for performing an FFT operation on the input acoustic signal at a predetermined time interval different from the one used for beat detection, using parameters suited to chord detection, and obtaining the level of each scale note at each predetermined time;
bass note detection means for detecting a bass note from the levels of the low-range scale notes within each bar, among the detected scale note levels; and
chord name determination means for determining the chord name of each bar from the detected bass note and the level of each scale note.

以上のようなプログラムの構成であれば、既存のハードウェア資源を用いてこのプログラムを使用することにより、既存のハードウェアで新たなアプリケーションとしての本発明の夫々の装置が容易に実現できるようになる。 With the program configuration as described above, by using this program using the existing hardware resources, each device of the present invention as a new application can be easily realized with the existing hardware. Become.

In the form of a program, the invention can easily be used, distributed, and sold, for example over a communication network. Running the program on existing hardware resources also makes it easy to implement the device of the present invention as a new application on that hardware.

Some of the function realizing means recited in claims 9 and 10 may be realized by functions built into the computer (either functions built into the computer as hardware, or functions provided by the operating system or other application programs installed on the computer), and the program may include instructions that call or link to those functions.

This is because, when part of a function realizing means defined in claims 1 and 7 is carried out by part of a function provided by, for example, the operating system, no program or module realizing that function exists directly; however, as long as the relevant part of the operating system function is called or linked, the configuration is substantially the same.

According to the tempo detection device of claims 1 to 6 and the program of claim 9, the average tempo of the whole piece, the accurate position of each beat, the time signature, and the position of the first beat of each measure can be detected from the acoustic signal of a human performance whose tempo fluctuates.

According to the chord name detection device of claims 7 and 8 and the program of claim 10, even a user without specialist musical knowledge can detect chord names from the overall sound of an input music acoustic signal (audio signal) in which several instruments are mixed, such as a music CD, without detecting individual notes.

Furthermore, with this configuration, chords that share the same constituent notes can be told apart, and the chord of each measure can be detected even when the tempo of the performance fluctuates unintentionally or when the source is deliberately played with a fluctuating tempo.

In particular, with the latter configuration of the chord name detection device of claims 7 and 8 and the program of claim 10, a simple configuration suffices to carry out at the same time both beat detection, which requires time resolution (the same configuration as the tempo detection device above), and chord detection, which requires frequency resolution (a configuration that builds on the tempo detection device to detect chords as well).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is an overall block diagram of the tempo detection device according to the present invention. As shown in the figure, the tempo detection device comprises: an input unit 1 that inputs an acoustic signal; a scale note level detection unit 2 that performs an FFT operation on the input acoustic signal at predetermined time intervals and obtains the level of each scale note for each predetermined time; a beat detection unit 3 that sums the increments in the level of each scale note for each predetermined time over all scale notes to obtain a total level increment indicating the degree of change in the overall sound for each predetermined time, and detects the average beat interval and the position of each beat from that total; and a measure detection unit 4 that calculates the average level of each scale note for each beat, sums the increments in that average level over all scale notes to obtain a value indicating the degree of change in the overall sound for each beat, and detects the time signature and the bar line positions from that value.

The input unit 1 inputs the music acoustic signal on which tempo detection is to be performed. An analog signal from a device such as a microphone may be converted to a digital signal by an A/D converter (not shown), or, for music that is already digitized such as a music CD, the data may be imported directly as a file (ripped) and then selected and opened. If the digital signal obtained in this way is stereo, it is converted to mono to simplify the subsequent processing.

This digital signal is input to the scale note level detection unit 2, which consists of the units shown in FIG. 2.

Of these, the waveform preprocessing unit 20 downsamples the acoustic signal from the input unit 1 to a sampling frequency suited to the subsequent processing.

The downsampling rate is determined by the pitch range of the instruments used for beat detection. To have high-range rhythm instruments such as cymbals and hi-hats contribute to beat detection, the sampling frequency after downsampling must be kept high; if beats are detected mainly from the bass, from instruments such as the bass drum and snare drum, and from mid-range instruments, it need not be very high.

For example, if the highest note to be detected is A6 (with C4 as middle C), the fundamental frequency of A6 is about 1760 Hz (taking A4 = 440 Hz), so the sampling frequency after downsampling only needs to be at least 3520 Hz, which puts the Nyquist frequency at 1760 Hz or higher. When the original sampling frequency is 44.1 kHz (music CD), a downsampling rate of about 1/12 is therefore sufficient, giving a sampling frequency of 3675 Hz after downsampling.

Downsampling is normally performed by passing the signal through a low-pass filter that removes components at or above the Nyquist frequency, that is, half the sampling frequency after downsampling (1837.5 Hz in this example), and then skipping samples (in this example, discarding 11 of every 12 waveform samples).
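The arithmetic behind the 1/12 rate can be checked with a minimal Python sketch. The function names are illustrative and do not come from the patent, and the decimation step assumes that the input has already been low-pass filtered as described above.

```python
def decimation_factor(source_hz: float, highest_note_hz: float) -> int:
    """Largest integer factor that keeps the new Nyquist frequency
    at or above the highest note to be detected."""
    min_rate_hz = 2.0 * highest_note_hz       # required sampling frequency
    return int(source_hz // min_rate_hz)


def decimate(samples, factor):
    """Keep one sample out of every `factor` samples.
    Assumes the samples were already low-pass filtered."""
    return samples[::factor]


factor = decimation_factor(44100.0, 1760.0)   # 12
new_rate_hz = 44100.0 / factor                # 3675.0
nyquist_hz = new_rate_hz / 2.0                # 1837.5
```

With these values, `decimate` keeps 1 of every 12 samples, which matches the "discard 11 of 12" description.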

The purpose of downsampling is to shorten the FFT computation time by reducing the number of FFT points needed to obtain the same frequency resolution in the subsequent FFT operation.

Downsampling of this kind is needed when the source has already been sampled at a fixed sampling frequency, as with a music CD. When the input unit 1 instead converts an analog signal from a device such as a microphone into a digital signal with an A/D converter, the waveform preprocessing unit can be omitted by setting the sampling frequency of the A/D converter to the post-downsampling sampling frequency.

Once the waveform preprocessing unit 20 has finished downsampling, the FFT operation unit 21 applies an FFT (fast Fourier transform) to the output signal of the waveform preprocessing unit at predetermined time intervals.

The FFT parameters (the number of FFT points and the shift of the FFT window) are set to values suited to beat detection. One property of the FFT must be taken into account: increasing the number of FFT points to raise the frequency resolution enlarges the FFT window, so each FFT covers a longer stretch of time and the time resolution falls. For beat detection it is therefore better to raise the time resolution at the expense of frequency resolution. It is possible to keep the time resolution from degrading with a large number of FFT points by placing waveform data in only part of the window and filling the rest with zeros, instead of using a waveform as long as the window, but a certain number of waveform samples is still needed to detect low-range power correctly.

With these points in mind, this embodiment uses 512 FFT points and a window shift of 32 samples, with no zero padding. These settings give a time resolution of about 8.7 ms and a frequency resolution of about 7.2 Hz. A time resolution of about 8.7 ms is sufficient, considering that a 32nd note lasts 25 ms in a piece with a tempo of quarter note = 300.
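These figures follow directly from the settings above, as the short Python check below illustrates. The constant and function names are illustrative only; the 3675 Hz sampling frequency is the post-downsampling value from the earlier example.

```python
SAMPLE_RATE_HZ = 3675.0   # sampling frequency after downsampling
FFT_POINTS = 512          # FFT size used in this embodiment
HOP_SAMPLES = 32          # window shift between successive FFTs

# One frame is produced per window shift, so the shift sets the time resolution.
time_resolution_ms = 1000.0 * HOP_SAMPLES / SAMPLE_RATE_HZ   # about 8.7 ms

# Adjacent FFT bins are spaced by (sampling frequency / FFT points).
freq_resolution_hz = SAMPLE_RATE_HZ / FFT_POINTS             # about 7.2 Hz


def note_length_ms(quarter_notes_per_minute: float, notes_per_quarter: int) -> float:
    """Duration in milliseconds of a note that divides a quarter note evenly."""
    return 60000.0 / quarter_notes_per_minute / notes_per_quarter


thirty_second_note_ms = note_length_ms(300, 8)               # 25.0 ms
```

Since one frame (about 8.7 ms) is roughly a third of the shortest note considered (25 ms), note onsets at that tempo still fall in distinct frames.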

In this way the FFT operation is performed at each predetermined time interval, the power is calculated as the square root of the sum of the squares of the real part and the imaginary part, and the result is sent to the level detection unit 22.

The level detection unit 22 calculates the level of each scale note from the power spectrum computed by the FFT operation unit 21. Because the FFT only yields power at frequencies that are integer multiples of the sampling frequency divided by the number of FFT points, the level of each scale note is derived from the power spectrum as follows. For every note whose level is calculated (C1 to A6), the level of that scale note is taken to be the power of the strongest spectral component among those whose frequencies lie within 50 cents above or below the fundamental frequency of the note (100 cents being one semitone).
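The mapping from FFT bins to scale notes can be sketched in Python as follows. This is only an illustration under stated assumptions: notes are indexed by MIDI note number (C1 = 24, A4 = 69, A6 = 93, with C4 as middle C), a note with no FFT bin inside its ±50 cent band is given level 0, and all function names are hypothetical.

```python
import math

A4_HZ = 440.0


def note_frequency(midi_note: int) -> float:
    """Equal-tempered fundamental frequency, with MIDI note 69 as A4."""
    return A4_HZ * 2.0 ** ((midi_note - 69) / 12.0)


def bins_for_note(midi_note: int, sample_rate_hz: float, fft_points: int):
    """FFT bin indices whose centre frequency lies within +/-50 cents of the note."""
    f0 = note_frequency(midi_note)
    low_hz = f0 * 2.0 ** (-50.0 / 1200.0)
    high_hz = f0 * 2.0 ** (50.0 / 1200.0)
    bin_hz = sample_rate_hz / fft_points
    first = math.ceil(low_hz / bin_hz)
    last = min(math.floor(high_hz / bin_hz), fft_points // 2)
    return list(range(first, last + 1))


def note_levels(power_spectrum, sample_rate_hz, fft_points, lowest=24, highest=93):
    """Level of each note from C1 (24) to A6 (93): the largest power in its band.
    Notes with no bin in their band get 0.0 (an assumption of this sketch)."""
    levels = []
    for midi_note in range(lowest, highest + 1):
        bins = bins_for_note(midi_note, sample_rate_hz, fft_points)
        levels.append(max((power_spectrum[b] for b in bins), default=0.0))
    return levels
```

With 512 points at 3675 Hz the bins are about 7.2 Hz apart, so `bins_for_note(24, 3675.0, 512)` is empty for C1, whereas every note from C3 upward has a band wider than the bin spacing and therefore always contains at least one bin.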

Once the levels of all scale notes have been detected, they are stored in a buffer, the waveform read position is advanced by the predetermined time interval (32 samples in the example above), and the processing of the FFT operation unit 21 and the level detection unit 22 is repeated until the end of the waveform.

As a result, the level of each scale note for each predetermined time of the acoustic signal input to the input unit 1 is stored in the buffer 23.

Next, the beat detection unit 3 of FIG. 1 is described. The beat detection unit 3 operates according to the processing flow shown in FIG. 3.

The beat detection unit 3 detects the average beat interval (that is, the tempo) and the beat positions from the changes in the level of each scale note output by the scale note level detection unit for each predetermined time (one such predetermined time is called one frame below). To do so, the beat detection unit 3 first calculates the sum of the level increments of the scale notes, that is, the increase in level from the previous frame summed over all scale notes, where a note whose level has decreased from the previous frame contributes 0 (step S100).

That is, letting L_i(t) be the level of the i-th scale note at frame time t, the level increment L_add,i(t) of the i-th scale note is given by Equation 1, and the sum L(t) of the level increments of all scale notes at frame time t is then calculated from L_add,i(t) by Equation 2, where T is the total number of scale notes.

This total L(t) represents the degree of change in the overall sound for each frame. It rises sharply when a note starts to sound, and the more notes start at the same time, the larger it becomes. Because notes in music often start on the beat, a position where this value is large is likely to be a beat position.
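The calculation of step S100 can be sketched in Python as below. The sketch follows the verbal description (increase from the previous frame, decreases counted as 0, summed over all scale notes); assigning 0 to the first frame, which has no predecessor, is an assumption, and the function name is illustrative.

```python
def level_increment_sum(levels):
    """levels[t][i] is the level of scale note i at frame t.
    Returns the list L(t): the summed positive level increases per frame."""
    if not levels:
        return []
    totals = [0.0]  # first frame has no previous frame (assumed to be 0)
    for previous, current in zip(levels, levels[1:]):
        # Only increases count; a note that got quieter contributes 0.
        totals.append(sum(max(c - p, 0.0) for p, c in zip(previous, current)))
    return totals


example_levels = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 2.0],   # two notes start: increase of 1 + 2
    [0.5, 3.0, 2.0],   # first note decays (ignored), second note starts
]
onset_strength = level_increment_sum(example_levels)
```

Here `onset_strength` is `[0.0, 3.0, 3.0]`: the decay of the first note in the last frame does not reduce the total.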

As an example, FIG. 4 shows the waveform of part of a piece, the level of each scale note, and the sum of the level increments of the scale notes. The top panel is the waveform; the middle panel shows the level of each scale note in each frame as shading (low notes at the bottom, high notes at the top; the range in this figure is C1 to A6); and the bottom panel shows the sum of the level increments of the scale notes for each frame. The scale note levels in this figure are those output by the scale note level detection unit, so the frequency resolution is about 7.2 Hz and the levels of some scale notes at G#2 and below cannot be calculated, leaving gaps. Since the purpose here is beat detection, the inability to measure the levels of some low notes is not a problem.

As the bottom panel of the figure shows, the sum of the level increments of the scale notes has periodic peaks. The positions of these periodic peaks are the beat positions.

To find the beat positions, the beat detection unit 3 first finds the interval between these periodic peaks, that is, the average beat interval. The average beat interval can be calculated from the autocorrelation of the sum of the level increments of the scale notes (FIG. 3; step S102).

Letting L(t) be the sum of the level increments of the scale notes at frame time t, its autocorrelation φ(τ) is calculated by Equation 3.

Here, N is the total number of frames and τ is the time lag.

FIG. 5 is a conceptual diagram of the autocorrelation calculation. As the figure shows, φ(τ) is large when the time lag τ is an integer multiple of the period of the peaks of L(t). The tempo of the piece can therefore be found by finding the maximum of φ(τ) over a certain range of τ.
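A minimal Python sketch of this idea follows. Equation 3 itself is not reproduced in this text, so the form used here, the sum of L(t)·L(t+τ) over the overlapping frames divided by N − τ, is an assumed, commonly used definition rather than the patent's exact formula; the function names are illustrative.

```python
def autocorrelation(signal, lag):
    """Mean of signal[t] * signal[t + lag] over the overlapping frames.
    The (N - lag) normalisation is an assumption of this sketch."""
    n = len(signal)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(signal)")
    return sum(signal[t] * signal[t + lag] for t in range(n - lag)) / (n - lag)


def best_lag(signal, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(signal, lag))


# Synthetic L(t) with a peak every 5 frames.
pulses = [1.0 if t % 5 == 0 else 0.0 for t in range(50)]
period = best_lag(pulses, 3, 8)
```

For this synthetic input the lag found is 5, the spacing of the peaks. Lags of 10, 15 and so on score equally well under this definition, which is one reason a single global maximum is not always the beat interval.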

The range of τ over which the autocorrelation is computed can be chosen according to the range of tempos expected. For example, to cover metronome markings from quarter note = 30 to quarter note = 300, the autocorrelation is computed for lags from 0.2 seconds to 2 seconds. Time in seconds is converted to frames by Equation 4.
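The conversion can be illustrated in Python. Equation 4 is not reproduced in this text; the sketch assumes the natural conversion, frames = seconds × sampling frequency ÷ window shift, with the 3675 Hz and 32-sample values of this embodiment, and rounds to the nearest frame (the rounding is also an assumption).

```python
def beat_seconds(quarter_notes_per_minute: float) -> float:
    """Length of one beat in seconds for a given metronome marking."""
    return 60.0 / quarter_notes_per_minute


def seconds_to_frames(seconds: float, sample_rate_hz: float = 3675.0,
                      hop_samples: int = 32) -> int:
    """Number of analysis frames spanning `seconds` (one frame per window shift)."""
    return round(seconds * sample_rate_hz / hop_samples)


shortest_lag_s = beat_seconds(300)                  # 0.2 s
longest_lag_s = beat_seconds(30)                    # 2.0 s
min_lag_frames = seconds_to_frames(shortest_lag_s)  # 23
max_lag_frames = seconds_to_frames(longest_lag_s)   # 230
```

Under these assumptions the autocorrelation would be evaluated for lags of roughly 23 to 230 frames.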

The τ that maximizes the autocorrelation φ(τ) in this range could be taken as the beat interval, but the τ with the largest autocorrelation is not the beat interval for every piece. It is therefore better to derive candidate beat intervals from the values of τ at which the autocorrelation has a local maximum (FIG. 3; step S104) and to let the user choose the beat interval from among these candidates (FIG. 3; step S106).

Once the beat interval has been decided in this way (the decided beat interval is denoted τ_max), the position of the first beat is determined.

The method of determining the first beat position is explained with reference to FIG. 6. The upper panel of FIG. 6 is L(t), the sum of the level increments of the scale notes at frame time t, and the lower panel M(t) is a function that takes a value at intervals of the decided beat interval τ_max. Expressed as a formula, it is as shown in Equation 5.

The cross-correlation between L(t) and M(t) is calculated while shifting the function M(t) over the range 0 to τ_max − 1.

Given the properties of M(t), the cross-correlation r(s) can be calculated by Equation 6.

Here n can be chosen to suit the length of the initial silent portion (n = 10 in the example of FIG. 6).

r(s) is calculated for s from 0 to τ_max − 1, and the s that maximizes r(s) is found; the frame at that s is the first beat position.
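The search for the first beat can be sketched in Python. Equations 5 and 6 are not reproduced in this text, so the sketch assumes that M(t) is a train of n unit pulses spaced τ_max frames apart, which reduces the cross-correlation to r(s) = L(s) + L(s + τ_max) + … + L(s + (n − 1)·τ_max). The function name is illustrative.

```python
def first_beat_offset(onset_strength, beat_interval, n_pulses):
    """Shift s in [0, beat_interval) that maximises the sum of onset strength
    sampled at s, s + beat_interval, ... (n_pulses samples in total)."""
    def score(shift):
        total = 0.0
        for j in range(n_pulses):
            index = shift + j * beat_interval
            if index < len(onset_strength):   # ignore pulses past the end
                total += onset_strength[index]
        return total

    return max(range(beat_interval), key=score)


# Synthetic L(t): onsets every 10 frames, starting at frame 3.
strength = [1.0 if t % 10 == 3 else 0.0 for t in range(100)]
first_beat = first_beat_offset(strength, beat_interval=10, n_pulses=5)
```

For this input the function returns 3, the frame of the first onset.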

Once the first beat position has been determined, the subsequent beat positions are determined one at a time (FIG. 3; step S108).

The method is explained with reference to FIG. 7. Suppose the first beat has been found at the position of the triangle mark in FIG. 7. For the second beat, the position one beat interval τ_max after the first beat is taken as a tentative beat position, and the beat is placed at the position near it where L(t) and M(t) correlate best. That is, letting b_0 be the first beat position, the value of s that maximizes r(s) in the following equation is found. Here s is the offset from the tentative beat position and is an integer in the range given by Equation 7. F is a fluctuation parameter; a value of about 0.1 is appropriate, but a larger value may be used for pieces with large tempo fluctuations. A value of about 5 is sufficient for n.

k is a coefficient that varies with the value of s and follows, for example, a normal distribution as shown in FIG. 8.

Once the value of s that maximizes r(s) has been found, the second beat position b_1 is calculated by Equation 8.

The third and subsequent beat positions can be found in the same way.
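One step of this beat-by-beat search might look like the Python sketch below. The equations for r(s), the range of s, and b_1 are not reproduced in this text, so the sketch makes several assumptions: s ranges over the integers with |s| ≤ τ_max·F, r(s) is the weight k(s) times the sum of L at n pulses spaced τ_max apart starting from the shifted tentative position, k(s) is a Gaussian whose width equals the search half-range, and the new beat is the tentative position plus the best s. All names are illustrative.

```python
import math


def next_beat(onset_strength, previous_beat, beat_interval,
              fluctuation=0.1, n_pulses=5):
    """Locate the beat following `previous_beat` by a weighted local search."""
    max_shift = int(beat_interval * fluctuation)
    sigma = max(max_shift, 1)  # width of the weight k(s): an assumption

    def score(shift):
        weight = math.exp(-(shift * shift) / (2.0 * sigma * sigma))
        start = previous_beat + beat_interval + shift
        total = 0.0
        for j in range(n_pulses):
            index = start + j * beat_interval
            if 0 <= index < len(onset_strength):
                total += onset_strength[index]
        return weight * total

    best_shift = max(range(-max_shift, max_shift + 1), key=score)
    return previous_beat + beat_interval + best_shift


# Onsets at 0, then every 20 frames from frame 21 (one frame late).
strength = [0.0] * 200
for frame in (0, 21, 41, 61, 81, 101):
    strength[frame] = 1.0
second_beat = next_beat(strength, previous_beat=0, beat_interval=20)
```

In this example the tentative position is frame 20, and the search moves the beat to frame 21 where the onsets actually lie. Calling `next_beat` repeatedly with the previous result yields the later beats.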

For pieces whose tempo hardly changes, this method can find the beat positions through to the end of the piece, but in real performances the tempo often fluctuates somewhat or gradually slows in places.

The following method was therefore devised to cope with such tempo fluctuations.

That is, the function M(t) of FIG. 7 is varied as shown in FIG. 9.
1) is the conventional method: with the pulse intervals denoted τ1, τ2, τ3, and τ4 as in the figure,
τ1 = τ2 = τ3 = τ4 = τ_max.
2) increases or decreases τ1 to τ4 uniformly:
τ1 = τ2 = τ3 = τ4 = τ_max + s (−τ_max·F ≤ s ≤ τ_max·F)
This handles a sudden change of tempo.
3) handles rit. (ritardando, gradually slower) and accel. (accelerando, gradually faster); the pulse intervals are calculated as
τ1 = τ_max
τ2 = τ_max + 1·s
τ3 = τ_max + 2·s (−τ_max·F ≤ s ≤ τ_max·F)
τ4 = τ_max + 4·s
The coefficients 1, 2, and 4 are only examples and may be changed according to the magnitude of the tempo change.
4) changes which of the five pulse positions is the one at which the beat is currently being sought, for the rit. and accel. cases of 3).
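The interval patterns 2) and 3) can be generated with a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the coefficient tuple (0, 1, 2, 4) encodes the example coefficients given above with 0 for τ1, and the range of s is truncated to integers.

```python
def uniform_intervals(tau_max, s):
    """Pattern 2): all four intervals stretched or shrunk by the same s."""
    return [tau_max + s] * 4


def gradual_intervals(tau_max, s, coefficients=(0, 1, 2, 4)):
    """Pattern 3): intervals that grow (rit.) or shrink (accel.) progressively."""
    return [tau_max + c * s for c in coefficients]


def pulse_positions(start, intervals):
    """Frame positions of the five pulses given the four intervals between them."""
    positions = [start]
    for interval in intervals:
        positions.append(positions[-1] + interval)
    return positions


def shift_range(tau_max, fluctuation=0.1):
    """Integer values of s with -tau_max*F <= s <= tau_max*F."""
    limit = int(tau_max * fluctuation)
    return range(-limit, limit + 1)


ritardando = gradual_intervals(20, 2)          # [20, 22, 24, 28]
pulses = pulse_positions(0, ritardando)        # [0, 20, 42, 66, 94]
```

Each candidate pulse train built this way would then be correlated with L(t), and the best-scoring candidate kept.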

By combining all of these, calculating the correlation between L(t) and M(t) for each, and deciding the beat position from the largest correlation, beat positions can be determined even for pieces whose tempo fluctuates. In cases 2) and 3), the value of the coefficient k used in the correlation is likewise varied according to the value of s.

Furthermore, although the five pulses currently all have the same magnitude, the pulse at the position where the beat is being sought (the tentative beat position in FIG. 9) alone may be made larger, or the pulse values may be made smaller with distance from that position, so as to emphasize the sum of the level increments of the scale notes at the position where the beat is being sought [5) in FIG. 9].

Once the position of each beat has been determined in this way, the result is stored in the buffer 30; the detected result may also be displayed so that the user can check it and correct any errors.

FIG. 10 shows an example of the screen for checking the beat detection result. The triangle marks in the figure indicate the detected beat positions.

When the "Play" button is pressed, the current music acoustic signal is D/A converted and played through a speaker or the like. The current playback position is shown by a playback position pointer such as a vertical line, as in the figure, so errors in the detected beat positions can be checked while listening to the performance. If a sound such as a metronome click is also played at each beat position while the original waveform is played, the result can be checked by ear as well as by eye, making detection errors easier to spot. A MIDI device, for example, could be used to play the metronome sound.

Detected beat positions are corrected by pressing the "Correct beat position" button. Pressing this button displays a cross-shaped cursor on the screen, and the user clicks the correct beat position at the first place where the beat detection is wrong. All beat positions from slightly before the clicked location (for example, half of τ_max before it) onward are cleared, and the subsequent beat positions are detected again with the clicked location as the tentative beat position.

Next, the detection of the time signature and the measures is described.

Since the beat positions have been fixed by the processing so far, the degree of change in sound for each beat is now obtained. It is calculated from the per-frame level of each scale note output by the scale note level detection unit.

Letting b_j be the frame number of the j-th beat, and b_{j−1} and b_{j+1} the frame numbers of the beats before and after it, the degree of change in sound for the j-th beat is calculated as follows: the average level of each scale note over frames b_{j−1} to b_j − 1 and the average level of each scale note over frames b_j to b_{j+1} − 1 are computed, the degree of change for each scale note is obtained from the increment between them, and these are summed over all scale notes.

That is, letting L_i(t) be the level of the i-th scale note at frame time t, the average level L_avg,i(j) of the i-th scale note in the j-th beat is given by Equation 9, and the per-beat degree of change B_add,i(j) of the i-th scale note at the j-th beat is therefore given by Equation 10.

Accordingly, the per-beat degree of change in sound B(j) at the j-th beat is given by Equation 11, where T is the total number of scale notes.
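The per-beat calculation can be sketched in Python along the lines of the description. Equations 9 to 11 are not reproduced in this text; the sketch assumes that the average is a plain mean over the frames of each beat, that decreases are counted as 0 in the same way as for the frame-level increments, and that the first beat, which has no predecessor, is given 0. Function names are illustrative.

```python
def beat_averages(levels, beats):
    """levels[t][i]: level of note i at frame t; beats: beat frame numbers b_0, b_1, ...
    Returns, for each beat j, the mean level of every note over frames b_j .. b_{j+1}-1."""
    averages = []
    for start, end in zip(beats, beats[1:]):
        frames = levels[start:end]
        note_count = len(frames[0])
        averages.append([sum(frame[i] for frame in frames) / len(frames)
                         for i in range(note_count)])
    return averages


def beat_change(levels, beats):
    """B(j): summed positive increase of the per-beat note averages from beat j-1 to j."""
    averages = beat_averages(levels, beats)
    if not averages:
        return []
    changes = [0.0]  # no previous beat to compare with (assumed to be 0)
    for previous, current in zip(averages, averages[1:]):
        changes.append(sum(max(c - p, 0.0) for p, c in zip(previous, current)))
    return changes


example_levels = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
example_beats = [0, 2, 4]          # two beats of two frames each
per_beat_change = beat_change(example_levels, example_beats)
```

In this example the second note rises from an average of 0 to 3 while the first falls, so `per_beat_change` is `[0.0, 3.0]`.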

The bottom panel of FIG. 11 shows this per-beat degree of change in sound. The time signature and the position of the first beat of the measure are obtained from it.

The time signature is obtained from the autocorrelation of the per-beat degree of change in sound. In music the sound generally tends to change on the first beat of a measure, so the time signature can be found from this autocorrelation. For example, using the formula for the autocorrelation φ(τ) shown in Equation 12, the autocorrelation φ(τ) of the per-beat degree of change B(j) is calculated for lags τ from 2 to 4, and the lag τ that maximizes φ(τ) is taken as the number of beats per measure.

Here N is the total number of beats; φ(τ) is calculated for τ = 2 to 4, and the τ at which φ(τ) is largest is taken as the number of beats per measure.
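A Python sketch of this step is given below. Equation 12 is not reproduced in this text, so the autocorrelation is again assumed to be the sum of B(j)·B(j+τ) over the overlapping beats divided by N − τ; the function names are illustrative.

```python
def beat_autocorrelation(beat_change, lag):
    """Mean of B(j) * B(j + lag) over the overlapping beats (assumed normalisation)."""
    n = len(beat_change)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(beat_change)")
    return sum(beat_change[j] * beat_change[j + lag] for j in range(n - lag)) / (n - lag)


def beats_per_measure(beat_change, candidates=(2, 3, 4)):
    """Candidate lag with the largest autocorrelation of B(j)."""
    return max(candidates, key=lambda lag: beat_autocorrelation(beat_change, lag))


# Synthetic B(j): a strong change on every third beat.
triple_meter = [5.0 if j % 3 == 0 else 1.0 for j in range(24)]
meter = beats_per_measure(triple_meter)
```

For this synthetic input the result is 3. Note that restricting τ to 2 through 4 means a piece in 4 could also score highly at τ = 2, which is one reason user confirmation is useful.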

Next, the first beat of the measure is found: the position where the per-beat degree of change B(j) is largest is taken as the first beat. That is, letting τ_max be the τ that maximizes φ(τ), and k_max be the k that maximizes X(k) in Equation 13, the k_max-th beat is the position of the first downbeat, and each beat position reached from it by adding τ_max is also a first beat.

Here n_max is the largest n that satisfies τ_max·n + k < N.
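The downbeat search can be sketched in Python. Equation 13 is not reproduced in this text; the sketch assumes X(k) = B(k) + B(k + τ_max) + … + B(k + n_max·τ_max) for k from 0 to τ_max − 1, without any normalisation for the differing number of terms. Function names are illustrative.

```python
def first_downbeat(beat_change, beats_per_measure):
    """Index k in [0, beats_per_measure) whose every-measure sum of B is largest."""
    def score(k):
        # Slicing stops at the last index below N, i.e. at n_max.
        return sum(beat_change[k::beats_per_measure])

    return max(range(beats_per_measure), key=score)


def downbeat_indices(beat_change, beats_per_measure):
    """All beat indices that fall on the first beat of a measure."""
    k_max = first_downbeat(beat_change, beats_per_measure)
    return list(range(k_max, len(beat_change), beats_per_measure))


changes = [1.0, 1.0, 5.0, 1.0, 1.0, 5.0, 1.0, 1.0, 5.0, 1.0]
downbeats = downbeat_indices(changes, 3)
```

Here the strong changes fall on beats 2, 5, and 8, so `downbeats` is `[2, 5, 8]` and the two beats before index 2 form a pickup.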

Once the time signature and the position of the first beat (the bar line positions) have been determined in this way, the result is stored in the buffer 40; it is also desirable to display the detected result on screen so that the user can change it. In particular, this method cannot handle pieces with changing time signatures, so the user must specify where the time signature changes.

With the configuration of the embodiment described above, the average tempo of the whole piece, the accurate position of each beat, the time signature, and the position of the first beat of each measure can be detected from the acoustic signal of a human performance whose tempo fluctuates.

FIG. 12 is an overall block diagram of the chord detection device of the present invention. In the figure, the beat detection and measure detection configurations are basically the same as in Embodiment 1. Because some parts of the configuration differ between tempo detection and chord detection, the description is given again below, apart from the equations, even though it overlaps with Embodiment 1.

According to the figure, the chord detection apparatus comprises: an input unit 1 that inputs an acoustic signal; a beat detection scale sound level detection unit 2 that performs an FFT operation on the input acoustic signal at predetermined time intervals, using parameters suited to beat detection, to obtain the level of each scale note at each predetermined time; a beat detection unit 3 that sums the level increments of all scale notes at each predetermined time to obtain a total level increment indicating the degree of change of the overall sound at each predetermined time, and detects the average beat interval and the position of each beat from that total; a bar detection unit 4 that calculates the average level of each scale note for each beat, sums the increments of those per-beat average levels over all scale notes to obtain a value indicating the degree of change of the overall sound for each beat, and detects the time signature and the bar line positions from that value; a chord detection scale sound level detection unit 5 that performs an FFT operation on the input acoustic signal at a predetermined time interval different from the one used for beat detection, using parameters suited to chord detection, to obtain the level of each scale note at each predetermined time; a bass note detection unit 6 that detects the bass note from the levels of the low-range scale notes within each bar; and a chord name determination unit 7 that determines the chord name of each bar from the detected bass note and the levels of the scale notes.

The input unit 1 receives the music acoustic signal on which chord detection is to be performed. Its basic configuration is the same as that of the input unit 1 in Embodiment 1, so a detailed description is omitted. However, if vocals, which are normally localized at the center, interfere with the later chord detection, they may be cancelled by subtracting the left-channel waveform from the right-channel waveform.
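The vocal cancellation mentioned above can be sketched as a sample-by-sample subtraction of one stereo channel from the other. This is a minimal illustration only; the function name and the sample values are hypothetical, and real recordings rarely cancel this cleanly.

```python
def cancel_center(left, right):
    """Subtract the right channel from the left, sample by sample.

    Anything mixed identically into both channels (typically the lead
    vocal) cancels; sounds panned to one side remain.
    """
    return [l - r for l, r in zip(left, right)]


# Illustrative signal: a centered "vocal" plus a "guitar" present only on the left.
vocal = [0.5, -0.5, 0.25]
guitar = [0.25, 0.125, -0.5]
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)
residual = cancel_center(left, right)  # only the guitar is left
```

Note that the result is mono and that bass instruments mixed to the center are removed as well, which matters here because the bass note is used later for chord detection.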

This digital signal is input to the beat detection scale sound level detection unit 2 and the chord detection scale sound level detection unit 5. Both of these units consist of the parts shown in FIG. 2 and have exactly the same configuration, so the same implementation can be reused with only the parameters changed.

The waveform preprocessing unit 20 used in that configuration is the same as described above: it downsamples the acoustic signal from the input unit 1 to a sampling frequency suited to the subsequent processing. The sampling frequency after downsampling, that is, the downsampling rate, may differ between beat detection and chord detection, or may be made the same to save downsampling time.

For beat detection, the downsampling rate is determined by the pitch range used for beat detection. To reflect high-range rhythm instruments such as cymbals and hi-hats in beat detection, the sampling frequency after downsampling must be high. When beats are detected mainly from the bass, from instruments such as the bass drum and snare drum, and from mid-range instruments, the same downsampling rate as for the chord detection described below is sufficient.

The downsampling rate of the waveform preprocessing unit for chord detection depends on the chord detection range, that is, the pitch range the chord name determination unit uses when detecting chords. For example, if the chord detection range is C3 to A6 (C4 being middle C), the fundamental frequency of A6 is about 1760 Hz (with A4 = 440 Hz), so the sampling frequency after downsampling should be at least 3520 Hz, which makes the Nyquist frequency at least 1760 Hz. Accordingly, when the original sampling frequency is 44.1 kHz (music CD), a downsampling rate of about 1/12 is appropriate, which gives a sampling frequency of 3675 Hz after downsampling.
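The arithmetic in this paragraph can be checked with a few lines. This is a worked calculation only; the helper name is hypothetical.

```python
import math


def max_decimation_factor(fs_original, f_top):
    """Largest integer decimation factor whose Nyquist frequency stays >= f_top."""
    return math.floor(fs_original / (2 * f_top))


a6 = 440.0 * 2 ** 2                       # A6 is two octaves above A4 = 440 Hz
factor = max_decimation_factor(44100, a6)  # 44100 / 3520 = 12.5..., so 12
fs_down = 44100 / factor                   # sampling frequency after downsampling
nyquist = fs_down / 2
```

With these inputs the factor is 12, the downsampled rate is 3675 Hz, and the Nyquist frequency is 1837.5 Hz, which is above the 1760 Hz fundamental of A6.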

Downsampling is normally performed by passing the signal through a low-pass filter that cuts components at or above the Nyquist frequency, which is half the sampling frequency after downsampling (1837.5 Hz in this example), and then skipping data (in this example, discarding 11 of every 12 waveform samples). The reason is the same as explained in Embodiment 1.
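The two steps described above, low-pass filtering followed by discarding samples, can be sketched in plain Python. The filter design here (a Hamming-windowed sinc with 121 taps) is an assumption chosen for illustration; the patent does not specify the filter, and the function names are hypothetical.

```python
import math


def lowpass_fir(num_taps, cutoff_ratio):
    """Hamming-windowed sinc low-pass filter.

    cutoff_ratio is the cutoff frequency divided by the original sampling rate.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff_ratio
        else:
            sinc = math.sin(2 * math.pi * cutoff_ratio * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def downsample(samples, factor, num_taps=121):
    """Low-pass at the new Nyquist frequency, then keep one sample in `factor`."""
    taps = lowpass_fir(num_taps, 0.5 / factor)
    out = []
    for start in range(0, len(samples) - num_taps + 1, factor):
        # The filter output is computed only at the samples that are kept.
        out.append(sum(t * samples[start + i] for i, t in enumerate(taps)))
    return out
```

For the example in the text, `downsample(samples, 12)` takes 44.1 kHz audio to 3675 Hz. A practical design would usually place the cutoff a little below the new Nyquist frequency to leave room for the filter's transition band.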

When the waveform preprocessing unit 20 has finished downsampling in this way, the FFT calculation unit 21 applies an FFT (fast Fourier transform) to the output signal of the waveform preprocessing unit at predetermined time intervals.

The FFT parameters (the number of FFT points and the shift amount of the FFT window) are set to different values for beat detection and chord detection. This follows from a property of the FFT: increasing the number of FFT points to raise the frequency resolution enlarges the FFT window, so each FFT covers a longer stretch of time and the time resolution falls. (In other words, for beat detection it is better to raise the time resolution at the expense of frequency resolution.) There is also a method that increases the number of FFT points without degrading the time resolution, by setting waveform data in only part of the window, instead of using a waveform as long as the window, and filling the rest with zeros. In the case of this embodiment, however, a certain number of waveform samples is needed so that the power on the low-pitched side is also detected correctly.

Taking the above into account, this embodiment uses 512 FFT points with a window shift of 32 samples and no zero padding for beat detection, and 8192 FFT points with a window shift of 128 samples, using 1024 waveform samples per FFT, for chord detection. With these settings, beat detection has a time resolution of about 8.7 ms and a frequency resolution of about 7.2 Hz, while chord detection has a time resolution of about 35 ms and a frequency resolution of about 0.4 Hz. The scale notes whose levels are to be obtained range from C1 to A6, so the frequency resolution of about 0.4 Hz for chord detection can resolve even the smallest fundamental-frequency difference, about 1.9 Hz between C1 and C#1. And given that a 32nd note lasts 25 ms in a piece whose tempo is quarter note = 300, the time resolution of about 8.7 ms for beat detection is clearly sufficient.
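The figures quoted in this paragraph follow from the 3675 Hz sampling frequency derived earlier. The short calculation below reproduces them; the variable and function names are hypothetical, and C1 is taken as MIDI note 24 with A4 = 440 Hz as MIDI note 69, consistent with C4 being middle C.

```python
fs = 3675.0  # Hz, sampling frequency after downsampling (44.1 kHz / 12)


def resolutions(fft_points, hop):
    """Return (time resolution in ms, frequency resolution in Hz)."""
    return 1000.0 * hop / fs, fs / fft_points


beat_ms, beat_hz = resolutions(512, 32)       # beat detection settings
chord_ms, chord_hz = resolutions(8192, 128)   # chord detection settings

c1 = 440.0 * 2 ** ((24 - 69) / 12)   # fundamental of C1
c_sharp1 = c1 * 2 ** (1 / 12)        # one semitone higher
gap = c_sharp1 - c1                  # smallest spacing between adjacent notes

note_32nd_ms = 60000.0 / 300 / 8     # quarter note = 300 per minute, 8 per quarter
```

The time resolution is the window shift divided by the sampling frequency, and the frequency resolution is the sampling frequency divided by the number of FFT points. The computed chord-detection frequency resolution is roughly 0.45 Hz, which the text gives as about 0.4 Hz.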

The level detection unit 22 calculates the level of each scale note from the power spectrum computed by the FFT calculation unit 21. Because the FFT yields power only at integer multiples of the sampling frequency divided by the number of FFT points, the same processing as in Embodiment 1 is used to derive the level of each scale note from this power spectrum. That is, for every note whose level is computed (C1 to A6), the level of the note is taken to be the largest power among the spectral components whose frequencies lie within 50 cents above or below the note's fundamental frequency (100 cents being one semitone).
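The per-note level detection described above can be sketched as follows. This is an illustration under stated assumptions: `power` is a list of FFT bin powers for one frame, notes are indexed by MIDI number (C1 = 24, A6 = 93, A4 = 440 Hz = 69), and the function name is hypothetical.

```python
import math


def note_levels(power, fs, fft_points, low_midi=24, high_midi=93):
    """Level of each note from C1 to A6 for one FFT frame.

    For each note, the level is the largest power among the FFT bins whose
    center frequency lies within +/-50 cents of the note's fundamental.
    Notes with no bin in that range get level 0.0.
    """
    bin_hz = fs / fft_points
    levels = {}
    for midi in range(low_midi, high_midi + 1):
        f0 = 440.0 * 2 ** ((midi - 69) / 12)
        lo = f0 * 2 ** (-50 / 1200)
        hi = f0 * 2 ** (50 / 1200)
        first = math.ceil(lo / bin_hz)
        last = min(math.floor(hi / bin_hz), len(power) - 1)
        levels[midi] = max(power[first:last + 1], default=0.0)
    return levels
```

With the beat detection settings (bins about 7.2 Hz apart), some of the lowest notes have no bin inside their 100-cent band and receive level 0.0 in this sketch, whereas the chord detection settings give every note from C1 upward at least one bin.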

Once the levels of all scale notes have been detected, they are stored in a buffer, the waveform read position is advanced by the predetermined time interval (in the example above, 32 samples for beat detection and 128 samples for chord detection), and the FFT calculation unit 21 and the level detection unit 22 are run repeatedly until the end of the waveform.

As a result, the level of each scale note at each predetermined time, for the acoustic signal supplied to the input unit 1, is stored in two buffers, 23 for beat detection and 50 for chord detection.

The beat detection unit 3 and the bar detection unit 4 in FIG. 12 have the same configuration as the beat detection unit 3 and the bar detection unit 4 of Embodiment 1, so a detailed description is omitted here.

With the bar line positions (the frame number of each bar) fixed by the same configuration and procedure as in Embodiment 1, the bass note of each bar is detected next.

The bass note is detected from the per-frame scale note levels output by the chord detection scale sound level detection unit 5.

FIG. 13 shows the per-frame scale note levels output by the chord detection scale sound level detection unit 5 for the same part of the same piece as FIG. 4 of Embodiment 1. As the figure shows, because the frequency resolution of the chord detection scale sound level detection unit 5 is about 0.4 Hz, the levels of all scale notes from C1 to A6 are extracted.

Because the bass note may differ between the first and second halves of a bar, the bass note detection unit 6 detects it separately for each half. When the two halves yield the same bass note, that note is fixed as the bass note of the bar and the chord is also detected over the whole bar. When the two halves yield different bass notes, the chord is also detected separately for each half. In some cases the range may be halved again (down to a quarter of a bar).

The bass note is determined from the average strength of the levels of the scale notes in the bass detection range during the bass detection period.

Let L_i(t) be the level of the i-th scale note at frame time t. The average level L_avgi(f_s, f_e) of the i-th scale note over frames f_s to f_e can then be calculated by Equation 14 below.
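The image of Equation 14 is not reproduced in this text. Assuming the frame range runs from f_s to f_e inclusive, the average described above would take the following form; this is a reconstruction from the surrounding definitions rather than the published equation.

```latex
L_{\mathrm{avg}\,i}(f_s, f_e) = \frac{1}{f_e - f_s + 1} \sum_{t = f_s}^{f_e} L_i(t)
```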

The bass note detection unit 6 calculates this average level over the bass detection range, for example C2 to B3, and selects the scale note with the largest average level as the bass note. To avoid detecting a bass note by mistake in pieces that contain no sound in the bass detection range, or in silent passages, a suitable threshold may be set so that no bass note is detected when the average level of the detected note is at or below the threshold. If the bass note is given heavy weight in the later chord detection, the unit may also check whether the detected note stays above a certain level throughout the bass detection period, so that only more reliable notes are detected as the bass note. Furthermore, instead of selecting the scale note with the largest average level in the bass detection range, the average levels may be averaged for each of the 12 pitch names, the pitch name with the largest level selected as the bass pitch name, and the scale note with the largest average level among the notes of that pitch name in the bass detection range selected as the bass note.
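The basic bass detection procedure, averaging each note's level over the detection period and picking the strongest note in the bass range subject to a threshold, can be sketched as below. The data layout (`frames[t][midi]`), the MIDI numbering (C2 = 36, B3 = 59), and the function names are assumptions made for the example.

```python
def average_levels(frames, f_s, f_e):
    """Mean level of each note over frames f_s..f_e inclusive.

    frames[t] maps a MIDI note number to that note's level at frame t.
    """
    count = f_e - f_s + 1
    totals = {}
    for t in range(f_s, f_e + 1):
        for midi, level in frames[t].items():
            totals[midi] = totals.get(midi, 0.0) + level
    return {midi: total / count for midi, total in totals.items()}


def detect_bass(frames, f_s, f_e, low_midi=36, high_midi=59, threshold=0.0):
    """Return the note with the largest average level in C2..B3.

    Returns None when no note in the range exceeds the threshold.
    """
    avg = average_levels(frames, f_s, f_e)
    in_range = {m: v for m, v in avg.items() if low_midi <= m <= high_midi}
    if not in_range:
        return None
    bass = max(in_range, key=in_range.get)
    return bass if in_range[bass] > threshold else None


# Illustrative frames: G2 (43) dominates the bass range; C5 (72) is out of range.
frames = [{36: 1.0, 43: 3.0, 72: 9.0}, {36: 2.0, 43: 2.0, 72: 9.0}]
bass = detect_bass(frames, 0, 1)
```

Calling `detect_bass` once for the first half of a bar and once for the second half gives the per-half detection described earlier; the variants based on sustained level or on 12 pitch names are not covered by this sketch.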

Once the bass note has been determined, the result may be stored in the buffer 60 and also displayed on screen so that the user can correct it if it is wrong. Because the bass range may differ from piece to piece, the user may also be allowed to change the bass detection range.

FIG. 14 shows a display example of the bass detection result produced by the bass note detection unit 6.

Next is the chord detection processing by the chord name determination unit 7. The chord is likewise determined by calculating the average level of each scale note during the chord detection period.

In this embodiment, the chord detection period and the bass detection period are the same. The average level during the chord detection period is calculated for each scale note in the chord detection range, for example C3 to A6. Several pitch names are then picked in descending order of this value, and chord name candidates are extracted from them together with the pitch name of the bass note.

Because a note with a high level is not necessarily a chord tone, several pitch names, for example five, are detected, every combination of two or more of them is taken, and chord name candidates are extracted from each combination together with the pitch name of the bass note.

For chords as well, notes whose average level is at or below a threshold may be left undetected, and the user may be allowed to change the chord detection range. Furthermore, instead of extracting chord tone candidates in descending order of the average level of the scale notes in the chord detection range, the average levels in the chord detection range may be averaged for each of the 12 pitch names, and chord tone candidates extracted in descending order of that per-pitch-name level.

Chord name candidates are extracted by having the chord name determination unit 7 search a chord name database that stores each chord type (m, M7, and so on) together with the intervals of its chord tones from the root. That is, every combination of two or more of the five detected pitch names is taken, and each is checked exhaustively to see whether the intervals between its pitch names match the interval relationships of the chord tones in the database. If they match, the root is calculated from the pitch name of one of the chord tones, and the chord type is appended to the pitch name of the root to give the chord name. Because the root and the fifth of a chord are sometimes omitted by the instrument playing the chord, combinations that lack them are still extracted as chord name candidates. When a bass note has been detected, its pitch name is added to the chord name of each candidate: if the root of the chord and the bass note have the same pitch name, the chord name is left as it is, and if they differ, the chord is written as a slash chord.

If the above method extracts too many chord name candidates, they may be restricted by the bass note. That is, when a bass note has been detected, candidates whose root does not have the same pitch name as the bass note are deleted.

When several chord name candidates are extracted, the chord name determination unit 7 calculates a likelihood for each in order to select one of them.

The likelihood is calculated from the mean level of all chord tones in the chord detection range and the level of the chord's root in the bass detection range. That is, let L_avgc be the mean, over all constituent notes of an extracted chord name candidate, of their average levels during the chord detection period, and let L_avgr be the average level of the chord's root during the bass detection period. The likelihood is then calculated as the average of these two values, as in Equation 15 below.
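The image of Equation 15 is not reproduced in this text. Taking the description literally, the likelihood would be (L_avgc + L_avgr) / 2. The sketch below applies that assumed form to pick one candidate; the function name and the sample levels are hypothetical.

```python
def chord_likelihood(constituent_levels, root_level_in_bass_range):
    """Assumed form of Equation 15: the mean of L_avgc and L_avgr.

    constituent_levels       -- average level of each chord tone over the
                                chord detection period
    root_level_in_bass_range -- average level of the chord's root over the
                                bass detection period (L_avgr)
    """
    l_avgc = sum(constituent_levels) / len(constituent_levels)
    l_avgr = root_level_in_bass_range
    return (l_avgc + l_avgr) / 2


# Illustrative candidates: (chord tone levels, root level in the bass range).
candidates = {
    "C": ([4.0, 2.0, 3.0], 5.0),
    "Am7": ([1.0, 4.0, 2.0, 3.0], 0.5),
}
best = max(candidates, key=lambda name: chord_likelihood(*candidates[name]))
```

A candidate whose root is strong in the bass range is favored here, which is how the bass note disambiguates chords that share the same upper notes.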

When the chord detection range or the bass detection range contains several notes with the same pitch name, the one with the higher average level is used. Alternatively, in each of the chord detection range and the bass detection range, the average levels of the scale notes may be averaged for each of the 12 pitch names, and those per-pitch-name averages used instead.

Musical knowledge may also be introduced into the likelihood calculation. For example, the level of each scale note is averaged over all frames and then averaged for each of the 12 pitch names to obtain the strength of each pitch name, and the key of the piece is detected from the distribution of those strengths. The likelihood of chords that are diatonic in that key may then be multiplied by a constant to raise it, or the likelihood of chords containing notes outside the key's diatonic scale may be lowered according to the number of such notes. Furthermore, common chord progression patterns may be stored in a database and compared against the candidates, so that candidates producing a commonly used progression have their likelihood multiplied by a constant to raise it.

The candidate with the highest likelihood is selected as the chord name, but the candidates may instead be displayed together with their likelihoods so that the user can choose.

In either case, once the chord name determination unit 7 has determined the chord name, the result is stored in the buffer 70 and the chord name is output to the screen.

FIG. 15 shows a display example of the chord detection result produced by the chord name determination unit 7. Besides displaying the detected chord names on screen in this way, it is desirable to play back the detected chords and bass notes using a MIDI device or the like, because in general one cannot judge whether a chord is correct just by looking at its name.

With the configuration of this embodiment described above, even someone who is not an expert with special musical knowledge can detect chord names from the overall sound of an input music acoustic signal in which several instrument sounds are mixed, such as a music CD, without detecting individual note information.

Furthermore, with this configuration, chords that share the same constituent notes can be told apart, and the chord name of each bar can be detected even for sources in which the performance tempo has fluctuated or, conversely, in which the tempo is deliberately varied.

In particular, the configuration of this embodiment makes it possible, with only a simple configuration, to perform at the same time both beat detection, which requires time resolution (the same configuration as the tempo detection apparatus above), and chord detection, which requires frequency resolution (a configuration that builds on the tempo detection apparatus to further detect chord names).

The tempo detection apparatus, the chord name detection apparatus, and the programs that realize them according to the present invention are not limited to the illustrated examples above, and various modifications can of course be made without departing from the gist of the present invention.

The tempo detection apparatus, the chord name detection apparatus, and the programs that realize them according to the present invention can be used in various fields, including: video editing that synchronizes events in a video track with beat times in a music track, for example when producing a music promotion video; audio editing that finds beat positions by beat tracking and cuts and pastes the waveform of a music acoustic signal; live stage event control that controls elements such as the color, brightness, direction, and special effects of lighting in synchronization with a human performance, or automatically controls audience hand clapping and cheering; and computer graphics synchronized with music.

FIG. 1 is an overall block diagram of the tempo detection apparatus according to the present invention.
FIG. 2 is a block diagram of the configuration of the scale sound level detection unit 2.
FIG. 3 is a flowchart showing the processing flow of the beat detection unit 3.
FIG. 4 is a graph showing the waveform of part of a piece, the level of each scale note, and the total of the level increments of the scale notes.
FIG. 5 is an explanatory diagram showing the concept of the autocorrelation calculation.
FIG. 6 is an explanatory diagram of the method for determining the first beat position.
FIG. 7 is an explanatory diagram showing how the positions of subsequent beats are determined after the first beat position has been determined.
FIG. 8 is a graph showing the distribution of the coefficient k, which is varied according to the value of s.
FIG. 9 is an explanatory diagram showing the method for determining the second and subsequent beat positions.
FIG. 10 is a screen display diagram showing an example of the confirmation screen for the beat detection result.
FIG. 11 is a screen display diagram showing an example of the confirmation screen for the bar detection result.
FIG. 12 is an overall block diagram of the chord detection apparatus of the present invention according to Embodiment 2.
FIG. 13 is a graph showing the per-frame scale note levels output by the chord detection scale sound level detection unit 5 for the same part of the piece.
FIG. 14 is a graph showing a display example of the bass detection result produced by the bass note detection unit 6.
FIG. 15 is a screen display diagram showing an example of the confirmation screen for the chord detection result.

Explanation of symbols

1 Input unit
2 Beat detection scale sound level detection unit
3 Beat detection unit
4 Bar detection unit
5 Chord detection scale sound level detection unit
6 Bass note detection unit
7 Chord name determination unit
20 Waveform preprocessing unit
21 FFT calculation unit
22 Level detection unit
23, 30, 40, 50, 60, 70 Buffers

Claims

An input means for inputting an acoustic signal;
A scale sound level detection means for performing an FFT operation at a predetermined time interval from an input acoustic signal and obtaining a level of each scale sound for each predetermined time;
Beat detection means for summing the increment of the level of each scale sound at each predetermined time over all scale sounds to obtain a total level increment indicating the degree of change of the overall sound at each predetermined time, and for detecting an average beat interval and the position of each beat from that total level increment; and
Bar detection means for calculating the average level of each scale sound for each beat, summing the increments of those per-beat average levels over all scale sounds to obtain a value indicating the degree of change of the overall sound for each beat, and detecting a time signature and bar line positions from that value; the foregoing means constituting a tempo detection apparatus.

The tempo detection apparatus according to claim 1, wherein, in detecting the average beat interval and the position of each beat, the beat detection means obtains the average beat interval from the autocorrelation of the total level increment of the scale sounds, then obtains the first beat position by calculating the cross-correlation between the total level increment of the scale sounds and a function having a period equal to the average beat interval, and likewise obtains the second and subsequent beat positions by calculating the cross-correlation with a function having a period equal to the average beat interval.

The tempo detection apparatus according to claim 1, wherein, in detecting the average beat interval and the position of each beat, the beat detection means obtains the average beat interval from the autocorrelation of the total level increment of the scale sounds, then obtains the first beat position by calculating the cross-correlation between the total level increment of the scale sounds and a function having a period equal to the average beat interval, and obtains the second and subsequent beat positions by calculating the cross-correlation with a function whose interval is the average beat interval with +α or -α added.

The tempo detection apparatus according to claim 1, wherein, in detecting the average beat interval and the position of each beat, the beat detection means obtains the average beat interval from the autocorrelation of the total level increment of the scale sounds, then obtains the first beat position by calculating the cross-correlation between the total level increment of the scale sounds and a function having a period equal to the average beat interval, and obtains the second and subsequent beat positions by calculating the cross-correlation with a function whose interval gradually widens or gradually narrows from the average beat interval.

The tempo detection device according to claim 1, wherein, in detecting the average beat interval and the position of each beat by the beat detection means, the average beat interval is obtained from the autocorrelation of the total of the level increments of the scale sounds; the first beat position is then obtained by calculating the cross-correlation between that total and a function having a period equal to the average beat interval; and the second and subsequent beat intervals are obtained by calculating the cross-correlation with a function whose interval gradually widens or gradually narrows from the average beat interval, the cross-correlation being calculated while shifting the beat positions partway through that function.

The tempo detection device according to claim 1, wherein, in obtaining the time signature and the bar line position by the bar detection means, the average level of each scale sound for each beat is calculated; the increments in that per-beat average level are summed over all scale sounds to obtain a value indicating the degree of change in the overall sound for each beat; the time signature is obtained from the autocorrelation of that value; and the location where that value is largest is taken as the first beat of a measure and set as the bar line position.
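The time signature and bar line steps of this claim can be illustrated as follows. This is a sketch under assumptions rather than the claimed method in full: the candidate meters (2, 3, 4 beats per measure), the length-normalized autocorrelation, and the names `estimate_meter` and `bar_line_beats` are all hypothetical choices.

```python
def estimate_meter(change, candidates=(2, 3, 4)):
    """Return the candidate beats-per-measure with the highest mean autocorrelation."""
    def mean_autocorr(lag):
        n = len(change) - lag
        return sum(change[i] * change[i + lag] for i in range(n)) / n
    return max(candidates, key=mean_autocorr)


def bar_line_beats(change, meter):
    """Treat the beat with the largest change value as a first beat and list all bar lines."""
    strongest = max(range(len(change)), key=lambda i: change[i])
    first = strongest % meter
    return list(range(first, len(change), meter))


# Per-beat change values for four measures of four beats; the second beat of
# the sequence carries the largest change, so it is taken as a downbeat.
change = [1.0, 5.0, 1.0, 2.0] * 4
meter = estimate_meter(change)
bars = bar_line_beats(change, meter)
```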

An input means for inputting an acoustic signal;
First scale sound level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals, using parameters suitable for beat detection, and obtaining the level of each scale sound at each predetermined time;
Beat detection means for summing the increments in the level of each scale sound at each predetermined time over all scale sounds to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and detecting the average beat interval and the position of each beat from that total;
Bar detection means for calculating the average level of each scale sound for each beat, summing the increments in that per-beat average level over all scale sounds to obtain a value indicating the degree of change in the overall sound for each beat, and detecting a time signature and a bar line position from that value;
Second scale sound level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals different from those used for the beat detection, using parameters suitable for chord detection, and obtaining the level of each scale sound at each predetermined time;
Bass sound detection means for detecting a bass sound in each measure from the levels of the low-register scale sounds among the detected scale sound levels;
A chord name detection device comprising chord name determination means for determining the chord name of each measure from the detected bass sound and the level of each scale sound.

The chord name detection device according to claim 7, wherein, when the bass sound detection means detects a plurality of bass sounds in a measure, the chord name determination means divides the measure into several chord detection ranges according to the bass sound detection result, and determines the chord name of each chord detection range from the bass sound and the level of each scale sound in that range.
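Dividing a measure into chord detection ranges can be pictured as grouping consecutive beats that share the same bass sound. The sketch below assumes, purely for illustration, that one bass sound has been detected per beat; the claim itself does not fix that granularity, and `chord_detection_ranges` is a hypothetical name.

```python
from itertools import groupby


def chord_detection_ranges(bass_per_beat):
    """Split a measure into (start_beat, end_beat, bass) ranges wherever the bass changes."""
    ranges, start = [], 0
    for bass, group in groupby(bass_per_beat):
        length = len(list(group))
        ranges.append((start, start + length, bass))  # end_beat is exclusive
        start += length
    return ranges


# A four-beat measure whose bass moves from C to G at the third beat.
ranges = chord_detection_ranges(["C", "C", "G", "G"])
```

Each resulting range would then be passed, together with the scale sound levels inside it, to the chord name determination step.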

A tempo detection program causing a computer to function as:
An input means for inputting an acoustic signal;
Scale sound level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals and obtaining the level of each scale sound at each predetermined time;
Beat detection means for summing the increments in the level of each scale sound at each predetermined time over all scale sounds to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and detecting the average beat interval and the position of each beat from that total; and
Bar detection means for calculating the average level of each scale sound for each beat, summing the increments in that per-beat average level over all scale sounds to obtain a value indicating the degree of change in the overall sound for each beat, and detecting a time signature and a bar line position from that value.

A chord name detection program causing a computer to function as:
An input means for inputting an acoustic signal;
First scale sound level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals, using parameters suitable for beat detection, and obtaining the level of each scale sound at each predetermined time;
Beat detection means for summing the increments in the level of each scale sound at each predetermined time over all scale sounds to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and detecting the average beat interval and the position of each beat from that total;
Bar detection means for calculating the average level of each scale sound for each beat, summing the increments in that per-beat average level over all scale sounds to obtain a value indicating the degree of change in the overall sound for each beat, and detecting a time signature and a bar line position from that value;
Second scale sound level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals different from those used for the beat detection, using parameters suitable for chord detection, and obtaining the level of each scale sound at each predetermined time;
Bass sound detection means for detecting a bass sound in each measure from the levels of the low-register scale sounds among the detected scale sound levels; and
Chord name determination means for determining the chord name of each measure from the detected bass sound and the level of each scale sound.