JP5153517B2

JP5153517B2 - Chord name detection device and computer program for chord name detection

Info

Publication number: JP5153517B2
Application number: JP2008216174A
Authority: JP
Inventors: 錬澄田
Original assignee: Kawai Musical Instrument Manufacturing Co Ltd
Current assignee: Kawai Musical Instrument Manufacturing Co Ltd
Priority date: 2008-08-26
Filing date: 2008-08-26
Publication date: 2013-02-27
Anticipated expiration: 2028-08-26
Also published as: JP2010054535A

Description

The present invention relates to a chord name detection device and a chord name detection computer program that detect the chord names being played in a music acoustic signal (audio signal), such as that of a music CD.

Techniques for automatically transcribing chord names from an audio signal have been disclosed before, and the present applicant has also filed applications for similar configurations.

A piece of music is usually built from repeated passages. A passage is one coherent unit of musical form, usually about eight bars long. Terms such as A melody, B melody, and chorus are commonly used; each A melody, B melody, or chorus is one passage unit.

However, when chords are detected for one long piece of music, the same passage appears many times, and with conventional chord name detection devices and programs for the same purpose, errors in bar division and in the chord name detection results had to be corrected at each occurrence.

A conventional chord name detection device, by its nature, often makes the same kind of detection error in the same passage. The user therefore had to repeat the same correction many times, which was very laborious.

The present invention was devised in view of the above problems. It aims to provide a chord name detection device and a chord name detection computer program that raise the accuracy of chord name detection and, building on that accuracy-improving technique, allow a correction made at one place to be applied immediately to the other corresponding places.

Accordingly, the present invention provides a configuration that, after chord detection, detects identical passages from the similarity of the detected chords. Specifically, a chord name detection device that performs beat detection, bar detection, and chord detection and determination on an input acoustic signal has, as its basic features:
performance means for playing back the acoustic signal;
first input means for receiving from the user, in step with the playback, the boundaries of passages each containing a plurality of bars;
passage similarity detection means for determining the number of bars in each passage from the positions of the passage boundaries, checking the similarity of passages having the same number of bars by comparing the character strings of the detected chord names or the constituent notes of those chord names, and assigning a unique ID to passages whose degree of similarity is at or above a predetermined threshold;
display means for showing the user the playback status of the acoustic signal, the passage boundary status, and the similarity detection status, including the passages to which unique IDs have been assigned;
second input means for receiving the user's judgment of the similarity detection status; and
passage similarity confirmation means for, when the input indicates that the judgment of the similarity detection status is not confirmed, changing the threshold and causing the passage similarity detection means to re-detect the similarity of each passage, and, when the input indicates that the judgment is confirmed, causing the same passages to be re-detected so that they have the same bar division and the same chords.

In this configuration, after chord detection, the passage similarity detection means detects identical passages from the similarity of the detected chords. The only thing the user enters is the passage boundary positions, through the first input means; the user need not consider whether a passage is an A melody or a B melody. Passages with the same number of bars, as delimited by the specified boundaries, are compared with each other, and passages with similar chord progressions are detected as the same passage. As described above, similarity is judged as follows: the passage similarity detection means determines the number of bars in each delimited passage, compares the chord name character strings or the chord constituent notes among passages with the same number of bars, and checks whether the resulting degree of similarity is at or above a predetermined threshold.
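The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method fixed by the text: each passage is represented as a list holding one chord name string per bar, the degree of similarity is taken to be the fraction of bars whose chord names match exactly, and each new passage is compared against the first member of an existing group. The names `passage_similarity` and `assign_passage_ids` are hypothetical.

```python
from itertools import count


def passage_similarity(chords_a, chords_b):
    """Fraction of bars whose chord-name strings match exactly.

    Passages with different bar counts are never compared, so they score 0.
    (Scoring by exact string match per bar is an assumption.)
    """
    if len(chords_a) != len(chords_b) or not chords_a:
        return 0.0
    matches = sum(a == b for a, b in zip(chords_a, chords_b))
    return matches / len(chords_a)


def assign_passage_ids(passages, threshold):
    """Give one shared ID to passages whose similarity is >= threshold.

    Each still-unlabelled passage starts a new group and is compared with
    every later unlabelled passage (an assumed grouping strategy).
    """
    ids = [None] * len(passages)
    next_id = count(1)
    for i, passage in enumerate(passages):
        if ids[i] is not None:
            continue
        ids[i] = next(next_id)
        for j in range(i + 1, len(passages)):
            if ids[j] is None and passage_similarity(passage, passages[j]) >= threshold:
                ids[j] = ids[i]
    return ids


passages = [
    ["C", "Am", "F", "G"],
    ["C", "Am", "Dm", "G"],   # one bar differs from the first passage
    ["F", "G", "Em", "Am"],
    ["C", "Am", "F", "G"],
    ["F", "G"],               # different bar count, never grouped with the others
]
print(assign_passage_ids(passages, threshold=0.75))
```

With a threshold of 0.75 the second passage joins the first group even though one detected chord differs, which is the case where a shared ID later allows both occurrences to be corrected together.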

After the similarity check, the passage similarity detection means assigns a unique ID to passages whose degree of similarity is at or above the predetermined threshold. This makes it easier for the passage similarity confirmation means to later confirm those passages all at once, or to change them all at once and then confirm them.

Of course, the similarity detected by the passage similarity detection means is not always correct. Before confirmation, this configuration therefore shows the detection status on the display means (playing back the original acoustic signal through the performance means where necessary) and finally accepts the user's judgment of the similarity detection status through the second input means.

If the user indicates through the second input means that the judgment of the similarity detection status is not confirmed, the passage similarity confirmation means changes the threshold and causes the passage similarity detection means to re-detect the similarity of each passage.
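The re-detection cycle described above can be pictured as a simple loop. The sketch below is an assumption-laden illustration: the text does not say in which direction or by how much the threshold changes, so the new threshold is supplied by a hypothetical `judge` callback standing in for the user's input, and `detect` stands in for whatever similarity detector is used. The `max_rounds` guard is also an addition for safety.

```python
def review_loop(passages, threshold, detect, judge, max_rounds=10):
    """Re-run similarity detection until the result is confirmed.

    detect(passages, threshold) -> list of passage IDs (hypothetical detector)
    judge(ids, threshold)       -> None to confirm, or a new threshold to retry with
    """
    for _ in range(max_rounds):
        ids = detect(passages, threshold)
        new_threshold = judge(ids, threshold)
        if new_threshold is None:
            return ids, threshold
        threshold = new_threshold
    raise RuntimeError("result was not confirmed within max_rounds")


# Stand-ins for illustration only: each "passage" is already a similarity
# score against a reference, and the "user" lowers the threshold until at
# least two passages share ID 1.
def stub_detect(scores, threshold):
    return [1 if score >= threshold else index + 2 for index, score in enumerate(scores)]


def stub_judge(ids, threshold):
    return None if ids.count(1) >= 2 else threshold - 0.25


ids, final_threshold = review_loop([1.0, 0.7, 0.2], 0.9, stub_detect, stub_judge)
print(ids, final_threshold)
```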

On the other hand, when the user indicates through the second input means that the judgment is confirmed, either without any re-detection or after such re-detection, the passage similarity confirmation means causes the same passages (those given the same unique ID) to be re-detected so that they have the same bar division and the same chords. Because the passages have been confirmed as similar, they should have the same division and arrangement of bars, and the chords in each bar should be the same.
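One way to picture this step is shown below. The text states only that passages with the same ID end up with the same chords; it does not state how the shared result is chosen. The sketch therefore uses a per-bar majority vote purely as an assumed rule, and it handles chords only, leaving bar division aside. The name `unify_passages` is hypothetical.

```python
from collections import Counter, defaultdict


def unify_passages(passages, ids):
    """Return copies of the passages in which all passages sharing an ID
    carry the same chord in each bar.

    The chord for each bar is chosen by majority vote among the group
    members (an assumed rule, not taken from the source text).
    """
    groups = defaultdict(list)
    for index, passage_id in enumerate(ids):
        groups[passage_id].append(index)

    unified = [list(passage) for passage in passages]
    for members in groups.values():
        n_bars = len(passages[members[0]])  # same ID implies same bar count
        for bar in range(n_bars):
            votes = Counter(passages[m][bar] for m in members)
            winner = votes.most_common(1)[0][0]
            for m in members:
                unified[m][bar] = winner
    return unified


passages = [
    ["C", "Am", "F", "G"],
    ["C", "Am", "Dm", "G"],
    ["F", "G", "Em", "Am"],
    ["C", "Am", "F", "G"],
]
print(unify_passages(passages, ids=[1, 1, 2, 1]))
```

Here the lone "Dm" in the second passage is outvoted by "F" in the two other occurrences, which matches the stated aim of determining chords comprehensively from the similar parts.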

It is also preferable to further provide correction means that can correct the bar division and/or the chords once the passage similarity confirmation means has re-detected the same passages so that they have the same bar division and the same chords. The second invention proposes this configuration. At that point the same passages should have the same bar division and the same chords, but where the result is wrong, the correction means lets the user correct the errors. In that case, after a correction is made with the correction means, the passage similarity confirmation means corrects the parts determined to be the same passage so that they again have the same bar division and the same chords. With this configuration, correcting the erroneous place in one passage corrects the same bar in every occurrence of that passage at the same time, which greatly reduces the correction effort.
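The propagation of a single correction can be sketched as follows. This is a minimal illustration assuming the same one-chord-per-bar list representation and a list of passage IDs; `apply_correction` is a hypothetical name, and correction of bar division is not shown.

```python
def apply_correction(passages, ids, passage_index, bar_index, new_chord):
    """Correct one bar and apply the same correction to that bar in every
    passage sharing the corrected passage's ID.

    Modifies `passages` in place and returns the indices of the passages changed.
    """
    target_id = ids[passage_index]
    changed = []
    for index, passage_id in enumerate(ids):
        if passage_id == target_id:
            passages[index][bar_index] = new_chord
            changed.append(index)
    return changed


passages = [
    ["C", "Am", "F", "G"],
    ["C", "Am", "F", "G"],
    ["F", "G", "Em", "Am"],
    ["C", "Am", "F", "G"],
]
ids = [1, 1, 2, 1]

# The user fixes bar 3 of the first passage; the other occurrences follow.
changed = apply_correction(passages, ids, passage_index=0, bar_index=2, new_chord="Fmaj7")
print(changed, passages)
```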

As described above, in this configuration the passage similarity detection means detects identical passages from the similarity of the detected chords after chord detection, and the only thing the user enters is the passage boundary positions, through the first input means; the user need not consider whether a passage is an A melody or a B melody. Passages with the same number of bars, as delimited by the specified boundaries, are compared with each other, and passages with similar chord progressions are detected as the same passage. However, if the user knows from the outset that a passage is an A melody, a B melody, or a chorus, the overall structure of the piece is clear from the start and detection accuracy improves further. It is therefore desirable that the first input means receive, in addition to the passage boundaries, an input of the passage type, such as A melody, B melody, or chorus. The third invention specifies this configuration.

Furthermore, as one example of the part of the above configuration that performs beat detection, bar detection, and chord detection and determination on the input acoustic signal, a configuration having at least the following is required:
input means for inputting an acoustic signal;
first scale note level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals, using parameters suited to beat detection, and obtaining the level of each scale note at each predetermined time;
beat detection means for summing, over all scale notes, the increment in the level of each scale note at each predetermined time to obtain a total level increment indicating the degree of change of the overall sound at each predetermined time, and detecting the average beat interval and the position of each beat from that total;
bar detection means for calculating the average level of each scale note for each beat, summing over all scale notes the increment in that per-beat average level to obtain a value indicating the degree of change of the overall sound for each beat, and detecting the time signature and the bar line positions from that value;
second scale note level detection means for performing an FFT operation on the input acoustic signal at a predetermined time interval different from that used for beat detection, using parameters suited to chord detection, and obtaining the level of each scale note at each predetermined time;
bass note detection means for detecting the bass note from the levels of the lower-range scale notes within each bar, among the detected scale note levels;
chord name determination means for determining the chord name of each bar from the detected bass note and the level of each scale note; and
chord information storage means for storing, for every detected chord, the chord position, the bass-range scale note intensity obtained from the levels of the scale notes in the bass detection range during the bass detection period, the bass note, the chord-range scale note intensity obtained from the levels of the scale notes in the chord detection range during the chord detection period, the chord constituent notes, the number of chord constituent notes, and the chord name.
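The items held by the chord information storage means can be pictured as one record per detected chord. The sketch below is only an illustration: the field names, the use of a frame index for the chord position, the twelve-value pitch-class layout of the intensities, and the pitch-class numbering (0 = C) are all assumptions, and the number of constituent notes is derived from the stored notes rather than stored separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChordInfo:
    """One stored record per detected chord (field layout is hypothetical)."""
    position: int                    # chord position (assumed unit: frame index)
    bass_range_levels: List[float]   # bass-range intensity, one value per pitch class (assumed)
    bass_note: int                   # detected bass note as a pitch class, 0 = C (assumed)
    chord_range_levels: List[float]  # chord-range intensity, one value per pitch class (assumed)
    chord_tones: List[int]           # chord constituent notes as pitch classes
    chord_name: str                  # determined chord name, e.g. "C"

    @property
    def tone_count(self) -> int:
        """Number of chord constituent notes."""
        return len(self.chord_tones)


info = ChordInfo(
    position=0,
    bass_range_levels=[0.0] * 12,
    bass_note=0,
    chord_range_levels=[0.0] * 12,
    chord_tones=[0, 4, 7],
    chord_name="C",
)
print(info.chord_name, info.tone_count)
```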

In this configuration, the scale note level detection means obtains the level of each scale note at each predetermined time from the acoustic signal supplied to the input means. The beat detection means sums the increment in each scale note's level over all scale notes to obtain the total level increment, which indicates the degree of change of the overall sound at each predetermined time, and from this total detects the average beat interval (that is, the tempo) and the position of each beat. The bar detection means then calculates the average level of each scale note for each beat, sums the increment in that per-beat average level over all scale notes to obtain a value indicating the degree of change of the overall sound for each beat, and from this value detects the time signature and the bar line positions (the positions of the first beats).

In other words, the level of each scale note at each predetermined time is obtained from the input acoustic signal; the average beat interval (the tempo) and the position of each beat are detected from the changes in those levels; and the time signature and the bar line positions (first-beat positions) are then detected from the changes in the per-beat level of each scale note.
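The first half of this processing, summing level increments over all scale notes and deriving a beat interval from the result, can be sketched as follows. Two points are assumptions not stated in the text: decreases in level are counted as zero, and the average beat interval is picked as the lag with the highest autocorrelation of the summed increments. The function names are hypothetical.

```python
def level_increment_sum(levels):
    """levels[t][k] is the level of scale note k at frame t.

    Returns, for each frame, the sum over all notes of the level increase
    from the previous frame. Decreases count as 0 (an assumption).
    """
    sums = [0.0]  # the first frame has no predecessor
    for previous, current in zip(levels, levels[1:]):
        sums.append(sum(max(c - p, 0.0) for p, c in zip(previous, current)))
    return sums


def estimate_beat_interval(increments, min_lag, max_lag):
    """Return the lag (in frames) with the highest autocorrelation.

    Autocorrelation is an assumed method for finding the average beat interval.
    """
    def autocorrelation(lag):
        return sum(a * b for a, b in zip(increments, increments[lag:]))

    return max(range(min_lag, max_lag + 1), key=autocorrelation)


# Synthetic input: two scale notes that sound together every 4 frames.
levels = [[1.0, 0.5] if t % 4 == 0 else [0.0, 0.0] for t in range(32)]
increments = level_increment_sum(levels)
print(estimate_beat_interval(increments, min_lag=2, max_lag=8))
```

On real audio the search range would need to correspond to plausible tempos, and a plain autocorrelation peak can land on a multiple or fraction of the true beat interval, so this should be read as an outline of the idea only.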

When the bass note detection means detects more than one bass note within a bar, the chord name determination means divides the bar into several chord detection ranges according to the bass note detection result, and determines the chord name in each chord detection range from the bass note and the levels of the scale notes in that range.
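The division of a bar into chord detection ranges can be sketched as below. The sketch assumes that one bass note has been detected per beat and that a new range starts wherever the bass note changes; the text itself does not fix the granularity at which bass notes are detected. The name `split_bar_by_bass` is hypothetical.

```python
def split_bar_by_bass(bass_per_beat):
    """Split one bar into chord detection ranges at each bass note change.

    bass_per_beat: detected bass note for each beat of the bar (assumed input).
    Returns a list of (start_beat, end_beat_exclusive, bass_note) tuples.
    """
    ranges = []
    start = 0
    n_beats = len(bass_per_beat)
    for beat in range(1, n_beats + 1):
        # Close the current range at the end of the bar or when the bass changes.
        if beat == n_beats or bass_per_beat[beat] != bass_per_beat[start]:
            ranges.append((start, beat, bass_per_beat[start]))
            start = beat
    return ranges


print(split_bar_by_bass(["C", "C", "G", "G"]))  # two ranges: beats 0-1 and beats 2-3
print(split_bar_by_bass(["C", "C", "C", "C"]))  # a single range covering the bar
```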

According to the above configuration, the first scale note level detection means first performs an FFT operation on the acoustic signal supplied from the input means at predetermined time intervals, using parameters suited to beat detection, and thereby obtains the level of each scale note at each predetermined time; the beat detection means detects the average beat interval and the position of each beat from the changes in those levels. Next, the bar detection means detects the time signature and the bar line positions from the changes in the per-beat level of each scale note. The second scale note level detection means of the chord name detection device then performs an FFT operation on the input acoustic signal at a predetermined time interval different from that used for beat detection, this time using parameters suited to chord detection, and thereby obtains the level of each scale note at each predetermined time. The bass note detection means detects the bass note of each bar from the levels of the lower-range scale notes among these levels, and the chord name determination means determines the chord name of each bar from the detected bass note and the level of each scale note.

As noted above, when the bass note detection means detects more than one bass note within a bar, the chord name determination means divides the bar into several chord detection ranges according to the bass note detection result, and determines the chord name in each chord detection range from the bass note and the levels of the scale notes in that range.

As described above, with only a simple configuration, the chord name detection device of the present invention can carry out together both the processing that needs time resolution, namely beat detection (in effect, the configuration of a tempo detection device), and the processing that needs frequency resolution, namely chord detection (a configuration that builds on the tempo detection device to detect chords as well).
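The reason two different FFT parameter sets are useful is that a single frame length cannot give both fine time resolution and fine frequency resolution. The small calculation below illustrates this; the sample rate of 44100 Hz (the music CD rate) and the frame sizes 1024 and 8192 are illustrative assumptions, since the text does not state the actual parameters.

```python
def fft_resolution(sample_rate_hz, frame_size):
    """Return (frame duration in seconds, frequency bin spacing in Hz)."""
    frame_duration_s = frame_size / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / frame_size
    return frame_duration_s, bin_spacing_hz


for frame_size in (1024, 8192):  # hypothetical "beat" and "chord" frame sizes
    duration, spacing = fft_resolution(44100, frame_size)
    print(f"frame={frame_size}: {duration * 1000:.1f} ms per frame, {spacing:.2f} Hz per bin")
```

Under these assumed values the short frame spans about 23 ms with bins about 43 Hz apart, which suits locating beats in time, while the long frame spans about 186 ms with bins about 5.4 Hz apart, which suits separating neighbouring scale notes, especially in the bass range.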

This configuration makes it possible to detect the beat interval, each beat position, the time signature, and the bars (first-beat positions). The scale note power detection means obtains the power spectrum of each scale note at each predetermined time from the acoustic signal supplied to the input means. The power increment calculation means sums the increment in the power of each scale note at each predetermined time (each frame) over all scale notes to obtain the total power increment, which indicates the degree of change of the overall sound at each predetermined time. The beat detection means detects the average beat interval (that is, the tempo) and the position of each beat from that total. The bar detection means then calculates the average power of each scale note for each beat, sums the increment in that per-beat average power over all scale notes to obtain the value indicating the degree of change of the overall sound for each beat, and from this value detects the time signature and the bar line positions (first-beat positions).

On that basis, when the configurations of the first to third inventions analyse the structure of the piece through the chord progression similarity detection described above, chord names can be determined comprehensively from the highly similar parts, which raises chord detection accuracy. In addition, building on that accuracy-improving technique, the configuration of the second invention (and of the third invention, where it includes the configuration of the second invention) lets a correction made in one place be applied immediately to the other corresponding places.

The fifth to eighth inventions define the computer-executable programs themselves that cause a computer to implement the configurations of the first to fourth inventions. That is, as a configuration for solving the problems described above, each is a program that can be read and executed by a computer and that realizes each of the above means using the computer's configuration. Here, the computer may be a general-purpose computer that includes a central processing unit, or a dedicated machine aimed at specific processing; there is no particular limitation as long as it involves a central processing unit.

When the program for realizing the above means is read and executed by the computer, function-realizing means equivalent to those defined in the first to fourth inventions are achieved.

Of these, the more specific configuration of the fifth invention is a chord name detection computer program that, when read and executed by a computer, causes the computer to function as a configuration that performs beat detection, bar detection, and chord detection and determination on an input acoustic signal, and further provides that configuration with the functions of:
performance means for playing back the acoustic signal;
first input means for receiving from the user, in step with the playback, the boundaries of passages each containing a plurality of bars;
passage similarity detection means for determining the number of bars in each passage from the positions of the passage boundaries, checking the similarity of passages having the same number of bars by comparing the character strings of the detected chord names or the constituent notes of those chord names, and assigning a unique ID to passages whose degree of similarity is at or above a predetermined threshold;
display means for showing the user the playback status of the acoustic signal, the passage boundary status, and the similarity detection status, including the passages to which unique IDs have been assigned;
second input means for receiving the user's judgment of the similarity detection status; and
passage similarity confirmation means for, when the input indicates that the judgment of the similarity detection status is not confirmed, changing the threshold and causing the passage similarity detection means to re-detect the similarity of each passage, and, when the input indicates that the judgment is confirmed, causing the same passages to be re-detected so that they have the same bar division and the same chords.

The more specific configuration of the sixth invention is a chord name detection computer program that, when read and executed by a computer, causes the computer to function as a configuration that performs beat detection, bar detection, and chord detection and determination on an input acoustic signal, and further provides that configuration with the functions of:
performance means for playing back the acoustic signal;
first input means for receiving from the user, in step with the playback, the boundaries of passages each containing a plurality of bars;
passage similarity detection means for determining the number of bars in each passage from the positions of the passage boundaries, checking the similarity of passages having the same number of bars by comparing the character strings of the detected chord names or the constituent notes of those chord names, and assigning a unique ID to passages whose degree of similarity is at or above a predetermined threshold;
display means for showing the user the playback status of the acoustic signal, the passage boundary status, and the similarity detection status, including the passages to which unique IDs have been assigned;
second input means for receiving the user's judgment of the similarity detection status;
passage similarity confirmation means for, when the input indicates that the judgment of the similarity detection status is not confirmed, changing the threshold and causing the passage similarity detection means to re-detect the similarity of each passage, and, when the input indicates that the judgment is confirmed, causing the same passages to be re-detected so that they have the same bar division and the same chords; and
correction means that can correct the bar division and/or the chords once the passage similarity confirmation means has re-detected the same passages so that they have the same bar division and the same chords.

The more specific configuration of the seventh invention is characterized in that the first input means receives, in addition to the passage boundaries, an input of the passage type, such as A melody, B melody, or chorus.

The more specific configuration of the eighth invention is a chord name detection computer program, applicable to the configuration of any one of the fifth to seventh inventions, that, when read and executed by a computer, further provides the computer, as the configuration that performs beat detection, bar detection, and chord detection and determination on the input acoustic signal, with the functions of:
input means for inputting an acoustic signal;
first scale note level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals, using parameters suited to beat detection, and obtaining the level of each scale note at each predetermined time;
beat detection means for summing, over all scale notes, the increment in the level of each scale note at each predetermined time to obtain a total level increment indicating the degree of change of the overall sound at each predetermined time, and detecting the average beat interval and the position of each beat from that total;
bar detection means for calculating the average level of each scale note for each beat, summing over all scale notes the increment in that per-beat average level to obtain a value indicating the degree of change of the overall sound for each beat, and detecting the time signature and the bar line positions from that value;
second scale note level detection means for performing an FFT operation on the input acoustic signal at a predetermined time interval different from that used for beat detection, using parameters suited to chord detection, and obtaining the level of each scale note at each predetermined time;
bass note detection means for detecting the bass note from the levels of the lower-range scale notes within each bar, among the detected scale note levels;
chord name determination means for determining the chord name of each bar from the detected bass note and the level of each scale note; and
chord information storage means for storing, for every detected chord, the chord position, the bass-range scale note intensity obtained from the levels of the scale notes in the bass detection range during the bass detection period, the bass note, the chord-range scale note intensity obtained from the levels of the scale notes in the chord detection range during the chord detection period, the chord constituent notes, the number of chord constituent notes, and the chord name.

With a program configured as described above, running the program on existing hardware resources makes it easy to realize each device of the present invention as a new application on that hardware.

In the form of a program, the invention can be easily used, distributed, and sold, for example over communication networks. Moreover, running the program on existing hardware resources allows the device of the present invention to be implemented easily as a new application on existing hardware.

Some of the functions of the function-realizing means described in any one of the fifth to eighth inventions may be realized by functions built into the computer (these may be functions built into the computer as hardware, or functions realized by the operating system or other application programs installed on the computer), and the program may include instructions that call or link to the functions provided by the computer.

This is because, when part of the function-realizing means defined in the first to fourth inventions is taken over by part of the functions provided by, for example, the operating system, no program or module realizing that function exists directly, but as long as the relevant operating system functions are called or linked, the configuration is substantially the same.

According to the configuration of the present invention, after chord detection, identical passages are detected from the similarity of their chord progressions. When passages are detected as similar and this is confirmed, the identical passages are given basically the same bar division and the same chord progression, so the chord name detection accuracy becomes extremely high.

Furthermore, even if the chord detection contains misrecognitions, when correction by the user is possible as in the second or sixth invention, correcting just one of the places bearing the same ID causes the others to be corrected automatically. The user therefore no longer needs to repeat similar corrections many times, and the correction effort is greatly reduced.

Embodiments of the present invention are described below with reference to the illustrated examples.

FIG. 1 shows the configuration of a personal computer to which a preferred embodiment of the present invention is applied. In this configuration, the CD-ROM 1016a stores a program that, when the CD-ROM 1016a is inserted into the CD-ROM drive 1016 described later, read, and executed, allows the personal computer to be used as the chord name detection device of the present invention. Accordingly, by having the CD-ROM drive 1016 read the CD-ROM 1016a and executing the program, the chord name detection device of the present invention is realized on the personal computer.

In outline, the circuit of the personal computer shown in FIG. 1 has a CPU 1002, a ROM 1004, a RAM 1006, a display 1008 connected via an image control unit (not shown), an I/O interface 1010, and a hard disk drive 1020, all connected to a system bus 1000 through which control signals and data are exchanged with each device.

The CPU 1002 is a central processing unit that controls the entire chord name detection device based on the program, which is read from the CD-ROM 1016a by the CD-ROM drive 1016 and stored in the hard disk drive 1020 or the RAM 1006. The beat detection scale-note level detection unit 20, beat detection unit 25, bar detection unit 30, chord detection scale-note level detection unit 40, bass note detection unit 50, chord name determination unit 60, passage similarity detection unit 100, and passage similarity determination unit 120, all described later, are constituted by the CPU 1002 running the program.

The ROM 1004 is a storage area that holds the BIOS and the like of the personal computer.

The RAM 1006 is used as a storage area for the program and as temporary storage: a work area, various coefficients, and the parameters suited to beat detection and chord detection that are used in the FFT computation (for example, it temporarily holds the buffers and variables described later).

The display 1008 is controlled by an image control unit (not shown) that performs the necessary image processing under instructions from the CPU 1002, and displays the results of that image processing. It corresponds to the display unit 80 described later.

The I/O interface 1010 connects the keyboard 1012, sound system 1014, CD-ROM drive 1016, and mouse 1018 to the system bus 1000, and control signals and data are exchanged between these devices and the devices on the system bus 1000.

The sound system 1014 constitutes the input unit 10 described later, and also constitutes the performance unit 70 described later, which outputs the acoustic signal that was input and stored.

The CD-ROM drive 1016 reads the program, data, and so on from the CD-ROM 1016a on which the chord name detection program is stored. The program and data are stored on the hard disk drive 1020, and the main program is loaded into the RAM 1006 and executed by the CPU 1002.

As described above, when the chord name detection program is read and executed, the hard disk drive 1020 stores the program itself and the data it needs. This data includes various parameters, such as the parameters suited to beat detection and to chord detection that are used when obtaining the level of each scale note at each predetermined time by the FFT computation described later, and thresholds. Together with the RAM 1006, the hard disk drive 1020 stores the buffers mentioned above as well as these parameters and thresholds. It also functions as the chord information storage unit 62 and the acoustic signal storage unit 71 described later. The data stored on the hard disk drive includes acoustic signals (performance data and the like) equivalent to those input from the sound system 1014 or the CD-ROM drive 1016, and data and instructions from the first input unit 90, the second input unit 110, the correction unit 130, and so on, described later.

By loading the chord name detection program of this embodiment into the personal computer (the RAM 1006 and hard disk drive 1020) and executing it (on the CPU 1002), the chord name detection device configured as shown in FIG. 2 is obtained.

Input devices such as the keyboard 1012 and mouse 1018 constitute the first input unit 90, the second input unit 110, and the correction unit 130 described later.

FIG. 2 is an overall block diagram of the chord name detection device according to the present invention. As shown in the figure, the device comprises:
an input unit 10 that inputs an acoustic signal;
a beat detection scale-note level detection unit 20 that performs an FFT computation on the input acoustic signal at predetermined time intervals (predetermined frames; windows) using parameters suited to beat detection, and obtains the level of each scale note at each predetermined time;
a beat detection unit 25 that sums the increment of each scale note's level at each predetermined time over all scale notes to obtain a total level increment indicating the degree of change of the overall sound at each predetermined time, and detects the average beat interval and the position of each beat from this total;
a bar detection unit 30 that calculates the average level of each scale note for each beat, sums the increment of each scale note's per-beat average level over all scale notes to obtain a value indicating the degree of change of the overall sound for each beat, and detects the time signature and bar line positions from this value;
a chord detection scale-note level detection unit 40 that performs an FFT computation on the input acoustic signal at predetermined time intervals different from those used for beat detection, using parameters suited to chord detection, and obtains the level of each scale note at each predetermined time;
a bass note detection unit 50 that detects the bass note from the levels of the low-range scale notes within each bar among the detected scale-note levels;
a chord name determination unit 60 that determines the chord name of each bar from the detected bass note and the level of each scale note;
a chord information storage unit 62 that stores, for every detected chord, the chord position, the bass-range scale-note intensity obtained from the scale-note levels of the bass detection range during the bass detection period, the bass note, the chord scale-note intensity obtained from the scale-note levels of the chord detection range during the chord detection period, the chord constituent notes, the number of chord constituent notes, and the chord name;
an acoustic signal storage unit 71 that stores the input acoustic signal;
a performance unit 70 that plays back the acoustic signal;
a first input unit 90 that accepts from the user, following the playback, the boundaries of passages each containing a plurality of bars;
a passage similarity detection unit 100 that derives the number of bars in each passage from the positions of the passage boundaries, checks the similarity among passages with the same number of bars by comparing the character strings of the detected chord names or the constituent notes of the chords, and assigns a unique ID to passages whose similarity to one another is at or above a predetermined threshold;
a display unit 80 that shows the user the playback status of the acoustic signal, the passage boundary status, and the similarity detection status including the passages assigned unique IDs;
a second input unit 110 that accepts the user's judgment of the similarity detection status;
a passage similarity determination unit 120 that, when the user indicates that the similarity detection status is not to be confirmed, changes the threshold and causes the passage similarity detection unit 100 to re-detect the similarity of each passage, and, when the user indicates that the judgment is confirmed, causes re-detection such that identical passages have the same bar division and the same chords; and
a correction unit 130 with which the bar division and/or chords can be corrected once the passage similarity determination unit 120 has caused identical passages to be re-detected with the same bar division and the same chords.

The input unit 10 is the part that inputs the music acoustic signal on which chord name detection is to be performed and, as described above, is constituted by the sound system 1014. An analog signal input from a device such as a microphone may be converted to a digital signal by an A/D converter (not shown), or, for digitized music data such as a music CD, the data may be imported directly as a file (ripped) and then selected and opened. If the digital signal input in this way is stereo, it is converted to mono to simplify subsequent processing.

This digital signal is input to the beat detection scale-note level detection unit 20. This unit is constituted by the CPU 1002 when the chord name detection program is read and executed and, as described above, has the function of performing an FFT computation on the input acoustic signal at predetermined time intervals and obtaining the level of each scale note at each predetermined time. It is in turn made up of the parts shown in FIG. 3.

Of these, the waveform preprocessing unit 21 downsamples the acoustic signal from the input unit 10 to a sampling frequency suited to the subsequent processing.

The downsampling rate is determined by the pitch range of the instruments used for beat detection. To reflect the sound of high-register rhythm instruments such as cymbals and hi-hats in beat detection, the sampling frequency after downsampling must be high. If beats are to be detected mainly from the bass, instruments such as the bass drum and snare drum, and mid-range instruments, the sampling frequency after downsampling need not be very high.

For example, if the highest note to be detected is A6 (with C4 as middle C), the fundamental frequency of A6 is about 1760 Hz (taking A4 = 440 Hz), so the sampling frequency after downsampling should be at least 3520 Hz, which puts the Nyquist frequency at 1760 Hz or above. Thus, when the original sampling frequency is 44.1 kHz (music CD), a downsampling rate of about 1/12 suffices. The sampling frequency after downsampling is then 3675 Hz.

Downsampling is normally done by passing the signal through a low-pass filter that removes components at or above the Nyquist frequency, which is half the sampling frequency after downsampling (1837.5 Hz in this example), and then skipping samples (in this example, discarding 11 of every 12 waveform samples).
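The low-pass-then-decimate step can be sketched as follows. This is a minimal illustration, not the filter used in the embodiment: the windowed-sinc (Hamming) design, the tap count of 101, and the function names `lowpass_fir` and `downsample` are all assumptions introduced here.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc (Hamming) low-pass FIR taps, normalized to unit DC gain.
    The filter design and tap count are illustrative assumptions."""
    fc = cutoff_hz / fs_hz              # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        x = k - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [c / total for c in taps]

def downsample(samples, factor=12, fs_hz=44100.0, num_taps=101):
    """Low-pass at the new Nyquist frequency, then keep every `factor`-th sample."""
    new_nyquist = fs_hz / factor / 2    # 1837.5 Hz for 44.1 kHz and factor 12
    taps = lowpass_fir(new_nyquist, fs_hz, num_taps)
    half = num_taps // 2
    out = []
    # Only the retained output samples are computed; the other 11 of 12 are skipped.
    for center in range(0, len(samples), factor):
        acc = 0.0
        for k, c in enumerate(taps):
            idx = center + k - half
            if 0 <= idx < len(samples):
                acc += c * samples[idx]
        out.append(acc)
    return out, fs_hz / factor
```

Computing the filter output only at the retained positions gives the same result as filtering every sample and then discarding 11 of every 12, at a fraction of the cost.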

The purpose of this downsampling is to shorten the FFT computation time by reducing the number of FFT points needed to obtain the same frequency resolution in the subsequent FFT computation.

When the source has already been sampled at a fixed sampling frequency, as with a music CD, this downsampling is necessary. When the input unit 10 instead converts an analog signal from a device such as a microphone to a digital signal with an A/D converter, the waveform preprocessing unit 21 can of course be omitted by setting the sampling frequency of the A/D converter to the post-downsampling sampling frequency.

Once downsampling by the waveform preprocessing unit 21 is complete, the output signal of the waveform preprocessing unit 21 is transformed by FFT (fast Fourier transform) in the FFT computation unit 22 at predetermined time intervals.

The FFT computation unit 22 is constituted by the CPU 1002 running the program. The FFT parameters (the number of FFT points and the shift of the FFT window) are set to values suited to beat detection. That is, one must take into account the characteristic of the FFT that increasing the number of FFT points to raise the frequency resolution enlarges the FFT window, so that each FFT covers a longer span of time and the time resolution falls. For beat detection, therefore, it is better to raise the time resolution at the expense of frequency resolution. There is a method that avoids degrading the time resolution even with a large number of FFT points, by placing waveform data in only part of the window and filling the rest with zeros instead of using a waveform as long as the window. However, a certain number of waveform samples is still needed to detect the power of the low notes correctly.

In light of the above, this embodiment uses 512 FFT points and a window shift of 32 samples, with no zero padding. With these settings the FFT computation has a time resolution of about 8.7 ms and a frequency resolution of about 7.2 Hz. A time resolution of about 8.7 ms is sufficient, given that a thirty-second note lasts 25 ms in a piece with a tempo of quarter note = 300.
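The figures quoted above follow directly from the sampling frequency, the number of FFT points, and the window shift. The short check below reproduces them; the function names are illustrative only.

```python
def fft_resolution(fs_hz, n_fft, hop):
    """Return (time resolution in ms, frequency resolution in Hz)."""
    return 1000.0 * hop / fs_hz, fs_hz / n_fft

def note_duration_ms(quarter_bpm, notes_per_quarter):
    """Duration in ms of a note that divides a quarter note into equal parts."""
    return 60000.0 / quarter_bpm / notes_per_quarter

fs = 44100 / 12                                  # 3675 Hz after 1/12 downsampling
time_ms, freq_hz = fft_resolution(fs, n_fft=512, hop=32)
thirty_second_ms = note_duration_ms(300, 8)      # eight 32nd notes per quarter note

print(f"time resolution: {time_ms:.2f} ms")      # about 8.7 ms
print(f"frequency resolution: {freq_hz:.2f} Hz") # about 7.2 Hz
print(f"32nd note at quarter = 300: {thirty_second_ms:.0f} ms")
```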

In this way, the FFT computation is performed at each predetermined time interval, the level of the power spectrum is calculated as the square root of the sum of the squares of the real and imaginary parts, and the result is sent to the level detection unit 23.

The level detection unit 23 is likewise constituted by the CPU 1002 running the program, and calculates the level of each scale note from the power spectrum computed by the FFT computation unit 22. Because the FFT yields power only at frequencies that are integer multiples of the sampling frequency divided by the number of FFT points, the following processing is used to obtain the level of each scale note from this power spectrum. For every note whose level is to be calculated (C1 to A6), the level of that scale note is taken to be the power of the strongest spectral component among those whose frequencies lie within 50 cents above or below the note's fundamental frequency (100 cents being a semitone).
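The mapping from FFT bins to scale-note levels can be sketched as below. It assumes MIDI note numbering (C4 = 60, so C1 = 24 and A6 = 93), A4 = 440 Hz, and the 3675 Hz / 512-point settings of this embodiment; the function names and the use of `None` for notes with no bin in range are illustrative choices, not taken from the specification.

```python
import math

def note_frequency(midi_note, a4_hz=440.0):
    """Equal-tempered fundamental frequency; MIDI numbering is an assumption."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12)

def bins_for_note(midi_note, fs_hz=3675.0, n_fft=512):
    """FFT bin indices whose center frequency lies within +/-50 cents of the note."""
    f0 = note_frequency(midi_note)
    lo = f0 * 2.0 ** (-50 / 1200)
    hi = f0 * 2.0 ** (50 / 1200)
    spacing = fs_hz / n_fft
    first = math.ceil(lo / spacing)
    last = min(math.floor(hi / spacing), n_fft // 2)
    return list(range(first, last + 1))

def note_levels(spectrum, low_note=24, high_note=93, fs_hz=3675.0, n_fft=512):
    """Level of each note from C1 (24) to A6 (93): the largest bin within range,
    or None when no bin falls inside the +/-50 cent band."""
    levels = {}
    for note in range(low_note, high_note + 1):
        bins = bins_for_note(note, fs_hz, n_fft)
        levels[note] = max(spectrum[b] for b in bins) if bins else None
    return levels
```

With a bin spacing of about 7.2 Hz, the ±50 cent band of some low notes is narrower than one bin and can fall between two bins, so those notes receive no level.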

Once the levels of all scale notes have been detected, they are saved in the buffer 24, the waveform read position is advanced by the predetermined time interval (32 samples in the example above), and the FFT computation unit 22 and level detection unit 23 are run repeatedly until the end of the waveform.

As a result, the power of each scale note at each predetermined time in the acoustic signal input to the input unit 10 is stored in the buffer 24.

The beat detection unit 25 is likewise constituted by the CPU 1002 of the computer, which reads and executes the chord name detection program and performs the processing described below. As described above, it sums the increment of each scale note's level at each predetermined time over all scale notes to obtain a total level increment indicating the degree of change of the overall sound at each predetermined time, and detects the average beat interval and the position of each beat from this total.

Next, the configuration of the beat detection unit 25 in FIG. 1 is described. The beat detection unit 25 operates according to the processing flow shown in FIG. 4.

The beat detection unit 25 detects the average beat interval (that is, the tempo) and the beat positions from the changes in the level of each scale note at each predetermined time (hereinafter, one such predetermined time is called one frame) output by the beat detection scale-note level detection unit 20. To do so, it first calculates the total level increment of the scale notes (the increments in level from the previous frame, summed over all scale notes; where the level has fallen from the previous frame, 0 is added) (step S100).

In this calculation, the increments of the power spectrum of each scale note (in the example of FIG. 5, the power spectra of C1 to A6 arranged vertically) at each predetermined time (one frame, as noted above), as detected by the beat detection scale-note level detection unit 20 and shown in the middle panel of FIG. 5 described later, are summed over all scale notes. This yields the total level increment shown in the bottom panel of FIG. 5, which indicates the degree of change of the overall sound at each predetermined time.

In other words, this configuration calculates the total level increment of the scale notes: the increments in level from the previous frame, summed over all scale notes, with 0 added wherever the level has fallen from the previous frame.

That is, letting L_i(t) be the level of the i-th scale note at frame time t, the level increment L_addi(t) of the i-th scale note is given by Equation 1 below, and using L_addi(t), the total level increment L(t) of the scale notes at frame time t is calculated by Equation 2 below. Here, T is the total number of scale notes.
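The equation images are not reproduced in this text. Based solely on the verbal definition above (increment from the previous frame, counted as 0 when the level falls, summed over the T scale notes), Equations 1 and 2 can be taken to have the following form. The index range 0 to T − 1 is an assumption.

```latex
L_{addi}(t) =
\begin{cases}
  L_i(t) - L_i(t-1) & \text{if } L_i(t) - L_i(t-1) > 0 \\
  0 & \text{otherwise}
\end{cases}
\qquad \text{(Equation 1)}

L(t) = \sum_{i=0}^{T-1} L_{addi}(t)
\qquad \text{(Equation 2)}
```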

This total L(t) represents the degree of change of the overall sound in each frame. It rises sharply when notes begin to sound, and the more notes that begin sounding at the same time, the larger it becomes. Because notes in music often begin at beat positions, places where this value is large are likely to be beat positions.
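A minimal sketch of the total level increment L(t) described above follows. The input layout (one list of scale-note levels per frame), the handling of unavailable levels as `None`, and the value 0 for the first frame, which has no predecessor, are assumptions made for illustration.

```python
def total_level_increment(frames):
    """frames[t][i] is the level of scale note i at frame t (None if unavailable).
    Returns L(t) for every frame; L(0) is set to 0 by assumption."""
    if not frames:
        return []
    totals = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        total = 0.0
        for p, c in zip(prev, cur):
            if p is None or c is None:
                continue                 # skip notes whose level could not be measured
            total += max(c - p, 0.0)     # decreases contribute 0
        totals.append(total)
    return totals

frames = [
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0],   # two notes start together
    [0.5, 3.0, 0.0],   # one decays, one grows
    [0.5, 3.0, 4.0],   # a third note starts
]
print(total_level_increment(frames))
```

Frames in which several notes start at once produce the largest totals, which is why peaks of L(t) are good beat candidates.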

As an example, FIG. 5 shows the waveform of part of a piece, the level of each scale note, and the total level increment of the scale notes. The top panel is the waveform; the middle panel shows the level of each scale note in each frame as shading (low notes at the bottom, high notes at the top; the range in this figure is C1 to A6); and the bottom panel shows the total level increment of the scale notes in each frame. Because the scale-note levels in this figure were output by the beat detection scale-note level detection unit 20, the frequency resolution is about 7.2 Hz, so the level cannot be calculated for some scale notes at G#2 and below, leaving gaps. Since the purpose here is beat detection, however, being unable to measure the level of some low scale notes is not a problem.

As the bottom panel of the figure shows, the total level increment of the scale notes has periodic peaks. The positions of these periodic peaks are the beat positions.

As described above, the beat detection unit 25 detects the average beat interval (that is, the tempo) and the beat positions from the changes in the level of each scale note at each predetermined time output by the beat detection scale-note level detection unit 20. To find the beat positions, it first obtains the interval between these periodic peaks, that is, the average beat interval. The average beat interval can be calculated from the autocorrelation of the total level increment of the scale notes (FIG. 4; step S102).

Letting L(t) be the total level increment of the scale notes at frame time t, the autocorrelation φ(τ) is calculated by Equation 3 below.

Here, N is the total number of frames and τ is the time lag.

FIG. 6 shows a conceptual diagram of the autocorrelation calculation. As the figure shows, φ(τ) takes a large value when the time lag τ is an integer multiple of the period of the peaks of L(t). The tempo of the piece can therefore be found by finding the maximum of φ(τ) over a certain range of τ.
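The autocorrelation of L(t) can be sketched as below. Equation 3 itself is not reproduced in this text, so the normalization by the number of overlapping frames (N − τ) is an assumption; it is one common form and keeps values at different lags comparable.

```python
def autocorrelation(L, tau):
    """Autocorrelation of the sequence L at lag tau (in frames).
    Dividing by the overlap length N - tau is an assumed normalization."""
    n = len(L)
    if not 0 <= tau < n:
        raise ValueError("tau must satisfy 0 <= tau < len(L)")
    return sum(L[t] * L[t + tau] for t in range(n - tau)) / (n - tau)

# A synthetic L(t) with a peak every 4 frames.
L = [1, 0, 0, 0] * 8
for tau in (3, 4, 8):
    print(tau, autocorrelation(L, tau))
```

In this synthetic case the value is large at lags 4 and 8, the integer multiples of the peak period, and zero at lag 3.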

The range of τ over which the autocorrelation is computed can be varied according to the expected tempo range of the piece. For example, to cover metronome markings from quarter note = 30 to 300, the autocorrelation is computed over lags from 0.2 to 2 seconds. The conversion from time (seconds) to frames is given by Equation 4 below.
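Since one frame corresponds to one window shift, the time-to-frame conversion can be sketched as seconds multiplied by the sampling frequency and divided by the samples per frame. The defaults below use the 3675 Hz and 32-sample values of this embodiment; rounding to the nearest whole frame is an assumption.

```python
def seconds_to_frames(seconds, fs_hz=3675.0, hop=32):
    """Convert a duration in seconds to a (fractional) number of frames."""
    return seconds * fs_hz / hop

def lag_range_for_tempo(bpm_min=30, bpm_max=300, fs_hz=3675.0, hop=32):
    """Lag range in frames covering quarter-note tempi from bpm_min to bpm_max."""
    shortest = 60.0 / bpm_max    # 0.2 s at quarter note = 300
    longest = 60.0 / bpm_min     # 2 s at quarter note = 30
    return (round(seconds_to_frames(shortest, fs_hz, hop)),
            round(seconds_to_frames(longest, fs_hz, hop)))

print(lag_range_for_tempo())
```

With these settings, the 0.2 to 2 second range corresponds to lags of roughly 23 to 230 frames.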

The τ at which the autocorrelation φ(τ) is largest within this range could be taken as the beat interval, but the τ of maximum autocorrelation is not the beat interval for every piece. It is therefore better to obtain beat interval candidates from the values of τ at which the autocorrelation has local maxima (FIG. 4; step S104) and let the user choose the beat interval from among these candidates (FIG. 4; step S106).
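Candidate extraction from the local maxima of φ(τ) can be sketched as follows. The function name, the representation of φ as a list indexed by lag, and the ordering of candidates by descending autocorrelation are illustrative assumptions; the specification only states that candidates come from the local maxima.

```python
def beat_interval_candidates(phi, tau_min, tau_max):
    """Return the lags in [tau_min, tau_max] at which phi has a local maximum,
    strongest first. phi[tau] is the autocorrelation at lag tau."""
    first = max(tau_min, 1)
    last = min(tau_max, len(phi) - 2)     # both neighbors must exist
    peaks = [tau for tau in range(first, last + 1)
             if phi[tau] > phi[tau - 1] and phi[tau] >= phi[tau + 1]]
    return sorted(peaks, key=lambda tau: phi[tau], reverse=True)

phi = [9, 1, 2, 5, 2, 1, 3, 8, 3, 1]
print(beat_interval_candidates(phi, 1, 8))
```

The resulting list would then be presented to the user, who picks the beat interval.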

このようにしてビート間隔が決定したら（決定したビート間隔をτ_ｍａｘとする）、まず最初に先頭のビート位置を決定する。 When the beat interval is determined in this way (the determined beat interval is set to τ _max ), the head beat position is first determined.

先頭のビート位置の決定方法を、図７を用いて説明する。図７の上段はフレーム時間ｔにおける各音階音のレベル増分値の合計Ｌ（ｔ）で、下段Ｍ（ｔ）は決定したビート間隔τ_ｍａｘの周期で値を持つ関数である。式で表すと、下式数５に示すようになる。 A method for determining the first beat position will be described with reference to FIG. The upper part of FIG. 7 is a total L (t) of the level increment values of each scale tone at the frame time t, and the lower part M (t) is a function having a value at the determined beat interval τ _max . This is expressed by the following equation (5).

この関数Ｍ（ｔ）を、０からτ_ｍａｘ−１の範囲でずらしながら、Ｌ（ｔ）とＭ（ｔ）の相互相関を計算する。 The cross correlation between L (t) and M (t) is calculated while shifting this function M (t) in the range of 0 to τ _max −1.

相互相関ｒ（ｓ）は、上記Ｍ（ｔ）の特性から、下式数６で計算できる。 The cross-correlation r (s) can be calculated by the following equation 6 from the characteristic of M (t).

この場合のｎは、最初の無音部分の長さに応じて適当に決めれば良い（図７の例では、ｎ＝１０）。 In this case, n may be determined appropriately according to the length of the first silent portion (n = 10 in the example of FIG. 7).

ｒ（ｓ）をｓが０からτ_ｍａｘ−１の範囲で求め、ｒ（ｓ）が最大となるｓを求めれば、このｓのフレームが最初のビート位置である。 If r (s) is obtained in the range of s from 0 to τ _max −1, and s at which r (s) is maximized is obtained, this s frame is the first beat position.
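A minimal sketch of this search for the first beat, assuming that Equation 6 (not reproduced here) reduces to summing L(t) at the n pulse positions s, s + τ_max, s + 2·τ_max, and so on, because M(t) is nonzero only at those positions. The function name and the handling of positions past the end of the data are hypothetical.

```python
def first_beat_position(levels, beat_interval, n_pulses=10):
    """Return the shift s in [0, beat_interval) that maximizes r(s),
    where r(s) is the sum of L(t) sampled at n_pulses positions spaced
    beat_interval frames apart, starting at s."""
    def r(s):
        positions = (s + j * beat_interval for j in range(n_pulses))
        # Positions beyond the end of the data contribute nothing.
        return sum(levels[p] for p in positions if p < len(levels))

    return max(range(beat_interval), key=r)
```

When several shifts give the same r(s), this sketch returns the earliest one; the description does not say how ties should be resolved.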

最初のビート位置が決まったら、それ以降のビートの位置を１つずつ決定していく（図４；ステップＳ１０８）。 When the first beat position is determined, the subsequent beat positions are determined one by one (FIG. 4; step S108).

その方法を、図８を用いて説明する。図８の三角印の位置に先頭のビートが見つかったとする。２番目のビート位置は、この先頭のビート位置からビート間隔τ_ｍａｘだけ離れた位置を仮のビート位置とし、その近辺でＬ（ｔ）とＭ（ｔ）が最も相関が取れる位置から決定する。つまり、先頭のビート位置をｂ_０とするとき、以下の式のｒ（ｓ）が最大となるようなｓの値を求める。この式のｓは仮のビート位置からのずれで、以下の式数７の範囲の整数とする。Ｆは揺らぎのパラメータで０．１程度の値が適当であるが、テンポの揺らぎの大きい曲では、もっと大きな値にしてもよい。ｎは５程度でよい。 The method is described with reference to FIG. 8. Assume that the first beat has been found at the position of the triangle in FIG. 8. For the second beat, the position one beat interval τ_max after the first beat is taken as the tentative beat position, and the beat is placed at the position near it where L(t) and M(t) correlate best. That is, with the first beat position denoted b_0, the value of s that maximizes r(s) in the expression below is found. Here s is the offset from the tentative beat position and is an integer in the range given by Equation 7 below. F is a fluctuation parameter; a value of about 0.1 is appropriate, but a larger value may be used for songs with large tempo fluctuations. A value of about 5 is sufficient for n.

ｋは、ｓの値に応じて変える係数で、例えば図９のような正規分布とする。 k is a coefficient that changes in accordance with the value of s, and has a normal distribution as shown in FIG. 9, for example.

ｒ（ｓ）が最大となるようなｓの値が求まれば、２番目のビート位置ｂ_１は、下式数８で計算される。 If the value of s that maximizes r (s) is obtained, the second beat position b ₁ is calculated by the following equation (8).

以降、同じようにして３番目以降のビート位置も求めることができる。 Thereafter, the third and subsequent beat positions can be obtained in the same manner.
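The step from one beat to the next can be sketched as below. Equations 7 and 8 and the weighted correlation are not reproduced in this text, so several details are assumptions: the search range is taken as −τ_max·F ≤ s ≤ τ_max·F, the weight k is a Gaussian in s whose width is chosen arbitrarily, the n pulses are placed from the tentative beat position forward, and the new beat is b_0 + τ_max + s. All names are hypothetical.

```python
import math


def next_beat_position(levels, prev_beat, beat_interval,
                       fluctuation=0.1, n_pulses=5, sigma=None):
    """Return the next beat position near prev_beat + beat_interval."""
    # Largest integer offset allowed by the fluctuation parameter F.
    max_shift = int(beat_interval * fluctuation + 1e-9)
    if sigma is None:
        sigma = max(max_shift, 1)  # assumed width of the bell-shaped weight

    best_s, best_r = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        k = math.exp(-0.5 * (s / sigma) ** 2)  # weight falls off with |s|
        tentative = prev_beat + beat_interval + s
        r = 0.0
        for j in range(n_pulses):
            p = tentative + j * beat_interval
            if 0 <= p < len(levels):
                r += levels[p]
        r *= k
        if r > best_r:
            best_s, best_r = s, r
    return prev_beat + beat_interval + best_s
```

Calling the function repeatedly, each time passing the beat just found as `prev_beat`, yields the third and later beats.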

テンポがほとんど変わらない曲ではこの方法でビート位置を曲の終わりまで求めることができるが、実際の演奏は多少テンポが揺らいだり、部分的にだんだん遅くなったりすることがよくある。 For songs with almost no change in tempo, the beat position can be obtained to the end of the song in this way, but the actual performance often fluctuates slightly or becomes partly slower.

そこで、これらのテンポの揺らぎにも対応できるように以下のような方法を考えた。 Therefore, the following method was considered so as to cope with these fluctuations in tempo.

つまり、図８のＭ（ｔ）の関数を、図１０のように変化させるものである。
１）は、従来の方法で、図のように各パルスの間隔をτ１、τ２、τ３、τ４としたとき、
τ１＝τ２＝τ３＝τ４＝τ_ｍａｘ
である。
２）は、τ１からτ４を均等に大きくしたり小さくしたりするものである。
τ１＝τ２＝τ３＝τ４＝τ_ｍａｘ＋ｓ (-τ_ｍａｘ・Ｆ≦ｓ≦τ_ｍａｘ・Ｆ）これにより、急にテンポが変わった場合に対応できる。
３）は、ｒｉｔ．（リタルダンド、だんだん遅く）又は、ａｃｃｅｌ．（アッチェレランド、だんだん速く）に対応したもので、各パルス間隔は、
τ１＝τ_ｍａｘ
τ２＝τ_ｍａｘ＋１・ｓ
τ３＝τ_ｍａｘ＋２・ｓ（-τ_ｍａｘ・Ｆ≦ｓ≦τ_ｍａｘ・Ｆ）
τ４＝τ_ｍａｘ＋４・ｓ
で計算される。
１、２、４の係数は、あくまで例であり、テンポ変化の大きさによって変えてもよい。
４）は、３）のようなｒｉｔ．やａｃｃｅｌ．の場合の、５個のパルスの位置のどこが現在ビートを求めようとしている場所かを変えるものである。 That is, the function M(t) of FIG. 8 is varied as shown in FIG. 10.
1) is the conventional method. With the pulse intervals denoted τ1, τ2, τ3, and τ4 as in the figure,
τ1 = τ2 = τ3 = τ4 = τ_max
2) increases or decreases τ1 through τ4 uniformly:
τ1 = τ2 = τ3 = τ4 = τ_max + s (−τ_max · F ≦ s ≦ τ_max · F)
This handles a sudden change in tempo.
3) handles rit. (ritardando, gradually slower) or accel. (accelerando, gradually faster). The pulse intervals are calculated as
τ1 = τ_max
τ2 = τ_max + 1 · s
τ3 = τ_max + 2 · s (−τ_max · F ≦ s ≦ τ_max · F)
τ4 = τ_max + 4 · s
The coefficients 1, 2, and 4 are merely examples and may be changed according to the magnitude of the tempo change.
4) changes which of the five pulse positions is the one where the beat is currently being sought, for a rit. or accel. as in 3).
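The interval patterns 1) to 3) can be written out as a short sketch. The function names are hypothetical, and the coefficients 1, 2, and 4 are the example values from the text rather than fixed constants.

```python
def pulse_intervals(beat_interval, s, pattern):
    """Return [tau1, tau2, tau3, tau4] for patterns 1), 2), or 3)."""
    if pattern == 1:   # constant tempo
        return [beat_interval] * 4
    if pattern == 2:   # all intervals stretched or shrunk by s
        return [beat_interval + s] * 4
    if pattern == 3:   # rit. (s > 0) or accel. (s < 0)
        return [beat_interval + c * s for c in (0, 1, 2, 4)]
    raise ValueError("pattern must be 1, 2, or 3")


def pulse_positions(start, intervals):
    """Return the five pulse positions produced by four intervals."""
    positions = [start]
    for step in intervals:
        positions.append(positions[-1] + step)
    return positions
```

Pattern 4) would reuse the positions from pattern 3) and vary which of the five pulses is aligned with the tentative beat position.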

これらをすべて組み合わせて、Ｌ（ｔ）とＭ（ｔ）の相関を計算し、それらの最大からビート位置を決めれば、テンポが揺らぐ曲に対してもビート位置の決定が可能である。なお、２）と３）の場合には、相関を計算するときの係数ｋの値を、やはりｓの値に応じて変えるようにする。 By combining all of these, calculating the correlation between L (t) and M (t), and determining the beat position from the maximum of them, it is possible to determine the beat position even for a song whose tempo fluctuates. In the case of 2) and 3), the value of the coefficient k when calculating the correlation is also changed according to the value of s.

さらに、５個のパルスの大きさは現在すべて同じにしてあるが、ビートを求める位置（図１０の仮のビート位置）のパルスのみ大きくしたり、ビートを求める位置から離れるほど値を小さくして、ビートを求める位置の各音階音のレベル増分値の合計を強調するようにしてもよい［図１０の５）］。 Furthermore, the five pulses currently all have the same magnitude, but the pulse at the position where the beat is being sought (the tentative beat position in FIG. 10) may be made larger than the others, or the pulse values may be decreased with distance from that position, so as to emphasize the sum of the level increments of the scale notes at the position where the beat is being sought [5) in FIG. 10].

以上のようにして、各ビートの位置が決定したら、この結果をバッファ２６に保存すると共に、検出した結果を表示部８０を介して表示し、ユーザに確認してもらい、間違っている箇所を修正してもらうようにしてもよい。 Once the position of each beat has been determined in this way, the result is stored in the buffer 26. The detection result may also be displayed via the display unit 80 so that the user can check it and correct any errors.

ビート検出結果の確認画面の例を、図１１に示す。同図の三角印の位置が検出したビート位置である。 An example of a confirmation screen for the beat detection result is shown in FIG. The position of the triangle mark in the figure is the detected beat position.

「再生」のボタンを押すと、上記演奏部７０により、現在の音楽音響信号が、Ｄ／Ａ変換され、スピーカ等から再生される。現在の再生位置は、図のように縦線等の再生位置ポインタで表示されるので、演奏を聞きながら、ビート検出位置の誤りを確認できる。さらに、検出の元波形の再生と同時に、ビート位置のタイミングで例えばメトロノームのような音を再生させるようにすれば、目で確認するだけでなく音でも確認でき、より容易に誤検出を判断できる。このメトロノームの音を再生させる方法としては、例えばＭＩＤＩ機器等が考えられる。 When the “play” button is pressed, the performance unit 70 performs D / A conversion on the current music sound signal and plays it from a speaker or the like. Since the current playback position is displayed with a playback position pointer such as a vertical line as shown in the figure, it is possible to confirm an error in the beat detection position while listening to the performance. Furthermore, if a sound such as a metronome is played at the beat position timing simultaneously with the reproduction of the original waveform of the detection, it is possible to check not only with the eyes but also with the sound, and it is possible to judge the false detection more easily. . As a method for reproducing the sound of the metronome, for example, a MIDI device can be considered.

ビート検出位置の修正は、「ビート位置の修正」ボタンを押して行う。このボタンを押すと、画面に十字のカーソルが現れるので、最初のビート検出が間違っている箇所で正しいビート位置をクリックする。クリックされた場所の少し前（例えばτ_ｍａｘの半分の位置）から後のビート位置をすべてクリアし、クリックされた場所を、仮のビート位置として、以降のビート位置を再検出する。 The beat detection positions are corrected by pressing the "correct beat position" button. Pressing this button displays a cross-shaped cursor on the screen, and the user clicks the correct beat position at the first place where the beat detection is wrong. All beat positions from slightly before the clicked position (for example, half of τ_max before it) onward are cleared, and the subsequent beat positions are re-detected with the clicked position as the tentative beat position.

次に、上記小節検出部３０による拍子および小節の検出について説明する。 Next, the time signature and measure detection by the measure detection unit 30 will be described.

上記小節検出部３０は、同じくコード名検出用プログラムが読み込まれて実行され、以下に示す処理を行うコンピュータのＣＰＵ１００２により構成されている。それは、上述のように、上記ビート毎の各音階音のレベルの平均値を計算し、このビート毎の各音階音の平均レベルの増分値をすべての音階音について合計して、ビート毎の全体の音の変化度合いを示す値を求め、このビート毎の全体の音の変化度合いを示す値から、拍子と小節線位置を検出する機能を有している。 The measure detection unit 30 is likewise implemented by the CPU 1002 of the computer, which reads and executes the chord name detection program and performs the processing described below. As described above, it calculates the average level of each scale note for each beat, sums the increments of these per-beat average levels over all scale notes to obtain a value indicating the overall degree of sound change at each beat, and detects the time signature and the bar line positions from that value.

上記ビート検出部２５によるこれまでの処理で、ビートの位置が確定しているので、上記小節検出部３０によって、まずはビート毎の音の変化度合いを求める。このビート毎の音の変化度合いは、ビート検出用音階音レベル検出部２０が出力した、フレーム毎の各音階音のレベルから計算する。 Since the beat position has been determined by the processing by the beat detection unit 25 so far, the measure detection unit 30 first determines the degree of change in sound for each beat. The degree of change in sound for each beat is calculated from the level of each scale sound for each frame output by the beat detection scale level detector 20.

ｊ番目のビートのフレーム数をｂ_ｊとし、その前後のビートのフレームをｂ_ｊ−１、ｂ_ｊ＋１とする時、ｊ番目のビートのビート毎の音の変化度合いは、フレームｂ_ｊ−１からｂ_ｊ−１までのフレームの各音階音のパワーの平均とフレームｂ_ｊからｂ_ｊ＋１−１までのフレームの各音階音のレベルの平均を計算し、その増分値から各音階音のビート毎の音の変化度合いを求め、それらをすべての音階音で合計して計算することができる。 Let b_j be the frame number of the j-th beat, and let b_{j−1} and b_{j+1} be the frame numbers of the beats before and after it. The degree of sound change at the j-th beat is obtained as follows: calculate the average level of each scale note over frames b_{j−1} to b_j − 1 and over frames b_j to b_{j+1} − 1, take the increment between the two averages as that scale note's degree of change at the beat, and sum these values over all scale notes.

つまり、フレーム時間ｔにおけるｉ番目の音階音のレベルをＬ_ｉ（ｔ）とするとき、ｊ番目のビートのｉ番目の音階音のレベルの平均Ｌ_ａｖｇｉ（ｊ）は、下式数９であるから、ｊ番目のビートのｉ番目の音階音のビート毎の音の変化度合いＢ_ａｄｄｉ（ｊ）は、下式数１０に示すようになる。 That is, when the level of the i-th scale note at frame time t is L_i(t), the average level L_avg_i(j) of the i-th scale note at the j-th beat is given by Equation 9 below, and the degree of sound change B_add_i(j) of the i-th scale note at the j-th beat is therefore given by Equation 10 below.

よって、ｊ番目のビートのビート毎の音の変化度合いＢ（ｊ）は、下式数１１に示すようになる。ここで、Ｔは音階音の総数である。 Therefore, the sound change degree B (j) for each beat of the j-th beat is as shown in the following equation (11). Here, T is the total number of scale sounds.
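The per-beat calculation can be sketched as follows. Equations 9 to 11 are not reproduced in this text, so the sketch assumes that the per-beat average runs over frames b_j to b_{j+1} − 1, that only positive increments count toward the change (decreases are clipped to zero), and that the first beat, which has no predecessor, gets a change of 0. The function names and the `levels[t][i]` layout are hypothetical.

```python
def beat_average_levels(levels, beats):
    """levels[t][i] is the level of scale note i at frame t; beats holds the
    frame number of each beat. Returns one list of per-note averages for
    each interval between consecutive beats."""
    averages = []
    for start, end in zip(beats, beats[1:]):
        span = levels[start:end]
        n_notes = len(span[0])
        averages.append([sum(frame[i] for frame in span) / len(span)
                         for i in range(n_notes)])
    return averages


def beat_change(averages):
    """B(j): sum over all scale notes of the positive increment of the
    per-beat average level relative to the previous beat."""
    changes = [0.0]  # no previous beat to compare the first one against
    for prev, cur in zip(averages, averages[1:]):
        changes.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return changes
```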

図１２の最下段は、このビート毎の音の変化度合いである。さらに、上記小節検出部３０は、このビート毎の音の変化度合いから、拍子と１拍目の位置を求める。 The bottom row in FIG. 12 shows the degree of change in sound for each beat. Further, the bar detection unit 30 obtains the time signature and the position of the first beat from the degree of change in sound for each beat.

拍子は、ビート毎の音の変化度合いの自己相関から求める。一般的に音楽は１拍目で音が変わることが多いと考えられるので、このビート毎の音の変化度合いの自己相関から拍子を求めることができる。例えば、下式数１２に示す自己相関φ（τ）を求める式から、ビート毎の音の変化度合いＢ（ｊ）の自己相関φ（τ）を遅れτが、２から４の範囲で求め、自己相関φ（τ）が最大となる遅れτを拍子の数とする。 The time signature is obtained from the autocorrelation of the per-beat degree of sound change. Music generally tends to change most on the first beat of a measure, so the time signature can be found from this autocorrelation. For example, using the formula for the autocorrelation φ(τ) given as Equation 12 below, the autocorrelation φ(τ) of the degree of sound change B(j) is computed for delays τ from 2 to 4, and the delay τ that maximizes φ(τ) is taken as the number of beats per measure.

Ｎは、総ビート数、τ＝２〜４の範囲でφ（τ）を計算し、φ（τ）が最大となるτを拍子の数とする。 N is the total number of beats, and φ (τ) is calculated in the range of τ = 2 to 4, and τ at which φ (τ) is the maximum is the number of beats.

次に１拍目を求めるが、これは、ビート毎の音の変化度合いＢ（ｊ）がもっとも大きい箇所を１拍目とする。つまり、φ（τ）が最大となるτをτ_ｍａｘ、下式数１３のＸ（ｋ）が最大となるｋをｋ_ｍａｘとするとき、ｋ_ｍａｘ番目のビートが最初の１拍目の位置となり、以降、τ_ｍａｘを足したビート位置が１拍目となる。 Next, the first beat of the measure is found by taking the position where the degree of sound change B(j) is largest as the first beat. That is, let τ_max be the τ that maximizes φ(τ), and let k_max be the k that maximizes X(k) in Equation 13 below. The k_max-th beat is then the position of the first downbeat, and each beat position obtained by repeatedly adding τ_max thereafter is also a first beat.

ｎ_ｍａｘは、τ_ｍａｘ・ｎ＋ｋ＜Ｎの条件で最大となるｎ

n_max is the largest n that satisfies τ_max · n + k < N.
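A sketch of the time signature and first-beat search follows. Equations 12 and 13 are not reproduced in this text; the sketch assumes φ(τ) is the autocorrelation of B(j) normalized by N − τ, and X(k) is the sum of B(τ_max · n + k) over n = 0 to n_max. The function names are hypothetical.

```python
def detect_meter(changes):
    """Return the delay tau in {2, 3, 4} that maximizes the autocorrelation
    of the per-beat change values B(j)."""
    n = len(changes)

    def phi(lag):
        total = sum(changes[j] * changes[j + lag] for j in range(n - lag))
        return total / (n - lag)

    return max((2, 3, 4), key=phi)


def first_downbeat(changes, meter):
    """Return k in [0, meter) that maximizes X(k), the sum of B over
    beats k, k + meter, k + 2*meter, ... up to the last beat."""
    def x(k):
        # The slice visits exactly the indices meter * n + k that are < N.
        return sum(changes[k::meter])

    return max(range(meter), key=x)
```

As the description notes, a song with changing time signatures falls outside what this single global search can handle.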

以上のようにして、小節検出部３０により、拍子及び１拍目の位置（小節線の位置）が決定したら、この結果をバッファ３１に保存すると共に、検出した結果を表示部８０を使用して画面表示し、ユーザに変更させるようにすることが望ましい。特に変拍子の曲は、この方法では対応できないので、変拍子の箇所をユーザに指定してもらう必要がある。 As described above, when the bar detection unit 30 determines the time signature and the position of the first beat (bar line position), the result is stored in the buffer 31 and the detected result is displayed using the display unit 80. It is desirable to display the screen and let the user change it. In particular, music with odd time signatures cannot be handled by this method, so it is necessary to have the user specify the location of odd time signatures.

以上の構成により、人間が演奏したテンポの揺らぐ演奏の音響信号から、曲全体の平均的なテンポと正確なビート（拍）の位置、さらに曲の拍子と１拍目の位置を検出することが可能となる。 With the above configuration, it is possible to detect the average tempo and accurate beat (beat) position of the entire song, as well as the time signature and the first beat position, from the acoustic signal of the performance of the tempo performed by a human. It becomes possible.

次に、コード名検出用の構成につき、以下に説明する。 Next, the configuration for chord name detection is described below.

上記コード検出用音階音レベル検出部４０は、同じくコード名検出用プログラムが読み込まれて実行され、上記ＣＰＵ１００２により構成され、上述のように、入力部１０で入力された音響信号から、先のビート検出の時とは異なる別の所定の時間間隔で、コード検出に適したパラメータを使ってＦＦＴ演算を行い、所定の時間毎の各音階音のレベルを求める機能を有している。 The chord detection scale level detection unit 40 is similarly read and executed by the chord name detection program. The chord detection tone level detection unit 40 is configured by the CPU 1002 and, as described above, from the acoustic signal input from the input unit 10, It has a function of performing an FFT operation using a parameter suitable for chord detection at a different predetermined time interval different from that at the time of detection, and obtaining the level of each scale sound for each predetermined time.

上記入力部１０から入力されてくる音響ディジタル信号は、ビート検出用音階音レベル検出部２０とコード検出用音階音レベル検出部４０とに入力される。これらの音階音レベル検出部は、どちらも上記図３の各部から構成され、構成はまったく同じなので、同じものをパラメータだけを変えて再利用できる。 The acoustic digital signal input from the input unit 10 is input to the beat detection scale sound level detection unit 20 and the chord detection scale sound level detection unit 40. These scale sound level detection units are each configured from the respective units shown in FIG. 3 and have the same configuration, so that the same components can be reused by changing only the parameters.

そしてコード検出用音階音レベル検出部４０の構成として使用される波形前処理部２１は、上記と同様な構成であり、音楽音響信号の上記入力部１０からの音響信号を今後の処理に適したサンプリング周波数にダウンサンプリングする。（ただし、ダウンサンプリング後のサンプリング周波数、つまり、ダウンサンプリングレートは、ビート検出用とコード検出用で変えるようにしても良いし、ダウンサンプリングする時間を節約するために同じにしても良い。） The waveform preprocessing unit 21 used as the configuration of the chord detection scale sound level detection unit 40 has the same configuration as described above, and is suitable for future processing of the acoustic signal from the input unit 10 of the music acoustic signal. Downsample to the sampling frequency. (However, the sampling frequency after down-sampling, that is, the down-sampling rate may be changed for beat detection and chord detection, or may be the same in order to save time for down-sampling.)

コード検出用の波形前処理部のダウンサンプリングレートは、コード検出音域によって変える。コード検出音域とは、コード名決定部６０でコード検出するときに使う音域のことである。例えばコード検出音域をＣ３からＡ６（Ｃ４が中央のド）とする場合、Ａ６の基本周波数は約１７６０Ｈｚ（Ａ４＝４４０Ｈｚとした場合）となるので、ダウンサンプリング後のサンプリング周波数はナイキスト周波数が１７６０Ｈｚ以上となる、３５２０Ｈｚ以上にすれば良い。これから、ダウンサンプリングレートは、元のサンプリング周波数が４４．１ｋＨｚ（音楽ＣＤ）の場合、１／１２程度にすれば良いことになる。この時、ダウンサンプリング後のサンプリング周波数は、３６７５Ｈｚとなる。 The down-sampling rate of the chord detection waveform pre-processing unit varies depending on the chord detection range. The chord detection sound range is a sound range used when the chord name determination unit 60 detects a chord. For example, if the chord detection sound range is C3 to A6 (C4 is the center), the basic frequency of A6 is about 1760 Hz (when A4 = 440 Hz), so the sampling frequency after downsampling is a Nyquist frequency of 1760 Hz or higher. It may be 3520 Hz or higher. From this, the downsampling rate may be about 1/12 when the original sampling frequency is 44.1 kHz (music CD). At this time, the sampling frequency after downsampling is 3675 Hz.
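The arithmetic in this paragraph can be checked with a short sketch. The MIDI note numbering (A4 = 69, so A6 = 93) is an assumption introduced here for convenience and does not appear in the source, and the function names are hypothetical.

```python
def note_frequency(midi_note, a4=440.0):
    """Equal-tempered fundamental frequency, with A4 = MIDI note 69."""
    return a4 * 2.0 ** ((midi_note - 69) / 12.0)


def decimation_factor(source_rate, top_midi_note):
    """Largest integer factor that keeps the Nyquist frequency of the
    downsampled signal at or above the highest note's fundamental."""
    required_rate = 2.0 * note_frequency(top_midi_note)
    return int(source_rate // required_rate)


factor = decimation_factor(44100.0, 93)   # A6 as the top of the range
downsampled_rate = 44100.0 / factor
```

For a 44.1 kHz source and a top note of A6, this reproduces the factor of 12 and the 3675 Hz rate stated above.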

ダウンサンプリングの処理は、通常、ダウンサンプリング後のサンプリング周波数の半分の周波数であるナイキスト周波数（今の例では１８３７．５Ｈｚ）以上の成分をカットするローパスフィルタを通した後に、データを読み飛ばす（今の例では波形サンプルの１２個に１１個を破棄する）ことによって行われる。これについては、上述したことと同じ理由による。 In the downsampling process, data is skipped after passing through a low-pass filter that cuts off components above the Nyquist frequency (1837.5 Hz in this example), which is usually half the sampling frequency after downsampling (now In this example, 11 out of 12 waveform samples are discarded). This is for the same reason as described above.

このようにして波形前処理部２１によるダウンサンプリングが終了したら、所定の時間間隔で、波形前処理部の出力信号をＦＦＴ演算部２２により、ＦＦＴ（高速フーリエ変換）する。 When the downsampling by the waveform preprocessing unit 21 is thus completed, the output signal of the waveform preprocessing unit is subjected to FFT (Fast Fourier Transform) by the FFT calculation unit 22 at predetermined time intervals.

ＦＦＴのパラメータ（ＦＦＴポイント数とＦＦＴ窓のシフト量）は、ビート検出時とコード検出時で異なる値とする。これは、周波数分解能を上げるためにＦＦＴポイント数を大きくすると、ＦＦＴ窓のサイズが大きくなってしまい、より長い時間から１回のＦＦＴを行うことになり、時間分解能が低下する、というＦＦＴの特性によるものである（つまりビート検出時は周波数分解能を犠牲にして時間分解能をあげるのが良い）。窓のサイズと同じだけの長さの波形を使わないで、窓の一部だけに波形データをセットし、残りは０で埋めることによってＦＦＴポイント数を大きくしても時間分解能が悪くならない方法もあるが、本実施例のケースでは、低音側のパワーも正しく検出するためにある程度の波形サンプル数は必要である。 The FFT parameters (the number of FFT points and the shift amount of the FFT window) are different values at the time of beat detection and code detection. This is because if the number of FFT points is increased to increase the frequency resolution, the size of the FFT window increases, and one FFT is performed from a longer time, resulting in a decrease in time resolution. (In other words, it is better to increase the time resolution at the expense of frequency resolution when detecting beats). A method that does not deteriorate the time resolution even if the number of FFT points is increased by setting waveform data to only a part of the window and filling the rest with 0 without using a waveform with the same length as the window size. However, in the case of the present embodiment, a certain number of waveform samples are necessary in order to correctly detect the power on the bass side.

以上のようなことを考慮し、本実施例では、ビート検出時はＦＦＴポイント数５１２、窓のシフトは３２サンプルで、０埋めなし、コード検出時はＦＦＴポイント数８１９２、窓のシフトは１２８サンプルで、波形サンプルは一度のＦＦＴで１０２４サンプル使うようにした。このような設定でＦＦＴ演算を行うと、ビート検出時は、時間分解能約８．７ｍｓ、周波数分解能約７．２Ｈｚ、コード検出時は、時間分解能約３５ｍｓ、周波数分解能約０．４Ｈｚとなる。今レベルを求めようとしている音階音は、Ｃ１からＡ６の範囲であるので、コード検出時の周波数分解能約０．４Ｈｚは、最も周波数差の小さいＣ１とＣ＃１の基本周波数の差、約１．９Ｈｚにも対応できる。また、四分音符＝３００のテンポの曲で３２分音符の長さが２５ｍｓであることを考えると、ビート検出時の時間分解能約８．７ｍｓは、十分な値であることがわかる。 Taking the above into account, this embodiment uses 512 FFT points with a window shift of 32 samples and no zero padding for beat detection, and 8192 FFT points with a window shift of 128 samples, using 1024 waveform samples per FFT, for chord detection. With these settings, beat detection has a time resolution of about 8.7 ms and a frequency resolution of about 7.2 Hz, and chord detection has a time resolution of about 35 ms and a frequency resolution of about 0.4 Hz. The scale notes whose levels are to be obtained range from C1 to A6, so the chord-detection frequency resolution of about 0.4 Hz can resolve even the smallest difference between fundamental frequencies, about 1.9 Hz between C1 and C#1. Likewise, given that a thirty-second note lasts 25 ms at a tempo of quarter note = 300, the beat-detection time resolution of about 8.7 ms is sufficient.
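The resolution figures quoted here follow from the 3675 Hz downsampled rate, with time resolution taken as window shift / sampling frequency and frequency resolution as sampling frequency / FFT points. The sketch below recomputes them; the function names and the MIDI numbering (C1 = 24, A4 = 69) are assumptions introduced for illustration.

```python
def fft_resolution(sample_rate, fft_points, hop):
    """Return (time resolution in ms, frequency resolution in Hz)."""
    return hop / sample_rate * 1000.0, sample_rate / fft_points


def note_frequency(midi_note, a4=440.0):
    """Equal-tempered fundamental frequency, with A4 = MIDI note 69."""
    return a4 * 2.0 ** ((midi_note - 69) / 12.0)


def thirty_second_note_ms(quarter_notes_per_minute):
    """Length of a thirty-second note (one eighth of a quarter note) in ms."""
    return 60000.0 / quarter_notes_per_minute / 8.0


beat_time_ms, beat_freq_hz = fft_resolution(3675.0, 512, 32)
chord_time_ms, chord_freq_hz = fft_resolution(3675.0, 8192, 128)
c1_to_csharp1_hz = note_frequency(25) - note_frequency(24)
```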

このようにして、所定の時間間隔毎にＦＦＴ演算が行われ、その実数部と虚数部のそれぞれを二乗したものの和の平方根からパワーが計算され、その結果がレベル検出部２３に送られる。 In this way, the FFT operation is performed at predetermined time intervals, the power is calculated from the square root of the sum of the squares of the real part and the imaginary part, and the result is sent to the level detector 23.

レベル検出部２３では、ＦＦＴ演算部２２で計算されたパワー・スペクトルから、各音階音のレベルを計算する。ＦＦＴは、サンプリング周波数をＦＦＴポイント数で割った値の整数倍の周波数のパワーが計算されるだけであるので、このパワー・スペクトルから各音階音のレベルを検出するために、ビート検出用音階音レベル検出部２０の構成と同様な処理を行う。すなわち、音階音を計算するすべての音（Ｃ１からＡ６）について、その各音の基本周波数の上下５０セントの範囲（１００セントが半音）の周波数に相当するパワー・スペクトルの内、最大のパワーを持つスペクトルのパワーをこの音階音のレベルとする。 The level detector 23 calculates the level of each scale tone from the power spectrum calculated by the FFT calculator 22. Since FFT only calculates the power of an integer multiple of the sampling frequency divided by the number of FFT points, in order to detect the level of each scale tone from this power spectrum, the beat detection scale tone The same processing as that of the level detection unit 20 is performed. That is, for all the sounds (C1 to A6) for which the scale sound is calculated, the maximum power in the power spectrum corresponding to frequencies in the range of 50 cents above and below the fundamental frequency of each sound (100 cents is a semitone) is obtained. Let the power of the spectrum it has be the scale level.
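The level extraction described above can be sketched as follows, assuming the power spectrum is a list indexed by FFT bin, with bin k centered at k × sampling frequency / FFT points. The function name and the choice to return 0.0 when no bin falls inside the ±50 cent band are hypothetical.

```python
import math


def note_level(power_spectrum, sample_rate, fft_points, note_hz):
    """Return the largest power among the FFT bins whose center frequency
    lies within 50 cents above or below the note's fundamental."""
    bin_hz = sample_rate / fft_points
    low = note_hz * 2.0 ** (-50.0 / 1200.0)    # 50 cents below
    high = note_hz * 2.0 ** (50.0 / 1200.0)    # 50 cents above
    first = math.ceil(low / bin_hz)
    last = min(math.floor(high / bin_hz), len(power_spectrum) - 1)
    if first > last:
        return 0.0  # resolution too coarse: no bin inside the band
    return max(power_spectrum[first:last + 1])
```

At a coarse frequency resolution some low notes may have no bin inside their band, which is where the empty-band case in this sketch applies.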

すべての音階音についてレベルが検出されたら、これをバッファ４１に保存し、波形の読み出し位置を所定の時間間隔（先の例ではビート検出時は３２サンプル、コード検出時は１２８サンプル）進めて、ＦＦＴ演算部２２とレベル検出部２３の処理を波形の終わりまで繰り返す。 When the levels are detected for all the scale sounds, this is stored in the buffer 41, and the waveform readout position is advanced by a predetermined time interval (32 samples at the time of beat detection and 128 samples at the time of chord detection in the previous example) The processing of the FFT calculation unit 22 and the level detection unit 23 is repeated until the end of the waveform.

以上により、音楽音響信号の入力部１に入力された音響信号の、所定時間毎の各音階音のレベルが、コード検出用のバッファ４１にも保存される。 As described above, the level of each scale sound of the sound signal input to the music sound signal input unit 1 for each predetermined time is also stored in the chord detection buffer 41.

また上記ベース音検出部５０は、同じくコード名検出用プログラムが読み込まれて実行され、上記ＣＰＵ１００２により構成され、上述のように、上記コード検出用音階音レベル検出部４０で検出された各音階音のレベルのうち、上記小節検出部３０で検出された各小節内における低域側の音階音のレベルから、ベース音を検出する機能を有している。すなわち、ベース音は、コード検出用音階音レベル検出部４０が出力された各フレームの音階音のレベルから検出される。 The bass sound detecting unit 50 is similarly read and executed by the chord name detection program, and is constituted by the CPU 1002, and as described above, each scale sound detected by the chord detection scale sound level detecting unit 40. Among these levels, it has a function of detecting a bass sound from the level of the low-frequency tone in each measure detected by the measure detecting unit 30. That is, the bass sound is detected from the scale sound level of each frame output from the chord detection scale sound level detection unit 40.

図１３に、上記図５と同じ曲の同じ部分について、コード検出用音階音レベル検出部４０により出力された各フレームの音階音のレベルを示す。この図のように、コード検出用音階音レベル検出部４０での周波数分解能は、約０．４Ｈｚであるので、Ｃ１からＡ６のすべての音階音のレベルが抽出されている。 FIG. 13 shows the scale level of each frame output by the chord detection scale level detector 40 for the same part of the same song as in FIG. As shown in this figure, since the frequency resolution in the chord detection scale sound level detection unit 40 is about 0.4 Hz, the levels of all the scale sounds C1 to A6 are extracted.

ベース音は、小節の前半と後半で異なる可能性があるので、ベース音検出部５０により、各小節の前半と後半でそれぞれ検出する。前半と後半のベース音が同じ音のときは、小節のベース音としてこれを確定し、コードも小節全体で検出する。前半と後半で別の音のベース音が検出されたときは、コードも前半と後半に分けて検出する。場合によっては、分割する範囲を更に半分にまで（小節の４分の１まで）狭めてもよい。 Since the bass sound may be different between the first half and the second half of the measure, the bass sound detection unit 50 detects the first half and the second half of each measure. When the first half and the second half are the same, this is confirmed as the bass of the measure, and the chord is also detected in the entire measure. When different bass sounds are detected in the first half and the second half, the chord is also detected separately in the first half and the second half. In some cases, the range to be divided may be further reduced to half (up to a quarter of the bar).

本実施例では、ベース音決定部５０で、小節をどのように分割してコードを検出するかを決定しているが、小節の分割方法はこれに限るものではなく、例えば、本出願人による先行出願、特願２００６−２１６３６１にあるようにコード検出音域の音の変化によって小節を分割してもよい。また、この小節分割処理を、ベース音検出部５０で行うのではなく、独立した構成として設けてもよい。 In this embodiment, the bass sound determination unit 50 determines how to divide a bar and detect a chord. However, the method for dividing a bar is not limited to this, for example, by the applicant. As described in the prior application, Japanese Patent Application No. 2006-216361, the bars may be divided by the change of the sound in the chord detection range. Further, this measure division processing may be provided as an independent configuration instead of being performed by the bass sound detection unit 50.

さらに、後述するように、楽節類似性確定部１２０により、同じ楽節部分が、同じ小節分割になるように再検出させる処理を指示した場合は、同ベース音検出部５０の構成が行うことになる。これはもともとベース音決定部５０が、小節をどのように分割してコードを検出するかを決定しているからである。但し、該コード検出部５０にこのうな機能を持たせなくても、上記指示が楽節類似性確定部１２０により出された場合、それを受けて同じ楽節部分が、同じ小節分割になるようにする小節分割部（図示なし）として独立した構成として設けても良い。 Further, as will be described later, when the section similarity determination unit 120 instructs the same section part to be re-detected so as to be divided into the same measure, the bass sound detection unit 50 is configured. . This is because the bass sound determination unit 50 originally determines how to divide a bar and detect chords. However, even if the chord detection unit 50 does not have such a function, when the above instruction is issued by the segment similarity determination unit 120, the same segment part is divided into the same measure in response to the instruction. You may provide as a measure division part (not shown) as an independent structure.

ベース音は、ベース検出期間におけるベース検出音域の音階音のレベルの平均的な強さから求める。すなわちこれがベース音の強度である。 The bass sound is obtained from the average intensity of the scale sound level in the bass detection range during the bass detection period. That is, this is the intensity of the bass sound.

フレーム時間ｔにおけるｉ番目の音階音のレベルをＬ_ｉ（ｔ）とすると、フレームｆ_ｓからｆ_ｅのｉ番目の音階音の平均的なレベルＬ_ａｖｇｉ（ｆ_ｓ，ｆ_ｅ）は、下式数１４で計算できる。 When the level of the i-th scale note at frame time t is L_i(t), the average level L_avg_i(f_s, f_e) of the i-th scale note over frames f_s to f_e can be calculated by Equation 14 below.

この平均的なレベルをベース検出音域、例えばＣ２からＢ３の範囲で計算し、平均的なレベルが最も大きな音階音をベース音として、ベース音検出部５０は、決定する。ベース検出音域に音が含まれない曲や無音部分で間違ってベース音を検出しないために、適当な閾値を設定し、検出したベース音の平均的なレベルが、この閾値以下の場合は、ベース音を検出しないようにしてもよい。また、後のコード検出でベース音を重要視する場合には、検出したベース音がベース検出期間中継続してあるレベル以上を保っているかどうかをチェックするようにして、より確実なものだけをベース音として検出するようにしてもよい。さらに、ベース検出音域中、平均的なレベルが最も大きい音階音をベース音として決定するのではなく、この各音名の平均的なレベルを１２の音名毎に平均し、この音名毎のレベルが最も大きな音名をベース音名として決定し、その音名を持つベース検出音域の中の音階音で、平均的なレベルが最も大きい音階音をベース音として決定するようにしてもよい。 This average level is calculated in a bass detection range, for example, a range from C2 to B3, and the bass sound detection unit 50 determines the scale sound having the highest average level as the bass sound. An appropriate threshold is set to prevent the bass sound from being erroneously detected in songs or silences that do not include sound in the bass detection range, and if the average level of the detected bass sound is below this threshold, Sound may not be detected. In addition, when the bass sound is important in later chord detection, it is checked whether the detected bass sound keeps a certain level or more continuously during the bass detection period, and only the more reliable ones are checked. You may make it detect as a bass sound. Further, instead of determining the scale tone having the highest average level in the bass detection range as the base tone, the average level of each pitch name is averaged for every 12 pitch names, The pitch name having the highest level may be determined as the bass pitch name, and the scale tone having the highest average level among the scale sounds in the bass detection range having the pitch name may be determined as the bass tone.
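The basic bass search, with the optional threshold, can be sketched as below. Equation 14 is not reproduced in this text, so the average is assumed to be the plain mean of L_i(t) over frames f_s to f_e inclusive. The function name, the `levels[t][i]` layout, and the use of note indices to describe the bass detection range are hypothetical.

```python
def detect_bass(levels, start, end, bass_notes, threshold=0.0):
    """Return the index of the scale note in bass_notes with the highest
    average level over frames start..end (inclusive), or None when no
    average exceeds the threshold."""
    n_frames = end - start + 1
    best_note, best_avg = None, threshold
    for i in bass_notes:
        avg = sum(levels[t][i] for t in range(start, end + 1)) / n_frames
        if avg > best_avg:
            best_note, best_avg = i, avg
    return best_note
```

Calling this once for each half of a measure and comparing the two results gives the decision on whether the measure is split for chord detection.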

ベース音が決定したら、この結果をバッファ５１に保存すると共に、ベース検出結果を上記表示部８０に表示して、間違っている場合にはユーザに修正させるようにしてもよい。また、曲によってベース音域が変わることも考えられるので、ユーザがベース検出音域を変更できるようにしてもよい。 When the bass sound is determined, the result may be stored in the buffer 51, and the bass detection result may be displayed on the display unit 80 so that the user can correct it if it is wrong. Further, since the bass range may be changed depending on the song, the user may be able to change the bass detection range.

図１４に、ベース音検出部５０によるベース検出結果の表示例を示す。 In FIG. 14, the example of a display of the bass detection result by the bass sound detection part 50 is shown.

上記コード名決定部６０は、同じくコード名検出用プログラムが読み込まれて実行され、上記ＣＰＵ１００２により構成され、上述のように、ベース音検出部５０で検出されたベース音と各音階音のレベルから、各小節のコード名を決定する機能を有している。その他、後述するように、楽節類似性確定部１２０により、同じ楽節部分が、同じ小節分割及び同じコードになるように再検出させる処理を指示した場合は、同コード名決定部６０の構成が、該コードの再検出を行うことになる。 The chord name determination unit 60 is likewise implemented by the CPU 1002 reading and executing the chord name detection program. As described above, it determines the chord name of each measure from the bass sound detected by the bass sound detection unit 50 and the level of each scale note. In addition, as described later, when the section similarity determination unit 120 instructs re-detection so that identical passages receive the same measure division and the same chords, the chord name determination unit 60 performs the chord re-detection.

コード名決定部６０によるコード検出処理も、同じようにコード検出期間における各音階音の平均的なレベルを計算することによって決定する。すなわちこれがコードの強度である。 Chord detection by the chord name determination unit 60 likewise works by calculating the average level of each scale note over the chord detection period. This average is the chord intensity.

本実施例では、コード検出期間とベース検出期間は同一としている。コード検出音域、例えばＣ３からＡ６の各音階音のコード検出期間における平均的なレベルを計算し、これが大きな値を持つ音階音から順に数個の音名を検出し、これとベース音の音名からコード名候補を抽出する。 In this embodiment, the chord detection period is the same as the bass detection period. The average level over the chord detection period is calculated for each scale note in the chord detection range, for example C3 to A6. The names of several notes are picked in descending order of this average level, and chord name candidates are extracted from these note names and the note name of the bass sound.

上記コード情報記憶部６２は、検出した上記データを記憶する構成であり、上記ＲＡＭ１００６やハードディスクドライブ１０２０で構成される。 The code information storage unit 62 is configured to store the detected data, and includes the RAM 1006 and the hard disk drive 1020.

該コード情報記憶部６２には、コード名検出の過程において、コード位置にはベース音検出期間の最初のフレーム番号（コード１、コード２、……）が記憶され、ベース域音階音強度には、ベース音域内であって、ベース音検出期間のＣ〜Ｂまでの各１２音の強度の区間平均が記憶される。フレーム番号はサンプルにも変換できるのでサンプル番号で記憶しても良い。また、ベース音には、ベース音検出期間において検出されたベース音の音階番号が記憶される。コード音階音強度には、ベース検出期間のＣ〜Ｂまでの各１２音の強度の区間平均が記憶され、コード構成音には、ベース検出期間において抽出されたコード構成音が記憶される。そしてコード名には、決定されたコード名（或いはそれに対応した番号でも良い）が記憶される。 During chord name detection, the chord information storage unit 62 stores the following items. The chord position holds the first frame number of the bass detection period (chord 1, chord 2, ...). The bass-range scale note intensity holds, for each of the 12 notes C to B within the bass range, the average intensity over the bass detection period. Because a frame number can be converted to a sample number, the position may be stored as a sample number instead. The bass sound holds the scale note number of the bass sound detected in the bass detection period. The chord scale note intensity holds the average intensity of each of the 12 notes C to B over the bass detection period, and the chord constituent notes hold the constituent notes extracted in the bass detection period. The chord name holds the determined chord name (or a number corresponding to it).

The chord name determination unit 60 determines the chord name by referring to the chord information storage unit 62. It retrieves chord names one at a time, together with their constituent notes, from a chord name database that stores each chord type (m, M7, and so on) and the intervals of its constituent notes from the root. The average intensity of those constituent notes is calculated from the chord scale note intensity in the chord information storage unit 62. After this has been done for all chords, the chord name with the highest average constituent note intensity is determined to be the chord name of the section. Because the root or the fifth of a chord is sometimes omitted by the instrument playing the chord, a chord is extracted as a candidate even when these notes are absent. When a bass note has been detected, its note name is added to the candidate chord name: if the root of the chord and the bass note have the same note name, the chord name is left as it is, and if they differ, the chord is written as a slash chord (fraction chord). Alternatively, several chords whose average constituent note intensity is comparatively high may be shown on the display unit 80 so that the user can select one.
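As a minimal sketch of this selection, assuming a tiny stand-in chord name database, the search over all roots and chord types might look like the following. `CHORD_TYPES` and `determine_chord` are illustrative names, and the four chord types listed are only a sample, not the database the device actually uses.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical miniature chord name database: chord type -> intervals
# (in semitones) of the constituent notes from the root.
CHORD_TYPES = {"": (0, 4, 7), "m": (0, 3, 7), "7": (0, 4, 7, 10), "M7": (0, 4, 7, 11)}


def determine_chord(intensity, bass=None):
    """Pick the chord whose constituent notes have the highest mean intensity.

    intensity: 12 averaged note-name intensities, index 0 = C ... 11 = B.
    bass: pitch class of the detected bass note, or None if none was found.
    """
    best = None  # (score, root, suffix)
    for root in range(12):
        for suffix, intervals in CHORD_TYPES.items():
            score = sum(intensity[(root + i) % 12] for i in intervals) / len(intervals)
            if best is None or score > best[0]:
                best = (score, root, suffix)
    score, root, suffix = best
    name = NOTE_NAMES[root] + suffix
    # A bass note that differs from the root yields a slash (fraction) chord.
    if bass is not None and bass != root:
        name += "/" + NOTE_NAMES[bass]
    return name, score
```

Because the score is a mean over the constituent notes, a chord with a weak or missing root or fifth can still be ranked; it simply scores lower than a fully sounded chord.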

If the above method extracts too many chord name candidates, the candidates may be restricted by the bass note. That is, when a bass note has been detected, candidates whose root does not have the same note name as the bass note are deleted.

Musical knowledge may also be introduced into the calculation of the average constituent note intensity. For example, the level of each scale note is averaged over all frames, these averages are averaged for each of the 12 note names to obtain the strength of each note name, and the key of the piece is detected from the distribution of those strengths. The average constituent note intensity of the diatonic chords of that key may then be multiplied by a constant so that it becomes larger, or the average constituent note intensity of a chord containing notes outside the diatonic scale of the key may be reduced according to the number of such notes. In addition, common chord progression patterns may be stored in a database and compared against the candidates, and candidates that would produce a commonly used progression may be multiplied by a constant so that their average constituent note intensity becomes larger.
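The specification does not say how the key is derived from the strength distribution or what constants are used, so the following is only one plausible sketch. It assumes major keys only, picks the tonic whose major scale captures the most total strength, and applies the second option above (reducing the score per out-of-scale note). The names `detect_major_key` and `weighted_score` and the `penalty` value are assumptions.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale


def detect_major_key(strength):
    """Return the tonic (0 = C ... 11 = B) whose major scale holds the most strength."""
    return max(range(12), key=lambda t: sum(strength[(t + d) % 12] for d in MAJOR_SCALE))


def weighted_score(score, chord_pitch_classes, tonic, penalty=0.8):
    """Scale a chord's average intensity down once per note outside the key."""
    scale = {(tonic + d) % 12 for d in MAJOR_SCALE}
    outside = sum(1 for pc in chord_pitch_classes if pc not in scale)
    return score * penalty ** outside
```

A relative minor key shares the same scale notes, so this sketch cannot distinguish it from the major key; that would need an additional rule.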

In either case, once the chord name determination unit 60 has determined the chord name, the result (change information) is stored again in the chord information storage unit 62 via the buffer 61 and displayed again on the display unit 80.

FIG. 15 shows the result of determining the chord names. The figure shows part of the contents of the chord information storage unit 62 in this embodiment.

Meanwhile, the acoustic signal input through the input unit 10 is stored in the acoustic signal storage unit 71, which is implemented by the hard disk drive 1020. The chord names determined as above are also output from the chord information storage unit 62 and stored in the acoustic signal storage unit 71.

When the chord name determination unit 60 has determined the chord names and they have been displayed again on the display unit 80 via the buffer 61, the acoustic signal stored in the acoustic signal storage unit 71 can be played by the performance unit 70 at the user's instruction.

The first input unit 90 consists of the keyboard 1012, the mouse 1018, and the like. A user who has seen the chord name determination result on the display unit 80 and listened to the acoustic signal through the performance unit 70 uses the first input unit 90 to enter the breaks between passages, each of which contains a plurality of measures. As noted above, a passage is a single coherent unit of musical form; its length is a multiple of four measures and is often eight measures.

While listening to the performance, the user clicks with the mouse 1018 on the bar line at each point judged to be a passage break. On a click, a mark indicating a passage break is displayed at that point, and the letters A, B, C, ... are displayed in order from the first break of the piece. These letters can be used later as markers when correcting chords. Clicking again on a point already designated as a passage break cancels that break.
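The click behavior described above (a first click sets a break, a second click on the same bar line cancels it, and letters are assigned in order from the start of the piece) could be modeled as below. The function names and the representation of a break as a measure index are assumptions made for illustration.

```python
def toggle_break(breaks, measure_index):
    """Set a passage break at a bar line, or cancel it if it is already set."""
    updated = set(breaks)
    updated.symmetric_difference_update({measure_index})
    return sorted(updated)


def label_breaks(breaks):
    """Assign the marker letters A, B, C, ... in order from the first break."""
    return {pos: chr(ord("A") + i) for i, pos in enumerate(sorted(breaks))}
```

This sketch only covers up to 26 breaks; a real implementation would need a rule for labels beyond Z.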

The first input unit 90 may also be configured to accept, in addition to the passage breaks, the type of each passage, such as A melody, B melody, or chorus. In the configuration of the embodiment described so far, as explained below, the passage similarity detection unit 100 detects identical passages after chord detection on the basis of the similarity of the detected chords. The user enters only the positions of the passage breaks through the first input unit 90 and, as stated above, need not be aware of whether a given passage is an A melody or a B melody. Passages with the same number of measures, as delimited by the specified breaks, are compared with one another and are detected as the same passage if their chord progressions are similar. However, if the user already knows that a passage is an A melody, a B melody, or a chorus, the overall structure of the piece is clear from the outset and detection accuracy improves. For this reason the first input unit 90 may be configured to accept the passage type along with the passage breaks.

Like the units described above, the passage similarity detection unit 100 is implemented by the CPU 1002 reading and executing the chord name detection program. As described above, it derives the number of measures in each passage from the positions of the passage breaks, checks the similarity of passages with the same number of measures by comparing either the character strings of the detected chord names or the constituent notes of those chords, and assigns a unique ID to passages whose similarity is at or above a predetermined threshold.

Once all passage breaks have been entered through the first input unit 90, the passage similarity detection unit 100 detects the similarity of the passages. FIG. 16 shows the processing flow of this detection.

As shown in the figure, the number of measures in each passage is counted from the passage break positions (step S200), and it is checked whether any passages have the same number of measures (step S202).

If no passages have the same number of measures (step S202; N), the processing ends.

If there are passages with the same number of measures (step S202; Y), their similarity is checked (step S204). The method of checking similarity is described later.

As a result of this check, a unique ID is assigned to passages with high similarity (step S206). Passages with the same ID are preferably displayed in the same color or otherwise made easy for the user to identify.

It is then checked whether there are other passages with the same number of measures (step S208); if so (step S208; Y), the processing returns to step S204.

If there are no other passages with the same number of measures (step S208; N), the remaining passages are left as they are, since the marker letters A, B, C, ... have already been assigned to them as described above, and the processing ends.
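Steps S200 and S202 of this flow, counting the measures in each passage and finding passages of equal length, can be sketched as follows. Breaks are assumed to be given as measure indices and the total number of measures is assumed to be known; both function names are illustrative.

```python
from collections import defaultdict


def measures_per_passage(breaks, total_measures):
    """Step S200: count the measures between consecutive passage breaks."""
    bounds = sorted(breaks) + [total_measures]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


def groups_of_equal_length(lengths):
    """Step S202: map each measure count to the passages that share it."""
    groups = defaultdict(list)
    for index, n in enumerate(lengths):
        groups[n].append(index)
    # Only lengths shared by two or more passages need a similarity check.
    return {n: idxs for n, idxs in groups.items() if len(idxs) > 1}
```

An empty result corresponds to the "N" branch of step S202, where processing ends without any similarity check.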

The method of checking passage similarity mentioned above is as follows. Suppose, as shown in FIG. 17, that the first and second passages have the same number of measures, and that in the third and fourth measures one passage has chord C where the other has chord F. The constituent notes are C = (do, mi, sol) and F = (fa, la, do), so only the note do is shared. One of the three notes of each triad matches, so the match rate for each of the third and fourth measures is 1/3. The chords in all other measures are identical, so their match rate is 1. The similarity is therefore (1 + 1 + 1/3 + 1/3 + 1 + 1 + 1 + 1) / 8 (the number of measures) = 20/24 = 5/6, or 0.8333.... Instead of comparing constituent notes in this way, the chord name character strings may simply be compared, in which case chord C and chord F do not match at all.
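The worked example above can be reproduced with exact fractions. In this sketch the `CHORD_NOTES` table is a hypothetical stand-in for the chord database, the two eight-measure progressions are invented so that only measures 3 and 4 differ, and dividing by the larger constituent note count is an assumption (the example in the text only covers two triads).

```python
from fractions import Fraction

# Hypothetical lookup of constituent notes for a few chord names.
CHORD_NOTES = {
    "C": {"C", "E", "G"},
    "F": {"F", "A", "C"},
    "G": {"G", "B", "D"},
    "Am": {"A", "C", "E"},
}


def measure_match(chord_a, chord_b):
    """Fraction of constituent notes shared by the chords of one measure."""
    notes_a, notes_b = CHORD_NOTES[chord_a], CHORD_NOTES[chord_b]
    return Fraction(len(notes_a & notes_b), max(len(notes_a), len(notes_b)))


def passage_similarity(passage_a, passage_b):
    """Mean per-measure match rate of two passages of equal length."""
    total = sum(measure_match(a, b) for a, b in zip(passage_a, passage_b))
    return total / len(passage_a)


first = ["C", "G", "C", "F", "C", "G", "Am", "C"]
second = ["C", "G", "F", "C", "C", "G", "Am", "C"]
similarity = passage_similarity(first, second)  # Fraction(5, 6)
```

With a threshold of, say, 0.8 these two passages would receive the same ID; with plain string comparison the same pair would score 6/8.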

If the comparison yields a similarity (which may be expressed as a percentage) at or above a certain threshold, the same unique ID is assigned to both passages. If one of them already has an ID (that is, it has already been found similar to some other passage), that ID is assigned to the passage that does not yet have one. If both already have IDs, one of the two IDs is chosen, and the ID of every passage carrying the other ID is changed to the chosen one. (In other words, when any one pair of passages from two groups that had been detected as separate is judged to be the same, all passages in both groups are treated as the same.)
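The ID assignment rule described above can be sketched as a simple merge over pairs of passages already judged similar. The function name `assign_ids`, the use of integers as IDs, and the choice to keep the first passage's ID when two groups merge are assumptions.

```python
def assign_ids(n_passages, similar_pairs):
    """Assign shared IDs to passages, merging groups when a pair links them.

    similar_pairs: (a, b) index pairs whose similarity met the threshold.
    Returns a list with one ID per passage, or None where no ID was assigned.
    """
    ids = [None] * n_passages
    next_id = 1
    for a, b in similar_pairs:
        if ids[a] is None and ids[b] is None:
            ids[a] = ids[b] = next_id  # new group
            next_id += 1
        elif ids[a] is None:
            ids[a] = ids[b]            # join b's existing group
        elif ids[b] is None:
            ids[b] = ids[a]            # join a's existing group
        elif ids[a] != ids[b]:
            keep, drop = ids[a], ids[b]
            # Relabel every passage of the dropped group with the kept ID.
            ids = [keep if i == drop else i for i in ids]
    return ids
```

For example, with pairs (0, 1), (2, 3), and then (1, 3), the two groups collapse into one and passage 4 keeps no ID.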

Alternatively, the match degree used in this passage similarity determination may be calculated with a weight on the chord root, as in Equation 15 below.

(Equation 15)
$$
\text{(chord match degree with the root weighted)} = \frac{(\text{number of constituent notes with the same note name}) + (1 \text{ if the roots are the same, } 0 \text{ otherwise})}{(\text{larger of the two chords' constituent note counts}) + 1}
$$
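Equation 15 translates directly into a few lines of code. The function name and argument layout below are illustrative; notes and roots are passed as note-name strings.

```python
from fractions import Fraction


def root_weighted_match(notes_a, root_a, notes_b, root_b):
    """Chord match degree with the root weighted, following Equation 15."""
    set_a, set_b = set(notes_a), set(notes_b)
    common = len(set_a & set_b)             # notes with the same note name
    same_root = 1 if root_a == root_b else 0
    return Fraction(common + same_root, max(len(set_a), len(set_b)) + 1)
```

Under this measure C against F gives 1/4 rather than 1/3, while C against C7 gives 4/5 because the shared root counts as an extra match.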

The display unit 80 is implemented by the display 1008 and shows the user the playback status of the acoustic signal, the passage breaks, and the similarity detection status, including the passages to which unique IDs have been assigned. FIG. 18 is a screen illustration showing the passage similarity detection result for the song "Furusato" as displayed on the display unit 80. By looking at this screen, the user can confirm whether passage similarity was detected correctly.

Like the first input unit 90, the second input unit 110 consists of the keyboard 1012, the mouse 1018, and the like. The user, having seen the passage similarity detection result on the display unit 80 (and, where appropriate, listened to the acoustic signal through the performance unit 70), uses the second input unit 110 to enter a judgment on the similarity detection status.

The passage similarity determination unit 120 is likewise implemented by the CPU 1002 reading and executing the chord name detection program. When the second input unit 110 indicates that the similarity detection status is not to be confirmed, the unit changes the threshold and causes the passage similarity detection unit 100 to detect the similarity of the passages again. (If the default threshold is low, it is normally raised, but the change is not limited to this.) When the second input unit 110 indicates that the judgment is confirmed, the unit causes the bass note detection unit 50 and the chord name determination unit 60 to perform detection again so that identical passages have the same measure division and the same chords.

Passages assigned the same ID by the passage similarity detection unit 100 are displayed in the same color on the display unit 80, as described above. The user checks whether the judgment is correct (whether parts shown in the same color are in fact the same passage). If it is wrong, the user changes, through the second input unit 110, the threshold used by the passage similarity detection unit 100 and runs passage similarity detection again. This is repeated until the judgment is correct, at which point the user confirms it through the second input unit 110 and the passage similarity determination unit 120.

Once passage similarity is confirmed, the passage similarity determination unit 120 instructs the chord name determination unit 60 to perform measure division and chord detection again so that identical passages have the same measure division and the same chords. The chord name determination unit 60 is made to judge the measure division and the chords of identical passages comprehensively, using all of them together. (For example, a measure is divided only when division is indicated in the corresponding measure of every identical passage, or when it is indicated in at least half of them; for chord detection, the chord candidates and their likelihoods at the same chord detection position in all identical passages are summed and the decision is made on the totals.)
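The two pooling rules given as examples above can be sketched as follows. The data shapes are assumptions: each passage contributes a dictionary of chord candidates with likelihoods for one detection position, and a boolean vote on whether a measure should be divided. The function names are illustrative.

```python
from collections import defaultdict


def consensus_chord(candidates_per_passage):
    """Sum candidate likelihoods across identical passages and pick the best.

    candidates_per_passage: one {chord_name: likelihood} dict per passage,
    all taken from the same chord detection position.
    """
    totals = defaultdict(float)
    for candidates in candidates_per_passage:
        for name, likelihood in candidates.items():
            totals[name] += likelihood
    return max(totals, key=totals.get)


def consensus_split(votes, require_all=False):
    """Decide whether to divide a measure from per-passage votes.

    require_all=True divides only if every passage votes to divide;
    otherwise at least half of the passages must vote to divide.
    """
    yes = sum(1 for v in votes if v)
    return yes == len(votes) if require_all else 2 * yes >= len(votes)
```

Pooling in this way lets a passage with a clean detection outweigh a noisy repetition of the same passage.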

The correction unit 130, like the first input unit 90 and the second input unit 110, consists of the keyboard 1012, the mouse 1018, and the like. The result confirmed by the passage similarity determination unit 120 is shown on the display unit 80, and the user, having seen it (and, where appropriate, listened to the acoustic signal through the performance unit 70), can use the correction unit 130 to correct the measure division and/or the chords obtained when the passage similarity determination unit 120 instructed the chord name determination unit 60 to perform detection again so that identical passages have the same measure division and the same chords.

After processing by the passage similarity determination unit 120, identical passages should have been detected with the same measure division and the same chords. If these are wrong (for example, a chord fixed as C in all passages with the same ID is actually F), they are corrected through the correction unit 130.

As shown in FIG. 19, when, for example, the fifth measure of the first passage is corrected from C to A, the fifth measure of the second passage, which has the same ID, is also corrected from C to A. (If the passage containing the corrected measure has an ID, the measure at the same position, counted from the start of the passage, in every other passage with that ID is corrected in the same way.)
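Propagating a correction to every passage with the same ID can be sketched as below. Passages are assumed to be lists of chord names indexed by measure (zero-based), with a parallel list of IDs in which `None` means no ID was assigned; the function name is illustrative.

```python
def apply_correction(passages, ids, passage_index, measure_index, new_chord):
    """Correct one measure and mirror the change in passages sharing its ID."""
    target_id = ids[passage_index]
    for i, chords in enumerate(passages):
        same_group = target_id is not None and ids[i] == target_id
        if i == passage_index or same_group:
            chords[measure_index] = new_chord
```

Correcting the fifth measure (index 4) of the first passage from C to A therefore also changes index 4 of every other passage carrying the same ID, while passages without that ID are left untouched.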

With the configuration of this embodiment described above, identical passages are detected after chord detection from the similarity of their chord progressions, which amounts to a structural analysis of the piece. When passages are detected as similar and this is confirmed, identical passages basically have the same measure division and the same chord progression, which makes chord name detection considerably more accurate. Even if the chord detection contains recognition errors, the user can correct them, and correcting one of the places carrying the same ID automatically corrects the others. The user therefore no longer has to repeat the same correction many times, and the effort of correction is greatly reduced.

The chord name detection device and computer program of the present invention are not limited to the illustrated examples described above, and various modifications may of course be made without departing from the gist of the present invention.

The chord name detection device and computer program of the present invention can be used in a variety of fields, including video editing that synchronizes events in a video track with beat times in a music track, for example when producing a music promotion video; audio editing that locates beat positions by beat tracking and cuts and pastes the waveform of a musical acoustic signal; live stage event control that controls elements such as the color, brightness, direction, and special effects of lighting in synchronization with a human performance and automatically controls audience hand claps and cheers; and computer graphics synchronized with music.

Brief description of the drawings

FIG. 1 is a schematic circuit diagram of a personal computer to which a preferred embodiment of the present invention is applied.
FIG. 2 is an overall block diagram of the chord name detection device according to the present invention.
FIG. 3 is an explanatory diagram of the configuration of the scale note level detection unit 20 or 40.
FIG. 4 is a flowchart showing the flow of processing in the beat detection unit 25.
FIG. 5 is a graph showing the waveform of part of a piece, the level of each scale note, and the total of the level increments of the scale notes.
FIG. 6 is a conceptual diagram of the autocorrelation calculation.
FIG. 7 is an explanatory diagram showing how the first beat position is determined.
FIG. 8 is an explanatory diagram showing how the positions of subsequent beats are determined after the first beat position has been determined.
FIG. 9 is a graph showing the distribution of the coefficient k, which varies with the value of s.
FIG. 10 is an explanatory diagram showing how the positions of the second and subsequent beats are determined.
FIG. 11 is a screen display showing an example of the confirmation screen for beat detection results.
FIG. 12 is a screen display showing an example of the confirmation screen for measure detection results.
FIG. 13 is a graph showing the scale note levels of each frame output by the chord-detection scale note level detection unit 40 for the same part of the piece.
FIG. 14 is a graph showing a display example of the bass detection result from the bass note detection unit 50.
FIG. 15 is a screen display showing an example of the confirmation screen for chord name detection results.
FIG. 16 is a flowchart showing the passage similarity detection processing performed by the passage similarity detection unit 100.
FIG. 17 is an explanatory diagram showing an example of passage similarity.
FIG. 18 is a screen illustration showing the passage similarity detection result for the song "Furusato" as displayed on the display unit 80.
FIG. 19 is an explanatory diagram showing how, when a measure or the like is corrected in one of the passages whose similarity has been confirmed, the correction is reflected in the same measure of the similar passages.

Explanation of symbols

10 Input unit
20, 40 Scale note level detection unit
21 Waveform preprocessing unit
22 FFT operation unit
23 Level detection unit
24, 26, 31, 41, 51, 61 Buffer
25 Beat detection unit
30 Measure detection unit
50 Bass note detection unit
60 Chord name determination unit
62 Chord information storage unit
70 Performance unit
71 Acoustic signal storage unit
80 Display unit
90 First input unit
100 Passage similarity detection unit
110 Second input unit
120 Passage similarity determination unit
130 Correction unit
1000 System bus
1002 CPU
1004 ROM
1006 RAM
1008 Display
1010 I/O interface
1012 Keyboard
1014 Sound system
1016 CD-ROM drive
1016a Program CD-ROM
1018 Mouse
1020 Hard disk drive

Claims

1. A chord name detection device that performs beat detection, measure detection, and chord detection and determination on an input acoustic signal, comprising:
performance means for playing the acoustic signal;
first input means for receiving, in accordance with the performance, breaks between passages each containing a plurality of measures;
passage similarity detection means for deriving the number of measures in each passage from the positions of the passage breaks, checking the similarity of passages with the same number of measures by comparing the character strings of the detected chord names or the constituent notes of the chord names, and assigning a unique ID to passages whose similarity is equal to or greater than a predetermined threshold;
display means for displaying to a user the playback status of the acoustic signal, the passage breaks, and the similarity detection status including the passages to which the unique ID has been assigned;
second input means for accepting the user's judgment of the similarity detection status; and
passage similarity determination means for, when input indicates that the judgment of the similarity detection status is not confirmed, changing the threshold and causing the passage similarity detection means to detect the similarity of the passages again, and, when input indicates that the judgment is confirmed, causing identical passages to be detected again so that they have the same measure division and the same chords.

2. The chord name detection device according to claim 1, further comprising correction means capable of correcting the measure division and/or the chords when the passage similarity determination means causes identical passages to be detected again so that they have the same measure division and the same chords.

3. The chord name detection device according to claim 1, wherein the first input means further accepts, in addition to the passage breaks, input of the passage type, such as A melody, B melody, or chorus.

4. The chord name detection device according to any one of claims 1 to 3, wherein the configuration that performs beat detection, measure detection, and chord detection and determination on the input acoustic signal comprises:
input means for inputting the acoustic signal;
first scale note level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals using parameters suited to beat detection, and obtaining the level of each scale note at each predetermined time;
beat detection means for summing the increments in the level of each scale note at each predetermined time over all scale notes to obtain a total level increment indicating the degree of change of the overall sound at each predetermined time, and detecting the average beat interval and the position of each beat from that total;
measure detection means for calculating the average level of each scale note for each beat, summing the increments of these per-beat average levels over all scale notes to obtain a value indicating the degree of change of the overall sound at each beat, and detecting the time signature and the bar line positions from that value;
second scale note level detection means for performing an FFT operation on the input acoustic signal, at predetermined time intervals different from those used for beat detection and using parameters suited to chord detection, and obtaining the level of each scale note at each predetermined time;
bass note detection means for detecting the bass note in each measure from the levels of the low-range scale notes among the detected scale note levels;
chord name determination means for determining the chord name of each measure from the detected bass note and the level of each scale note; and
chord information storage means for storing, for every detected chord, the chord position, the bass-range scale note intensity obtained from the scale note levels of the bass detection range in the bass detection period, the bass note, the chord scale note intensity obtained from the scale note levels of the chord detection range in the chord detection period, the chord constituent notes, the number of chord constituent notes, and the chord name.

5. A computer program for chord name detection which, when read and executed by a computer, causes the computer to function as a configuration that performs beat detection, measure detection, and chord detection and determination on an input acoustic signal, and as:
performance means for playing the acoustic signal;
first input means for receiving, in accordance with the performance, breaks between passages each containing a plurality of measures;
passage similarity detection means for deriving the number of measures in each passage from the positions of the passage breaks, checking the similarity of passages with the same number of measures by comparing the character strings of the detected chord names or the constituent notes of the chord names, and assigning a unique ID to passages whose similarity is equal to or greater than a predetermined threshold;
display means for displaying to a user the playback status of the acoustic signal, the passage breaks, and the similarity detection status including the passages to which the unique ID has been assigned;
second input means for accepting the user's judgment of the similarity detection status; and
passage similarity determination means for, when input indicates that the judgment of the similarity detection status is not confirmed, changing the threshold and causing the passage similarity detection means to detect the similarity of the passages again, and, when input indicates that the judgment is confirmed, causing identical passages to be detected again so that they have the same measure division and the same chords.

A computer program for chord name detection which, by being read and executed by a computer, causes the computer to function as a configuration that performs beat detection, measure detection, and chord detection and determination on an input acoustic signal, and further to function as:
playback means for playing back the acoustic signal;
first input means for receiving, in step with the playback, the boundaries of passages each comprising a plurality of measures;
passage similarity detection means for determining the number of measures in each passage from the positions of the passage boundaries, checking the similarity of passages having the same number of measures by comparing the character strings of the chord names detected from them or the constituent notes of those chords, and assigning a unique ID to passages whose similarity is equal to or greater than a predetermined threshold;
display means for displaying to the user the playback status of the acoustic signal, the passage boundary status, and the similarity detection status including the passages to which the unique ID has been assigned;
second input means for accepting the user's confirmation of the similarity detection status;
passage similarity confirmation means for changing the threshold and causing the passage similarity detection means to re-detect the similarity of the passages when it is input that the similarity detection status is not confirmed, and for causing the identical passage portions to be re-detected so as to have the same measure division and the same chords when it is input that the similarity detection status is confirmed; and
correction means capable of correcting the measure division and/or the chords when the passage similarity confirmation means causes the identical passage portions to be re-detected so as to have the same measure division and the same chords.
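
One way to picture how a correction interacts with passages that share an ID is sketched below. It is a hypothetical illustration, not the claimed re-detection itself: it assumes one chord name per measure and simply copies a single chord correction into the same measure of every passage carrying the same ID. The function name `propagate_correction` and its parameters are assumptions.

```python
def propagate_correction(passages, ids, passage_index, measure_index, new_chord):
    """Apply one chord correction to every passage sharing the corrected passage's ID."""
    target_id = ids[passage_index]
    # Work on copies so the caller's detection result is left untouched.
    corrected = [list(chords) for chords in passages]
    for i, pid in enumerate(ids):
        if pid == target_id and measure_index < len(corrected[i]):
            corrected[i][measure_index] = new_chord
    return corrected
```

With this arrangement the user corrects a misdetected chord once, and the repeats of that passage receive the same change.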

7. The computer program for chord name detection according to claim 5 or 6, wherein the first input means, in receiving the passage boundaries, further receives the type of each passage, such as A melody, B melody, or chorus.

The computer program for chord name detection according to any one of claims 5 to 7, wherein, by being loaded and executed on a computer, the program further provides the computer with the configuration that performs beat detection, measure detection, and chord detection and determination on the input acoustic signal, by causing the computer to function as:
input means for inputting an acoustic signal;
first scale-tone level detection means for performing an FFT operation on the input acoustic signal at predetermined time intervals using parameters suited to beat detection, and obtaining the level of each scale tone at each predetermined time;
beat detection means for summing, over all scale tones, the increment in the level of each scale tone at each predetermined time to obtain a total level increment indicating the degree of change in the overall sound at each predetermined time, and detecting the average beat interval and the position of each beat from these totals;
measure detection means for calculating the average level of each scale tone for each beat, summing over all scale tones the increment in the average level of each scale tone for each beat to obtain a value indicating the degree of change in the overall sound for each beat, and detecting the time signature and the bar-line positions from these values;
second scale-tone level detection means for performing an FFT operation on the input acoustic signal, at predetermined time intervals different from those used for beat detection and using parameters suited to chord detection, and obtaining the level of each scale tone at each predetermined time;
bass note detection means for detecting the bass note of each measure from the levels of the low-range scale tones among the detected scale-tone levels;
chord name determination means for determining the chord name of each measure from the detected bass note and the level of each scale tone; and
chord information storage means for storing, for every detected chord, the chord position, the bass note, the bass-note scale-tone intensities obtained from the scale-tone levels of the bass detection range during the bass detection period, the chord scale-tone intensities obtained from the scale-tone levels of the chord detection range during the chord detection period, the chord constituent notes, the number of chord constituent notes, and the chord name.
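
The quantity used by the beat detection means, the total of the level increments over all scale tones at each predetermined time, can be sketched as follows. This is a minimal sketch under stated assumptions: the scale-tone levels are taken as already computed (the FFT step is omitted), a decrease in level is counted as zero instead of as a negative increment, and the function name `total_level_increments` is hypothetical.

```python
def total_level_increments(levels):
    """Sum the level increases over all scale tones for each frame.

    levels[t][k] is the level of scale tone k at frame t. The result has one
    value per frame; frame 0 has no predecessor and is given 0.0.
    """
    totals = [0.0]
    for prev, cur in zip(levels, levels[1:]):
        # Assumption: only increases count; a falling level contributes 0.
        totals.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return totals
```

Peaks in the returned sequence mark frames where the overall sound changes strongly, which is the kind of information from which the beat interval and beat positions are then estimated. The same summation applied to per-beat average levels would yield the per-beat change values used for measure detection.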