JP5196550B2

JP5196550B2 - Chord detection apparatus and chord detection program

Info

Publication number: JP5196550B2
Application number: JP2008137100A
Authority: JP
Inventors: 錬澄田
Original assignee: Kawai Musical Instrument Manufacturing Co Ltd
Current assignee: Kawai Musical Instrument Manufacturing Co Ltd
Priority date: 2008-05-26
Filing date: 2008-05-26
Publication date: 2013-05-15
Anticipated expiration: 2028-05-26
Also published as: JP2009282464A

Description

The present invention relates to a chord detection apparatus and a chord detection program, and in particular to a chord detection apparatus and a chord detection program suitable for accurately detecting chords from a music acoustic signal in which a plurality of musical sounds are mixed (hereinafter simply called the “acoustic signal”).

Chords are a very important element of popular music. When a small band plays music of this genre, it usually does not use a score in which every individual note is written out, but a score called a chord chart or lead sheet, in which only the melody and the chord progression are written. To play a song from a commercially available music CD or the like in a band, the chord progression of the song therefore has to be transcribed, but only experts with special musical knowledge could do this work; it was impossible for ordinary people. For this reason, various chord detection apparatuses and chord detection programs have been studied that use a commercially available personal computer or the like to process an acoustic signal containing a mixture of musical sounds with the fast Fourier transform (FFT) and thereby detect chords.

For example, a chord detection apparatus is known that handles a measure made up of more than one chord by dividing the measure into a first half and a second half, detecting a bass note in each half, and, when the two bass notes differ, detecting a chord separately for the first half and the second half of the measure (Patent Document 1).

However, with the chord detection apparatus described in Patent Document 1, when a plurality of different chords contain the same bass note, only one chord is detected for the whole measure, so the detection accuracy is insufficient. In addition, because a note that is strong over the whole measure is taken as the bass note, the bass note may not be detected correctly in a running (walking) bass line, as in jazz, where the bass moves in quarter notes.

In response, a chord detection apparatus has been proposed that can detect the correct chords even when different chords containing the same bass note occur within a measure (Patent Document 2). This apparatus divides a measure according to not only the bass note but also the degree of change in the chord. That is, when the bass notes differ or the degree of change in the chord is large, the measure is divided and a chord is detected for each part.

Patent Document 1: JP 2007-52394 A
Patent Document 2: JP 2008-40283 A

The apparatus described in Patent Document 2 first uses the bass note and the degree of change in the chord constituent note candidates to determine chord detection sections, each of which is presumed to contain a single chord, and then determines the chord (chord name) within each chord detection section from the bass note and the chord constituent notes. However, when chords are detected using only the bass note and the chord constituent notes as the information within a chord detection section, the accuracy of the chord detection result is still insufficient.

For example, each user tends to prefer particular genres of music, so a highly accurate chord detection result can be expected if chords can be detected in light of the chord progressions and chord theory that are typical of that genre.

The present invention has been made in view of the above problem. Its object is to provide a chord detection apparatus and a chord detection program that do not rely solely on the bass note and the chord constituent notes, but can detect the correct chord by also taking into account the genre of the song, the tendencies of the songs the user prefers, and the like.

To solve the above problem and achieve the object, the present invention comprises: means for detecting the power of each scale note of the acoustic signal of a song; means for setting a plurality of chord detection sections in the acoustic signal; means for detecting, from that power, the power of each scale note in a predetermined bass note detection range in the portion corresponding to the first beat of each chord detection section; means for determining, as the bass note, the scale note with the greatest power; means for extracting a predetermined number of note names in descending order of power from the scale notes of the entire chord detection section and detecting chord candidates that each take one of those note names as the root note; means for calculating, for each chord candidate, a likelihood from the average power of all of its chord constituent notes and determining the chord candidate with the greatest likelihood as the chord; and means for inputting the genre of the song subject to chord detection. The first feature is that, in determining the bass note, the scale note whose fundamental and harmonics together have the greatest power is taken as the bass note, and the powers of the fundamental and of the harmonics are weighted separately according to the genre.

The present invention also comprises: means for detecting the power of each scale note of the acoustic signal of a song; means for setting a plurality of chord detection sections in the acoustic signal; means for detecting, from that power, the power of each scale note in a predetermined bass note detection range in the portion corresponding to the first beat of each chord detection section; means for determining, as the bass note, the scale note with the greatest power; means for extracting a predetermined number of note names in descending order of power from the scale notes of the entire chord detection section and detecting chord candidates that each take one of those note names as the root note; means for calculating, for each chord candidate, a likelihood from the average power of all of its chord constituent notes and determining the chord candidate with the greatest likelihood as the chord; and means for inputting a tension level. The second feature is that the means for detecting chord candidates is configured to narrow the chord candidates down to chords whose number of constituent notes corresponds to the tension level.

The third feature of the present invention is that the means for calculating the likelihood further includes means for applying to the constituent notes of a detected chord candidate a power correction that depends on the tension level and on each note's interval from the root note.

With the first feature, when the bass note is detected from the power of the fundamental and the harmonics, the power of the fundamental and the power of the harmonics can each be weighted according to the genre. For example, for rock songs in which the bass drum, which has no harmonic structure, is loud, setting a small weight on the fundamental makes the detected power of the bass drum sound small, so that mistaking the bass drum sound for the bass note can be avoided.

With the second and third features, chord candidates can be narrowed down by limiting the number of constituent notes of the chord according to the tension level. In addition, because the power correction adjusts the likelihood according to the tension level, chords that match each tension level can be made easier to detect. Since the tension level tends to depend on the genre of the song, specifying a tension level is convenient for detecting chords from songs of a specific genre or from songs the user likes.

The present invention is described in detail below with reference to the drawings. FIG. 2 is a block diagram showing the hardware configuration of a personal computer serving as a chord detection apparatus according to an embodiment of the present invention. The hardware configuration of the personal computer is well known; when the chord detection program according to the present invention is loaded, the computer operates as the chord detection apparatus.

In FIG. 2, the personal computer 1 has a CPU 2, a ROM 3, a RAM 4, a display device 5 (for example, a liquid crystal display), and an external storage device 6 (a hard disk device), which are connected via a system bus 7. A keyboard 9, a sound system 10, and a CD-ROM drive 11 are also connected to the system bus 7 via an input/output interface 8. The parts of the personal computer 1 exchange signals and data with one another over the system bus 7. The input devices with which the user gives instructions to the personal computer 1 are not limited to the keyboard 9 and include pointing devices such as a mouse.

The ROM 3 is a storage area that holds the BIOS and the like of the personal computer 1. The RAM 4 is used as a storage area for programs read from the hard disk device 6, as a work area, and as temporary storage for coefficients, parameters, and the like.

The display device 5 is controlled by a display control unit (not shown) that performs the necessary image processing under commands from the CPU 2, and displays the image processing results.

The CD-ROM drive 11 is reading means that reads data from a CD-ROM and stores it in the hard disk device 6; it is the drive through which the chord detection program and the acoustic signal subject to chord detection are read into the personal computer 1. Programs and data (including acoustic signals) read by the CD-ROM drive 11 are stored in the hard disk device 6, and the main program is loaded into the RAM 4.

FIG. 3 is a flowchart showing the main processing of the chord detection program. In step S1, the acoustic signal of the song subject to chord detection is read. The acoustic signal can be read by ripping it from a CD-ROM loaded in the CD-ROM drive 11 or by specifying a musical waveform stored in advance in a WAV file. In step S2, the chord detection range within the read acoustic signal is determined. The whole of the read acoustic signal (that is, the entire song) may be used as the chord detection range, or the user may use the input means to specify arbitrary measures, for example in the intro or the chorus of the song. To let the user specify the chord detection range, the waveform of the acoustic signal is displayed on the display device 5 so that the user can specify the range with the keyboard 9 or a pointing device.

In step S3, the time signature is set based on an instruction entered by the user. For example, the song is played back from the acoustic signal read in step S1, the user judges the time signature by listening to the playback, and the user enters it on, for example, a time signature input screen shown on the display device 5.

In step S4, the beat positions are detected. Beats are detected from tapping entered by the user (striking a predetermined key on the keyboard 9). That is, the song is played back from the acoustic signal, and the user, while listening, judges where the beats fall and strikes the keyboard 9 at those positions. Tapping starts on the first beat of a measure: the user identifies a first beat in the playback and begins tapping at that moment. Once the tapping interval has stabilized within a predetermined fluctuation range, the tempo (beats per minute) is fixed.
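The stabilization test in step S4 can be sketched as follows. This is only an illustration: the window of four intervals, the 5% tolerance, and the function name `settle_tempo` are assumptions, since the text says only that the interval must fall within "a predetermined fluctuation range".

```python
from statistics import mean


def settle_tempo(tap_times, window=4, tolerance=0.05):
    """Return (tap_index, bpm) once `window` consecutive intervals agree.

    tap_times: tap timestamps in seconds, in increasing order.
    window, tolerance: hypothetical values, not taken from the patent.
    Returns None if the tapping never stabilizes.
    """
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    for end in range(window, len(intervals) + 1):
        recent = intervals[end - window:end]
        avg = mean(recent)
        # Every recent interval must lie within the tolerance of their mean.
        if all(abs(x - avg) <= tolerance * avg for x in recent):
            # `end` is the index of the tap that closes the stable window.
            return end, 60.0 / avg
    return None


taps = [0.0, 0.7, 1.1, 1.6, 2.1, 2.6, 3.1]  # the first taps are unsteady
print(settle_tempo(taps))
```

With these taps the first two intervals are irregular, so the tempo is fixed only once four steady half-second intervals have been seen.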

In step S5, the bar line positions are determined from the time signature and the tempo. Because the first tap falls on a first beat, in 4/4 time, for example, the numbers 1 to 4 are assigned to the taps in order from the start of tapping, and the tapping positions numbered 1 to 4 immediately before the point at which the tempo was fixed, that is, one measure containing four beats, are fixed. The bar line is drawn at the position of the first of the four beats.

Once one measure has been fixed in this way, bar lines are drawn through the acoustic signal before and after it at the same measure length, and chord detection sections, each one measure long, are determined within the chord detection range set in step S2. A chord detection section is a section over which the acoustic signal is processed by FFT to obtain the power spectrum of the scale notes for chord detection.
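The bar line extrapolation can be sketched as below. The sketch assumes that the anchor measure starts at the most recent beat-1 tap at or before the tap where the tempo was fixed, which is one possible reading of the text; `bar_lines` and `detection_sections` are hypothetical names.

```python
import math


def bar_lines(tap_times, beats_per_bar, settled_index, bpm,
              range_start, range_end):
    """Extrapolate bar line times (seconds) across the detection range."""
    bar_length = beats_per_bar * 60.0 / bpm
    # Taps are numbered 1..beats_per_bar cyclically from the first tap, so
    # taps whose index is a multiple of beats_per_bar fall on beat 1.
    anchor_index = (settled_index // beats_per_bar) * beats_per_bar
    anchor = tap_times[anchor_index]
    k_min = math.ceil((range_start - anchor) / bar_length)
    k_max = math.floor((range_end - anchor) / bar_length)
    return [anchor + k * bar_length for k in range(k_min, k_max + 1)]


def detection_sections(lines):
    """Pair consecutive bar lines into one-measure chord detection sections."""
    return list(zip(lines, lines[1:]))


taps = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
lines = bar_lines(taps, beats_per_bar=4, settled_index=6, bpm=120.0,
                  range_start=0.0, range_end=9.0)
print(detection_sections(lines))
```

Extending the grid from a single fixed measure relies on a steady tempo, the same assumption the text makes when it reuses that measure's length for the rest of the signal.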

The method of fixing the chord detection sections is not limited to one based on the time signature and tempo entered by the user. For example, in addition to taking the time signature and the tapped beat interval as input, the acoustic signal may be processed by FFT for each predetermined frame to obtain the degree of change in the power of each scale note from frame to frame; average beat intervals are obtained from this degree of change and used as tempo candidates, and the candidate closest to the tapped tempo is selected to give the beat positions. As a technique for detecting the bar lines of a song by FFT processing of the acoustic signal, the one disclosed in the specification of Japanese Patent Application No. 2006-216362 (JP 2008-40284 A), filed earlier by the present applicant, can be applied. In short, in this embodiment it is enough that sections of one measure each are specified as the chord detection sections. However, the present invention is not limited to using one measure as a chord detection section; as in Patent Documents 1 and 2, a measure may be divided further so that one measure spans a plurality of chord detection sections.

Once chord detection sections have been set within the chord detection range as described above, chord detection is performed in step S6 in order from the first chord detection section.

Next, the chord detection processing is described. FIG. 4 is a flowchart of the main part of the chord detection processing. In step S60, the bass note of a chord detection section is detected from the acoustic signal of that section (one measure). The bass note is obtained from the power of the scale notes in the portion corresponding to the first beat of the chord detection section. The power of each scale note is detected frame by frame from the power spectrum obtained by applying the FFT to the input acoustic signal at predetermined time intervals (frames). In step S61, chord candidates for the chord detection section are detected.

In step S62, the likelihood of each chord candidate detected in step S61 (how plausible it is as the chord) is calculated in accordance with Equation 1, described later. In step S63, the chord candidate with the greatest calculated likelihood is taken as the detected chord, and its chord name is determined.

In the chord detection program of this embodiment, detection options can be set to improve the accuracy of bass note detection and chord candidate detection. Detection of the bass note and of chord candidates with the detection options applied is described later; first, the procedures for detecting the bass note and the chord candidates without detection options are described.

The bass note is determined from the power of the scale notes of the acoustic signal corresponding to the first beat of the chord detection section. The power of a scale note can be taken as the maximum value of the power spectrum over the frequencies within 50 cents above and below the note's fundamental frequency (100 cents being one semitone). Of the power spectrum calculated for all scale notes (C1 to A6), the power of the scale notes in the bass note detection range (for example, C2 to B3) is used to detect the bass note.
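The following sketch shows how the power of one scale note might be read from a single FFT frame. It assumes equal temperament with A4 = 440 Hz, scientific pitch notation with C1 as note index 0 (so A4 is index 45), and a spectrum given as a plain list whose k-th entry is the power at k × `bin_hz`. None of these details are stated in the text.

```python
import math


def note_frequency(i, a4_hz=440.0):
    """Fundamental frequency of scale note i, where i = 0 is C1 (assumed)."""
    return a4_hz * 2.0 ** ((i - 45) / 12.0)


def note_power(spectrum, bin_hz, i):
    """Maximum spectral power within +/-50 cents of note i's fundamental."""
    f0 = note_frequency(i)
    lo = f0 * 2.0 ** (-50 / 1200)
    hi = f0 * 2.0 ** (50 / 1200)
    first = math.ceil(lo / bin_hz)
    last = min(math.floor(hi / bin_hz), len(spectrum) - 1)
    if first > last:
        return 0.0  # no FFT bin falls inside the band
    return max(spectrum[first:last + 1])


spectrum = [0.0] * 1000   # hypothetical power spectrum with 1 Hz bins
spectrum[440] = 5.0       # energy at A4
spectrum[460] = 9.0       # energy inside the band of the next semitone
print(note_power(spectrum, 1.0, 45), note_power(spectrum, 1.0, 46))
```

At low notes the ±50 cent band is only a few hertz wide, so the FFT size has to be large enough for at least one bin to fall inside it; the sketch returns 0.0 when none does.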

Let Li(t) be the level of the i-th scale note (counted from the lowest note) at frame time t. The average power Lavgi(fs, fe) of the i-th scale note between frame numbers fs and fe can then be calculated by Equation 1.
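Equation 1 itself is not reproduced in this text. Since it is described as the average of the levels Li(t) over frames fs to fe, an arithmetic mean is the natural reading; the form below is a reconstruction under that assumption, not the published formula.

```latex
L_{\mathrm{avg},i}(f_s, f_e) \;=\; \frac{1}{f_e - f_s + 1} \sum_{t = f_s}^{f_e} L_i(t)
```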

The average power given by Equation 1 is calculated for every scale note i (0 ≤ i ≤ 69), and the calculated powers are stored in a predetermined storage unit. The scale note with the greatest average power in the bass note detection range is then detected as the bass note.
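A minimal sketch of this bass note detection follows. It assumes that Equation 1 is an arithmetic mean over frames, that `levels[t][i]` holds the level of note i at frame t, and that C1 is note index 0, so the example range C2 to B3 is indices 12 to 35. The function names are hypothetical.

```python
BASS_RANGE = range(12, 36)  # C2..B3 when C1 is index 0 (assumption)


def average_power(levels, fs, fe):
    """Mean level of every scale note over frames fs..fe inclusive."""
    n = fe - fs + 1
    return [sum(levels[t][i] for t in range(fs, fe + 1)) / n
            for i in range(len(levels[0]))]


def detect_bass(levels, fs, fe):
    """Index of the note in BASS_RANGE with the greatest average power."""
    avg = average_power(levels, fs, fe)
    return max(BASS_RANGE, key=lambda i: avg[i])


# Three frames covering the first beat, 70 scale notes (C1..A6).
levels = [[0.0] * 70 for _ in range(3)]
for t, value in enumerate([1.0, 2.0, 3.0]):
    levels[t][19] = value     # G2 grows louder during the beat
for t in range(3):
    levels[t][24] = 1.5       # C3, steady
    levels[t][40] = 10.0      # E4, loud but outside the bass range
print(detect_bass(levels, 0, 2))
```

The loud E4 is ignored because only notes in the bass note detection range compete.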

Once the bass note has been detected from the scale notes in the portion corresponding to the first beat of the chord detection section (one measure), processing moves on to chord candidate detection. Chord candidates are detected from the power of the scale notes over the whole chord detection section (one measure). The average power of each scale note in a chord detection range specified in advance (for example, C3 to A6) is accumulated for each of the 12 note names (C, C♯, D, D♯, ..., B) to give the power of each note name. Up to six note names are then extracted in descending order of this power. The number is given as a maximum of six because fewer, for example only five, may be detected: to prevent chord candidates from being detected erroneously in a nearly silent chord detection section, note names whose average power is at or below a threshold are not used as candidates.

If the detected bass note is not among the up to six extracted notes, the chord name is determined from up to seven notes, including the bass note.
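The extraction of up to six note names, plus the bass note, can be sketched as follows. The sketch assumes note index 0 is C1 (so `i % 12` gives the note name with C = 0 and the range C3 to A6 is indices 24 to 69) and applies the threshold to the accumulated power of each note name; the text does not say exactly where the threshold is applied.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def strongest_note_names(avg_power, bass_note, low=24, high=69,
                         max_count=6, threshold=0.0):
    """Return up to max_count note names (0..11), plus the bass if missing."""
    totals = [0.0] * 12
    for i in range(low, high + 1):
        totals[i % 12] += avg_power[i]
    ranked = sorted(range(12), key=lambda pc: totals[pc], reverse=True)
    # Drop near-silent note names so silence does not yield candidates.
    picked = [pc for pc in ranked[:max_count] if totals[pc] > threshold]
    bass_pc = bass_note % 12
    if bass_pc not in picked:
        picked.append(bass_pc)  # gives at most max_count + 1 notes
    return picked, totals


avg = [0.0] * 70
avg[24], avg[36] = 5.0, 1.0   # C3 and C4
avg[28] = 4.0                 # E3
avg[31] = 3.0                 # G3
avg[35] = 0.05                # B3, below the threshold used here
picked, _ = strongest_note_names(avg, bass_note=21, threshold=0.1)
print([NOTE_NAMES[pc] for pc in picked])
```

Here only three note names pass the threshold, and the bass note A2 is appended because it is not among them.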

To determine chord candidates, one of the up to seven detected notes is first assumed to be the root note of the chord. For that root note and each predetermined chord type, the powers of the root note and of all constituent notes lying at the chord type's predetermined intervals from the root are summed, and the sum is divided by the number of chord constituent notes. In this way, a chord candidate consisting of the tentative root note and the other constituent notes is obtained together with its likelihood.

For example, if four notes (do, mi, sol, and ti) are extracted as the notes with high power, “do” is first assumed to be the root note, and, for each preset chord type, the powers of the scale notes that stand in that chord type's interval relationship to the root note (see FIG. 5) are summed.

FIG. 5 is an example of a table listing chord types and, for each chord type, the intervals of its constituent notes measured from the root note. The note names whose power is to be used are determined by referring to FIG. 5. For a major chord (Maj) with “do” as the root note, for example, the note four semitones above the root (mi) and the note seven semitones above the root (sol) are selected, and the likelihood is calculated for these three notes. The likelihood is the “sum of the average powers of all constituent notes” divided by the “number of chord constituent notes”.

For example, for a major (Maj) chord with “do” as the root note, the average power over the chord detection section of each of the note names “do”, “mi”, and “sol” is accumulated octave by octave over the range C3 to A6 and divided by the number of octaves to give an average for each constituent note; the sum of these averages is the “sum of the average powers of all constituent notes”. For a major chord, the number of chord constituent notes is 3.
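The likelihood calculation for one chord candidate can be sketched as follows. It assumes note index 0 is C1 (C3 to A6 is 24 to 69). FIG. 5 is not reproduced in this text, so only the major intervals given above (0, 4, and 7 semitones) are used. The optional `extracted` argument illustrates the variant in which constituent notes outside the extracted notes count as zero power.

```python
def note_name_power(avg_power, low=24, high=69):
    """Average each note name's power over the octaves inside low..high."""
    sums = [0.0] * 12
    counts = [0] * 12
    for i in range(low, high + 1):
        sums[i % 12] += avg_power[i]
        counts[i % 12] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def likelihood(pc_power, root, intervals, extracted=None):
    """Sum of constituent note powers divided by the number of notes."""
    total = 0.0
    for interval in intervals:
        pc = (root + interval) % 12
        if extracted is None or pc in extracted:
            total += pc_power[pc]
    return total / len(intervals)


MAJOR = (0, 4, 7)   # root, 4 semitones up, 7 semitones up

avg = [0.0] * 70
avg[24], avg[36] = 8.0, 4.0   # C3, C4 -> C averages 3.0 over four octaves
avg[28] = 4.0                 # E3     -> E averages 1.0
avg[43] = 8.0                 # G4     -> G averages 2.0
pc_power = note_name_power(avg)
print(likelihood(pc_power, root=0, intervals=MAJOR))
```

Dividing by the number of constituent notes keeps chords with three, four, or more notes comparable, so a larger chord does not win simply by summing more terms.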

If a constituent note derived from the tentative root note is not among the up to seven extracted notes, the power of that note may be treated as zero.

In this way, the likelihood is calculated for all chord candidates. The calculation is performed for every chord type shown in FIG. 5. For example, if four note names were extracted, there are four tentative root notes, so the likelihood is calculated for the 15 chord types from Maj to 13 multiplied by the four tentative root notes, that is, for 60 chord candidates.

Once the likelihood has been obtained for each chord candidate, the candidate with the greatest likelihood is fixed as the detected chord. The chord name is completed by combining the tentative root and the chord type of the fixed candidate. By convention, a major chord is written without a chord type, using the root note alone, for example “C”.
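The enumeration over tentative roots and chord types, and the naming of the winner, can be sketched as below. FIG. 5 is not reproduced in this text, so `CHORD_TYPES` is an illustrative subset built from standard interval patterns, not the patent's table of 15 types; the suffix labels are likewise assumptions.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Illustrative subset only; intervals are semitones above the root.
CHORD_TYPES = {
    "Maj": (0, 4, 7),
    "m": (0, 3, 7),
    "7": (0, 4, 7, 10),
    "M7": (0, 4, 7, 11),
}


def likelihood(pc_power, root, intervals):
    return sum(pc_power[(root + iv) % 12] for iv in intervals) / len(intervals)


def best_chord(pc_power, roots, chord_types=CHORD_TYPES):
    """Score every (root, chord type) pair; return (name, score, count)."""
    candidates = [(likelihood(pc_power, root, ivs), root, ctype)
                  for root in roots
                  for ctype, ivs in chord_types.items()]
    score, root, ctype = max(candidates, key=lambda c: c[0])
    suffix = "" if ctype == "Maj" else ctype  # major is written as root only
    return NOTE_NAMES[root] + suffix, score, len(candidates)


pc_power = [0.0] * 12
pc_power[0], pc_power[4], pc_power[7], pc_power[11] = 6.0, 3.0, 3.0, 2.0
print(best_chord(pc_power, roots=[0, 4, 7, 11]))
```

With four extracted note names and four chord types the sketch scores 16 candidates; with the 15 chord types of FIG. 5 the same loop would score the 60 candidates mentioned above.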

When the chord name has been fixed for one chord detection section as described above, the chord for the next chord detection section (measure) is detected in the same way. Once chord names have been detected over the specified chord detection range, each chord name is placed on a chord track provided for each chord detection section and displayed on the display device 5. The result may also be displayed as a lead sheet showing the detected chord names. The program may further be configured to play the detected chords using built-in rhythm patterns that can be preset in the chord detection program.

FIG. 1 is a functional block diagram showing the main part of the chord detection program. In FIG. 1, the acoustic signal input unit 12 is means for inputting an acoustic signal from a music CD, a waveform file, or the like into the acoustic signal storage unit 13. The chord detection range setting unit 14 accepts the chord detection range specified by the user and sets that range on the acoustic signal stored in the acoustic signal storage unit 13. The chord detection section setting unit 15 fixes the bar lines from the time signature and beats entered by the user and sets the chord detection sections. As described above, the method of setting the chord detection sections is not limited to this.

The power calculation unit 16 calculates the power spectrum of the scale notes of the acoustic signal by FFT. The calculated power spectrum is stored in the scale note power storage unit 17. The bass note detection unit 18 detects, as the bass note, the note with the greatest power in the acoustic signal for the portion corresponding to the first beat of the chord detection section.

The chord candidate detection unit 19 has a function of obtaining the average power of each scale note in the chord detection section and extracting a predetermined number of note names in descending order of per-note-name power, and a function of extracting, in accordance with the intervals in the interval table 19a, chord candidates in which each extracted note name is assumed to be the root note. As explained with reference to FIG. 5, the interval table 19a lists a predetermined number of chord types and the intervals of their constituent notes from the root.

The likelihood calculation unit 20 has a function of dividing, for each chord candidate, the sum of the average powers over the chord detection section of all constituent notes, including the root note, by the number of chord constituent notes. The chord name determination unit 21 determines the chord candidate with the greatest likelihood calculated by the likelihood calculation unit 20 as the detected chord name and outputs it to the chord name display unit 21. The chord name display unit 21 causes the display device 5 to display the chord names placed on the chord track.

Next, detection options that can improve chord detection accuracy are described. As a first detection option, a bass sound detection function can be added that uses the harmonic structure of the bass sound, selected by specifying the genre of the song. In the bass sound detection without options described above, the note with the largest average level in the bass sound detection range is taken as the bass sound. With this method, however, the sound of a bass drum or the like may be erroneously detected as the bass sound. The risk is particularly high in rock songs, where the bass drum is loud. In general, though, a bass drum sound has no harmonic structure, which distinguishes it from a bass sound, whose harmonic structure is characteristic of its timbre. This embodiment therefore provides an option that prevents sounds without harmonic structure, such as a bass drum, from being erroneously detected as the bass sound.

As described above, the bass sound is detected based on the scale-note power Lavgi(fs, fe), calculated by Equation 1, in the portion corresponding to the first beat of the chord detection section. This power Lavgi(fs, fe) is corrected as follows. The average power Lavgi(fs, fe) of the i-th scale note is written Lavg(i), with i as a variable. For the i-th scale note, the powers Lavg(i) to Lavg(i+36) of not only the fundamental but also the second, third, ..., eighth harmonics are summed to give the average power of the i-th scale note. In this summation, the fundamental and each harmonic are multiplied by their own coefficients.

That is, the average power Lavg(i) of the i-th scale note is calculated by the following Equation 2.

Lavg(i) = {(K1·Lavg(i)) + (K2·Lavg(i+12)) + (K3·Lavg(i+19)) + (K4·Lavg(i+24)) + (K5·Lavg(i+28)) + (K6·Lavg(i+31)) + (K7·Lavg(i+34)) + (K8·Lavg(i+36))} ÷ (K1 + K2 + K3 + K4 + K5 + K6 + K7 + K8) ... Equation 2

The coefficients (genre coefficients) K1 to K8 are values that can be preset according to the genre of the song. As Equation 2 shows, the powers of the scale notes from the fundamental (the pitch of the i-th scale note itself) up to the eighth harmonic are each multiplied by the corresponding genre coefficient and summed, and the sum is then divided by the total of the genre coefficients K1 to K8 to obtain the power.
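Equation 2 can be sketched in a few lines. The semitone offsets 0, 12, 19, 24, 28, 31, 34 and 36 come from the equation itself; the coefficient values in `GENRE_K` are placeholders chosen for illustration (the actual values belong to FIG. 6 and are not reproduced here), and the names `corrected_power` and `lavg` are assumptions. Harmonics that fall outside the analysed range contribute zero, as the description states.

```python
# Semitone offsets of the fundamental and the 2nd..8th harmonics (from Equation 2).
HARMONIC_OFFSETS = (0, 12, 19, 24, 28, 31, 34, 36)

# Placeholder genre coefficients K1..K8; NOT the values of FIG. 6.
GENRE_K = {
    "jazz": (20, 1, 1, 1, 1, 1, 1, 1),  # fundamental strongly emphasised
    "rock": (1, 1, 1, 1, 1, 1, 1, 1),   # harmonics count as much as the fundamental
}

def corrected_power(lavg, i, k):
    """Weighted mean of the i-th scale note and its harmonics (Equation 2)."""
    total = 0.0
    for coeff, offset in zip(k, HARMONIC_OFFSETS):
        j = i + offset
        # A harmonic beyond the analysed range is summed as zero power.
        total += coeff * (lavg[j] if j < len(lavg) else 0.0)
    return total / sum(k)

lavg = [0.0] * 48
lavg[5] = 8.0                          # drum-like: fundamental only
lavg[7] = lavg[19] = lavg[26] = 4.0    # bass-like: fundamental plus two harmonics
print(corrected_power(lavg, 5, GENRE_K["rock"]))  # 1.0
print(corrected_power(lavg, 7, GENRE_K["rock"]))  # 1.5
```

With the rock placeholders, the weaker but harmonically rich note outranks the louder fundamental-only sound, which is the behaviour the option is meant to produce.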

For example, the coefficients K1 to K8 can be set as follows. FIG. 6 shows an example of the genre coefficients K1 to K8. In this example, for jazz, the fundamental coefficient K1 is made extremely large to emphasize the power of the fundamental, and the coefficients for the harmonics other than the fundamental are made small. For rock, on the other hand, the fundamental coefficient K1 is made far smaller than for jazz, while the coefficients K2 to K8 for the second and higher harmonics are set to the same values as for jazz.

When the user designates jazz as the genre, the bass sound is detected with emphasis on the fundamental. Jazz should be designated for songs in which the bass drum is not played very strongly and for songs without drums. Jazz songs often use an upright bass (double bass), whose sound decays quickly and whose harmonics are not strongly detected, so coefficients that emphasize the fundamental are appropriate in this respect as well.

Rock is selected as the genre for rock songs in which the bass drum is played strongly. Because the harmonics are weighted about equally to the fundamental, or at about one third of its weight, a sound that consists only of a fundamental, such as a bass drum, yields a relatively small power compared with sounds that have harmonics, and is consequently excluded.

To implement this detection option, the chord detection program displays a prompt asking the user to specify the genre before bass sound detection starts in step S60. That is, a genre designation screen on which the user can enter an instruction is displayed on the display device 5. For example, the genres (jazz, rock) may be displayed as on-screen switches, so that clicking one of them designates the genre.

The bass sound detection unit 18 is provided with a coefficient table that holds the coefficients K1 to K8 for each genre. After calculating the average power Lavgi(fs, fe) of the i-th scale note from frame fs to frame fe by Equation 1, the bass sound detection unit 18 reads the coefficients K1 to K8 for the designated genre from the coefficient table and applies the correction of Equation 2.

The values of the coefficients K1 to K8 are not limited to those given in FIG. 6, and the number of harmonics included in the sum can also be changed. When a harmonic falls outside the range of scale notes obtained by the FFT computation, its power is taken as "0" in the calculation of Equation 2.

Of the scale notes whose average power has been corrected in this way, the one with the largest power within the bass sound detection range is determined as the bass sound. It is preferable, however, to exclude in advance from the bass sound candidates any scale note whose calculated average power is at or below a predetermined threshold. Consequently, if the corrected average power is below the threshold for every note in the bass sound detection range, no bass sound is detected.
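The selection rule with its threshold can be sketched as follows. The function name `detect_bass`, the half-open index range `[lo, hi)` and the sample values are assumptions made for illustration; the input is taken to be the corrected powers of Equation 2.

```python
def detect_bass(corrected, lo, hi, threshold):
    """Index of the strongest scale note in [lo, hi), or None if none qualifies.

    Notes whose corrected power is at or below the threshold are dropped
    before the maximum is taken.
    """
    candidates = [
        (power, index)
        for index, power in enumerate(corrected[lo:hi], start=lo)
        if power > threshold
    ]
    if not candidates:
        return None  # no bass sound detected in this section
    return max(candidates, key=lambda c: c[0])[1]

print(detect_bass([0.1, 0.5, 0.9, 0.3], 0, 4, threshold=0.2))  # 2
print(detect_bass([0.1, 0.1], 0, 2, threshold=0.2))            # None
```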

In the second detection option, the chord candidates can be corrected according to a tension level. As described above, the chord candidate with the largest likelihood is determined as the chord name of the chord detection section. In the second detection option, the chord candidates are narrowed down by tension level at the stage where they are detected using the interval table of FIG. 5. This allows the candidates to be narrowed down according to the characteristics of the song, such as the user's favorite songs or their genre. For example, several tension levels, such as triads, four-note chords, and chords of five or more notes, can be designated, and chord candidates are detected for the designated level. There are, for example, four tension levels: 1 to 3 and N. FIG. 7 shows an example of the tension levels and the chords selected at each level. At tension level "0", only triads are extracted, or mainly triads. At tension level "1", only four-note chords are extracted, or mainly four-note chords. At tension level "2", only chords of five or more notes are extracted, or mainly such chords. At tension level "N", no narrowing is performed and the detected chords are output as chord candidates as they are.

The chord candidate detection unit 19 selects the chord types of the designated tension level from the interval table of FIG. 5 and detects chord candidates whose constituent notes correspond to the intervals of those chord types. To implement the second detection option, the chord detection program displays a prompt asking the user to specify the tension level before chord candidate detection starts in step S61. That is, a display on which the user can specify the tension level is shown on the display device 5. For example, the tension levels (1 to 3, N) are displayed on the display device 5 as on-screen switches, so that clicking one of them designates the tension level. On receiving the designation, the chord candidate detection unit 19 outputs chord candidates according to the tension level.
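Narrowing by tension level amounts to filtering the interval table by the number of constituent notes. The sketch below assumes the level labels "0", "1", "2" and "N" used in the description of FIG. 7 and the strict variant ("only" triads, and so on); the table `CHORD_TYPES` is a hypothetical excerpt and `allowed_types` is an illustrative name.

```python
# Hypothetical excerpt of an interval table: chord type -> semitones from the root.
CHORD_TYPES = {
    "maj": (0, 4, 7),
    "m": (0, 3, 7),
    "7": (0, 4, 7, 10),
    "m7": (0, 3, 7, 10),
    "9": (0, 4, 7, 10, 14),
}

# Note-count rule per tension level (strict variant).
NOTE_COUNT_RULE = {
    "0": lambda n: n == 3,  # triads
    "1": lambda n: n == 4,  # four-note chords
    "2": lambda n: n >= 5,  # five or more notes
}

def allowed_types(level):
    """Chord types to try at the given tension level; "N" disables narrowing."""
    if level == "N":
        return sorted(CHORD_TYPES)
    rule = NOTE_COUNT_RULE[level]
    return sorted(t for t, ivs in CHORD_TYPES.items() if rule(len(ivs)))

print(allowed_types("1"))  # ['7', 'm7']
```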

In addition, the likelihood calculation can be corrected according to the designated tension level. That is, when the average powers of the constituent notes are summed in calculating the likelihood using the table of FIG. 5, the average power of each constituent note is first multiplied by a coefficient that depends on its interval from the root note.

FIG. 8 shows the power correction coefficients for each tension level. In FIG. 8, at tension level "0", the coefficients applied to the power of the notes from the 7th (seventh) to the 13th (thirteenth) are made small. At tension level "1", only the power of the seventh is multiplied by a coefficient greater than "1", and the power of the notes from the ninth to the thirteenth is reduced. At tension level "2", the power of every note from the seventh to the thirteenth is multiplied by a coefficient greater than "1". At tension level "N", the power of the notes from the seventh to the thirteenth is multiplied by the coefficient "1", so the calculated power is left uncorrected.

In this way, the power of the tensions corresponding to the designated tension level becomes relatively large, so the desired chords are more likely to be detected, yet detection is not restricted to chords of the designated tension level.
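The tension-dependent correction can be sketched as a weighted mean. The coefficient values below (0.5 and 1.2) are placeholders that only respect the "smaller than 1" and "greater than 1" pattern described for FIG. 8; the actual values are not given in the text. The names `DEGREE`, `TENSION_COEFFS` and `weighted_likelihood` are assumptions.

```python
# Tension degree of an interval in semitones above the root (illustrative subset).
DEGREE = {10: "7th", 11: "7th", 14: "9th", 17: "11th", 21: "13th"}

# Placeholder power correction coefficients per tension level; NOT the FIG. 8 values.
TENSION_COEFFS = {
    "0": {"7th": 0.5, "9th": 0.5, "11th": 0.5, "13th": 0.5},
    "1": {"7th": 1.2, "9th": 0.5, "11th": 0.5, "13th": 0.5},
    "2": {"7th": 1.2, "9th": 1.2, "11th": 1.2, "13th": 1.2},
    "N": {"7th": 1.0, "9th": 1.0, "11th": 1.0, "13th": 1.0},
}

def weighted_likelihood(pc_power, root, intervals, level):
    """Mean constituent power, with tension notes scaled by the level's coefficient."""
    coeffs = TENSION_COEFFS[level]
    total = 0.0
    for iv in intervals:
        weight = coeffs.get(DEGREE.get(iv), 1.0)  # root, third, fifth stay at 1.0
        total += weight * pc_power[(root + iv) % 12]
    return total / len(intervals)

flat = [1.0] * 12
c7 = (0, 4, 7, 10)
print(weighted_likelihood(flat, 0, c7, "0"))  # 0.875
```

Because the coefficients only scale the tension notes instead of removing chord types, a chord outside the designated level can still win when its notes are strong enough.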

The chord name may be determined from the strongest notes in each chord detection section, but chord progressions can additionally be used to raise detection accuracy. A chord progression is a sequence of connected chords that represents the flow of the chords. For example, in the key of C major, the progression "Dm7 → G7 → C" is known as the most common chord progression. Since some chord progressions are frequently used and others are uncommon, detecting chords in accordance with frequently used progressions can be expected to improve detection accuracy.

When chord progressions are used to raise detection accuracy, chord candidates are first detected and their likelihoods calculated for every chord detection section in the chord detection range. The narrowing and likelihood correction by tension level described above are performed before this processing. Once chord candidates have been detected for all chord detection sections, the likelihoods of the chord candidates of each chord detection section are corrected, in order from the second chord detection section toward the end of the song, using the chord names of the chord candidates of the immediately preceding chord detection section.

A chord progression database is prepared for this correction. The chord candidates of the immediately preceding chord detection section and those of the current chord detection section are compared against the chord progressions in the database. If their connection matches a progression in the database, the likelihood of the current chord candidate, calculated using FIG. 5, is raised. If the connection is a prohibited chord progression, the likelihood is corrected downward.

FIG. 9 shows an example of chord progressions and the coefficients by which the likelihood is multiplied for each progression. In FIG. 9, a large coefficient, "1.3", is set for common chord progressions, and smaller coefficients, such as "1.2" and "1.1", are set for less common ones. For chord progressions that are not normally used, a coefficient of "0.9", which is smaller than "1.0", is set.

In FIG. 9, the Roman numerals in the chord names denote the interval from the tonic of the key of the song, or of the part of the song in which the chord occurs. The likelihood can also be corrected for a specific key only, for example by having the user input the key. In general, however, the tonality of a song often changes from part to part within a single song. Coincidence with a chord progression may therefore be judged, regardless of key, solely from the interval between the roots of the chord candidate of the preceding chord detection section and the chord candidate of the section under consideration, together with their chord types.
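The key-independent variant can be sketched by indexing the progression table on the root movement in semitones plus the two chord types. The factors 1.3 and 0.9 are the values mentioned for FIG. 9, but which progression carries which factor in the table below is a hypothetical choice, as are the names `PROGRESSIONS`, `progression_factor` and `correct_section`. The sketch also assumes a single chord is known for the preceding section.

```python
# (root movement in semitones, previous type, current type) -> likelihood factor.
# Entries are illustrative; the actual database is the one of FIG. 9.
PROGRESSIONS = {
    (5, "m7", "7"): 1.3,     # e.g. Dm7 -> G7
    (5, "7", "maj"): 1.3,    # e.g. G7 -> C
    (1, "maj", "maj"): 0.9,  # hypothetical example of a rarely used progression
}

def progression_factor(prev, cur):
    """Factor for moving from chord prev to chord cur, each a (root, type) pair."""
    (prev_root, prev_type), (cur_root, cur_type) = prev, cur
    step = (cur_root - prev_root) % 12
    return PROGRESSIONS.get((step, prev_type, cur_type), 1.0)

def correct_section(prev_chord, candidates):
    """Scale each candidate's likelihood by its progression factor."""
    return {c: lk * progression_factor(prev_chord, c) for c, lk in candidates.items()}

# D = 2, G = 7: after Dm7, G7 overtakes a G major triad that scored higher on power alone.
scores = correct_section((2, "m7"), {(7, "7"): 1.0, (7, "maj"): 1.2})
print(max(scores, key=scores.get))  # (7, '7')
```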

Note that in FIG. 9 the major (Maj) chords written as triads (1, 4, etc.) may include M7 chords as well as major triads. Likewise, the minor (m) chords (2m, 3m, etc.) may include m7 chords as well as minor triads.

The chord progressions shown in FIG. 9 are for major keys, but a chord progression database for minor keys can be created in the same way.

FIG. 10 is a flowchart of the main part of the chord detection processing including the selection of detection options. In step S70, the user is prompted to input the genre as a detection option. For example, the message "Please input the genre of the song" is displayed on the display device 5 together with selection buttons for designating jazz or rock.

In step S71, the user is asked to specify the tension level as a detection option. For example, the message "Please input the tension level" is displayed on the display device 5 together with selection buttons for designating one of the tension levels "1" to "3" and "N".

In step S72, it is determined whether the genre selected by the user is jazz or rock, and the corresponding genre coefficients K1 to K8 are read from FIG. 6 according to the result. In step S73, the average power of each scale note in the first beat of the chord detection section is calculated using Equation 1. In step S74, the average power of each scale note in the first beat of the chord detection section is corrected using the genre coefficients K1 to K8 read in step S73. In step S75, the note whose corrected average power in the first beat of the chord detection section is largest is extracted as the bass sound.

In step S76, the tension level designated by the user is read. In step S77, chord candidates are extracted, restricted to chords with the number of constituent notes that corresponds to the tension level. This step is omitted when candidates are not to be restricted in this way. In step S78, the power correction coefficients for the tension level are read. In step S79, the likelihood of each chord candidate is calculated using FIG. 5, with the power correction coefficients applied.

In step S80, the chord candidate with the largest likelihood is taken as the detected chord and its chord name is determined.

The likelihood correction by chord progression is performed after chord candidates have been extracted and their likelihoods calculated for all chord detection sections. That is, after steps S73 to S79 have been executed for every measure, it is determined for each measure whether the connection between the chord candidates detected in that measure and those of the immediately preceding measure forms a predetermined chord progression. The likelihoods are then corrected according to whether the progression is common or uncommon, and finally the candidate with the largest corrected likelihood is taken as the detected chord and its chord name is determined.

In the above embodiment, the chord candidate with the largest likelihood is taken as the chord name of the chord detection section. The invention is not limited to this; for example, several chord candidates may be displayed on the display device 5 in descending order of likelihood so that the user can choose among them. In that case, MIDI information based on a chord candidate may be sent to the sound system 10 to sound the chord, so that the user can select one of the candidates as the detected chord after listening to it.

The screen that displays the chord detection results may also be provided with a playback switch that plays the acoustic signal (original waveform) of the song under analysis and, at the same time, the detected chords as MIDI. When the user operates the playback switch, the song and the chords based on the detected chord names are played together, so the user can check by ear whether the detected chords are correct. An incorrect detected chord sounds wrong when played along with the song, which lets the user find detection errors. To make this check easier, the volume of the song and the volume of the chords based on the detected chord names should be individually adjustable.

When an error in a detected chord is found, the user is allowed to correct the chord name. For example, the second chord candidate can be selected in place of the first. Alternatively, the chord names pasted on the chord track are displayed on the display screen 5, and the user can select a displayed name and correct it. It is preferable to display candidate chord names for the correction on the screen so that the user can choose the corrected chord from among them. On-screen data correction can be implemented by well-known techniques.

Furthermore, when the user corrects a detected chord name and the chord progression formed by the corrected chord and the chord of the immediately preceding chord detection section is not contained in the predetermined chord progression database (see FIG. 9), that progression, including the corrected chord, is registered in the database. This makes the progression easier to detect from then on. In other words, the chord progressions often used in the genres the user prefers accumulate in the database, so the detection option based on chord progressions becomes more effective as corrections add more progressions.

Furthermore, when a chord name is corrected by the user and a chord progression is newly registered, chord detection should be performed again for the chord detection sections that follow the corrected one. The user's correction changes the chord progression formed by the chord of the corrected section and the chord of the section immediately after it, so chord detection that takes the changed progression into account is carried out automatically for the subsequent sections as well. The time needed to check and correct all the chord names can thus be reduced.

As described above, according to this embodiment, when chords are detected from an acoustic signal, the user specifies the genre and tension level of the song, so that the chords of the user's preferred songs can be detected quickly and accurately. In addition, appropriate chords can be found by referring to chord progressions that are commonly used in the genre of the song. Moreover, correcting the determined chords and adding the corrections to the database further raises chord detection accuracy.

In this embodiment, the user inputs the time signature and beat positions, and the measure is used as the unit of the chord detection section. The invention is not limited to this, however, and can also be applied to chord detection programs such as those of Patent Documents 1 and 2, which detect the time signature and beat positions automatically from the acoustic signal and set several chord detection sections within one measure. The essential point is that the power of the scale notes is calculated from the acoustic signal, the bass sound is detected from the result, and, when chord candidates are detected and their likelihoods calculated, the candidates can be narrowed down by genre, by tension level, or additionally by a chord progression database.

In this embodiment, the power of the scale notes is obtained by calculating a power spectrum by FFT computation, but it may instead be obtained by passing the acoustic signal through band-pass filters corresponding to the individual scale notes.

FIG. 1 is a block diagram showing the main functions of a chord detection program that does not include the detection options.
FIG. 2 is a diagram showing the hardware configuration of a personal computer used to execute the chord detection program according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the main part of the chord detection program according to an embodiment of the present invention.
FIG. 4 is a detailed flowchart of the chord detection processing shown in FIG. 3.
FIG. 5 is a diagram showing an example of a table of chord types and the intervals of the constituent notes of each chord type from the root note.
FIG. 6 is a diagram showing an example of the coefficients used for each harmonic according to genre.
FIG. 7 is a diagram showing an example of the tension levels and the chords selected at each tension level.
FIG. 8 is a diagram showing the power correction coefficients for each tension level.
FIG. 9 is a diagram showing an example of chord progressions and the likelihood coefficient applied for each chord progression.
FIG. 10 is a flowchart of the main part of the chord detection processing including the selection of detection options.

Explanation of symbols

1: personal computer, 5: display device, 9: keyboard, 11: CD-ROM drive, 12: acoustic signal input unit, 13: acoustic signal storage unit, 15: chord detection section setting unit, 16: power calculation unit, 18: bass sound detection unit, 19: chord candidate detection unit, 20: likelihood calculation unit, 21: chord name determination unit

Claims

A chord detection program which, when read and executed by a computer, causes the computer to function as a chord detection device for acoustic signals, the program comprising the steps of:
reading an acoustic signal of a song from an acoustic signal source;
detecting the power of each scale note of the acoustic signal;
setting a plurality of chord detection sections in the acoustic signal;
detecting, from said power, the power of each scale note in a predetermined bass sound detection range in the portion corresponding to the first beat of each chord detection section;
determining the scale note having the maximum power as the bass sound;
extracting a predetermined number of pitch names in descending order of power from the scale notes of the entire chord detection section, and detecting chord candidates each having one of the predetermined number of pitch names as its root note;
calculating a likelihood for each chord candidate from the magnitude of the average power of all chord constituent notes;
determining the chord candidate having the maximum likelihood as the chord; and
accepting input of the genre of the song that is the chord detection target,
wherein the step of determining the bass sound further includes a step of taking, as the bass sound, the note for which the power of the fundamental and harmonics of each scale note is maximum, and a step of weighting the power of the fundamental and of the harmonics individually according to the genre.

A chord detection program which, when read and executed by a computer, causes the computer to function as a chord detection device for acoustic signals, the program comprising the steps of:
reading an acoustic signal of a song from an acoustic signal source;
detecting the power of each scale note of the acoustic signal;
setting a plurality of chord detection sections in the acoustic signal;
detecting, from said power, the power of each scale note in a predetermined bass sound detection range in the portion corresponding to the first beat of each chord detection section;
determining the scale note having the maximum power as the bass sound;
extracting a predetermined number of pitch names in descending order of power from the scale notes of the entire chord detection section, and detecting chord candidates each having one of the predetermined number of pitch names as its root note;
calculating a likelihood for each chord candidate from the magnitude of the average power of all chord constituent notes;
determining the chord candidate having the maximum likelihood as the chord; and
accepting input of a tension level,
wherein the step of detecting the chord candidates further includes a step of narrowing the chord candidates down to chords having the number of constituent notes that corresponds to the tension level.

The chord detection program according to claim 2, wherein the step of calculating the likelihood further includes a step of correcting the power of each constituent note of a detected chord candidate according to the tension level, based on that note's interval from the root.

A chord detection device comprising bass note detection means for determining, based on the power of each scale note of the acoustic signal of a song, the bass note as the scale note having the maximum power within a predetermined bass note detection range in the portion corresponding to the first beat of each chord detection section;
Chord candidate detection means for extracting a predetermined number of pitch names in descending order of power from the scale notes of the entire chord detection section, and detecting chord candidates each having one of the extracted pitch names as its root;
Likelihood calculation means for calculating, for each chord candidate, a likelihood from the average power of all of its chord constituent notes, the device being configured to determine the chord candidate having the maximum likelihood as the chord; and
Genre input means for designating the genre of the song whose acoustic signal is the chord detection target,
Wherein the bass note detection means is configured to take, as the bass note, the note whose fundamental and harmonic power is greatest among the scale notes, and to weight the power of the fundamental and the power of the harmonics separately according to the genre.

A chord detection device comprising bass note detection means for determining, based on the power of each scale note of the acoustic signal of a song, the bass note as the scale note having the maximum power within a predetermined bass note detection range in the portion corresponding to the first beat of each chord detection section;
Chord candidate detection means for extracting a predetermined number of pitch names in descending order of power from the scale notes of the entire chord detection section, and detecting chord candidates each having one of the extracted pitch names as its root;
Likelihood calculation means for calculating, for each chord candidate, a likelihood from the average power of all of its chord constituent notes, the device being configured to determine the chord candidate having the maximum likelihood as the chord; and
Tension level input means for designating a tension level,
Wherein the chord candidate detection means is further configured to narrow the chord candidates down to chords whose number of constituent notes corresponds to the tension level.

The chord detection device according to claim 5, wherein the likelihood calculation means is configured to calculate the likelihood after correcting the power of each constituent note of a detected chord candidate according to the tension level, based on that note's interval from the root.
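
For illustration only, and not as part of the claims, the genre-weighted bass note determination and the tension-level narrowing of chord candidates recited above can be sketched in Python as follows. The chord table, the genre weights, the mapping from tension level to number of constituent notes, and all function and variable names are hypothetical values chosen for this example; the claims do not specify them. The power correction by interval from the root is omitted for brevity.

```python
# Hypothetical chord table: chord type -> semitone offsets from the root.
CHORD_TYPES = {
    "maj": (0, 4, 7),
    "min": (0, 3, 7),
    "7": (0, 4, 7, 10),
    "m7": (0, 3, 7, 10),
    "9": (0, 4, 7, 10, 14),
}

# Hypothetical mapping: tension level -> allowed numbers of constituent notes.
ALLOWED_SIZES = {0: {3}, 1: {3, 4}, 2: {3, 4, 5}}

# Hypothetical genre weights: (fundamental weight, harmonic weight).
GENRE_WEIGHTS = {"rock": (1.0, 0.5), "jazz": (1.0, 0.2)}


def detect_bass(fund_power, harm_power, genre):
    """Pick the bass note from per-note powers measured on the first beat.

    fund_power and harm_power map a note number in the bass detection range
    to the power of its fundamental and of its harmonics, respectively.
    """
    w_fund, w_harm = GENRE_WEIGHTS[genre]
    return max(
        fund_power,
        key=lambda n: w_fund * fund_power[n] + w_harm * harm_power.get(n, 0.0),
    )


def detect_chord(pc_power, tension_level, n_roots=3):
    """Return (likelihood, root pitch class, chord type) of the best candidate.

    pc_power is a list of 12 powers, one per pitch name (C = 0), taken over
    the entire chord detection section.
    """
    # Extract the n_roots strongest pitch names as root candidates.
    roots = sorted(range(12), key=lambda pc: pc_power[pc], reverse=True)[:n_roots]
    best = None
    for root in roots:
        for name, offsets in CHORD_TYPES.items():
            # Narrow candidates by number of constituent notes.
            if len(offsets) not in ALLOWED_SIZES[tension_level]:
                continue
            # Likelihood: average power of all chord constituent notes.
            likelihood = sum(pc_power[(root + o) % 12] for o in offsets) / len(offsets)
            if best is None or likelihood > best[0]:
                best = (likelihood, root, name)
    return best
```

In this sketch, changing the genre changes only the weights applied to the fundamental and harmonic powers before the maximum is taken, and changing the tension level changes only which chord types are admitted as candidates.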