JP2604409B2

JP2604409B2 - Automatic music transcription method and device

Info

Publication number: JP2604409B2
Application number: JP63046124A
Authority: JP
Inventors: 七郎鶴田; 洋典高島; 正樹藤本; 正典水野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1997-04-30
Anticipated expiration: 2012-04-30
Also published as: JPH01219633A

Description

[Detailed Description of the Invention] [Industrial Field of Application] The present invention relates to an automatic music transcription method and apparatus for creating musical score data from acoustic signals such as singing voices, humming voices, and instrumental sounds, and in particular to a key determination process that determines the key of the acoustic signal.

[Prior Art] An automatic transcription system that converts an acoustic signal such as a singing voice, a humming voice, or an instrumental sound into musical score data has to detect, from the acoustic signal, the basic information of a musical score: note length, pitch, key, time signature, and tempo.

An acoustic signal, however, is merely a signal that continuously contains repetitions of a basic waveform, and the items of information listed above cannot be obtained from it directly.

Conventional automatic transcription systems therefore obtain this information in the following order. First, repetition information of the basic waveform, which represents the pitch of the acoustic signal (hereinafter called pitch information), and power information are extracted in each analysis period. Next, the acoustic signal is divided, based on the extracted pitch information and/or power information, into sections (segments) that can each be regarded as having a single pitch (this process is called segmentation). Then, from the pitch information of each segment, a pitch on the absolute pitch axis is identified as the pitch of the acoustic signal in that segment, the key of the acoustic signal is determined from the pitch information, and finally the time signature and tempo of the acoustic signal are determined based on the segments.

[Problems to Be Solved by the Invention] When one attempts to identify a segment of an acoustic signal with a pitch on the absolute pitch axis, the pitch of the acoustic signal, particularly one produced by the human voice, is not stable, and it fluctuates considerably even when a single note is intended. This has made the pitch identification process very difficult.

Musical score data is recorded information intended to ensure that an acoustic signal can be reproduced. From the standpoint of reproducibility, pitch, together with note length, is a basic element of musical score data and must be identified accurately; if it cannot be, the accuracy of the musical score data is degraded.

The key of an acoustic signal is not only an element of the musical score data; it also bears a certain relationship to pitch and provides an important clue for determining pitch. Determining the key and then reviewing the identified pitches is therefore desirable for improving the accuracy of the identified pitches, and a good determination of the key of the acoustic signal is called for.

The present invention has been made in view of the above points, and aims to provide an automatic music transcription method and apparatus that can accurately determine the key of an acoustic signal and further improve the accuracy of the final musical score data.

[Means for Solving the Problems] To solve these problems, the first aspect of the invention is an automatic music transcription method for converting an acoustic signal into musical score data that includes at least: a process of extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents its pitch, and power information of the acoustic signal; a segmentation process of dividing the acoustic signal, based on the pitch information and/or the power information, into sections that can each be regarded as having a single pitch; a pitch identification process of determining, for each such section, the pitch of the acoustic signal on the absolute pitch axis; and a key determination process of determining the key of the acoustic signal based on the pitch information. In this method, the key determination process consists of: a process of distributing the pitch information among the pitches on the absolute pitch axis and totaling it to extract the occurrence frequency of each scale note in the acoustic signal; a process of calculating, for every key, the sum of products of the weighting coefficients predetermined for each scale note in that key and the extracted occurrence frequencies; and a process of determining the key with the largest sum of products as the key of the acoustic signal.

The second aspect of the invention is an automatic music transcription apparatus for converting an acoustic signal into musical score data that includes, as part of its configuration: pitch/power extraction means for extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents its pitch, and power information of the acoustic signal; segmentation means for dividing the acoustic signal, based on the pitch information and/or the power information, into sections that can each be regarded as having a single pitch; pitch identification means for determining, for each such section, the pitch of the acoustic signal on the absolute pitch axis; and key determination means for determining the key of the acoustic signal based on the pitch information. In this apparatus, the key determination means consists of: a scale frequency extraction unit that distributes the pitch information among the pitches on the absolute pitch axis and totals it to extract the occurrence frequency of each scale note in the acoustic signal; a sum-of-products calculation unit that calculates, for every key, the sum of products of the weighting coefficients predetermined for each scale note in that key and the extracted occurrence frequencies; and a key determination unit that determines the key with the largest sum of products as the key of the acoustic signal.

[Operation] The first aspect of the invention exploits the fact that the important scale notes differ from key to key. Weighting coefficients for each scale note are prepared in advance for every key, the occurrence frequency of each scale note in the input acoustic signal is extracted from the pitch information, the sum of products of the extracted frequencies and the weighting coefficients of the corresponding scale notes is calculated for each key, and the key with the largest sum of products is determined as the key of the acoustic signal.

The second aspect of the invention rests on the same observation. Weighting coefficients for each scale note are prepared in advance for every key, the scale frequency extraction unit extracts the occurrence frequency of each scale note in the input acoustic signal from the pitch information, the sum-of-products calculation unit calculates the sum of products of the extracted frequencies and the weighting coefficients for each key, and the key determination unit determines the key with the largest sum of products as the key of the acoustic signal.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

Automatic transcription system

First, the automatic transcription system to which the present invention is applied is described.

In Fig. 3, a central processing unit (CPU) 1 controls the apparatus as a whole and executes the transcription program shown in Fig. 4, which is stored in a main storage device 3 connected via a bus 2. In addition to the CPU 1 and the main storage device 3, the bus 2 is connected to a keyboard 4 as an input device, a display device 5 as an output device, an auxiliary storage device 6 used as working memory, and an analog/digital converter 7.

An acoustic signal input device 8, for example a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures acoustic signals such as singing or humming by the user or musical tones produced by an instrument, converts them into an electric signal, and outputs that electric signal to the analog/digital converter 7.

When processing is commanded from the keyboard input device 4, the CPU 1 starts the transcription process. It executes the program stored in the main storage device 3, temporarily stores in the auxiliary storage device 6 the acoustic signal that has been converted into a digital signal by the analog/digital converter 7, then converts the acoustic signal into musical score data by executing the program, and outputs the result to the display device 5 as required.

Next, the transcription process that the CPU 1 executes after the acoustic signal has been captured is described in detail with reference to the functional-level flowchart of Fig. 4.

First, the CPU 1 performs autocorrelation analysis on the acoustic signal to extract its pitch information in each analysis period, computes a sum of squares to extract power information in each analysis period, and then performs post-processing such as noise removal and smoothing (steps SP1, SP2). Next, the CPU 1 calculates, from the distribution of the pitch information, the amount by which the pitch axis of the acoustic signal deviates from the absolute pitch axis, and performs a tuning process that shifts the obtained pitch information by that amount (step SP3). In other words, the pitch information is corrected so that the difference between the pitch axis of the singer or instrument that produced the acoustic signal and the absolute pitch axis becomes small.
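As an illustration of steps SP1 and SP2, the following is a minimal Python sketch of per-frame pitch and power extraction. The source states only that autocorrelation analysis and a sum of squares are used; the function names, the unnormalized autocorrelation, and the lag search range are assumptions made for this example, and the noise removal and smoothing post-processing is omitted.

```python
import math

def frame_power(frame):
    """Power of one analysis frame as the sum of squared samples."""
    return sum(x * x for x in frame)

def autocorr_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) within [min_lag, max_lag] that maximizes
    the unnormalized autocorrelation of the frame. This lag approximates
    the repetition period of the waveform (the pitch information)."""
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag

# Hypothetical usage: a 200 Hz sine sampled at 8000 Hz repeats every 40 samples.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
period = autocorr_pitch_period(frame, min_lag=20, max_lag=100)
frequency_hz = sample_rate / period
```

The lag range bounds the detectable fundamental frequency; choosing it too wide invites octave errors, which is one reason the post-processing mentioned in the text matters in practice.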

Next, the CPU 1 finds the continuous periods over which the obtained pitch information can be considered to indicate the same pitch and performs segmentation that divides the acoustic signal into one segment per note; it also performs segmentation based on changes in the obtained power information (steps SP4, SP5). Based on the segment information obtained from both, the CPU 1 calculates a reference length corresponding to the duration of a quarter note, an eighth note, or the like, and performs segmentation again based on this reference length (step SP6).
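The pitch-based segmentation of step SP4 can be pictured with the short Python sketch below. The source does not say how "the same pitch" is judged; the running-mean comparison, the 50-cent tolerance, and the function name are all assumptions chosen for illustration, and the power-based segmentation (SP5) and reference-length pass (SP6) are not covered.

```python
def segment_by_pitch(frame_cents, tolerance=50.0):
    """Split a sequence of per-frame pitch values (in cents) into runs that
    stay within `tolerance` cents of the running mean of the current run.
    Returns a list of (start, end) frame indices, with end exclusive."""
    if not frame_cents:
        return []
    segments = []
    start = 0
    total = frame_cents[0]
    for i in range(1, len(frame_cents)):
        mean = total / (i - start)
        if abs(frame_cents[i] - mean) > tolerance:
            # The pitch has moved away from the current note: close the run.
            segments.append((start, i))
            start = i
            total = 0.0
        total += frame_cents[i]
    segments.append((start, len(frame_cents)))
    return segments
```

For example, frames at roughly 600 cents followed by frames at roughly 800 cents are split into two segments at the point where the pitch jumps.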

Based on the pitch information of each segment obtained in this way, the CPU 1 identifies the pitch of the segment with the pitch on the absolute pitch axis that its pitch information is judged to be closest to, and then performs segmentation again based on whether the identified pitches of consecutive segments are the same (steps SP7, SP8).
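A rough Python sketch of steps SP7 and SP8 follows. It assumes that pitch values are expressed in cents with the pitches of the absolute pitch axis at multiples of 100, that a segment is represented by the mean of its frame values, and that the re-segmentation merges neighboring segments with the same identified pitch; none of these details is given in the source.

```python
import math

def identify_segment_pitch(segment_cents):
    """Return the index of the semitone (100-cent step) on the absolute
    pitch axis that is nearest to the mean pitch of the segment."""
    mean_cents = sum(segment_cents) / len(segment_cents)
    return int(math.floor(mean_cents / 100.0 + 0.5))

def merge_equal_neighbors(segments):
    """Merge consecutive (start, end, semitone) segments that share the
    same identified semitone into a single segment."""
    merged = []
    for start, end, semitone in segments:
        if merged and merged[-1][2] == semitone:
            previous_start = merged[-1][0]
            merged[-1] = (previous_start, end, semitone)
        else:
            merged.append((start, end, semitone))
    return merged
```

Rounding is done with `floor(x + 0.5)` rather than the built-in `round` so that values exactly halfway between two semitones always resolve upward.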

Thereafter, based on the distribution of the pitch information, the CPU 1 determines the key of the piece in the input acoustic signal, for example C major or A minor, and, for predetermined pitches on the scale of the determined key, reviews those pitches in more detail based on the pitch information to confirm and correct them (steps SP9, SP10). Next, the CPU 1 reviews the segmentation based on whether any consecutive segments have the same finally determined pitch and whether there is a change in power between consecutive segments, and performs the final segmentation (step SP11).

Once the pitches and segments have been determined in this way, the CPU 1 extracts measures on the basis of considerations such as that a piece begins on the first beat, that the last note of a phrase does not extend into the next measure, and that there is a break at each measure. It then determines the time signature from this measure information and the segmentation information, and determines the tempo from the determined time signature and the measure length (steps SP12, SP13).

Finally, the CPU 1 organizes the determined pitch, note length, key, time signature, and tempo information and creates the musical score data (step SP14).

Key determination process for the acoustic signal

Next, the process of determining the key of the acoustic signal in this automatic transcription system (step SP9) is described in detail with reference to the flowchart of Fig. 1.

The CPU 1 creates a scale histogram from all the pitch information that has been tuned by the tuning process described above (step SP20). Here, the scale histogram is a histogram over the twelve scale notes on the absolute pitch axis: C (do), C-sharp/D-flat (do#/re b), D (re), ..., A (la), A-sharp/B-flat (la#/si b), and B (si). When a pitch value does not lie exactly on the absolute pitch axis, it is apportioned between the two nearest scale notes on the absolute pitch axis according to its distance from each, and then totaled. Consequently, pitches that differ by an octave are counted as the same scale note.
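The scale histogram of step SP20 can be sketched in Python as follows. The sketch assumes that the tuned pitch values are given in cents measured from some C on the absolute pitch axis, so that scale notes fall at multiples of 100 cents, and that the apportioning is linear in the distance to the two neighboring notes; the function name is hypothetical.

```python
import math

def build_scale_histogram(pitches_cents):
    """Accumulate per-frame pitch values (cents above a reference C) into a
    12-bin histogram indexed by pitch class (0 = C, 1 = C#/Db, ..., 11 = B).
    A value between two scale notes is shared between them in proportion
    to its closeness to each, and octaves are folded together."""
    histogram = [0.0] * 12
    for cents in pitches_cents:
        position = (cents / 100.0) % 12.0       # fold octaves onto 0..12
        lower = int(math.floor(position))
        upper_share = position - lower           # 0.0 means exactly on `lower`
        histogram[lower % 12] += 1.0 - upper_share
        histogram[(lower + 1) % 12] += upper_share
    return histogram
```

For instance, values of 0 and 1200 cents both count fully toward C, while 250 cents contributes one half to D and one half to D-sharp/E-flat.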

Next, for all 24 keys, namely the twelve major keys C major, D-flat major, D major, ..., B-flat major, and B major and the twelve minor keys A minor, B-flat minor, B minor, ..., G minor, and A-flat minor, the CPU 1 calculates the sum of products of the weighting coefficients for each scale note determined by that key, as shown in Fig. 2, and the totals of the scale histogram described above (step SP21).

Fig. 2 shows, as examples, the weighting coefficients for C major in the first column COL1, for A minor in the second column COL2, for D-flat major in the third column COL3, and for B-flat minor in the fourth column COL4. The other keys are handled in the same way: major keys use the weighting coefficients "202021020201" starting from the tonic (do), and minor keys use the weighting coefficients "202201022010" starting from the tonic (la).
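The two coefficient strings can be expanded into a weight vector for any of the 24 keys by rotating them to the tonic. The Python sketch below assumes pitch classes are numbered from C = 0 in semitone steps and that each digit of the string is the weight of one semitone step above the tonic; the function name is hypothetical.

```python
# Weights per semitone step above the tonic, taken from the coefficient strings.
MAJOR_PATTERN = [int(c) for c in "202021020201"]
MINOR_PATTERN = [int(c) for c in "202201022010"]

def key_weights(tonic, mode):
    """Return 12 weights indexed by pitch class (0 = C ... 11 = B) for the
    key whose tonic is pitch class `tonic` and whose mode is 'major' or
    'minor', obtained by rotating the pattern so it starts at the tonic."""
    pattern = MAJOR_PATTERN if mode == "major" else MINOR_PATTERN
    weights = [0] * 12
    for step, weight in enumerate(pattern):
        weights[(tonic + step) % 12] = weight
    return weights

c_major = key_weights(0, "major")   # [2, 0, 2, 0, 2, 1, 0, 2, 0, 2, 0, 1]
a_minor = key_weights(9, "minor")   # [2, 0, 1, 0, 2, 2, 0, 1, 0, 2, 0, 2]
```

C major and A minor share the same seven notes, but their weight vectors differ (for example, D has weight 2 in C major and weight 1 in A minor), so the two keys can receive different sums of products.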

Here, the weighting coefficients were determined so that, in each key, the scale notes that can be written without accidentals (#, b) receive a non-zero weight. In addition, combining the seven-note and five-note scales of the major and minor keys, that is, with the tonics of the major and minor keys aligned, scale notes whose interval from the tonic coincides receive a weight of "2" and those whose interval does not coincide receive a weight of "1". These weighting coefficients correspond to the importance of each scale note in the key.

Having obtained the sums of products for all 24 keys in this way, the CPU 1 determines the key with the largest sum of products as the key of the acoustic signal and ends the key determination process (step SP22).
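Steps SP21 and SP22 amount to scoring each of the 24 keys and taking the maximum. The following Python sketch assumes a 12-bin scale histogram indexed by pitch class with C = 0 and uses the coefficient strings quoted in the description; the function names, the tuple used to represent a key, and the tie-breaking order (the first maximum wins) are assumptions.

```python
MAJOR_PATTERN = [int(c) for c in "202021020201"]
MINOR_PATTERN = [int(c) for c in "202201022010"]

def key_score(histogram, tonic, pattern):
    """Sum of products of the key's weights and the histogram totals."""
    return sum(pattern[step] * histogram[(tonic + step) % 12]
               for step in range(12))

def determine_key(histogram):
    """Return (tonic, mode) for the key with the largest sum of products,
    where tonic is a pitch class (0 = C) and mode is 'major' or 'minor'."""
    candidates = [(tonic, mode, pattern)
                  for mode, pattern in (("major", MAJOR_PATTERN),
                                        ("minor", MINOR_PATTERN))
                  for tonic in range(12)]
    tonic, mode, _ = max(candidates,
                         key=lambda c: key_score(histogram, c[0], c[2]))
    return tonic, mode

# Hypothetical histogram of a melody using only the notes C D E F G A B.
histogram = [4, 0, 2, 0, 3, 1, 0, 3, 0, 1, 0, 1]
best_key = determine_key(histogram)   # (0, 'major'), i.e. C major
```

With this histogram, C major scores 28 and its relative minor, A minor, scores 25, so the different weights given to the same seven notes are what separate the two.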

According to the embodiment described above, a scale histogram is created to capture how often each pitch occurs as a scale note, the sum of products of these frequencies and the weighting coefficients, which serve as importance parameters of the scale notes for each key, is calculated, and the key with the largest sum of products is determined as the key of the acoustic signal. The key can therefore be determined accurately, the identified pitches can be reviewed in light of this key, and the accuracy of the musical score data can be further improved.

Other embodiments

The pitch information used in the key determination process may be expressed in hertz (Hz), the unit of frequency, or in cents, a unit commonly used in the field of music.
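The two units are related by the standard definition of the cent, under which one octave (a frequency ratio of 2) equals 1200 cents. The Python sketch below shows the conversion; the choice of 440 Hz as the reference frequency and the function names are assumptions made for illustration, since the source does not specify a reference.

```python
import math

def hz_to_cents(freq_hz, ref_hz=440.0):
    """Convert a frequency in Hz to cents relative to `ref_hz`."""
    return 1200.0 * math.log2(freq_hz / ref_hz)

def cents_to_hz(cents, ref_hz=440.0):
    """Convert a pitch in cents relative to `ref_hz` back to Hz."""
    return ref_hz * 2.0 ** (cents / 1200.0)
```

On the cent scale, adjacent semitones of the equal-tempered absolute pitch axis are exactly 100 cents apart, which makes the apportioning used for the scale histogram a simple linear operation.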

The weighting coefficients are not limited to those of the embodiment described above; for example, an even larger weight may be given to the tonic.

Furthermore, in the embodiment described above, the CPU 1 executes all of the processing shown in Fig. 4 in accordance with the program stored in the main storage device 3, but some or all of that processing may instead be carried out by hardware. For example, as shown in Fig. 5, in which parts corresponding to those in Fig. 3 carry the same reference numerals, the acoustic signal from the acoustic signal input device 8 may be amplified by an amplifier circuit 10, passed through a pre-filter 11, and supplied to an analog/digital converter 12 to be converted into a digital signal; a signal processor 13 may then perform autocorrelation analysis on this digital acoustic signal to extract pitch information and compute a sum of squares to extract power information, and supply the results to the software processing system of the CPU 1. As the signal processor 13 used in such a hardware configuration (10 to 13), a processor that can process audio-band signals in real time and that provides interface signals for the host CPU 1 (for example, the μPD7720 manufactured by NEC Corporation) can be used.

[Effects of the Invention] As described above, according to the present invention, the sum of products of the occurrence frequency of each pitch of the acoustic signal as a scale note and the weighting coefficients, prepared in advance for each key, that indicate the importance of each scale note is calculated, and the key with the largest sum of products is taken as the key of the acoustic signal. The key can therefore be determined well, and the accuracy of the final musical score data can be further improved.

[Brief description of the drawings]

Fig. 1 is a flowchart showing the key determination process according to an embodiment of the present invention; Fig. 2 is a table showing examples of the weighting coefficients of each scale note defined for each key; Fig. 3 is a block diagram showing the configuration of an automatic transcription system to which the present invention is applied; Fig. 4 is a flowchart showing its automatic transcription procedure; and Fig. 5 is a block diagram showing another configuration of the automatic transcription system. 1: CPU; 3: main storage device; 6: auxiliary storage device; 7: analog/digital converter; 8: acoustic signal input device.

Continued from the front page: (72) Inventor Masaki Fujimoto, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo. (72) Inventor Masanori Mizuno, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo. Examiner: Shigeo Arai.

Claims

(57) [Claims]

1. An automatic music transcription method for converting an acoustic signal into musical score data, the method including at least: a process of extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents its pitch, and power information of the acoustic signal; a segmentation process of dividing the acoustic signal, based on the pitch information and/or the power information, into sections that can each be regarded as having a single pitch; a pitch identification process of determining, for each such section, the pitch of the acoustic signal on the absolute pitch axis; and a key determination process of determining the key of the acoustic signal based on the pitch information; characterized in that the key determination process comprises: a process of distributing the pitch information among the pitches on the absolute pitch axis and totaling it to extract the occurrence frequency of each scale note in the acoustic signal; a process of calculating, for every key, the sum of products of the weighting coefficients predetermined for each scale note in that key and the extracted occurrence frequencies; and a process of determining the key having the largest of the calculated sums of products as the key of the acoustic signal.

2. An automatic music transcription apparatus for converting an acoustic signal into musical score data, the apparatus including, as part of its configuration: pitch/power extraction means for extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents its pitch, and power information of the acoustic signal; segmentation means for dividing the acoustic signal, based on the pitch information and/or the power information, into sections that can each be regarded as having a single pitch; pitch identification means for determining, for each such section, the pitch of the acoustic signal on the absolute pitch axis; and key determination means for determining the key of the acoustic signal based on the pitch information; characterized in that the key determination means comprises: a scale frequency extraction unit that distributes the pitch information among the pitches on the absolute pitch axis and totals it to extract the occurrence frequency of each scale note in the acoustic signal; a sum-of-products calculation unit that calculates, for every key, the sum of products of the weighting coefficients predetermined for each scale note in that key and the extracted occurrence frequencies; and a key determination unit that determines the key having the largest of the calculated sums of products as the key of the acoustic signal.