JP2604401B2

JP2604401B2 - Automatic music transcription method and device

Info

Publication number: JP2604401B2
Application number: JP63046112A
Authority: JP
Inventors: 七郎鶴田; 洋典高島; 正樹藤本; 正典水野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1997-04-30
Anticipated expiration: 2012-04-30
Also published as: JPH01219622A

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to an automatic music transcription method and apparatus that create musical score data from acoustic signals such as singing voices, humming voices, and instrument sounds. In particular, it relates to a tuning process that aligns the pitch axis of the acoustic signal with the absolute pitch axis.

[Prior Art]

Acoustic signals such as singing voices, humming voices, and instrument sounds consist of repetitions of a basic waveform. An automatic music transcription apparatus that converts such an acoustic signal into musical score data first extracts, for each analysis period, the repetition frequency of the basic waveform in the acoustic signal. This frequency is hereinafter called the pitch frequency, the corresponding period is called the pitch period, and the concept covering both is called the pitch. Based on the pitch frequency, the apparatus then determines the sections (segments) that can be regarded as having the same pitch, the pitch of each segment, and so on.

[Problems to Be Solved by the Invention]

The pitch of an acoustic signal, particularly of a singing or humming voice produced by a person, deviates from absolute pitch. As a result, an attempt to identify the pitch on the absolute pitch axis from the pitch information obtained from the acoustic signal may fail because of this deviation. The larger the deviation, the less accurate the pitch identified on the pitch axis, and the lower the accuracy of the musical score data finally produced.

The present invention has been made in view of the above points. Its object is to provide an automatic music transcription method and apparatus that detect the deviation between the pitch axis of the acoustic signal and the absolute pitch axis and correct the pitch information according to the amount of that deviation, so that musical score data can be created more accurately in the subsequent processing.

[Means for Solving the Problems]

To solve this problem, the first aspect of the present invention is an automatic music transcription method that uses analog/digital conversion means for converting an input acoustic signal into a digital signal, storage means storing a predetermined processing procedure, and control means for executing the processing procedure stored in the storage means. The method extracts pitch information and power information from the acoustic signal converted into a digital signal, divides the acoustic signal into sections that can be regarded as having the same pitch based on the extracted pitch information and/or power information, determines from the pitch information the pitch of the acoustic signal in each section along the absolute pitch axis, determines the key of the acoustic signal based on the pitch information, determines the time signature and tempo of the acoustic signal based on the section information, and converts the acoustic signal into musical score data. In this method, the extracted pitch information is tallied to detect the distribution of pitch information around the absolute pitch axis, the deviation between the pitch axis of the acoustic signal and the absolute pitch axis is detected based on the detected pitch distribution information, and the extracted pitch information is shifted and corrected according to the detected deviation so that its difference from the absolute pitch axis is minimized.

The second aspect of the present invention is an automatic music transcription apparatus that converts an acoustic signal into musical score data and comprises: analog/digital conversion means for converting an input acoustic signal into a digital signal; pitch/power extraction means for extracting pitch information and power information from the acoustic signal converted into a digital signal; segmentation means for dividing the acoustic signal into sections that can be regarded as having the same pitch based on the extracted pitch information and/or power information; pitch identification means for determining from the pitch information the pitch of the acoustic signal in each section along the absolute pitch axis; key determination means for determining the key of the acoustic signal based on the pitch information; and time signature/tempo determination means for determining the time signature and tempo of the acoustic signal based on the section information. The apparatus further comprises: pitch distribution detection means for tallying the extracted pitch information and detecting the distribution of pitch information around the absolute pitch axis; pitch deviation detection means for detecting the deviation between the pitch axis of the acoustic signal and the absolute pitch axis based on the detected pitch distribution information; and pitch information correction means for shifting and correcting the extracted pitch information according to the detected deviation so that its difference from the absolute pitch axis is minimized.

[Operation]

In the first aspect of the present invention, the difference between the pitch axis of the acoustic signal and the absolute pitch axis is corrected before proceeding to processing such as section determination and pitch determination. To this end, the distribution around the absolute pitch axis of the pitch information obtained from the acoustic signal is first detected, which puts the pitch axis information contained in the acoustic signal into a form that is easy to capture. This distribution is then processed statistically to detect the deviation of the pitch of the acoustic signal from the absolute pitch axis. Finally, the extracted pitch information is shifted and corrected according to the detected deviation so that its difference from the absolute pitch axis is minimized. This processing improves the accuracy of the musical score data finally obtained.

In the second aspect of the present invention, the pitch distribution detection means tallies the pitch information extracted by the pitch/power extraction means and detects the distribution of pitch information around the absolute pitch axis. The pitch deviation detection means detects the deviation between the pitch axis of the acoustic signal and the absolute pitch axis based on the detected pitch distribution information. The pitch information correction means then shifts and corrects the extracted pitch information according to the detected deviation so that its difference from the absolute pitch axis is minimized. The pitch axis of the acoustic signal is thereby aligned with the absolute pitch axis, which improves the accuracy of the musical score data finally obtained.

[Embodiment]

An embodiment of the present invention is described in detail below with reference to the drawings.

Automatic transcription system

First, the automatic transcription system to which the present invention is applied is described.

In FIG. 3, the central processing unit (CPU) 1 controls the entire apparatus and executes the transcription processing program shown in FIG. 4, which is stored in the main storage device 3 connected via the bus 2. In addition to the CPU 1 and the main storage device 3, the bus 2 connects a keyboard 4 as an input device, a display device 5 as an output device, an auxiliary storage device 6 used as working memory, and an analog/digital converter 7.

An acoustic signal input device 8, for example a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures acoustic signals such as singing or humming uttered by the user or musical tones produced by an instrument, converts them into an electric signal, and outputs that electric signal to the analog/digital converter 7.

When processing is commanded from the keyboard input device 4, the CPU 1 starts the transcription processing. It executes the program stored in the main storage device 3, temporarily stores the acoustic signal converted into a digital signal by the analog/digital converter 7 in the auxiliary storage device 6, and then converts the acoustic signal into musical score data by executing the program and outputs the result to the display device 5 as required.

Next, the transcription processing that the CPU 1 executes after capturing the acoustic signal is described in detail with reference to the functional-level flowchart of FIG. 4.

First, the CPU 1 performs autocorrelation analysis on the acoustic signal to extract its pitch information for each analysis period, performs sum-of-squares processing to extract power information, and then carries out post-processing such as noise removal and smoothing (steps SP1, SP2). Next, for the pitch information, the CPU 1 calculates the amount by which the acoustic signal deviates from the absolute pitch axis and executes a tuning process that shifts the obtained pitch information according to that amount (step SP3). That is, the pitch information is corrected so that the difference between the absolute pitch axis and the pitch axis of the singer or instrument that produced the acoustic signal becomes small.
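As a rough illustration of steps SP1 and SP2, the sketch below estimates the pitch frequency of one analysis frame from the peak of its autocorrelation and computes the frame power as a sum of squares. The function names, the search range `fmin`/`fmax`, and the frame length are assumptions made for this example; the patent does not specify them, and the noise removal and smoothing steps are omitted.

```python
import math


def frame_power(frame):
    """Power of one analysis frame as the sum of squared samples."""
    return sum(x * x for x in frame)


def autocorr_pitch_hz(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the pitch frequency from the lag with the largest autocorrelation.

    fmin and fmax bound the search range; they are assumed values,
    not taken from the patent.
    """
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # Return 0.0 when no positive correlation peak was found (e.g. silence).
    return sample_rate / best_lag if best_lag else 0.0


# Example: a 50 ms frame of a 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
print(autocorr_pitch_hz(frame, sample_rate), frame_power(frame))
```

A practical extractor would usually normalize the autocorrelation and interpolate around the peak; this version keeps only the core idea.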

Next, the CPU 1 finds the continuous periods in which the obtained pitch information can be regarded as indicating the same pitch and divides the acoustic signal into segments of one note each (hereinafter called segmentation). It also performs segmentation based on changes in the obtained power information (steps SP4, SP5). Based on both sets of segment information, the CPU 1 calculates a reference length corresponding to the duration of a quarter note, an eighth note, or the like, and performs finer segmentation based on this reference length (step SP6).
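The pitch-based segmentation of step SP4 can be pictured with the following minimal sketch, which starts a new segment whenever consecutive frames jump by more than a threshold. The criterion, the function name `segment_by_pitch`, and the 50-cent threshold are assumptions for illustration only; the patent does not describe its segmentation rule at this level of detail.

```python
def segment_by_pitch(cents, max_jump=50.0):
    """Split a frame-wise pitch track (in cents) into runs of roughly constant pitch.

    Returns (start, end) index pairs with `end` exclusive. A new segment
    starts whenever consecutive frames differ by more than `max_jump` cents
    (an assumed threshold, not taken from the patent).
    """
    if not cents:
        return []
    segments = []
    start = 0
    for n in range(1, len(cents)):
        if abs(cents[n] - cents[n - 1]) > max_jump:
            segments.append((start, n))
            start = n
    segments.append((start, len(cents)))
    return segments


print(segment_by_pitch([100, 102, 99, 300, 301, 298, 500]))
```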

Based on the pitch information of each segment obtained in this way, the CPU 1 identifies the pitch of the segment as the pitch on the absolute pitch axis judged closest to that pitch information. It then performs segmentation again according to whether the identified pitches of consecutive segments are the same (steps SP7, SP8).
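One simple way to realize the pitch identification of step SP7 is to round a representative pitch of the segment to the nearest semitone. The sketch below assumes that pitch is already expressed in cents on the absolute axis and that the segment mean is the representative value; both the use of the mean and the name `identify_pitch` are assumptions, since the patent only says that the closest pitch on the absolute pitch axis is chosen.

```python
def identify_pitch(segment_cents):
    """Return the semitone (a multiple of 100 cents) closest to the segment's mean pitch."""
    mean = sum(segment_cents) / len(segment_cents)
    return 100 * round(mean / 100)


print(identify_pitch([195, 205, 210]))
```

Because rounding flips at the 50-cent boundary, a systematic offset in the singer's pitch axis can push segments onto the wrong semitone, which is the situation the tuning process is meant to avoid.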

The CPU 1 then computes, for each key, the sum of products of the occurrence frequencies of the pitches, obtained by tallying the pitch information, and predetermined weighting coefficients that depend on the key. Based on the key that gives the maximum sum of products, it determines the key of the input piece, for example C major or A minor. For certain pitches of the scale in the determined key, it re-examines the pitch against the pitch information to confirm or correct it (steps SP9, SP10). Next, the CPU 1 reviews the segmentation according to whether consecutive segments have the same finally determined pitch and whether the power changes between consecutive segments, and performs the final segmentation (step SP11).
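The key determination of step SP9 can be sketched as a sum-of-products search over candidate keys. The weighting coefficients below (2 for the tonic, 1 for other scale tones, 0 otherwise), the restriction to major and natural minor keys, and all names are hypothetical choices for this example; the patent states only that the weights depend on the key and does not give their values.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR = [0, 2, 4, 5, 7, 9, 11]   # scale steps in semitones from the tonic
MINOR = [0, 2, 3, 5, 7, 8, 10]   # natural minor


def key_weights(tonic, steps):
    """Hypothetical weights: 2 for the tonic, 1 for other scale tones, 0 elsewhere."""
    weights = [0] * 12
    for step in steps:
        weights[(tonic + step) % 12] = 1
    weights[tonic] = 2
    return weights


def determine_key(counts):
    """Pick the key whose weights give the largest sum of products with `counts`.

    `counts` holds the occurrence frequency of each of the 12 pitch classes,
    starting at C.
    """
    best_score, best_key = None, None
    for tonic in range(12):
        for mode, steps in (("major", MAJOR), ("minor", MINOR)):
            weights = key_weights(tonic, steps)
            score = sum(c * w for c, w in zip(counts, weights))
            if best_score is None or score > best_score:
                best_score, best_key = score, f"{NAMES[tonic]} {mode}"
    return best_key


print(determine_key([5, 0, 2, 0, 3, 1, 0, 4, 0, 1, 0, 1]))
```

The extra weight on the tonic is what lets this sketch separate relative keys such as C major and A minor, which share the same scale tones.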

Once the pitches and note lengths (segments) have been determined in this way, the CPU 1 extracts measures using assumptions such as the following: the piece starts on the first beat, the last note of a phrase does not extend into the next measure, and there is a break at each measure. It determines the time signature from the measure information and the segmentation information, and determines the tempo from the determined time signature and the measure length (steps SP12, SP13).

Finally, the CPU 1 organizes the determined pitch, note length, key, time signature, and tempo information and creates the musical score data (step SP14).

Tuning

Next, the tuning process (see step SP3) in the automatic transcription system that performs transcription through the above processing is described in detail with reference to the detailed flowchart of FIG. 1.

The CPU 1 first converts the input pitch information, expressed in Hz (the unit of frequency), into pitch data expressed in cents, a unit of musical scale. A cent value is the frequency ratio of a pitch to a reference pitch, expressed as a base-2 logarithm and multiplied by 1200 (step SP20). A difference of 100 cents corresponds to an interval of one semitone.
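The conversion of step SP20 follows directly from the definition: cents = 1200 · log2(f / f_ref). The sketch below uses 440 Hz as the reference pitch, which is an assumption for this example; the patent does not name the reference frequency.

```python
import math


def hz_to_cents(freq_hz, ref_hz=440.0):
    """Convert a frequency in Hz to cents relative to `ref_hz`.

    ref_hz = 440.0 is an assumed reference; the source does not specify one.
    """
    return 1200.0 * math.log2(freq_hz / ref_hz)


# One octave above the reference is 1200 cents; one semitone is 100 cents.
print(hz_to_cents(880.0))
print(hz_to_cents(440.0 * 2 ** (1 / 12)))
```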

The CPU 1 then tallies the pitch data by the last two digits of the cent value to create a histogram like the one shown in FIG. 2 (step SP21). That is, data with cent values of 0, 100, 200, ... are counted together, data with cent values of 1, 101, 201, ... are counted together, data with cent values of 2, 102, 202, ... are counted together, and so on up to the group with cent values of 99, 199, 299, .... This yields a histogram of the pitch information in 1-cent bins over a total width of 100 cents, as shown in FIG. 2.
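The tally of step SP21 amounts to folding every cent value onto a 100-cent ring. In the sketch below, the rounding of fractional cent values to whole cents and the name `fold_histogram` are assumptions; the modulo-100 grouping itself is what the text describes.

```python
def fold_histogram(cent_values):
    """Count pitch values by their cent value modulo 100 (bins 0..99)."""
    hist = [0] * 100
    for c in cent_values:
        # Round to a whole cent (an assumed quantization), then fold onto 0..99.
        # Python's % returns a non-negative result, so negative cents work too.
        hist[round(c) % 100] += 1
    return hist


hist = fold_histogram([0, 100, 200, 1, 101, 299, -1])
print(hist[0], hist[1], hist[99])
```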

Pitch values that differ by multiples of 100 cents, and are therefore tallied together, differ by an integer number of semitones. Because acoustic signals use the semitone and the whole tone as the basic units of pitch difference, the resulting histogram is not uniform. Instead, it has a frequency peak near the cent value that corresponds to the pitch axis of the singer who uttered the acoustic signal or of the instrument that produced it.

Next, the CPU 1 clears the parameters i and j to 0 and sets the parameter MIN to a sufficiently large value A (step SP22). Using the histogram, the CPU 1 then calculates the statistical variance VAR centered on i cents (step SP23). It determines whether the calculated variance VAR is larger than the parameter MIN. If VAR is smaller, it updates MIN to VAR and sets the parameter j to the value i before proceeding to step SP26; if VAR is larger, it proceeds directly to step SP26 without updating (steps SP24 to SP26). The CPU 1 then checks whether the parameter i equals 99 and, if not, increments i and returns to step SP23 (step SP27).
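The search loop of steps SP22 to SP27 can be sketched as follows. The patent does not spell out how the variance about i cents is computed on a histogram that wraps around at 100 cents; this example assumes the deviation of each bin from the center is measured on the ring and wrapped into the range -50 to +49 cents. The function names are likewise assumptions.

```python
def circular_variance(hist, center):
    """Variance of a 100-bin cent histogram about `center`, measured on the ring.

    Deviations are wrapped into -50..49 cents (an assumption of this sketch).
    """
    total = sum(hist)
    if total == 0:
        return 0.0
    acc = 0.0
    for k, count in enumerate(hist):
        d = ((k - center + 50) % 100) - 50
        acc += count * d * d
    return acc / total


def find_pitch_axis_offset(hist):
    """Return the cent value j (0..99) about which the variance is smallest."""
    min_var = float("inf")  # plays the role of the "sufficiently large value A"
    j = 0
    for i in range(100):
        var = circular_variance(hist, i)
        if var < min_var:  # update MIN and j only when VAR is smaller
            min_var, j = var, i
    return j


hist = [0] * 100
hist[29], hist[30], hist[31] = 2, 5, 2
print(find_pitch_axis_offset(hist))
```

Wrapping the deviation keeps a peak that straddles the 99/0 boundary from being split into two distant clusters.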

In this way, the cent value (j) about which the variance is smallest is obtained from the tally of the pitch information. Because the variance about this cent value is the smallest, the group of cent values at semitone intervals (j, 100 + j, 200 + j, ...) can be judged to be the values on which the acoustic signal is centered. In other words, it can be taken to represent the pitch axis of the singer or instrument.

The CPU 1 therefore aligns the pitch axis of the acoustic signal with the absolute pitch axis by shifting it according to this cent value. First, the CPU 1 determines whether the parameter j is less than 50 cents, that is, whether the pitch axis of the signal is closer to the absolute pitch axis on the higher side or on the lower side. If it is closer to the higher side, all pitch information is shifted toward the higher side according to the obtained cent value j; if it is closer to the lower side, all pitch information is shifted toward the lower side according to the obtained cent value j (steps SP28 to SP30).
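The correction of steps SP28 to SP30 can be sketched as a single shift applied to every pitch value. This example assumes that when j is below 50 the signal sits j cents above the nearest absolute semitone and is moved down by j, and that otherwise it is moved up by 100 - j to reach the next semitone. The upward amount of 100 - j is this sketch's interpretation of "closer to the higher side"; the function names are also assumptions.

```python
def tuning_shift(j):
    """Shift in cents that moves offset j (0..99) onto the nearest multiple of 100."""
    # j < 50: nearest absolute semitone is below, so shift down by j.
    # j >= 50: nearest absolute semitone is above, so shift up by 100 - j (assumed).
    return -j if j < 50 else 100 - j


def tune(cent_values, j):
    """Apply the same corrective shift to all pitch values."""
    shift = tuning_shift(j)
    return [c + shift for c in cent_values]


print(tune([130, 230], 30))
print(tune([180, 280], 80))
```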

In this way, the pitch axis of the acoustic signal is brought into near agreement with the absolute pitch axis, and the pitch information corrected in this way is used in the subsequent processing.

Thus, according to the embodiment described above, the extracted pitch information is not applied as-is to segmentation, pitch identification, and other processing. Instead, the pitch information is tallied onto a single axis at semitone intervals, the deviation from the absolute pitch axis is detected from the tally using the variance as a parameter, and the pitch axis of the acoustic signal is corrected by that deviation before use in the subsequent processing. More accurate musical score data can therefore be obtained regardless of the source of the acoustic signal.

Although the embodiment described above applies the tuning process to pitch information obtained by autocorrelation analysis, the method of extracting the pitch information is of course not limited to this.

Although the embodiment described above obtains the pitch axis of the acoustic signal using the variance, other statistical methods may be used for the detection.

Furthermore, although the pitch information processed statistically in the tuning process is expressed in cents in the embodiment described above, the unit system is of course not limited to this.

Moreover, although in the embodiment described above the CPU 1 executes all of the processing shown in FIG. 4 according to the program stored in the main storage device 3, part or all of the processing may be executed by hardware. For example, as shown in FIG. 5, in which parts corresponding to those in FIG. 3 carry the same reference numerals, the acoustic signal from the acoustic signal input device 8 may be amplified by an amplifier circuit 10 and passed through a pre-filter 11 to an analog/digital converter 12 for conversion into a digital signal. A signal processor 13 may then perform autocorrelation analysis on the digitized acoustic signal to extract pitch information, perform sum-of-squares processing to extract power information, and supply both to the software processing performed by the CPU 1. As the signal processor 13 used in such a hardware configuration (10 to 13), a processor that can process voice-band signals in real time and that provides interface signals to the host CPU 1 (for example, the μPD7720 manufactured by NEC Corporation) can be used.

[Effects of the Invention]

As described above, according to the present invention, the pitch information is corrected by the deviation between the pitch axis of the acoustic signal and the absolute pitch axis before being passed to the subsequent processing, so more accurate musical score data can be obtained regardless of the source of the acoustic signal.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the tuning process according to an embodiment of the present invention. FIG. 2 is a histogram showing the distribution of pitch information. FIG. 3 is a block diagram showing the configuration of the automatic transcription system to which the present invention is applied. FIG. 4 is a flowchart showing the processing procedure of that automatic transcription system. FIG. 5 is a block diagram showing another configuration of the automatic transcription system.

1: CPU; 3: main storage device; 6: auxiliary storage device; 7: analog/digital converter; 8: acoustic signal input device.

Continuation of front page

(72) Inventor: Masaki Fujimoto, c/o NEC Technical Information Systems Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo
(72) Inventor: Masanori Mizuno, c/o NEC Technical Information Systems Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo
Examiner: Shigeo Arai

Claims

(57) [Claims]

1. An automatic music transcription method using analog/digital conversion means for converting an input acoustic signal into a digital signal, storage means storing a predetermined processing procedure, and control means for executing the processing procedure stored in the storage means, the method extracting pitch information and power information from the acoustic signal converted into a digital signal, dividing the acoustic signal into sections that can be regarded as having the same pitch based on the extracted pitch information and/or power information, determining from the pitch information the pitch of the acoustic signal in each section along the absolute pitch axis, determining the key of the acoustic signal based on the pitch information, determining the time signature and tempo of the acoustic signal based on the section information, and converting the acoustic signal into musical score data, wherein the extracted pitch information is tallied to detect the distribution of pitch information around the absolute pitch axis, the deviation between the pitch axis of the acoustic signal and the absolute pitch axis is detected based on the detected pitch distribution information, and the extracted pitch information is shifted and corrected according to the detected deviation so that its difference from the absolute pitch axis is minimized.

2. An automatic music transcription apparatus for converting an acoustic signal into musical score data, comprising: analog/digital conversion means for converting an input acoustic signal into a digital signal; pitch/power extraction means for extracting pitch information and power information from the acoustic signal converted into a digital signal; segmentation means for dividing the acoustic signal into sections that can be regarded as having the same pitch based on the extracted pitch information and/or power information; pitch identification means for determining from the pitch information the pitch of the acoustic signal in each section along the absolute pitch axis; key determination means for determining the key of the acoustic signal based on the pitch information; and time signature/tempo determination means for determining the time signature and tempo of the acoustic signal based on the section information, the apparatus further comprising: pitch distribution detection means for tallying the extracted pitch information and detecting the distribution of pitch information around the absolute pitch axis; pitch deviation detection means for detecting the deviation between the pitch axis of the acoustic signal and the absolute pitch axis based on the detected pitch distribution information; and pitch information correction means for shifting and correcting the extracted pitch information according to the detected deviation so that its difference from the absolute pitch axis is minimized.