JP2604407B2

JP2604407B2 - Automatic music transcription method and device

Info

Publication number: JP2604407B2
Application number: JP4612288A
Authority: JP
Inventors: 七郎鶴田; 洋典高島; 正樹藤本; 正典水野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1997-04-30
Anticipated expiration: 2012-04-30
Also published as: JPH01219631A

Abstract

PURPOSE: To obtain accurate identified pitches by extracting a section that was identified with an erroneous pitch and correcting its identified pitch to that of whichever adjacent section, preceding or following, differs less in pitch. CONSTITUTION: When the length of the segment being processed is shorter than a threshold, a CPU 1 judges whether the change tendency of the segment's pitch data matches an overshoot or an undershoot. If the segment may be an overshoot or undershoot segment, the CPU 1 calculates its pitch difference from the preceding and following segments, selects the segment with the smaller difference, and judges whether the pitch difference from that segment is below a threshold. If it is, the CPU 1 judges whether the power change between the two segments is below a threshold. If it is, the CPU 1 corrects the pitch of the segment to that of the selected segment. In this way, accurate identified pitches are obtained.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to an automatic music transcription method and apparatus that create musical score data from acoustic signals such as singing voices, humming voices, and musical instrument sounds. More particularly, it relates to a correction process that corrects, as necessary, the pitch identified as the pitch of the acoustic signal in a given section.

[Prior Art] An automatic transcription system that converts acoustic signals such as singing voices, humming voices, and musical instrument sounds into musical score data must detect, from the acoustic signal, the note length, pitch, key, time signature, and tempo, which are the basic information of a musical score.

However, an acoustic signal is merely a signal made up of continuously repeated fundamental waveforms, so none of this information can be obtained from it directly.

Conventional automatic transcription systems therefore obtain this information in the following order. First, repetition information of the fundamental waveform, which represents the pitch of the acoustic signal (hereinafter called pitch information), and power information are extracted for each analysis period. Next, the acoustic signal is divided into sections (segments) that can each be regarded as having a single pitch, based on the extracted pitch information and/or power information (this process is called segmentation). Then the pitch of each segment on the absolute pitch axis is identified from the segment's pitch information, the key of the acoustic signal is determined from the distribution of the pitch information, and finally the time signature and tempo of the acoustic signal are determined from the segments.

[Problems to Be Solved by the Invention] When one tries to identify a segment of an acoustic signal as a pitch on the absolute pitch axis, the pitch of the signal, particularly of a signal uttered by a person, is not stable. Even when a single note is intended, the pitch fluctuates considerably. This has made the pitch identification process very difficult.

In particular, when moving from one note to the next, the performer often cannot move smoothly to the pitch of the next note, and the pitch wavers before and after the transition. As a result, the segmentation process often mistakes such a portion for a section of a different note, and the pitch identification process may then identify it as a different pitch.

The present invention has been made in view of the above. Its object is to provide an automatic transcription method and apparatus that take a segment identified as a pitch different from the one the singer or player intended, because of the pitch fluctuation that occurs at the transition to the next note, and correct its pitch from the pitch information of the preceding and following segments. This yields more accurate pitch information and further improves the accuracy of the final musical score data.

[Means for Solving the Problems] To solve these problems, the method of the first invention is an automatic transcription method that converts an acoustic signal into musical score data and includes at least: a process of extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents its pitch, and power information of the acoustic signal; a segmentation process of dividing the acoustic signal into sections that can each be regarded as having a single pitch, based on the pitch information and/or the power information; and a pitch identification process of identifying, for each divided section, a pitch on the absolute pitch axis as the pitch of that section based on the pitch information.
In this method, the following are provided as post-processing of the pitch identification process: a process of extracting, from the divided sections, sections shorter than a predetermined length; a process of further extracting, from the extracted sections, sections whose pitch information changes with a specific predetermined tendency; and a process of detecting the difference in identified pitch between a section extracted on the basis of the change tendency and the sections before and after it and, when at least one of the pitch differences is smaller than a predetermined value, correcting the identified pitch of that section to the identified pitch of the section with the smaller pitch difference.

In the method of the second invention, the process of correcting the identified pitch of a section extracted on the basis of the change tendency corrects it to the identified pitch of one of the preceding and following sections when the pitch difference from that section is small and the change in power information between the two sections is also small.

The apparatus of the third invention is an automatic transcription apparatus that converts an acoustic signal into musical score data and includes, as part of its configuration: pitch/power extracting means for extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents its pitch, and power information of the acoustic signal; segmentation means for dividing the acoustic signal into sections that can each be regarded as having a single pitch, based on the pitch information and/or the power information; and pitch identifying means for identifying, for each divided section, a pitch on the absolute pitch axis as the pitch of that section based on the pitch information. In this apparatus, the following are provided downstream of the pitch identifying means: short section extracting means for extracting, from the divided sections, sections shorter than a predetermined length; correction candidate section extracting means for further extracting, from the extracted sections, sections whose pitch information changes with a specific predetermined tendency; and identified pitch correcting means for detecting the difference in identified pitch between a section extracted on the basis of the change tendency and the sections before and after it and, when at least one of the pitch differences is smaller than a predetermined value, correcting the identified pitch of that section to the identified pitch of the section with the smaller pitch difference.

In the fourth invention, the identified pitch correcting means corrects the identified pitch to that of one of the preceding and following sections when the pitch difference from that section is small and the change in power information between the two sections is also small.

[Operation] The first to fourth inventions all aim to correct the identified pitch of a section that, although intended as part of the same note, was split off as a separate section and identified as a different pitch because of pitch fluctuation at a note transition.

The method of the first invention extracts sections identified with an erroneous pitch on the basis of the section length, the change tendency of the pitch information within the section, and the pitch difference from the preceding and following sections. It then corrects the identified pitch of each such section to that of whichever adjacent section has the smaller pitch difference.

The method of the second invention additionally takes into account the amount of change in power information relative to the preceding and following sections when extracting sections identified with an erroneous pitch, and corrects the identified pitch of each such section to that of whichever adjacent section has the smaller pitch difference.

In the apparatus of the third invention, the short section extracting means extracts sections of short length, and the correction candidate section extracting means further extracts sections whose pitch information shows the specific change tendency. For each extracted section, the identified pitch correcting means checks the pitch difference from the preceding and following sections and, if it is small, corrects the identified pitch of the section to that of whichever adjacent section has the smaller pitch difference.

In the apparatus of the fourth invention, the identified pitch correcting means checks, for each section extracted by the correction candidate section extracting means, both the pitch difference and the amount of change in power information relative to the preceding and following sections. If the pitch difference and the power difference are both small, it corrects the identified pitch of the section to that of whichever adjacent section has the smaller pitch difference.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

Automatic transcription system

First, the automatic transcription system to which the present invention is applied is described.

In FIG. 3, a central processing unit (CPU) 1 controls the entire apparatus and executes the transcription processing program shown in FIG. 4, which is stored in a main storage device 3 connected via a bus 2. In addition to the CPU 1 and the main storage device 3, the bus 2 connects a keyboard 4 as an input device, a display device 5 as an output device, an auxiliary storage device 6 used as working memory, and an analog/digital converter 7.

An acoustic signal input device 8, for example a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures acoustic signals such as singing or humming uttered by the user or musical tones produced by an instrument, converts them into an electric signal, and outputs that electric signal to the analog/digital converter 7.

When processing is commanded from the keyboard 4, the CPU 1 starts the transcription process. It executes the program stored in the main storage device 3, temporarily stores the acoustic signal digitized by the analog/digital converter 7 in the auxiliary storage device 6, then converts that acoustic signal into musical score data by executing the same program, and outputs the result to the display device 5 as required.

Next, the transcription process that the CPU 1 executes after the acoustic signal has been captured is described in detail with reference to the functional-level flowchart of FIG. 4.

First, the CPU 1 performs autocorrelation analysis on the acoustic signal to extract its pitch information for each analysis period, performs sum-of-squares processing to extract power information for each analysis period, and then carries out post-processing such as noise removal and smoothing (steps SP1, SP2). Next, from the distribution of the pitch information, the CPU 1 calculates how far the pitch axis of the acoustic signal deviates from the absolute pitch axis and performs a tuning process that shifts the pitch information by that amount (step SP3). In other words, the pitch information is corrected so that the difference between the absolute pitch axis and the pitch axis of the singer or instrument that produced the acoustic signal becomes smaller.
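
The extraction in steps SP1 and SP2 can be illustrated with a minimal Python sketch. The patent states only that autocorrelation analysis and sum-of-squares processing are used; the function names, the search range `fmin`/`fmax`, and the choice of the strongest autocorrelation lag as the period are assumptions made for illustration, and the noise removal and smoothing steps are omitted.

```python
import math

def frame_power(frame):
    """Power of one analysis frame as the sum of squared samples."""
    return sum(x * x for x in frame)

def frame_pitch_hz(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    The lag with the largest positive autocorrelation inside the assumed
    search range [fmin, fmax] is taken as the repetition period.
    Returns None when no lag has positive correlation (e.g. silence).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(frame) - 1, int(sample_rate / fmin))
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return None if best_lag is None else sample_rate / best_lag

# Example input: a 200 Hz sine sampled at 8 kHz, one 50 ms analysis frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
pitch = frame_pitch_hz(frame, sample_rate)
power = frame_power(frame)
```

A real system would call these once per analysis period and keep the two resulting sequences as the pitch information and power information used by the later steps.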

Next, the CPU 1 finds the continuous periods over which the pitch information can be considered to indicate the same pitch and performs segmentation that cuts the acoustic signal into one segment per note; it also performs segmentation based on changes in the power information (steps SP4, SP5). From the two sets of segment information thus obtained, the CPU 1 calculates a reference length corresponding to the duration of a quarter note, an eighth note, or the like, and performs segmentation again based on this reference length (step SP6).
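
As an illustration of the pitch-based segmentation of step SP4 only, the following sketch groups consecutive frames whose pitch stays close to the running mean of the current segment. The patent does not give the grouping criterion; the semitone-valued input, the function name, and the `tolerance` value are assumptions, and the power-based segmentation (SP5) and reference-length re-segmentation (SP6) are not shown.

```python
def segment_by_pitch(pitches, tolerance=0.5):
    """Split a frame-wise pitch track (in semitones) into (start, end) runs.

    A new segment starts when a frame deviates from the running mean of the
    current segment by more than `tolerance` semitones (an assumed rule).
    `end` is exclusive.
    """
    segments = []
    if not pitches:
        return segments
    start, total = 0, pitches[0]
    for i in range(1, len(pitches)):
        mean = total / (i - start)
        if abs(pitches[i] - mean) > tolerance:
            segments.append((start, i))   # close the current segment
            start, total = i, 0.0         # open a new one at frame i
        total += pitches[i]
    segments.append((start, len(pitches)))
    return segments
```

With such a rule, a short glide between two notes tends to come out as its own short segment, which is exactly the situation the correction process of this patent is meant to handle.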

Based on the pitch information of each segment obtained in this way, the CPU 1 identifies the pitch of the segment as the pitch on the absolute pitch axis that the pitch information is judged to be closest to, and then performs segmentation again according to whether the identified pitches of consecutive segments are the same (steps SP7, SP8).

Next, the CPU 1 tallies the pitch information of the segments around the pitch axis to obtain the frequency of occurrence of each pitch, computes the sum of products of these frequencies and predetermined weighting coefficients defined for each key, and determines the key of the piece in the input acoustic signal, for example C major or A minor, from the largest sum of products. For predetermined pitches on the scale of the determined key, it then re-examines the pitch information to confirm and correct the identified pitch (steps SP9, SP10). The CPU 1 then reviews the segmentation according to whether consecutive segments have the same finally determined pitch and whether the power changes between consecutive segments, and performs the final segmentation (step SP11).
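
The sum-of-products key decision of step SP9 can be sketched as follows. The patent does not disclose the weighting coefficients, so the weights here (1 for scale tones, 2 for the tonic, 0 otherwise), the use of MIDI-style note numbers, and the restriction to major and natural minor keys are all assumptions for illustration.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)
NATURAL_MINOR_SCALE = (0, 2, 3, 5, 7, 8, 10)

def pitch_class_histogram(note_numbers):
    """Count how often each of the 12 pitch classes occurs."""
    hist = [0] * 12
    for note in note_numbers:
        hist[note % 12] += 1
    return hist

def determine_key(note_numbers):
    """Return (tonic_pitch_class, mode, score) with the largest product sum."""
    hist = pitch_class_histogram(note_numbers)
    best = None
    for mode, scale in (("major", MAJOR_SCALE), ("minor", NATURAL_MINOR_SCALE)):
        for tonic in range(12):
            weights = [0.0] * 12
            for degree in scale:
                weights[(tonic + degree) % 12] = 1.0
            weights[tonic] = 2.0  # assumed extra weight on the tonic
            score = sum(h * w for h, w in zip(hist, weights))
            if best is None or score > best[0]:
                best = (score, tonic, mode)
    return best[1], best[2], best[0]
```

Because relative keys such as C major and A minor share the same scale tones, some asymmetry in the weights (here, the tonic weight) is needed to separate them; the actual coefficients used in the patent may differ.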

Once the pitches and segments have been determined in this way, the CPU 1 extracts measures on the assumptions that a piece starts on the first beat, that the last note of a phrase does not extend into the next measure, that there is a break at each measure, and so on. It determines the time signature from this measure information and the segmentation information, and determines the tempo from the determined time signature and the measure length (steps SP12, SP13).

Finally, the CPU 1 organizes the determined pitch, note length, key, time signature, and tempo information and creates the musical score data (step SP14).

Pitch identification process

Next, the pitch identification process of step SP7 in this automatic transcription system, and in particular the process of reviewing a pitch once it has been identified, is described in detail with reference to the flowchart of FIG. 1.

For each segment obtained by segmentation, the CPU 1 first obtains, for example, the average of the pitch information within the segment and sets the pitch of the segment to whichever pitch on the absolute pitch axis, where pitches are spaced a semitone apart, is closest to that average (step SP20).
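
Step SP20 can be sketched as averaging the segment's pitch values on a semitone scale and rounding to the nearest semitone. The conversion from frequency to a semitone scale with A4 = 440 Hz mapped to note number 69 is an assumption made for this example; the patent only says that the nearest pitch on the absolute pitch axis is chosen.

```python
import math

def hz_to_semitones(freq_hz, ref_hz=440.0, ref_note=69):
    """Map a frequency to a fractional note number (assumed A4 = 440 Hz = 69)."""
    return ref_note + 12.0 * math.log2(freq_hz / ref_hz)

def identify_pitch(segment_hz):
    """Identify a segment's pitch as the semitone nearest to its mean pitch."""
    values = [hz_to_semitones(f) for f in segment_hz]
    mean = sum(values) / len(values)
    return int(math.floor(mean + 0.5))
```

Averaging on the semitone (logarithmic) scale rather than in hertz is a design choice of this sketch; it keeps upward and downward deviations of equal musical size from biasing the result.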

The pitch identified in this way is then reviewed as follows. The review is applied to segments that are thought to have been split off as separate segments, and identified independently of their neighbors, because the pitch was unstable at a note transition.

The CPU 1 first confirms that processing of the last segment has not yet finished and then judges whether the length of the segment being processed is shorter than a threshold. If the length is equal to or greater than the threshold, the CPU 1 moves on to the next segment and returns to step SP20 described above (steps SP21, SP22).

The length is checked because, when a portion at the start or end of a note is split off as a separate segment at a note transition even though it belongs to the same note, that segment is short. When a short segment is detected in this way, the CPU 1 matches the change tendency of the segment's pitch information against the change tendency of an overshoot and against that of an undershoot, and judges whether the segment's pitch information changes like an overshoot or an undershoot (steps SP23, SP24).

When moving from one note to the next, the pitch may start somewhat high near the beginning of the next note and gradually settle to that note's pitch, or start somewhat low and gradually settle to it. Likewise, near the end of a note, the pitch may gradually fall away from that note's pitch before moving to the next note, or gradually rise away from it. Among such portions, in which the pitch changes with a gradually increasing or decreasing tendency because of a note transition even though they belong to the same note, those higher than the predetermined pitch are called overshoots and those lower than the predetermined pitch are called undershoots.

Such overshoot and undershoot portions may be split off as independent segments. As described above, the change tendency of the segment's pitch information is therefore matched against a predetermined increasing or decreasing tendency to judge whether the segment being processed may be an overshoot or undershoot segment.
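
A simple way to test for the gradually increasing or decreasing tendency described above is to look at the signs of the frame-to-frame pitch differences. The following sketch is one possible matching rule, not the patent's; the function name and the parameters `min_fraction` and `min_span` are hypothetical.

```python
def trend_direction(pitches, min_fraction=0.8, min_span=0.3):
    """Classify a short segment's pitch track (in semitones).

    Returns "rising" or "falling" when at least `min_fraction` of the
    frame-to-frame differences share one sign and the total change is at
    least `min_span` semitones; otherwise returns None.
    """
    if len(pitches) < 2:
        return None
    diffs = [b - a for a, b in zip(pitches, pitches[1:])]
    rising = sum(1 for d in diffs if d > 0)
    falling = sum(1 for d in diffs if d < 0)
    span = pitches[-1] - pitches[0]
    if rising >= min_fraction * len(diffs) and span >= min_span:
        return "rising"
    if falling >= min_fraction * len(diffs) and -span >= min_span:
        return "falling"
    return None
```

A segment for which this returns a direction would be treated as a candidate overshoot or undershoot segment; whether it is one or the other depends on whether it lies above or below the neighboring note's pitch.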

If the result of this judgment is negative, the CPU 1 moves on to the next segment and returns to step SP21 described above. If it judges that the segment may be an overshoot or undershoot segment, the CPU 1 calculates the difference between the identified pitch of the segment and the identified pitches of the preceding and following segments, marks the neighboring segment with the smaller difference, and then judges whether the pitch difference from the marked segment is smaller than a threshold (steps SP25, SP26).

When a portion of a single note has been segmented into a separate segment, the pitch of that segment does not differ much from that of its neighbors. A large pitch difference from the neighboring segments therefore suggests that the segment is not an overshoot or undershoot segment, and in that case the CPU 1 moves on to the next segment and returns to step SP21 described above.

If the pitch difference from the marked segment is small, the CPU 1 judges whether the change in power information near the boundary between the segment and the marked segment is equal to or greater than a threshold (step SP27). Power information often changes as well when the sound moves to a different note, so a large change in power suggests that the segment is not an overshoot or undershoot segment. In that case the CPU 1 moves on to the next segment and returns to step SP21 described above.

If the judgment in step SP27 also supports the hypothesis, that is, if the power change is below the threshold, the segment is considered to be an overshoot or undershoot segment. The CPU 1 therefore corrects the pitch of the segment to the pitch of the marked segment, moves on to the next segment, and returns to step SP21 described above (step SP28).

This process is repeated to review the pitch of every segment. When the review of the last segment is finished, an affirmative result is obtained in step SP21 and the processing program ends.
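
The review loop of steps SP21 to SP28 can be sketched as follows. The `Segment` fields, the threshold values, and the decision to compare against the neighbors' originally identified pitches (rather than already corrected ones) are assumptions for illustration; the patent does not specify them. The `has_trend` flag stands for the result of the overshoot/undershoot matching in steps SP23 and SP24.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    length: int         # segment length in analysis frames
    pitch: int          # identified pitch as a semitone number
    has_trend: bool     # overshoot/undershoot-like pitch change (SP23, SP24)
    power_start: float  # power near the segment's start boundary
    power_end: float    # power near the segment's end boundary

def correct_pitches(segments, max_length=5, max_pitch_diff=2,
                    max_power_change=0.5):
    """Return corrected pitches; thresholds are hypothetical values."""
    corrected = [s.pitch for s in segments]
    for i, seg in enumerate(segments):
        if seg.length >= max_length:          # SP22: only short segments
            continue
        if not seg.has_trend:                 # SP23/SP24: needs the trend
            continue
        neighbours = []
        if i > 0:
            neighbours.append((abs(seg.pitch - segments[i - 1].pitch), i - 1))
        if i + 1 < len(segments):
            neighbours.append((abs(seg.pitch - segments[i + 1].pitch), i + 1))
        if not neighbours:
            continue
        diff, j = min(neighbours)             # SP25: mark the closer neighbour
        if diff >= max_pitch_diff:            # SP26: pitch difference too large
            continue
        if j < i:                             # power change at shared boundary
            change = abs(seg.power_start - segments[j].power_end)
        else:
            change = abs(seg.power_end - segments[j].power_start)
        if change >= max_power_change:        # SP27: likely a different note
            continue
        corrected[i] = segments[j].pitch      # SP28: adopt neighbour's pitch
    return corrected
```

For instance, a short gliding segment identified one semitone below a long following segment, with little power change at their boundary, would take on the following segment's pitch. When both neighbors are equally close, this sketch prefers the preceding one, which is an arbitrary tie-breaking choice.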

FIG. 2 shows an example in which an identified pitch was corrected by this process. The curve represents the pitch information PIT. In this example, the second segment S2 and the third segment S3 were intended to have the same pitch. Before correction, the second segment S2 had been identified as pitch R2, a semitone lower than the identified pitch R3 of segment S3; this process corrected it to the pitch R3 of segment S3 (shown as R3C).

In the embodiment described above, segments mistakenly identified as a different pitch are thus detected on the basis of the segment length, the change tendency of the pitch information, the pitch difference from the preceding and following segments, and the power difference from the preceding and following segments, and their pitch is corrected. The identified pitches therefore become more accurate, the subsequent processing can be carried out more accurately, and the accuracy of the musical score data is improved.

Other embodiments

In the embodiment described above, the power difference from the preceding and following sections is taken into account when extracting segments identified with an erroneous pitch, but it is sufficient to extract them on the basis of at least the segment length, the change tendency of the pitch information, and the pitch difference from the preceding and following segments.

Needless to say, the method of detecting an overshoot or undershoot from the change tendency of the pitch information is not limited to detection by a simple increasing or decreasing tendency as described above; other methods, such as comparison with a reference pattern, can also be applied.

Further, in the above-described embodiment, the CPU 1 executes all of the processing shown in FIG. 4 according to the program stored in the main storage device 3; however, part or all of that processing may instead be executed by a hardware configuration. For example, as shown in FIG. 5, in which parts corresponding to those in FIG. 3 are given the same reference numerals, the acoustic signal from the acoustic signal input device 8 may be amplified by an amplifier circuit 10 and then supplied through a pre-filter 11 to an analog/digital converter 12, where it is converted into a digital signal. A signal processor 13 performs autocorrelation analysis on the digitized acoustic signal to extract pitch information and sum-of-squares processing to extract power information, and supplies both to the software processing system of the CPU 1. As the signal processor 13 used in such a hardware configuration (10 to 13), a processor that can process signals in the pitch band in real time and that provides interface signals for the host CPU 1 (for example, the μPD7720 manufactured by NEC Corporation) can be applied.
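The two operations assigned to the signal processor 13, autocorrelation analysis for pitch and a sum of squares for power, can be sketched in software as follows. This is a minimal sketch, not the processor's actual firmware: the function names, the lag search range, and the synthetic test frame are assumptions made for illustration.

```python
import math


def frame_power(frame):
    """Power information: the sum of squared samples in one analysis frame."""
    return sum(x * x for x in frame)


def frame_pitch_period(frame, min_lag, max_lag):
    """Pitch information: the lag (in samples) with the largest autocorrelation.

    Only lags in [min_lag, max_lag] are searched, so lag 0 (which always
    has the largest autocorrelation) is excluded by choosing min_lag > 0.
    """
    best_lag, best_value = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Hypothetical test frame: a sine wave with a period of 50 samples.
frame = [math.sin(2 * math.pi * i / 50) for i in range(400)]
period = frame_pitch_period(frame, min_lag=20, max_lag=80)
power = frame_power(frame)
```

The period found this way is the repetition period of the waveform in samples; dividing the sampling rate by it gives the fundamental frequency that the later pitch identification step would map onto the absolute pitch axis.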

[Configuration of the Invention] As described above, according to the present invention, a segment identified with an erroneous pitch is extracted based on at least the segment length, the change tendency of the pitch information, and the pitch difference from the preceding and following segments, and the pitch of that segment is corrected to the identified pitch of one of the preceding and following segments. A good identified pitch can therefore be obtained, and the final musical score data can be made highly accurate.
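The correction summarized above can be sketched as follows. This is an illustrative reading of the described steps, not the patented implementation: the `Segment` fields, the function name, the threshold parameters, the handling of the first and last segments, and the choice to compare against the uncorrected neighbours are all assumptions. The power-change check corresponds to the additional condition of claim 2.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    length: int              # segment length in analysis frames (assumed unit)
    pitch: int               # identified pitch, e.g. a semitone number (assumed)
    power: float             # representative power of the segment (assumed)
    tendency: Optional[str]  # "rising", "falling", or None (assumed encoding)


def correct_identified_pitches(
    segments: List[Segment],
    max_length: int,
    max_pitch_diff: int,
    max_power_change: float,
) -> List[Segment]:
    """Return a copy of `segments` with suspect identified pitches corrected."""
    corrected = list(segments)
    for i, seg in enumerate(segments):
        # 1. Only segments shorter than the length threshold are candidates.
        if seg.length >= max_length:
            continue
        # 2. The pitch track must show an overshoot/undershoot-like tendency.
        if seg.tendency is None:
            continue
        # 3. Pick whichever neighbour is closer in identified pitch.
        neighbours = [segments[j] for j in (i - 1, i + 1) if 0 <= j < len(segments)]
        if not neighbours:
            continue
        nearest = min(neighbours, key=lambda n: abs(n.pitch - seg.pitch))
        # 4. The pitch difference must be below its threshold.
        if abs(nearest.pitch - seg.pitch) >= max_pitch_diff:
            continue
        # 5. The power change between the two segments must also be small.
        if abs(nearest.power - seg.power) >= max_power_change:
            continue
        corrected[i] = replace(seg, pitch=nearest.pitch)
    return corrected
```

For example, a three-frame segment with a falling pitch track that sits one semitone above a long preceding note, with nearly the same power, would be merged into that note's pitch, while a long or steady segment is left untouched.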

[Brief description of the drawings]

FIG. 1 is a flowchart showing the process for correcting an identified pitch according to an embodiment of the present invention; FIG. 2 is a schematic diagram showing an example of such correction of the identified pitch; FIG. 3 is a block diagram showing the configuration of an automatic transcription system to which the present invention is applied; FIG. 4 is a flowchart showing the automatic transcription processing procedure; and FIG. 5 is a block diagram showing another configuration of the automatic transcription system.

1: CPU; 3: main storage device; 6: auxiliary storage device; 7: analog/digital converter; 8: acoustic signal input device.

Continued from the front page. (72) Inventor: Masaki Fujimoto, c/o NEC Technical Information Systems Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo. (72) Inventor: Masanori Mizuno, c/o NEC Technical Information Systems Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo. Examiner: Shigeo Arai.

Claims

(57) [Claims]

1. An automatic music transcription method for converting an input acoustic signal into musical score data, comprising at least: a process of extracting, from the input acoustic signal, pitch information representing the pitch as the repetition period of the signal waveform and power information of the acoustic signal; a segmentation process of dividing the acoustic signal, based on the pitch information and/or the power information, into sections each of which can be regarded as having the same pitch; and a pitch identification process of identifying, based on the pitch information, the pitch of each divided section as a pitch on the absolute pitch axis; wherein the method further comprises, as post-processes of the pitch identification process: a process of extracting, from the divided sections, sections shorter than a predetermined length; a process of further extracting, from the extracted sections, sections in which the change of the pitch information exhibits a specific predetermined tendency; and a process of detecting the difference in identified pitch between each section extracted on the basis of the change tendency and the sections preceding and following it and, when at least one of the pitch differences is smaller than a predetermined value, correcting the identified pitch of the extracted section to the identified pitch of the section having the smaller pitch difference.

2. The automatic music transcription method according to claim 1, wherein the process of correcting the identified pitch of the section extracted on the basis of the change tendency corrects that identified pitch to the identified pitch of one of the preceding and following sections when the pitch difference from that section is small and the change in the power information between the extracted section and that section is also small.

3. An automatic music transcription apparatus for converting an input acoustic signal into musical score data, comprising: pitch/power extracting means for extracting, from the input acoustic signal, pitch information representing the pitch as the repetition period of the signal waveform and power information of the acoustic signal; segmentation means for dividing the acoustic signal, based on the pitch information and/or the power information, into sections each of which can be regarded as having the same pitch; and pitch identification means for identifying, based on the pitch information, the pitch of each divided section as a pitch on the absolute pitch axis; wherein the apparatus further comprises, at a stage subsequent to the pitch identification means: short section extracting means for extracting, from the divided sections, sections shorter than a predetermined length; correction candidate section extracting means for further extracting, from the extracted sections, sections in which the change of the pitch information exhibits a specific predetermined tendency; and identified pitch correcting means for detecting the difference in identified pitch between each section extracted on the basis of the change tendency and the sections preceding and following it and, when at least one of the pitch differences is smaller than a predetermined value, correcting the identified pitch of the extracted section to the identified pitch of the section having the smaller pitch difference.

4. The automatic music transcription apparatus according to claim 3, wherein the identified pitch correcting means corrects the identified pitch of the extracted section to the identified pitch of one of the preceding and following sections when the pitch difference from that section is small and the change in the power information between the extracted section and that section is also small.