JP2604408B2

JP2604408B2 - Automatic music transcription method and device

Info

Publication number: JP2604408B2
Application number: JP63046123A
Authority: JP
Inventors: 七郎鶴田; 洋典高島; 正樹藤本; 正典水野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1997-04-30
Anticipated expiration: 2012-04-30
Also published as: JPH01219632A

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to an automatic music transcription method and apparatus that create musical score data from acoustic signals such as singing voices, humming, and musical instrument sounds. More particularly, it relates to a correction process that revises, where necessary, the pitch identified for a given section of the acoustic signal.

[Prior Art] An automatic transcription system that converts acoustic signals such as singing voices, humming, and musical instrument sounds into musical score data must detect, from the acoustic signal, the basic information of a score: note length, pitch, key, time signature, and tempo.

An acoustic signal, however, is merely a signal in which a basic waveform repeats continuously, so none of this information can be obtained from it directly.

Conventional automatic transcription systems therefore obtain this information in the following order. First, the repetition information of the basic waveform, which represents the pitch of the acoustic signal (hereinafter called pitch information), and the power information are extracted at every analysis cycle. Next, the acoustic signal is divided, based on the extracted pitch information and/or power information, into sections (segments) that can each be regarded as having a single pitch; this process is called segmentation. Then the pitch of each segment is identified, from the pitch information of the segment, as a pitch on the absolute pitch axis. The key of the acoustic signal is determined from the distribution of the pitch information around the pitch axis, and finally the time signature and tempo of the acoustic signal are determined from the segments.

[Problems to be Solved by the Invention] When a segment of an acoustic signal is to be identified as a pitch on the absolute pitch axis, however, the pitch of the acoustic signal is not stable, particularly when the signal is produced by a person. The pitch fluctuates considerably even when a single pitch (one note) is intended, and this has made pitch identification very difficult.

Conventional pitch identification methods that have been considered use simple computations so that real-time processing is possible. Examples are identifying the segment as the pitch on the absolute axis closest to the mean of the pitch information in the segment, or as the pitch closest to the median of that pitch information. Even when the acoustic signal fluctuates, such methods identify the pitch well where adjacent notes of the scale are a whole tone apart, as with do and re in the C major scale. Where adjacent notes are a semitone apart, as with mi and fa in the C major scale, the fluctuation of the acoustic signal can make the identification inaccurate. For example, a note intended as mi in C major was sometimes identified as fa.
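By way of illustration only, the conventional mean-based or median-based identification might be sketched as follows. The sketch assumes that each pitch value has already been converted to a fractional note number in semitone units (here the MIDI convention, with 64 for mi and 65 for fa above middle C); that unit, the function name `identify_pitch`, and the sample values are assumptions that do not appear in the specification.

```python
import math
from statistics import mean, median

def identify_pitch(pitch_values, reducer=mean):
    """Round the reduced pitch (mean by default) to the nearest semitone."""
    return int(math.floor(reducer(pitch_values) + 0.5))

# Hypothetical segment: the singer intends mi (64) but drifts sharp.
drifting_mi = [64.2, 64.5, 64.7, 64.8, 64.6]
by_mean = identify_pitch(drifting_mi)            # nearest semitone to the mean
by_median = identify_pitch(drifting_mi, median)  # nearest semitone to the median
```

With these sample values both variants land on 65 (fa), which is the kind of semitone misidentification the invention is directed at.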

The present invention has been made in view of the above. Its object is to provide an automatic music transcription method and apparatus that identify the pitch on the absolute axis accurately whether the singer or other performer intends a pitch that is a whole tone or a semitone away from its neighbor on the scale, and that thereby further improve the accuracy of the final musical score data.

[Means for Solving the Problems] To solve these problems, the first aspect of the present invention is an automatic transcription method that converts an acoustic signal into musical score data and includes at least: a process of extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents its pitch, together with power information of the acoustic signal; a segmentation process of dividing the acoustic signal, based on the pitch information and/or the power information, into sections that can each be regarded as having a single pitch; a pitch identification process of identifying, based on the pitch information, the pitch of each section as a pitch on the absolute pitch axis; and a key determination process of determining the key of the acoustic signal based on the distribution of the pitch information. In this method, the following are provided as post-processing after the key determination process: a process of extracting each section whose identified pitch lies a semitone from an adjacent pitch on the scale of the determined key; a process of tallying, from the pitch information of the extracted section, the pitch information that lies between the identified pitch of the section and the pitch that differs from it by a semitone on the scale of the determined key; and a process of re-identifying the pitch of the section according to the tallied distribution.

The second aspect of the present invention is an automatic transcription apparatus that converts an acoustic signal into musical score data and includes, as part of its configuration: pitch/power extraction means for extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents its pitch, together with power information of the acoustic signal; segmentation means for dividing the acoustic signal, based on the extracted pitch information and/or power information, into sections that can each be regarded as having a single pitch; pitch identification means for identifying, based on the pitch information, the pitch of each section as a pitch on the absolute pitch axis; and key determination means for determining the key of the acoustic signal based on the distribution of the pitch information. In this apparatus, the following are provided at a stage subsequent to the key determination means: correction candidate section extraction means for extracting each section whose identified pitch lies a semitone from an adjacent pitch on the scale of the determined key; pitch tallying means for tallying, from the pitch information of the extracted section, the pitch information that lies between the identified pitch of the section and the pitch that differs from it by a semitone on the scale of the determined key; and identified pitch correction means for re-identifying the pitch of the section according to the tallied distribution.

[Operation] In C major, for example, among the seven notes of the scale (do, re, mi, fa, sol, la, ti), do and ti, and mi and fa, are a semitone apart, unlike the other pairs of adjacent pitches. These notes may therefore have been identified incorrectly during pitch identification. In both the first and second aspects of the present invention, the pitch is therefore reviewed when the pitch identified for a section is one of these four notes, that is, a note that lies a semitone from an adjacent pitch in the key of the acoustic signal.

That is, in the first aspect of the present invention, the sections whose identified pitch is one of these four notes in the determined key are extracted, the pitch information of each such section that lies within the semitone interval is tallied, and the pitch of the section is re-identified according to the tallied distribution.

In the second aspect of the present invention, the correction candidate section extraction means extracts, as candidates for correction, the sections whose identified pitch is one of the above four notes in the determined key; the pitch tallying means tallies the pitch information of each such section that lies within the semitone interval; and the identified pitch correction means re-identifies the pitch of the section according to the tallied distribution.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

Automatic transcription system

First, the automatic transcription system to which the present invention is applied is described.

In FIG. 3, a central processing unit (CPU) 1 controls the entire apparatus and executes the transcription processing program shown in FIG. 4, which is stored in a main storage device 3 connected via a bus 2. In addition to the CPU 1 and the main storage device 3, the bus 2 connects a keyboard 4 serving as an input device, a display device 5 serving as an output device, an auxiliary storage device 6 used as working memory, and an analog/digital converter 7.

An acoustic signal input device 8, for example a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures acoustic signals such as singing or humming produced by the user, or musical tones produced by an instrument, converts them into an electric signal, and outputs that electric signal to the analog/digital converter 7.

When processing is commanded from the keyboard input device 4, the CPU 1 starts the transcription processing. Executing the program stored in the main storage device 3, it first stores the acoustic signal, converted into a digital signal by the analog/digital converter 7, in the auxiliary storage device 6. It then converts the stored acoustic signal into musical score data by executing the same program and outputs the data to the display device 5 as required.

Next, the transcription processing that the CPU 1 executes after the acoustic signal has been captured is described in detail with reference to the functional-level flowchart of FIG. 4.

First, the CPU 1 performs autocorrelation analysis on the acoustic signal to extract its pitch information at every analysis cycle, performs sum-of-squares processing to extract power information at every analysis cycle, and then executes post-processing such as noise removal and smoothing (steps SP1, SP2). For the pitch information, the CPU 1 then calculates, from its distribution, the amount by which the pitch axis of the acoustic signal deviates from the absolute pitch axis, and executes a tuning process that shifts the obtained pitch information by that amount (step SP3). That is, the pitch information is corrected so as to reduce the difference between the absolute pitch axis and the pitch axis of the singer or instrument that produced the acoustic signal.
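The extraction of steps SP1 and SP2 can be pictured with the following minimal sketch, which covers only autocorrelation-based period estimation and sum-of-squares power for a single analysis frame. The function names, the lag search range, the synthetic test frame, and the conversion of the period to a fractional note number on the MIDI scale are all assumptions introduced for illustration; the specification does not give these details, and the noise removal, smoothing, and tuning steps are not shown.

```python
import math

def frame_power(frame):
    """Power of one analysis frame as the sum of squared samples."""
    return sum(x * x for x in frame)

def autocorr_period(frame, min_lag, max_lag):
    """Lag (in samples) that maximizes the autocorrelation of the frame."""
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag

def period_to_semitones(period, sample_rate):
    """Convert a period in samples to a fractional note number (assumed MIDI scale)."""
    frequency = sample_rate / period
    return 69.0 + 12.0 * math.log2(frequency / 440.0)

# Synthetic frame: a sine wave whose waveform repeats every 50 samples.
frame = [math.sin(2.0 * math.pi * i / 50.0) for i in range(400)]
period = autocorr_period(frame, 20, 80)
```

The lag range is restricted so that multiples of the true period are not searched; a practical implementation would normally also normalize the autocorrelation.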

Next, the CPU 1 executes segmentation that divides the acoustic signal into one segment per note by finding continuous periods over which the pitch information can be considered to indicate the same pitch, and also executes segmentation based on changes in the power information (steps SP4, SP5). From the two sets of segment information thus obtained, the CPU 1 calculates a reference length corresponding to the duration of a note such as a quarter note or an eighth note, and executes segmentation again on the basis of this reference length (step SP6).
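As a rough illustration of pitch-based segmentation (step SP4 only), the sketch below groups consecutive analysis frames whose pitch stays within a tolerance of the first frame of the current segment. The grouping rule, the tolerance of half a semitone, and the name `segment_by_pitch` are assumptions; the specification does not state how continuity of pitch is judged, and the power-based and reference-length segmentations are not shown.

```python
def segment_by_pitch(pitch_values, tolerance=0.5):
    """Split frames into (start, end) index ranges of roughly constant pitch.

    A new segment starts when a frame differs from the first frame of the
    current segment by more than `tolerance` semitones (assumed rule).
    """
    segments = []
    start = 0
    for i in range(1, len(pitch_values) + 1):
        at_end = i == len(pitch_values)
        if at_end or abs(pitch_values[i] - pitch_values[start]) > tolerance:
            segments.append((start, i))  # end index is exclusive
            start = i
    return segments

# Hypothetical per-frame pitch values in semitone units.
segments = segment_by_pitch([60.0, 60.1, 59.9, 62.0, 62.2, 64.1])
```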

Based on the pitch information of each segment obtained in this way, the CPU 1 identifies the pitch of the segment as the pitch on the absolute pitch axis to which the pitch information is judged closest, and then executes segmentation again according to whether consecutive segments have been identified as the same pitch (steps SP7, SP8).

The CPU 1 then obtains the frequency of occurrence of the pitches by tallying the tuned pitch information for the segments, computes the sum of products of these frequencies and predetermined weighting coefficients defined for each key, and determines the key of the piece in the input acoustic signal, for example C major or A minor, from the key that maximizes this sum of products. For segments whose identified pitch is one of certain notes of the scale, the CPU 1 then reviews the pitch against the pitch information to confirm or correct it (steps SP9, SP10). Next, the CPU 1 reviews the segmentation according to whether consecutive segments have the same finally determined pitch and whether the power changes between consecutive segments, and thereby performs the final segmentation (step SP11).

Once the pitches and segments have been determined in this way, the CPU 1 extracts the bars on the basis of observations such as that a piece begins on the first beat, that the last note of a phrase does not straddle the next bar, and that there is a break at each bar. It determines the time signature from this bar information and the segmentation information, and determines the tempo from the determined time signature and the length of a bar (steps SP12, SP13).
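The specification does not give a formula for the tempo, but one natural reading of determining it from the time signature and the bar length is the arithmetic sketched below. The function name, the choice of seconds as the unit of bar length, and the expression of tempo in beats per minute are assumptions.

```python
def tempo_bpm(beats_per_bar, bar_seconds):
    """Tempo in beats per minute from the beats in a bar and the bar duration."""
    if bar_seconds <= 0:
        raise ValueError("bar_seconds must be positive")
    return 60.0 * beats_per_bar / bar_seconds

# Hypothetical example: a bar of four beats lasting two seconds.
tempo = tempo_bpm(4, 2.0)
```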

Finally, the CPU 1 organizes the determined pitch, note length, key, time signature, and tempo information and creates the musical score data (step SP14).

Correction of the identified pitch

Next, the process of correcting the pitch identified by the pitch identification process of step SP7 described above (step SP10) is described in detail with reference to the flowchart of FIG. 1.

As described above, before correcting the pitch, the CPU 1 first identifies the pitch of each segment obtained by segmentation. For example, it obtains the mean of the pitch information in the segment and identifies the segment as whichever pitch on the absolute pitch axis, whose pitches are spaced a semitone apart, is closest to that mean (step SP20). It then creates a histogram of all the pitch information over the twelve semitone steps of the scale, computes the sum of products of the frequency of occurrence of each step and the weighting coefficient defined for that step in each key, and determines the key giving the largest sum of products as the key of the acoustic signal (step SP21).
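The key determination of step SP21 can be sketched as a histogram over the twelve pitch classes followed by a sum of products with per-key weights. The specification does not disclose the weighting coefficients, so the sketch below assumes the simplest possible choice: a weight of 1 for notes of the key's scale and 0 otherwise. It also considers only the twelve major keys, because with such binary weights a major key and its relative minor would tie. The names and the sample melody are hypothetical.

```python
import math

MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the tonic

def pitch_class_histogram(pitch_values):
    """Count how often each of the 12 pitch classes occurs."""
    hist = [0] * 12
    for p in pitch_values:
        hist[int(math.floor(p + 0.5)) % 12] += 1
    return hist

def determine_major_key(pitch_values):
    """Return the tonic pitch class (0 = C) with the largest product sum."""
    hist = pitch_class_histogram(pitch_values)

    def product_sum(tonic):
        # Assumed weights: 1 for scale notes, 0 for all other notes.
        return sum(hist[(tonic + step) % 12] for step in MAJOR_SCALE)

    return max(range(12), key=product_sum)

# Hypothetical melody using every note of the C major scale.
key = determine_major_key([60, 62, 64, 65, 67, 69, 71, 72])
```

When two keys give the same product sum, `max` keeps the lowest tonic; a real weighting table would be expected to break such ties.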

In the correction process, the CPU 1 first confirms that processing of the last segment has not yet finished, and then judges whether the pitch identified for the segment being processed is one of the pitches that lie a semitone from an adjacent pitch on the scale of the determined key (for example mi, fa, ti, or do in C major). If it is not, the CPU 1 moves on to the next segment without correcting the pitch and returns to step SP22 (steps SP22 to SP24).
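The candidate test of steps SP23 and SP24 might be expressed as follows. The sketch assumes integer note numbers in semitone units (MIDI convention), a major key given by its tonic pitch class, and the hypothetical helper name `semitone_neighbor`; none of these appear in the specification.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the tonic

def semitone_neighbor(note, tonic, scale=MAJOR_SCALE):
    """Return the scale note one semitone from `note`, or None if there is none.

    A segment is a correction candidate only when this returns a note.
    """
    classes = {(tonic + step) % 12 for step in scale}
    pitch_class = note % 12
    if pitch_class not in classes:
        return None
    if (pitch_class + 1) % 12 in classes:
        return note + 1
    if (pitch_class - 1) % 12 in classes:
        return note - 1
    return None

# In C major (tonic 0): mi pairs with fa, ti with do, and re has no such pair.
mi_partner = semitone_neighbor(64, 0)
re_partner = semitone_neighbor(62, 0)
```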

If, on the other hand, the pitch identified for the segment is one of those pitches, the CPU 1 tallies the pitch information of the segment that lies between the identified pitch and the pitch that differs from it by a semitone on the scale of the determined key (step SP25). For example, when the pitch of the segment is mi in C major, the distribution of the pitch information lying between the pitch values corresponding to mi and fa is obtained for that segment. Pitch information of the segment that does not lie within this semitone interval is therefore not tallied.
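The tallying of step SP25 amounts to keeping only the pitch values of the segment that fall inside the semitone interval. In the sketch below, the name `tally_between`, the semitone-unit pitch values, and the decision to treat both end points as inside the interval are assumptions.

```python
def tally_between(pitch_values, identified, neighbor):
    """Keep only the pitch values lying between the two semitone-separated notes."""
    low, high = sorted((identified, neighbor))
    return [p for p in pitch_values if low <= p <= high]

# Hypothetical segment identified as mi (64), with fa (65) as its neighbor.
tallied = tally_between([63.6, 64.1, 64.4, 64.7, 65.3], 64, 65)
```

Here the values 63.6 and 65.3 fall outside the mi-to-fa interval and are therefore left out of the tally.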

Next, the CPU 1 determines whether more of the tallied pitch information lies above or below the pitch value midway between the two semitone-separated pitches, and identifies the pitch on the absolute axis nearer the majority side as the pitch of the segment (step SP26).
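Steps SP25 and SP26 together can be sketched as a majority vote around the midpoint of the semitone interval. The function name `reidentify`, the semitone-unit pitch values, and the rule of keeping the original identification on a tie are assumptions; the specification does not say what happens when the counts are equal.

```python
def reidentify(pitch_values, identified, neighbor):
    """Re-identify a segment by counting pitch values above and below the midpoint."""
    low, high = sorted((identified, neighbor))
    midpoint = (low + high) / 2
    between = [p for p in pitch_values if low <= p <= high]
    above = sum(1 for p in between if p > midpoint)
    below = sum(1 for p in between if p < midpoint)
    if above > below:
        return high
    if below > above:
        return low
    return identified  # tie: keep the original identification (assumption)

# Hypothetical segment whose mean (about 64.16) rounds to mi (64), although
# every pitch value between mi and fa lies nearer fa (65).
segment = [63.2, 63.4, 63.5, 64.6, 64.7, 64.8, 64.9]
corrected = reidentify(segment, 64, 65)
```

The low onset values pull the mean down to mi, but they lie outside the interval and are ignored, so the vote among the remaining values selects fa.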

When the review and correction of the identification have been completed in this way, the CPU 1 moves on to the next segment and returns to step SP22 described above.

The pitch is reviewed in this way when the identified pitch lies a semitone from an adjacent pitch in the determined key because, with a difference of only a semitone, the likelihood of an identification error is high.

The above processing is repeated until the pitches of all segments have been reviewed. When the review of the last segment is finished, an affirmative result is obtained in step SP22 and the processing program ends.

FIG. 2 shows an example of correcting an identified pitch, for the case where the determined key is C major and the pitch identified from the mean of the pitch information is mi. Because the identified pitch of this segment is mi, processing proceeds to the correction process. Only the pitch information lying between mi and fa, that is, the pitch information of period T1, is tallied, and the pitch information above and below the pitch value PC midway between mi and fa is counted. In this case more of the pitch information in period T1 is above PC, so the pitch of the segment is re-identified as fa.

According to the embodiment described above, therefore, when the identified pitch lies a semitone from an adjacent pitch in the determined key, the pitch of the segment is reviewed in further detail, so the pitch of each segment can be identified accurately.

Other embodiments

In the embodiment described above, the pitch reviewed was one identified as the pitch closest to the mean of the pitch information of the segment, but a pitch identified by another pitch identification method can be reviewed in the same way.

Also, in the embodiment described above, the pitch is re-identified according to whether more of the pitch information lies above or below the pitch value midway between the two pitches under review, but the review may be carried out by other methods. For example, it may be based on the mean or the mode of the pitch information of the segment that lies between the two pitches under review.
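The mean-based alternative mentioned here might look like the sketch below, which compares the mean of the pitch values inside the semitone interval with the midpoint of that interval. The function name, the semitone-unit pitch values, and the handling of an empty tally or an exact tie (both keep the original identification) are assumptions; the mode-based variant is not shown.

```python
from statistics import mean

def reidentify_by_mean(pitch_values, identified, neighbor):
    """Re-identify a segment from the mean of the pitch values inside the interval."""
    low, high = sorted((identified, neighbor))
    between = [p for p in pitch_values if low <= p <= high]
    if not between:
        return identified  # nothing to tally: keep the original (assumption)
    midpoint = (low + high) / 2
    average = mean(between)
    if average > midpoint:
        return high
    if average < midpoint:
        return low
    return identified  # exact tie: keep the original (assumption)

# Hypothetical values: two of three lie below the midpoint, but the mean is above it.
by_mean = reidentify_by_mean([64.4, 64.4, 65.0], 64, 65)
```

With these sample values the mean-based rule selects 65 where a majority count would select 64, which shows that the two review methods can disagree.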

Furthermore, in the embodiment described above, the CPU 1 executes all of the processing shown in FIG. 4 according to the program stored in the main storage device 3, but part or all of the processing may instead be executed by hardware. For example, as shown in FIG. 5, in which parts corresponding to those of FIG. 3 bear the same reference numerals, the acoustic signal from the acoustic signal input device 8 may be amplified by an amplifier circuit 10, passed through a pre-filter 11, and supplied to an analog/digital converter 12 for conversion into a digital signal. A signal processor 13 then performs autocorrelation analysis on the digitized acoustic signal to extract pitch information and sum-of-squares processing to extract power information, and supplies these to the software processing system of the CPU 1. As the signal processor 13 used in such a hardware configuration (10 to 13), a processor that can process voice-band signals in real time and that provides interface signals for the host CPU 1 (for example, the μPD7720 made by NEC Corporation) can be used.

[Effect of the Invention] As described above, according to the present invention, when the identified pitch lies a semitone from an adjacent pitch on the scale of the determined key, the pitch of the segment is reviewed in further detail. The pitch of each segment can therefore be identified accurately, and an automatic music transcription method and apparatus that improve the accuracy of the final musical score data can be obtained.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the process of correcting the identified pitch according to an embodiment of the present invention; FIG. 2 is a schematic diagram used to explain that correction process; FIG. 3 is a block diagram showing the configuration of an automatic transcription system to which the present invention is applied; FIG. 4 is a flowchart showing its automatic transcription procedure; and FIG. 5 is a block diagram showing another configuration of the automatic transcription system.

1: CPU; 3: main storage device; 6: auxiliary storage device; 7: analog/digital converter; 8: acoustic signal input device.

Continued from the front page: (72) Inventor: Masaki Fujimoto, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo. (72) Inventor: Masanori Mizuno, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo. Examiner: Shigeo Arai.

Claims

(57) [Claims]

1. An automatic music transcription method for converting an acoustic signal into musical score data, the method including at least: a process of extracting pitch information, which is the repetition period of an input acoustic signal waveform and represents its pitch, and power information of the acoustic signal; a segmentation process of dividing the acoustic signal, based on the pitch information and/or the power information, into sections that can each be regarded as having a single pitch; a pitch identification process of identifying, based on the pitch information, the pitch of each divided section as a pitch on the absolute pitch axis; and a key determination process of determining the key of the acoustic signal based on the distribution of the pitch information, characterized in that the following are provided as post-processing after the key determination process: a process of extracting each section whose identified pitch lies a semitone from an adjacent pitch on the scale of the determined key; a process of tallying, from the pitch information of the extracted section, the pitch information lying between the identified pitch of the section and the pitch that differs from it by a semitone on the scale of the determined key; and a process of re-identifying the pitch of the section according to the tallied distribution.

2. An automatic music transcription apparatus for converting an acoustic signal into musical score data, the apparatus including, as part of its configuration: pitch/power extraction means for extracting pitch information, which is the repetition period of an input acoustic signal waveform and represents its pitch, and power information of the acoustic signal; segmentation means for dividing the acoustic signal, based on the pitch information and/or the power information, into sections that can each be regarded as having a single pitch; pitch identification means for identifying, based on the pitch information, the pitch of each divided section as a pitch on the absolute pitch axis; and key determination means for determining the key of the acoustic signal based on the distribution of the pitch information, characterized in that the following are provided at a stage subsequent to the key determination means: correction candidate section extraction means for extracting each section whose identified pitch lies a semitone from an adjacent pitch on the scale of the determined key; pitch tallying means for tallying, from the pitch information of the extracted section, the pitch information lying between the identified pitch of the section and the pitch that differs from it by a semitone on the scale of the determined key; and identified pitch correction means for re-identifying the pitch of the section according to the tallied distribution.