JP2604403B2 - Automatic music transcription method and device

Info

Publication number: JP2604403B2
Application number: JP4611688A
Authority: JP
Inventors: 七郎鶴田; 洋典高島; 正樹藤本; 正典水野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1997-04-30
Anticipated expiration: 2012-04-30
Also published as: JPH01219626A

Abstract

PURPOSE: To perform segmentation that is unaffected by fluctuations of the acoustic signal or by sudden external sounds, by segmenting on the basis of a characteristic value, the run length, defined as the length over which the pitch information of the input acoustic signal stays within a certain pitch width. CONSTITUTION: When processing is commanded from a keyboard 4, a CPU 1 stores a digital acoustic signal, such as singing captured by an acoustic signal input device 8 and digitized by an A/D converter 7, in an auxiliary storage device 6 used as working memory, and processes it according to a program held in a main storage device 3. Pitch information is extracted, the run length over which the pitch information stays within a specified pitch width is counted for each sampling point, and continuous sections in which the run length is equal to or greater than a predetermined value are detected. In each section, the sampling point with the maximum run length is taken as a representative point. When the pitch information of two adjacent representative points differs by a predetermined value or more, the acoustic signal is divided at the sampling point between them where the change in pitch information is largest. Segmentation is thus performed without being affected by fluctuations of the acoustic signal or by sudden external sounds.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to an automatic music transcription method and apparatus for creating musical score data from acoustic signals such as singing voices, humming voices, and musical instrument sounds, and more particularly to a segmentation process that uses pitch information to divide an acoustic signal into sections that can each be regarded as having the same pitch.

[Prior Art]

An automatic transcription system that converts an acoustic signal such as a singing voice, a humming voice, or a musical instrument sound into musical score data involves detecting, from the acoustic signal, the note length, pitch, key, time signature, and tempo, which are the basic information of a musical score.

An acoustic signal, however, is merely a signal that continuously contains repetitions of a basic waveform, and the items of information listed above cannot be obtained from it directly.

In a conventional automatic transcription system, therefore, the repetition information of the basic waveform, which represents the pitch of the acoustic signal (hereinafter called pitch information), and the power information are first extracted for each analysis cycle. The acoustic signal is then divided, on the basis of the extracted pitch information and power information, into sections (segments) that can each be regarded as having the same pitch (this process is called segmentation). Next, the pitch of the acoustic signal in each segment along the absolute pitch axis is determined from the pitch information and/or the power information, the key of the acoustic signal is determined from the distribution of the pitch information, and finally the time signature and tempo of the acoustic signal are determined from the segments.

Because the pitch, time signature, tempo, and so on are determined on the basis of the segments (note lengths), the segmentation process is particularly important in creating musical score data.

[Problems to Be Solved by the Invention]

An acoustic signal, particularly one uttered by a person, does not have a stable pitch and fluctuates considerably in pitch. It has therefore been difficult to perform segmentation well on the basis of pitch information. Even for sound generated by a musical instrument, the acoustic signal input device that converts the sound into an electric signal also picks up surrounding noise, which can enter the pitch information and make segmentation based on pitch information difficult.

As described above, segmentation is an important element in creating musical score data. Even though the final segmentation is performed, where necessary, in combination with the result of segmentation based on power information, low accuracy in the segmentation based on pitch information significantly lowers the accuracy of the musical score data finally obtained.

The present invention has been made in view of the above points. It aims to provide an automatic music transcription method and apparatus that can perform segmentation based on pitch information well, without being affected by fluctuations of the acoustic signal or by sudden external sounds, and that can thereby further improve the accuracy of the final musical score data.

[Means for Solving the Problems]

To solve these problems, the first invention is an automatic music transcription method that converts an acoustic signal into musical score data and includes at least a process of extracting pitch information from an input acoustic signal and a segmentation process of dividing the acoustic signal, on the basis of the extracted pitch information, into sections that can each be regarded as having the same pitch. In this method, the segmentation process consists of: counting, for each sampling point, the length of the run over which the pitch information stays within a certain pitch width; detecting continuous sections in which the counted run length is equal to or greater than a predetermined value; extracting, for each detected section, the sampling point having the maximum run length as a representative point; and, when the pitch information of two adjacent representative points differs by a predetermined value or more, dividing the acoustic signal at the sampling point between the representative points where the amount of change in the pitch information is largest.

The second invention is an automatic music transcription apparatus that converts an acoustic signal into musical score data and includes at least pitch extracting means for extracting pitch information from an input acoustic signal and segmentation means for dividing the acoustic signal, on the basis of the extracted pitch information, into sections that can each be regarded as having the same pitch. In this apparatus, the segmentation means is made up of: a run length counting unit that counts, for each sampling point, the length of the run over which the pitch information stays within a certain pitch width; a section detection unit that detects continuous sections in which the counted run length is equal to or greater than a predetermined value; a representative point extraction unit that extracts, for each detected section, the sampling point having the maximum run length as a representative point; and an acoustic signal dividing unit that, when the pitch information of two adjacent representative points differs by a predetermined value or more, divides the acoustic signal at the sampling point between the representative points where the amount of change in the pitch information is largest.

[Operation]

In both the first and second inventions, the sections that can be regarded as having the same pitch are delimited from the pitch information by using a characteristic value, namely the length of the run over which the pitch information stays within a certain pitch width, so that the result is not affected by fluctuations of the acoustic signal or by sudden external sounds.

That is, in the first invention, the run length is counted for each sampling point, the sampling point having the maximum run length is detected in each continuous section whose run lengths are equal to or greater than the predetermined value, and, when the change in pitch information between two adjacent detected sampling points is sufficiently large, the sampling point at which the change is largest is taken as a dividing point of the acoustic signal.

In the second invention, the run length counting unit counts, for each sampling point, the length of the run over which the pitch information stays within a certain pitch width; the section detection unit detects continuous sections in which the counted run length is equal to or greater than the predetermined value; the representative point extraction unit extracts, for each detected section, the sampling point having the maximum run length as a representative point; and, when the pitch information of two adjacent representative points differs by a predetermined value or more, the acoustic signal dividing unit divides the acoustic signal at the sampling point between the representative points where the amount of change in the pitch information is largest.

[Embodiment]

An embodiment of the present invention will now be described in detail with reference to the drawings.

Automatic transcription system

First, the automatic transcription system to which the present invention is applied will be described.

In FIG. 4, a central processing unit (CPU) 1 controls the entire apparatus and executes the transcription processing program shown in FIG. 5, which is stored in a main storage device 3 connected to it via a bus 2. In addition to the CPU 1 and the main storage device 3, the bus 2 connects a keyboard 4 as an input device, a display device 5 as an output device, an auxiliary storage device 6 used as working memory, and an analog/digital converter 7.

An acoustic signal input device 8, for example a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures an acoustic signal, such as singing or humming uttered by the user or a musical tone generated by an instrument, converts it into an electric signal, and outputs that electric signal to the analog/digital converter 7.

When processing is commanded from the keyboard input device 4, the CPU 1 starts the transcription processing. By executing the program stored in the main storage device 3, it first stores the acoustic signal, converted into a digital signal by the analog/digital converter 7, in the auxiliary storage device 6, then converts the acoustic signal into musical score data and outputs the data to the display device 5 as necessary.

Next, the transcription processing that the CPU 1 executes after capturing the acoustic signal will be described in detail with reference to the functional-level flowchart of FIG. 5.

First, the CPU 1 performs autocorrelation analysis on the acoustic signal to extract its pitch information for each analysis cycle, computes the sum of squares of the acoustic signal to extract power information, and then executes post-processing such as noise removal and smoothing (steps SP1 and SP2). For the pitch information, the CPU 1 then detects how the values are distributed around the absolute pitch axis, calculates the deviation of the acoustic signal from the absolute pitch axis, and executes a tuning process that shifts the pitch information by that deviation (step SP3). In other words, the pitch information is corrected so that the difference between the pitch axis of the singer or instrument that generated the acoustic signal and the absolute pitch axis becomes small.
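The pitch and power extraction of steps SP1 and SP2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not give the frame length, the search range, or the exact autocorrelation formula, so the function names `frame_power` and `autocorr_pitch` and the parameters `fmin` and `fmax` are hypothetical, and the post-processing and tuning steps are omitted.

```python
def frame_power(frame):
    """Power of one analysis frame as the sum of squared samples."""
    return sum(x * x for x in frame)


def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one analysis frame.

    The lag with the largest (unnormalized) autocorrelation inside the
    assumed search range [fmin, fmax] is taken as the pitch period.
    Returns 0.0 when no positive autocorrelation peak is found.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0
```

Calling both functions once per analysis cycle would yield the pitch and power sequences that the later steps operate on.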

Next, the CPU 1 executes segmentation that finds the continuous periods in which the pitch information can be considered to indicate the same note and cuts the acoustic signal into one segment per note, and also executes segmentation based on changes in the power information (steps SP4 and SP5). From the two sets of segment information thus obtained, the CPU 1 calculates a reference length corresponding to the duration of a quarter note, an eighth note, or the like, and executes segmentation again on the basis of this reference length (step SP6).

Based on the pitch information of each segment obtained in this way, the CPU 1 identifies the pitch of the segment as the pitch on the absolute pitch axis to which the pitch information in the segment is judged to be closest, and then executes segmentation again according to whether consecutive segments have been identified with the same pitch (steps SP7 and SP8).
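A minimal sketch of steps SP7 and SP8 is given below. It assumes, purely for illustration, that pitch values are expressed in semitone units (so that the absolute pitch axis corresponds to integers) and that "closest" is judged from the median of the segment; the patent specifies neither, and the name `identify_and_merge` is hypothetical.

```python
from statistics import median


def identify_and_merge(pitch, boundaries):
    """Assign each segment the nearest semitone and merge equal neighbours.

    pitch      -- per-sampling-point pitch values, assumed in semitone units
    boundaries -- sorted indices at which a new segment starts
    Returns a list of (start, end, semitone) tuples, end exclusive.
    """
    edges = [0] + list(boundaries) + [len(pitch)]
    notes = []
    for start, end in zip(edges, edges[1:]):
        if start == end:
            continue
        # Assumption: the segment's median decides the closest pitch.
        semitone = round(median(pitch[start:end]))
        if notes and notes[-1][2] == semitone:
            notes[-1][1] = end  # same pitch as previous segment: merge
        else:
            notes.append([start, end, semitone])
    return [tuple(note) for note in notes]
```

With this sketch, two neighbouring segments that round to the same semitone collapse into one note, which mirrors the re-segmentation described for step SP8.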

The CPU 1 then computes the sum of products of the frequency of occurrence of each pitch, as captured by the pitch information of the segments, and predetermined weighting coefficients that depend on the key, and determines the key of the input piece, for example C major or A minor, from the key that maximizes this sum of products. For identified pitches whose neighboring notes in the determined key are a small interval apart, the CPU 1 reviews the pitch against the pitch information to confirm or correct it (steps SP9 and SP10). Next, the CPU 1 reviews the segmentation according to whether consecutive segments have the same finally determined pitch and whether the power changes between consecutive segments, and performs the final segmentation (step SP11).
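The sum-of-products key decision of step SP9 could look something like the sketch below. The patent does not disclose its weighting coefficients, so the weights used here (2 for the tonic, 1 for other scale notes, 0 otherwise) are an invented placeholder, as are the names `key_weights` and `estimate_key` and the use of the natural minor scale.

```python
MAJOR = (0, 2, 4, 5, 7, 9, 11)
MINOR = (0, 2, 3, 5, 7, 8, 10)  # natural minor, an assumption
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_weights(tonic, scale, tonic_weight=2.0, scale_weight=1.0):
    """Hypothetical key-dependent weights over the 12 pitch classes."""
    weights = [0.0] * 12
    for degree in scale:
        weights[(tonic + degree) % 12] = scale_weight
    weights[tonic] = tonic_weight
    return weights


def estimate_key(pitch_classes):
    """Pick the key whose weights give the largest sum of products
    with the pitch-class occurrence counts."""
    hist = [0] * 12
    for pc in pitch_classes:
        hist[pc % 12] += 1
    best = None
    for mode, scale in (("major", MAJOR), ("minor", MINOR)):
        for tonic in range(12):
            weights = key_weights(tonic, scale)
            score = sum(h * w for h, w in zip(hist, weights))
            if best is None or score > best[0]:
                best = (score, NAMES[tonic] + " " + mode)
    return best[1]
```

The extra tonic weight is what separates relative keys such as C major and A minor, which share the same scale notes; a real system would presumably use tuned coefficients instead.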

Once the pitches and note lengths (segments) have been determined in this way, the CPU 1 extracts measures on the assumptions that the piece starts on the first beat, that the last note of a phrase does not extend into the next measure, that there is a break at every measure, and so on. It determines the time signature from this measure information and the segmentation information, and determines the tempo from the determined time signature and the length of a measure (steps SP12 and SP13).

Finally, the CPU 1 organizes the determined pitch, note length, key, time signature, and tempo information and creates the musical score data (step SP14).

Segmentation based on pitch information

Next, the segmentation process based on pitch information in this automatic transcription system (see step SP4) will be described in detail using the flowcharts of FIGS. 1 and 2.

FIG. 1 is a flowchart showing this process at the functional level, and FIG. 2 is a flowchart showing the process of FIG. 1 in more detail.

The CPU 1 calculates the run length from the obtained pitch information for every sampling point, one per analysis cycle (step SP20). As shown in FIG. 3, the run length is the continuous period RUN during which the pitch information stays within a narrow predetermined range R1 that is symmetric about the pitch information of the point of interest P1. An acoustic signal produced by a singer or the like is intended to hold a constant pitch for each of a series of periods. Although the signal fluctuates, the pitch information is therefore expected to vary only within a narrow range while the same pitch is intended, so the run length RUN serves as a guide for capturing periods of the same pitch.
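The run length of step SP20 can be illustrated with a short sketch. It assumes that the range R1 is given as a half-width around the pitch of the point of interest and that the run is counted in sampling points, extending in both directions from that point; the name `run_lengths` and the parameter `half_width` are hypothetical.

```python
def run_lengths(pitch, half_width):
    """Run length for every sampling point.

    For each point t, count the contiguous sampling points around t
    (t included) whose pitch stays within +/- half_width of pitch[t].
    """
    n = len(pitch)
    runs = []
    for t in range(n):
        left = t
        while left > 0 and abs(pitch[left - 1] - pitch[t]) <= half_width:
            left -= 1
        right = t
        while right < n - 1 and abs(pitch[right + 1] - pitch[t]) <= half_width:
            right += 1
        runs.append(right - left + 1)
    return runs
```

For a pitch sequence such as `[60, 60, 60, 62, 64, 64, 64]` with a half-width of 0.5, the points inside each held note get a run length of 3, while the transitional point at 62 gets a run length of 1.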

Next, the CPU 1 finds the sections made up of consecutive sampling points whose run lengths are equal to or greater than a predetermined value (step SP21). This removes the influence of changes in the pitch information. The CPU 1 then extracts, for each section found, the sampling point having the maximum run length as a representative point (step SP22).
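Steps SP21 and SP22 can be sketched as a single pass over the run lengths. The function name `representative_points` and the parameter `min_run` are hypothetical, and taking the first point when several share the maximum run length is an assumption, since the patent does not say how ties are resolved.

```python
def representative_points(runs, min_run):
    """Indices of the representative point of each stable section.

    A section is a maximal stretch of consecutive sampling points whose
    run length is at least min_run; its representative is the point with
    the largest run length (first one on ties, by assumption).
    """
    reps = []
    t, n = 0, len(runs)
    while t < n:
        if runs[t] < min_run:
            t += 1
            continue
        start = t
        while t < n and runs[t] >= min_run:
            t += 1
        # The section covers start .. t-1.
        reps.append(max(range(start, t), key=lambda i: runs[i]))
    return reps
```

Points with short run lengths, such as transitions between notes or isolated noise, never belong to a section and so can never become representatives.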

Finally, when the pitch information of two adjacent representative points differs (in pitch) by a predetermined value or more, the CPU 1 obtains the amount of change in the pitch information at each sampling point between the representative points and segments the acoustic signal at the sampling point where the change is largest (step SP23).
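Step SP23 might be sketched as below. The patent does not define the "amount of change" at a sampling point, so this sketch assumes it is the absolute difference from the preceding sampling point; `segment_boundaries` and `min_diff` are hypothetical names.

```python
def segment_boundaries(pitch, reps, min_diff):
    """Boundary indices between adjacent representative points.

    For each adjacent pair (s, t) whose pitches differ by at least
    min_diff, the boundary is the point in s+1 .. t with the largest
    change from its predecessor (an assumed definition of "change").
    """
    boundaries = []
    for s, t in zip(reps, reps[1:]):
        if abs(pitch[t] - pitch[s]) < min_diff:
            continue  # same note on both sides: no boundary
        cut = max(range(s + 1, t + 1),
                  key=lambda i: abs(pitch[i] - pitch[i - 1]))
        boundaries.append(cut)
    return boundaries
```

With `pitch = [60, 60, 60, 61, 64, 64, 64]`, representatives `[0, 4]`, and `min_diff = 1.0`, the largest step is the jump from 61 to 64, so the boundary is placed at index 4.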

Segmentation based on pitch information can thus be executed without being affected by fluctuations of the acoustic signal or by sudden external sounds.

This process will now be described in more detail with reference to FIG. 2.

First, the CPU 1 calculates the run length run(t) for every sampling point t (t = 0 to N), one per analysis cycle (step SP30).

Next, after clearing to 0 the parameter t that indicates the sampling point being processed, the CPU 1 confirms that processing has not finished for all sampling points and judges whether the run length run(t) of sampling point t is smaller than a threshold θr (steps SP31 to SP33). If it judges that the run length run(t) is insufficient, it increments the parameter t and returns to step SP32 (step SP34).

As this process is repeated, a sampling point whose run length run(t) is not smaller than the threshold θr eventually comes up for processing, and a negative result is obtained in step SP33. The CPU 1 then stores the parameter t as a parameter s, marking the start point at which the run length run(t) became equal to or greater than the threshold θr. It then confirms that processing has not finished for all sampling points and judges whether the run length run(t) of sampling point t is smaller than the threshold θr (steps SP35 to SP37). If it judges that the run length run(t) is sufficient, it increments the parameter t and returns to step SP36 (step SP38).

As this process is repeated, a sampling point whose run length run(t) is smaller than the threshold θr eventually comes up for processing, and an affirmative result is obtained in step SP37. A continuous section in which the run length run(t) is equal to or greater than the threshold θr, namely the section from the marked point s to the preceding sampling point t−1, has thus been detected, and the CPU 1 marks as a representative point the sampling point in this section that gives the maximum run length (step SP39). When this is done, the CPU 1 returns to step SP32 and searches for the next continuous section in which the run length run(t) is equal to or greater than the threshold θr.

When all sampling points have been processed in this way, and the detection of continuous sections with run length run(t) equal to or greater than the threshold θr and the marking of representative points are complete, the CPU 1 clears the parameter t to 0 again, confirms that processing has not finished for all sampling points, and judges whether sampling point t is marked as a representative point (steps SP40 to SP42). If it is not marked, the CPU 1 increments the parameter t and returns to step SP41 (step SP43).

As this process is repeated, a marked sampling point eventually comes up for processing and the first representative point is found. The CPU 1 stores this value t as the parameter s to mark it, increments the parameter t, confirms that processing has not finished for all sampling points, and judges whether sampling point t is marked as a representative point (steps SP44 to SP47). If it is not marked, the CPU 1 increments the parameter t and returns to step SP44 (step SP48).

As this process is repeated, a marked sampling point eventually comes up for processing and the next representative point t is found. The CPU 1 then judges whether the difference between the pitch information of the adjacent representative points s and t is smaller than a threshold θp. If it is smaller, the CPU 1 returns to step SP44 and proceeds to find the next pair of adjacent representative points. If the difference is equal to or greater than the threshold θp, the CPU 1 obtains the amount of change in the pitch information at each sampling point from s to t between the representative points and places a segment mark on the sampling point with the largest change (steps SP49 to SP51).

By repeating this process, segment marks are placed one after another between representative points, until an affirmative result is obtained in step SP46 and the process ends.
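The whole procedure of FIG. 2 can be condensed into one function, sketched below with the thresholds named after the text (`theta_r`, `theta_p`). This is an approximation rather than a step-for-step transcription of the flowchart: the step labels in the comments are only rough groupings, `half_width` and the first-difference definition of "change" are assumptions, and the function name `segment_by_pitch` is hypothetical.

```python
def segment_by_pitch(pitch, half_width, theta_r, theta_p):
    """Return the indices of the sampling points that receive a segment mark."""
    n = len(pitch)

    # Roughly SP30: run length run(t) for every sampling point.
    run = []
    for t in range(n):
        lo = t
        while lo > 0 and abs(pitch[lo - 1] - pitch[t]) <= half_width:
            lo -= 1
        hi = t
        while hi < n - 1 and abs(pitch[hi + 1] - pitch[t]) <= half_width:
            hi += 1
        run.append(hi - lo + 1)

    # Roughly SP31 to SP39: mark one representative per section
    # in which run(t) >= theta_r.
    is_rep = [False] * n
    t = 0
    while t < n:
        if run[t] < theta_r:
            t += 1
            continue
        s = t  # start of the section
        while t < n and run[t] >= theta_r:
            t += 1
        is_rep[max(range(s, t), key=lambda i: run[i])] = True

    # Roughly SP40 to SP51: compare adjacent representatives s and t
    # and mark the point of largest change when they differ enough.
    marks = []
    s = None
    for t in range(n):
        if not is_rep[t]:
            continue
        if s is not None and abs(pitch[t] - pitch[s]) >= theta_p:
            marks.append(max(range(s + 1, t + 1),
                             key=lambda i: abs(pitch[i] - pitch[i - 1])))
        s = t
    return marks
```

As a small check, take `pitch = [60, 60.1, 59.9, 60, 72, 60.1, 60, 59.9, 60, 63, 64, 64.1, 63.9, 64]` with `half_width=0.5`, `theta_r=3`, and `theta_p=1.0`. The isolated value 72 has a run length of 1 and is never a representative, and the representatives on either side of it have nearly the same pitch, so no mark is placed there. The only mark falls at index 9, where the pitch moves from the first note toward the second.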

According to the embodiment described above, segmentation is executed using the run length, which represents how long the pitch information stays within a narrow range, so segmentation can be performed well even if the acoustic signal fluctuates or contains sudden external sounds.

Other embodiments

In the embodiment described above, segmentation was applied to pitch information obtained by autocorrelation analysis, but the method of extracting the pitch information is of course not limited to this.

In the embodiment described above, the CPU 1 executes all of the processing shown in FIG. 5 according to the program stored in the main storage device 3, but some or all of that processing may be executed by hardware. For example, as shown in FIG. 6, in which parts corresponding to those in FIG. 4 carry the same reference numerals, the acoustic signal from the acoustic signal input device 8 may be amplified by an amplifier circuit 10, passed through a pre-filter 11, and supplied to an analog/digital converter 12 for conversion into a digital signal. A signal processor 13 then performs autocorrelation analysis on the digitized acoustic signal to extract pitch information, computes the sum of squares to extract power information, and supplies both to the software processing system run by the CPU 1. As the signal processor 13 used in such a hardware configuration (10 to 13), a processor that can process voice-band signals in real time and that provides interface signals for the host CPU 1 (for example, the μPD7720 manufactured by NEC Corporation) can be used.

[Effects of the Invention]

As described above, according to the present invention, the length of the run over which the pitch information stays within a certain pitch width is counted for each sampling point, one per analysis cycle; continuous sections in which the run length is equal to or greater than a predetermined value are detected; for each detected section, the sampling point having the maximum run length is extracted as a representative point; and, when the pitch information of two adjacent representative points differs by a predetermined value or more, the amount of change in the pitch information is obtained at each sampling point between them and the acoustic signal is segmented at the sampling point where the change is largest. Segmentation can therefore be performed well even if the acoustic signal fluctuates or contains sudden external sounds, and accurate final musical score data can be obtained.

[Brief description of the drawings]

FIGS. 1 and 2 are flowcharts showing the segmentation process based on pitch information according to one embodiment of the present invention; FIG. 3 is a schematic diagram explaining the run length; FIG. 4 is a block diagram showing the configuration of an automatic transcription system to which the present invention is applied; FIG. 5 is a flowchart showing its automatic transcription processing; and FIG. 6 is a block diagram showing another configuration of the automatic transcription system.

1: CPU; 3: main storage device; 6: auxiliary storage device; 7: analog/digital converter; 8: acoustic signal input device.

Continued from the front page

(72) Inventor: Masaki Fujimoto, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo
(72) Inventor: Masanori Mizuno, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo
Examiner: Shigeo Arai

Claims

(57) [Claims]

1. An automatic music transcription method for converting an acoustic signal into musical score data, including at least a process of extracting pitch information from an input acoustic signal and a segmentation process of dividing the acoustic signal, on the basis of the extracted pitch information, into sections that can each be regarded as having the same pitch, characterized in that the segmentation process comprises: counting, for each sampling point, the length of the run over which the pitch information stays within a certain pitch width; detecting continuous sections in which the counted run length is equal to or greater than a predetermined value; extracting, for each detected section, the sampling point having the maximum run length as a representative point; and, when the pitch information of two adjacent representative points differs by a predetermined value or more, dividing the acoustic signal at the sampling point where the amount of change in the pitch information between the representative points is largest.

2. An automatic music transcription apparatus for converting an acoustic signal into musical score data, including at least pitch extracting means for extracting pitch information from an input acoustic signal and segmentation means for dividing the acoustic signal, on the basis of the extracted pitch information, into sections that can each be regarded as having the same pitch, characterized in that the segmentation means comprises: a run length counting unit that counts, for each sampling point, the length of the run over which the pitch information stays within a certain pitch width; a section detection unit that detects continuous sections in which the counted run length is equal to or greater than a predetermined value; a representative point extraction unit that extracts, for each detected section, the sampling point having the maximum run length as a representative point; and an acoustic signal dividing unit that, when the pitch information of two adjacent representative points differs by a predetermined value or more, divides the acoustic signal at the sampling point where the amount of change in the pitch information between the representative points is largest.