JPH01219627A

JPH01219627A - Automatic score taking method and apparatus

Info

Publication number: JPH01219627A
Application number: JP4611788A
Authority: JP
Inventors: Shichiro Tsuruta; 鶴田　七郎; Hironori Takashima; 洋典高島; Masaki Fujimoto; 正樹藤本; Masanori Mizuno; 水野　正典
Original assignee: NIPPON DENKI GIJUTSU JOHO SYST KAIHATSU KK; NEC Home Electronics Ltd; NEC Corp; Nippon Electric Co Ltd
Current assignee: NIPPON DENKI GIJUTSU JOHO SYST KAIHATSU KK; NEC Home Electronics Ltd; NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1989-09-01
Anticipated expiration: 2012-04-30
Also published as: JP2604404B2

Abstract

PURPOSE: To enhance the accuracy of segmentation by first segmenting an input acoustic signal into sections each regarded as a single sound, then extracting a reference length from the section lengths and subdividing every section whose length is equal to or greater than the reference length. CONSTITUTION: In response to a command from a keyboard 4, a CPU 1 stores a digital acoustic signal, such as singing, obtained through an acoustic signal input apparatus 8 and an A/D converter 7, in an auxiliary memory device 6 used as a working memory, and executes a program held in a main memory device 3. The pitch data of the acoustic signal is extracted, and segmentation processing divides the acoustic signal into sections each regarded as the same sound. Subsequently, a reference length is extracted on the basis of the section lengths, and sections equal to or longer than the reference length are subdivided on the basis of the reference length. This excludes sections that contain two or more sounds because of fluctuation of the power data, enhances the accuracy of segmentation, and allows highly accurate score transcription.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to an automatic score transcription method and apparatus for creating musical score data from acoustic signals such as singing voices, humming, and musical instrument sounds, and in particular to segmentation processing that divides an acoustic signal into sections that can be regarded as having the same pitch.

[Prior Art] An automatic score transcription system that converts acoustic signals such as singing voices, humming, and musical instrument sounds into musical score data must detect, from the acoustic signal, the basic information of a score: note length, pitch, key, time signature, and tempo.

An acoustic signal, however, is merely a signal that continuously contains repetitions of a basic waveform, and the information listed above cannot be obtained from it directly.

Therefore, in the conventional automatic score transcription system, repetition information of the basic waveform of the acoustic signal (hereinafter referred to as pitch information) and power information are first extracted for each analysis period. The acoustic signal is then divided, on the basis of at least the extracted power information, into sections (segments) that can be regarded as having the same pitch (this processing is called segmentation). Next, from the pitch information of each segment, the pitch of the acoustic signal in that segment is identified as a pitch on the absolute pitch axis; the key of the acoustic signal is determined on the basis of the distribution of the pitch information; and the time signature and tempo of the acoustic signal are determined on the basis of the segments. The information was obtained in this order.

Accordingly, since the pitch, time signature, tempo, and so on are determined on the basis of the segments (note lengths), the segmentation result is particularly important in creating musical score data.

[Problems to be Solved by the Invention] However, the power information and/or pitch information used to perform segmentation fluctuates, so two or more notes were sometimes placed in a single segment.

As described above, segmentation is an important element in creating musical score data. If the accuracy of segmentation is low, the identified pitches are also likely to be wrong, and the accuracy of the musical score data finally obtained becomes markedly lower. It is therefore desirable to improve the accuracy of the segmentation processing itself, both when the final segmentation is derived from the pitch-based and power-based segmentation results together and when it is derived from the power information alone.

The present invention has been made in view of the above points, and aims to provide an automatic score transcription method and apparatus that can improve the accuracy of segmentation and thereby further improve the accuracy of the final musical score data.

[Means for Solving the Problems] To solve these problems, the first invention is an automatic score transcription method that converts an acoustic signal into musical score data and includes at least: processing that extracts pitch information, which is the repetition period of the input acoustic signal waveform and represents the pitch, and power information of the acoustic signal; first segmentation processing that divides the acoustic signal into sections that can be regarded as having the same pitch on the basis of the pitch information and/or the power information; pitch identification processing that identifies, on the basis of the pitch information, the pitch of each section as a pitch on the absolute pitch axis; and second segmentation processing that, on the basis of the identified pitches and the power information, detects cases in which a single sound has been divided into a plurality of sections and joins those consecutive sections into one section. In this method, reference length extraction processing, which extracts a reference length corresponding to the time length of a predetermined note on the basis of the lengths of the sections produced by the first segmentation processing, and section subdivision processing, which subdivides sections of a predetermined length or more on the basis of the extracted reference length, are provided as preprocessing for the pitch identification processing.

The second invention is an automatic score transcription apparatus that converts an acoustic signal into musical score data and includes, as part of its configuration: means for extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents the pitch, and power information of the acoustic signal; first segmentation means for dividing the acoustic signal into sections that can be regarded as having the same pitch on the basis of the pitch information and/or the power information; pitch identification means for identifying, on the basis of the pitch information, the pitch of each section as a pitch on the absolute pitch axis; and second segmentation means for detecting, on the basis of the identified pitches and the power information, cases in which a single sound has been divided into a plurality of sections and joining those consecutive sections into one section. In this apparatus, reference length extraction means, which extracts a reference length corresponding to the time length of a predetermined note on the basis of the lengths of the sections produced by the first segmentation means, and section subdivision means, which subdivides sections of a predetermined length or more on the basis of the extracted reference length, are provided in a stage preceding the pitch identification means.

[Operation] The first invention recognizes that even when the first segmentation processing divides the signal into sections each thought to be a single sound, a section may contain two or more notes because the pitch information and/or power information on which segmentation is based fluctuates. A reference length corresponding to the time length of a predetermined note is therefore extracted on the basis of the section lengths, and sections of a predetermined length or more are subdivided on the basis of that reference length. Sections containing two or more notes are thereby eliminated. Note that sections that have been divided into two or more although a single note was intended are joined by the second segmentation processing.

The second invention focuses on the same point: the reference length extraction means extracts the reference length, and the subdivision means subdivides sections of a predetermined length or more on the basis of that reference length.

[Embodiment] An embodiment of the present invention will now be described in detail with reference to the drawings.

Automatic Score Transcription System

First, the automatic score transcription system to which the present invention is applied will be described.

In FIG. 3, a central processing unit (CPU) 1 controls the entire apparatus and executes the score transcription program shown in FIG. 4, which is stored in a main storage device 3 connected via a bus 2. In addition to the CPU 1 and the main storage device 3, a keyboard 4 serving as an input device, a display device 5 serving as an output device, an auxiliary storage device 6 used as a working memory, and an analog/digital converter 7 are connected to the bus 2.

An acoustic signal input device 8, for example a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures acoustic signals such as singing or humming uttered by the user or musical tones produced by an instrument, converts them into an electrical signal, and outputs that electrical signal to the analog/digital converter 7.

When processing is commanded through the keyboard input device 4, the CPU 1 starts the score transcription processing. Executing the program stored in the main storage device 3, it first stores the acoustic signal, converted into a digital signal by the analog/digital converter 7, in the auxiliary storage device 6, then converts the stored acoustic signal into musical score data and outputs the data to the display device 5 as required.

Next, the score transcription processing that the CPU 1 executes after capturing the acoustic signal will be described in detail with reference to the flowchart of FIG. 4, which shows the processing at the functional level.

First, the CPU 1 performs autocorrelation analysis on the acoustic signal to extract pitch information for each analysis period, performs sum-of-squares processing to extract power information for each analysis period, and then carries out post-processing such as noise removal and smoothing (steps SP1, SP2). Next, on the basis of the distribution of the pitch information, the CPU 1 calculates the amount by which the pitch axis of the acoustic signal deviates from the absolute pitch axis and performs tuning processing that shifts the obtained pitch information by that amount (step SP3). In other words, the pitch information is corrected so that the difference between the pitch axis of the singer or instrument that produced the acoustic signal and the absolute pitch axis becomes small.
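
The extraction in steps SP1 and SP2 can be sketched as follows. This is a minimal illustration only: the function names, the lag search range, and the use of an unnormalized autocorrelation are assumptions, and the noise removal and smoothing mentioned above are omitted.

```python
import math

def frame_power(frame):
    """Power of one analysis frame as the sum of squared samples."""
    return sum(x * x for x in frame)

def frame_pitch_period(frame, min_lag, max_lag):
    """Pitch period in samples: the lag that maximizes the autocorrelation.

    min_lag and max_lag bound the search and are hypothetical parameters.
    """
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag

# Synthetic frame: a sine wave that repeats every 50 samples.
frame = [math.sin(2 * math.pi * n / 50) for n in range(400)]
period = frame_pitch_period(frame, min_lag=20, max_lag=80)
power = frame_power(frame)
```

The returned period is a repetition period, matching the way the text defines pitch information; converting it to a frequency would additionally require the sampling rate.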

Next, the CPU 1 performs segmentation that cuts the acoustic signal into one segment per note by finding runs of consecutive pitch information regarded as indicating the same pitch, and also performs segmentation on the basis of changes in the obtained power information (steps SP4, SP5). On the basis of both sets of segment information, the CPU 1 calculates a reference length corresponding to the time length of a quarter note, an eighth note, or the like, and performs a more detailed segmentation based on this reference length (step SP6).
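
As an illustration of the pitch-based segmentation of step SP4, the sketch below groups consecutive analysis periods whose pitch stays close to the pitch at the start of the run. The tolerance parameter and the rule of comparing against the first value of the run are assumptions; the text does not specify how "the same pitch" is decided.

```python
def segment_by_pitch(pitches, tolerance):
    """Split per-period pitch values into runs regarded as one pitch.

    Returns (start_index, length) pairs. A new segment starts when a value
    differs from the first value of the current run by more than `tolerance`
    (a hypothetical threshold).
    """
    segments = []
    start = 0
    for i in range(1, len(pitches) + 1):
        if i == len(pitches) or abs(pitches[i] - pitches[start]) > tolerance:
            segments.append((start, i - start))
            start = i
    return segments

# Pitch values here are in semitone units purely for illustration.
segments = segment_by_pitch([60.0, 60.1, 59.9, 62.0, 62.1, 64.0], tolerance=0.5)
```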

On the basis of the pitch information of each segment obtained in this way, the CPU 1 identifies the pitch of the segment as the pitch on the absolute pitch axis judged closest to that pitch information, and then reviews the segmentation on the basis of whether the identified pitches of consecutive segments are the same (steps SP7, SP8).
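
The pitch identification of step SP7 might be sketched as below. The sketch assumes that the absolute pitch axis is the equal-tempered semitone scale with A4 = 440 Hz, that pitches are expressed as MIDI note numbers, and that a segment is represented by the mean of its frequency values; none of these details is stated in the text.

```python
import math

def identify_pitch(frequencies_hz):
    """Nearest equal-tempered note (MIDI number) for one segment.

    Assumes A4 = 440 Hz = MIDI note 69 and uses the mean frequency of the
    segment as its representative value (both are assumptions).
    """
    mean_hz = sum(frequencies_hz) / len(frequencies_hz)
    return round(69 + 12 * math.log2(mean_hz / 440.0))

note_a4 = identify_pitch([438.0, 441.0, 442.0])
note_c4 = identify_pitch([261.0, 262.5])
```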

The CPU 1 then computes the sum of products of the occurrence frequencies of the pitches, obtained by tallying the tuned pitch information, and predetermined weighting coefficients defined for each key, and on the basis of the maximum of these sums determines the key of the piece in the input acoustic signal, for example C major or A minor. For predetermined pitches of the scale in the determined key, it re-examines the pitch information to confirm and correct the identified pitch (steps SP9, SP10). Next, the CPU 1 reviews the segmentation on the basis of whether consecutive segments have the same finally determined pitch and whether the power changes between consecutive segments, and joins segments where necessary to obtain the final segmentation (step SP11).
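
The key determination of step SP9 can be illustrated with a small sum-of-products search. The weighting coefficients used here (2 for the tonic, 1 for the other scale notes, 0 elsewhere) and the use of the natural minor scale are placeholders chosen for the example; the text only says that the coefficients are predetermined for each key.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor, an assumption
NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

def key_weights(tonic, steps):
    """Hypothetical weights: 2 for the tonic, 1 for other scale notes."""
    weights = [0] * 12
    for step in steps:
        weights[(tonic + step) % 12] = 1
    weights[tonic] = 2
    return weights

def determine_key(histogram):
    """Key whose weights give the largest sum of products with the histogram.

    histogram[i] is the occurrence count of pitch class i (0 = C).
    """
    best_name, best_score = None, float("-inf")
    for tonic in range(12):
        for mode, steps in (("major", MAJOR_STEPS), ("minor", MINOR_STEPS)):
            weights = key_weights(tonic, steps)
            score = sum(h * w for h, w in zip(histogram, weights))
            if score > best_score:
                best_name, best_score = f"{NOTE_NAMES[tonic]} {mode}", score
    return best_name

# Counts for C, D, E, F, G, A, B with C occurring most often.
key = determine_key([8, 0, 4, 0, 5, 3, 0, 6, 0, 3, 0, 2])
```

Weighting the tonic more heavily is what lets this sketch separate C major from A minor, which share the same seven notes.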

Once the pitches and segments have been determined in this way, the CPU 1 extracts the measures on the assumptions that the piece starts on the first beat, that the last note of a phrase does not straddle the next measure, that there is a break at each measure, and so on. It determines the time signature from this measure information and the segmentation information, and determines the tempo from the determined time signature and the measure length (steps SP12, SP13).
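
The final step of SP13, deriving a tempo from the time signature and the measure length, reduces to simple arithmetic. The sketch below assumes that the measure length is counted in analysis periods and that the duration of one analysis period is known; the parameter names and the example values are illustrative only.

```python
def tempo_bpm(beats_per_measure, measure_length, analysis_period_s):
    """Beats per minute from the duration of one measure.

    measure_length is in analysis periods and analysis_period_s is the
    duration of one period in seconds (both hypothetical parameters).
    """
    measure_seconds = measure_length * analysis_period_s
    return beats_per_measure * 60.0 / measure_seconds

# Four beats in a measure of 480 periods of 5 ms each (2.4 s per measure).
bpm = tempo_bpm(4, 480, 0.005)
```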

Finally, the CPU 1 organizes the determined pitch, note length, key, time signature, and tempo information and creates the musical score data (step SP14).

Review of Segmentation

Next, the segmentation review processing in this automatic score transcription system (see step SP6) will be described in detail with reference to the flowchart of FIG. 1.

This review processing is provided because, if a segment has mistakenly been formed so as to contain two or more notes, the pitch identified for it is also very likely to be wrong, which lowers the accuracy of the musical score data. The segments are therefore subdivided before pitch identification, and pitch identification is performed on the subdivided segments so as to improve its accuracy.

In this case a single note may be divided into two or more segments, but this causes no problem because the segmentation processing of step SP11 described above joins segments regarded as a single note on the basis of the identified pitch and the power information.
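
The joining performed in step SP11 could look like the sketch below. Each segment is represented as a hypothetical tuple of length, identified pitch, and a flag saying whether a power change was detected at its start; the text does not define how a power change is detected, so the flag is taken as given.

```python
def merge_same_notes(segments):
    """Join consecutive segments that appear to belong to one note.

    segments: list of (length, pitch, power_change_at_start) tuples.
    Two neighbours are merged when they have the same identified pitch and
    no power change was detected at the start of the second one.
    """
    merged = []
    for length, pitch, power_change in segments:
        if merged and merged[-1][1] == pitch and not power_change:
            prev_length, prev_pitch, prev_change = merged[-1]
            merged[-1] = (prev_length + length, prev_pitch, prev_change)
        else:
            merged.append((length, pitch, power_change))
    return merged

notes = merge_same_notes([
    (61, 60, True),   # first half of a held note
    (60, 60, False),  # same pitch, no new attack: merged with the previous one
    (60, 62, True),
    (59, 62, True),   # same pitch but a new attack: kept as a separate note
])
```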

In this segmentation review processing, the CPU 1 first confirms that the segment being processed is not the last segment, and then matches the length of that segment against the overall segmentation result (steps SP20, SP21).

Here, matching means obtaining two values: the sum of the absolute values of the differences between the length of each other segment and an integer fraction or integer multiple of the length of the segment being processed, and the number of times an integer fraction or integer multiple of the length of the segment being processed fails to coincide with the length of another segment (the number of mismatches). In this embodiment, the other segments used for matching are both the segments obtained from the pitch information and the segments obtained from the power information.

For example, suppose the preceding segmentation processing (steps SP4, SP5) has produced the ten segments shown in FIG. 2 and the first segment S1 is being processed. The matching then yields 1 + 3 + 1 + 1 + 5 + 0 + 0 + 1 + 9 = 21 as the sum of differences and 7 as the number of mismatches.
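
The matching can be sketched as follows. FIG. 2 is not reproduced here, so the segment lengths in the example are hypothetical values chosen to be consistent with the figures quoted in the text (the lengths of S5, S6, S9, and S10 are the sums of the parts given later; the others are assumed). The sketch also matches against a single list of segments, whereas the embodiment uses both the pitch-based and the power-based segments.

```python
def match_distance(base, other):
    """Smallest |c - other| over candidates c = base * k or base / k (k >= 1)."""
    k_mul = max(1, round(other / base)) + 1
    k_div = max(1, round(base / other)) + 1
    candidates = [base * k for k in range(1, k_mul + 1)]
    candidates += [base / k for k in range(1, k_div + 1)]
    return min(abs(c - other) for c in candidates)

def matching(lengths, index):
    """Return (sum of differences, number of mismatches) for one segment."""
    base = lengths[index]
    distances = [match_distance(base, other)
                 for j, other in enumerate(lengths) if j != index]
    return sum(distances), sum(1 for d in distances if d > 0)

# Hypothetical lengths for S1..S10, consistent with the values in the text.
lengths = [60, 61, 63, 59, 121, 125, 60, 60, 119, 231]
difference_sum, mismatches = matching(lengths, 0)
```

With these assumed lengths the sketch gives a difference sum of 21 and 7 mismatches for S1, the same values as in the example above.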

When the number of mismatches and their degree (the sum of differences) have been obtained for the segment being processed, the CPU 1 stores them in the auxiliary storage device 6, takes the next segment as the segment to be processed, and returns to step SP20 (step SP22).

By repeating the processing loop of steps SP20 to SP22, the number of mismatches and their degree are obtained for all segments, and eventually an affirmative result is obtained at step SP20. The CPU 1 then determines the reference length from the segment length for which the number of mismatches and their degree, as stored in the auxiliary storage device 6, are smallest (step SP24). Here, the reference length is the time length corresponding to a quarter note (or an eighth note).

In the example of FIG. 2, "60" is extracted as the segment length with the smallest number of mismatches and degree, and "120", twice this length, is selected as the reference length. In practice, the time length of a quarter note can only take values within a predetermined range, and for this reason "120" rather than "60" is extracted as the reference length.
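
The selection in step SP24 might be sketched as below. Ranking candidates by mismatch count first and difference sum second, and scaling by factors of two until the value falls inside an allowed quarter-note range, are assumptions; the text only says that the smallest values are used and that a quarter note must lie within a predetermined range. The range limits in the example are invented.

```python
def choose_reference_length(scores, quarter_min, quarter_max):
    """Pick a reference length from per-segment matching results.

    scores: (segment_length, mismatch_count, difference_sum) triples.
    quarter_min, quarter_max: hypothetical limits on the length that a
    quarter note may take.
    """
    length, _, _ = min(scores, key=lambda s: (s[1], s[2]))
    while length < quarter_min:
        length *= 2
    while length > quarter_max:
        length /= 2
    return length

# Hypothetical matching results; only the first triple comes from the text.
reference = choose_reference_length(
    [(60, 7, 21), (63, 9, 40), (125, 9, 35)], quarter_min=90, quarter_max=180)
```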

Having extracted the reference length, the CPU 1 subdivides each segment that is roughly longer than the reference length into parts of about half the reference length, and ends the segmentation review processing (step SP25).

In the example of FIG. 2, the fifth segment S5 is subdivided into "61" and "60", the sixth segment S6 into "63" and "62", the ninth segment S9 into "60" and "59", and the tenth segment S10 into "58", "58", "58", and "57".
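
One way to obtain splits like those quoted for FIG. 2 is to divide each long segment into nearly equal parts of about half the reference length. The sketch below does this under two assumptions: "roughly longer than the reference length" is modelled by a tolerance factor of 0.95, and any remainder is given to the earlier parts. The lengths of S5, S6, S9, and S10 are inferred from the sums of their parts, and the remaining lengths are hypothetical.

```python
def subdivide(lengths, reference_length, tolerance=0.95):
    """Split segments about as long as the reference length or longer.

    Each such segment is cut into nearly equal parts of roughly half the
    reference length. `tolerance` is a hypothetical factor for "roughly".
    """
    unit = reference_length / 2
    result = []
    for length in lengths:
        if length < tolerance * reference_length:
            result.append(length)
            continue
        parts = max(2, int(length / unit + 0.5))
        base, extra = divmod(length, parts)
        # The first `extra` parts are one unit longer than the rest.
        result.extend([base + 1] * extra + [base] * (parts - extra))
    return result

lengths = [60, 61, 63, 59, 121, 125, 60, 60, 119, 231]
subdivided = subdivide(lengths, reference_length=120)
```

With a reference length of 120, the four long segments come out as 61 and 60, 63 and 62, 60 and 59, and 58, 58, 58, and 57, which agrees with the example in the text.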

Thus, according to the embodiment described above, even when two or more notes have been segmented as a single segment, the segment can be subdivided, so that pitch identification, pitch correction, and similar processing can be carried out accurately.

In addition, because processing that joins segments regarded as the same note is provided as post-processing, a single note does not remain mistakenly divided into two or more segments.

Other Embodiments

In the embodiment described above, the reference length is extracted on the basis of the number of mismatches and their degree, but it may instead be extracted on the basis of the frequency of occurrence of segment lengths.
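
A frequency-based variant could be sketched with a simple histogram of segment lengths. The bin width, the use of the mean of the most populated bin, and the example lengths are all assumptions; the text does not describe how the frequency of occurrence would be evaluated.

```python
from collections import Counter

def most_frequent_length(lengths, bin_width):
    """Mean length of the most populated histogram bin.

    bin_width is a hypothetical parameter that groups similar lengths.
    """
    bins = Counter(round(length / bin_width) for length in lengths)
    best_bin, _ = bins.most_common(1)[0]
    members = [l for l in lengths if round(l / bin_width) == best_bin]
    return sum(members) / len(members)

# Hypothetical segment lengths; six of the ten fall near 60.
typical = most_frequent_length(
    [60, 61, 63, 59, 121, 125, 60, 60, 119, 231], bin_width=10)
```

The resulting value would still have to be scaled into the range allowed for a quarter note, as in the mismatch-based embodiment.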

Also, in the embodiment described above, the time length corresponding to a quarter note is used as the reference length, but the time length corresponding to an eighth note may be used instead. In that case, segments are subdivided by the reference length itself rather than by half the reference length.

Further, the embodiment described above is applied to a system that has segmentation based on both pitch information and power information, but the present invention can be applied to any automatic score transcription system that has at least segmentation processing based on power information.

Furthermore, in the embodiment described above, the CPU 1 executes all of the processing shown in FIG. 4 in accordance with the program stored in the main storage device 3, but part or all of the processing may be executed by hardware. For example, as shown in FIG. 5, in which parts corresponding to those in FIG. 3 carry the same reference numerals, the acoustic signal from the acoustic signal input device 8 may be amplified by an amplifier circuit 10, passed through a prefilter 11, and supplied to an analog/digital converter 12 for conversion into a digital signal; a signal processor 13 may then perform autocorrelation analysis on the digital acoustic signal to extract the pitch information and sum-of-squares processing to extract the power information, and supply both to the software processing system of the CPU 1. As the signal processor 13 used in such a hardware configuration (10 to 13), a processor that can process audio-band signals in real time and provides interface signals for the host CPU 1 (for example, the μPD7720 made by NEC Corporation) can be used.

[Effects of the Invention] As described above, according to the present invention, a reference length corresponding to a quarter note or an eighth note is extracted and the segments already obtained are subdivided on the basis of this reference length. The accuracy of subsequent processing such as pitch identification can therefore be further improved, and the accuracy of the final musical score data can be improved.

[Brief Description of the Drawings]

FIG. 1 is a flowchart showing the segmentation review processing according to an embodiment of the present invention, FIG. 2 is a schematic diagram used to explain that review processing, FIG. 3 is a block diagram showing the configuration of an automatic score transcription system to which the present invention is applied, FIG. 4 is a flowchart showing the automatic score transcription procedure, and FIG. 5 is a block diagram showing another configuration of the automatic score transcription system.

1: CPU, 3: main storage device, 6: auxiliary storage device, 7: analog/digital converter, 8: acoustic signal input device.

Claims

[Claims]

(1) An automatic score transcription method that converts an acoustic signal into musical score data and includes at least: processing that extracts pitch information, which is the repetition period of the input acoustic signal waveform and represents the pitch, and power information of the acoustic signal; first segmentation processing that divides the acoustic signal into sections that can be regarded as having the same pitch on the basis of the pitch information and/or the power information; pitch identification processing that identifies, on the basis of the pitch information, the pitch of each section as a pitch on the absolute pitch axis; and second segmentation processing that, on the basis of the identified pitches and the power information, detects cases in which a single sound has been divided into a plurality of sections and joins those consecutive sections into one section, the method being characterized in that reference length extraction processing, which extracts a reference length corresponding to the time length of a predetermined note on the basis of the lengths of the sections produced by the first segmentation processing, and section subdivision processing, which subdivides sections of a predetermined length or more on the basis of the extracted reference length, are provided as preprocessing for the pitch identification processing.

(2) An automatic score transcription apparatus that converts an acoustic signal into musical score data and includes, as part of its configuration: means for extracting pitch information, which is the repetition period of the input acoustic signal waveform and represents the pitch, and power information of the acoustic signal; first segmentation means for dividing the acoustic signal into sections that can be regarded as having the same pitch on the basis of the pitch information and/or the power information; pitch identification means for identifying, on the basis of the pitch information, the pitch of each section as a pitch on the absolute pitch axis; and second segmentation means for detecting, on the basis of the identified pitches and the power information, cases in which a single sound has been divided into a plurality of sections and joining those consecutive sections into one section, the apparatus being characterized in that reference length extraction means, which extracts a reference length corresponding to the time length of a predetermined note on the basis of the lengths of the sections produced by the first segmentation means, and section subdivision means, which subdivides sections of a predetermined length or more on the basis of the extracted reference length, are provided in a stage preceding the pitch identification means.