JP2604412B2

JP2604412B2 - Automatic music transcription method and device

Info

Publication number: JP2604412B2
Application number: JP4612888A
Authority: JP
Inventors: 七郎鶴田; 洋典高島; 正樹藤本; 正典水野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1997-04-30
Anticipated expiration: 2012-04-30
Also published as: JPH01219636A

Abstract

PURPOSE: To prevent erroneous segmentation by subdividing each section in which the power data exceeds a threshold value at the rising change points of the power data, and then reviewing the segmentation on the basis of segment length. CONSTITUTION: A CPU 1 compares the power data at each analysis point with a threshold value, treats each section above the threshold as a valid segment and each section below it as an invalid segment, and then subdivides the valid sections at the extracted rising change points. The CPU 1 then counts the length of every segment, valid or invalid, and joins any segment shorter than a predetermined length to the preceding segment to form a single segment. This prevents erroneous segmentation caused by noise components, so that subsequent processing such as pitch identification can be executed properly.

Description

[Detailed Description of the Invention]
[Field of Industrial Application]
The present invention relates to an automatic music transcription method and apparatus that create musical score data from acoustic signals such as singing voices, humming voices, and musical instrument sounds, and more particularly to segmentation processing that divides the acoustic signal into sections that can each be regarded as having the same pitch.

[Prior Art]
An automatic transcription system that converts acoustic signals such as singing voices, humming voices, and musical instrument sounds into musical score data must detect, from the acoustic signal, the basic information of a musical score: note length, pitch, key, time signature, and tempo.

An acoustic signal, however, is merely a signal that contains continuous repetitions of a fundamental waveform, so none of this information can be obtained from it directly.

Conventional automatic transcription systems therefore obtain this information in the following order. First, repetition information of the fundamental waveform, which represents the pitch of the acoustic signal (hereinafter called pitch information), and power information are extracted for each analysis period. Next, the acoustic signal is divided, on the basis of the extracted pitch information and/or power information, into sections (segments) that can each be regarded as having the same pitch; this processing is called segmentation. Then the pitch of the acoustic signal in each segment along the absolute pitch axis is determined from the pitch information of the segment, the key of the acoustic signal is determined from the determined pitch information, and finally the time signature and tempo of the acoustic signal are determined from the segments.

Because pitch, time signature, tempo, and so on are all determined on the basis of the segments (note lengths), segmentation is a particularly important step in creating musical score data.

[Problems to Be Solved by the Invention]
Segmentation is thus a key element in creating musical score data, and low segmentation accuracy markedly lowers the accuracy of the musical score data finally obtained. It is therefore desirable to improve the accuracy of the power-based segmentation itself, both when the final segmentation is derived from the results of pitch-based and power-based segmentation together and when it is derived from the power information alone.

The present invention has been made in view of the above, and aims to provide an automatic transcription method and apparatus that can perform segmentation based on power information satisfactorily and thereby improve the accuracy of the musical score data.

[Means for Solving the Problems]
To solve this problem, the first invention is an automatic transcription method that converts an acoustic signal into musical score data and includes at least processing that extracts power information from the input acoustic signal and segmentation processing that divides the acoustic signal, on the basis of the extracted power information, into sections that can each be regarded as having the same pitch. In this method, the segmentation processing consists of: processing that divides the signal into valid sections, in which the power information is at or above a predetermined value, and invalid sections, in which it is at or below the predetermined value; processing that extracts rising change points of the power information within the valid sections; processing that subdivides the valid sections at the extracted rising change points; processing that extracts the lengths of the valid sections, including the subdivided ones, and of the invalid sections; and processing that joins any section whose extracted length is shorter than a predetermined length to either the preceding or the following section.

The second invention is an automatic transcription apparatus that converts an acoustic signal into musical score data and includes, as part of its configuration, power extraction means that extracts power information from the input acoustic signal and segmentation means that divides the acoustic signal, on the basis of the extracted power information, into sections that can each be regarded as having the same pitch. In this apparatus, the segmentation means comprises: a division processing unit that divides the signal into valid sections, in which the power information is at or above a predetermined value, and invalid sections, in which it is at or below the predetermined value; a change point extraction unit that extracts rising change points of the power information within the valid sections; a subdivision unit that subdivides the valid sections at the extracted rising change points; a section length extraction unit that extracts the lengths of the valid sections, including the subdivided ones, and of the invalid sections; and a section correction unit that joins any section whose extracted length is shorter than a predetermined length to either the preceding or the following section.

[Operation]
In the first invention, the acoustic signal is divided into valid sections, in which the power information is at or above a predetermined value, and invalid sections, in which it is at or below the predetermined value, so as to reduce the influence of noise and of fluctuations in the power of the acoustic signal. Because a valid section may contain two or more notes, and because power also rises at the transition from one note to the next, the rising change points of the power are extracted and the valid sections are subdivided at those points. Furthermore, a section produced by an erroneous division due to noise or the like is short, so sections shorter than a predetermined length are extracted and joined to the preceding or following section to form a single section.

Likewise, in the second invention, the division processing unit divides the signal into valid sections, in which the power information is at or above a predetermined value, and invalid sections, in which it is at or below the predetermined value, so that the result is not affected by power fluctuations or noise. The change point extraction unit extracts the rising change points of the power within the valid sections, and the subdivision unit subdivides the valid sections at those points so that no section contains two or more notes. In addition, the section length extraction unit extracts the length of each section, and the section correction unit joins any section shorter than the predetermined length to the preceding or following section to form a single section, so that noise or the like does not produce very short sections.

[Embodiment]
An embodiment of the present invention is described in detail below with reference to the drawings.

Automatic transcription system
First, the automatic transcription system to which the present invention is applied is described.

In FIG. 4, a central processing unit (CPU) 1 controls the entire apparatus and executes the transcription processing program shown in FIG. 5, which is stored in a main storage device 3 connected to it via a bus 2. In addition to the CPU 1 and the main storage device 3, the bus 2 connects a keyboard 4 as an input device, a display device 5 as an output device, an auxiliary storage device 6 used as working memory, and an analog/digital converter 7.

An acoustic signal input device 8, for example a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures acoustic signals such as singing or humming produced by the user or musical tones produced by an instrument, converts them into an electrical signal, and outputs that electrical signal to the analog/digital converter 7.

When processing is commanded from the keyboard input device 4, the CPU 1 starts the transcription processing. Executing the program stored in the main storage device 3, it first stores the acoustic signal, converted into a digital signal by the analog/digital converter 7, in the auxiliary storage device 6, then converts the stored acoustic signal into musical score data and outputs the result to the display device 5 as required.

Next, the transcription processing that the CPU 1 executes after the acoustic signal has been captured is described in detail with reference to the functional-level flowchart of FIG. 5.

First, the CPU 1 extracts pitch information of the acoustic signal for each analysis period by autocorrelation analysis, extracts power information for each analysis period by sum-of-squares processing, and then executes post-processing such as noise removal and smoothing (steps SP1 and SP2). Next, for the pitch information, the CPU 1 calculates, on the basis of the state of the analysis, the amount by which the pitch axis of the acoustic signal deviates from the absolute pitch axis, and executes tuning processing that shifts the obtained pitch information by that amount (step SP3). In other words, the pitch information is corrected so that the difference between the absolute pitch axis and the pitch axis of the singer or instrument that produced the acoustic signal becomes small.
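The autocorrelation analysis of step SP1 can be illustrated with a minimal sketch. The text does not give the analysis parameters, so the sample rate, the frame length, the search limits `f_min` and `f_max`, and the function name `autocorr_pitch` are all assumptions made for this example, and real signals would need the noise removal and smoothing mentioned above.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min=80.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one analysis frame.

    Picks the lag with the largest autocorrelation inside the lag range
    implied by f_min..f_max (both are hypothetical search limits).
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag over the overlapping part of the frame.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag

# Synthetic 200 Hz tone sampled at 8 kHz, one 400-sample analysis frame.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]
estimated_hz = autocorr_pitch(frame, SAMPLE_RATE)
```

For this clean test tone the strongest autocorrelation peak falls at a lag of 40 samples, which corresponds to 200 Hz.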

Next, the CPU 1 executes segmentation that cuts the acoustic signal into one segment per note by finding continuous periods over which the obtained pitch information can be considered to indicate the same pitch, and also executes segmentation based on changes in the obtained power information (steps SP4 and SP5). From the two sets of segment information thus obtained, the CPU 1 calculates a reference length corresponding to the duration of a quarter note, an eighth note, or the like, and executes a more detailed segmentation based on this reference length (step SP6).

On the basis of the pitch information of each segment obtained in this way, the CPU 1 identifies the pitch of the segment as the pitch on the absolute pitch axis to which that pitch information can be judged closest, and then executes segmentation again according to whether consecutive identified segments have the same pitch (steps SP7 and SP8).
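The pitch identification of step SP7 amounts to snapping a segment's pitch information to the nearest pitch on the absolute pitch axis. The sketch below is one possible reading under assumptions that the text does not state: pitch information is expressed in Hz, the absolute axis is equal temperament with A4 = 440 Hz numbered as MIDI notes, and a segment is summarised by the median of its pitch values.

```python
import math
from statistics import median

A4_HZ = 440.0  # assumed reference tuning; not specified in the text

def nearest_semitone(freq_hz):
    """Return the MIDI note number closest to freq_hz on an equal-tempered axis."""
    return round(69 + 12 * math.log2(freq_hz / A4_HZ))

def identify_segment_pitch(pitch_track_hz):
    """Map a segment's per-analysis-point pitch values to a single note number.

    Using the median as the segment's representative pitch is an
    illustrative choice, not something the text prescribes.
    """
    return nearest_semitone(median(pitch_track_hz))

# A slightly wobbly sung note around middle C (about 261.6 Hz).
note = identify_segment_pitch([258.0, 261.0, 263.5, 262.0, 260.5])
```

Consecutive segments that receive the same note number are the candidates for the repeated segmentation of step SP8.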

The CPU 1 then computes the product-sum of the occurrence frequencies of the scale notes, obtained by tallying the tuned pitch information, and predetermined weighting coefficients defined for each key, and determines the key of the input acoustic signal, for example C major or A minor, from the maximum product-sum. For predetermined pitches on the scale of the determined key, it reviews the pitch information in more detail to confirm and correct the pitch (steps SP9 and SP10). Next, the CPU 1 reviews the segmentation according to whether consecutive segments have the same finally determined pitch and whether the power changes between consecutive segments, and thereby produces the final segmentation (step SP11).
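The key determination of step SP9 can be sketched as a product-sum between a pitch-class histogram and one weight vector per key. The text does not list the weighting coefficients, so the weights below (2 for the tonic, 1 for the other scale notes, 0 elsewhere, with the natural minor scale for minor keys) are purely hypothetical, as are the function names.

```python
from collections import Counter

MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor
NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

def key_weights(tonic, steps):
    """Hypothetical weights: 2 for the tonic, 1 for other scale notes, 0 elsewhere."""
    weights = [0] * 12
    for step in steps:
        weights[(tonic + step) % 12] = 1
    weights[tonic] = 2
    return weights

def estimate_key(pitch_classes):
    """Return the (tonic name, mode) whose weights give the largest product-sum."""
    histogram = Counter(pc % 12 for pc in pitch_classes)
    best_key, best_score = None, float("-inf")
    for mode, steps in (("major", MAJOR_STEPS), ("minor", MINOR_STEPS)):
        for tonic in range(12):
            weights = key_weights(tonic, steps)
            score = sum(weights[pc] * count for pc, count in histogram.items())
            if score > best_score:
                best_key, best_score = (NOTE_NAMES[tonic], mode), score
    return best_key

# One pitch class per analysis point: a C-major phrase that dwells on C.
melody = [0, 0, 0, 2, 4, 4, 5, 7, 7, 9, 11, 0, 0]
key = estimate_key(melody)
```

With binary scale membership alone, C major and A minor would score identically because they share the same notes; the extra tonic weight is what separates them in this sketch.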

Once the pitches and segments have been determined in this way, the CPU 1 extracts measures on the basis of considerations such as that a piece begins on the first beat, that the last note of a phrase does not straddle the next measure, and that there is a break at each measure. It determines the time signature from this measure information and the segmentation information, and determines the tempo from the determined time signature and the measure length (steps SP12 and SP13).

Finally, the CPU 1 organizes the determined pitch, note length, key, time signature, and tempo information and creates the musical score data (step SP14).

Segmentation based on power information
Next, the segmentation processing based on the power information of the acoustic signal in this automatic transcription system (step SP5 in FIG. 5) is described in detail using the flowcharts of FIGS. 1 and 2. FIG. 1 is a flowchart showing this processing at the functional level, and FIG. 2 is a flowchart showing FIG. 1 in more detail.

As the power information of the acoustic signal, the acoustic signal is squared at each sampling point within an analysis period, and the sum of these squared values is used as the power information for that analysis period.
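The sum-of-squares power described here is simple to write down. The sketch below assumes non-overlapping analysis periods of a fixed number of samples and drops any incomplete trailing period; neither detail is specified in the text, and the name `frame_power` is made up for this example.

```python
def frame_power(samples, period):
    """Sum of squared samples for each non-overlapping analysis period."""
    return [
        sum(x * x for x in samples[start:start + period])
        for start in range(0, len(samples) - period + 1, period)
    ]

samples = [0, 1, -1, 2, 3, -3, 0, 0, 1]
power = frame_power(samples, period=3)  # one value per analysis point
```

Each element of `power` plays the role of power(t) for one analysis point t in the processing that follows.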

The CPU 1 compares the power information at each analysis point with a threshold value and divides the acoustic signal into sections above the threshold and sections below it. Sections above the threshold become valid segments and sections below it become invalid segments; a valid-segment start mark is placed at the beginning of each valid section and an invalid-segment start mark at the beginning of each invalid section (steps SP15 and SP16). This is done because the pitch of the acoustic signal is often unstable where the power information is small, so that pitch identification frequently fails there, and also in order to detect rest sections.

Next, within each valid segment obtained by this division, the CPU 1 computes a change function of the power information, extracts the rising change points of the power information from this function, subdivides the valid segment at each extracted rising change point, and places a valid-segment start mark at that point (steps SP17 and SP18). This is done because a performer may move to the next note while maintaining a certain level of power, in which case the processing above alone could produce a segment containing two or more notes; such segments are subdivided by exploiting the fact that power increases at the start of the next note.

The CPU 1 then counts the length of every segment, valid or invalid, and joins any segment shorter than a predetermined length to the preceding segment to form a single segment (steps SP19 and SP20). This is done because noise or the like can split the signal into very small segments, which must be joined to another segment, and also to rejoin segments that the subdivision at rising change points has split into several pieces even though a single note was intended.

Next, this processing is described in more detail using the flowchart of FIG. 2.

The CPU 1 first clears the analysis point parameter t to 0, then confirms that the analysis point data to be processed have not ended and judges whether the power information power(t) of the acoustic signal at that analysis point is smaller than the threshold θp (steps SP21 to SP23).

If the power information power(t) is smaller than the threshold θp, the CPU 1 increments the analysis point parameter t and returns to step SP22 to evaluate the power information at the next analysis point (step SP24).

If, on the other hand, the CPU 1 finds in step SP23 that the value of the power information power(t) is at or above the threshold θp, it marks that analysis point as a valid-segment start point and moves on to the processing from step SP26 onward (step SP25).

At this stage, the CPU 1 confirms that processing has not finished for all analysis points and again judges whether the value of the power information power(t) is smaller than the threshold θp; if it is at or above θp, the CPU 1 increments the analysis point parameter t and returns to step SP26 (steps SP26 to SP28). If the value of power(t) has fallen below θp, the CPU 1 marks that analysis point as an invalid-segment start point and returns to step SP22 described above (step SP29).

The CPU 1 repeats the above processing until it detects in step SP22 or SP24 that all analysis points have been processed. Having compared the power information power(t) at every analysis point with the threshold θp and thereby divided the acoustic signal into valid segments at or above θp and invalid segments below θp, it moves on to the processing from step SP30 onward.
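The first pass (steps SP21 to SP29) can be condensed into a single loop. This is a minimal sketch, not the flowchart itself: the representation of marks as a list holding `"valid"`, `"invalid"`, or `None`, and the name `mark_by_threshold`, are assumptions made for illustration.

```python
def mark_by_threshold(power, theta_p):
    """Mark valid/invalid segment starts by comparing power with theta_p.

    Returns a list with 'valid' or 'invalid' at segment start points and
    None elsewhere. As in the flow described above, a leading low-power
    stretch carries no mark.
    """
    marks = [None] * len(power)
    in_valid = False
    for t, p in enumerate(power):
        if not in_valid and p >= theta_p:
            marks[t] = "valid"      # power has reached the threshold
            in_valid = True
        elif in_valid and p < theta_p:
            marks[t] = "invalid"    # power has dropped below the threshold
            in_valid = False
    return marks

power = [0, 0, 10, 10, 10, 10, 30, 30, 30, 0, 0]
marks = mark_by_threshold(power, theta_p=5)
```

In this example the whole stretch from point 2 to point 8 is one valid segment even though the power jumps at point 6, which is the situation the subdivision pass is meant to handle.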

From here on, the CPU 1 clears the analysis point parameter t to 0 and starts the following processing from the first analysis point (step SP30). After confirming that the analysis point data to be processed have not ended, the CPU 1 judges whether the analysis point carries a valid-segment start mark (steps SP31 and SP32). If it is not a valid-segment start point, the CPU 1 increments the analysis point parameter t and returns to step SP29 described above (step SP33).

When a valid-segment start point is detected, the CPU 1 again confirms that analysis point data remain to be processed and then judges whether the point is an invalid-segment start point (steps SP34 and SP35). If it is not, the analysis point lies inside a valid segment, so the CPU 1 computes the change function d(t) of the power information power(t) according to equation (1) (step SP36). Because this function is used in the subsequent processing to extract rises in the power information, it is called the rise extraction function below.

d(t) = {power(t + k) - power(t)} / {power(t + k) + power(t)}   ... (1)
Here k is a natural number representing a time span suitable for capturing changes in power.
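Equation (1) translates directly into code. Because power values are non-negative, d(t) stays between -1 and 1 whenever the denominator is non-zero, so it measures relative rather than absolute change in power. The sketch below assumes that t + k stays inside the data and that the denominator is non-zero; the function name `rise_extraction` is chosen for this example.

```python
def rise_extraction(power, t, k):
    """Equation (1): normalised power change between points t and t + k."""
    numerator = power[t + k] - power[t]
    denominator = power[t + k] + power[t]
    return numerator / denominator

power = [10, 10, 10, 30, 30]
d_flat = rise_extraction(power, 0, 2)   # no change over k points
d_rise = rise_extraction(power, 1, 2)   # power triples over k points
```

Scaling every power value by the same factor leaves d(t) unchanged, which suggests that a single threshold θd can serve for both loud and quiet passages.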

The CPU 1 then judges whether the value of the rise extraction function d(t) is smaller than the threshold θd; if it is, the CPU 1 increments the analysis point parameter t and returns to step SP34 (steps SP37 and SP38). If d(t) is at or above θd, the CPU 1 marks that analysis point as the start of a new segment (step SP39). The valid segment is thereby subdivided.

Next, after confirming that processing has not finished for all analysis point data, the CPU 1 judges whether the analysis point being processed carries an invalid-segment start mark; if it does, the CPU 1 returns to step SP31 described above to resume detection of the start point of the next valid segment (steps SP40 and SP41).

If the point is not an invalid-segment start point, the CPU 1 computes the rise extraction function d(t) from the power information power(t) according to equation (1) and judges whether d(t) is smaller than the threshold θp (steps SP42 and SP43). Once it is smaller, the CPU 1 returns to step SP34 described above and resumes extraction of rising change points of the power information. If, in step SP43, d(t) at the analysis point remains at or above θp, the CPU 1 increments the analysis point parameter t and returns to step SP40 to judge, for the next analysis point, whether d(t) has fallen below θp.
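The second pass (steps SP30 to SP43) marks a new segment start where d(t) first reaches the threshold inside a valid segment, then waits for d(t) to fall back before it looks for another rise. The sketch below is one compact reading of that behaviour. It assumes a single threshold `theta_d` for both the detection and the release test, treats d(t) as 0 where t + k runs past the data or the denominator is zero, and reuses the hypothetical mark list of `"valid"`, `"invalid"`, and `None`.

```python
def rise(power, t, k):
    """Rise extraction function d(t) of equation (1); 0.0 where undefined."""
    if t + k >= len(power):
        return 0.0
    total = power[t + k] + power[t]
    return (power[t + k] - power[t]) / total if total > 0 else 0.0

def subdivide(power, marks, k, theta_d):
    """Add 'valid' start marks at rising change points inside valid segments."""
    marks = list(marks)
    in_valid = False   # currently inside a valid segment?
    armed = True       # False while d(t) stays at or above theta_d
    for t in range(len(power)):
        if marks[t] == "invalid":
            in_valid = False
        elif marks[t] == "valid":
            in_valid, armed = True, True
        if not in_valid:
            continue
        if rise(power, t, k) < theta_d:
            armed = True             # rise is over; look for the next one
        elif armed:
            marks[t] = "valid"       # new segment starts at the rising change point
            armed = False
    return marks

power = [0, 0, 10, 10, 10, 10, 30, 30, 30, 0, 0]
marks = [None, None, "valid", None, None, None, None, None, None, "invalid", None]
new_marks = subdivide(power, marks, k=1, theta_d=0.3)
```

Here the jump from 10 to 30 produces one extra start mark at point 5, the point from which the rise is measured, so the original valid segment is split in two.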

When repetition of the above processing leads to the detection, in step SP31, SP34, or SP40, that all analysis points have been processed, the CPU 1 moves on to the segment review based on segment length, from step SP45 onward.

In this processing, the CPU 1 clears the analysis point parameter t to 0, confirms that the analysis point data have not ended, and judges whether the analysis point carries a segment start mark of either kind, valid or invalid (steps SP45 to SP47). If the point is not a segment start point, the CPU 1 increments t and returns to step SP46 to move to the next analysis point (step SP48). If a segment start point is detected, the CPU 1 sets the segment length parameter L to its initial value of 1 in order to count the length of the segment that begins there (step SP49).

The CPU 1 then increments the analysis point parameter t, confirms that the analysis point data have not ended, and judges whether the analysis point carries a segment start mark of either kind (steps SP50 to SP52). If the point is not a segment start point, the CPU 1 increments the segment length parameter L, also increments t, and returns to step SP51 described above (steps SP53 and SP54).

Repeating steps SP51 to SP54 eventually brings the processing to the next analysis point that carries a segment start mark, and step SP52 yields an affirmative result. The segment length parameter L at that moment equals the distance between the marked analysis point now being processed and the marked analysis point immediately before it, that is, the length of the segment. When step SP52 yields an affirmative result, the CPU 1 judges whether this parameter L (the segment length) is shorter than the threshold θL. If L is at or above θL, the CPU 1 returns to step SP46 without removing any segment start mark. If L is smaller than θL, the CPU 1 removes the earlier of the two segment start marks, thereby joining this segment to the preceding segment, and returns to step SP46 (steps SP55 and SP56).

On returning from step SP55 or SP56 to step SP46, provided the analysis point data have not ended, step SP47 immediately yields an affirmative result and the processing proceeds to step SP49 onward. The CPU 1 thus begins searching for the mark that follows the one just found, locates it in the same way as above, and reviews the segment length.

By repeating this processing, the CPU 1 completes the review of all segment lengths; eventually step SP46 yields an affirmative result and the CPU 1 ends the processing program.
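The length review (steps SP45 to SP56) can be sketched as a single scan over the marks. A segment's length is the distance between its start mark and the next mark; when that distance is below θL, the segment's own start mark is removed, which joins it to the segment before it. The mark representation and the name `merge_short_segments` are assumptions carried over for illustration.

```python
def merge_short_segments(marks, theta_l):
    """Remove the start mark of every segment shorter than theta_l points.

    Removing a segment's start mark joins it to the preceding segment.
    The final segment, which has no following mark, is left untouched.
    """
    marks = list(marks)
    start = None  # index of the most recent segment start mark
    for t, mark in enumerate(marks):
        if mark is None:
            continue
        if start is not None and t - start < theta_l:
            marks[start] = None   # segment [start, t) is too short: join it
        start = t                 # measure the next segment from this mark
    return marks

# A one-point power dropout at index 4 inside an otherwise steady stretch.
marks = ["valid", None, None, None, "invalid", "valid", None, None, None,
         "invalid", None, None]
merged = merge_short_segments(marks, theta_l=2)
```

In this example only the one-point invalid segment loses its mark; the valid start mark at index 5 remains, so whether the two valid segments end up as one note is left to the later pitch-based review.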

FIG. 3 shows an example of segmentation by this processing. In this example, repeating the processing up to step SP29 divides the signal, on the basis of the power information power(t), into valid segments S1 to S8 and invalid segments S11 to S18. Repeating the processing up to step SP44 then subdivides valid segment S4 into S41 and S42 at a rising change point of the power, on the basis of the rise extraction function d(t). Finally, the processing from step SP45 onward reviews the segments by length; in this example no segment is shorter than the predetermined length, so no segments are joined.

According to the embodiment described above, the acoustic signal is divided into valid segments, in which the power information is at or above the threshold, and invalid segments, in which it is at or below the threshold. The valid segments are subdivided at the rising change points of the power information, and the result is then reviewed by segment length. This yields accurate segmentation that is free of the erroneous divisions caused by noise or power fluctuation.
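The first two stages summarized above (thresholding, then subdivision at rising change points) can be sketched as below. This is only an illustration under stated assumptions: the function names, the `(start, end, is_valid)` tuple layout with an exclusive `end`, and the choice to pass the rising change points in as a precomputed list are all hypothetical. The rise extraction function d(t) of equation (1) is not reproduced here.

```python
def threshold_segments(power, theta):
    """Split power[] into runs at or above theta (valid) and below it (invalid).

    Returns (start, end, is_valid) tuples; end is exclusive.
    """
    segments = []
    start = 0
    for t in range(1, len(power) + 1):
        # Close the current run at the end of the data or when the
        # valid/invalid state changes.
        if t == len(power) or (power[t] >= theta) != (power[start] >= theta):
            segments.append((start, t, power[start] >= theta))
            start = t
    return segments


def subdivide(segments, rise_points):
    """Cut valid segments at rising change points that fall strictly inside them."""
    out = []
    for start, end, valid in segments:
        cuts = [start]
        if valid:  # invalid segments are never subdivided
            cuts += [p for p in sorted(rise_points) if start < p < end]
        cuts.append(end)
        out += [(a, b, valid) for a, b in zip(cuts, cuts[1:])]
    return out


power = [0, 0, 5, 6, 7, 9, 8, 0, 0, 4, 4]
coarse = threshold_segments(power, theta=3)
fine = subdivide(coarse, rise_points=[5, 8])
```

Here the rising change point at index 5 splits the valid run 2..7 in two, while the point at index 8 is ignored because it lies in an invalid run.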

That is, because only sections whose power information is at or above the threshold are treated as valid segments, periods in which the sound power is low and the pitch is unstable are kept out of subsequent processing such as pitch identification. Because the rising change points of the power are extracted and used for subdivision, segmentation also works well when the signal moves to the next note while the power stays at or above the predetermined level. Finally, because the result is reviewed by segment length, a single note or a rest period is not split into several segments.

Other Embodiments

In the embodiment described above, the sum of squares of the acoustic signal is used as the power information, but other parameters may be used, for example the square root of the sum of squares. Likewise, the rise extraction function is obtained as in equation (1), but other parameters may be used. For example, the rise of the power information may be extracted with a function that uses only the numerator of equation (1).
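The two power measures mentioned above differ only by a final square root, as the following sketch shows. The frame length, the non-overlapping framing, and the name `frame_power` are assumptions for illustration; this passage does not fix the size or spacing of the analysis window.

```python
import math


def frame_power(samples, frame_len, use_sqrt=False):
    """Power per analysis frame: sum of squares, or its square root.

    Frames are non-overlapping here (an assumption); a trailing partial
    frame is ignored.
    """
    powers = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        s = sum(x * x for x in samples[i:i + frame_len])
        powers.append(math.sqrt(s) if use_sqrt else s)
    return powers


sum_sq = frame_power([1, 2, 2, 0, 3, 4], frame_len=3)
root = frame_power([1, 2, 2, 0, 3, 4], frame_len=3, use_sqrt=True)
```

Because the square root is monotonic, the same valid/invalid division results as long as the threshold is rescaled accordingly. Ratio-based quantities such as a rise extraction function would generally take different values.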

Furthermore, in the embodiment described above, a segment of insufficient length is joined to the immediately preceding segment, but it may instead be joined to the immediately following segment. Alternatively, it may be joined to the immediately preceding segment when that segment is not a rest section, and to the immediately following segment when the preceding segment is a rest section.
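The three joining policies just described can be compared with a small sketch. It is illustrative only: the function name `merge_short`, the policy labels, the `(start, end, is_valid)` tuples with exclusive `end`, and the treatment of invalid segments as rest sections are assumptions. A short segment that has no neighbor on the chosen side is left unchanged, and the joined segment takes the valid/invalid flag of the segment that absorbs it.

```python
def merge_short(segments, min_len, policy="previous"):
    """Join segments shorter than min_len to a neighboring segment.

    policy -- "previous", "next", or "rest_aware" (join to the previous
              segment unless that segment is a rest, i.e. invalid)
    """
    out = []
    carry = None  # start of a short segment waiting to join the next one
    for i, (start, end, valid) in enumerate(segments):
        short = end - start < min_len  # judged on the original length
        has_prev = bool(out)
        has_next = i + 1 < len(segments)
        if carry is not None:
            start, carry = carry, None  # absorb the waiting short segment
        if short:
            if policy == "next":
                to_prev = False
            elif policy == "rest_aware":
                to_prev = has_prev and out[-1][2]
            else:
                to_prev = True
            if to_prev and has_prev:
                out[-1] = (out[-1][0], end, out[-1][2])
                continue
            if not to_prev and has_next:
                carry = start
                continue
        out.append((start, end, valid))
    return out


segs = [(0, 10, True), (10, 12, True), (12, 30, False), (30, 40, True)]
joined_prev = merge_short(segs, 5, "previous")
joined_next = merge_short(segs, 5, "next")
```

With these inputs the short segment 10..12 is absorbed by the note before it under "previous" and by the rest after it under "next".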

In the embodiment described above, the CPU 1 executes all of the processing shown in FIG. 5 according to the program stored in the main storage device 3, but part or all of that processing may be executed by hardware. For example, as shown in FIG. 6, in which parts corresponding to those in FIG. 4 carry the same reference numerals, the acoustic signal from the acoustic signal input device 8 may be amplified by the amplifier circuit 10 and then passed through the pre-filter 11 to the analog/digital converter 12, which converts it into a digital signal. The signal processor 13 then applies autocorrelation analysis to the digitized acoustic signal to extract pitch information and applies sum-of-squares processing to extract power information, and supplies both to the software processing system run by the CPU 1. The signal processor 13 used in such a hardware configuration (10 to 13) may be any processor that can process voice-band signals in real time and that provides interface signals for the host CPU 1 (for example, the μPD7720 manufactured by NEC Corporation).

[Effects of the Invention]

As described above, according to the present invention, the signal is divided into sections in which the power information is larger than the threshold and sections in which it is smaller, the larger sections are subdivided at the rising change points of the power information, and the segmentation is then reviewed on the basis of segment length. This reduces erroneous segmentation caused by noise components and the like, allows subsequent processing such as pitch identification to be carried out properly, and improves the transcription accuracy of the musical score data.

[Brief description of the drawings]

FIG. 1 is a schematic flowchart showing segmentation processing based on power information according to one embodiment of the present invention. FIG. 2 is a flowchart showing the segmentation processing in more detail. FIG. 3 is a characteristic curve diagram showing an example of segmentation by this processing. FIG. 4 is a block diagram showing the configuration of an automatic transcription system to which the present invention is applied. FIG. 5 is a flowchart showing the automatic transcription procedure. FIG. 6 is a block diagram showing another configuration of the automatic transcription system.

1 ... CPU, 3 ... main storage device, 6 ... auxiliary storage device, 7 ... analog/digital converter, 8 ... acoustic signal input device.

Continued from the front page: (72) Inventor: Masanori Mizuno, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo. Examiner: Shigeo Arai.

Claims

(57) [Claims]

1. An automatic music transcription method for converting an input acoustic signal into musical score data, comprising at least a process of extracting power information of the acoustic signal and a segmentation process of dividing the acoustic signal, on the basis of the extracted power information, into sections that can each be regarded as having the same pitch, wherein the segmentation process comprises: a process of dividing the signal into valid sections in which the power information is at or above a predetermined value and invalid sections in which it is at or below the predetermined value; a process of extracting rising change points of the power information in the valid sections; a process of subdividing the valid sections at the extracted rising change points; a process of extracting the lengths of the valid sections, including the subdivided valid sections, and of the invalid sections; and a process of joining any section whose extracted length is shorter than a predetermined length to either the preceding or the following section.

2. An automatic music transcription apparatus for converting an input acoustic signal into musical score data, having at least power extraction means for extracting power information from the acoustic signal and segmentation means for dividing the acoustic signal, on the basis of the extracted power information, into sections that can each be regarded as having the same pitch, wherein the segmentation means comprises: a division unit that divides the signal into valid sections in which the power information is at or above a predetermined value and invalid sections in which it is at or below the predetermined value; a change point extraction unit that extracts rising change points of the power information in the valid sections; a subdivision unit that subdivides the valid sections at the extracted rising change points; a section length extraction unit that extracts the lengths of the valid sections, including the subdivided valid sections, and of the invalid sections; and a section correction unit that joins any section whose extracted length is shorter than a predetermined length to either the preceding or the following section.