JP2604411B2

JP2604411B2 - Automatic music transcription method and device

Info

Publication number: JP2604411B2
Application number: JP4612788A
Authority: JP
Inventors: 七郎鶴田; 洋典高島; 正樹藤本; 正典水野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1997-04-30
Anticipated expiration: 2012-04-30
Also published as: JPH01219635A

Abstract

PURPOSE: To prevent erroneous segmentation caused by noise components. The signal is divided into sections where the power data exceeds a threshold and sections where it does not, and the above-threshold sections are further subdivided at rising change points of the power data. CONSTITUTION: A CPU 1 divides the acoustic signal into sections in which the power data at each analysis point is below a threshold and sections in which it is at or above the threshold. The CPU 1 then attaches start and end marks to the first and last points of each at-or-above-threshold section. It calculates a change function of the power within those sections and extracts rising change points of the power from this function, which it uses to subdivide the sections further. This prevents erroneous segmentation caused by noise components and allows subsequent processing, such as pitch identification, to be performed well.

Description

[Detailed Description of the Invention] [Industrial Field of Application] The present invention relates to an automatic music transcription method and apparatus that create musical score data from acoustic signals such as singing voices, humming, and musical instrument sounds. More particularly, it relates to segmentation processing that uses power information to divide the acoustic signal into sections that can be regarded as having the same pitch.

[Prior Art] An automatic transcription system converts acoustic signals such as singing voices, humming, or musical instrument sounds into musical score data. To do so, it must detect from the acoustic signal the basic information that makes up a musical score: note length, pitch, key, time signature, and tempo.

However, an acoustic signal is merely a signal that continuously contains repetitions of a basic waveform, so the above information cannot be obtained from it directly.

Conventional automatic transcription systems therefore obtain this information in the following order:

1. For each analysis period, the system extracts power information and repetition information of the basic waveform, which represents the pitch of the acoustic signal (hereinafter called pitch information).
2. Using at least the extracted power information, the system divides the acoustic signal into sections (segments) that can be regarded as having the same pitch. This processing is called segmentation.
3. From each segment's pitch information, the system identifies the pitch of the acoustic signal in that segment as a pitch on the absolute pitch axis.
4. The system determines the key of the acoustic signal from the distribution of the pitch information around the pitch axis.
5. Finally, the system determines the time signature and tempo of the acoustic signal from the segments.

Because pitch, time signature, tempo, and so on are all determined from the segments (note lengths), segmentation is particularly important in creating musical score data.

[Problems to be Solved by the Invention] As described above, segmentation is an important factor in creating musical score data. If its accuracy is low, the accuracy of the resulting score data is also markedly low. The final segmentation may be derived from both a pitch-based and a power-based segmentation result, or from the power information alone. In either case, it is desirable to improve the accuracy of the power-based segmentation itself.

The present invention was made in view of the above. Its object is to provide an automatic music transcription method and apparatus that perform power-based segmentation well and thereby improve the accuracy of musical score data.

[Means for Solving the Problems] To solve these problems, the first aspect of the present invention is an automatic music transcription method that converts an acoustic signal into musical score data. The method includes at least:

- a process of extracting power information from an input acoustic signal; and
- a segmentation process that uses the extracted power information to divide the acoustic signal into sections that can be regarded as having the same pitch.

The segmentation process is performed as a series of processes comprising:

- a process of dividing the signal into valid sections, in which the power information is equal to or greater than a predetermined value, and invalid sections, in which it is equal to or less than the predetermined value;
- a process of extracting rising change points of the power information within the valid sections; and
- a process of subdividing the valid sections at the extracted rising change points.

The second aspect of the present invention is an automatic music transcription apparatus that converts an acoustic signal into musical score data. As part of its configuration, the apparatus includes:

- power extraction means for extracting power information from an input acoustic signal; and
- segmentation means for dividing the acoustic signal, using the extracted power information, into sections that can be regarded as having the same pitch.

The segmentation means comprises:

- a division processing unit that divides the signal into valid sections, in which the power information is equal to or greater than a predetermined value, and invalid sections, in which it is equal to or less than the predetermined value;
- a change point extracting unit that extracts rising change points of the power information within the valid sections; and
- a subdivision unit that subdivides the valid sections at the extracted rising change points.

[Operation] In the first aspect of the present invention, the signal is divided into valid sections, in which the power information is equal to or greater than a predetermined value, and invalid sections, in which it is equal to or less than that value. This reduces the influence of power fluctuations and noise in the acoustic signal. A valid section may still contain two or more notes. Because the power rises at a transition between notes, rising change points of the power are extracted and used to subdivide the valid sections.

Likewise, in the second aspect of the present invention, the division processing unit divides the signal into valid sections, with power information equal to or greater than the predetermined value, and invalid sections, with power information equal to or less than it. This keeps the segmentation from being affected by power fluctuations or noise. In addition, the change point extracting unit extracts rising change points of the power within each valid section. The subdivision unit then subdivides the valid section at those points, so that no section contains two or more notes.

[Embodiment] An embodiment of the present invention will now be described in detail with reference to the drawings.

Automatic Transcription System

First, the automatic transcription system to which the present invention is applied will be described.

In FIG. 4, a central processing unit (CPU) 1 controls the entire apparatus. It executes the transcription processing program shown in FIG. 5, which is stored in a main storage device 3 connected to the CPU 1 via a bus 2. In addition to the CPU 1 and the main storage device 3, the following are connected to the bus 2:

- a keyboard 4 serving as an input device;
- a display device 5 serving as an output device;
- an auxiliary storage device 6 used as working memory; and
- an analog/digital converter 7.

An acoustic signal input device 8, for example a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures acoustic signals, such as singing or humming by a user or musical tones produced by an instrument. It converts them into electric signals and outputs those signals to the analog/digital converter 7.

When processing is instructed via the keyboard input device 4, the CPU 1 starts the transcription processing by executing the program stored in the main storage device 3. The analog/digital converter 7 converts the acoustic signal into a digital signal, which is first stored in the auxiliary storage device 6. The program then converts it into musical score data, which is output to the display device 5 as needed.

Next, the transcription processing that the CPU 1 executes after capturing the acoustic signal will be described in detail, following the functional-level flowchart of FIG. 5.

First, for each analysis period, the CPU 1 extracts pitch information of the acoustic signal by autocorrelation analysis and power information by sum-of-squares processing. It then performs post-processing such as noise removal and smoothing (steps SP1 and SP2).

Next, the CPU 1 uses the distribution of the pitch information around the pitch axis to calculate how far the pitch axis of the acoustic signal deviates from the absolute pitch axis. It then executes tuning processing that shifts the pitch information by that amount (step SP3). In other words, the pitch information is corrected to reduce the difference between the absolute pitch axis and the pitch axis of the singer or instrument that produced the acoustic signal.
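The pitch half of step SP1 can be sketched for a single analysis frame as shown here. This is an illustrative autocorrelation estimator, not the one used in the patent. The 80 Hz to 1000 Hz search range, the unnormalized autocorrelation, and the function name `frame_pitch_autocorr` are assumptions. The noise removal and smoothing of step SP2 are omitted.

```python
import math


def frame_pitch_autocorr(frame, fs, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) of one analysis frame.

    The lag with the largest (unnormalized) autocorrelation inside the lag
    range implied by [f_min, f_max] is taken as the pitch period. Returns
    None when no lag has positive autocorrelation (e.g. a silent frame).
    """
    min_lag = max(1, int(fs / f_max))
    max_lag = min(len(frame) - 1, int(fs / f_min))
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag: sum of x[n] * x[n + lag].
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else fs / best_lag


# Usage: a 200 Hz sine sampled at 8 kHz has a period of 40 samples.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(frame_pitch_autocorr(frame, fs))  # 200.0
```

Plain autocorrelation peak picking is simple but prone to octave errors on real voices. This may be one reason the patent follows extraction with post-processing and a separate tuning step.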

Next, the CPU 1 finds runs of consecutive analysis points whose pitch information is considered to indicate the same pitch. It uses these runs to cut the acoustic signal into one segment per note. It also performs segmentation based on changes in the power information (steps SP4 and SP5). From these two sets of segment information, the CPU 1 calculates a reference length corresponding to the duration of a quarter note, eighth note, or the like. It then uses this reference length to perform a more detailed segmentation (step SP6).

For each segment obtained in this way, the CPU 1 identifies the segment's pitch as the pitch on the absolute pitch axis judged closest to its pitch information. It then performs segmentation again, based on whether the identified pitches of consecutive segments are the same (steps SP7 and SP8).
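Steps SP7 and SP8 can be sketched as nearest-note rounding followed by a merge of adjacent segments that received the same note. The sketch makes these assumptions:

- The absolute pitch axis is 12-tone equal temperament with A4 = 440 Hz, expressed as MIDI note numbers.
- The re-segmentation in step SP8 merges touching segments with equal notes. The patent states only that segmentation is redone based on whether consecutive pitches are equal.
- The function names are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference pitch; the patent does not state one


def nearest_note(freq_hz):
    """MIDI note number of the equal-tempered pitch closest to freq_hz (A4 = 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / A4_HZ)))


def merge_equal_pitch_segments(segments, pitches_hz):
    """Re-segmentation sketch: join adjacent segments with the same identified note.

    segments: (start, end) analysis-point ranges, end exclusive, in time order.
    pitches_hz: one representative pitch per segment (e.g. its median).
    Returns (start, end, note) triples.
    """
    merged = []
    for (start, end), freq in zip(segments, pitches_hz):
        note = nearest_note(freq)
        # Merge only when the previous segment has the same note and touches this one.
        if merged and merged[-1][2] == note and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end, note)
        else:
            merged.append((start, end, note))
    return merged


print(merge_equal_pitch_segments([(0, 10), (10, 20), (25, 30)], [440.0, 445.0, 440.0]))
# [(0, 20, 69), (25, 30, 69)]
```

In this example, the first two segments are slightly out of tune with each other but round to the same note, so they are joined. The third segment has the same note, but a gap (for example a rest) separates it, so it stays separate.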

Next, the CPU 1 tallies the tuned pitch information around the pitch axis to obtain the frequency of occurrence of each pitch. For each key, it computes the sum of the products of these frequencies with predetermined weighting coefficients defined for that key. The key giving the maximum product sum is taken as the key of the piece in the input acoustic signal (for example, C major or A minor). For predetermined pitches of the scale in the determined key, the CPU 1 then re-examines the pitch information to confirm and correct the pitch (steps SP9 and SP10).

Next, the CPU 1 reviews the segmentation based on two criteria: whether consecutive segments have the same finally determined pitch, and whether the power changes between consecutive segments. It then performs the final segmentation (step SP11).
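The key decision in step SP9 can be sketched as a product sum over a pitch-class histogram. The patent does not publish its weighting coefficients, so this sketch uses hypothetical simplifications:

- Weights of 2 for the tonic, 1 for other scale tones, and 0 otherwise.
- Only the twelve major keys are considered.
- Identified MIDI note numbers stand in for the tuned pitch information.

As a result, the sketch cannot return a minor key such as A minor.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale
NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")


def key_weights(tonic):
    """Hypothetical weighting coefficients for the major key on `tonic`:
    2 for the tonic, 1 for other scale tones, 0 for non-scale tones."""
    w = [0.0] * 12
    for step in MAJOR_SCALE:
        w[(tonic + step) % 12] = 1.0
    w[tonic] = 2.0
    return w


def estimate_major_key(midi_notes):
    """Return the major key maximizing sum(histogram[pc] * weight[pc])."""
    hist = [0] * 12
    for n in midi_notes:
        hist[n % 12] += 1  # frequency of occurrence of each pitch class
    scores = [sum(h * w for h, w in zip(hist, key_weights(k))) for k in range(12)]
    best = max(range(12), key=lambda k: scores[k])
    return NAMES[best]


print(estimate_major_key([60, 62, 64, 65, 67, 69, 71, 72]))  # C
```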

Once the pitches and segments have been determined, the CPU 1 extracts bars (measures) using criteria such as the following:

- the piece starts on the first beat;
- the last note of a phrase does not extend into the next bar; and
- there is a break at each bar.

The CPU 1 determines the time signature from this bar information and the segmentation information. It then determines the tempo from the time signature and the bar length (steps SP12 and SP13).
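The final arithmetic of step SP13 can be illustrated as follows. The illustration assumes that:

- the tempo is expressed in beats per minute;
- a bar contains as many beats as the numerator of the time signature; and
- the bar length is measured in analysis periods of known duration.

These units and the function name are assumptions; the patent does not state them.

```python
def tempo_bpm(beats_per_bar, bar_length_points, analysis_period_s):
    """Tempo in beats per minute from the time signature and the bar length.

    beats_per_bar: numerator of the time signature (e.g. 4 for 4/4).
    bar_length_points: bar length measured in analysis periods.
    analysis_period_s: duration of one analysis period in seconds.
    """
    bar_seconds = bar_length_points * analysis_period_s
    return 60.0 * beats_per_bar / bar_seconds


# A 4/4 bar lasting 200 analysis periods of 10 ms (2 s) gives 120 beats per minute.
print(tempo_bpm(4, 200, 0.01))
```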

Finally, the CPU 1 organizes the information on the determined pitch, note length, key, time signature, and tempo to create the final musical score data (step SP14).

Segmentation Based on Power Information

Next, the power-based segmentation of the acoustic signal in this automatic transcription system (step SP5 in FIG. 5) will be described in detail with reference to the flowcharts of FIGS. 1 and 2. FIG. 1 is a functional-level flowchart of this processing, and FIG. 2 shows the processing of FIG. 1 in more detail.

The power information of the acoustic signal is computed as follows. The acoustic signal is squared at each sampling point within the analysis period, and the sum of these squared values is used as the power information for that analysis period.
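Here is a minimal sketch of this power computation. It assumes non-overlapping analysis periods of a fixed number of samples; the patent does not state the frame length or hop size. The name `power_track` is hypothetical.

```python
def power_track(samples, frame_len):
    """Power information per analysis period: the sum of squared samples over
    each non-overlapping frame of frame_len samples. A trailing partial frame
    is dropped."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


print(power_track([1, -1, 2, 0, 3, 3, 5], 2))  # [2, 4, 18]
```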

The CPU 1 compares the power information at each analysis point with a threshold and divides the acoustic signal into sections above and below the threshold. Sections above the threshold become valid segments, and sections below it become invalid segments. A segment-start mark is attached to the beginning of each valid section and a segment-end mark to its end (steps SP15 and SP16). There are two reasons for this. First, the pitch of the acoustic signal is often unstable where the power is small, so pitch identification frequently fails there. Second, it allows rests to be detected.
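The threshold pass (steps SP15 and SP16) can be sketched as a run-length split of the power sequence. This is an illustrative sketch, not the patent's code, and it makes these assumptions:

- Analysis points with power ≥ θp are valid; points below θp are invalid.
- Segment ends are exclusive.
- A valid run that reaches the end of the data is closed at the last point. The flowchart does not address this case.

```python
def split_by_threshold(power, theta_p):
    """Divide analysis points into valid (power >= theta_p) and invalid
    (power < theta_p) runs.

    Returns (start, end, is_valid) tuples with end exclusive. For a valid
    run, start is the segment-start mark and end is the first point below
    the threshold (the segment-end mark).
    """
    segments = []
    start = 0
    for t in range(1, len(power) + 1):
        # Close the current run at the end of the data or when validity flips.
        if t == len(power) or (power[t] >= theta_p) != (power[start] >= theta_p):
            segments.append((start, t, power[start] >= theta_p))
            start = t
    return segments


print(split_by_threshold([0, 5, 6, 1, 7], 4))
# [(0, 1, False), (1, 3, True), (3, 4, False), (4, 5, True)]
```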

Next, within each valid segment, the CPU 1 calculates a change function of the power information. It extracts rising change points of the power information from this change function and subdivides the valid segment at those points (steps SP17 and SP18). This is done because the signal may move on to the next note while maintaining a certain level of power. Even then, the power is expected to increase at the start of the next note.

Next, this processing will be described in more detail with reference to the flowchart of FIG. 2.

The CPU 1 first clears the analysis point parameter t to 0. It confirms that the analysis point data to be processed has not been exhausted, then determines whether the power information power(t) of the acoustic signal at that analysis point is smaller than a threshold θp (steps SP19 to SP21).

If power(t) is smaller than θp, the CPU 1 increments t and returns to step SP20 to evaluate the power information at the next analysis point.

If instead step SP21 finds that power(t) is equal to or greater than θp, the CPU 1 marks that analysis point as a segment start point (step SP23). It then proceeds to step SP24 and the subsequent steps.

The CPU 1 confirms that not all analysis points have been processed and again determines whether power(t) is smaller than θp. If power(t) is equal to or greater than θp, the CPU 1 increments t and returns to step SP24 (steps SP24 to SP26). When power(t) becomes smaller than θp, the CPU 1 marks that analysis point as a segment end point and returns to step SP20 (step SP27).

The CPU 1 repeats the above processing until it detects, in step SP20 or SP24, that all analysis points have been processed. At that point, power(t) has been compared with θp at every analysis point, and the acoustic signal has been divided into valid segments at or above θp and invalid segments below it. The CPU 1 then proceeds to step SP28 and the subsequent steps.

Next, the CPU 1 clears t to 0 and begins the following processing from the first analysis point. After confirming that the analysis point data to be processed has not been exhausted, it determines whether the analysis point carries a segment-start mark (steps SP29 and SP30). If it does not, the CPU 1 increments t and returns to step SP29.

When an analysis point marked as a segment start is detected, the start of a valid segment has been found. The CPU 1 again checks whether any analysis point data remains to be processed and determines whether the point carries a segment-end mark (steps SP32 and SP33). If it does not, the point lies inside a valid segment. The CPU 1 therefore computes the change function d(t) of the power information power(t) according to equation (1) (step SP34). Because this function is used in the following processing to extract rising edges of the power information, it is hereinafter called the rise extraction function.

$$d(t) = \frac{\mathrm{power}(t+k) - \mathrm{power}(t)}{\mathrm{power}(t+k) + \mathrm{power}(t)} \qquad (1)$$

where k is a natural number representing a time offset suitable for capturing a change in power.
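Equation (1) can be evaluated directly, as in this sketch. The helper name `rise_function` is hypothetical. The zero-power guard is an added assumption, since the patent does not cover a zero denominator. The caller must keep t + k within the data.

```python
def rise_function(power, t, k):
    """Rise extraction function of equation (1):
    d(t) = (power(t+k) - power(t)) / (power(t+k) + power(t)).
    Returns 0.0 when both powers are zero (assumed handling)."""
    a, b = power[t], power[t + k]
    return 0.0 if a + b == 0 else (b - a) / (b + a)


print(rise_function([1.0, 3.0], 0, 1))  # 0.5
```

Because d(t) is normalized by the total power, it lies between −1 and 1 for non-negative power and does not depend on overall loudness. A doubling of power gives 1/3 whether the power goes from 2 to 4 or from 200 to 400.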

The CPU 1 then determines whether d(t) is smaller than a threshold θd. If it is, the CPU 1 increments t and returns to step SP32 (steps SP35 and SP36). If d(t) is equal to or greater than θd, the CPU 1 marks that analysis point as a new segment start (step SP37). The valid segment is thereby subdivided.

Next, after confirming that not all analysis point data has been processed, the CPU 1 determines whether the analysis point currently being processed carries a segment-end mark. If it does, the CPU 1 returns to step SP29 and resumes searching for the start point of the next valid segment (steps SP38 and SP39).

If the point is not a segment end, the CPU 1 computes d(t) from power(t) by equation (1) and determines whether d(t) is smaller than θd (steps SP40 and SP41). Once d(t) falls below θd, the CPU 1 returns to step SP32 and resumes extracting rising change points of the power information. If d(t) remains equal to or greater than θd in step SP41, the CPU 1 increments t and returns to step SP38, to check whether d(t) has fallen below θd at the next analysis point.
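Steps SP28 to SP41 amount to an edge detector with a simple latch. A new segment start is marked where d(t) first reaches θd, and no further start is marked until d(t) has fallen back below θd. This sketch follows that logic under these assumptions:

- The input is end-exclusive segments from a threshold pass.
- Points whose t + k would run past the data are skipped.
- The function name is hypothetical.

It is an illustration, not the patent's implementation.

```python
def subdivide_valid_segments(power, segments, k, theta_d):
    """Sketch of the second pass of FIG. 2 (steps SP28 to SP41).

    segments: (start, end, is_valid) tuples, end exclusive.
    Returns the valid sub-segments as (start, end) pairs.
    """
    def d(t):
        # Rise extraction function of equation (1); 0.0 when both powers are zero.
        a, b = power[t], power[t + k]
        return 0.0 if a + b == 0 else (b - a) / (b + a)

    pieces = []
    for start, end, is_valid in segments:
        if not is_valid:
            continue  # invalid segments are left as they are
        cuts = [start]
        rising = False  # latch: True while d(t) stays at or above theta_d
        for t in range(start, min(end, len(power) - k)):
            value = d(t)
            if not rising and value >= theta_d:
                if t != start:  # the segment start is already marked
                    cuts.append(t)  # new segment start (step SP37)
                rising = True
            elif rising and value < theta_d:
                rising = False  # ready to detect the next rise
        bounds = cuts + [end]
        pieces.extend(zip(bounds[:-1], bounds[1:]))
    return pieces


power = [0, 0, 10, 10, 10, 10, 40, 40, 40, 40, 0, 0]
segments = [(0, 2, False), (2, 10, True), (10, 12, False)]
print(subdivide_valid_segments(power, segments, k=1, theta_d=0.3))
# [(2, 5), (5, 10)]
```

With k = 1 and θd = 0.3, the valid segment whose power steps from 10 to 40 is split at the analysis point just before the step.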

The CPU 1 repeats the above processing and terminates the program when it detects, in step SP29, SP32, or SP38, that all analysis points have been processed.

FIG. 3 shows an example of segmentation produced by this processing. In this example, repeating the processing up to step SP27 divides the signal, based on power(t), into valid segments S1 to S8 and invalid segments S11 to S18. Repeating the processing from step SP28 onward then subdivides valid segment S4 into S41 and S42 at a rising change point of the power, detected with the rise extraction function d(t).

In the embodiment described above, the acoustic signal is divided into valid segments, in which the power information is equal to or greater than the threshold, and invalid segments, in which it is equal to or less than the threshold. Each valid segment is then subdivided at rising change points of the power information. Highly accurate segmentation is therefore possible, without erroneous segmentation caused by noise or power fluctuations.

Moreover, only sections whose power information is equal to or greater than the threshold are treated as valid segments. Periods of low signal power, in which the pitch is unstable, are therefore kept out of subsequent processing such as pitch identification.

Furthermore, because rising change points of the power are extracted and used for subdivision, segmentation works well even when the signal moves on to the next note while the power remains above a certain level.

Other Embodiments

In the embodiment described above, the sum of squares of the acoustic signal is used as the power information, but other parameters may be used, for example the square root of the sum of squares. Likewise, the function for extracting rising change points is given as equation (1), but other functions may be used. For example, change points may be extracted with a function consisting only of the numerator of equation (1).
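The two alternatives named above can be sketched as follows. The helper names are hypothetical. The usage lines show one practical difference: the numerator-only function scales with signal level, so a fixed threshold θd would behave differently for quiet and loud input. Equation (1), by contrast, gives (2 − 1)/(2 + 1) = 1/3 and (200 − 100)/(200 + 100) = 1/3 for the same relative change.

```python
import math


def root_sum_square_power(frame):
    """Alternative power information: square root of the sum of squares."""
    return math.sqrt(sum(s * s for s in frame))


def rise_numerator_only(power, t, k):
    """Alternative change function: only the numerator of equation (1)."""
    return power[t + k] - power[t]


print(root_sum_square_power([3.0, 4.0]))          # 5.0
print(rise_numerator_only([1.0, 2.0], 0, 1))      # 1.0
print(rise_numerator_only([100.0, 200.0], 0, 1))  # 100.0
```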

In the embodiment described above, the CPU 1 executes all of the processing shown in FIG. 5 according to the program stored in the main storage device 3. However, part or all of the processing may be executed by hardware. FIG. 6 shows one example; parts corresponding to FIG. 4 carry the same reference numerals. In this configuration:

1. The acoustic signal from the acoustic signal input device 8 is amplified by an amplifier circuit 10.
2. It passes through a prefilter 11 to an analog/digital converter 12, which converts it into a digital signal.
3. A signal processor 13 extracts pitch information from the digital signal by autocorrelation analysis and power information by sum-of-squares processing.
4. The signal processor 13 supplies this information to the software processing system run by the CPU 1.

The signal processor 13 used in such a hardware configuration (10 to 13) can be any processor that processes voice-band signals in real time and provides interface signals for the host CPU 1, for example the μPD7720 manufactured by NEC Corporation.

[Effects of the Invention] As described above, the present invention divides the signal into sections in which the power information is larger than a threshold and sections in which it is smaller. The larger sections are further subdivided at rising change points of the power information. This reduces erroneous segmentation caused by noise components and the like. It also allows subsequent processing such as pitch identification to be performed well, which improves the transcription accuracy of the musical score data.

[Brief description of the drawings]

FIG. 1 is a schematic flowchart showing the segmentation processing based on power information according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the segmentation processing in more detail.
FIG. 3 is a characteristic curve diagram showing an example of segmentation produced by this processing.
FIG. 4 is a block diagram showing the configuration of an automatic transcription system to which the present invention is applied.
FIG. 5 is a flowchart showing its automatic transcription procedure.
FIG. 6 is a block diagram showing another configuration of the automatic transcription system.

Reference numerals:
1: CPU
3: main storage device
6: auxiliary storage device
7: analog/digital converter
8: acoustic signal input device

(Continued from front page)
(72) Inventor: Masanori Mizuno, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo
Examiner: Shigeo Arai

Claims

(57) [Claims]

1. An automatic music transcription method that converts an acoustic signal into musical score data and includes at least (a) a process of extracting power information from an input acoustic signal and (b) a segmentation process of dividing the acoustic signal, based on the extracted power information, into sections that can be regarded as having the same pitch, wherein the segmentation process comprises: a process of dividing the signal into a valid section in which the power information is equal to or greater than a predetermined value and an invalid section in which the power information is equal to or less than the predetermined value; a process of extracting a rising change point of the power information in the valid section; and a process of subdividing the valid section at the extracted rising change point.

2. An automatic music transcription apparatus that converts an acoustic signal into musical score data and comprises, as part of its configuration, (a) power extraction means for extracting power information from an input acoustic signal and (b) segmentation means for dividing the acoustic signal, based on the extracted power information, into sections that can be regarded as having the same pitch, wherein the segmentation means comprises: a division processing unit that divides the signal into a valid section in which the power information is equal to or greater than a predetermined value and an invalid section in which the power information is equal to or less than the predetermined value; a change point extracting unit that extracts a rising change point of the power information in the valid section; and a subdivision unit that subdivides the valid section at the extracted rising change point.