JP2713952B2

JP2713952B2 - Automatic music transcription method and device

Info

Publication number: JP2713952B2
Application number: JP63046113A
Authority: JP
Inventors: 七郎鶴田; 洋典高島; 正樹藤本; 正典水野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1998-02-16
Anticipated expiration: 2013-02-16
Also published as: JPH01219623A

Abstract

PURPOSE: To enable accurate segmentation, and thereby good subsequent transcription processing, by dividing the power data extracted from an acoustic signal into effective sections at or above a threshold value and ineffective sections below said threshold value. CONSTITUTION: Under control of a CPU 1 in response to instructions from a keyboard 4, a digital acoustic signal such as singing, obtained through an acoustic signal input device 8 and an A/D converter 7, is stored in an auxiliary memory device 6 serving as working memory. The CPU 1 processes the stored signal according to the program in a main memory device 3: it extracts power data and performs segmentation that divides the acoustic signal into effective and ineffective sections according to whether the extracted power data is at or above a threshold value, and an effective section of a predetermined length or less is also treated as an ineffective section. Because this accurate segmentation is unaffected by noise or interval fluctuation, the subsequent transcription processing based on the segmentation is performed effectively and a highly accurate score is produced.

Description

[Industrial Field of Application] The present invention relates to an automatic music transcription method and apparatus for creating musical score data from acoustic signals such as singing voices, humming voices, and instrument sounds, and in particular to an automatic transcription method and apparatus in which sections that can be regarded as having the same pitch are divided into one segment per note on the basis of the pitch information and power information of the acoustic signal.

[Prior Art] In an automatic transcription system that converts an acoustic signal such as a singing voice, a humming voice, or an instrument sound into musical score data, it is necessary to detect from the acoustic signal the note lengths, pitches, key, time signature, and tempo, which are the basic information of a musical score.

An acoustic signal, however, is merely a signal that continuously contains repetitions of a fundamental waveform, and the items of information above cannot be obtained from it directly. For example, Japanese Patent Application Laid-Open No. 58-181090, "Note Creation Device", discloses a device that performs autocorrelation analysis on an acoustic signal, calculates pitch information (the repetition information of the fundamental waveform, which represents the pitch of the acoustic signal) from the analysis result, and determines how long the same pitch lasts. However, that device merely extracts the height on the musical scale and the length of the input tone from the pitch information extracted from the acoustic signal. The accuracy of the segmentation process, which divides the pitch information into sections (segments) regarded as having the same pitch, is therefore low, and so is the accuracy of the subsequent processing: identifying each pitch on the absolute scale, determining the key of the acoustic signal from the identified pitch information, and determining the time signature or tempo of the acoustic signal from the segments.

Accordingly, in conventional automatic transcription systems, the information is obtained in the following order. First, pitch information and power information are extracted for each analysis cycle. The acoustic signal is then divided into segments that can each be regarded as having the same pitch, based at least on the extracted power information. Next, from the pitch information of each segment, the pitch of the segment is identified as a pitch on the absolute pitch axis. The key of the acoustic signal is determined from the identified pitch information, and finally the time signature and tempo of the acoustic signal are determined from the segments.

Because the pitch, time signature, tempo, and so on are thus determined from the segments (note lengths), the segmentation process is a particularly important element in creating musical score data.

[Problems to be Solved by the Invention] An acoustic signal has the characteristic that its power increases immediately after the sound changes, and, as described above, this characteristic is exploited to perform segmentation from the power information.

However, in an acoustic signal, and especially in one sung by a person, the power information does not always change in a fixed pattern; the change pattern fluctuates, and sudden sounds such as external noise are also included. For this reason, simple segmentation that looks only at changes in the power information cannot always separate the individual notes well.

As described above, segmentation is an important element in creating score data. If the segmentation accuracy is low, the accuracy of the finally obtained score data, such as the pitches, also becomes markedly low, so it is desirable to raise the accuracy of the segmentation process based on power information itself.

Meanwhile, Japanese Patent Application Laid-Open No. 58-223198, "Syllable Input Method", discloses a syllable input method that segments syllables for speech recognition. That method calculates the power of the input speech for each frame of a predetermined length, extracts silent sections by comparing the calculated power with a threshold, takes valleys in the speech power as syllable ends, and detects speech sections from the silent sections and the syllable ends. Its purpose is to detect speech sections from input speech in a speech recognition system that outputs kana character strings. However, this method merely extracts information on valleys in the speech power so that speech sections can be detected even without long pauses between syllables. Granted that pitch information is inherently unnecessary for kana character strings, which, unlike musical tones, carry no pitch, it is clear that the technique of delimiting syllables at valleys in the speech power cannot be applied as-is to the automatic transcription of acoustic signals that continuously contain repetitions of a fundamental waveform. That is, in the automatic transcription of singing and the like, the acoustic signal must ultimately be identified note by note, and a technique for detecting the periods over which the same pitch continues is indispensable before syllable division. It is therefore clear that segmentation based on power alone cannot achieve automatic transcription.

The present invention has been made in view of the above points, and aims to provide an automatic transcription method and apparatus that can perform segmentation based on pitch information and power information well and can thereby improve the accuracy of the final score data.

[Means for Solving the Problems] To solve these problems, the present invention provides an automatic transcription method for converting an acoustic signal into musical score data consisting of a time series of notes, characterized in that the acoustic signal is subjected to autocorrelation analysis; pitch information and power information of the acoustic signal are extracted for each analysis cycle; for a period in which the pitch information indicates the same pitch, the power information is divided into effective sections at or above a predetermined value and invalid sections below the predetermined value; effective sections shorter than a predetermined length are treated as invalid sections; and the signal is thereby divided into one segment per note.

[Operation] According to the present invention, for a section that can be regarded as having the same pitch on the basis of the pitch information, segmentation is performed by comparing the power information with a threshold. The acoustic signal is thereby divided into one segment per note, and automatic transcription that includes rest periods can be performed without being affected by noise components, power fluctuations, and the like. In addition, by treating effective sections shorter than a predetermined length as invalid sections, excessively short effective sections caused by noise and the like are eliminated, and the accuracy of automatic transcription is improved.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

Automatic Transcription System

First, the automatic transcription system to which the present invention is applied is described.

In FIG. 3, a central processing unit (CPU) 1 controls the entire apparatus and executes the transcription processing program shown in FIG. 4, which is stored in a main storage device 3 connected via a bus 2. In addition to the CPU 1 and the main storage device 3, a keyboard 4 as an input device, a display device 5 as an output device, an auxiliary storage device 6 used as working memory, and an analog/digital converter 7 are connected to the bus 2.

An acoustic signal input device 8, consisting for example of a microphone, is connected to the analog/digital converter 7. The acoustic signal input device 8 captures acoustic signals such as singing or humming uttered by a user or musical tones produced by an instrument, converts them into an electrical signal, and outputs that electrical signal to the analog/digital converter 7.

When processing is commanded via the keyboard 4, the CPU 1 starts the transcription process. It executes the program stored in the main storage device 3, temporarily stores the acoustic signal that has been converted into a digital signal by the analog/digital converter 7 in the auxiliary storage device 6, then converts the acoustic signal into musical score data consisting of a time series of notes by executing the program, and outputs the data to the display device 5 as necessary.

Next, the transcription process that the CPU 1 executes after capturing the acoustic signal is described in detail with reference to the flowchart of FIG. 4, which shows the process at the functional level.

First, the CPU 1 performs autocorrelation analysis on the acoustic signal to extract pitch information for each analysis cycle, performs sum-of-squares processing to extract power information for each analysis cycle, and then executes post-processing such as noise removal and smoothing (steps SP1, SP2). Then, for the pitch information, the CPU 1 calculates, from its distribution, the amount by which the pitch axis of the acoustic signal deviates from the absolute pitch axis, and executes a tuning process that shifts the obtained pitch information by that amount (step SP3). That is, the pitch information is corrected so that the difference between the pitch axis of the singer or instrument that produced the acoustic signal and the absolute pitch axis becomes small.
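The autocorrelation-based pitch extraction of step SP1 can be illustrated with a minimal sketch. The function name `autocorr_pitch`, the search range `fmin`/`fmax`, and the rule of simply taking the lag with the largest autocorrelation value are assumptions made for illustration; the patent does not specify how the peak is selected or how the analysis cycle is chosen.

```python
def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the pitch (Hz) of one analysis frame from its autocorrelation.

    Hypothetical helper: searches lags corresponding to fmin..fmax and
    returns the frequency of the lag with the largest autocorrelation.
    Returns 0.0 when no positive peak is found (e.g. a silent frame).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag: sum of products of the frame
        # with a copy of itself shifted by `lag` samples.
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag if best_lag else 0.0
```

A real system would normally add the noise removal and smoothing that the text mentions as post-processing; they are omitted here.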

Next, the CPU 1 performs segmentation that divides the acoustic signal into one segment per note by finding continuous periods in which the obtained pitch information can be considered to indicate the same pitch, and also performs segmentation based on changes in the obtained power information (steps SP4, SP5). From the two sets of segment information thus obtained, the CPU 1 calculates a reference length corresponding to the duration of a quarter note, an eighth note, or the like, and performs segmentation again on the basis of this reference length (step SP6).

From the pitch information of each segment thus obtained, the CPU 1 identifies the pitch of the segment as the pitch on the absolute pitch axis to which the pitch information is judged closest, and then performs segmentation again according to whether the identified pitches of consecutive segments are the same (steps SP7, SP8).
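The pitch identification of step SP7 can be sketched as follows. The sketch assumes that the absolute pitch axis is the twelve-tone equal-tempered scale with A4 = 440 Hz, expressed as MIDI-style note numbers, and that a segment is represented by the mean of its voiced frames; none of these choices is stated in the patent, and the function names are hypothetical.

```python
import math

def hz_to_semitones(freq_hz, ref_hz=440.0):
    """Convert a frequency to a fractional note number (A4 = 69, assumed)."""
    return 69.0 + 12.0 * math.log2(freq_hz / ref_hz)

def identify_segment_pitch(frame_pitches_hz):
    """Return the nearest note number for one segment, or None if unvoiced.

    Assumption: frames with pitch <= 0 carry no pitch and are ignored,
    and the segment pitch is the mean of the remaining frames.
    """
    voiced = [hz_to_semitones(f) for f in frame_pitches_hz if f > 0]
    if not voiced:
        return None
    return round(sum(voiced) / len(voiced))
```

Consecutive segments that receive the same note number would then be candidates for the re-segmentation of step SP8.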

After that, the CPU 1 detects the frequency of occurrence of each pitch from the distribution of the pitch information after tuning, computes the sum of products of these frequencies and predetermined weighting coefficients that depend on the key, and, on the basis of the maximum of these sums of products, determines the key of the piece in the input acoustic signal, for example C major or A minor. For the pitches of predetermined scale degrees in the determined key, it then re-examines the pitch information in more detail to confirm and correct the pitch (steps SP9, SP10). Next, the CPU 1 reviews the segmentation according to whether consecutive segments have the same finally determined pitch and whether the power changes between consecutive segments, and performs the final segmentation (step SP11).
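The key determination of step SP9, a sum of products of pitch frequencies and key-dependent weights, might look like the sketch below. The weighting coefficients are not given in the patent; the values used here (1 for a scale note, 2 for the tonic, 0 otherwise, with a natural minor scale) are purely hypothetical, as are the function and variable names.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)   # scale degrees as semitone offsets
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)   # natural minor (assumed)

def key_weights(tonic, steps):
    """Hypothetical weights: 2 for the tonic, 1 for other scale notes, else 0."""
    weights = [0] * 12
    for step in steps:
        weights[(tonic + step) % 12] = 1
    weights[tonic] = 2
    return weights

def determine_key(histogram):
    """Pick the key whose weights give the largest sum of products.

    `histogram` is a list of 12 occurrence counts, index 0 = C.
    Returns (tonic pitch class, "major" or "minor").
    """
    candidates = []
    for tonic in range(12):
        for mode, steps in (("major", MAJOR_STEPS), ("minor", MINOR_STEPS)):
            w = key_weights(tonic, steps)
            score = sum(h * wi for h, wi in zip(histogram, w))
            candidates.append((score, tonic, mode))
    # max() keeps the first candidate on ties, so ties resolve to the lower tonic
    best = max(candidates, key=lambda c: c[0])
    return best[1], best[2]
```

Weighting the tonic more heavily is what lets this sketch separate relative keys such as C major and A minor, which share the same seven notes.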

Once the pitches and segments have been determined in this way, the CPU 1 extracts measures from considerations such as: the piece starts on the first beat, the last note of a phrase does not extend into the next measure, and there is a break at each measure. It determines the time signature from this measure information and the segmentation information, and determines the tempo from the determined time signature and the length of a measure (steps SP12, SP13).

The CPU 1 then organizes the determined pitch, note length, key, time signature, and tempo information and finally creates the musical score data (step SP14).

Segmentation Based on Power Information

Next, the segmentation process based on power information in this automatic transcription system (see step SP5) is described in detail with reference to the flowchart of FIG. 1.

As the power information of the acoustic signal, the acoustic signal is squared at each sampling point within the analysis cycle, and the sum of these squared values is used as the power information for that analysis cycle.
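The sum-of-squares power described here can be written in a few lines. The sketch assumes non-overlapping analysis cycles of `frame_len` samples and drops any incomplete final frame; both choices, and the function name, are assumptions, since the patent does not state the frame layout.

```python
def frame_power(samples, frame_len):
    """Return the sum of squared samples for each analysis cycle.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]
```

For example, `frame_power([1, 2, 3, 4, 5], 2)` yields `[5, 25]`: the frames are `[1, 2]` and `[3, 4]`, and the last sample is discarded.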

The CPU 1 first clears the analysis point parameter t to 0, then confirms that the data to be processed has not been exhausted and judges whether the power information pow(t) at analysis point t is equal to or greater than the threshold θ1 (steps SP20 to SP22). If pow(t) is smaller than θ1 and a negative result is obtained, the CPU 1 increments t and returns to step SP21 (step SP23).

By repeating this processing, the CPU 1 eventually finds an analysis point whose power information pow(t) is equal to or greater than the threshold θ1 and obtains an affirmative result in step SP22. The CPU 1 then marks this analysis point as a segment start, confirms that the analysis-point data to be processed has not been exhausted, and judges whether the power information pow(t) at analysis point t is smaller than the threshold θ1 (steps SP24 to SP26). If pow(t) is equal to or greater than θ1 and a negative result is obtained, the CPU 1 increments t and returns to step SP25 (step SP27).

By repeating this processing, the CPU 1 eventually finds an analysis point whose power information pow(t) is smaller than the threshold θ1 and obtains an affirmative result in step SP26. The CPU 1 then marks this analysis point as a segment end, obtains the segment length L from the segment-start mark and the segment-end mark, and judges whether L is smaller than the threshold θ2 (steps SP28 to SP30). This judgment prevents excessively short segments from being regarded as valid segments; the threshold θ2 is set in relation to note lengths. If an affirmative result is obtained in step SP30, the CPU 1 removes the segment-start and segment-end marks, increments t, and returns to step SP21. If instead the segment length L is sufficient and a negative result is obtained, the CPU 1 leaves the marks in place, immediately increments t, and returns to step SP21 (steps SP31, SP32).
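The loop of steps SP20 to SP32 can be sketched as follows. `pow_values` stands for the sequence pow(t), and `theta1` and `theta2` for the thresholds θ1 and θ2; lengths are counted in analysis points. One point is an assumption rather than something the text states: a segment still open when the data runs out is closed at the end of the data and kept if it is long enough.

```python
def segment_by_power(pow_values, theta1, theta2):
    """Return (start, end) index pairs of valid segments, end exclusive.

    A segment starts at the first point with pow >= theta1 and ends at the
    first following point with pow < theta1. Segments shorter than theta2
    analysis points are discarded.
    """
    segments = []
    n = len(pow_values)
    t = 0
    while t < n:                       # SP21: data not yet exhausted
        if pow_values[t] < theta1:     # SP22 negative: keep scanning
            t += 1                     # SP23
            continue
        start = t                      # SP24: mark segment start
        while t < n and pow_values[t] >= theta1:   # SP25 to SP27
            t += 1
        end = t                        # SP28: mark segment end
        # Assumption: if the data ended first, `end` is the end of the data.
        if end - start >= theta2:      # SP29, SP30: length check
            segments.append((start, end))
        # Otherwise the marks are dropped (SP31); scanning resumes at t.
    return segments
```

With `pow_values = [0, 5, 5, 5, 0, 6, 0, 7, 7, 7, 7]`, `theta1 = 3`, and `theta2 = 2`, the single-point burst at index 5 is rejected and the result is `[(1, 4), (7, 11)]`.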

By repeating this processing, all of the power information is eventually processed, an affirmative result is obtained in step SP22 or SP25, and the program ends.

FIG. 2 shows an example of the change in the power information over time and the corresponding segmentation result. In this example, executing the process of FIG. 1 yields segments S1, S2, ..., SM. During the period from time t1 to t2, the power information exceeds the threshold θ1, but because this period is short and its length is equal to or less than the threshold θ2, it is not extracted as a segment.

Thus, according to the embodiment described above, periods in which the power information is equal to or greater than the threshold are detected and are extracted as segments only when they are sufficiently long, so highly accurate segmentation can be performed without erroneous segmentation caused by noise components or fluctuations. Furthermore, because periods in which the power information is at or above the threshold are taken as segments, low-power periods in which the pitch is unstable are not used in the pitch identification process, which allows that process to be performed well.

Other Embodiments

In the embodiment described above, the sum of squares of the acoustic signal is used as the power information, but other parameters may be used. For example, the square root of the sum of squares may be used.

Furthermore, in the embodiment described above, the CPU 1 executes all of the processing shown in FIG. 4 according to the program stored in the main storage device 3, but part or all of the processing may be executed by hardware. For example, as shown in FIG. 5, in which parts corresponding to those in FIG. 3 bear the same reference numerals, the acoustic signal from the acoustic signal input device 8 may be amplified by an amplifier circuit 10, passed through a pre-filter 11, and supplied to an analog/digital converter 12 to be converted into a digital signal; a signal processor 13 may then perform autocorrelation analysis on the digitized acoustic signal to extract pitch information and sum-of-squares processing to extract power information, and supply these to the software processing system of the CPU 1. As the signal processor 13 used in such a hardware configuration (10 to 13), a processor that can process voice-band signals in real time and that provides interface signals for a host CPU 1 (for example, the μPD7720 made by NEC Corporation) can be used.

[Effects of the Invention] As described above, according to the present invention, for a period in which the pitch information indicates the same pitch, the power information is divided into effective sections at or above a predetermined value and invalid sections below that value, effective sections shorter than a predetermined length are treated as invalid sections, and the signal is divided into one segment per note. Because periods in which the power information is at or above the predetermined value are taken as segments, low-power periods in which the pitch is unstable are excluded from pitch identification, which makes highly accurate pitch identification possible. Because the periods indicating the same pitch are determined from the pitch information obtained by autocorrelation analysis, the arrangement of notes at the same pitch can be analyzed in detail, which is impossible when the power information is merely compared with a threshold and divided into syllables, and a continuously supplied acoustic signal can be converted accurately into score data as a time series of notes. Moreover, because periods of power information above the predetermined value are extracted as segments only when they are sufficiently long, erroneous segmentation caused by noise components and power fluctuations is eliminated, very accurate segmentation can be performed, and subsequent processing that uses the segmentation result, such as pitch identification, can be performed well.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the segmentation process based on power information according to an embodiment of the present invention; FIG. 2 is a characteristic curve diagram showing the change in power information over time together with the segmentation result; FIG. 3 is a block diagram showing the configuration of an automatic transcription system to which the present invention is applied; FIG. 4 is a flowchart showing the automatic transcription procedure performed by the configuration shown in FIG. 3; and FIG. 5 is a block diagram showing another configuration of the automatic transcription system.

Description of symbols: 1: CPU; 3: main storage device; 6: auxiliary storage device; 7: analog/digital converter; 8: acoustic signal input device.

Continued from the front page
(72) Inventor: Masaki Fujimoto, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo
(72) Inventor: Masanori Mizuno, c/o NEC Technical Information System Development Co., Ltd., 5-7-15 Shiba, Minato-ku, Tokyo
Examiner: Shigeo Arai
(56) References: JP-A-58-181090 (JP, A); JP-A-58-223198 (JP, A)

Claims

(57) [Claims]

1. An automatic music transcription method for converting an acoustic signal into musical score data comprising a time series of notes, characterized in that: the acoustic signal is subjected to autocorrelation analysis, and pitch information and power information of the acoustic signal are extracted for each analysis cycle; and, for a period in which the pitch information indicates the same pitch, the power information is divided into effective sections at or above a predetermined value and invalid sections below the predetermined value, effective sections shorter than a predetermined length are treated as invalid sections, and the signal is divided into one segment per note.

2. An automatic music transcription apparatus for converting an acoustic signal into musical score data comprising a time series of notes, comprising: information extraction means for subjecting the acoustic signal to autocorrelation analysis and extracting pitch information and power information of the acoustic signal for each analysis cycle; and segmentation means for dividing, for a period in which the pitch information indicates the same pitch, the power information into effective sections at or above a predetermined value and invalid sections below the predetermined value, treating effective sections shorter than a predetermined length as invalid sections, and dividing the signal into one segment per note.