JP2004334240A

JP2004334240A - Sound signal analysis device and method

Info

Publication number: JP2004334240A
Application number: JP2004217686A
Authority: JP
Inventors: Tomoyuki Funaki; 知之船木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1996-11-20
Filing date: 2004-07-26
Publication date: 2004-11-25
Anticipated expiration: 2017-11-20
Also published as: JP3888372B2

Abstract

PROBLEM TO BE SOLVED: To determine whether a section estimated to correspond to a steady part of a musical sound (that is, a single note, excluding any fluctuating part) is valid or invalid, even when the pitch or level of a sound input from a microphone or the like fluctuates slightly.
SOLUTION: A sound signal analysis device receives an arbitrary sound signal consisting of a time series of one or more notes and detects, within the input signal, each section estimated to correspond to a single note. The detected steady sections are placed in time order on a grid divided at time intervals corresponding to a prescribed note length. Each steady section is assigned the grid position nearest to a prescribed one of its ends (either its start or its end). When several steady sections are assigned to the same grid position, the one with the longest duration is selected as a valid note.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a sound signal analysis device and method that, given a sound signal whose pitch or note is not yet determined, such as a voice signal or musical tone signal input through a microphone, analyze the sections in which musical sound is present (effective sections) and the steady parts of that sound, and automatically determine the note (the note name on the scale) and the note length. It further relates to a recording medium storing a program for this purpose.
The analysis result according to the invention can be output, as needed, as electronic score information in a form such as MIDI information. The invention therefore relates to a technique for automatically transcribing an audible melody, input by human voice or the like, into a musical score.

Recently, computer performance systems, which use a computer or the like to generate performance information such as MIDI information and reproduce performance sounds from it, have attracted attention as a new kind of musical performance apparatus.
In this type of computer performance system, the methods for entering the data from which performance information is generated include real-time input, step input, numerical input, and score input.

Real-time input converts, in real time, the operation of a performance operator such as a keyboard actually played by a performer into performance information, much as a tape recorder captures a performance. Numerical input enters performance information such as pitch, note length, and loudness directly as numerical data from a computer keyboard. Score input places simplified note symbols and the like on a score (five-line staff) shown on a display, using the computer's function keys, a mouse, or the like. Step input enters notes on a MIDI keyboard or a software keyboard and enters note lengths using the computer's function keys, a mouse, or the like.

Of these input methods, real-time input stores the actual performance operations directly as performance information, so it readily captures the subtle, human nuances of a performance and allows input in a short time. However, it requires the performer to have a high level of instrumental skill and is therefore unsuitable for beginners.
Performance information generators have therefore been proposed that keep the advantages of real-time input while letting even beginners enter performance information quickly and easily: a human voice or the sound of an acoustic instrument is input directly through a microphone, and performance information is generated from the input sound. Simply by inputting a voice or a monophonic sound such as a guitar through a microphone, such a device can generate MIDI signals and control MIDI equipment without a MIDI keyboard or the like.

A conventional performance information generator produces MIDI information from the pitch changes of the microphone input in one of the following ways. The first method detects pitch changes in semitone units and generates only note information for the detected pitch. The second method detects pitch changes in semitone units and generates note information for the detected pitch together with pitch bend information (pitch change information) describing the pitch variation in between. The third method performs no note detection and expresses the pitch of the input signal as pitch bend information that can vary over a range of one octave up or down. To generate note information (note-on or note-off), the level of the input sound is compared with a predetermined reference value: a note-on is generated when the input level rises above the reference value, and a note-off when it falls below it.

However, when pitch changes are detected in semitone units, as in the first and second methods, a slight waver in the pitch of the input sound produces a large amount of unintended note information (note-on or note-off). When pitch changes are expressed as pitch bend information, as in the third method, the pitch bend information can follow the pitch faithfully, but the result is unsuitable for purposes such as transcription. Furthermore, when note information is generated from the input level, fluctuations in the level of the input sound likewise produce a large amount of unintended note information.
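The semitone-rounding problem described above can be illustrated with a short Python sketch. It is not taken from the patent: the function names, the A4 = 440 Hz tuning reference, and the sample frequencies are assumptions chosen only to show how a pitch that wavers near a semitone boundary flips between two note numbers.

```python
import math

def nearest_midi_note(freq_hz, a4_hz=440.0):
    """Round a frequency to the nearest equal-tempered MIDI note number (A4 = 69)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

def note_changes(freqs_hz):
    """Return one note number for every change in the rounded note (hypothetical helper)."""
    events = []
    previous = None
    for freq in freqs_hz:
        note = nearest_midi_note(freq)
        if note != previous:
            events.append(note)
            previous = note
    return events

# Assumed input: a held pitch wavering around the A4/A#4 boundary (about 452.9 Hz).
wobbly = [452.0, 454.0, 452.5, 453.5, 452.0, 454.5]
print(note_changes(wobbly))  # [69, 70, 69, 70, 69, 70]
```

A waver of only a few hertz here yields six note events for what the singer intended as a single note.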

In real-time input, multiple sounds enter the microphone in time series at arbitrary intervals, so analysis should be concentrated on the portions where sound is actually present. Continuously analyzing the pitch and other properties of the microphone signal is undesirable, because analysis is then wasted on periods in which no sound is being input. It is more efficient to extract from the microphone signal the sections in which sound is actually present (effective sections) and to apply complex processing such as pitch analysis only to those sections. Conventional extraction simply compares the input signal level with a predetermined reference level, so the extracted effective sections become inaccurate when the level of the input sound fluctuates slightly, particularly when it fluctuates near the reference level.
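As a rough illustration of the conventional extraction criticized above, the following Python sketch splits a level sequence wherever it crosses a fixed reference level. The function name and the level values are assumptions made for the example, not part of the patent.

```python
def threshold_sections(levels, reference):
    """Return (start, end) index pairs where level > reference; end is exclusive."""
    sections = []
    start = None
    for i, level in enumerate(levels):
        if level > reference and start is None:
            start = i                      # level rose above the reference
        elif level <= reference and start is not None:
            sections.append((start, i))    # level fell back to or below it
            start = None
    if start is not None:
        sections.append((start, len(levels)))
    return sections

# Assumed input: one sound whose onset hovers around the reference level of 0.5.
levels = [0.0, 0.2, 0.52, 0.49, 0.51, 0.48, 0.53, 0.9, 0.9, 0.1]
print(threshold_sections(levels, reference=0.5))  # [(2, 3), (4, 5), (6, 9)]
```

The single sound is reported as three sections because its level crosses the reference several times near the onset.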

An object of the present invention is to provide a sound signal analysis device and method capable of analyzing, from a sound input through a microphone or the like, the steady part of a musical sound, that is, the part corresponding to one note. For example, even when the pitch or level of the input sound wavers slightly, the device and method can analyze the steady part of the musical sound, excluding the wavering part. More specifically, the steady part of the input sound signal is analyzed effectively, and the pitch of the sound is analyzed accurately on that basis. The analysis result can be output, as needed, as electronic score information in a form such as MIDI information, so that an audible melody input by human voice, by playing an actual instrument, or the like can be transcribed into a score automatically and accurately.

The sound signal analysis device of the present invention according to claim 1 comprises: input means for inputting an arbitrary sound signal consisting of a time series of one or more notes; section detection means for detecting, within the input sound signal, each section estimated to correspond to a single note; and means for placing the detected sections, in time order, on a grid divided at time intervals corresponding to a predetermined note length, assigning to each section the one grid position closest to a predetermined one of its ends (its start or its end), and, when multiple sections are thereby assigned to the same grid position, selecting the one section with the longest duration as a valid note.

When sections estimated to correspond to individual notes are detected in the input sound signal, not every detected section necessarily corresponds to a valid note; some may be invalid. To identify invalid sections, the invention makes use of a grid divided at time intervals corresponding to a predetermined note length (for example, the minimum permitted note length). Each detected section is placed, in time order, on this grid, and the one grid position closest to the start of the section is assigned to it. When multiple sections are thereby assigned to the same grid position, the one section with the longest duration is selected as valid.
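The grid assignment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: times are integers in milliseconds, the grid step of 125 ms is an arbitrary example value, the function name is hypothetical, and ties between equally long sections keep the earlier one (the text does not say how ties are resolved).

```python
def select_valid_notes(sections, grid_step, use_start=True):
    """Map each (start, end) section to its nearest grid position and keep
    only the longest section at each position.

    Returns a dict {grid_index: (start, end)} ordered by grid index.
    """
    best = {}
    for start, end in sections:
        edge = start if use_start else end
        grid_index = int(edge / grid_step + 0.5)   # nearest grid line (non-negative times)
        current = best.get(grid_index)
        if current is None or (end - start) > (current[1] - current[0]):
            best[grid_index] = (start, end)
    return dict(sorted(best.items()))

# Assumed input: detected sections in ms; two short fragments compete with real notes.
sections = [(10, 60), (40, 300), (480, 760), (530, 560)]
print(select_valid_notes(sections, grid_step=125))
# {0: (40, 300), 4: (480, 760)}
```

Here the 50 ms and 30 ms fragments land on the same grid positions as longer sections and are discarded, leaving one valid note per grid position.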

Further, according to the present invention, the degree of coincidence between the waveforms of adjacent sections may be calculated, and a portion in which this degree of coincidence exceeds a predetermined value detected as a voiced section. Taking the waveform of a section with a high degree of coincidence within the detected voiced section as a reference, the degree of coincidence is then calculated successively against the waveforms of the sections on either side, a steady section is detected from these values, and the pitch and other properties of the sound signal are analyzed for that steady section. Because a section with a high degree of coincidence within a voiced section serves as the reference for a vowel of the voice, a change of vowel can be detected from the degree of coincidence calculated against it. A steady section detected in this way may be recognized as a vowel, that is, as a section estimated to be one note.
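This section does not define how the degree of coincidence is computed. Purely as an illustration, the Python sketch below assumes it is the normalized correlation between two equal-length adjacent frames and flags frame pairs whose value exceeds a threshold. The measure, the frame length, the threshold, and the function names are all assumptions, not the patent's method.

```python
import math

def coincidence(frame_a, frame_b):
    """Normalized correlation of two equal-length frames, in the range [-1, 1]."""
    dot = sum(a * b for a, b in zip(frame_a, frame_b))
    energy = sum(a * a for a in frame_a) * sum(b * b for b in frame_b)
    return dot / math.sqrt(energy) if energy > 0 else 0.0

def voiced_pairs(signal, frame_len, threshold):
    """For each pair of adjacent frames, report whether their coincidence
    exceeds the threshold (a candidate voiced portion)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [coincidence(frames[k], frames[k + 1]) > threshold
            for k in range(len(frames) - 1)]

# Assumed input: a sine wave whose period equals the frame length, so that
# adjacent frames have nearly identical waveforms.
periodic = [math.sin(2 * math.pi * n / 8) for n in range(32)]
print(voiced_pairs(periodic, frame_len=8, threshold=0.9))
```

In practice the frame boundaries would need to follow the actual pitch period of the input; the fixed frame length here only keeps the example short.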

Further, each of the configurations described above can be embodied not only as a device invention, namely a voice analysis device, but also as a method invention, namely a voice analysis method. The invention can also be implemented as a computer program, or as a recording medium storing such a computer program, and these forms also fall within the scope of the invention in this application.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
FIG. 2 is a hardware block diagram showing the configuration of an electronic musical instrument incorporating a musical sound information analysis device and a performance information generator according to the present invention.

The electronic musical instrument is controlled by a microcomputer comprising a microprocessor unit (CPU) 1, a program memory 2, and a working memory 3.
The CPU 1 controls the operation of the entire electronic musical instrument. The program memory 2, the working memory 3, a performance data memory 4, a key press detection circuit 5, a microphone interface 6, a switch detection circuit 7, a display circuit 8, and a tone generator circuit 9 are each connected to the CPU 1 via a data and address bus 1E.

The program memory 2 stores the various programs of the CPU 1 (system programs, operation programs, and so on) and various data, and is implemented as a read-only memory (ROM).

The working memory 3 temporarily stores performance information and the various data generated while the CPU 1 executes a program. Predetermined address areas of a random access memory (RAM) are allocated to it and used as registers, flags, buffers, tables, and the like.
The performance data memory 4 stores, among other data, performance information (MIDI data) generated from sound input through the microphone or the like.

A hard disk device 1H or the like may also be connected to the CPU 1 to store various data such as automatic performance data and chord progression data, and it may also store the operation programs. Alternatively, instead of storing the operation programs in the ROM 2, they may be stored on the hard disk device 1H and loaded into the RAM 3, which lets the CPU 1 operate exactly as if the programs were stored in the ROM 2. This makes it easy to add or upgrade operation programs. A removable external storage medium 1G, such as a CD-ROM, may also be provided, and may hold various data and arbitrary operation programs. The operation programs and data stored on the CD-ROM can be read by a CD-ROM drive (not shown) and transferred to the hard disk device 1H for storage, which makes it easy to install new operation programs or upgrade existing ones.

A communication interface 1F may be connected to the data and address bus 1E so that the instrument can connect to communication networks such as a LAN (local area network) or the Internet and exchange data with a server computer. When the operation programs or data are not stored on the hard disk device 1H, they can then be downloaded from the server computer. In this case the electronic musical instrument, acting as the client musical tone generating device, sends the server computer a command requesting download of the operation programs or data via the communication interface and the communication network. In response to the command, the server computer sends the requested operation programs or data to the electronic musical instrument over the communication network. The electronic musical instrument receives them via the communication interface and stores them on the hard disk device, which completes the download.

The keyboard 10 has a plurality of keys for selecting the pitch of the musical tone to be sounded and a key switch for each key. Where necessary, it also has touch detection means such as a key press velocity detector or a key pressure detector.

The key press detection circuit 5 includes a circuit made up of a plurality of key switches, one for each key of the keyboard 10 that designates the pitch of the musical tone to be generated. It outputs a key-on event when a key is newly pressed and a key-off event when a key is newly released. It also determines the key press velocity or pressure at the time of the key press, generates touch data from it, and outputs the touch data as velocity data. The key-on events, key-off events, velocity, and other data are expressed as data conforming to the MIDI standard (hereinafter "MIDI data") and also include a key code and data indicating the assigned channel.
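For readers unfamiliar with how key-on, key-off, key code, velocity, and channel fit together in MIDI data, the following Python sketch builds the standard three-byte MIDI channel voice messages. The function names are hypothetical, and the patent does not describe the circuit's output at byte level; only the status byte values (0x90 for note-on, 0x80 for note-off, with the channel in the low four bits) come from the MIDI standard.

```python
def note_on(channel, key, velocity):
    """Build a MIDI note-on message: status 0x9n, key code, velocity."""
    return bytes([0x90 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

def note_off(channel, key, velocity=0):
    """Build a MIDI note-off message: status 0x8n, key code, release velocity."""
    return bytes([0x80 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

# Middle C (key code 60) on the first channel (index 0) with velocity 100.
print(note_on(0, 60, 100).hex())   # 903c64
print(note_off(0, 60).hex())       # 803c00
```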

The microphone 1A converts a voice or instrument sound into a voltage signal and outputs it to the microphone interface 6. The microphone interface 6 converts the analog voltage signal from the microphone 1A into a digital signal and outputs it to the CPU 1 via the data and address bus 1E.

The numeric keypad and various switches 1B comprise various operators, including a numeric keypad for entering numerical data, a keyboard for entering character data, and a start/stop switch for the note conversion process (the sound signal analysis process and the performance information generation process). They also include various operators for selecting, setting, and controlling pitch, tone color, effects, and so on; these are well known and are not described here.

The switch detection circuit 7 detects the operating state of each operator of the numeric keypad and various switches 1B and outputs the corresponding switch information to the CPU 1 via the data and address bus 1E.

The display circuit 8 displays various information, such as the control state of the CPU 1 and the contents of the setting data, on a display 1C. The display 1C is a liquid crystal display panel (LCD), a CRT, or the like, and its display operation is controlled by the display circuit 8.
The numeric keypad and various switches 1B together with the display 1C constitute a GUI (Graphical User Interface).

The tone generator circuit 9 can generate musical tone signals on a plurality of channels simultaneously. It receives the MIDI data of a musical tone track supplied via the data and address bus 1E, generates a musical tone signal from that data, and outputs it to a sound system 1D.

To sound musical tone signals on multiple channels simultaneously, the tone generator circuit 9 may form a plurality of sounding channels by time-sharing a single circuit, or it may dedicate one circuit to each sounding channel. Any musical tone signal generation method may be used in the tone generator circuit 9. For example, known methods may be adopted as appropriate: a memory readout method (waveform memory method), which sequentially reads musical tone waveform sample data stored in a waveform memory according to address data that changes with the pitch of the tone to be generated; an FM method, which obtains the waveform sample data by performing a predetermined frequency modulation operation using the address data as phase angle parameter data; or an AM method, which obtains the waveform sample data by performing a predetermined amplitude modulation operation using the address data as phase angle parameter data. Other possibilities include a physical modeling method, which synthesizes the waveform with an algorithm imitating the sound production principle of an acoustic instrument; a harmonic synthesis method, which synthesizes the waveform by adding a plurality of harmonics to a fundamental; a formant synthesis method, which synthesizes the waveform from formant waveforms having a specific spectral distribution; and an analog synthesizer method using a VCO, VCF, and VCA. The tone generator circuit need not be built from dedicated hardware: it may be implemented with a DSP and microprograms, or with a CPU and software.
The musical tone signal generated by the tone generator circuit 9 is sounded through the sound system 1D, which comprises an amplifier and a speaker.

Next, an example is described in which the electronic musical instrument according to the present invention operates as a sound signal analysis device and a performance information generator.

FIG. 1 shows the main flow when the electronic musical instrument of FIG. 2 operates as a performance information generator. The main flow executes the following steps in order.
Step 11: First, initialization is performed, for example by setting initial values in the registers and flags in the working memory 3 of FIG. 2.
If the note conversion start switch among the numeric keypad and various switches 1B is then turned on, the series of processes from step 12 to step 18 is performed.

Step 12: This step is performed when it is determined that the note conversion start switch has been turned on. In response, the voltage waveform of the voice or instrument sound input from the microphone 1A via the microphone interface 6 is sampled at a predetermined sampling rate (for example, 44.1 kHz) and stored as a digital sample signal in a predetermined area of the working memory 3. The sampling itself uses a conventional, well-known method and is not described in detail here.

Steps 13 through 16 make up the note conversion process performed in response to the start switch being turned on. This process analyzes the digital sample signal of the sampled voice or instrument sound in various ways and converts it into a pitch sequence, that is, into MIDI data that can be displayed as a score.

Step 13: Based on the digital sample signal obtained by the sampling in step 12, effective section detection is performed to find where the sections containing musical sound, that is, the effective sections, are located. This process is described in detail later.

Step 14: Stable section detection is performed to subdivide each effective section detected in step 13 into stable sections in which the level is stable. This process is described in detail later.

Step 15: Steady section detection is performed to detect the steady part of the musical sound (the part corresponding to one note) within each stable section detected in step 14. This process is described in detail later.

Step 16: Pitch sequence determination is performed to assign the most suitable note to each steady section obtained through steps 13 to 15. In other words, this step generates the MIDI data. This process is described in detail later.

Step 17: A score is created from the MIDI data generated in step 16. Score creation can readily be implemented with conventional techniques and is not described in detail.

Step 18: Automatic performance is carried out based on the MIDI data generated in step 16. This too can readily be implemented with conventional techniques and is not described in detail.

FIG. 3 shows the details of the effective section detection of step 13 in FIG. 1. How effective sections are detected from the digital sample signal obtained in step 12 is explained below with reference to FIGS. 7 and 8.

Step 31: The average sound pressure level is calculated from the digital sample signal obtained in step 12. FIG. 7 shows an example of the waveform values of a voice signal sampled at 44.1 kHz, that is, of the digital sample signal; about 20 points of waveform values are shown. In step 31, the sample amplitudes are averaged over a predetermined number of samples (for example, the number of samples corresponding to 10 msec), and the result is taken as the average sound pressure level. At a sampling rate of 44.1 kHz this predetermined number is 441. The average value at a given sample point is thus the sum of the waveform values over the 10 msec ending at that point, that is, over the 441 points ending at that point, divided by 441. For points 0 through 440, a full 441 points of waveform values are not available, so the average of the waveform values from point 0 to the point in question is used instead. In this way, time-series average sound pressure level information is obtained at every sample timing.

For convenience of explanation, FIG. 7 illustrates the case where the average of 15 points of waveform values is taken as the average sound pressure level. Accordingly, up to the first 15 points, the sum of the waveform values so far is divided by the number of points so far. The sum of the waveform values is obtained by summing their absolute values.
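
The averaging of step 31 can be sketched as a trailing moving average of absolute sample values. This is only an illustrative sketch: the function name `average_level` and its parameters are assumptions, not names from the patent, and the window is the "points preceding the current point" variant described above.

```python
def average_level(samples, window=441):
    """Trailing mean of absolute sample values (Step 31 sketch).

    Until `window` samples are available, the mean is taken over the
    samples seen so far.
    """
    levels = []
    running = 0  # sum of |sample| over the current window
    for i, s in enumerate(samples):
        running += abs(s)
        if i >= window:
            running -= abs(samples[i - window])  # drop the sample that left the window
        levels.append(running / min(i + 1, window))
    return levels
```

With `window=441` this corresponds to 10 msec at 44.1 kHz; a small window such as 15 mirrors the simplified case drawn in FIG. 7.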

FIG. 8(A) plots the average sound pressure level obtained in this way, with the sampling point on the horizontal axis. Hereinafter, the curve formed by the average sound pressure level in FIG. 8(A) is referred to as the average sound pressure level curve.

When the average sound pressure level is computed over 15 points as in FIG. 7, a low-pass filter with a cutoff frequency of about 10 Hz is applied to smooth the level fluctuation. When the average is actually taken over 441 points of waveform values, it is desirable to apply a low-pass filter with a cutoff frequency of about 80 to 100 Hz to smooth the level fluctuation.

In the description above, the average at a sampling point is obtained by summing the waveform values of a predetermined number of points preceding that point. Alternatively, the waveform values of a predetermined number of points before and after the sampling point, centered on it, may be summed, or the waveform values of a predetermined number of points following the sampling point may be summed.

Step 32: The average sound pressure level curve obtained in step 31, as shown in FIG. 8(A), is divided into valid sections and invalid sections on the basis of a predetermined threshold. In this process, the threshold is 20 percent of the maximum waveform value in the average sound pressure level curve. Other values may of course be used as the threshold: for example, the mean of the average sound pressure level curve, 80 percent of that mean, or half the maximum value of the curve.

The threshold is shown as a dotted line in FIG. 8(B). The points where this dotted line intersects the average sound pressure level curve are the boundaries between valid and invalid sections: sections where the curve is above the dotted line are valid, and sections where it is below are invalid. In FIG. 8(B), valid sections are marked with ○ and invalid sections with ×.
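
As a rough sketch of step 32, the level curve can be cut into alternating runs by comparing each point with 20 percent of the curve maximum. The function name `classify_sections` and the `(is_valid, start, end)` tuple layout are assumptions made for this illustration, and the curve is assumed to be non-empty.

```python
def classify_sections(levels, ratio=0.20):
    """Split a level curve into runs of (is_valid, start, end), end exclusive."""
    threshold = ratio * max(levels)
    sections = []
    start = 0
    for i in range(1, len(levels) + 1):
        # Close the current run at the end of the data or where validity flips.
        if i == len(levels) or (levels[i] > threshold) != (levels[start] > threshold):
            sections.append((levels[start] > threshold, start, i))
            start = i
    return sections
```

Passing a different `ratio`, or replacing `max(levels)` with the mean of the curve, gives the alternative thresholds mentioned in the text.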

Step 33: Taking 0.05 sec (50 msec) as the minimum length a human needs in order to perceive pitch, any invalid section determined in step 32 that is shorter than this minimum length is changed to a valid section. For example, at a sampling frequency of 44.1 kHz, invalid sections of 2205 samples or fewer are changed to valid sections. In FIG. 8(B), the third and fifth invalid sections from the left are such short invalid sections. As a result of step 33, FIG. 8(B) becomes FIG. 8(C), and the valid sections are extended. The invalid sections at the very beginning and end of the whole span also count as short invalid sections, but they are treated as special regions that are not changed to valid sections merely because they are short, and they are marked with △.

Step 34: In the pattern of valid and invalid sections resulting from step 33, short valid sections of 0.05 sec (50 msec) or less are changed to invalid sections. This is done in the same manner as step 33. In FIG. 8(C), the rightmost valid section is such a short valid section. As a result of step 34, FIG. 8(C) becomes FIG. 8(D). As FIG. 8(D) shows, there are four valid sections in all, the first through the fourth. The △-marked region at the end of the whole span is regarded as the fourth valid section.
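
Steps 33 and 34 can be sketched on a per-sample boolean mask (True for valid). The helper names `runs`, `flip_short_runs`, and `smooth_sections` are hypothetical. The sketch keeps the leading and trailing invalid regions untouched in step 33, but it does not model the special handling of the △ region as the fourth valid section in step 34.

```python
def runs(mask):
    """Yield (value, start, end) for each run of equal values; end is exclusive."""
    start = 0
    for i in range(1, len(mask) + 1):
        if i == len(mask) or mask[i] != mask[start]:
            yield mask[start], start, i
            start = i

def flip_short_runs(mask, value, min_len, keep_edges):
    """Return a copy of mask with runs of `value` no longer than min_len inverted."""
    out = list(mask)
    for v, start, end in runs(mask):
        at_edge = start == 0 or end == len(mask)
        if v == value and end - start <= min_len and not (keep_edges and at_edge):
            out[start:end] = [not value] * (end - start)
    return out

def smooth_sections(mask, min_len=2205):
    # Step 33: short invalid gaps become valid, except at the very beginning and end.
    filled = flip_short_runs(mask, False, min_len, keep_edges=True)
    # Step 34: short valid runs that remain become invalid.
    return flip_short_runs(filled, True, min_len, keep_edges=False)
```

The default `min_len=2205` is 0.05 sec at 44.1 kHz, matching the sample count given in step 33.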

Step 35: As a final check, the mean of the average sound pressure level curve is computed for each valid section identified in step 34, and a section whose mean is smaller than a predetermined value is made invalid. The mean is obtained by dividing the sum of the average sound pressure level values at the points in the valid section by the length of the section. The resulting means are shown below each section in FIG. 8(D): 60 for the first section, 25 for the second, 45 for the third, and 15 for the fourth. A section is made invalid if this mean falls below 30 percent of the maximum waveform value of the section. Here the second and fourth sections meet that condition, so both become invalid. FIG. 8(E) shows the valid and invalid sections identified by the check of step 35.
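
The check of step 35 can be illustrated as follows. The text does not give the maximum waveform value used as the reference, so the sketch takes it as a parameter; `check_sections` and `reference_max` are assumed names, and the value 100 in the usage note is chosen only so that the threshold comes out at 30.

```python
def check_sections(levels, sections, reference_max, ratio=0.30):
    """Return, per (start, end) section, its mean level and whether it stays valid."""
    results = []
    for start, end in sections:
        mean = sum(levels[start:end]) / (end - start)
        # A section stays valid only if its mean reaches ratio * reference_max.
        results.append((mean, mean >= ratio * reference_max))
    return results
```

With section means of 60, 25, 45, and 15 and an assumed `reference_max` of 100, the second and fourth sections fail the check, which matches the outcome described for FIG. 8(E).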

Step 36: The valid sections identified by steps 31 through 35 are extended. For example, as shown in FIG. 8(F), an extension permission level is set at 15 percent of the maximum waveform value, a line is drawn at that level, and the boundaries of each valid section are moved outward to that line. That is, starting from each end of a valid section and moving outward, the rise and fall of the average sound pressure level curve is tracked while checking whether the curve has dropped below the extension permission level. The valid section extends up to the point where the descent turns into an ascent or the curve drops below the extension permission level.

FIG. 8(G) shows another example of this valid section extension. Here the extension permission level is 5 percent of the maximum waveform value, and the position where the descent of the average sound pressure level curve ends is taken as the end of the valid section. Alternatively, the position where the ascent begins may be taken as the end. With this extension, the first and third sections are widened more than in FIG. 8(F). In this way, the valid sections that a human can perceive as pitched sound are finally determined.

If the extension permission level is low and valid sections lie close together, the extended end of one valid section may approach, or even coincide with, the extended start of the next valid section. The boundary position also depends on whether the point where the descent ends or the point where the ascent begins is used as the delimiter. If valid sections overlap as a result of the extension, the midpoint between the two may be used as the boundary position.
FIGS. 8(F) and 8(G) describe the case where a valid section is extended in both directions, but the extension may be applied in the forward direction only or the backward direction only. When extending in both directions, the extension permission level may differ between the forward and backward directions.
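
The extension of step 36 can be sketched as a walk outward from a section boundary that stops when the curve drops below the extension permission level or starts rising again. The function `extend_boundary` and its arguments are assumptions for illustration. Boundaries are inclusive sample indices, and resolving overlaps by taking the midpoint is not included.

```python
def extend_boundary(levels, index, permit_level, step):
    """Move a section boundary outward from `index`; step is +1 (end) or -1 (start)."""
    i = index
    while 0 <= i + step < len(levels):
        nxt = levels[i + step]
        if nxt < permit_level or nxt > levels[i]:
            break  # dropped below the permission level, or descent turned into ascent
        i += step
    return i
```

Calling it with different `permit_level` values for `step=+1` and `step=-1` corresponds to using different permission levels in the forward and backward directions.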

FIG. 4 shows the details of the stable section detection process of step 14 in FIG. 1. This process detects regions of stable level in the average sound pressure level curve within each valid section obtained in step 13. Its operation is described with reference to FIG. 9, which shows the detection of stable sections for the first valid section, from point A to point B in FIG. 8.
Step 41: Slope values are calculated from the average sound pressure level curve within the valid section detected by the process of FIG. 3. As shown in FIG. 9(B), the width over which a slope is calculated is, for example, 100 points and the shift of that window is, for example, 50 points; slope values are calculated while shifting the window step by step from point A toward point B. If point A is sample point 000, the slope between sample points 000 and 100 is calculated first, followed by the slope between sample points 050 and 150, each shifted by 50 points. For example, if the average sound pressure level is 325 at sample point 000 and 1576 at sample point 100, the slope value is (1576 − 325) / 100 = 12.51. Slope values are then calculated in turn between sample points 100 and 200, 150 and 250, 200 and 300, and so on. FIG. 9(B) shows an example of the calculated slope values: 12.51 between sample points 000 and 100, 32.42 between 050 and 150, 20.12 between 100 and 200, 11.84 between 150 and 250, 5.24 between 200 and 300, 4.82 between 250 and 350, 2.34 between 300 and 400, 3.89 between 350 and 450, and 5.36 between 400 and 500. Each slope value is stored as the slope value of the earlier of its two sample points: 12.51 is stored for sample point 000, 32.42 for sample point 050, and so on for each sample point.
In this way the slope values are calculated over the whole section from point A to point B, and the stable section extraction of step 42 follows.
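
The slope calculation of step 41 can be sketched as below. The name `slope_values` and the dictionary return type are assumptions; the width of 100 points and the shift of 50 points are the example values from the text.

```python
def slope_values(levels, width=100, shift=50):
    """Map each window start index to (levels[start + width] - levels[start]) / width."""
    return {
        start: (levels[start + width] - levels[start]) / width
        for start in range(0, len(levels) - width, shift)
    }
```

Each slope is keyed by the earlier of its two sample points, as in the text. With a level of 325 at point 000 and 1576 at point 100, the entry for key 0 is (1576 − 325) / 100 = 12.51.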

Step 42: Stable sections are extracted on the basis of the slope values calculated in step 41. Sample points whose slope value is at or below a predetermined value (for example, 10) are regarded as stable, and when the number of consecutive stable sample points reaches a predetermined number, that is, when the stable state continues for a predetermined time, that continuous stable portion is taken as a stable section. Taking tempo into account, this predetermined time is set to, for example, about 2000 sample points. For an average sound pressure level curve such as that of FIG. 9(A), the three locations a, b, and c in FIG. 9(C) are found as stable sections.
Step 43: It is only from the existence of a stable section extracted in step 42 that a human notices that the start of a sound, the trigger of a note, lies near the start of that stable section. Here, the stable sections extracted in step 42 are extended in order to determine the approximate start point of each note.
When the stable sections are extended, that is, when note start points are determined, point A is necessarily the note start point of the first stable section a, and point B is necessarily the note end point of the last stable section c. The note end point of stable section a and the note start point of stable section b, however, cannot be obtained so easily. Therefore, the sample point with the largest slope value between the end of one stable section and the start of the next is taken as both the note end point of the former and the note start point of the latter. As shown in FIG. 9(D), the note end point of stable section a and the note start point of stable section b are point C, and the note end point of stable section b and the note start point of stable section c are point D.
The description above takes the sample point with the largest slope value as the note end point of one stable section and the note start point of the next, but other choices are possible. The note end point (note start point) may be the sample point at which the slope value first exceeds a predetermined value (threshold) between the end of one stable section and the start of the next, or the sample point at which it falls below a predetermined value (threshold) immediately before the start of a stable section, or a new point calculated by combining the sample points obtained by these three methods. The sections AC, CD, and DB obtained in this way are the stable sections corresponding to the respective levels. In the case of FIG. 9, the stable section corresponding to the level of stable section a is AC, the one corresponding to the level of stable section b is CD, and the one corresponding to the level of stable section c is DB.
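
Steps 42 and 43 can be sketched on a time-ordered list of `(sample_index, slope)` pairs. The names `stable_sections` and `boundary_between` are hypothetical. Comparing the absolute slope with the limit is an assumption of this sketch, since the text only says "at or below a predetermined value".

```python
def stable_sections(slopes, max_slope=10.0, min_points=2000):
    """Return (first_index, last_index) for each sufficiently long stable run."""
    sections = []
    run = []
    # The sentinel entry has an infinite slope so that the last run is closed.
    for index, slope in slopes + [(None, float("inf"))]:
        if abs(slope) <= max_slope:
            run.append(index)
            continue
        if run and run[-1] - run[0] >= min_points:
            sections.append((run[0], run[-1]))
        run = []
    return sections

def boundary_between(slopes, left_end, right_start):
    """Index with the largest slope between two stable sections (points C and D)."""
    candidates = [(s, i) for i, s in slopes if left_end <= i <= right_start]
    return max(candidates)[1]
```

`boundary_between` implements only the first of the three boundary rules mentioned above, the one that picks the largest slope value.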

FIG. 5 shows the details of the steady section detection process of step 15 in FIG. 1. How steady sections are detected within the stable sections obtained in step 14 is described in detail with reference to FIGS. 10 to 17.
When analyzing musical audio signals such as voice or instrument tones, it is important to know where the steady part is. For timbres other than rhythm sounds, the pitch is determined by the periodicity of the steady part, and the note value is determined with the steady part as its skeleton. In this embodiment, a steady part is a section corresponding to one note when written as a score; attention is paid to changes in the three major elements of sound, namely timbre, pitch, and velocity, and the aim is to detect on the time axis the section that a human perceives as a single sound.
The steady section detection process is described below, following the steps of FIG. 5.

To detect steady sections, the reference positions of the periods of the sound signal waveform must first be detected. Two methods are commonly used for this: the zero-cross position detection method and the peak position detection method. With the zero-cross method, detecting the reference position of a period is difficult unless overtones are removed as far as possible with a filter or the like, and band division is also needed. With the peak position detection method it is likewise desirable to remove overtones as far as possible, but the requirement is less severe than for zero-cross detection: it is enough to apply a band-pass filter whose cutoff frequencies are the limits of the producible frequency range of the voice or instrument, and no band division or similar processing is needed. The peak position detection method is therefore preferable, since its procedure is simpler and it gives reasonably good results. This embodiment accordingly describes the case where the reference positions of the periods are detected by the peak position detection method.

Step 51: The signal is passed through a first band-pass filter (first BPF) to remove certain overtones. The band-pass filter uses the limits of the producible pitch range as its cutoff frequencies. For voice, the range a human can produce is about 80 to 1000 Hz, and a band of this width is needed to analyze any user without restriction. If the users are restricted, however, narrowing the producible range to some extent reduces errors caused by overtones and improves detection accuracy. For a guitar the range is about 80 to 700 Hz, and here too accuracy improves if the pitch range is fixed in advance. Accuracy also improves if the differences between instruments are set in advance. FIG. 10(A) shows part of the audio waveform after the first BPF.
Step 52: Peak reference positions, each serving as the reference for one period, are detected in the tone waveform signal obtained by the first BPF of step 51, using the peak position detection method. A known technique is used. The peak level of the tone waveform is detected and held by a circuit with a predetermined time constant; the held value serves as a threshold hold voltage, and when the waveform next reaches or exceeds this voltage, that value is held as the next peak level. Repeating this in sequence detects the peak positions as in FIG. 10(A), which shows how the threshold hold voltage behaves while the peak positions are detected.
From the audio waveform of FIG. 10(A), peak positions such as those in FIG. 10(B) are detected. In FIG. 10(B), peak reference positions P1, P2, P3, and P6 appear regularly at the expected positions, whereas P4 and P5 appear at positions that contain errors caused by slight disturbances in the audio waveform.
This is because the cutoff band of the first BPF in step 51 covers a wide range, so that peaks appear in close succession as in FIG. 10(A).
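
The hold-and-decay peak detection of step 52 is described in terms of a time-constant circuit. A software analogue might look like the sketch below, which is not the known technique the text refers to but an assumed approximation: the held level decays by a fixed factor per sample, and a positive sample that reaches the held level becomes the new peak. The name `detect_peaks` and the `decay` parameter are hypothetical.

```python
def detect_peaks(samples, decay=0.995):
    """Return indices where the signal rises to or above a decaying held level."""
    peaks = []
    held = 0.0
    for i, s in enumerate(samples):
        held *= decay  # the held level leaks away, like a time-constant circuit
        if s > 0 and s >= held:
            held = s
            if peaks and peaks[-1] == i - 1:
                peaks[-1] = i  # still on the same rise: keep only its latest sample
            else:
                peaks.append(i)
    return peaks
```

The spacing between consecutive returned indices is an estimate of the period in samples. A `decay` closer to 1 holds the level longer and suppresses secondary peaks more strongly.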

Step 53: On the basis of the peak reference positions detected in step 52, the waveform of a basic section, which starts at a given peak reference position, is compared with the waveform of the section from the end of that basic section to the next peak reference position (hereinafter, the moving section) to determine whether the two waveforms are the same.
Consider the peak reference positions shown in FIG. 10(B). Section d runs from peak reference position P1 to P2, and section e from P2 to P3. Both sections d and e are longer than the minimum band length and shorter than the maximum band length, so section d becomes the basic section and section e the moving section, and they are subjected to the waveform comparison described later.
Next, section f runs from peak reference position P3 to P4. Both sections e and f are longer than the minimum band length and shorter than the maximum band length, so this time section e becomes the basic section and section f the moving section, and they are subjected to the waveform comparison described later.
The section from peak reference position P4 to P5, however, is shorter than the minimum band length, so it is not used for comparison; instead, section g, from the next peak reference position P5 to P6, is compared with section f.
As a result of the waveform comparison, sections f and g are recognized as waveforms different from sections d and e. A working memory (RAM) is provided with data areas, addressed by peak reference position information, into each of which a match flag or a mismatch flag is written. In the case of FIG. 10(B), sections d and e are judged to match, so a match flag is written in the data area for peak reference position information P2, which corresponds to section e. Sections e and f are judged not to match, so a mismatch flag is written in the data area for peak reference position information P3, which corresponds to section f. The section from peak reference position P4 to P5 is shorter than the minimum band length, so mismatch flags are written in the data areas for peak reference position information P4 and P5. Match flags are assumed to have been written in the data areas for peak reference position information P1 and P6. FIG. 10(C) shows the match and mismatch flags written in sequence in this way together with the peak reference position information.

The waveform comparison is performed by calculating an error rate, as described below.
FIG. 13 is a diagram for explaining how the error rate is calculated in the waveform comparison.
Suppose that the two waveforms for which the error rate is calculated are comparison wave 1X and comparison wave 2X as shown in FIG. 12. Each waveform is a range delimited by the peak reference positions of FIG. 10. First, the amplitudes of comparison waves 1X and 2X are normalized so that the maximum amplitude is 100 percent. Comparison wave 1X thereby becomes comparison wave 1Y, and comparison wave 2X becomes comparison wave 2Y. Because comparison wave 2X is shorter than comparison wave 1X along the time (horizontal) axis, it is stretched to the same time width as comparison wave 1X. That is, the time axis of comparison wave 2Y is stretched to give comparison wave 2Z.
The error rate is calculated between comparison wave 1Y and comparison wave 2Z.
FIG. 13 shows specific values used when calculating the error rate between comparison wave 1Y and comparison wave 2Z. The figure covers the case where the error rate is calculated over the first period of comparison waves 1Y and 2Z, that is, over 24 samples.
The difference between comparison wave 1Y and comparison wave 2Z is calculated at each sampling position, and the absolute values of the differences are summed. In FIG. 13 this sum is 122. Dividing it by the number of samples, 24, gives the error rate, which here is about 5. If the threshold for deciding whether two waveforms are the same is set to 10, the error rate of 5 in FIG. 13 is below 10, so the two are treated as the same waveform. In FIG. 13, each waveform is normalized with 1000 as the maximum level.
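
The error rate calculation can be sketched as: normalize both waves to a peak of 1000, stretch the second to the length of the first, then take the mean absolute difference. The helper names are hypothetical. Linear resampling in `stretch` is an assumption, since the text does not say how the time axis is stretched. Each wave is assumed to have at least two samples and a non-zero peak.

```python
def normalize(wave, peak=1000.0):
    """Scale the wave so that its largest absolute value equals `peak`."""
    top = max(abs(v) for v in wave)
    return [v * peak / top for v in wave]

def stretch(wave, length):
    """Linearly resample `wave` to `length` points (assumed stretching method)."""
    out = []
    for i in range(length):
        pos = i * (len(wave) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(wave) - 1)
        frac = pos - lo
        out.append(wave[lo] * (1 - frac) + wave[hi] * frac)
    return out

def error_rate(wave1, wave2):
    """Mean absolute difference between normalized wave1 and stretched wave2."""
    a = normalize(wave1)
    b = stretch(normalize(wave2), len(a))
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

Two sections would then be treated as the same waveform when `error_rate` is at or below the threshold of 10 used in the text. For the figures in FIG. 13, 122 / 24 is about 5.08.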

Step 54: Using the result of the waveform comparison process of step 53, sections whose error rate is smaller than a predetermined value (for example, 10) are joined together and treated as pseudo-matching sections. The maximum and minimum values of the pitch extracted from the matching sections are detected, and the cutoff frequency band is determined from them. For example, assume that the minimum pitch among the matching sections obtained by the waveform comparison process is 235 points and the maximum is 365 points. To allow some margin, the minimum is reduced by 10% and the maximum is increased by 10%, so that the pitch range becomes about 212 points to about 402 points. At a sampling frequency of 44.1 kHz, this corresponds to an audio frequency band of 110 Hz to 208 Hz. Therefore, 110 Hz to 208 Hz is used as the cutoff frequency band.
Step 55: The signal is passed through a secondary band-pass filter (secondary BPF) that uses the new cutoff frequency band determined in step 54, so that unnecessary harmonics are removed. In the example above, the cutoff frequency band is 110 Hz to 208 Hz. This reduces errors caused by overtones and improves detection accuracy.
Step 56: The same processing as the peak reference position detection process of step 52 is performed.
Step 57: The same processing as the waveform comparison process of step 53 is performed.
Through the series of processes from step 55 to step 57, the low-frequency components and harmonics that cause errors are removed, so that the peak reference position detection and the waveform comparison can be carried out more accurately, and matching sections of higher accuracy than before are obtained. By the waveform comparison process of step 57, the waveform of the effective section, characterized by match flags and mismatch flags as shown in FIG. 10(C), yields a pitch sequence made up of three stationary sections X, Y and Z as shown in FIG. 10(D).
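The arithmetic of step 54 can be checked with a short sketch. The function name and the choice of rounding the widened periods to whole points are assumptions made for illustration; the text only states "about 212" and "about 402" points.

```python
import math


def cutoff_band(min_period, max_period, sample_rate=44100, margin_percent=10):
    """Convert a pitch range in sample points into a cutoff band in Hz.

    The shortest period is reduced and the longest period is increased by
    margin_percent, then each is rounded to the nearest whole point.
    """
    widened_min = math.floor(min_period * (100 - margin_percent) / 100 + 0.5)
    widened_max = math.floor(max_period * (100 + margin_percent) / 100 + 0.5)
    low_hz = sample_rate / widened_max    # longest period -> lowest frequency
    high_hz = sample_rate / widened_min   # shortest period -> highest frequency
    return widened_min, widened_max, low_hz, high_hz


points_min, points_max, low_hz, high_hz = cutoff_band(235, 365)
```

With the values of the example, the widened range is 212 to 402 points and the band is approximately 110 Hz to 208 Hz.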

Step 58: The pitch sequence of FIG. 10(D) obtained by the processing up to step 57 may be used as it is, but to further improve accuracy, the pitch data at the peak reference positions are interpolated so that one pitch value is available for every sample point. As is apparent from FIG. 10(A), no interpolation is possible for sample points before the first peak reference position or after the last peak reference position, because pitch data exist on only one side of them. For sample points before the first peak reference position, the pitch data at the first peak position are therefore applied as they are, and for sample points after the last peak reference position, the pitch data at the last peak position are applied as they are. Between adjacent peak reference positions, the two pitch values are linearly interpolated. For example, in FIG. 10(B), if the pitch data at peak reference position P1 is PD1 and the pitch data at peak reference position P2 is PD2, the pitch data at an arbitrary sample point PV between P1 and P2 is obtained from the following expression.
(PD2 − PD1) × (PV − PA) / (P2 − P1)
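The interpolation of step 58 may be sketched as follows. The sketch assumes that PA in the expression denotes the position P1 and that the expression gives the amount added to PD1, which is what a straight-line interpolation between (P1, PD1) and (P2, PD2) requires; the text does not define PA. The function name and the test values are hypothetical.

```python
def interpolate_pitch(peak_positions, peak_pitches, num_samples):
    """Return one pitch value per sample point.

    peak_positions must be strictly increasing sample indices, and
    peak_pitches holds the pitch data at those positions.
    """
    if not peak_positions or len(peak_positions) != len(peak_pitches):
        raise ValueError("need matching, non-empty position and pitch lists")
    result = []
    seg = 0  # index of the peak at or before the current sample point
    for pv in range(num_samples):
        if pv <= peak_positions[0]:
            # Before the first peak: hold the first pitch value.
            result.append(float(peak_pitches[0]))
            continue
        if pv >= peak_positions[-1]:
            # After the last peak: hold the last pitch value.
            result.append(float(peak_pitches[-1]))
            continue
        while pv >= peak_positions[seg + 1]:
            seg += 1
        p1, p2 = peak_positions[seg], peak_positions[seg + 1]
        pd1, pd2 = peak_pitches[seg], peak_pitches[seg + 1]
        # Assumed reading of the expression: PD1 + (PD2-PD1)(PV-P1)/(P2-P1)
        result.append(pd1 + (pd2 - pd1) * (pv - p1) / (p2 - p1))
    return result


pitch_per_sample = interpolate_pitch([2, 6, 10], [300, 320, 310], 13)
```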

Step 59: Band-pass filtering is performed using the pitch data obtained for each sample point in step 58. Because the pitch data change over time, the cutoff frequency band also changes over time; in other words, so-called time-varying band-pass filter (BPF) processing is performed. This transforms the musical tone waveform signal into a waveform close to a sine wave, so that performing peak position detection on this waveform gives near-ideal peak positions. Since the comparison process can then be based on these peak positions, errors are kept to a minimum, and sections of the same waveform (same vowel) can be found with high accuracy.
Step 5A: The same processing as the peak reference position detection process of step 52 is performed on the musical tone waveform that has undergone the time-varying BPF processing of step 59.
Step 5B: The same processing as the waveform comparison process of step 53 is performed on the musical tone waveform that has undergone the time-varying BPF processing of step 59.
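One possible form of a time-varying BPF is sketched below. The text does not specify the filter design; the second-order (biquad) band-pass section with unity gain at its centre frequency, the quality factor q, and all names are assumptions made for illustration. The centre frequency is recomputed at every sample from the per-sample pitch period.

```python
import math


def time_varying_bandpass(signal, periods, q=5.0):
    """Band-pass filter whose centre frequency follows the per-sample pitch.

    periods[i] is the pitch period, in samples, at sample i (for example the
    interpolated pitch data of step 58). q is a hypothetical quality factor.
    """
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0, period in zip(signal, periods):
        w0 = 2.0 * math.pi / period        # centre frequency in rad/sample
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        b0 = alpha / a0                     # b1 is zero for this design
        b2 = -alpha / a0
        a1 = -2.0 * math.cos(w0) / a0
        a2 = (1.0 - alpha) / a0
        y0 = b0 * x0 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

With a constant period of 300 samples, a sine wave at the fundamental passes almost unchanged while its third harmonic is strongly attenuated, which is the behaviour that makes the filtered waveform close to a sine wave.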

In the above description, steps 52, 56 and 5A of the stationary section detection process of FIG. 5 detect the peak reference position by looking only at the positive side of the musical tone waveform. However, a musical tone waveform that has a pitch, such as a voice or an instrument sound, may show strong peaks on the positive side, on the negative side, or on both sides. When strong peaks appear on the positive side, as assumed above, the pitch can be detected by looking at the positive side to detect the peak reference position. This also causes no problem when sharp peaks appear on both sides, but in some waveforms the strong peaks appear predominantly on the negative side. In such a case it is clearly better to detect the peak reference position on the side where the strong peaks appear. If the peak reference position were detected on the side where the peaks are weak, the detection itself would become ambiguous and the pitch might not be detected as intended. Which side the peaks are biased toward varies with the condition of the speaker or instrument at each moment, so it is not possible to say in general which side should be used to detect the peak reference position.
Therefore, so that a strong peak can be handled on whichever side it appears, the musical tone waveform is checked in advance to detect whether the peaks appear more strongly on the positive side or the negative side, and the peak reference position and the pitch are detected from the waveform on the detected side. For example, assume that the waveform of a stable section after the primary BPF processing of step 51 in FIG. 5 is as shown in FIG. 11. In this waveform, strong peaks appear on the negative side and weak peaks on the positive side. If peak reference position detection is applied to both sides, the peak strengths differ but stable peaks appear on both sides, so regular peak reference positions can be detected almost equally well on either side. For this waveform, therefore, carrying out the stationary section detection process of FIG. 5 on the positive side causes no difficulty. However, with some musical tone waveforms, for example those with a relatively short period that repeat for a long time, the peaks may gradually become blunted, and accurate peak reference positions may not be detected if only the positive side is used. Therefore, even for a waveform like that of FIG. 11, it is desirable to use the negative side, where the strongest peaks appear.
Accordingly, for each stable section detected by the stable section detection process of FIG. 4, it is detected whether the maximum absolute value of the musical tone waveform over the whole section lies on the positive side or the negative side, and the peak reference position is detected on that side. In FIG. 11 the maximum absolute value lies on the negative side, so it is desirable to detect the peak reference position on the negative side. In this way the peak reference position can be detected without being misled by overtones and the like.
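The side selection described above may be sketched as follows. The function names and sample values are hypothetical; the sketch simply finds the sample of largest absolute value in a stable section and, if it is negative, inverts the waveform so that a positive-side peak detector can be reused unchanged.

```python
def dominant_polarity(samples):
    """Return -1 if the largest absolute value is negative, otherwise +1."""
    peak = max(samples, key=abs)
    return -1 if peak < 0 else 1


def orient_to_dominant_side(samples):
    """Flip the waveform if needed so its strongest peaks are positive."""
    sign = dominant_polarity(samples)
    return [sign * s for s in samples]


# Hypothetical stable section with strong negative and weak positive peaks.
section = [0, 300, -1000, 250, -950, 280]
oriented = orient_to_dominant_side(section)
```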

In the embodiment described above, the peak level of the musical tone waveform is detected and held by a circuit with a predetermined time constant; the held value serves as a threshold hold voltage, the next point at which the waveform reaches or exceeds this threshold hold voltage is held as the next peak level, and the peak positions are detected by repeating this sequentially. With this method, however, whether the desired peak reference positions can be detected depends on how the time constant is set, so for voices and instrument sounds that contain harmonics over a wide band, orderly peaks often fail to appear. For this reason, in the embodiment described above, whether a detected peak reference position is accurate enough to be used in the later frequency band determination is judged by the waveform comparison process based on the detected peak positions. This means that the peak positions found by the peak reference position detection process need not be very accurate. Accordingly, when detecting the peak level of the musical tone waveform, the time constant may be set somewhat small so that many candidate peak reference positions are extracted from the waveform, and the peak reference positions may then be determined one after another by performing the waveform comparison process on the extracted candidates. In this case, if peak positions are detected on the positive side of a musical tone waveform such as that of FIG. 14, three peak positions are extracted per period. Performing the waveform comparison process for every one of these three peak positions per period would take a very long time.
Therefore, once a section has been judged to have the same waveform, that section is used to make the subsequent detection of same-waveform sections more efficient.
For example, for a musical tone waveform such as that of FIG. 14, Pa to Po are detected as peak positions. The waveform comparison process is therefore first performed on the following 16 combinations starting from peak position Pa.
(Pa-Pb) and (Pb-Pc), (Pa-Pb) and (Pb-Pd),
(Pa-Pb) and (Pb-Pe), (Pa-Pb) and (Pb-Pf),
(Pa-Pc) and (Pc-Pd), (Pa-Pc) and (Pc-Pe),
(Pa-Pc) and (Pc-Pf), (Pa-Pc) and (Pc-Pg),
(Pa-Pd) and (Pd-Pe), (Pa-Pd) and (Pd-Pf),
(Pa-Pd) and (Pd-Pg), (Pa-Pd) and (Pd-Ph),
(Pa-Pe) and (Pe-Pf), (Pa-Pe) and (Pe-Pg),
(Pa-Pe) and (Pe-Ph), (Pa-Pe) and (Pe-Pi)
As a result, the waveforms of section (Pa-Pd) and section (Pd-Pg) are judged to match. Peak position Pa thus becomes pitch reference position PPa, and the other peak positions Pb and Pc are excluded from the candidates. Next, the waveform comparison process is performed in the same way on 16 combinations starting from peak position Pd, and peak position Pd becomes pitch reference position PPd. Pitch reference positions are detected one after another in the same manner.
When the same-waveform section is detected from among the 16 combinations, the error rates of all 16 may be calculated and the combination with the smallest error rate not exceeding a predetermined value (for example, 10) taken as the same-waveform section. Alternatively, as the error rates are calculated in order, the first combination whose error rate is at or below the predetermined value (for example, 10) may be taken as the same-waveform section.
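The enumeration of the 16 combinations and the two selection rules can be sketched as below. Peak positions are represented by their index (Pa = 0, Pb = 1, and so on). The error values are hypothetical placeholders standing in for the result of the waveform comparison process, and all function names are assumptions made for illustration.

```python
import string


def candidate_pairs(start, span=4):
    """List the span x span pairs of adjacent sections starting at a peak."""
    pairs = []
    for first_len in range(1, span + 1):
        mid = start + first_len
        for second_len in range(1, span + 1):
            pairs.append(((start, mid), (mid, mid + second_len)))
    return pairs


def label(pair):
    """Format a pair of index sections in the notation of the text."""
    names = string.ascii_lowercase
    (a, b), (c, d) = pair
    return f"(P{names[a]}-P{names[b]}) and (P{names[c]}-P{names[d]})"


def best_match(pairs, error_of, threshold=10.0):
    """Evaluate every pair and return the one with the smallest passing error."""
    scored = [(error_of(p), p) for p in pairs]
    passing = [item for item in scored if item[0] <= threshold]
    return min(passing, key=lambda item: item[0])[1] if passing else None


def first_match(pairs, error_of, threshold=10.0):
    """Return the first pair, in evaluation order, whose error passes."""
    for p in pairs:
        if error_of(p) <= threshold:
            return p
    return None


pairs = candidate_pairs(0)
# Hypothetical error rates; every pair not listed is given an error of 50.
errors = {((0, 3), (3, 6)): 3.0, ((0, 1), (1, 2)): 8.0}
error_of = lambda pair: errors.get(pair, 50.0)
```

With these placeholder errors, the exhaustive rule selects (Pa-Pd) and (Pd-Pg), while the first-match rule stops earlier at (Pa-Pb) and (Pb-Pc), which illustrates the trade-off between computation and reliability of the two alternatives.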

Calculating error rates for so many combinations in order to extract same-waveform sections takes considerable time. Here, therefore, the subsequent same-waveform sections are detected on the basis of a section already judged to have the same waveform, as mentioned above. That is, among the 16 combinations listed above,
(Pa-Pb) and (Pb-Pd), (Pa-Pb) and (Pb-Pe),
(Pa-Pb) and (Pb-Pf), (Pa-Pc) and (Pc-Pd),
(Pa-Pc) and (Pc-Pg), (Pa-Pd) and (Pd-Pe),
(Pa-Pd) and (Pd-Pf), (Pa-Pe) and (Pe-Pf),
(Pa-Pe) and (Pe-Pg)
No comparison is performed for these nine combinations. In each of them the lengths of the two waveform sections to be compared differ by a factor of nearly two, so they cannot be the same waveform, and they are excluded in advance without being compared.
Accordingly, the waveform comparison process is performed only on the following seven combinations.
(Pa-Pb) and (Pb-Pc), (Pa-Pc) and (Pc-Pe),
(Pa-Pc) and (Pc-Pf), (Pa-Pd) and (Pd-Pg),
(Pa-Pd) and (Pd-Ph), (Pa-Pe) and (Pe-Ph),
(Pa-Pe) and (Pe-Pi)
Then, as in the case described above, the waveforms of section (Pa-Pd) and section (Pd-Pg) are judged to match. Peak position Pa thus becomes pitch reference position PPa, and the other peak positions Pb and Pc are excluded from the candidates. The next comparison would start from peak position Pd and cover seven combinations in the same way, but here the sections to be compared next are limited on the basis of section (Pd-Pg). That is, the waveform comparison process is performed only on the sections (Pg-Pi), (Pg-Pj) and (Pg-Pk), whose lengths are within ±α of the length of section (Pd-Pg). Here α is, for example, about one quarter of the length of section (Pa-Pd); other suitable values may of course be used. As a result of this comparison, section (Pd-Pg) and section (Pg-Pj) are judged to be the same waveform section. From this point on, only three combinations need to be compared each time, which greatly reduces the computation.
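The ±α restriction can be sketched as follows. The peak positions are hypothetical values (three peaks per period of 300 samples, at offsets 0, 60 and 240) chosen so that three candidates fall inside the window, as in the example of the text; they are not taken from FIG. 14. The function name is likewise an assumption.

```python
def windowed_candidates(peaks, start_index, ref_length, alpha):
    """Return (start, end) index pairs whose length is within ref_length ± alpha."""
    start = peaks[start_index]
    result = []
    for end_index in range(start_index + 1, len(peaks)):
        length = peaks[end_index] - start
        if length > ref_length + alpha:
            break  # peaks are in ascending order, so later ones are longer
        if length >= ref_length - alpha:
            result.append((start_index, end_index))
    return result


# Hypothetical peak positions Pa..Po in samples (indices 0..14).
peaks = [0, 60, 240, 300, 360, 540, 600, 660, 840, 900, 960,
         1140, 1200, 1260, 1440]
PA, PD, PG = 0, 3, 6
ref_length = peaks[PG] - peaks[PD]      # length of section (Pd-Pg)
alpha = (peaks[PD] - peaks[PA]) / 4     # about a quarter of section (Pa-Pd)
candidates = windowed_candidates(peaks, PG, ref_length, alpha)
```

With these positions, the candidates are the index pairs (6, 8), (6, 9) and (6, 10), which correspond to sections (Pg-Pi), (Pg-Pj) and (Pg-Pk).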

Step 5C: The stationary sections obtained by the processing from step 51 to step 5B are extended. If, as a result of steps 51 to 5B, the stationary sections X, Y and Z are each separated by a single mismatch section, as in FIG. 10(D), nothing more is needed. If the stationary sections are separated by several mismatch sections, as in FIG. 10(C), each mismatch section must be joined to a stationary section so that the stationary sections are extended. Step 5C performs this extension of the stationary sections.
For example, suppose that steps 51 to 5B have found, within one stable section, two same-waveform sections (that is, stationary sections), namely a first same-vowel part XX and a second same-vowel part YY, as shown in FIG. 15. The first period section S1 of the first same-vowel part XX, which adjoins the beginning of the stable section, and the last period section E2 of the second same-vowel part YY, which adjoins the end of the stable section, can simply be extended along the stable section. The mismatch sections N1 to N6 between the first same-vowel part XX and the second same-vowel part YY, however, cannot simply be extended, and are handled as follows.
First, using an extension error rate that is more tolerant than the error rate used in the waveform comparison processes of steps 53, 57 and 5B, the last period section E1 of the first same-vowel part XX is compared with the mismatch sections in the order N1, N2, N3, N4, N5, N6, and each mismatch section judged to be below the extension error rate is incorporated into the first same-vowel part XX. Likewise, the first period section S2 of the second same-vowel part YY is compared with the mismatch sections in the order N6, N5, N4, N3, N2, N1, and in this way the mismatch sections judged to be below the extension error rate are incorporated into the two same-vowel parts XX and YY. Suppose that in the case of FIG. 15(A), mismatch sections N1 and N2 are incorporated into the first same-vowel part XX and mismatch section N6 is incorporated into the second same-vowel part YY, giving the result shown in FIG. 15(B).
The mismatch sections N3, N4 and N5 that remain unassigned are incorporated into one of the same-vowel sections as follows. A waveform comparison between mismatch section N3 and mismatch section N2, which has been incorporated into the first same-vowel part XX, gives one error rate, and a waveform comparison between mismatch section N5 and mismatch section N6, which has been incorporated into the second same-vowel part YY, gives another. The two error rates are compared, and the section with the smaller error rate (the higher degree of matching) is incorporated into the corresponding same-vowel part. Here the error rate between N2 and N3 is the smaller, so mismatch section N3 is incorporated into the first same-vowel part XX as shown in FIG. 15(C). Next, the error rate between mismatch sections N2 and N4 is obtained and compared with the error rate between mismatch sections N6 and N5, and again the section with the smaller error rate is incorporated. In this way, as shown in FIG. 15(C), mismatch sections N3 and N4 are incorporated into the first same-vowel part XX and mismatch section N5 is incorporated into the second same-vowel part YY.
Once mismatch section N3 has been incorporated into the first same-vowel section as described above, N3 may of course itself be regarded as part of the first same-vowel section, as shown in FIG. 15(D), so that the next error rate is calculated between mismatch sections N4 and N3.
When the section with the smaller error rate is incorporated into one of the same-vowel sections, an upper limit may also be set on the error rate, so that a mismatch section whose error rate exceeds the upper limit is not incorporated into any same-vowel section.
In this way the gap sections are incorporated into the same-vowel sections on either side, and the extension of the stationary sections is complete.
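The assignment of the remaining mismatch sections may be sketched as follows. Sections are represented here by name only, and the error table is a hypothetical stand-in for the waveform comparison process; the function name and its parameters (update_reference for the variant of FIG. 15(D), max_error for the optional upper limit) are assumptions made for illustration.

```python
def assign_gaps(gaps, left_ref, right_ref, error_of,
                update_reference=False, max_error=None):
    """Distribute remaining mismatch sections between two same-vowel parts.

    gaps is ordered from the first part towards the second part. At each step
    the outermost unassigned gap on each side is compared with that side's
    reference section, and the side with the smaller error absorbs its gap.
    """
    left_part, right_part = [], []
    lo, hi = 0, len(gaps) - 1
    while lo <= hi:
        left_err = error_of(left_ref, gaps[lo])
        right_err = error_of(right_ref, gaps[hi])
        if max_error is not None and min(left_err, right_err) > max_error:
            break  # optional upper limit: leave the rest unassigned
        if left_err <= right_err:
            left_part.append(gaps[lo])
            if update_reference:
                left_ref = gaps[lo]   # variant: newest member is the reference
            lo += 1
        else:
            right_part.insert(0, gaps[hi])
            if update_reference:
                right_ref = gaps[hi]
            hi -= 1
    return left_part, right_part


# Hypothetical error rates between named sections.
table = {("N2", "N3"): 4.0, ("N2", "N4"): 6.0,
         ("N2", "N5"): 9.0, ("N6", "N5"): 7.0}
error_of = lambda ref, gap: table[(ref, gap)]
to_xx, to_yy = assign_gaps(["N3", "N4", "N5"], "N2", "N6", error_of)
```

With these placeholder errors, N3 and N4 go to the first same-vowel part and N5 goes to the second, matching the outcome described for FIG. 15(C).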

In the waveform comparison described above, the waveform comparison process of step 5B is performed on the musical tone waveform that has undergone the time-varying BPF processing of step 59. In that case, however, the comparison is performed on a waveform that the BPF processing has made close to a sine wave, so even the features that distinguish one vowel from another may be filtered out, and extracting same-vowel sections may lose much of its meaning. Separate waveforms may therefore be prepared for peak position detection and for waveform comparison, and each process performed on its own waveform. That is, the waveform after the time-varying BPF processing is used as it is for peak position detection, while for waveform comparison a waveform is used that has undergone BPF processing which retains the frequency bands at several multiples of the frequency component used in the time-varying BPF processing.
For example, suppose that the frequencies corresponding to the waveform section lengths, obtained from the peak reference positions detected by the peak reference position detection process of step 5A, form the following frequency sequence.
134.6 Hz, 135.2 Hz, 145.7 Hz, 135.7 Hz, ...
This frequency sequence is taken as the fundamental frequency sequence, and time-varying BPF processing whose cutoff frequency is an integer multiple of it is performed separately for each multiple; the resulting waveforms are then combined. For the frequency sequence above, the sequence at twice the fundamental is
269.2 Hz, 270.4 Hz, 291.4 Hz, 271.4 Hz, ...
the sequence at three times the fundamental is
403.8 Hz, 405.6 Hz, 437.1 Hz, 407.1 Hz, ...
and the sequence at four times the fundamental is
538.4 Hz, 540.8 Hz, 582.8 Hz, 542.8 Hz, ...
Time-varying BPF processing is performed separately with each of these integer-multiple frequency sequences as the cutoff frequency.
The waveforms obtained after the BPF processing for the respective frequency sequences are combined, and the combined waveform is used as the target waveform of the waveform comparison process of step 5B.
As a result, same-vowel sections can be detected accurately in accordance with changes in timbre (vowel).
Alternatively, band-pass filtering whose lowest frequency is the fundamental frequency and whose highest frequency is an integer multiple of the fundamental frequency may of course be performed, and the result used as the target waveform of the waveform comparison process.
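The integer-multiple frequency sequences listed above follow directly from the fundamental frequency sequence, as the following sketch shows. The function name is an assumption, and rounding to one decimal place is used only to match the precision of the listed values.

```python
def harmonic_sequences(fundamentals_hz, multiples=(2, 3, 4)):
    """Return, for each multiple, the fundamental sequence scaled by it."""
    return {m: [round(m * f, 1) for f in fundamentals_hz] for m in multiples}


fundamental_sequence = [134.6, 135.2, 145.7, 135.7]
sequences = harmonic_sequences(fundamental_sequence)
```

Each resulting sequence would serve as the time-varying centre (cutoff) frequency of one BPF pass, and the outputs of the passes would then be summed to form the comparison waveform.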

Step 5D: The stationary sections obtained by the processing from step 51 to step 5C are now subdivided, taking changes in pitch and pitch stability into account, to determine the final stationary sections. In the stationary section detection process up to step 5C, waveforms are stretched before being compared, so even when the pitch of a voice waveform changes during a continuous vowel such as "aa", the whole is treated as one and the same sound. As a result, in the case of the musical tone waveform of an instrument, a pitch change within a sustained instrument sound may go undetected. In this embodiment, therefore, the state of the pitch change is examined for each stationary section obtained by the processing up to step 5C, and it is judged from that state whether the section needs to be divided further. If division is judged to be necessary, the stationary section is divided into finer stationary sections.
Specifically, the length between peak reference positions in a stationary section (the period length) is calculated, and the sampling frequency is divided by it to obtain the frequency at that peak reference position. From the frequencies of the waveforms making up each stationary section, the difference (that is, the ratio) between the frequency f₁ of a waveform section and the frequency f₀ of the preceding waveform section is expressed as a relative value x, a value quantified on a linear axis corresponding to notes and based on the "cent value of the interval", as follows.
$$ f_1 / f_0 = 2^{x/12} $$
Taking logarithms and solving for x gives
$$ x = \log(f_1 / f_0) / \log(\sqrt[12]{2}) $$
where $\sqrt[12]{2}$ is the twelfth root of 2. As is well known, this corresponds to the formula that converts the difference, that is the ratio, between two frequencies (the interval) into a cent value. In the usual cent notation a semitone is expressed as 100 cents, whereas in the above formula the value x = 1 corresponds to a semitone, and x takes values that include a fractional part. Since this is only a matter of where the decimal point is placed, the relative value x may be regarded as essentially equivalent to a cent value; in short, it is relative interval information. According to the formula, x is positive or negative depending on which of f₁ and f₀ is larger, but the sign is not needed for detecting pitch-stable sections, so the absolute value |x|, with the sign removed, is called the "note distance". FIG. 16(A) shows an example of the note distance as a function of time (hereinafter referred to as the note distance variation curve); the vertical axis is the note distance and the horizontal axis is time. Sections in which the note distance variation curve is flat correspond to sections in which the pitch is stable.
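The note distance can be computed as in the following sketch, which uses the identity log(f₁/f₀) / log(2^(1/12)) = 12 · log₂(f₁/f₀). The function names and the default sampling frequency are assumptions made for illustration.

```python
import math


def period_to_frequency(period_samples, sample_rate=44100.0):
    """Frequency of one waveform section from its period length in samples."""
    return sample_rate / period_samples


def relative_interval(f_prev, f_cur):
    """Signed interval x in semitone units, from f_cur / f_prev = 2 ** (x / 12)."""
    return 12.0 * math.log2(f_cur / f_prev)


def note_distance(f_prev, f_cur):
    """Absolute value |x| of the relative interval."""
    return abs(relative_interval(f_prev, f_cur))


def note_distance_curve(frequencies):
    """Note distance between each waveform section and the preceding one."""
    return [note_distance(a, b) for a, b in zip(frequencies, frequencies[1:])]
```

For example, an octave jump in either direction gives a note distance of 12, and a step of one semitone gives a note distance of 1.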

If a note distance variation curve such as that of FIG. 16(A) is differentiated and the points where it falls or rises steeply are taken as boundaries, two pitch-stable sections PS1 and PS2 are detected.
Pitch-stable sections may be obtained in this way, but in this embodiment a dynamic border curve is calculated and the pitch-stable sections are detected from it. The dynamic border curve is calculated from the note distance variation curve: for example, the dynamic border at a sample point PX is the average of the note distance variation curve from the start position up to PX, multiplied by a predetermined constant. An offset value may also be added to it. In the case of FIG. 16(A), the dynamic border curve is curve AC1. The dynamic border curve AC1 is compared with the note distance variation curve NC1, and the sections in which NC1 is smaller than AC1 are taken as pitch-stable sections. When AC1 becomes smaller than NC1, the calculation of AC1 is stopped and its value is held; when NC1 becomes equal to the held value, the previous values of AC1 are reset and the calculation of AC1 is started again from the beginning in the same way. This is shown in FIG. 16(B). The pitch-stable sections PS3 and PS4 of FIG. 16(B) are thereby obtained. For a note distance variation curve like NC1 of FIG. 16, the pitch-stable sections detected by differentiation and those detected by the dynamic border curve differ little. For a note distance variation curve like NC2 of FIG. 17, however, a clear difference appears.
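The dynamic border procedure may be sketched as follows. The constant scale, the offset value, the function name and the test curve are all hypothetical; the text specifies only that the running average is multiplied by a predetermined constant, with an optional offset. The sketch also assumes that the reset occurs as soon as the curve falls to or below the held border value.

```python
def pitch_stable_sections(note_dist, scale=2.0, offset=0.05):
    """Return (start, end) index ranges where the curve is below the border.

    The border is the running mean of the curve since the last reset, times
    scale, plus offset. When the curve reaches the border, the border is
    frozen until the curve falls back to it, then averaging restarts.
    """
    sections = []
    start = None      # start index of the current stable section, if any
    total = 0.0
    count = 0
    held = None       # frozen border value while the curve is above it
    for i, d in enumerate(note_dist):
        if held is not None:
            if d > held:
                continue              # still above the held border
            held = None               # curve is back down: reset and restart
            total, count = 0.0, 0
        total += d
        count += 1
        border = scale * total / count + offset
        if d < border:
            if start is None:
                start = i
        else:
            if start is not None:
                sections.append((start, i))
                start = None
            held = border             # stop updating and hold the value
    if start is not None:
        sections.append((start, len(note_dist)))
    return sections


# Hypothetical curve: steady, a large jump, then steady at another level.
curve = [0.1] * 10 + [2.0, 1.5] + [0.2] * 8
stable = pitch_stable_sections(curve)
```

For this curve the sketch reports two stable ranges, samples 0 to 9 and samples 12 to 19, separated by the jump.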

図１７のノート距離変動曲線ＮＣ２の場合は、同母音区間の後半部分でピッチが不安定になっているため、曲線ＮＣ２の傾きでピッチ安定区間を検出すると、図１７（Ａ）のようにピッチ安定区間ＰＳ５，ＰＳ６，ＰＳ７，ＰＳ８が多数現れてしまう。しかしながら、図１７のノート距離変動曲線の場合、人間の耳はピッチが不安定な部分でそのピッチ変化に対して鈍く（疎く）反応するようになるので、実際には図１７（Ａ）のような多数のピッチ安定区間を感じとることはなく、図１６（Ｂ）のような大まかな２つのピッチ安定区間を感じ取ることになる。
一方、前述の動的ボーダ曲線によってピッチ安定区間を検出すれば、人間の耳と同じような反応を行わせることが可能となる。すなわち、図１７のノート距離変動曲線ＮＣ２の動的ボーダ曲線を求めると、図１７（Ｂ）のような曲線ＡＣ２になる。従って、この動的ボーダ曲線ＡＣ２よりも小さなノート距離変動曲線ＮＣ２の部分がピッチ安定区間ＰＳ９及びＰＳＡとなり、図１６のような大まかな２区間として把握されるようになる。すなわち、音程が安定している区間（図１７の安定区間ＰＳ９）では、その後にそれなりの音程変化が発生すると音の区割りが変わったこと、つまり新しい音が始まったということを人間は感じる。逆に、音程が不安定な区間（図１７の安定区間ＰＳＡ）では多少の音程の変化は人間の耳にはあまり感じ取られなくなり、新しい音すなわち区間として認識しない。従って、音程が不安定な区間ではそれなりの大きな音程の変化でないと、音程の変化として認められなくなる。このような人間の耳に近いピッチ安定区間の検出を可能とするために、上述のような動的ボーダを曲線を用い、一連の処理で安定区間又は不安定区間を動的に検出し、区間分けを行っている。
このようにして検出されたピッチ安定区間が図５の定常区間検出処理によって最終的に検出された定常区間すなわち楽譜として表した時に一つの音符に相当する区間になる。
In the case of the note distance variation curve NC2 in FIG. 17, the pitch is unstable in the latter half of the same-vowel section. If pitch stable sections are detected from the slope of the curve NC2, many pitch stable sections PS5, PS6, PS7, and PS8 appear, as shown in FIG. 17(A). With the note distance variation curve of FIG. 17, however, the human ear responds dully (coarsely) to pitch changes where the pitch is unstable, so a listener does not actually perceive the many pitch stable sections of FIG. 17(A) but perceives roughly two pitch stable sections, as in FIG. 16(B).
On the other hand, detecting pitch stable sections with the dynamic border curve described above makes it possible to reproduce a response similar to that of the human ear. The dynamic border curve of the note distance variation curve NC2 in FIG. 17 is the curve AC2 shown in FIG. 17(B). The portions of NC2 that are smaller than AC2 become the pitch stable sections PS9 and PSA, so the curve is grasped as roughly two sections, as in FIG. 16. In a section where the pitch is stable (stable section PS9 in FIG. 17), a listener perceives a subsequent pitch change of reasonable size as a change of division, that is, as the start of a new sound. Conversely, in a section where the pitch is unstable (stable section PSA in FIG. 17), small pitch changes are hardly perceived and are not recognized as a new sound, that is, as a new section. In an unstable section, therefore, a pitch change is not recognized as such unless it is fairly large. To detect pitch stable sections in a way that approximates the human ear, the dynamic border curve described above is used, stable and unstable sections are detected dynamically in a single series of processes, and the signal is divided into sections accordingly.
The pitch stable section detected in this way is the steady section finally detected by the steady section detection process of FIG. 5, that is, a section corresponding to one note when written as a musical score.
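The dynamic border procedure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the values of `factor` (the "predetermined constant") and `offset`, the use of "less than or equal" at the border, and the assumption that the note distance variation curve is a list of non-negative numbers are all hypothetical choices made for the example.

```python
def pitch_stable_sections(nc, factor=1.5, offset=0.0):
    """Return (start, end) index pairs where nc stays at or below the dynamic border.

    nc     -- note distance variation curve (assumed non-negative samples)
    factor -- hypothetical value for the predetermined constant
    offset -- hypothetical value for the optional offset
    """
    sections = []
    seg_start = 0       # index where the current running average began
    total = 0.0         # running sum of nc since seg_start
    held = None         # frozen border value while nc is above the border
    stable_from = None  # start of the stable run currently in progress

    for i, v in enumerate(nc):
        if held is not None:
            if v > held:
                continue            # still above the held border: unstable
            held = None             # curve came back to the held value: reset
            seg_start, total = i, 0.0
        total += v
        border = factor * total / (i - seg_start + 1) + offset
        if v <= border:
            if stable_from is None:
                stable_from = i
        else:
            # The border fell below the curve: close the run and hold the border.
            if stable_from is not None:
                sections.append((stable_from, i))
                stable_from = None
            held = border

    if stable_from is not None:
        sections.append((stable_from, len(nc)))
    return sections
```

With a curve that is flat, jumps, and settles again, the sketch returns two stable sections separated by the jump, which mirrors the two-section result described for FIG. 16(B).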

図６は図１のステップ１６の音高列決定処理の詳細を示す図である。音高列決定処理は、ステップ１５によって検出された各定常区間に対して最適な音高列を決定するための処理である。以下、この音高列決定処理について図１８及び図１９を用いて説明する。
音声や楽音などを最終的に音符情報に変換する場合、ある特定周波数をどの音高に丸めるかによってメロディが大幅に変わってしまい、思ったような検出ができない場合が多い。そこで、この実施の形態では、相対音を主体として音高を決定し、さらにそれに調を利用して一番ふさわしい音高遷移を選択することによって音高列を決定するようにした。
この音高列決定処理の一例を図６のフローチャートに従って説明する。
ステップ６１：ステップ１５（図５）の定常区間検出処理によって得られた各定常区間に対してその区間の代表周波数を決定する。図１９（Ａ）は、最終的に得られた定常区間の一例を示す図である。ここでは全部で１２個の区間が検出されたものとして、各区間に括弧記号で囲まれた〔０〕〜〔１２〕の区間番号を割り当ててある。
各定常区間の代表周波数を決定する場合に重要なことは、各定常区間の周期位置から周波数の動向を洗い出して、その区間固有の周波数を１つに決定することである。そのための方法として、第１の方法は定常区間全体の平均周波数をその区間の代表周波数とする。第２の方法は定常区間の丁度中間付近の周期（周波数）をその区間の代表周波数とする。第３の方法はピッチが安定している部分の平均周波数をその区間の代表周波数とする。
なお、この実施の形態では、図５のステップ５Ｄのノート距離による細分化処理の際に使用したノート距離変動曲線、及びその時に検出されたピッチ安定区間を用いて代表周波数を算出する。すなわち、図５のステップ５Ｄの細分化処理によって細分化された区間すなわちピッチ安定区間におけるノート距離変動曲線の平均値を求める。この平均値を静的ボーダとする。例えば、図１７のようなノート距離変動曲線ＮＣ２の場合には、ピッチ区間ＰＳ９における静的ボーダはＳＢ１となり、ピッチ区間ＰＳＡにおける静的ボーダはＳＢ２となる。そして、各ピッチ区間ＰＳ９及びＰＳＡにおけるノート距離変動曲線ＮＣ２がこの静的ボーダＳＢ１及びＳＢ２よりも小さい区間を代表周波数検出区間Ｆ１及びＦ２として、その代表周波数検出区間Ｆ１及びＦ２に存在する各波形のピッチに基づいてその定常区間（ピッチ安定区間）ＰＳ９及びＰＳＡの代表周波数を決定する。
例えば、図１８の代表周波数検出区間Ｆ１を構成する波形区間が図１９（Ｂ）のような１２個であり、各波形区間の周期長は図示の通りとする。この場合、この代表周波数検出区間Ｆ１における周期長の平均値は、２５５．８３３となる。ここで、周期長はサンプリング数で表されているので、サンプリング周波数が４４．１ｋＨｚだから、この代表周波数検出区間Ｆ１の代表周波数は、その周期長の平均値でサンプリング周波数を除することによって得られるので、図１９（Ｂ）の場合には１７２．３８Ｈｚとなる。この場合、代表周波数の値は小数点２桁を有効として扱う。図１９（Ｃ）はこのようにして図１９（Ａ）のような各定常区間の代表周波数を算出した結果を示す図である。
FIG. 6 shows the details of the pitch sequence determination process in step 16 of FIG. 1. The pitch sequence determination process determines an optimal pitch sequence for the steady sections detected in step 15. The process is described below with reference to FIGS. 18 and 19.
When voices or musical tones are finally converted into note information, the melody changes greatly depending on which pitch a given frequency is rounded to, and the intended result often cannot be obtained. In this embodiment, therefore, pitches are determined mainly on a relative-pitch basis, and the key is then used to select the most suitable pitch transitions, thereby determining the pitch sequence.
An example of the pitch sequence determination process is described with reference to the flowchart of FIG. 6.
Step 61: For each steady section obtained by the steady section detection process of step 15 (FIG. 5), a representative frequency of the section is determined. FIG. 19(A) shows an example of the steady sections finally obtained. Here a total of twelve sections are assumed to have been detected, and the section numbers [0] to [12], enclosed in brackets, are assigned to the sections.
What matters in determining the representative frequency of each steady section is to identify the frequency trend from the period positions in the section and to settle on a single frequency characteristic of that section. Three methods are possible. The first takes the average frequency over the whole steady section as the representative frequency. The second takes the period (frequency) near the exact middle of the steady section. The third takes the average frequency of the portion in which the pitch is stable.
In this embodiment, the representative frequency is calculated using the note distance variation curve used in the note-distance subdivision of step 5D in FIG. 5 and the pitch stable sections detected at that time. Specifically, the average value of the note distance variation curve is obtained over each section produced by the subdivision of step 5D, that is, over each pitch stable section. This average value is called the static border. For the note distance variation curve NC2 of FIG. 17, for example, the static border in the pitch section PS9 is SB1 and the static border in the pitch section PSA is SB2. The portions of the pitch sections PS9 and PSA in which NC2 is smaller than the static borders SB1 and SB2 are taken as representative frequency detection sections F1 and F2, and the representative frequencies of the steady sections (pitch stable sections) PS9 and PSA are determined from the pitches of the waveforms within F1 and F2.
For example, suppose that the representative frequency detection section F1 of FIG. 18 consists of the twelve waveform sections shown in FIG. 19(B), with the cycle lengths shown there. The average cycle length in F1 is then 255.833. Because the cycle length is expressed as a number of samples and the sampling frequency is 44.1 kHz, the representative frequency of F1 is obtained by dividing the sampling frequency by the average cycle length, which gives 172.38 Hz in the case of FIG. 19(B). The representative frequency is treated as significant to two decimal places. FIG. 19(C) shows the representative frequencies calculated in this way for the steady sections of FIG. 19(A).
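The static border and the representative frequency calculation can be illustrated with a short sketch. The function names and the per-waveform data layout (one cycle length and one note distance value per waveform section) are assumptions made for the example, as is the fallback used when no waveform lies below the static border. The cycle lengths in the usage example are made-up values chosen only so that their mean is 255.833, as in the text; they are not the values of FIG. 19(B).

```python
def static_border(note_distances):
    """Average of the note distance variation curve over one pitch stable section."""
    return sum(note_distances) / len(note_distances)


def representative_frequency(periods, note_distances, sample_rate=44100):
    """Representative frequency (Hz, two decimal places) of one steady section.

    periods        -- cycle length of each waveform section, in samples
    note_distances -- note distance value for each waveform section
    """
    border = static_border(note_distances)
    # Keep only the waveforms whose note distance is below the static border
    # (the representative frequency detection section).
    chosen = [p for p, d in zip(periods, note_distances) if d < border]
    if not chosen:                  # assumed fallback: use the whole section
        chosen = list(periods)
    mean_period = sum(chosen) / len(chosen)
    return round(sample_rate / mean_period, 2)


# Hypothetical data: 12 steady waveforms followed by 2 unsteady ones.
periods = [256] * 10 + [255] * 2 + [300, 300]
distances = [0.1] * 12 + [2.0, 2.0]
print(representative_frequency(periods, distances))
```

Dividing 44100 by the mean cycle length of the twelve retained waveforms (3070 / 12 = 255.833 samples) reproduces the 172.38 Hz quoted in the text.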

ステップ６２：ステップ６１の処理によって各定常区間の代表周波数が決定されると、今度はその代表周波数に基づいて各定常区間の相前後する定常区間番号同士のノート距離を決定する。ノート距離の決定は図５のステップ５Ｄで用いた演算式と同様にして求める。図１９（Ｃ）にはこのようにして算出されたノート距離の一例が示されている。
ステップ６３：算出されたノート距離の小数点以下一桁を四捨五入して、ノート距離を１２音階上の各音高へ丸め込む。例えば、図１９（Ｃ）の場合には、各ノート距離は四捨五入されて、右欄の整数のようになる。この整数は、前音高からのノート番号上の差を示すことになるので、最初の音高を決定することによって、音高列データを完成することが可能となる。図１９（Ｃ）の最右欄に示す音高列データが最初の音高を０とした場合の音高遷移のようすを示すデータである。
すなわち、図１９（Ｃ）の場合には０−２−４−５−２−３・・・となる。
ステップ６４：第１音の音高を決定する。まず、最も簡単な方法は、第１音にデフォルト値として６０のノートナンバ（ノートネームＣ４）音を割り当てる。すなわち、ＭＩＤＩ規格の場合、ノートナンバの限界は０〜１２７なので、第１音の音高として、ノートナンバ６０（ノートネームＣ４）の音を割り当てる。これによって、高音側（プラス側）には６７半音分、低音側（マイナス側）には６０半音分だけ音高を振ることができる。
このようにすると図１９（Ｃ）の最右欄の音高列を示すデータは、６０（Ｃ４）−６２（Ｄ４）−６４（Ｅ４）−６５（Ｆ４）−６２（Ｄ４）−６３（Ｄ＃４）・・・・となる。
Step 62: Once the representative frequency of each steady section has been determined in step 61, the note distance between each pair of consecutively numbered steady sections is determined from those representative frequencies. The note distance is obtained with the same arithmetic expression as used in step 5D of FIG. 5. FIG. 19(C) shows an example of the note distances calculated in this way.
Step 63: Each calculated note distance is rounded off at the first decimal place to the nearest integer, which rounds the note distance onto the pitches of the twelve-tone scale. In the case of FIG. 19(C), for example, each note distance is rounded to the integer shown in the right-hand column. Because this integer is the difference in note number from the preceding pitch, the pitch sequence data can be completed by fixing the first pitch. The pitch sequence data in the rightmost column of FIG. 19(C) shows the pitch transitions when the first pitch is taken as 0.
That is, in the case of FIG. 19(C), the sequence is 0-2-4-5-2-3 ....
Step 64: The pitch of the first note is determined. The simplest method is to assign note number 60 (note name C4) to the first note as a default value. In the MIDI standard, note numbers range from 0 to 127, so assigning note number 60 (note name C4) to the first note leaves room for the pitch to move 67 semitones upward (plus side) and 60 semitones downward (minus side).
With this choice, the pitch sequence data in the rightmost column of FIG. 19(C) becomes 60 (C4) - 62 (D4) - 64 (E4) - 65 (F4) - 62 (D4) - 63 (D#4) ....
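Steps 62 to 64 can be sketched as below. The arithmetic expression of step 5D is not reproduced in this part of the description, so the sketch assumes the usual semitone distance 12·log2(f2/f1); that formula, the function names, and the test frequencies are assumptions for illustration only. Python's built-in `round` rounds halves to the nearest even integer, so a small helper is used to round half up, as the text describes.

```python
import math


def note_distance(f_prev, f_next):
    """Signed distance in semitones between two frequencies (assumed formula)."""
    return 12.0 * math.log2(f_next / f_prev)


def round_half_up(x):
    """Round to the nearest integer, with .5 rounded upward."""
    return math.floor(x + 0.5)


def pitch_sequence(rep_freqs, first_note=60):
    """Note numbers for consecutive steady sections, starting from first_note."""
    notes = [first_note]
    for f_prev, f_next in zip(rep_freqs, rep_freqs[1:]):
        step = round_half_up(note_distance(f_prev, f_next))  # step 63
        notes.append(notes[-1] + step)                       # relative to previous pitch
    return notes


# Hypothetical, slightly detuned frequencies built on 172.38 Hz.
detuned = [0, 2.2, 3.9, 5.1, 2.0, 2.8]           # semitones above the first note
freqs = [172.38 * 2 ** (s / 12) for s in detuned]
print(pitch_sequence(freqs))
```

Because each step is rounded relative to the preceding section, the sketch follows the relative-pitch approach of the text rather than rounding every frequency to an absolute pitch.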

ステップ６５：ステップ６４で決定された音高列データを修正する。すなわち、ステップ６４で決定された音高列データの振れ幅を検出し、それが低音側（マイナス側）に−６０以下に振れている場合には、その最小振れ幅に合わせてデフォルト値６０を修正する。この修正は、最小振れ幅のノートが０以上となるようにデフォルト値を上側にシフトすることによって行う。例えば、最小振れ幅が−６４の場合には、計算式−６０−（−６４）＝４の結果に従って、デフォルト値６０を４ノート分上側にシフトして、第１音として６４を割当てる。高音側（プラス側）に＋６７以上振れている場合にも同様に最大振れ幅に合わせてデフォルト値６０を修正すればよい。なお、低音側（マイナス側）及び高音（プラス側）の両方において振れ幅がオーバーすることは人間の発声帯域から判断してあり得ないので、そのような場合は除外する。なお、このようなことが起こり得るような場合には、特別に音域を０〜２５６の範囲で設定するようにしてもよい。
なお、ステップ６４では、第１音の音高をデフォルト値（例えば６０）として決定し、音高列データを作成する場合について説明したが、これに限らず、最初の定常区間の代表周波数に最も近い純正率音階の周波数を検出し、その音階に当てはめるようにしてもよい。
例えば、図１９（Ｃ）の場合には、区間番号〔０〕の代表周波数は１７２．３８Ｈｚなので、第１音の音高をそれに最も近いノートナンバ５３（ノートネームＦ３）に決定する。
これによって、図１９（Ｃ）の音高列を示すデータは、５３（Ｆ３）−５５（Ｇ３）−５７（Ａ３）−５８（Ａ＃３）−５５（Ｇ３）−５６（Ｇ＃３）・・・・となる。
なお、これ以外にも種々の方法で音程列を割り当ててもよいことは言うまでもない。
Step 65: The pitch sequence data determined in step 64 is corrected. The range of the pitch sequence data determined in step 64 is detected, and if it extends to -60 or below on the low side (minus side), the default value 60 is corrected to match the minimum excursion. The correction shifts the default value upward so that the lowest note is 0 or higher. For example, when the minimum excursion is -64, the default value 60 is shifted upward by four notes according to the calculation -60 - (-64) = 4, and 64 is assigned to the first note. When the data extends to +67 or above on the high side (plus side), the default value 60 is corrected in the same way to match the maximum excursion. Judging from the range of the human voice, the data cannot overrun on both the low side (minus side) and the high side (plus side), so that case is excluded. If such a case can occur, the note range may be specially set to 0 to 256.
In step 64, the pitch of the first note was set to a default value (for example, 60) and the pitch sequence data was created from it. Alternatively, the frequency of the just-intonation scale tone closest to the representative frequency of the first steady section may be detected, and the first note may be fitted to that scale tone.
For example, in the case of FIG. 19(C), the representative frequency of section number [0] is 172.38 Hz, so the pitch of the first note is set to the closest note number, 53 (note name F3).
The pitch sequence data of FIG. 19(C) then becomes 53 (F3) - 55 (G3) - 57 (A3) - 58 (A#3) - 55 (G3) - 56 (G#3) ....
It goes without saying that the pitch sequence may be assigned by various other methods.
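The correction of step 65 and the alternative choice of the first note can be sketched as follows. The function names are hypothetical. For the nearest-note lookup the sketch substitutes twelve-tone equal temperament with A4 = 440 Hz (MIDI note 69), which is an assumption and not the just-intonation scale mentioned in the text; for 172.38 Hz both give note number 53.

```python
import math

MIDI_MIN, MIDI_MAX = 0, 127


def fit_first_note(offsets, default=60):
    """Choose the first note so the whole sequence fits into 0..127.

    offsets -- pitch sequence relative to the first note (first entry is 0)
    """
    low, high = min(offsets), max(offsets)
    if high - low > MIDI_MAX - MIDI_MIN:
        raise ValueError("pitch range exceeds the MIDI note range")
    first = default
    if first + low < MIDI_MIN:
        first = MIDI_MIN - low        # shift upward, e.g. -64 gives 64
    elif first + high > MIDI_MAX:
        first = MIDI_MAX - high       # shift downward
    return first


def nearest_midi_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered MIDI note number (assumed tuning, A4 = 69)."""
    return math.floor(69 + 12 * math.log2(freq_hz / a4_hz) + 0.5)


print(fit_first_note([0, -64, 3]))    # 64
print(nearest_midi_note(172.38))      # 53
```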

次に、この発明に係る電子楽器が音信号分析装置及び演奏情報発生装置として動作する場合の第２の実施の形態について説明する。
この第２の実施の形態に係る電子楽器が音信号分析装置及び演奏情報発生装置として動作する際のメインフローは図１と同じなので、その説明は省略する。ただし、メインフローの中のステップ１３〜ステップ１５の各処理の内容が前述の第１の実施の形態のものとは異なるので、以下その異なる点について詳細に説明する。 Next, a second embodiment in which the electronic musical instrument according to the present invention operates as a sound signal analyzer and a performance information generator will be described.
The main flow when the electronic musical instrument according to the second embodiment operates as the sound signal analyzer and the performance information generator is the same as that of FIG. 1, so its description is omitted. However, the processing in steps 13 to 15 of the main flow differs from that of the first embodiment described above, and the differences are described in detail below.

図２０は図１のステップ１３の有効区間検出処理の詳細を示す図であり、図３に対応したものである。有効区間検出処理は、ステップ１２の音声サンプリング処理の結果得られたディジタルサンプル信号に基づいて音楽的な音が存在する区間すなわち有効区間を検出するための処理である。以下、この有効区間検出処理の詳細を図２３を用いて説明する。
ステップ２０１：ステップ１２によって求められたディジタルサンプル信号を所定のサンプル数毎に区切る処理を行う。図２３（Ａ）は、サンプリング周波数４４．１ｋＨｚでサンプリングされた音声信号すなわちディジタルサンプル信号の波形値の一例を示す図である。図２３（Ａ）には、約４４０８ポイント分の波形値が示されている。図２３（Ｄ）には、その２倍の約８８１６ポイント分の波形値が示されている。ステップ２０１では、所定のサンプル数（例えば、音声の最低周波数を８０Ｈｚとした場合におけるその最大周期に対応するサンプル数）でディジタルサンプル信号を区切る。従って、サンプリング周期４４．１ｋＨｚの場合には、この所定サンプル数は『５５１＝４４１００／８０』である。図２３（Ｂ）は図２３（Ａ）の波形値に対応しており、この波形値が５５１サンプル数毎に区切られた場合の各波形区間Ｓ１〜Ｓ８の様子を示す図である。
FIG. 20 shows the details of the valid section detection process in step 13 of FIG. 1 and corresponds to FIG. 3. The valid section detection process detects a section in which a musical sound exists, that is, a valid section, on the basis of the digital sample signal obtained by the voice sampling process in step 12. The details of the valid section detection process are described below with reference to FIG. 23.
Step 201: The digital sample signal obtained in step 12 is divided into segments of a predetermined number of samples. FIG. 23(A) shows an example of the waveform values of an audio signal sampled at a sampling frequency of 44.1 kHz, that is, of a digital sample signal. FIG. 23(A) shows waveform values for about 4408 points, and FIG. 23(D) shows waveform values for twice as many, about 8816 points. In step 201, the digital sample signal is divided every predetermined number of samples (for example, the number of samples corresponding to the maximum period when the lowest frequency of the voice is taken as 80 Hz). At a sampling frequency of 44.1 kHz, this predetermined number of samples is 551 (44100 / 80 ≈ 551). FIG. 23(B) corresponds to the waveform values of FIG. 23(A) and shows the waveform sections S1 to S8 obtained when the waveform values are divided every 551 samples.

ステップ２０２：ステップ２０１によって区切られた区間毎に、その区間内に存在するディジタルサンプル信号波形の最大値を抽出する。図２３（Ｃ）には、図２３（Ａ）のディジタルサンプル信号波形が点線で示され、その各区間Ｓ１〜Ｓ８内における各波形の最大値が黒点で示されている。
ステップ２０３：ステップ２０２で求められた各区間の最大値を補間（例えば直線補間）し、補助波形を作成する。図２３（Ｄ）は、図２３（Ａ）〜（Ｃ）の約２倍の区間に相当する補助波形を示すものであり、各区間の最大値を直線補間することによって得られた補助波形を示している。なお、図２３（Ａ）のディジタルサンプル信号波形は点線で示されている。
次のステップ２０４〜ステップ２０６では、このようにして得られた補助波形に基づいて有効区間の抽出処理が行われる。 Step 202: For each section divided by step 201, the maximum value of the digital sample signal waveform existing in the section is extracted. In FIG. 23 (C), the digital sample signal waveform of FIG. 23 (A) is indicated by a dotted line, and the maximum value of each waveform in each of the sections S1 to S8 is indicated by a black point.
Step 203: The maximum values of the sections obtained in step 202 are interpolated (for example, linearly) to create an auxiliary waveform. FIG. 23(D) shows the auxiliary waveform, obtained by linearly interpolating the maximum values of the sections, over a span about twice as long as that of FIGS. 23(A) to 23(C). The digital sample signal waveform of FIG. 23(A) is shown by a dotted line.
In the following steps 204 to 206, valid sections are extracted on the basis of the auxiliary waveform obtained in this way.
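Steps 201 to 203 can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: the interpolation runs between the positions at which the maxima occur, and the auxiliary waveform is held constant before the first maximum and after the last one. The input is assumed to be a non-empty list of samples.

```python
def segment_length(sample_rate=44100, lowest_hz=80):
    """Samples in one period of the lowest expected frequency (551 at 44.1 kHz)."""
    return sample_rate // lowest_hz


def auxiliary_waveform(samples, seg_len):
    """Per-segment maxima joined by straight lines, one value per input sample."""
    # Steps 201 and 202: cut into segments and record (position, value) of each maximum.
    peaks = []
    for start in range(0, len(samples), seg_len):
        seg = samples[start:start + seg_len]
        k = max(range(len(seg)), key=seg.__getitem__)
        peaks.append((start + k, seg[k]))

    # Step 203: linear interpolation between neighbouring maxima.
    middle = []
    for (x0, y0), (x1, y1) in zip(peaks, peaks[1:]):
        for x in range(x0, x1):
            middle.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))

    first_x, first_y = peaks[0]
    last_x, last_y = peaks[-1]
    return [first_y] * first_x + middle + [last_y] * (len(samples) - last_x)


aux = auxiliary_waveform([0, 1, 0, 0, 0, 0, 3, 0, 0, 2, 0, 0], seg_len=4)
print(aux)
```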

ステップ２０４：前記ステップ２０３で求められた図２３（Ｄ）のような補助波形を、所定のしきい値Ｔｈに基づいて有効区間又は無効区間にそれぞれ分類する。この処理では、しきい値Ｔｈとして、最大波形値の約３分の１の値をしきい値とする。これ以外の値をしきい値Ｔｈとしてもよいことは言うまでもない。例えば、図２３（Ｄ）の実線波形の平均値をしきい値Ｔｈとしたり、又はその平均値の８０パーセントをしきい値Ｔｈとしたりしてもよい。
従って、このしきい値Ｔｈと補助波形との交点位置が有効区間及び無効区間の境界となり、このしきい値Ｔｈよりも大きい区間が有効区間となり、小さい区間が無効区間となる。 Step 204: The auxiliary waveform as shown in FIG. 23D obtained in step 203 is classified into a valid section and an invalid section based on a predetermined threshold Th. In this processing, the threshold Th is set to a value about one third of the maximum waveform value. It goes without saying that a value other than this may be used as the threshold value Th. For example, the average value of the solid waveform in FIG. 23D may be set to the threshold value Th, or 80% of the average value may be set to the threshold value Th.
Therefore, the intersection of the threshold value Th and the auxiliary waveform is the boundary between the valid section and the invalid section, the section larger than the threshold value Th is the valid section, and the section smaller than the threshold Th is the invalid section.

ステップ２０５：人間が音高を認知できる必要な最低長を０．０５ｍｓｅｃとした場合に、前記ステップ２０２で決定された無効区間の中からこの最低長よりも小さな無効区間を有効区間に変更する。例えば、サンプリング周期が４４．１ｋＨｚの場合にはサンプリング数で２２０５個以下の無効区間が、人間が音高を認知できる必要な最低長である０．０５ｍｓｅｃ以下の無効期間に対応するので、そのような無効区間を有効区間に変更する。図２３（Ｄ）の無効区間は、波形区間で３個分（サンプリング数で約１６５３個分）なので、この最低長よりも小さな無効区間に相当するので、このステップ２０５の処理によって有効区間に変更される。
ステップ２０６：前記ステップ２０５の処理の結果、得られた有効区間及び無効区間のパターンの中から０．０５ｍｓｅｃ以下の短い有効区間を無効区間に変更する処理を行う。この処理は前記ステップ２０５と同様の処理にて行う。
Step 205: Taking 0.05 msec as the minimum length needed for a human to perceive pitch, invalid sections shorter than this minimum length, among the invalid sections determined in step 202, are changed to valid sections. For example, at a sampling frequency of 44.1 kHz, an invalid section of 2205 samples or fewer corresponds to an invalid period no longer than this minimum length, so such invalid sections are changed to valid sections. The invalid section in FIG. 23(D) spans three waveform sections (about 1653 samples), which is shorter than the minimum length, so it is changed to a valid section by the processing of step 205.
Step 206: Among the pattern of valid and invalid sections resulting from step 205, short valid sections of 0.05 msec or less are changed to invalid sections. This is done in the same way as step 205.
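Steps 204 to 206 can be sketched as follows. The function names are hypothetical, and the minimum length is passed in samples: the text gives it both as 0.05 msec and as 2205 samples at 44.1 kHz, and 2205 samples correspond to 0.05 s, so the sketch avoids choosing a time unit. Treating short invalid runs at the very start or end of the signal like any other short run is also an assumption.

```python
def runs(flags):
    """Collapse a list of booleans into [value, start, end) runs."""
    out = []
    for i, f in enumerate(flags):
        if out and out[-1][0] == f:
            out[-1][2] = i + 1
        else:
            out.append([f, i, i + 1])
    return out


def valid_sections(aux, min_len=2205, threshold=None):
    """Return (start, end) pairs of valid sections of the auxiliary waveform."""
    if threshold is None:
        threshold = max(aux) / 3.0            # step 204: about one third of the maximum
    flags = [v > threshold for v in aux]      # True = valid, False = invalid

    # Step 205 turns short invalid runs into valid ones;
    # step 206 then turns short valid runs into invalid ones.
    for target in (False, True):
        for value, start, end in runs(flags):
            if value == target and end - start <= min_len:
                flags[start:end] = [not target] * (end - start)

    return [(s, e) for v, s, e in runs(flags) if v]


aux = [0] * 5 + [9] * 10 + [0] * 2 + [9] * 10 + [0] * 8 + [9] * 2 + [0] * 8
print(valid_sections(aux, min_len=3))         # [(5, 27)]
```

In the usage example the two-sample gap is absorbed into the surrounding valid section, and the isolated two-sample burst is discarded.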

図２１は図１のステップ１４の安定区間検出処理の詳細を示す図であり、図４に対応したものである。以下、図２０の有効区間検出処理によって求められた有効区間内のディジタルサンプリング信号に対して、レベルの安定した領域を検出するための安定区間検出処理を行う。この安定区間検出処理の動作を図２４〜図２６を用いて説明する。
ステップ２１１：図２０の有効区間検出処理によって検出された有効区間内のディジタルサンプリング信号に基づいて、波形のピークが強く出ているサイドを検出する。すなわち、図２４（Ａ）のように有効区間内のディジタルサンプリング信号のプラス（＋）側の波形のピーク値ｍａｘとマイナス（−）側の波形のピーク値ｍｉｎのそれぞれの絶対値を取り、どちらの絶対値が大きいかによって、ピークの強く出ているサイドを決定する。なお、これ以外の方法でピークの強く出ているサイドを決定するようにしてもよい。例えば、上位３〜５個のピーク値の絶対値の合計を比較して決定するようにしてもよい。
ステップ２１２：ステップ２１１でピークの検出されたサイドにおいて、前方（時間経過方向）に向けてエンベロープを取り、そのピーク部を検出する。すなわち、図２４（Ｂ）のように、プラス側の波形に対して前方にエンベロープを取り、そのピーク部を検出する。この結果、図２４（Ｂ）の場合、ピーク部としてＰ１〜Ｐ４の４点が検出される。
ステップ２１３：今度はステップ２１２とは逆の方向（時間経過とは逆方向）に向けてエンベロープを取り、そのピーク部を検出する。すると、図２４（Ｃ）のように、同じ位置にピーク部Ｐ１〜Ｐ４が検出されるが、波形によってはこれ以外にもピーク部ＰＰが検出される。このことは、徐々にレベルが上がっている波形においては、いずれか一方向だけでエンベロープ検出を行った場合には、倍音ピークをピッチのピークとして取り間違え、実際にピークでない箇所（ピーク部ＰＰなどのようなもの）を誤ってピークとして検出してしまうことがある。従って、ステップ２１２及びステップ２１３のように、異なる方向でエベロープを取り、ピーク部を検出することによって、ピーク部の検出精度を向上することができる。
FIG. 21 shows the details of the stable section detection process in step 14 of FIG. 1 and corresponds to FIG. 4. A stable section detection process that detects regions of stable level is performed on the digital sampling signal within the valid sections obtained by the valid section detection process of FIG. 20. The operation of this process is described with reference to FIGS. 24 to 26.
Step 211: On the basis of the digital sampling signal within the valid section detected by the valid section detection process of FIG. 20, the side on which the waveform peaks are stronger is detected. As shown in FIG. 24(A), the absolute values of the peak value max of the plus (+) side and the peak value min of the minus (-) side of the digital sampling signal in the valid section are taken, and the side with the larger absolute value is determined to be the side with the stronger peaks. The side may also be determined by other methods, for example by comparing the sums of the absolute values of the top three to five peak values.
Step 212: On the side determined in step 211, an envelope is taken in the forward direction (the direction of elapsed time) and its peak portions are detected. As shown in FIG. 24(B), the envelope is taken forward over the plus-side waveform and its peak portions are detected. In the case of FIG. 24(B), the four points P1 to P4 are detected as peak portions.
Step 213: Next, an envelope is taken in the direction opposite to that of step 212 (against the direction of elapsed time) and its peak portions are detected. As shown in FIG. 24(C), the peak portions P1 to P4 are detected at the same positions, but depending on the waveform an additional peak portion PP may also be detected. In a waveform whose level rises gradually, envelope detection in only one direction can mistake a harmonic peak for a pitch peak and thus detect as a peak a point that is not actually one (such as the peak portion PP). Taking envelopes in both directions and detecting the peak portions, as in steps 212 and 213, therefore improves the accuracy of peak detection.
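The side selection of step 211, including the alternative of summing several of the largest peaks, can be sketched as follows. The function names, the definition of a peak as a simple local maximum, and the rule that a tie goes to the plus side are assumptions made for the example. The envelope following of steps 212 and 213 is not sketched because the text does not specify how the envelope is formed.

```python
def local_peaks(x):
    """Values of interior local maxima of the sequence x."""
    return [x[i] for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]


def dominant_side(samples, top_k=1):
    """Return '+' or '-' for the side with the stronger peaks.

    top_k = 1 compares the single largest peak on each side;
    top_k = 3..5 corresponds to the alternative mentioned in the text.
    """
    pos = sorted(local_peaks(samples), reverse=True)[:top_k]
    neg = sorted(local_peaks([-s for s in samples]), reverse=True)[:top_k]
    return "+" if sum(pos) >= sum(neg) else "-"


wave = [0, 4, 0, -5, 0, 4, 0, -1, 0]
print(dominant_side(wave, top_k=1))   # '-': the single largest peak is negative
print(dominant_side(wave, top_k=2))   # '+': the two largest positive peaks sum higher
```

The usage example shows why the two criteria can disagree: one deep negative excursion wins the single-peak comparison, while the plus side is consistently stronger when several peaks are summed.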

ステップ２１４：ステップ２１２及びステップ２１３の処理によって検出されたピーク部を直線補間し、新たな波形を生成する。図２４（Ｄ）は、ステップ２１２及びステップ２１３によって検出されたピーク部Ｐ１〜Ｐ４を直線補間することによって生成された新たなピーク値補間曲線を示している。なお、図２４（Ｄ）において図２４（Ａ）のディジタルサンプル信号波形は点線で示されている。
ステップ２１５：以上の処理によって生成されたピーク値補間曲線に基づいて、ピーク間の合計傾斜を算出する。この処理では、図２４（Ｄ）に示すように、傾斜を算出するための算出幅を例えば２００ポイントとし、その算出幅のシフト量を例えば１００ポイントとして、このシフト量に相当するポイントを順次シフトしながら、その傾斜を算出する。有効区間の最初のサンプルポイントａ１が『１００』だとすると、そのサンプルポイント『１００』と『３００』との間の傾斜ｂ１は、算式：（ａ3 −ａ１）／２００によって求められる。次にそれぞれのポイントを１００ポイントずつシフトしたサンプルポイント『２００』と『４００』との間の傾斜ｂ２を求める。 Step 214: A new waveform is generated by linearly interpolating the peaks detected by the processing of steps 212 and 213. FIG. 24D shows a new peak value interpolation curve generated by linearly interpolating the peak portions P1 to P4 detected in steps 212 and 213. In FIG. 24D, the waveform of the digital sample signal in FIG. 24A is indicated by a dotted line.
Step 215: The total slope between peaks is calculated from the peak value interpolation curve generated by the above processing. As shown in FIG. 24(D), the width over which a slope is calculated is set to, for example, 200 points and the shift of that width to, for example, 100 points, and the slope is calculated while shifting by that amount each time. If the first sample point a1 of the valid section is "100", the slope b1 between sample points "100" and "300" is obtained by the formula (a3 - a1) / 200. The slope b2 between sample points "200" and "400", each shifted by 100 points, is obtained next.

このようにして算出された傾斜の一例を図２５に示す。図から明らかなように、サンプルポイント『１００』−『３００』間の傾斜ｂ１は０．０３、サンプルポイント『２００』−『４００』間の傾斜ｂ２は０．１５、サンプルポイント『３００』−『５００』間の傾斜ｂ３は０．２５、サンプルポイント『４００』−『６００』間の傾斜ｂ４は０．５０、サンプルポイント『５００』−『７００』間の傾斜ｂ５は０．９０、サンプルポイント『６００』−『８００』間の傾斜ｂ６は１．８０、サンプルポイント『７００』−『９００』間の傾斜ｂ７は１．９０、サンプルポイント『８００』−『１０００』間の傾斜ｂ８は２．００、サンプルポイント『９００』−『１１００』間の傾斜ｂ９は１．７０、『１０００』−『１２００』間の傾斜ｂ１０は１．２０、『１１００』−『１３００』間の傾斜ｂ１１は０．７０となる。
これらの傾斜ｂ１〜ｂ１１は前者のサンプルポイントａ１〜ａ１１における傾斜として記憶される。すなわち、サンプルポイントａ１の傾斜ｂ１が０．０３、サンプルポイントａ２の傾斜ｂ２が０．１５として、それぞれのサンプルポイント毎に傾斜が記憶される。
FIG. 25 shows an example of the slopes calculated in this way. As the figure shows, the slope b1 between sample points "100" and "300" is 0.03, b2 between "200" and "400" is 0.15, b3 between "300" and "500" is 0.25, b4 between "400" and "600" is 0.50, b5 between "500" and "700" is 0.90, b6 between "600" and "800" is 1.80, b7 between "700" and "900" is 1.90, b8 between "800" and "1000" is 2.00, b9 between "900" and "1100" is 1.70, b10 between "1000" and "1200" is 1.20, and b11 between "1100" and "1300" is 0.70.
The slopes b1 to b11 are stored as the slopes at the first sample point of each pair, a1 to a11. That is, a slope is stored for each sample point, with b1 = 0.03 at sample point a1, b2 = 0.15 at sample point a2, and so on.

次に、このようにして求められた傾斜ｂ１〜ｂ１１の値に基づいて、合計傾斜を求める。合計傾斜はそのサンプルポイントを基準に後ろ５つの傾斜を合計することによって得られるものであり、そのサンプルポイント付近の傾斜の度合いを示すものである。例えば、サンプルポイントａ１の合計傾斜ｃ１は、そのサンプルポイントａ１の傾斜ｂ１と、それから４つ後ろの傾斜ｂ２〜ｂ５とを合計することによって算出される。すなわち、ｃ１＝ｂ１＋ｂ２＋ｂ３＋ｂ４＋ｂ５によって算出される。図２５の場合には、サンプルポイントａ１の合計傾斜ｃ１は１．８３、サンプルポイントａ２の合計傾斜は３．６０、サンプルポイントａ３の合計傾斜ｃ３は５．３５、サンプルポイントａ４の合計傾斜は７．１０、サンプルポイントａ５の合計傾斜は８．３０、サンプルポイントａ６の合計傾斜ｃ６は８．６０、サンプルポイントａ７の合計傾斜は７．５０である。
Next, the total slope is obtained from the slopes b1 to b11. The total slope at a sample point is the sum of five slopes, starting with the slope at that sample point and continuing with the four that follow, and indicates the degree of slope in the vicinity of that sample point. For example, the total slope c1 at sample point a1 is calculated by adding the slope b1 at a1 and the four following slopes b2 to b5, that is, c1 = b1 + b2 + b3 + b4 + b5. In the case of FIG. 25, the total slopes at sample points a1 to a7 are c1 = 1.83, c2 = 3.60, c3 = 5.35, c4 = 7.10, c5 = 8.30, c6 = 8.60, and c7 = 7.50.

このようにして全区間における合計傾斜を算出し、次のステップ２１６の処理を行う。なお、ここでは、５ポイントの合計をその先頭の合計傾斜とする場合について説明したが、これに限らず、５ポイントの中間の合計傾斜としてもよい。すなわち、合計傾斜ｃ１をサンプルポイントａ１〜ａ５の中間すなわちサンプルポイントａ３の値としてもよい。また、これ以外に５ポイントにおける位置が明確であれば、合計傾斜をどのポイントの値としてもよいことは言うまでもない。また、５ポイントに限らず、それ以上でもそれ以下でもよいことは言うまでもない。
このように合計傾斜を用いることで、一時的な傾きにだまされることなく、適切な傾斜部分が見つけ出せるので、適切な安定箇所を発見することができるようになる。 In this way, the total inclination in all the sections is calculated, and the process of the next step 216 is performed. Here, a case has been described in which the sum of the five points is used as the total slope at the head, but the present invention is not limited to this, and a total slope in the middle of five points may be used. That is, the total slope c1 may be set to a value between the sample points a1 to a5, that is, the value of the sample point a3. In addition, if the position at five points is clear, the total inclination may be set to any value. Needless to say, the number of points is not limited to 5 and may be more or less.
By using the total inclination as described above, an appropriate inclined portion can be found without being deceived by a temporary inclination, so that an appropriate stable portion can be found.
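The slope and total slope calculations of step 215 can be sketched as follows. The function names are hypothetical; the window width of 200 points, the shift of 100 points, the first sample point of 100, and the window of five slopes are the example values given in the text. The slope values in the usage example are those quoted for FIG. 25.

```python
def slopes(curve, first=100, width=200, shift=100):
    """Slope of the peak value interpolation curve over sliding windows."""
    out = []
    x = first
    while x + width < len(curve):
        out.append((curve[x + width] - curve[x]) / width)
        x += shift
    return out


def total_slopes(b, count=5):
    """Sum of each slope and the (count - 1) slopes that follow it."""
    return [sum(b[i:i + count]) for i in range(len(b) - count + 1)]


b = [0.03, 0.15, 0.25, 0.50, 0.90, 1.80, 1.90, 2.00, 1.70, 1.20, 0.70]
print([round(c, 2) for c in total_slopes(b)])
# [1.83, 3.6, 5.35, 7.1, 8.3, 8.6, 7.5]
```

The output agrees with the total slopes c1 to c7 listed for FIG. 25, which confirms that each total is taken over the slope at the sample point and the four slopes after it.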

ステップ２１６：前記ステップ２１５で算出された合計傾斜に基づいて今度は安定区間の抽出を行う。すなわち、各サンプルポイントにおける合計傾斜を直線補間又はその他の補間によって結ぶことによって形成された合計傾斜曲線の中で所定値（例えば、合計傾斜値５）以下の箇所を安定区間とし、それ以外の箇所は不安定区間とする。
ステップ２１７：ステップ２１６の処理によって安定区間とみなされた区間毎に波形の最大値すなわちピーク値補間曲線の最大値を検出し、その最大値が所定値以下の場合にはその安定区間は削除し、不安定区間に変更する。
ステップ２１８：このようにして抽出された安定区間の存在に基づいて人間は初めてその安定区間の開始点付近に音符のトリガである音の開始点があることに気付く。そこで、その音符の開始点付近を決定するために、ステップ２１６及びステップ２１７で安定区間と認定された部分の音符開始点を検出し、それに応じて安定区間の拡張を行う。
Step 216: Stable sections are now extracted on the basis of the total slopes calculated in step 215. In the total slope curve formed by connecting the total slopes at the sample points by linear or other interpolation, portions at or below a predetermined value (for example, a total slope of 5) are taken as stable sections, and all other portions are taken as unstable sections.
Step 217: For each section regarded as a stable section in step 216, the maximum value of the waveform, that is, the maximum value of the peak value interpolation curve, is detected. If that maximum is at or below a predetermined value, the stable section is deleted and changed to an unstable section.
Step 218: Only from the existence of a stable section extracted in this way does a listener become aware that the onset of a sound, which triggers a note, lies near the start of that stable section. To determine where the note starts, the note start point is detected for each portion recognized as a stable section in steps 216 and 217, and the stable section is extended accordingly.
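Steps 216 and 217 can be sketched as follows. The function name and the value of `min_peak` are hypothetical, and the sketch assumes that the total slope curve and the peak value interpolation curve have already been resampled onto the same index grid.

```python
def stable_sections(total_slope, peak_curve, slope_limit=5.0, min_peak=0.0):
    """Return (start, end) pairs of stable sections that survive step 217."""
    # Step 216: runs where the total slope is at or below the limit.
    sections, start = [], None
    for i, c in enumerate(total_slope):
        if c <= slope_limit and start is None:
            start = i
        elif c > slope_limit and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(total_slope)))

    # Step 217: drop sections whose peak level never exceeds min_peak.
    return [(s, e) for s, e in sections if max(peak_curve[s:e]) > min_peak]


total = [1, 2, 8, 9, 3, 2, 9, 1, 1]
level = [10, 10, 10, 10, 1, 1, 10, 10, 10]
print(stable_sections(total, level, min_peak=5))   # [(0, 2), (7, 9)]
```

In the usage example three sections pass the slope test, and the middle one is removed because its level is too low, which corresponds to the deletion of a section such as d2 in FIG. 26.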

図２６はステップ２１６からステップ２１８までの処理の概念を示すものである。有効区間内の合計傾斜曲線が図２６に示すような場合、それを所定値で区切ることによって３つの安定区間ｄ１〜ｄ３が検出される。なお、安定区間ｄ２についてはステップ２１７の処理の結果、ピーク値補間曲線の最大値が所定値以下であるために削除される。従って、図２６の有効区間の場合には２つの安定区間ｄ１，ｄ３が存在することになり、２つの安定区間ｄ１，ｄ３の間には削除された安定区間ｄ２を含む不安定区間が存在することになる。この不安定区間を安定区間ｄ１，ｄ３に接続して、それぞれの安定区間ｄ１，ｄ３の拡張処理を行う必要がある。 FIG. 26 shows the concept of the processing from step 216 to step 218. When the total slope curve within the effective section is as shown in FIG. 26, three stable sections d1 to d3 are detected by dividing the curve by a predetermined value. Note that the stable section d2 is deleted because the maximum value of the peak value interpolation curve is equal to or less than the predetermined value as a result of the process of step 217. Therefore, in the case of the effective section in FIG. 26, two stable sections d1 and d3 exist, and an unstable section including the deleted stable section d2 exists between the two stable sections d1 and d3. Will be. It is necessary to connect the unstable sections to the stable sections d1 and d3 and perform the extension processing of the respective stable sections d1 and d3.

For stable section d1, the start point of the effective section necessarily becomes the start point of its note, and for stable section d3, the end point of the effective section necessarily becomes the end point of its note. The note start point of stable section d3 and the note end point of stable section d1 are obtained as follows. From among the unstable sections detected in step 216, the one closer to the stable section whose note start point is being sought is selected, and the sample point corresponding to the peak value of the total slope curve within that unstable section is taken as the note start point of that stable section. Accordingly, when stable section d2 has been deleted and two unstable sections exist, as in FIG. 26, the sample point f2 corresponding to the peak of the total slope in the unstable section closer to stable section d3 becomes the note start point of stable section d3. The note end point of stable section d1 is therefore also sample point f2, so that stable section d1 finally becomes extended stable section e1 and stable section d3 becomes extended stable section e3.
If, in the case of FIG. 26, stable section d2 had not been deleted by the processing of step 217, sample point f1 would be the note end point of stable section d1 and the note start point of stable section d2, and sample point f2 would be the note end point of stable section d2 and the note start point of stable section d3.
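A minimal Python sketch of this boundary rule might look as follows. The function name and arguments are hypothetical, sections are half-open index ranges, and an unstable run is assumed to be a run of samples whose total slope is at or above the threshold.

```python
def extend_stable_sections(kept, total_slope, slope_limit, valid_start, valid_end):
    """Extend the surviving stable sections so that they tile the effective section.

    kept: sorted half-open (start, end) pairs, assumed non-empty.
    The cut between two neighbours is the total-slope peak inside the
    unstable run that lies directly before the later section.
    """
    cuts = [valid_start]
    for (_, prev_end), (next_start, _) in zip(kept, kept[1:]):
        # Walk back from next_start over the adjacent unstable run.
        run_start = next_start
        while run_start > prev_end and total_slope[run_start - 1] >= slope_limit:
            run_start -= 1
        if run_start == next_start:
            run_start = prev_end  # no unstable sample found: search the whole gap
        candidates = range(run_start, next_start)
        if candidates:
            cut = max(candidates, key=lambda i: total_slope[i])
        else:
            cut = next_start  # sections touch: nothing to search
        cuts.append(cut)
    cuts.append(valid_end)
    return list(zip(cuts, cuts[1:]))


total_slope = [1, 1, 9, 1, 1, 8, 1, 1]
extended = extend_stable_sections([(0, 2), (6, 8)], total_slope, 5, 0, 8)
```

In this toy input the middle section has been deleted, so the single cut falls on the slope peak nearest the later section, playing the role of f2. Passing all three sections instead yields two cuts, corresponding to f1 and f2.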

FIG. 22 shows the details of the steady section detection processing in step 15 of FIG. 1 and corresponds to FIG. 5. The following describes in detail how steady sections are detected from among the stable sections obtained by the stable section detection processing of FIG. 21. Since steps 221 to 229 of the steady section detection processing in FIG. 22 are almost the same as steps 51 to 59 shown in FIG. 5, they are described only briefly, and the remaining steps are described in detail.

Step 221: The signal is passed through a primary band-pass filter (primary BPF) to remove predetermined harmonics.
Step 222: Peak reference position detection, which finds the reference position for each cycle, is performed on the tone waveform signal obtained by the primary BPF processing of step 221, using a peak position detection method.
Step 223: Based on the peak reference positions detected in step 222, the waveform of a basic section starting at a given peak reference position is compared with the waveform of the section immediately following it, which runs to the next peak reference position (hereinafter, the moving section), to judge whether the two waveforms are the same. The comparison uses an error rate calculation method such as that shown in FIG. 13.
Step 224: Using the result of the waveform comparison in step 223, sections whose error rate is smaller than a predetermined value (for example, 10) are joined to form pseudo matching sections. The maximum and minimum values of the pitch extracted from each matching section are detected, and a cutoff frequency band is determined from them.
Step 225: The signal is passed through a secondary band-pass filter (secondary BPF) that uses the new cutoff frequency band determined in step 224, removing unnecessary harmonics.
Step 226: The same processing as the peak reference position detection of step 222 is performed.
Step 227: The same processing as the waveform comparison of step 223 is performed.
The series of processing from step 225 to step 227 cuts the low-frequency components and harmonics that cause errors, allowing more accurate peak reference position detection and waveform comparison and yielding matching sections that are more accurate than before.
Step 228: The pitch data at each peak reference position obtained by the processing up to step 227 is linearly interpolated so that there is one pitch value per sample point.
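The interpolation of step 228 can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, peak positions are assumed to be strictly increasing sample indices, and holding the first and last pitch values constant outside the peaks is an assumption the text does not address.

```python
def interpolate_pitch(peak_positions, pitches, num_samples):
    """Linearly interpolate pitch values known at peak positions to every sample.

    Samples before the first peak or after the last one simply reuse the
    nearest known pitch (an assumption, not something the source specifies).
    """
    out = []
    seg = 0
    last = len(peak_positions) - 1
    for n in range(num_samples):
        if n <= peak_positions[0]:
            out.append(float(pitches[0]))
        elif n >= peak_positions[last]:
            out.append(float(pitches[last]))
        else:
            # Advance to the segment [x0, x1] that contains sample n.
            while peak_positions[seg + 1] < n:
                seg += 1
            x0, x1 = peak_positions[seg], peak_positions[seg + 1]
            y0, y1 = pitches[seg], pitches[seg + 1]
            out.append(y0 + (y1 - y0) * (n - x0) / (x1 - x0))
    return out


per_sample = interpolate_pitch([0, 4, 8], [100, 120, 110], 9)
```

The result has one pitch value per sample point, which is the form the time-varying band-pass filter of step 229 consumes.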

Step 229: Time-varying band-pass filter (BPF) processing is performed using the per-sample-point pitch data obtained in step 228.
Step 22A: For the tone waveform that has passed through the time-varying band-pass filtering of step 229, the side on which the peaks appear more strongly is determined. The waveform is then divided into period sections determined from the frequency change obtained in step 228, the position of the maximum within each section is detected, and that position is taken as a peak reference position. That is, when the tone waveform obtained by the time-varying band-pass filtering of step 229 is as shown in FIG. 27, the waveform is divided by period sections PR1 to PR5 determined from its frequency change, and the maximum values P1 to P6 in the divided sections PR1 to PR5 become the peak reference positions. When peak reference positions are detected by processing such as that of step 222 (step 52) or step 226 (step 56), a waveform such as that shown in FIG. 10(A) causes peak reference positions P4 and P5 to appear at clearly wrong positions. When the waveform is instead divided into period sections and a peak reference position is detected within each, as in step 22A, such clearly wrong peak reference positions are no longer detected, and the accuracy of peak reference position detection improves.
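As a rough Python illustration of step 22A, the sketch below picks one peak per period section. The function name and the boundary list are hypothetical, and deciding the stronger side by comparing the largest positive and negative excursions is an assumption, since the text does not say how that side is determined.

```python
def peak_reference_positions(waveform, period_bounds):
    """Return one peak index per period section.

    period_bounds: strictly increasing sample indices [b0, b1, ..., bn];
    section k covers the half-open range [bk, bk+1).
    """
    # Assumed polarity rule: use whichever side has the larger excursion.
    sign = 1 if max(waveform) >= -min(waveform) else -1
    signed = [sign * x for x in waveform]
    peaks = []
    for start, end in zip(period_bounds, period_bounds[1:]):
        peaks.append(max(range(start, end), key=lambda i: signed[i]))
    return peaks


wave = [0, 3, 1, -1, 0, 2, 5, 1, -2, 0, 4, 1]
peaks = peak_reference_positions(wave, [0, 4, 8, 12])
```

Because exactly one position is taken from each period section, a spurious secondary peak inside a period cannot be mistaken for a peak reference position.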

Step 22B: Voiced section detection is performed on the waveform based on the peak reference positions detected in step 22A. As in step 223, the waveform of a basic section starting at a given peak reference position is compared with the waveform of the moving section that immediately follows it and runs to the next peak reference position, using an error rate calculation method such as that shown in FIG. 13, to judge whether the two are the same. In this voiced section detection, however, when the basic section and the moving section are judged not to match, that point is not immediately treated as a boundary of the voiced section; a boundary is set only when mismatches occur a predetermined number of times or more in succession. This allows a portion in which vowels continue, such as "a-i-u-" or "a-a-a-", to be detected as a voiced section.

For example, in FIG. 28, if the span from peak reference position P1 to peak reference position P2 is basic section P12, the span from peak reference position P2 to peak reference position P3 is moving section P23. Suppose that basic section P12 and moving section P23 are judged to match, and that sections P23 and P34 are likewise judged to match. If the next pair, sections P34 and P45, is judged not to match, waveform comparison is performed between section P34 and section P56, which follows section P45. If sections P34 and P56 are judged to match, then sections P34, P45, and P56 are treated as matching even though sections P45 and P56 do not match, and processing proceeds to the judgment of the next pair, section P56 and section P67 (not shown). If, on the other hand, section P34 is judged not to match any of sections P45, P56, P67 (not shown), P78 (not shown), and P89 (not shown), that is, if mismatches occur a predetermined number of times (for example, five) or more in succession, section P34 is taken as the boundary of the voiced section, and the same judgment is made for the next pair, sections P45 and P56. If sections P45 and P56 do not match, the judgment is made for sections P45 and P67 (not shown). Alternatively, when sections P45 and P56 do not match, the check for consecutive mismatches may be skipped and the judgment moved on to the next pair, sections P56 and P67, with the matching processing described above starting only once adjacent sections match.
Once the voiced sections have been determined in this way, those of a predetermined length or less (short voiced sections) are deleted.
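The mismatch-tolerant scan of step 22B can be sketched in Python as below. This is an illustration under stated assumptions: waveform sections are numbered 0 to n-1, `matches(i, j)` is a hypothetical callback standing in for the error rate comparison of FIG. 13, and section lengths are counted in waveform sections rather than samples.

```python
def find_voiced_sections(num_periods, matches, max_misses=5):
    """Group waveform sections into voiced sections (inclusive index pairs).

    A boundary is set only after max_misses consecutive mismatches
    against the current basic section.
    """
    if num_periods == 0:
        return []
    sections = []
    start = base = 0
    while base < num_periods - 1:
        hit = None
        for k in range(1, max_misses + 1):
            j = base + k
            if j >= num_periods:
                break
            if matches(base, j):
                hit = j
                break
        if hit is not None:
            base = hit  # sections skipped on the way are treated as matching
        else:
            sections.append((start, base))  # boundary after the basic section
            start = base = base + 1
    sections.append((start, num_periods - 1))
    return sections


def drop_short(sections, min_periods):
    """Delete voiced sections whose length is min_periods or less."""
    return [(s, e) for s, e in sections if e - s + 1 > min_periods]


labels = list("aaaxaabbbbbb")  # toy stand-in for waveform shapes
voiced = find_voiced_sections(len(labels), lambda i, j: labels[i] == labels[j], max_misses=2)
```

In the toy input the isolated `x` is absorbed into the first voiced section because the basic section matches again one step later, while the change from `a` to `b` persists and therefore produces a boundary.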

As a result of the above processing, the stable section is divided into voiced sections V1 to V3 separated by unstable sections, as shown in FIG. 29. The voiced sections V1 to V3 correspond to the stable portions where the adjacent comparison error curve is low, and the portions where the curve is high correspond to the unstable sections. The voiced sections V1 to V3 are therefore extended on the basis of this adjacent comparison error curve. In this extension processing, a voiced section next to the start point or end point of the stable section is extended unconditionally to that start point or end point, and an unstable section sandwiched between two voiced sections is split at the point where the adjacent comparison error is largest, with each part joined to the neighboring voiced section. For an adjacent comparison error curve such as that shown in FIG. 29, the voiced sections V1 to V3 thus become extended voiced sections V1E to V3E. Although FIG. 29 shows only the case in which each extended voiced section contains a single portion where the slope of the adjacent comparison error is 0 (a bottom portion), in practice there may of course be several such portions.

Step 22C: For each extended voiced section obtained by the processing from step 221 to step 22B, the portion where the error between adjacent sections, that is, the slope of the adjacent comparison error, becomes 0 (the bottom portion) is detected and taken as the reference position of a vowel, and the section corresponding to the sounding of that vowel is detected as a timbre section.
In this timbre section detection, the waveform section corresponding to the bottom portion is fixed as the basic section, the waveform sections before and after it are taken in turn as moving sections, and waveform comparison is performed on each to obtain a comparison error. The comparison error obtained in this way is called the reference comparison error.
That is, as shown in FIG. 30(A), waveform section m0, which corresponds to the bottom portion of the adjacent comparison error, is taken as the basic section, and waveform comparison is performed between it and the moving sections on both sides, m1, m-1, m2, m-2, m3, m-3, m4, m-4, and so on. The basic section is the one corresponding to the lowest value of the adjacent comparison error, that is, a waveform section judged by the waveform comparison to have a high degree of matching. The comparison errors obtained in this way form a reference comparison error curve such as that in FIG. 30(B). Because the comparison is made relative to waveform section m0, this curve behaves much like the adjacent comparison error curve near m0, but the error rate grows in portions farther away and converges to the maximum error rate.
The waveform sections in which the value (error rate) of this reference comparison error curve is at or below a predetermined value form timbre section TS1.
As in the voiced section detection of step 22B, when the value of the reference comparison error curve exceeds the predetermined value, that point is not immediately treated as a boundary of the timbre section; a boundary is set only when values above the predetermined value occur a predetermined number of times or more in succession.
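The outward growth of a timbre section from the basic section m0 can be sketched in Python as follows. The names are hypothetical: `ref_error(base, i)` stands in for the reference comparison error between the fixed basic section and moving section `i`, and the toy error function in the usage lines is not the error rate method of FIG. 13.

```python
def timbre_section(num_periods, base, ref_error, limit, max_over=3):
    """Grow a timbre section outward from the fixed basic section `base`.

    The section stops in a given direction only after max_over consecutive
    sections whose reference comparison error exceeds `limit`.
    Returns inclusive (first, last) waveform-section indices.
    """
    def grow(step):
        edge = base
        over = 0
        i = base + step
        while 0 <= i < num_periods:
            if ref_error(base, i) <= limit:
                edge = i
                over = 0
            else:
                over += 1
                if over >= max_over:
                    break
            i += step
        return edge

    return grow(-1), grow(+1)


shapes = [9, 9, 1, 1, 1, 5, 1, 1, 9, 9, 9, 9]  # toy per-section waveform descriptor
section = timbre_section(len(shapes), 3, lambda b, i: abs(shapes[b] - shapes[i]),
                         limit=1, max_over=2)
```

In the toy data the single outlier at index 5 does not end the section, because only one value in a row exceeds the limit; the run of large errors from index 8 onward does.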

When a timbre section has been determined in this way and the undecided part of the extended voiced section outside it is of a predetermined length or more, the same processing is applied to that part. In FIG. 30, once timbre section TS1 has been determined as in FIG. 30(B), the rest of the extended voiced section, that is, the undecided section, is of the predetermined length or more. For this undecided section, waveform section n0, which corresponds to the bottom portion of the adjacent comparison error as shown in FIG. 30(C), is therefore taken as the basic section, and waveform comparison is performed between it and the moving sections before and after it, n1, n-1, n2, n-2, n3, n-3, n4, n-4, and so on. The comparison errors obtained in this way form a reference comparison error curve such as that in FIG. 30(D), and the waveform sections in which its value (error rate) is at or below the predetermined value form timbre section TS2. Two timbre sections, TS1 and TS2, are thus detected in the extended voiced section of FIG. 30.
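The repetition over undecided remainders can be sketched in Python as a small driver loop. Everything here is hypothetical scaffolding: `detect(lo, hi)` stands for one pass of timbre section detection restricted to waveform sections `lo` to `hi` and is assumed to return an inclusive range inside those limits, and the lookup table in the usage lines merely imitates such a detector.

```python
def split_into_timbre_sections(v_start, v_end, detect, min_len):
    """Repeat timbre detection on undecided remainders of a voiced section.

    A remainder is processed again only if it spans at least min_len
    waveform sections. Returns the sorted list of timbre sections found.
    """
    pending = [(v_start, v_end)]
    found = []
    while pending:
        lo, hi = pending.pop()
        ts, te = detect(lo, hi)
        found.append((ts, te))
        # Remainders before and after the newly found timbre section.
        for a, b in ((lo, ts - 1), (te + 1, hi)):
            if b - a + 1 >= min_len:
                pending.append((a, b))
    return sorted(found)


# Toy detector: a lookup table imitating two detection passes.
table = {(0, 9): (0, 3), (4, 9): (4, 8)}
sections = split_into_timbre_sections(0, 9, lambda lo, hi: table[(lo, hi)], min_len=2)
```

Here the first pass leaves a remainder of six waveform sections, which is long enough to be processed again; the single section left after the second pass is too short and stays undecided.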

Step 22D: The timbre sections obtained in step 22C are extended in the same way as in the steady section extension processing of step 5C. That is, if, as a result of the processing from step 221 to step 22C, the detected timbre sections ST1 and ST2 are separated by a single waveform section, that waveform section can simply serve as the boundary between timbre sections ST1 and ST2. If adjacent timbre sections are separated by several waveform sections, however, those waveform sections must be joined to the preceding and following timbre sections to extend them. This extension is performed by the same processing as in FIG. 15.
In this case too, the comparison is performed on a waveform that is close to a sine wave after BPF processing, so even the features that distinguish one vowel from another are filtered out, which may undermine the purpose of extracting sections of the same vowel, that is, of the same timbre. Separate waveforms may therefore be prepared for peak position detection and for waveform comparison, with each process performed on its own waveform. That is, the waveform after time-varying BPF processing is used as is for peak position detection, while waveform comparison uses a waveform obtained by BPF processing that retains a frequency band extending to several times the frequency component used in the time-varying BPF processing.
It goes without saying that band-pass filtering whose lowest frequency is the fundamental frequency and whose highest frequency is an integer multiple of the fundamental frequency may be performed, and the result used as the target waveform for the waveform comparison.

The timbre sections extended in this way are then subdivided, taking pitch changes and stability into account, to determine the final pitch sections. Because the timbre section detection up to step 22C compares waveforms after stretching them, a pitch change within a voice waveform of continuous vowels such as "aa" is treated as one and the same sound. For the tone waveform of an instrument sound, this can mean that pitch changes in a sustained instrument sound go undetected. In this embodiment, therefore, the state of the pitch change is examined for each timbre section obtained by the processing up to step 22C, and whether further division is needed is judged from that state. When division is judged necessary, the timbre section is divided into finer pitch sections. This division is performed using a note distance variation curve such as that shown in FIG. 16.

Step 22E: The pitch sections detected in step 22D may include some that are too short to exist as notes. In this step, therefore, one measure is divided evenly into a grid whose unit is a predetermined note length (for example, an eighth note), and the pitch sections are fitted to this grid to determine the note values. Each pitch section is fitted to the grid position closest to its start; when two or more pitch sections are closest to the same grid position, the one with the longest duration is fitted to it.
For example, FIG. 31 shows pitch sections falling within one measure divided into eighth-note lengths. Suppose that the pitch sections finally determined in step 22D are PT1 to PT5. Pitch section PT1 is fitted to grid G2, pitch section PT2 to grid G4, and pitch section PT3 to grid G5. For grid G6, however, both pitch section PT4 and pitch section PT5 are closest to it. In this case the longer of the two, pitch section PT5, is fitted to grid G6.
Because pitch section PT3 is fitted to grid G5, the duration of pitch section PT2 runs from grid G4 to grid G5. If pitch section PT3 did not exist, the end position of pitch section PT2 could be used as is, or the end of pitch section PT2 could be fitted to the grid position closest to it. In that case a note-off (rest) may be assigned to the portion in which no pitch section exists.
Alternatively, if pitch section PT3 did not exist, the duration of pitch section PT2 could be extended to grid G6, the start position of the next pitch section PT5. In that case there is no note-off (rest).
After the note values have been determined by the steady section detection processing of FIG. 22 in this way, the pitch sequence determination processing of step 16 in FIG. 1 assigns the optimum pitch sequence to each note value. This processing is the same as in the first embodiment, so its description is omitted.
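The grid fitting of step 22E can be sketched in Python as below. This is an illustration only: the function names, the time unit, and the sample sections are hypothetical, grid positions are numbered from 0, and the note lengths follow the variant without rests, in which each note lasts until the next occupied grid position.

```python
import math


def assign_to_grid(sections, grid_step):
    """Fit each (start, end) section to the grid position nearest its start.

    When several sections land on the same grid position, the longest wins.
    Returns {grid_index: (start, end)}.
    """
    chosen = {}
    for start, end in sections:
        g = math.floor(start / grid_step + 0.5)  # nearest grid position
        if g not in chosen or (end - start) > (chosen[g][1] - chosen[g][0]):
            chosen[g] = (start, end)
    return chosen


def note_values(chosen, grid_step):
    """Return (grid_index, length_in_grid_units) per note, with no rests.

    Each note lasts until the next occupied grid position; the last note
    ends at the grid position nearest its own end (at least one unit).
    """
    grids = sorted(chosen)
    if not grids:
        return []
    notes = [(g, nxt - g) for g, nxt in zip(grids, grids[1:])]
    last = grids[-1]
    end_grid = math.floor(chosen[last][1] / grid_step + 0.5)
    notes.append((last, max(1, end_grid - last)))
    return notes


# Toy sections loosely modelled on PT1 to PT5 of FIG. 31 (grid step = 10 units).
sections = [(18, 38), (41, 47), (49, 56), (57, 59), (61, 78)]
chosen = assign_to_grid(sections, 10)
notes = note_values(chosen, 10)
```

In the toy data the fourth and fifth sections both fall nearest grid position 6, and the longer fifth section is kept, as with PT4 and PT5 in the text.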

According to the sound signal analysis device of the above embodiment, even when the pitch or level of a sound input from a microphone or the like fluctuates slightly, the steady part of the musical sound other than the fluctuating part, that is, the part corresponding to a single note, can be analyzed.

FIG. 1 shows the main flow when the electronic musical instrument of FIG. 2 operates as a performance information generating device.
FIG. 2 is a hardware block diagram showing the configuration of an electronic musical instrument incorporating a musical sound information analysis device and a performance information generating device according to the present invention.
FIG. 3 shows the details of the effective section detection processing in step 13 of FIG. 1.
FIG. 4 shows the details of the stable section detection processing in step 14 of FIG. 1.
FIG. 5 shows the details of the steady section detection processing in step 15 of FIG. 1.
FIG. 6 shows the details of the pitch sequence determination processing in step 16 of FIG. 1.
FIG. 7 shows an example of the waveform values of an audio signal sampled at a sampling frequency of 44.1 kHz, that is, of a digital sample signal.
FIG. 8 shows the concept of an operation example of the effective section detection processing of FIG. 3.
FIG. 9 shows the concept of an operation example of the stable section detection processing of FIG. 4.
FIG. 10 shows the concept of an operation example of the primary and secondary BPF processing and the waveform comparison processing of FIG. 5.
FIG. 11 shows a waveform example in which, after the primary BPF processing in step 51 of FIG. 5, the strong peaks of the tone waveform in a stable section appear on the negative side and the weak peaks on the positive side.
FIG. 12 shows, using two comparison waves, a specific example of how the error rate is calculated in the waveform comparison processing of FIG. 5.
FIG. 13 shows specific numerical values illustrating how the error rate is calculated from the two comparison waves of FIG. 11 by the waveform comparison processing of FIG. 5.
FIG. 14 shows a specific example of how peak reference positions are extracted by the peak reference position detection processing of FIG. 5 when the time constant is set relatively small.
FIG. 15 shows an operation example of the steady section extension processing in step 5C of FIG. 5.
FIG. 16 shows an operation example of the subdivision processing based on note distance in step 5D of FIG. 5.
FIG. 17 shows another operation example of the subdivision processing based on note distance in step 5D of FIG. 5.
FIG. 18 shows an operation example of which part of a steady section the representative frequency is detected from when the representative frequency determination processing for each steady section is performed in step 61 of FIG. 6.
FIG. 19 shows an operation example of how the representative frequency is detected from each steady section in step 61 of FIG. 6.
FIG. 20 shows the details of the effective section detection processing in step 13 of FIG. 1 according to another embodiment.
FIG. 21 shows the details of the stable section detection processing in step 14 of FIG. 1 according to another embodiment.
FIG. 22 shows the details of the steady section detection processing in step 15 of FIG. 1 according to another embodiment.
FIG. 23 shows the concept of an operation example of the effective section detection processing of FIG. 20.
FIG. 24 shows the concept of an operation example of the processing from step 211 to step 215 of FIG. 21.
FIG. 25 shows an example of calculating the total slope in step 215 of FIG. 21.
FIG. 26 shows the concept of an operation example of the processing in steps 216 and 218 of FIG. 21.
FIG. 27 shows the concept of an operation example of the peak reference position detection processing in step 22A of FIG. 22.
FIG. 28 shows the first half of the concept of an operation example of the voiced section detection processing in step 22B of FIG. 22.
FIG. 29 shows the latter half of the concept of an operation example of the voiced section detection processing in step 22B of FIG. 22.
FIG. 30 shows the concept of an operation example of the timbre section detection processing in step 22C of FIG. 22.
FIG. 31 shows the concept of an operation example of the note value determination processing in step 22E of FIG. 22.

Explanation of reference numerals

1: CPU, 2: program memory, 3: working memory, 4: performance data memory, 5: key press detection circuit, 6: microphone interface, 7: switch detection circuit, 8: display circuit, 9: sound source circuit, 10: keyboard, 1A: microphone, 1B: numeric keypad and various switches, 1C: display, 1D: sound system, 1E: data and address bus

Claims

A sound signal analysis device comprising:
input means for inputting an arbitrary sound signal consisting of a time-series sequence of one or more notes;
section detection means for detecting, from the input sound signal, sections each estimated to correspond to a note; and
means for arranging the detected sections in time-series order on a grid divided at time intervals corresponding to a predetermined note length, assigning to each section the grid position closest to a predetermined one of its start and end, and, when a plurality of sections are thereby assigned to the same grid position, selecting the one section having the longest time length as a valid note.

A computer-readable recording medium storing a group of instructions of a program, executed by a computer, for analyzing a sound signal, the program for analyzing the sound signal comprising the steps of:
inputting an arbitrary sound signal consisting of a time-series sequence of one or more notes;
detecting, from the input sound signal, sections each estimated to correspond to a note; and
arranging the detected sections in time-series order on a grid divided at time intervals corresponding to a predetermined note length, assigning to each section the grid position closest to a predetermined one of its start and end, and, when a plurality of sections are thereby assigned to the same grid position, selecting the one section having the longest time length as a valid note.