JPH10149160A

JPH10149160A - Sound signal analyzing device and performance information generating device

Info

Publication number: JPH10149160A
Application number: JP8324774A
Authority: JP
Inventors: Tomoyuki Funaki; 知之船木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1996-11-20
Filing date: 1996-11-20
Publication date: 1998-06-02
Anticipated expiration: 2016-11-20
Also published as: JP3279204B2

Abstract

PROBLEM TO BE SOLVED: To easily analyze valid sections even when the level of an input sound fluctuates slightly, by dividing the average sound pressure level of successively input signals into valid and invalid sections based on a first prescribed value and then refining the valid sections based on first and second prescribed lengths. SOLUTION: A section in which the average sound pressure level (A) of the successively input signals is at or above the first prescribed value (e.g. 20%) is defined as a valid section in which a musical sound exists, and the remaining sections are defined as invalid sections (B). When an invalid section sandwiched between two valid sections has a time length at or below the first prescribed length, it is merged with the valid sections on both sides into a single valid section (C). Next, when a valid section sandwiched between two invalid sections has a time length at or below the second prescribed length, it is merged with the invalid sections on both sides into a single invalid section (D). Finally, the mean of the average sound pressure level is calculated for each remaining valid section, and when it is at or below the second prescribed value, that valid section is changed to an invalid section (E).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a sound signal analyzing device that analyzes, based on an input voice or musical tone, the sections in which a musical sound exists (valid sections) and the steady parts of that musical sound, and to a performance information generating device that generates performance information such as MIDI information based on the input voice or musical tone.

[0002]

2. Description of the Related Art

Recently, computer performance systems, which use a computer or the like to generate performance information such as MIDI information and reproduce performance sounds based on that information, have attracted attention as a new kind of musical performance apparatus. In this type of system, the methods for inputting the data from which performance information is generated include the real-time input method, the step input method, the numerical input method, and the score input method.

[0003] The real-time input method converts, in real time and much like a tape recorder, the operation information of performance operators such as a keyboard actually played by the performer into performance information. The numerical input method enters performance information such as pitch, note length, and dynamics directly as numerical data from the computer keyboard. The score input method places simplified note symbols and the like on a score (five-line staff) shown on the display, using the computer's function keys, a mouse, or the like. The step input method enters the notes on a MIDI keyboard or software keyboard and enters the note lengths using the computer's function keys, a mouse, or the like.

[0004] Among the input methods described above, the real-time input method can store the actual performance operations as performance information just as they are, so it has the advantages that human nuances are easy to express and that input takes little time. However, it requires the performer to have a high level of instrumental skill and is unsuited to beginners. For this reason, there are performance information generating devices that keep the advantages of real-time input while letting even beginners input performance information easily and quickly: a human voice or the tone of an acoustic instrument is input directly through a microphone, and performance information is generated according to the input sound. In other words, simply by inputting a human voice or the sound of a guitar or the like (a single note) through a microphone, a MIDI signal can be generated easily, and MIDI equipment can be controlled without using a MIDI keyboard or the like.

[0005]

Problems to Be Solved by the Invention

Conventional performance information generating devices generate MIDI information by processing pitch changes in the sound input from a microphone in one of the following ways. The first method detects pitch changes in semitone units and generates only the note information for the detected pitch. The second method detects pitch changes in semitone units and generates the note information for the detected pitch together with pitch bend information (pitch change information) describing the pitch variation in between. The third method generates the pitch change as pitch bend information that varies within a range of one octave up or down. In addition, to generate note information (note-on or note-off), the level of the input sound is compared with a predetermined reference value: a note-on is generated when the input level rises above the reference value, and a note-off is generated when it falls below the reference value.

[0006] However, when pitch changes are detected in semitone units as in the first and second methods, slight wavering of the input pitch produces many unintended note information events (note-on or note-off). When pitch changes are generated as pitch bend information as in the third method, the pitch bend information can follow the pitch changes faithfully, but the result is unsuited to purposes such as transcription. Furthermore, when note information is generated according to the input level, fluctuations in the level of the input sound likewise produce many unintended note information events. In the real-time input method, moreover, a plurality of sounds are input to the microphone in time series at arbitrary intervals, so the portions in which sound exists must be analyzed efficiently. Constantly performing pitch and other analysis on the microphone signal is undesirable, because analysis processing is wasted on periods in which no sound is actually being input. It is therefore efficient to extract from the microphone signal the sections in which sound actually exists (valid sections) and to apply complex analysis such as pitch analysis only to the extracted valid sections. The conventional extraction method, however, simply compares the input signal level with a predetermined reference level, so the extraction of valid sections becomes inaccurate when the level of the input sound fluctuates slightly, particularly when it fluctuates near the reference level.
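The problem with a single reference level can be seen in a small sketch. The function name `naive_note_events` and the sample level values below are hypothetical and chosen only for illustration; the code mimics the conventional comparison described above, not any specific prior-art implementation.

```python
def naive_note_events(levels, reference):
    """Emit note-on/note-off events by comparing each level with one reference value."""
    events = []
    sounding = False
    for i, level in enumerate(levels):
        if not sounding and level > reference:
            events.append((i, "note_on"))
            sounding = True
        elif sounding and level < reference:
            events.append((i, "note_off"))
            sounding = False
    return events


# One sustained tone whose level wobbles around the reference value 0.20.
levels = [0.05, 0.22, 0.19, 0.23, 0.18, 0.24, 0.21, 0.19, 0.05]
events = naive_note_events(levels, reference=0.20)
note_on_count = sum(1 for _, kind in events if kind == "note_on")
```

Although the performer intended a single note, the wobble around the reference value yields several note-on and note-off pairs, which is the kind of unintended note information the passage describes.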

[0007] An object of the present invention is to provide a sound signal analyzing device that can easily analyze the sections in which a musical sound exists (valid sections) even when the pitch or level of an input sound from a microphone or the like fluctuates slightly. A further object is to provide a sound signal analyzing device that, even when the pitch or level of the input sound fluctuates slightly, can analyze the steady part of the musical sound other than the fluctuating part, that is, the part corresponding to one note. More specifically, the steady part is analyzed effectively from a series of input sounds so that the pitch of the sound can be analyzed accurately on that basis. The present invention has been made in view of the above points, and a further object is to provide a performance information generating device that can reliably generate note information for a pitch even when the pitch or level of the input sound from a microphone or the like fluctuates slightly.

[0008]

Means for Solving the Problems

The sound signal analyzing device according to claim 1 comprises: input means for inputting an arbitrary sound signal from outside; calculating means for obtaining averages, each over a predetermined number of samples, of the sample amplitude values of the signal sequentially input from the input means, and outputting the results as time-series average sound pressure level information; section determining means for defining a section in which the average sound pressure level obtained by the calculating means is at or above a first predetermined value as a valid section in which a musical sound exists, and a section below the first predetermined value as an invalid section in which no musical sound exists; valid section conversion means for changing an invalid section that is sandwiched between valid sections on both sides and whose time length is less than a first predetermined length into a valid section, and combining it with the valid sections on both sides into a new valid section; first invalid section conversion means for, once the processing by the valid section conversion means has finished, changing a valid section that is sandwiched between invalid sections on both sides and whose time length is less than a second predetermined length into an invalid section, and combining it with the invalid sections on both sides into a new invalid section; and second invalid section conversion means for calculating, for each valid section remaining once the processing by the first invalid section conversion means has finished, the mean of the average sound pressure level, and changing that valid section into an invalid section when the mean is less than a second predetermined value. Because the device of claim 1 averages the sample amplitude values of the sequentially input signal over a predetermined number of samples, it obtains smoothly varying average sound pressure level information that still responds sensitively to level fluctuations of the input sound signal. The average sound pressure level obtained in this way is divided into valid and invalid sections based on the first predetermined value, and the valid sections are then finally identified from these sections based on the first and second predetermined lengths. As a result, the sections in which a musical sound exists (valid sections) can be analyzed easily even when the level of the input sound from a microphone or the like fluctuates slightly.
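The chain of means recited for claim 1 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the average is taken over non-overlapping blocks of absolute sample values, and section lengths are counted in level frames rather than in units of time.

```python
def average_levels(samples, window):
    """Mean absolute amplitude over consecutive blocks of `window` samples (assumed form of the average)."""
    return [
        sum(abs(s) for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def runs(flags):
    """Split a list of booleans into [start, end, flag] runs (end exclusive)."""
    out = []
    for i, f in enumerate(flags):
        if out and out[-1][2] == f:
            out[-1][1] = i + 1
        else:
            out.append([i, i + 1, f])
    return out


def flip_short_inner_runs(flags, target, min_len):
    """Flip runs equal to `target` that are shorter than `min_len` and enclosed on both sides."""
    flags = list(flags)
    rs = runs(flags)
    for k, (start, end, f) in enumerate(rs):
        enclosed = 0 < k < len(rs) - 1
        if f == target and enclosed and end - start < min_len:
            for i in range(start, end):
                flags[i] = not target
    return flags


def valid_sections(levels, first_value, first_len, second_len, second_value):
    flags = [lv >= first_value for lv in levels]             # section determination
    flags = flip_short_inner_runs(flags, False, first_len)   # short invalid gaps become valid
    flags = flip_short_inner_runs(flags, True, second_len)   # short valid blips become invalid
    result = []
    for start, end, f in runs(flags):
        # second invalidation: keep a valid section only if its mean level is high enough
        if f and sum(levels[start:end]) / (end - start) >= second_value:
            result.append((start, end))
    return result


levels = [0.0, 0.3, 0.4, 0.1, 0.5, 0.4, 0.0, 0.0, 0.0,
          0.25, 0.0, 0.0, 0.0, 0.2, 0.21, 0.2, 0.0]
sections = valid_sections(levels, first_value=0.2, first_len=2,
                          second_len=2, second_value=0.3)
```

In this example the one-frame dip at index 3 is absorbed into the surrounding valid section, the one-frame blip at index 9 is discarded, and the weak section at indices 13 to 15 is rejected by the mean-level check, leaving the single section `(1, 6)`.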

[0009] The sound signal analyzing device according to claim 2 comprises: input means for inputting an arbitrary sound signal from outside; calculating means for obtaining averages, each over a predetermined number of samples, of the sample amplitude values of the signal sequentially input from the input means, and outputting the results as time-series average sound pressure level information; section determining means for defining a section in which the average sound pressure level obtained by the calculating means is at or above a first predetermined value as a valid section, a section that is below the first predetermined value and sandwiched between valid sections on both sides as an invalid section, and the remaining sections at both ends of the average sound pressure level as undetermined sections; valid section conversion means for changing an invalid section that is sandwiched between valid sections on both sides and whose time length is less than a first predetermined length into a valid section, and combining it with the valid sections on both sides into a new valid section; first invalid section conversion means for, once the processing by the valid section conversion means has finished, changing a valid section that is sandwiched between invalid sections on both sides and whose time length is less than a second predetermined length into an invalid section and combining it with the invalid sections on both sides into a new invalid section, and, when a valid section adjacent to an undetermined section has a time length less than the second predetermined length, combining that valid section with the invalid section and the undetermined section adjacent to it into a new undetermined section; and second invalid section conversion means for calculating, for each valid section and each undetermined section remaining once the processing by the first invalid section conversion means has finished, the mean of the average sound pressure level, changing the valid section or undetermined section into an invalid section when the mean is less than a second predetermined value, and changing the undetermined section into a valid section when the mean is at or above the second predetermined value. The device of claim 2 is basically the same as that of claim 1 and divides the obtained average sound pressure level into valid and invalid sections based on the first predetermined value, but at that point it treats both end portions of the average sound pressure level, that is, its rising and falling portions, as undetermined sections, and only then carries out the processing that identifies the valid sections. This makes it possible to analyze accurately whether the rising and falling portions of the input sound signal are valid sections.
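The three-way labelling of claim 2 can be sketched as follows. Only the initial section determination and the final mean-level judgement are shown; the intermediate merging of short sections is omitted for brevity. The function names, label strings, and sample values are hypothetical.

```python
def label_sections(levels, first_value):
    """Return (start, end, label) runs: 'valid', 'invalid' (enclosed by valid), or 'undetermined' (at either end)."""
    runs = []
    for i, lv in enumerate(levels):
        above = lv >= first_value
        if runs and runs[-1][2] == above:
            runs[-1][1] = i + 1
        else:
            runs.append([i, i + 1, above])
    labelled = []
    for k, (start, end, above) in enumerate(runs):
        if above:
            label = "valid"
        elif 0 < k < len(runs) - 1:
            label = "invalid"       # below the threshold and enclosed by valid sections
        else:
            label = "undetermined"  # below the threshold at the rising or falling end
        labelled.append((start, end, label))
    return labelled


def resolve(levels, sections, second_value):
    """Final step: judge valid and undetermined sections by their mean level."""
    out = []
    for start, end, label in sections:
        if label != "invalid":
            mean = sum(levels[start:end]) / (end - start)
            label = "valid" if mean >= second_value else "invalid"
        out.append((start, end, label))
    return out


levels = [0.15, 0.18, 0.4, 0.5, 0.05, 0.4, 0.02, 0.01]
labelled = label_sections(levels, first_value=0.2)
resolved = resolve(levels, labelled, second_value=0.1)
```

Here the quiet rising portion at indices 0 and 1 starts out undetermined and is finally judged valid because its mean level is high enough, whereas the decayed tail at indices 6 and 7 is judged invalid.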

[0010] The sound signal analyzing device according to claim 3 is the sound signal analyzing device according to claim 1 or 2, further provided with extension means for extending the valid section, once the processing by the second invalid section conversion means has finished, using the average sound pressure level obtained by the calculating means and a second predetermined value smaller than the first predetermined value. In the sound signal analyzing device according to claim 1 or 2, the boundary between a valid section and an invalid section of the average sound pressure level is determined based on the first predetermined value, so the position of the boundary depends heavily on what value is chosen for the first predetermined value. Therefore, the valid section identified by the series of processes is extended using a second predetermined value smaller than the first predetermined value, so that the valid section in which a musical sound exists is identified accurately.
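One way such an extension might work is sketched below. The text does not spell out the extension rule, so the rule used here, growing each boundary outward while the neighbouring level stays at or above the smaller value, is an assumption, as are the function name `extend_section` and the sample values.

```python
def extend_section(levels, start, end, lower_value):
    """Grow the half-open section [start, end) outward while neighbouring levels stay at or above lower_value."""
    while start > 0 and levels[start - 1] >= lower_value:
        start -= 1
    while end < len(levels) and levels[end] >= lower_value:
        end += 1
    return start, end


levels = [0.01, 0.08, 0.15, 0.30, 0.35, 0.25, 0.12, 0.06, 0.01]
# With a first value of 0.2, the valid section covers indices 3..5, i.e. [3, 6).
extended = extend_section(levels, 3, 6, lower_value=0.05)
```

With the smaller value 0.05 the section grows from `(3, 6)` to `(1, 8)`, picking up the attack and decay that fell below the first predetermined value.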

[0011] The sound signal analyzing device according to claim 4 comprises: input means for inputting an arbitrary sound signal from outside; period reference detecting means for detecting a plurality of candidate positions serving as period references for the sound signal input from the input means; section detecting means for calculating, for the sound signal, the degree of coincidence between the waveforms of adjacent sections delimited by the candidate positions, and connecting sections with a high degree of coincidence to detect same-waveform sections; and steady section determining means for detecting a steady section based on the same-waveform sections detected by the section detecting means. The device of claim 4 detects candidate positions serving as period references on both the positive and negative sides of the sound signal, detects same-waveform sections based on those candidate positions, and superimposes the same-waveform sections of the positive and negative sides, so the error can be reduced even when the pitch or level of the sound signal fluctuates slightly on the positive or negative side. The same-timbre section determined in this way is then analyzed, based on abrupt changes in pitch and abrupt changes in sound pressure, to obtain the final steady section corresponding to one note. As a result, even when the pitch or level of the input sound from a microphone or the like fluctuates slightly, the steady part of the musical sound corresponding to one note, excluding the fluctuating part, can be analyzed.
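The detection of same-waveform sections can be illustrated with a short sketch. The choices made here are assumptions for illustration only: candidate positions are taken to be rising zero crossings, the degree of coincidence is a normalised correlation over the common length of two adjacent periods, and the threshold value is arbitrary. All function names are hypothetical.

```python
import math


def rising_zero_crossings(samples):
    """Candidate positions: indices where the signal crosses from negative to non-negative."""
    return [i for i in range(1, len(samples)) if samples[i - 1] < 0 <= samples[i]]


def coincidence(a, b):
    """Normalised correlation of two segments, compared over their common length."""
    n = min(len(a), len(b))
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm = math.sqrt(sum(x * x for x in a[:n]) * sum(y * y for y in b[:n]))
    return dot / norm if norm else 0.0


def same_waveform_sections(samples, positions, threshold):
    """Chain adjacent periods whose coincidence is at least `threshold`."""
    periods = list(zip(positions, positions[1:]))
    sections = []
    for (s0, e0), (s1, e1) in zip(periods, periods[1:]):
        if coincidence(samples[s0:e0], samples[s1:e1]) >= threshold:
            if sections and sections[-1][1] == e0:
                sections[-1][1] = e1      # extend the current chain
            else:
                sections.append([s0, e1])  # start a new chain
    return [tuple(s) for s in sections]


wave_a = [1, 2, 1, -1, -2, -1]   # one period of a first waveform shape
wave_b = [1, 1, 1, -3, -1, -1]   # one period of a different shape
samples = wave_a * 3 + wave_b * 3
positions = rising_zero_crossings(samples)
sections = same_waveform_sections(samples, positions, threshold=0.9)
```

Periods with matching shapes are chained together, and the chain breaks where the waveform shape changes, giving two same-waveform sections in this example.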

[0012] The sound signal analyzing device according to claim 5 comprises: input means for inputting an arbitrary sound signal from outside; first period reference detecting means for detecting a plurality of provisional candidate positions serving as period references for the sound signal input from the input means; frequency band detecting means for detecting the maximum frequency and the minimum frequency of the sound signal based on the provisional candidate positions detected by the first period reference detecting means; filter processing means for applying, to the sound signal input from the input means, band-pass filter processing whose cutoff frequencies are the maximum frequency and the minimum frequency detected by the frequency band detecting means; second period reference detecting means for detecting a plurality of candidate positions serving as period references for the sound signal output from the filter processing means; section detecting means for calculating, for the sound signal, the degree of coincidence between the waveforms of adjacent sections delimited by the candidate positions, and connecting sections with a high degree of coincidence to detect same-waveform sections; and steady section detecting means for detecting a steady section based on the same-waveform sections detected by the section detecting means. The device of claim 5 detects provisional candidate positions serving as period references from the input sound signal, detects the maximum and minimum frequencies of the sound signal based on those provisional candidate positions, and applies band-pass filter processing, with the detected maximum and minimum frequencies as cutoff frequencies, to the input sound signal again. This effectively removes the unnecessary low-frequency components and harmonic components that cause errors when same-waveform sections are detected. By applying the same processing as in claim 4 to the sound signal from which these unnecessary low-frequency and harmonic components have been removed, the steady section can be analyzed with higher accuracy.
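The frequency band detection step reduces to simple arithmetic on the spacing of the provisional candidate positions. The sketch below assumes that each pair of neighbouring positions spans one period; the function name `frequency_band`, the sample rate, and the positions are hypothetical, and the band-pass filter itself is left out.

```python
def frequency_band(positions, sample_rate):
    """Return (min_freq, max_freq) in Hz from the spacing of provisional candidate positions."""
    periods = [b - a for a, b in zip(positions, positions[1:]) if b > a]
    if not periods:
        raise ValueError("need at least two distinct candidate positions")
    freqs = [sample_rate / p for p in periods]
    return min(freqs), max(freqs)


# Provisional candidate positions (sample indices) with slightly varying spacing.
positions = [0, 100, 205, 300, 410]
low, high = frequency_band(positions, sample_rate=44100)
```

The longest spacing (110 samples) gives the minimum frequency and the shortest (95 samples) gives the maximum; these two values would then serve as the cutoff frequencies of whatever band-pass filter is applied.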

[0013] The sound signal analyzing device according to claim 6 comprises: input means for inputting an arbitrary sound signal from outside; valid section analyzing means for analyzing, from the sound signal input from the input means, a valid section in which a musical sound is considered to exist; period reference detecting means for detecting a plurality of candidate positions serving as period references for each of the positive and negative side portions of the sound signal constituting the valid section; section detecting means for calculating, for each of the positive and negative side portions of the sound signal, the degree of coincidence between the waveforms of adjacent sections delimited by the candidate positions, and connecting sections with a high degree of coincidence to detect same-waveform sections; timbre section determining means for defining the section formed by superimposing the same-waveform sections of the positive and negative sides detected by the section detecting means as a same-timbre section; and steady section determining means for detecting a steady section based on the same-timbre section determined by the timbre section determining means. The device of claim 6 detects candidate positions serving as period references on both the positive and negative sides of the sound signal, detects same-waveform sections based on those candidate positions, and superimposes the same-waveform sections of the positive and negative sides, so the error can be reduced even when the pitch or level of the sound signal fluctuates slightly on the positive or negative side. The same-timbre section determined in this way is then analyzed, based on abrupt changes in pitch and abrupt changes in sound pressure, to obtain the final steady section corresponding to one note. As a result, even when the pitch or level of the input sound from a microphone or the like fluctuates slightly, the steady section of the musical sound corresponding to one note, excluding the fluctuating part, can be analyzed.

[0014] The sound signal analyzing device according to claim 7 comprises: input means for inputting an arbitrary sound signal from outside; valid section analyzing means for analyzing, from the sound signal input from the input means, a valid section in which a musical sound is considered to exist; first period reference detecting means for detecting a plurality of provisional candidate positions serving as period references for the sound signal constituting the valid section; frequency band detecting means for detecting, based on the provisional candidate positions detected by the first period reference detecting means, the maximum frequency and the minimum frequency for the entire span of the sound signal or for the valid section; filter processing means for applying band-pass filter processing, whose cutoff frequencies are the maximum frequency and the minimum frequency detected by the frequency band detecting means, to the entire span of the sound signal input from the input means or to each valid section; second period reference detecting means for detecting a plurality of candidate positions serving as period references for the sound signal output from the filter processing means; section detecting means for calculating, for each of the positive and negative side portions of the sound signal, the degree of coincidence between the waveforms of adjacent sections delimited by the candidate positions, and connecting sections with a high degree of coincidence to detect same-waveform sections; and steady section determining means for detecting a steady section based on the same-waveform sections detected by the section detecting means. The device of claim 7 detects provisional candidate positions serving as period references from the input sound signal, detects the maximum and minimum frequencies of the sound signal based on those provisional candidate positions, and applies band-pass filter processing, with the detected maximum and minimum frequencies as cutoff frequencies, to the input sound signal again. This effectively removes the unnecessary low-frequency components and harmonic components that cause errors when same-waveform sections are detected. Performing the steady section analysis on the sound signal from which these unnecessary low-frequency and harmonic components have been removed makes a more accurate analysis possible. In the sound signal analyzing device according to claim 8, the valid section detecting means recited in claim 6 or 7 is constituted by the sound signal analyzing device according to claim 1, 2, or 3.

【００１５】請求項９に記載の演奏情報発生装置は、外
部から任意の音信号を入力するための入力手段と、前記
入力手段から入力する前記音信号の中から１つの音符に
相当する定常区間を分析する定常区間分析手段と、前記
定常区間分析手段によって分析された前記定常区間毎に
代表周波数を決定する周波数決定手段と、前記周波数決
定手段によって決定された前記定常区間の代表周波数に
基づいて前後する２つの定常区間同士の代表周波数の差
をセントを基準にした値に変換するセント値変換手段
と、このセント値変換手段によって変換されたセントを
基準にした値に基づいて前記２つの定常区間同士の相対
的な音高差データを算出する音高差算出手段と、前記音
高差算出手段によって算出された前記音高差データに基
づいて各定常区間に所定の音階上の音高を割り当てる音
高割当手段とを具えたものである。請求項９に記載の演
奏情報発生装置は、１つの音符に相当する定常区間毎に
その代表周波数を決定し、それぞれ相前後する定常区間
同士の代表周波数の差から算出されるセントを基準にし
た値に基づいて音高差データを算出し、この音高差デー
タに基づいて各定常区間に所定の音階上の音高を割り当
てている。すなわち、定常区間の代表周波数はその定常
区間を構成する複数波形の平均値であり、また、音高差
データは相前後する定常区間同士のセントを基準にした
値に基づいて算出されているので、マイク等からの入力
音のピッチが微妙にゆれた場合でも最終的に割り当てら
れる音階上の音高の中にその誤差成分を吸収することが
可能となる。The performance information generating device according to claim 9 comprises: input means for inputting an arbitrary sound signal from the outside; stationary section analyzing means for analyzing, within the sound signal input from the input means, stationary sections each corresponding to one note; frequency determining means for determining a representative frequency for each stationary section analyzed by the stationary section analyzing means; cent value conversion means for converting the difference between the representative frequencies of two consecutive stationary sections, as determined by the frequency determining means, into a value expressed in cents; pitch difference calculating means for calculating relative pitch difference data between the two stationary sections based on the cent value obtained by the cent value conversion means; and pitch assigning means for assigning a pitch on a predetermined scale to each stationary section based on the pitch difference data calculated by the pitch difference calculating means. The device according to claim 9 determines a representative frequency for each stationary section corresponding to one note, calculates pitch difference data from the cent value derived from the difference between the representative frequencies of consecutive stationary sections, and assigns a pitch on a predetermined scale to each stationary section based on that pitch difference data. Because the representative frequency of a stationary section is the average over the plural waveforms constituting that section, and because the pitch difference data is calculated from cent values between consecutive stationary sections, even if the pitch of the sound input from a microphone or the like fluctuates slightly, the error component can be absorbed into the pitch on the scale that is finally assigned.
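The cent conversion of claim 9 can be sketched as follows. The relation of 1200 cents per octave is the standard definition; rounding each interval to whole semitones is only one possible reading of "pitch difference data", and the function names and sample frequencies are hypothetical.

```python
import math


def cents_between(f_prev, f_next):
    """Interval from f_prev to f_next in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_next / f_prev)


def relative_semitones(representative_freqs):
    """Whole-semitone pitch difference between each pair of consecutive sections.

    Assumption: 100 cents per semitone (equal temperament), rounded to nearest.
    """
    return [round(cents_between(a, b) / 100.0)
            for a, b in zip(representative_freqs, representative_freqs[1:])]


# Slightly out-of-tune representative frequencies, for illustration only.
steps = relative_semitones([220.0, 247.5, 262.0, 219.0])
```

Here the small deviations from exact semitone ratios disappear in the rounding, which is the absorption of pitch fluctuation that the paragraph describes.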

【００１６】請求項１０に記載の演奏情報発生装置は、
外部から任意の音信号を入力するための入力手段と、前
記入力手段から入力する前記音信号の中から１つの音符
に相当する定常区間を分析する定常区間分析手段と、定
常区間分析手段によって分析された前記定常区間毎に代
表周波数を決定する周波数決定手段と、前記定常区間分
析手段によって分析された前記定常区間の複数を纏めて
１つのフレーズを検出するフレーズ検出手段と、前記フ
レーズ検出手段によって検出された１フレーズ内におけ
る当該定常区間よりも前に存在する全ての定常区間に対
して、その代表周波数の差をそれぞれセントを基準にし
た値に変換するセント値変換手段と、前記フレーズ検出
手段によって検出された１フレーズ内における当該定常
区間よりも前に存在する全ての定常区間に対する相対的
な時間距離に基づいた重みを算出する重み算出手段と、
このセント値変換手段によって変換されたセントを基準
にした値及び重み算出手段によって算出された重みに基
づいて前記２つの定常区間同士の相対的な音高差データ
を算出する音高差算出手段と、前記音高差算出手段によ
って算出された前記音高差データに基づいて各定常区間
に所定の音階上の音高を割り当てる音高割当手段とを具
えたものである。請求項１０に記載の演奏情報発生装置
は、複数の定常区間によって構成されるフレーズについ
て、請求項９に記載の演奏情報発生装置と同様に代表周
波数及びセントを基準にした値を決定し、フレーズ内の
前置音全てに対する時間距離に基づいた重み付けを行っ
て、音高差データを算出し、それに基づいて各定常区間
に所定の音階上の音高を割り当てている。これによっ
て、マイク等からの入力音のピッチが微妙にゆれた場合
でもフレーズを構成する定常区間の音に依存した音高割
当てを行うことができる。The performance information generating device according to claim 10 comprises: input means for inputting an arbitrary sound signal from the outside; stationary section analyzing means for analyzing, within the sound signal input from the input means, stationary sections each corresponding to one note; frequency determining means for determining a representative frequency for each stationary section analyzed by the stationary section analyzing means; phrase detecting means for detecting one phrase by grouping a plurality of the stationary sections analyzed by the stationary section analyzing means; cent value conversion means for converting, for a given stationary section within one phrase detected by the phrase detecting means, the differences between its representative frequency and those of all stationary sections that precede it in the phrase into values expressed in cents; weight calculating means for calculating weights based on the relative time distance from that stationary section to each of the preceding stationary sections in the phrase; pitch difference calculating means for calculating relative pitch difference data between stationary sections based on the cent values obtained by the cent value conversion means and the weights calculated by the weight calculating means; and pitch assigning means for assigning a pitch on a predetermined scale to each stationary section based on the pitch difference data calculated by the pitch difference calculating means. For a phrase composed of a plurality of stationary sections, the device according to claim 10 determines representative frequencies and cent values in the same manner as the device according to claim 9, applies weights based on the time distance to every preceding sound in the phrase, calculates pitch difference data, and assigns a pitch on a predetermined scale to each stationary section based on that data. As a result, even if the pitch of the sound input from a microphone or the like fluctuates slightly, pitches can be assigned in a way that depends on the sounds of the stationary sections making up the phrase.
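A minimal sketch of the weighted assignment of claim 10 follows. The text does not give the weighting formula at this point, so the inverse-time-distance weight used here is purely an assumption, as are the function name `assign_weighted`, the starting note number 60, and the sample data.

```python
import math


def assign_weighted(freqs, onsets, first_note=60):
    """Assign note numbers using every earlier note in the phrase as a reference.

    freqs:  representative frequency of each stationary section (Hz)
    onsets: start time of each section (seconds, strictly increasing)
    """
    notes = [first_note]
    for i in range(1, len(freqs)):
        total, weight_sum = 0.0, 0.0
        for j in range(i):
            # Interval from note j to note i in semitones (cents / 100).
            interval = 12.0 * math.log2(freqs[i] / freqs[j])
            # Hypothetical weight: nearer notes count more.
            weight = 1.0 / (onsets[i] - onsets[j])
            total += weight * (notes[j] + interval)
            weight_sum += weight
        notes.append(round(total / weight_sum))
    return notes


notes = assign_weighted([220.0, 247.5, 262.0, 219.0], [0.0, 0.5, 1.0, 1.5])
```

Each earlier note votes for where the current note should land, so a single slightly mistuned predecessor has less influence than it would if only the immediately preceding note were used.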

【００１７】請求項１１に記載の演奏情報発生装置は、
外部から任意の音信号を入力するための入力手段と、前
記入力手段から入力する前記音信号の中から１つの音符
に相当する定常区間を分析する定常区間分析手段と、定
常区間分析手段によって分析された前記定常区間毎に代
表周波数を決定する周波数決定手段と、前記定常区間分
析手段によって分析された前記定常区間の複数を纏めて
１つのフレーズを検出するフレーズ検出手段と、前記フ
レーズ検出手段によって検出された１フレーズ内の先頭
の定常区間の代表周波数に対する前記フレーズ内の他の
各定常区間の代表周波数の差をセントを基準にした値に
変換するセント値変換手段と、このセント値変換手段に
よって変換されたセントを基準にした値に基づいて前記
２つの定常区間同士の相対的な音高差データを算出する
音高差算出手段と、前記音高差算出手段によって算出さ
れた前記音高差データに基づいて各定常区間に所定の音
階上の音高を割り当てる音高割当手段とを具えたもので
ある。請求項１１に記載の演奏情報発生装置は、複数の
定常区間によって構成されるフレーズについて、請求項
９に記載の演奏情報発生装置と同様に代表周波数を決定
し、フレーズ内の先頭音に対してセントを基準にした値
及び音高差データを算出し、それに基づいて各定常区間
に所定の音階上の音高を割り当てている。これによっ
て、マイク等からの入力音のピッチが微妙にゆれた場合
でもフレーズ先頭音に依存した音高割当てを行うことが
できる。The performance information generating device according to claim 11 comprises: input means for inputting an arbitrary sound signal from the outside; stationary section analyzing means for analyzing, within the sound signal input from the input means, stationary sections each corresponding to one note; frequency determining means for determining a representative frequency for each stationary section analyzed by the stationary section analyzing means; phrase detecting means for detecting one phrase by grouping a plurality of the stationary sections analyzed by the stationary section analyzing means; cent value conversion means for converting the difference between the representative frequency of the first stationary section in one phrase detected by the phrase detecting means and the representative frequency of each of the other stationary sections in the phrase into a value expressed in cents; pitch difference calculating means for calculating relative pitch difference data between the two stationary sections based on the cent value obtained by the cent value conversion means; and pitch assigning means for assigning a pitch on a predetermined scale to each stationary section based on the pitch difference data calculated by the pitch difference calculating means. For a phrase composed of a plurality of stationary sections, the device according to claim 11 determines representative frequencies in the same manner as the device according to claim 9, calculates the cent value and the pitch difference data relative to the first sound of the phrase, and assigns a pitch on a predetermined scale to each stationary section based on that data. As a result, even if the pitch of the sound input from a microphone or the like fluctuates slightly, pitches can be assigned in a way that depends on the first sound of the phrase.
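The head-of-phrase reference of claim 11 can be sketched in a few lines. The starting note number 60 and the rounding to the nearest semitone are assumptions made for illustration; the function name is hypothetical.

```python
import math


def assign_from_head(freqs, head_note=60):
    """Assign note numbers by measuring every section against the first one.

    Each interval is taken from the first representative frequency of the
    phrase and rounded to the nearest semitone (assumed equal temperament).
    """
    head = freqs[0]
    return [head_note + round(12.0 * math.log2(f / head)) for f in freqs]


head_based = assign_from_head([220.0, 247.5, 262.0, 219.0])
```

Because every interval is measured from the same reference, rounding errors do not accumulate from note to note along the phrase.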

【００１８】請求項１２に記載の演奏情報発生装置は、
請求項９、１０又は１１に記載の前記音高割当手段を、
各定常区間に所定の音階上の音高を割り当てる際に、最
初の定常区間に所定の音高を割り当ててから、順番に残
りの定常区間に所定の音階上の音高を割り当てるように
構成したものである。請求項１３に記載の演奏情報発生
装置は、請求項９、１０又は１１に記載の前記音高割当
手段を、各定常区間に所定の音階上の音高を割り当てる
際に、最初の定常区間の音信号を分析してその定常区間
の平均周波数を検出し、検出された平均周波数に基づい
た音高を最初の定常区間の音高として割り当ててから、
残りの定常区間に順番に所定の音階上の音高を割り当て
るように構成したものである。請求項１４に記載の演奏
情報発生装置は、請求項９、１０又は１１に記載の前記
音高割当手段を、各定常区間に複数の音階上の音高をノ
ート位置をずらしながらそれぞれ割り当ててみて、各音
階の各ノート位置におけるノート割当誤差の累計値を算
出し、その累計値に応じて最適な音階を決定し、決定さ
れた音階上の音高をその定常区間の音高として順番に割
り当てるようにしたものである。請求項１５に記載の演
奏情報発生装置は、請求項９、１０又は１１に記載の前
記音高割当手段を、前記決定された音階上の音高をその
定常区間の音高として順番に割り当てる際にノート許容
誤差範囲の値に応じて音階外の音高を割り当てるように
構成したものである。[0018] In the performance information generating device according to claim 12, the pitch assigning means of claim 9, 10, or 11 is configured so that, when assigning pitches on a predetermined scale to the stationary sections, it first assigns a predetermined pitch to the first stationary section and then assigns pitches on the predetermined scale to the remaining stationary sections in order. In the device according to claim 13, the pitch assigning means of claim 9, 10, or 11 is configured so that, when assigning pitches on a predetermined scale to the stationary sections, it analyzes the sound signal of the first stationary section to detect the average frequency of that section, assigns a pitch based on the detected average frequency as the pitch of the first stationary section, and then assigns pitches on the predetermined scale to the remaining stationary sections in order. In the device according to claim 14, the pitch assigning means of claim 9, 10, or 11 tentatively assigns pitches on each of a plurality of scales to the stationary sections while shifting the note position, calculates the cumulative note assignment error for each note position of each scale, determines the optimum scale according to the cumulative values, and assigns pitches on the determined scale as the pitches of the stationary sections in order. In the device according to claim 15, the pitch assigning means of claim 9, 10, or 11 is configured so that, when assigning pitches on the determined scale as the pitches of the stationary sections in order, it assigns pitches outside the scale depending on the value of the note tolerance range.
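The scale search of claim 14 can be sketched as a brute-force fit. The two scale patterns, the use of fractional semitone values as input, and the absolute-difference error measure are all assumptions made for illustration; the claim itself only states that the errors are accumulated per scale and per note position.

```python
SCALES = {
    "major": (0, 2, 4, 5, 7, 9, 11),
    "natural_minor": (0, 2, 3, 5, 7, 8, 10),
}


def nearest_scale_tone(pitch, root, pattern):
    """Return the scale tone (in semitones) closest to a fractional pitch."""
    base = int(pitch // 12) * 12
    candidates = [octave + root + step
                  for octave in (base - 12, base, base + 12)
                  for step in pattern]
    return min(candidates, key=lambda tone: abs(tone - pitch))


def best_scale(pitches):
    """Try every scale at every root; keep the smallest accumulated error."""
    best = None
    for name, pattern in SCALES.items():
        for root in range(12):
            tones = [nearest_scale_tone(p, root, pattern) for p in pitches]
            error = sum(abs(t - p) for t, p in zip(tones, pitches))
            if best is None or error < best[0]:
                best = (error, name, root, tones)
    return best


error, name, root, tones = best_scale(
    [60.1, 62.0, 63.9, 65.1, 67.0, 69.0, 70.9, 72.0])
```

Relative keys share the same tones and therefore tie on error; this sketch simply keeps the first one found.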

【００１９】[0019]

【発明の実施の形態】以下、この発明の実施の形態を添
付図面に従って詳細に説明する。図２はこの発明に係る
楽音情報分析装置及び演奏情報発生装置を内蔵した電子
楽器の構成を示すハードブロック図である。電子楽器
は、マイクロプロセッサユニット（ＣＰＵ）１、プログ
ラムメモリ２及びワーキングメモリ３からなるマイクロ
コンピュータによって制御される。ＣＰＵ１は、この電
子楽器全体の動作を制御するものである。このＣＰＵ１
に対して、データ及びアドレスバス１Ｅを介してプログ
ラムメモリ２、ワーキングメモリ３、演奏データメモリ
４、押鍵検出回路５、マイクインターフェイス６、スイ
ッチ検出回路７、表示回路８及び音源回路９がそれぞれ
接続されている。Embodiments of the present invention are described in detail below with reference to the accompanying drawings. FIG. 2 is a hardware block diagram showing the configuration of an electronic musical instrument incorporating the musical sound information analyzing device and the performance information generating device according to the present invention. The electronic musical instrument is controlled by a microcomputer consisting of a microprocessor unit (CPU) 1, a program memory 2, and a working memory 3. The CPU 1 controls the operation of the entire electronic musical instrument. The program memory 2, the working memory 3, a performance data memory 4, a key press detection circuit 5, a microphone interface 6, a switch detection circuit 7, a display circuit 8, and a tone generator circuit 9 are each connected to the CPU 1 via a data and address bus 1E.

【００２０】プログラムメモリ２はＣＰＵ１の各種プロ
グラム（システムプログラムや動作プログラムなど）、
各種データ等を格納するものであり、リードオンリーメ
モリ（ＲＯＭ）で構成されている。ワーキングメモリ３
は、演奏情報やＣＰＵ１がプログラムを実行する際に発
生する各種データを一時的に記憶するものであり、ラン
ダムアクセスメモリ（ＲＡＭ）の所定のアドレス領域が
それぞれ割り当てられ、レジスタ、フラグ、バッファ、
テーブル等などとして利用される。演奏データメモリ４
は、マイク等からの入力音に基づいて発生された演奏情
報（ＭＩＤＩデータ）などを記憶するものである。ま
た、ＣＰＵ１には、図示していないが、ハードディスク
装置などを接続して、そこに自動演奏データやコード進
行データ等の各種データを記憶していてもよく、更に、
前記動作プログラムを記憶するようにしてもよい。ま
た、前記ＲＯＭ２に動作プログラムを記憶せずに、ハー
ドディスク装置にこれらの動作プログラムを記憶させて
おき、それをＲＡＭ３に読み込むことにより、ＲＯＭ２
に動作プログラムを記憶したときと同様の動作をＣＰＵ
１に行わせることができる。このようにすると、動作プ
ログラムの追加やバージョンアップ等が容易に行える。
着脱自在な外部記憶媒体の１つとして、ＣＤ−ＲＯＭを
使用してもよい。このＣＤ−ＲＯＭには、各種データ及
び任意の動作プログラムを記憶していてもよい。ＣＤ−
ＲＯＭに記憶されている動作プログラムや各種データ
は、ＣＤ−ＲＯＭドライブ（図示せず）によって、読み
出され、ハードディスク装置に転送記憶させることがで
きる。これにより、動作プログラムの新規のインストー
ルやバージョンアップを容易に行うことができる。The program memory 2 stores various programs for the CPU 1 (system programs, operation programs, and the like) and various data, and is constituted by a read-only memory (ROM). The working memory 3 temporarily stores performance information and various data generated while the CPU 1 executes a program; predetermined address areas of a random access memory (RAM) are allocated to it and used as registers, flags, buffers, tables, and so on. The performance data memory 4 stores performance information (MIDI data) and the like generated from sound input through a microphone or the like. Although not shown, a hard disk device or the like may be connected to the CPU 1 to store various data such as automatic performance data and chord progression data, and it may also store the operation programs. Alternatively, instead of storing the operation programs in the ROM 2, they may be stored in the hard disk device and loaded into the RAM 3, so that the CPU 1 operates in the same way as when the operation programs are stored in the ROM 2. This makes it easy to add or upgrade operation programs. A CD-ROM may be used as one type of removable external storage medium. The CD-ROM may store various data and arbitrary operation programs. The operation programs and data stored on the CD-ROM can be read by a CD-ROM drive (not shown) and transferred to and stored in the hard disk device. This makes it easy to install new operation programs or upgrade existing ones.

【００２１】なお、図示していないが、通信インターフ
ェイスをデータ及びアドレスバス１Ｅに接続し、この通
信インターフェイスを介してＬＡＮ（ローカルエリアネ
ットワーク）やインターネットなどの種々の通信ネット
ワーク上に接続可能とし、他のサーバコンピュータとの
間でデータのやりとりを行うようにしてもよい。これに
より、ハードディスク装置内に動作プログラムや各種デ
ータが記憶されていないような場合には、サーバコンピ
ュータからその動作プログラムや各種データをダウンロ
ードすることができる。この場合、クライアントとなる
楽音生成装置である電子楽器から、通信インターフェイ
ス及び通信ネットワークを介してサーバコンピュータに
動作プログラムや各種データのダウンロードを要求する
コマンドを送信する。サーバコンピュータは、このコマ
ンドに応じて、所定の動作プログラムやデータを、通信
ネットワークを介して電子楽器１に送信する。電子楽器
では、通信インターフェイスを介してこれらの動作プロ
グラムやデータを受信して、ハードディスク装置にこれ
らを蓄積する。これによって、動作プログラム及び各種
データのダウンロードが完了する。Although not shown, a communication interface may be connected to the data and address bus 1E so that the instrument can connect to various communication networks, such as a LAN (local area network) or the Internet, and exchange data with a server computer. In that case, when the operation programs or data are not stored in the hard disk device, they can be downloaded from the server computer. The electronic musical instrument, which is the musical tone generating device acting as a client, sends a command requesting download of the operation programs or data to the server computer via the communication interface and the communication network. In response to this command, the server computer sends the specified operation programs and data to the electronic musical instrument via the communication network. The electronic musical instrument receives them through the communication interface and stores them in the hard disk device, which completes the download of the operation programs and data.

【００２２】鍵盤１０は、発音すべき楽音の音高を選択
するための複数の鍵を備えており、各鍵に対応してキー
スイッチを有しており、また必要に応じて押鍵速度検出
装置や押圧力検出装置等のタッチ検出手段を有してい
る。押鍵検出回路５は、発生すべき楽音の音高を指定す
る鍵盤１０のそれぞれの鍵に対応して設けられた複数の
キースイッチからなる回路を含んで構成されており、新
たな鍵が押圧されたときはキーオンイベントを出力し、
鍵が新たに離鍵されたときはキーオフイベントを出力す
る。また、鍵押し下げ時の押鍵操作速度又は押圧力等を
判別してタッチデータを生成する処理を行い、生成した
タッチデータをベロシティデータとして出力する。この
ようにキーオン、キーオフイベント及びベロシティなど
のデータはＭＩＤＩ規格に準拠したデータ（以下「ＭＩ
ＤＩデータ」とする）で表現されておりキーコードと割
当てチャンネルを示すデータも含んでいる。マイクロフ
ォン１Ａは、音声信号や楽器音を電圧信号に変換して、
マイクインターフェイス６に出力する。マイクインター
フェイス６は、マイクロフォン１Ａからのアナログの電
圧信号をディジタル信号に変換してデータ及びアドレス
バス１Ｅを介してＣＰＵ１に出力する。The keyboard 10 has a plurality of keys for selecting the pitch of the musical tone to be sounded, has a key switch for each key, and, where necessary, has touch detection means such as a key press velocity detecting device or a pressing force detecting device. The key press detection circuit 5 includes a circuit made up of a plurality of key switches provided for the respective keys of the keyboard 10, which specify the pitch of the musical tone to be generated. It outputs a key-on event when a key is newly pressed and a key-off event when a key is newly released. It also generates touch data by determining the key press velocity, the pressing force, or the like when a key is depressed, and outputs the generated touch data as velocity data. The key-on and key-off events, the velocity, and similar data are expressed as data conforming to the MIDI standard (hereinafter "MIDI data") and also include the key code and data indicating the assigned channel. The microphone 1A converts a voice signal or instrument sound into a voltage signal and outputs it to the microphone interface 6. The microphone interface 6 converts the analog voltage signal from the microphone 1A into a digital signal and outputs it to the CPU 1 via the data and address bus 1E.
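For reference, the key-on and key-off events mentioned above correspond to the note-on and note-off channel messages of the MIDI standard, each three bytes long. The sketch below builds such messages; the helper names are hypothetical, and nothing here is taken from the circuit described in the text.

```python
def _check(channel, key, velocity):
    """Validate the ranges allowed by the MIDI channel voice messages."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not (0 <= key <= 127 and 0 <= velocity <= 127):
        raise ValueError("key and velocity must be 0-127")


def note_on(channel, key, velocity):
    """Build a 3-byte note-on message: status 0x9n, key code, velocity."""
    _check(channel, key, velocity)
    return bytes([0x90 | channel, key, velocity])


def note_off(channel, key, velocity=0):
    """Build a 3-byte note-off message: status 0x8n, key code, velocity."""
    _check(channel, key, velocity)
    return bytes([0x80 | channel, key, velocity])


key_on_message = note_on(0, 60, 100)   # middle C on the first channel
key_off_message = note_off(0, 60)
```

The low four bits of the status byte carry the assigned channel, and the second byte carries the key code, which matches the data items the paragraph lists.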

【００２３】テンキー＆各種スイッチ１Ｂは、数値デー
タ入力用のテンキーや文字データ入力用のキーボード、
音符化処理（音信号分析処理及び演奏情報発生処理）の
スタート／ストップスイッチなどの各種の操作子を含ん
で構成される。なお、この他にも音高、音色、効果等を
選択・設定・制御するための各種操作子を含むが、その
詳細については公知なので説明を省略する。スイッチ検
出回路７は、テンキー＆各種スイッチ１Ｂの各操作子の
操作状態を検出し、その操作状態に応じたスイッチ情報
をデータ及びアドレスバス１Ｅを介してＣＰＵ１に出力
する。表示回路８はＣＰＵ１の制御状態、設定データの
内容等の各種の情報をディスプレイ１Ｃに表示するもの
である。ディスプレイ１Ｃは液晶表示パネル（ＬＣＤ）
やＣＲＴ等から構成され、表示回路８によってその表示
動作を制御されるようになっている。このテンキー＆各
種スイッチ１Ｂ、並びにディスプレイ１ＣによってＧＵ
Ｉ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃ
ｅ）が構成される。The numeric keypad and various switches 1B include a numeric keypad for entering numerical data, a keyboard for entering character data, and various operators such as a start/stop switch for the note conversion process (the sound signal analysis process and the performance information generation process). Other operators for selecting, setting, and controlling pitch, timbre, effects, and the like are also included, but they are well known and their description is omitted. The switch detection circuit 7 detects the operation state of each operator of the numeric keypad and various switches 1B and outputs switch information corresponding to that state to the CPU 1 via the data and address bus 1E. The display circuit 8 displays various information, such as the control state of the CPU 1 and the contents of setting data, on the display 1C. The display 1C consists of a liquid crystal display panel (LCD), a CRT, or the like, and its display operation is controlled by the display circuit 8. The numeric keypad and various switches 1B together with the display 1C constitute a GUI (Graphical User Interface).

【００２４】音源回路９は、複数チャンネルで楽音信号
の同時発生が可能であり、データ及びアドレスバス１Ｅ
を経由して与えられた楽音トラック上のＭＩＤＩデータ
を入力し、このデータに基づいた楽音信号を生成し、そ
れをサウンドシステム１Ｄに出力する。音源回路９にお
いて複数チャンネルで楽音信号を同時に発音させる構成
としては、１つの回路を時分割で使用することによって
複数の発音チャンネルを形成するようなものや、１つの
発音チャンネルが１つの回路で構成されるような形式の
ものであってもよい。また、音源回路９における楽音信
号発生方式はいかなるものを用いてもよい。例えば、発
生すべき楽音の音高に対応して変化するアドレスデータ
に応じて波形メモリに記憶した楽音波形サンプル値デー
タを順次読み出すメモリ読み出し方式（波形メモリ方
式）、又は上記アドレスデータを位相角パラメータデー
タとして所定の周波数変調演算を実行して楽音波形サン
プル値データを求めるＦＭ方式、あるいは上記アドレス
データを位相角パラメータデータとして所定の振幅変調
演算を実行して楽音波形サンプル値データを求めるＡＭ
方式等の公知の方式を適宜採用してもよい。また、これ
らの方式以外にも、自然楽器の発音原理を模したアルゴ
リズムにより楽音波形を合成する物理モデル方式、基本
波に複数の高調波を加算することで楽音波形を合成する
高調波合成方式、特定のスペクトル分布を有するフォル
マント波形を用いて楽音波形を合成するフォルマント合
成方式、ＶＣＯ、ＶＣＦ及びＶＣＡを用いたアナログシ
ンセサイザ方式等を採用してもよい。また、専用のハー
ドウェアを用いて音源回路を構成するものに限らず、Ｄ
ＳＰとマイクロプログラムを用いて音源回路を構成する
ようにしてもよいし、ＣＰＵとソフトウェアのプログラ
ムで音源回路を構成するようにしてもよい。音源回路９
から発生された楽音信号は、アンプ及びスピーカからな
るサウンドシステム１Ｄを介して発音される。The tone generator circuit 9 can generate musical tone signals on a plurality of channels simultaneously. It receives MIDI data on a musical tone track supplied via the data and address bus 1E, generates a musical tone signal based on that data, and outputs it to the sound system 1D. To sound musical tone signals on a plurality of channels simultaneously, the tone generator circuit 9 may form the plurality of sounding channels by using one circuit in a time-division manner, or each sounding channel may be constituted by its own circuit. Any musical tone signal generation method may be used in the tone generator circuit 9. For example, known methods may be adopted as appropriate, such as a memory readout method (waveform memory method) that sequentially reads musical tone waveform sample value data stored in a waveform memory in accordance with address data that changes according to the pitch of the musical tone to be generated, an FM method that obtains musical tone waveform sample value data by executing a predetermined frequency modulation operation using the address data as phase angle parameter data, or an AM method that obtains musical tone waveform sample value data by executing a predetermined amplitude modulation operation using the address data as phase angle parameter data. Besides these, a physical model method that synthesizes a musical tone waveform with an algorithm simulating the sounding principle of an acoustic instrument, a harmonic synthesis method that synthesizes a musical tone waveform by adding a plurality of harmonics to a fundamental wave, a formant synthesis method that synthesizes a musical tone waveform using a formant waveform having a specific spectral distribution, an analog synthesizer method using a VCO, VCF, and VCA, and the like may also be adopted. The tone generator circuit is not limited to dedicated hardware; it may be implemented with a DSP and a microprogram, or with a CPU and a software program. The musical tone signal generated by the tone generator circuit 9 is sounded through the sound system 1D, which consists of an amplifier and a speaker.
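Of the generation methods listed above, the waveform memory method is the simplest to illustrate. The sketch below stores one cycle of a sine wave and reads it with a phase accumulator whose step depends on the requested pitch; the table size, the sine contents, and the function name are assumptions, and no interpolation is performed.

```python
import math

TABLE_SIZE = 1024
# One stored cycle of the waveform (a sine wave is assumed for illustration).
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def wavetable_tone(frequency_hz, num_samples, sample_rate=44100):
    """Read the stored waveform at an address that advances with the pitch."""
    step = frequency_hz * TABLE_SIZE / sample_rate  # table entries per output sample
    phase = 0.0
    out = []
    for _ in range(num_samples):
        out.append(SINE_TABLE[int(phase) % TABLE_SIZE])  # truncating lookup
        phase += step
    return out


samples = wavetable_tone(441.0, 100)  # one full cycle at 44.1 kHz
```

A higher pitch gives a larger address step, so the same stored cycle is traversed faster, which is the "address data that changes according to the pitch" in the description.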

【００２５】次に、この発明に係る電子楽器が音信号分
析装置及び演奏情報発生装置として動作する場合の一例
を説明する。図１は図２の電子楽器が演奏情報発生装置
として動作する際のメインフローを示す図である。メイ
ンフローは次のようなステップで順番に実行される。ステップ１１：まず、初期設定処理を行い、図２のワー
キングメモリ３内の各レジスタ及びフラグなどに初期値
を設定したりする。このとき、テンキー＆各種スイッチ
１Ｂ上の音符化処理スタートスイッチがオン操作された
場合に、ステップ１２〜ステップ１７までの一連の処理
を行う。Next, an example in which the electronic musical instrument according to the present invention operates as a sound signal analyzing device and a performance information generating device is described. FIG. 1 shows the main flow when the electronic musical instrument of FIG. 2 operates as a performance information generating device. The main flow is executed in order through the following steps.
Step 11: First, initialization is performed, for example setting initial values in the registers and flags in the working memory 3 of FIG. 2. If the note conversion start switch among the numeric keypad and various switches 1B is turned on at this time, the series of processes from step 12 to step 17 is performed.

【００２６】ステップ１２：このステップは音符化処理
スタートスイッチのオン操作有りと判定された場合に行
われるものであり、ここでは、そのオン操作に対応し
て、マイクインターフェイス６を介してマイクロフォン
１Ａから入力される音声信号や楽器音の電圧波形を所定
周期（例えば４４．１ｋＨｚ）でサンプリング処理し、
それをディジタルサンプル信号としてワーキングメモリ
３内の所内領域に記憶する。このサンプリング処理は従
来の公知の方法で行うので、ここでは詳細は省略する。
ステップ１３からステップ１５までが音符化処理スター
トスイッチのオン操作に対応した音符化処理である。こ
の音符化処理ではサンプリングされた音声信号や楽器音
のディジタルサンプル信号を種々分析してそれを音高列
すなわち楽譜表示可能なＭＩＤＩデータに変換する。ステップ１３：ステップ１２の音声サンプリング処理の
結果得られたディジタルサンプル信号に基づいて音楽的
な音が存在する区間すなわち有効区間がどこにあるのか
を検出する有効区間検出処理を行う。この有効区間検出
処理の詳細については後述する。ステップ１４：ステップ１３の有効区間検出処理の結
果、検出された各有効区間内に存在する音楽的な音の定
常部分を検出する定常区間検出処理を行う。この定常区
間検出処理の詳細についは後述する。ステップ１５：ステップ１３及びステップ１４の処理の
結果、得られた各定常区間毎に最も最適な音符を割り当
てる音高列決定処理を行う。すなわち、このステップで
はＭＩＤＩデータを発生する。この音高列決定処理の詳
細については後述する。ステップ１６：ステップ１５の処理によって発生された
ＭＩＤＩデータに基づいて楽譜を作成する楽譜作成処理
を行う。この楽譜作成処理は従来の技術によって容易に
実現可能なので詳細は省略する。ステップ１７：ステップ１５の処理によって発生された
ＭＩＤＩデータに基づいた自動演奏処理を行う。この自
動演奏処理についても従来の技術によって容易に実現可
能なので詳細は省略する。Step 12: This step is performed when it is determined that the note conversion start switch has been turned on. In response to that operation, the voltage waveform of the voice signal or instrument sound input from the microphone 1A via the microphone interface 6 is sampled at a predetermined rate (for example, 44.1 kHz) and stored as a digital sample signal in a predetermined area of the working memory 3. The sampling is performed by a conventionally known method, so its details are omitted here. Steps 13 to 15 constitute the note conversion process that corresponds to the turning on of the note conversion start switch. In the note conversion process, the digital sample signal of the sampled voice signal or instrument sound is analyzed in various ways and converted into a pitch sequence, that is, MIDI data that can be displayed as a musical score.
Step 13: Based on the digital sample signal obtained by the sampling in step 12, an effective section detection process is performed to detect where the sections containing musical sound, that is, the effective sections, are located. The details of this process are described later.
Step 14: A stationary section detection process is performed to detect the stationary portions of the musical sound within each effective section detected in step 13. The details of this process are described later.
Step 15: A pitch sequence determination process is performed to assign the most suitable note to each stationary section obtained through steps 13 and 14. In other words, MIDI data is generated in this step. The details of this process are described later.
Step 16: A score creation process is performed to create a musical score from the MIDI data generated in step 15. This process can easily be realized with conventional techniques, so its details are omitted.
Step 17: An automatic performance process is performed based on the MIDI data generated in step 15. This process can also easily be realized with conventional techniques, so its details are omitted.

【００２７】図３は図１のステップ１３の有効区間検出
処理の詳細を示す図である。以下、ステップ１２によっ
て求められたディジタルサンプル信号からどのようにし
て有効区間が検出されるのか、この有効区間検出処理の
動作を図９及び図１０を用いて説明する。ステップ３１：ステップ１２によって求められたディジ
タルサンプル信号に基づいて平均音圧レベルを算出す
る。図９は、サンプリング周波数４４．１ｋＨｚでサン
プリングされた音声信号すなわちディジタルサンプル信
号の波形値の一例を示す図である。図９では、約２０ポ
イント分の波形値が示されている。ステップ３１では、
所定のサンプル数（例えば、１０ｍｓｅｃ相当の時間に
対応するサンプル数）にわたるサンプル振幅値の平均を
求め、それを平均音圧レベルとする。従って、サンプリ
ング周期４４．１ｋＨｚの場合においては、この所定サ
ンプル数は『４４１』であり、あるサンプルポイントの
平均値は、そのポイントを最終とする１０ｍｓｅｃ分前
の各ポイントの合計値、すなわちそのポイントから４４
１ポイント分前の波形値の合計を４４１で除した値とな
る。なお、０ポイントから４４０ポイントまでは、４４
１ポイント分の波形値が存在しないので、０ポイントか
らその該当ポイントまでの波形値の平均をそのポイント
の平均値とする。こうして、時系列的な平均音圧レベル
情報が各サンプルタイミング毎に得られる。図９では、
説明の便宜上１５ポイント分の波形値の平均値を平均音
圧レベルとする場合を図示している。従って、最初の１
５ポイントまではそれまでの波形値の合計値をそのポイ
ント数で除する形になっている。また、波形値の合計
は、その絶対値を合計することによって求める。図１０
（Ａ）はこのようにして求められた平均音圧レベルの値
を、サンプリングポイントを横軸とした場合をグラフ化
して示したものである。以下、この平均音圧レベルを平
均レベルカーブと称する。なお、図９のように１５ポイ
ント毎に平均音圧レベルを求める場合には、カットオフ
周波数１０Ｈｚ程度のローパスフィルタを掛けて、レベ
ル変動を滑らかにしている。従って、実際に４４１ポイ
ント分の波形値の平均を取る場合には、カットオフ周波
数８０〜１００Ｈｚ程度のローパスフィルタを掛けて、
そのレベル変動を滑らかにするのが望ましい。また、こ
こでは、あるサンプリングポイントの平均値を求めるの
に、そのポイントより前の所定数のポイントの波形値を
合計して平均音圧レベルを求める場合について説明した
が、あるサンプリングポイントを中心として前後に所定
数のポイントの波形値を合計してもよいし、サンプリン
グポイントから後に所定数のポイントの波形値を合計し
てもよい。FIG. 3 shows the details of the effective section detection process of step 13 in FIG. 1. How effective sections are detected from the digital sample signal obtained in step 12 is explained below with reference to FIGS. 9 and 10.
Step 31: The average sound pressure level is calculated from the digital sample signal obtained in step 12. FIG. 9 shows an example of the waveform values of a voice signal sampled at a sampling frequency of 44.1 kHz, that is, of a digital sample signal; waveform values for about 20 points are shown. In step 31, the average of the sample amplitude values over a predetermined number of samples (for example, the number of samples corresponding to 10 msec) is obtained and used as the average sound pressure level. At a sampling frequency of 44.1 kHz this predetermined number of samples is 441, so the average value at a given sample point is the sum of the waveform values over the 10 msec ending at that point, that is, the 441 points ending at that point, divided by 441. For points 0 through 440, waveform values for 441 points are not available, so the average of the waveform values from point 0 up to the point in question is used as the average value at that point. In this way, time-series average sound pressure level information is obtained at every sample timing. For convenience of explanation, FIG. 9 illustrates the case where the average of the waveform values over 15 points is used as the average sound pressure level; accordingly, up to the first 15 points, the sum of the waveform values so far is divided by the number of points so far. The sum of the waveform values is obtained by summing their absolute values. FIG. 10(A) is a graph of the average sound pressure level values obtained in this way, with the sampling point on the horizontal axis. This curve of the average sound pressure level is hereinafter called the average level curve. When the average sound pressure level is obtained over 15 points as in FIG. 9, a low-pass filter with a cutoff frequency of about 10 Hz is applied to smooth the level fluctuation. Accordingly, when the waveform values are actually averaged over 441 points, it is desirable to apply a low-pass filter with a cutoff frequency of about 80 to 100 Hz to smooth the level fluctuation. Although the case described here obtains the average value at a sampling point by summing the waveform values of a predetermined number of points before that point, the waveform values of a predetermined number of points centered on the sampling point, or of a predetermined number of points after it, may be summed instead.
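The averaging of step 31 can be sketched as a trailing moving average of absolute sample values. The function name and the tiny test signal are hypothetical, and the subsequent low-pass smoothing mentioned in the text is not included.

```python
def average_level_curve(samples, window=441):
    """Mean absolute amplitude over the `window` samples ending at each point.

    Near the start, where fewer than `window` samples exist, the mean is
    taken over the samples available so far, as the description states.
    """
    levels = []
    running = 0.0
    for i, value in enumerate(samples):
        running += abs(value)
        if i >= window:
            running -= abs(samples[i - window])  # drop the sample that left the window
        count = min(i + 1, window)
        levels.append(running / count)
    return levels


# A window of 2 is used here only to keep the example small.
levels = average_level_curve([4, -4, 2, -2, 0, 0], window=2)
```

With the default `window=441`, the function corresponds to the 10 msec window at 44.1 kHz given in the text.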

【００２８】ステップ３２：前記ステップ３１で求めら
れた図１０のような平均レベルカーブを、所定のしきい
値に基づいて有効区間又は無効区間にそれぞれ分類する
処理を行う。この処理では、しきい値として、その平均
レベルカーブの中の最大波形値の２０パーセントの値を
しきい値とする。これ以外の値をしいき値としてもよい
ことは言うまでもない。例えば、平均レベルカーブの平
均値をしきい値としたり、又はその平均値の８０パーセ
ントをしきい値としたり、平均レベルカーブの最大値の
半分の値をしきい値としたりしてもよい。しきい値は図
１０（Ｂ）のような点線で示される。従って、この点線
（しきい値）と平均レベルカーブとの交点位置が有効区
間及び無効区間の境界となり、この点線（しきい値）よ
りも大きい区間が有効区間となり、小さい区間が無効区
間となる。図１０（Ｂ）では、有効区間を○印で示し、
無効区間を×印で示す。Step 32: The average level curve obtained in step 31, as shown in FIG. 10, is classified into effective sections and invalid sections based on a predetermined threshold. In this process, 20 percent of the maximum value in the average level curve is used as the threshold. Other values may of course be used as the threshold. For example, the mean of the average level curve, 80 percent of that mean, or half of the maximum value of the average level curve may be used. The threshold is shown as a dotted line in FIG. 10(B). The points where this dotted line (the threshold) intersects the average level curve are the boundaries between effective and invalid sections: sections above the dotted line are effective sections, and sections below it are invalid sections. In FIG. 10(B), effective sections are marked with circles and invalid sections with crosses.
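The classification of step 32 can be sketched as a run-length split of the level curve. The `[start, end, is_effective]` representation (with `end` exclusive) and the treatment of values exactly equal to the threshold as invalid are assumptions made for this illustration.

```python
def classify_sections(levels, ratio=0.2):
    """Split the level curve into [start, end, is_effective] runs (end exclusive).

    The threshold is `ratio` times the maximum of the curve, 20 percent by
    default as in the description.
    """
    if not levels:
        return []
    threshold = ratio * max(levels)
    sections = []
    for i, level in enumerate(levels):
        effective = level > threshold
        if sections and sections[-1][2] == effective:
            sections[-1][1] = i + 1          # extend the current run
        else:
            sections.append([i, i + 1, effective])
    return sections


sections = classify_sections([0, 1, 5, 10, 6, 1, 0, 4, 8, 1])
```

Swapping in one of the alternative thresholds mentioned in the text only requires changing how `threshold` is computed.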

[0029] Step 33: Taking 0.05 msec as the minimum length needed for a human to perceive pitch, any invalid section determined in step 32 that is shorter than this minimum length is changed to a valid section. For example, at a sampling frequency of 44.1 kHz, invalid sections of 2205 samples or fewer are changed to valid sections. In FIG. 10(B), the third and fifth invalid sections from the left are such short invalid sections. As a result of step 33, FIG. 10(B) becomes FIG. 10(C), and the valid sections appear extended. In this process, the invalid sections at the beginning and end of the whole range also qualify as short invalid sections, but they are treated as special regions that are not changed to valid sections merely because they are short, and they are marked with triangles (△).
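A minimal sketch of the gap filling in step 33 follows. The function name `fill_short_gaps` and the boolean-list representation are assumptions made for illustration. The minimum length is passed as a sample count; the text's example of 2205 samples at 44.1 kHz corresponds to 2205 / 44100 = 0.05 s, so the sketch takes the sample count as given rather than deriving it from the stated duration.

```python
def fill_short_gaps(flags, min_len):
    """Turn invalid runs of at most min_len samples into valid ones.

    Runs touching the start or end of the whole range are left untouched,
    mirroring the special triangle regions described in the text.
    """
    out = list(flags)
    n = len(out)
    i = 0
    while i < n:
        if out[i]:
            i += 1
            continue
        j = i
        while j < n and not out[j]:    # find the end of this invalid run
            j += 1
        interior = i > 0 and j < n     # bounded by valid sections on both sides
        if interior and (j - i) <= min_len:
            out[i:j] = [True] * (j - i)
        i = j
    return out
```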

[0030] Step 34: From the pattern of valid and invalid sections obtained in step 33, short valid sections of 0.05 msec or less are changed to invalid sections. This is done in the same way as step 33. In FIG. 10(C), the rightmost valid section is such a short valid section. As a result of step 34, FIG. 10(C) becomes FIG. 10(D). As FIG. 10(D) shows, there are four valid sections in all, the first through the fourth. The triangle (△) region at the end of the range is regarded as the fourth valid section.
Step 35: As a final check of the valid sections, the mean of the average level curve is computed for each valid section identified in step 34, and a section whose mean is smaller than a predetermined value is made an invalid section. The mean is obtained by dividing the sum of the average sound pressure levels of all points in the valid section by the length of the section. The means obtained in this way are shown below each section in FIG. 10(D): 60 for the first section, 25 for the second, 45 for the third, and 15 for the fourth. If this mean falls below 30 percent of the maximum waveform value of the section, the section is made an invalid section. Here the second and fourth sections qualify, so both become invalid sections. FIG. 10(E) shows the valid and invalid sections identified by the valid section check of step 35.
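The final check of step 35 can be sketched as below. The function name `check_sections` is hypothetical, and the sketch assumes the 30 percent reference is supplied by the caller as `reference_max`, because the text does not spell out exactly which maximum is meant. Step 34 is not shown; it mirrors step 33 with the roles of valid and invalid swapped.

```python
def check_sections(levels, sections, reference_max, ratio=0.30):
    """Keep only the sections whose mean level reaches ratio * reference_max.

    sections is a list of (start, end) index pairs with end exclusive.
    """
    kept = []
    for start, end in sections:
        mean = sum(levels[start:end]) / (end - start)
        if mean >= ratio * reference_max:
            kept.append((start, end))
    return kept
```

Using section means of 60, 25, 45, and 15 with a reference maximum of 100 (an assumed value), the second and fourth sections are dropped, which matches the outcome described for FIG. 10(E).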

[0031] Step 36: The valid sections identified by steps 31 to 35 are extended. For example, as shown in FIG. 10(F), 15 percent of the maximum waveform value is taken as the extension permission level, a line is drawn at that level, and the boundaries of each valid section are extended out to that line. That is, moving outward from each end of a valid section, the rise and fall of the average level curve is checked, along with whether the curve has dropped below the extension permission level. When the curve turns from falling to rising, or drops below the extension permission level, the valid section ends there. FIG. 10(G) shows another example of this extension. Here the extension permission level is 5 percent of the maximum waveform value, and the end of the valid section is placed where the average level curve stops falling. Alternatively, the end may be placed where the curve starts rising. With this extension, the first and third sections are extended further than in FIG. 10(F). In this way, the valid sections that a human can perceive as pitch are finally determined. When the extension permission level is low and valid sections are close together, the extended end of one valid section may come close to, or coincide with, the extended start of the next. The boundary position also depends on whether the point where the fall ends or the point where the rise begins is used as the delimiter. If valid sections overlap as a result of the extension, the midpoint between the two positions may be used as the boundary. FIGS. 10(F) and 10(G) describe extending the valid section in both directions, but it may be extended only forward or only backward. When extending in both directions, different extension permission levels may be used for the forward and backward directions.
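The outward extension of step 36 can be sketched as follows. The function name `extend_section` is hypothetical, and the sketch makes two assumptions: a section is an index pair with an exclusive end, and extension stops at the first point that either rises again or lies below the extension permission level. Resolving overlaps between neighboring sections is not shown.

```python
def extend_section(levels, start, end, permit_level):
    """Extend the section [start, end) outward along the average level curve.

    A boundary moves outward while the curve keeps falling (or stays flat)
    away from the section and remains at or above permit_level.
    """
    n = len(levels)
    # Extend toward later samples.
    while end < n and levels[end] >= permit_level and levels[end] <= levels[end - 1]:
        end += 1
    # Extend toward earlier samples.
    while start > 0 and levels[start - 1] >= permit_level and levels[start - 1] <= levels[start]:
        start -= 1
    return start, end
```

In the test data used with this sketch, lowering the permission level from 15 percent to 5 percent of the maximum widens the section by one point on each side, which is the kind of difference the text describes between FIG. 10(F) and FIG. 10(G).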

[0032] FIG. 4 shows the details of the stationary section detection of step 14 in FIG. 1. How stationary sections are detected within the valid sections obtained in step 13 is described in detail with reference to FIGS. 11 to 18. When analyzing musical audio signals such as voices and musical tones, it is important to know where the stationary part is. For timbres other than rhythm sounds, the pitch is determined by the periodicity of the stationary part, and the note value is determined with the stationary part as its skeleton. In this embodiment, the stationary part is a section corresponding to a single note when written as a score; detecting it means finding, on the time axis, the section that a human perceives as a single sound, by focusing on changes in the three major elements of sound: timbre, pitch, and velocity. The stationary section detection is described below following the steps of FIG. 4.

[0033] Step 41: For every valid section obtained by the valid section detection of FIG. 3, the positions that serve as the reference for one period are detected. Two broad approaches are generally used to detect period reference positions: zero-crossing detection or peak detection. With zero-crossing detection, detection is difficult unless overtones are removed as far as possible with a filter or the like, and band division is also required. With peak detection it is also desirable to remove overtones as far as possible, but the requirement is less severe than for zero-crossing detection: it is enough to apply a band-pass filter whose cutoff frequencies are the frequency range that the voice or instrument can produce, and no band division or similar processing is needed. Peak detection is therefore the simpler procedure, gives reasonable results, and is preferable. Accordingly, this embodiment describes detecting the period reference positions by peak detection. First, before peak detection, overtones are removed by filtering: a band-pass filter is applied with the producible range as its cutoff frequencies. For voice, the range a human can produce is about 80 to 1000 Hz, and about this much is needed to analyze any user without restriction. When the users are restricted, however, narrowing the producible range to some extent reduces errors caused by overtones and improves accuracy. For a guitar the range is about 80 to 700 Hz, and here too accuracy improves if the pitch range is fixed in advance. Setting the differences between instruments in advance also improves accuracy. As shown in FIG. 11, the peak positions of the musical tone waveform in the valid section are detected. A known technique is used: the peak level of the waveform is detected and held by a circuit with a predetermined time constant, this peak level serves as a threshold hold voltage, the next time the waveform reaches or exceeds the threshold hold voltage that level is held as the next peak level, and the procedure is repeated. Peak positions such as those in FIG. 11(B) are detected this way. FIG. 11(A) shows the threshold hold voltage during this detection. From the voice waveform of FIG. 11(A), the peak positions of FIG. 11(B) are detected. Peak positions P1, P3, P5, P7, PA, PC, and PE all appear regularly at a fixed period, but peak position P9 appears irregularly because of a slight disturbance in the voice waveform. In the steps below, such irregular peak positions are corrected and stationary sections are detected from the correct peak positions.
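The peak-hold idea of step 41 can be imitated in software as sketched below. This is an assumption-laden illustration, not the known circuit the text refers to: the function name `detect_peaks`, the per-sample exponential `decay` standing in for the time constant, and the restriction to the positive side of the waveform are all choices made here. The band-pass filtering that precedes peak detection is not shown.

```python
import math


def detect_peaks(samples, decay=0.999):
    """Return indices of positive peaks found with a decaying hold level.

    The hold level decays a little at every sample. While the waveform stays
    above it, the hold follows the waveform; the last sample before the
    waveform falls below the hold is reported as a peak. The decay must be
    slower than the waveform's own fall near a peak.
    """
    peaks = []
    hold = 0.0
    candidate = None
    rising = False
    for i, x in enumerate(samples):
        hold *= decay
        if x > hold:
            hold = x
            candidate = i
            rising = True
        elif rising:
            peaks.append(candidate)
            rising = False
    return peaks
```

For a pure sine wave with a period of 100 samples, this reports one peak per period, at the crest of each cycle.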

[0034] Step 42: Based on the period reference positions detected in step 41, two sections are compared: a basic section starting at a given peak reference position, and the section from the end of the basic section to the next peak reference position (hereinafter, the moving section). As shown in FIG. 11(B), taking peak reference position P7, the first basic section is section 79, from P7 to the next peak reference position P9. Section 79, however, is no longer than the band minimum length, so it is extended to section 7A, from P7 to the next peak reference position PA. Section 7A is longer than the band minimum length and shorter than the band maximum length, so it is adopted. The moving section is then section AC, from peak reference position PA to peak reference position PC. The error rate between section 7A and section AC is calculated; if it is at or above a predetermined value, the two sections are judged not to match, and the moving section is lengthened. The error rate between the lengthened moving section and the basic section is then calculated; if it is still at or above the predetermined value, the current basic section is judged to have no match, and the basic section is lengthened. The basic section is thus extended to section 7C, from P7 to peak reference position PC. Section 7C, however, is longer than the band maximum length, so the comparison is abandoned and no match is found for this section. If instead the comparison of section 7A and section AC gives an error rate smaller than the predetermined value (for example, 10), the two sections are judged to match, and the same processing is performed for section 3, starting at the next peak reference position PA, and its following moving section. The calculation of the error rate is described later. The working memory (RAM) has data areas in which the peak reference position information, the corresponding error rate, and a match flag are written. In the case above, when section 7A and section AC match, the peak reference position information P7, the error rate, and a match flag indicating a match are written to the data areas. When section 2 and section 3 do not match, only a mismatch flag is written.
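The way a basic section is chosen in step 42 can be sketched as follows. The function name `pick_basic_section` is hypothetical. The sketch assumes that the band minimum and maximum lengths are the periods, in samples, of the highest and lowest frequencies of the band-pass range (for example about 44 and 551 samples for 1000 Hz and 80 Hz at 44.1 kHz); the text does not define them explicitly. The comparison with the moving section and the bookkeeping in working memory are not shown.

```python
def pick_basic_section(peaks, start_idx, min_len, max_len):
    """Grow the basic section peak by peak until its length is plausible.

    peaks holds peak reference positions in samples. Returns (start, end)
    or None when the section would exceed max_len before exceeding min_len.
    """
    start = peaks[start_idx]
    for j in range(start_idx + 1, len(peaks)):
        length = peaks[j] - start
        if length > max_len:
            return None              # too long: give up on this starting peak
        if length > min_len:
            return (start, peaks[j])
        # otherwise the section is too short (like section 79): extend it
    return None
```

Here a spurious peak 30 samples after the starting peak is skipped, in the same way that section 79 is extended to section 7A.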

[0035] FIG. 12 illustrates how the error rate is calculated in this waveform comparison. Suppose the two waveforms whose error rate is to be calculated are comparison wave 1X and comparison wave 2X, as shown in FIG. 12. First, the amplitudes of comparison waves 1X and 2X are normalized so that the maximum amplitude becomes 100 percent. Because comparison wave 2X is shorter than comparison wave 1X along the time axis (horizontal axis), comparison wave 2X is stretched to the same time width as comparison wave 1X. Comparison wave 1X thus becomes comparison wave 1Y, and comparison wave 2X becomes comparison wave 2Y and then, after stretching along the time axis, comparison wave 2Z. The error rate is calculated between comparison wave 1Y and comparison wave 2Z. FIG. 13 shows a concrete example of this calculation, for the first period of comparison waves 1Y and 2Z, that is, 24 samples. The difference between comparison waves 1Y and 2Z is calculated at each sampling position, and the absolute values of the differences are summed. In FIG. 13 the sum of the absolute values is 122. Dividing this by the number of samples, 24, gives the error rate, which here is 5. With a predetermined value of 10, this error rate is 10 or less, so the two are treated as the same waveform. In FIG. 13, each waveform is normalized with 1000 as the maximum level. As a result of this waveform comparison, peak reference position P9 in FIG. 11(B) is cancelled, and regular peak positions are detected as shown in FIG. 14(A).
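The error rate calculation can be sketched as follows. The helper names `normalize`, `stretch`, and `error_rate` are hypothetical, linear interpolation is an assumed choice for the time-axis stretch (the text does not name a method), and the normalization peak is a parameter because the text mentions both 100 percent and a maximum level of 1000. The sketch assumes neither waveform is all zeros.

```python
def normalize(wave, peak=100.0):
    """Scale the waveform so that its largest magnitude equals peak."""
    m = max(abs(v) for v in wave)
    return [v * peak / m for v in wave]


def stretch(wave, length):
    """Resample the waveform to length points by linear interpolation."""
    if length == 1:
        return [float(wave[0])]
    scale = (len(wave) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        k = int(pos)
        frac = pos - k
        nxt = wave[min(k + 1, len(wave) - 1)]
        out.append(wave[k] * (1 - frac) + nxt * frac)
    return out


def error_rate(base, moving, peak=100.0):
    """Mean absolute difference after amplitude and time normalization."""
    a = normalize(base, peak)
    b = stretch(normalize(moving, peak), len(a))
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

Two waveforms with the same shape but different amplitude give an error rate of 0. The figure's own arithmetic, 122 / 24, is about 5.08, which the text reports as 5.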

[0036] Step 43: Using the results of the waveform comparison of step 42, sections whose error rate is smaller than the predetermined value (for example, 10) are joined into provisional matching sections; the maximum and minimum values among them are detected, and the cutoff frequency band is determined from them. For example, suppose that among the matching sections obtained by the waveform comparison the minimum value is 235 points and the maximum value is 365 points. To leave some margin, the minimum is reduced by 10 percent and the maximum increased by 10 percent, giving a range of about 212 points to about 402 points. At a sampling frequency of 44.1 kHz this corresponds to audio frequencies from 110 Hz to 208 Hz. The cutoff frequency band is therefore set to 110 Hz to 208 Hz.
Step 44: Using the new cutoff frequencies determined in step 43, the same processing as steps 41 and 42 is repeated. In the case above, with the cutoff band set to 110 Hz to 208 Hz, the period reference position detection of step 41 and the waveform comparison of step 42 are repeated to detect sections in which the same waveform continues (matching sections). This removes the low-frequency and harmonic components that cause errors, allows more accurate processing, and yields more accurate matching sections than the previous pass. Through this same-waveform section detection of step 44, a voice waveform such as that of FIG. 11(A) is divided into three sections X, Y, and Z as in FIG. 14(B); that is, the continuity between section X and section Z is broken by section Y. In FIG. 14(B), the positive and negative sides of the voice waveform break at similar places. For a voice waveform such as that of FIG. 15(A), the fundamental component is somewhat weak in the fourth and fifth periods, so the positive peaks are not recognized well and peak positions such as those in FIG. 15(B) are detected. Applying steps 41 to 44 to the voice waveform of FIG. 15(A) therefore breaks the continuity only on the positive side, between peak positions P5 and PB, as in FIG. 15(C). If this is a stationary section, however, the fundamental component always appears clearly on either the positive or the negative side, so for the voice waveform of FIG. 15(A) the negative side is recognized regularly as a same-waveform section. If the error rate in the waveform comparison of step 42 is within the allowable range, the stationary section is the range indicated by the arrow on the negative side in FIG. 15(C), and the break on the positive side is recognized as an error. Since with voice waveforms the fundamental component can quite often be disturbed on either the positive or the negative side, step 46 overlays the stationary parts as a countermeasure.
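The arithmetic of step 43 can be checked with a short sketch. The function name `cutoff_band` and its parameters are hypothetical, and rounding the widened periods to whole points is an assumption chosen to reproduce the "about 212" and "about 402" points in the text.

```python
import math


def cutoff_band(min_period, max_period, sample_rate=44100, margin_percent=10):
    """Convert a range of period lengths (in samples) into a frequency band.

    The shortest period is shrunk and the longest period is widened by the
    margin, then each is converted to a frequency as sample_rate / period.
    """
    shortest = math.floor(min_period * (100 - margin_percent) / 100 + 0.5)
    longest = math.floor(max_period * (100 + margin_percent) / 100 + 0.5)
    return sample_rate / longest, sample_rate / shortest
```

For the example in the text, `cutoff_band(235, 365)` gives roughly 109.7 Hz and 208.0 Hz, which agrees with the stated band of 110 Hz to 208 Hz.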

[0037] Step 45: When the processing up to step 44 has identified the section marked by the double-headed arrow in FIG. 16(A) as a same-waveform section, that section is extended. The waveforms at the start (marked ○) and end (marked ×) of the same-waveform section are each taken as basic waveforms, the sections on either side are taken as moving sections, the error rate is obtained by the same method as the waveform comparison of step 42, and the same-waveform section is extended. By setting the error rate threshold higher than in step 42 (for example, an error rate of about 15), the same-waveform section can be extended to the positions shown by the dotted arrows in FIG. 16(B). If the extended section would overlap the neighboring section, however, the extension stops at that point. The same-waveform sections extended in this way are the stationary sections of the positive side and of the negative side of the voice waveform.
Step 46: Steps 41 to 45 are performed on both the positive and the negative side of the voice waveform, so the stationary sections obtained independently on the two sides are overlaid. For example, suppose that after step 45 the stationary sections on the positive and negative sides are the ranges shown by the arrows in FIG. 17(A). Overlaying the positive-side and negative-side stationary sections gives the hatched rectangles of FIG. 17(B) as the final stationary sections. In this case, the five stationary sections on each of the positive and negative sides become four stationary sections after the overlay of step 46.
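One possible reading of the overlay in step 46 is a union of the positive-side and negative-side sections, so that a break on one side is covered by the other side. The text does not state the exact rule, so the sketch below is an assumption; the function name `overlay` and the (start, end) tuple representation are also hypothetical.

```python
def overlay(plus, minus):
    """Merge two lists of (start, end) sections into their union."""
    merged = []
    for start, end in sorted(plus + minus):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous section: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]
```

With five sections on each side, where one break exists only on the positive side, the union yields four sections, similar to the count described for FIG. 17(B).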

[0038] Step 47: The stationary sections obtained by the processing up to step 46 are further subdivided according to changes in pitch and sound pressure. Because the stationary section detection up to step 46 stretches waveforms before comparing them, it treats even a pitch change within a continuous vowel such as "aah" as one and the same sound. For instrument waveforms, this means that pitch changes in sustained instrument sounds can go undetected. In this embodiment, therefore, the pitch change within each stationary section obtained up to step 46 is examined to decide whether the section needs to be divided further, and if so, the stationary section is divided more finely. For example, the length between period reference positions within a stationary section is calculated, and dividing the sampling frequency by that length gives the frequency at that period reference position. FIG. 18(A) shows the frequency of each waveform making up a stationary section, together with the note distance, which expresses the difference between the frequency of a section and that of the preceding section on a linear axis corresponding to notes. The note distance is calculated as

$$\text{note distance} = \frac{\log(\text{compared frequency} / \text{reference frequency})}{\log\sqrt[12]{2}}$$

When the note distance is within ±0.5, the pitch is regarded as not having changed abruptly; when it is larger, the pitch is regarded as having changed abruptly, and the stationary section is split there. In FIG. 18(A), for example, the 10th to 12th note distances in the stationary section all exceed 0.5, so the pitch is regarded as having changed abruptly there, and the stationary section is split into one section from the 1st to the 9th and another from the 13th to the 24th. This amounts to detecting the note sections when the pitch changes while the timbre stays the same.
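The note distance and the resulting split can be sketched as follows. The function names `note_distance` and `split_on_pitch` are hypothetical, the frequencies in the usage example are invented for illustration, and dropping the transition periods from both resulting sections is an assumption based on the 1st to 9th and 13th to 24th example in the text.

```python
import math


def note_distance(freq, prev_freq):
    """Distance in semitones between two frequencies."""
    return math.log(freq / prev_freq) / math.log(2 ** (1 / 12))


def split_on_pitch(freqs, limit=0.5):
    """Split a run of per-period frequencies where the pitch jumps.

    Returns (first, last) index pairs, 0-based and inclusive. Periods whose
    note distance from the previous period exceeds limit belong to no section.
    """
    sections, current = [], []
    for i, f in enumerate(freqs):
        jumped = i > 0 and abs(note_distance(f, freqs[i - 1])) > limit
        if jumped:
            if current:
                sections.append(current)
            current = []
        else:
            current.append(i)
    if current:
        sections.append(current)
    return [(s[0], s[-1]) for s in sections]
```

With 24 periods in which the 10th to 12th each jump by more than half a semitone, the result is (0, 8) and (12, 23), that is, the 1st to 9th and the 13th to 24th periods.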

[0039] The same applies to sound pressure as to pitch, so positions where the sound pressure level changes abruptly are detected and the stationary section is divided further there. FIG. 18(B) shows the average sound pressure level of each waveform making up a stationary section, together with the amplification ratio between the average sound pressure level of a section and that of the preceding section. The amplification ratio is obtained as log(average sound pressure level of the preceding section / average sound pressure level of the section). When the amplification ratio is within ±0.01, the sound pressure is regarded as not having changed abruptly; when it is larger, the sound pressure is regarded as having changed abruptly, and the stationary section is split there. In FIG. 18(B), for example, the 16th and 17th values in the stationary section exceed 0.01, so that part is regarded as an abrupt change in sound pressure level, and the section is divided into three, as shown by the initial divided sections. However, a human needs about 0.01 to 0.1 seconds to perceive a sound, so a minimum section length is set accordingly, and any section no longer than the minimum is joined to the following section. The second of the initial divided sections in FIG. 18(B) (marked ○) is therefore joined to the third, and the result is the two sections shown to the right of the initial divided sections.
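The amplification ratio and the joining of short sections can be sketched as below. The function names are hypothetical, the base-10 logarithm is an assumption (the text only writes "log"), and section boundaries are assumed to be contiguous (start, end) pairs with an exclusive end, measured in periods or samples as the caller prefers.

```python
import math


def amplification_ratio(prev_level, level):
    """log10 of the previous section's average level over this section's."""
    return math.log10(prev_level / level)


def merge_short(sections, min_len):
    """Join every section no longer than min_len to the section after it.

    A short final section has no successor and is kept as it is.
    """
    out = []
    pending = None  # start of a short section waiting to be joined
    for k, (start, end) in enumerate(sections):
        if pending is not None:
            start, pending = pending, None
        is_last = k == len(sections) - 1
        if end - start <= min_len and not is_last:
            pending = start
        else:
            out.append((start, end))
    return out
```

Three initial sections whose middle one is too short collapse into two, with the short one absorbed by its successor, as in the FIG. 18(B) example.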

[0040] FIG. 19 illustrates the concept of the stationary section detection of FIG. 4. In FIG. 19, the valid sections are the result of the valid section detection of FIG. 3. The same-waveform sections are the sections obtained by steps 41 to 46. The same-pitch sections are obtained by dividing the same-waveform sections further at abrupt pitch changes in step 47. The same-sound-pressure sections are obtained by dividing the same-pitch sections further at abrupt changes in sound pressure. Although dividing the stationary sections (same-waveform sections) by both note distance and amplification ratio has been described, either one of the division results may be used alone, or both may be used. When both are used, the adjustment by minimum section length may be applied as described above. Alternatively, one of the two divisions may be given priority, with the other performed only when the first produces no division.

[0041] FIG. 5 shows the details of the pitch sequence determination of step 15 in FIG. 1. An optimal pitch sequence is determined for the stationary sections detected in step 14. In this embodiment, four kinds of pitch sequence determination are described with reference to FIGS. 20 to 25. When voices or musical tones are finally converted to note information, the melody changes greatly depending on which pitch a given frequency is rounded to, and detection often fails to give the intended result. In this embodiment, therefore, pitches are determined mainly on a relative basis, and the key is then used to select the most suitable pitch transition, thereby determining the pitch sequence. First, pitch sequence determination 1, the first example of pitch sequence determination, is described following the flowchart of FIG. 5.
Step 51: A representative frequency is determined for each stationary section obtained by the valid section detection of step 13 (FIG. 3) and the stationary section detection of step 14 (FIG. 4). FIG. 20(A) shows an example of the stationary sections finally obtained. Here a total of 12 sections are assumed to have been detected, each labeled with a section number in brackets.

〔０〕〜〔１２〕の区間番号を割り当ててある。各定常
区間の代表周波数を決定する場合に重要なことは、各定
常区間の周期位置から周波数の動向を洗い出して、その
区間固有の周波数を１つに決定することである。そのた
めの方法として、第１の方法は定常区間全体の平均周波
数をその区間の代表周波数とする。第２の方法は定常区
間の丁度中間付近の周期（周波数）をその区間の代表周
波数とする。第３の方法はピッチが安定している部分の
平均周波数をその区間の代表周波数とする。なお、この
実施の形態では、図１のステップ１４の定常区間検出処
理の際に算出した誤差率を用いて代表周波数の算出処理
を行う。すなわち、図４のステップ４５の定常区間拡張
処理前の同波形区間検出処理を利用して、その中で所定
値（例えば１０）以下の誤差率の並んだ安定した定常区
間における周波数の平均を算出し、それをその定常区間
の代表周波数とする。例えば、図４のステップ４４の同
波形区間検出処理で、隣接する区間の誤差率が１０以下
の場合を波形が一致すると判断して、定常区間を検出し
たとする。この場合（定常区間拡張処理前）の定常区間
を構成する各波形区間の情報が図２０（Ｂ）のようにな
っていたとする。すなわち、図２０（Ｂ）に示すように
定常区間は誤差率１０以下の１２個の波形区間で構成さ
れる。各波形区間の周期長及び誤差率は図示の通りであ
る。この場合、この定常区間における周期長の平均値
は、２５５．８３３となる。ここで、周期長はサンプリ
ング数で表されているので、サンプリング周波数が４
４．１ｋＨｚだから、この定常区間の代表周波数は、そ
の周期長の平均値でサンプリング周波数を除することに
よって得られ、図２０（Ｂ）の場合には１７２．３８Ｈ
ｚとなる。この場合、代表周波数の値は小数点２桁を有
効として扱う。図２０（Ｃ）はこのようにして図２０
（Ａ）のような各定常区間の代表周波数を算出した結果
を示す図である。Section numbers [0] to [12] are assigned. What is important when determining the representative frequency of each stationary section is to identify the trend of the frequency from the periodic position of each stationary section and determine one unique frequency for that section. As a method for this, the first method uses the average frequency of the entire stationary section as the representative frequency of the section. In the second method, a cycle (frequency) near the middle of a stationary section is set as a representative frequency of the section. In the third method, the average frequency of the portion where the pitch is stable is set as the representative frequency of the section. In this embodiment, the representative frequency is calculated using the error rate calculated at the time of the stationary section detection processing in step 14 of FIG. That is, using the same waveform section detection processing before the steady section expansion processing of step 45 in FIG. 4, the average of the frequencies in the stable steady sections in which the error rates are equal to or less than a predetermined value (for example, 10) is calculated. And set it as the representative frequency of the stationary section. For example, it is assumed that in the same waveform section detection processing in step 44 of FIG. 4, when the error rate of the adjacent section is 10 or less, it is determined that the waveforms match, and a steady section is detected. In this case, it is assumed that the information of each waveform section constituting the steady section in this case (before the steady section extension processing) is as shown in FIG. That is, as shown in FIG. 20B, the stationary section is composed of 12 waveform sections having an error rate of 10 or less. The cycle length and error rate of each waveform section are as shown. In this case, the average value of the cycle length in this steady section is 255.833. 
Here, since the cycle length is represented by the number of samples, the sampling frequency is 4
Since the frequency is 4.1 kHz, the representative frequency of this stationary section is obtained by dividing the sampling frequency by the average value of the period length. In the case of FIG.
z. In this case, the value of the representative frequency treats two decimal places as valid. FIG. 20 (C) thus shows FIG.
It is a figure showing the result of having computed the representative frequency of each steady section like (A).
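The representative-frequency calculation of step 51 can be illustrated with a short sketch. The individual period lengths of FIG. 20B are not reproduced in the text, so the sample values below are hypothetical and were chosen only so that their mean is 255.833; the function name and the `(period_length, error_rate)` pair layout are likewise assumptions.

```python
def representative_frequency(waveforms, sampling_rate=44100.0, max_error=10):
    """Mean-period estimate of a steady section's representative frequency.

    waveforms : list of (period_length_in_samples, error_rate) pairs
    Only waveforms whose error rate is max_error or less are averaged,
    and the result is kept to two decimal places as in the description.
    """
    stable = [period for period, error in waveforms if error <= max_error]
    mean_period = sum(stable) / len(stable)
    return round(sampling_rate / mean_period, 2)

# Hypothetical data: 12 stable waveform sections whose mean period is 255.833
# samples, plus one unstable waveform that the error-rate test excludes.
example = [(256, 3)] * 10 + [(255, 5)] * 2 + [(300, 25)]
print(representative_frequency(example))
```

Dividing 44100 by the mean period of 255.833 samples reproduces the 172.38 Hz quoted for FIG. 20B.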

[0042] Step 52: Once the representative frequency of each steady section has been determined by the processing of step 51, the note distance between each pair of consecutive steady sections is determined from those representative frequencies. The note distance is obtained with the same arithmetic expression as that used in step 47 of FIG. 4. FIG. 20C shows an example of the note distances calculated in this way.
Step 53: Each calculated note distance is rounded off at the first decimal place, which rounds it to a pitch on the 12-tone scale. For example, in the case of FIG. 20C, each note distance is rounded to the value shown in the right-hand column. Because this value represents the difference in note number from the preceding pitch, the pitch sequence data can be completed by fixing the first pitch. The pitch sequence data shown in the rightmost column of FIG. 20C represents the pitch transitions when the first pitch is taken as 0. That is, in the case of FIG. 20C, the sequence is 0-2-4-5-2-3...
Step 54: The pitch of the first note is determined. The simplest method is to assign note number 60 (note name C4) to the first note as a default value. That is, since note numbers in the MIDI standard are limited to the range 0 to 127, note number 60 (note name C4) is assigned as the pitch of the first note. This allows the pitch to swing by 67 semitones on the high (plus) side and by 60 semitones on the low (minus) side. The pitch sequence data in the rightmost column of FIG. 20C then becomes 60 (C4) - 62 (D4) - 64 (E4) - 65 (F4) - 62 (D4) - 63 (D♯4)...
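Steps 52 to 54 can be sketched as below. The expression of step 47 is not reproduced in this passage, so the usual semitone formula 12·log2(f_next / f_prev) is used here as an assumption. The distances for sections [3] and [4] are also hypothetical placeholders (only 1.7158, 2.1557, and 1.1093 appear in the text), and all function names are illustrative.

```python
import math

def note_distance(f_prev, f_next):
    """Signed distance in semitones between two frequencies (assumed formula)."""
    return 12.0 * math.log2(f_next / f_prev)

def round_half_up(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def pitch_sequence(distances, first_note=60):
    """Accumulate rounded note distances, starting from the first note."""
    notes = [first_note]
    for d in distances:
        notes.append(notes[-1] + round_half_up(d))
    return notes

# Distances for sections [1], [2], [5] are from the text; [3], [4] are assumed.
distances = [1.7158, 2.1557, 1.0, -3.0, 1.1093]
print(pitch_sequence(distances, first_note=0))   # relative sequence
print(pitch_sequence(distances, first_note=60))  # MIDI note numbers from C4
```

Keeping the first note as a parameter separates the relative melody (steps 52 and 53) from the choice of absolute register (step 54).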

[0043] Step 55: The pitch sequence data determined in step 54 is corrected. That is, the swing range of the pitch sequence data determined in step 54 is examined, and if it swings to −60 or below on the low (minus) side, the default value 60 is corrected to fit the minimum swing. The correction shifts the default value upward so that the lowest note is 0 or higher. For example, if the minimum swing is −64, the default value 60 is shifted upward by 4 notes in accordance with the calculation −60 − (−64) = 4, and 64 is assigned as the first note. If the swing is +67 or more on the high (plus) side, the default value 60 can be corrected in the same way to fit the maximum swing. Judging from the range of the human voice, the swing cannot exceed the limits on both the low (minus) side and the high (plus) side at once, so that case is excluded. Where it could occur, the range may be specially set to 0 to 256. Step 54 was described for the case where the pitch of the first note is set to a default value (for example, 60) to create the pitch sequence data, but this is not a limitation: the just-intonation scale frequency closest to the representative frequency of the first steady section may be detected and the first note fitted to that scale degree. For example, in the case of FIG. 20C, the representative frequency of section number [0] is 172.38 Hz, so the pitch of the first note is set to the closest note number, 53 (note name F3). The pitch sequence data of FIG. 20C then becomes 53 (F3) - 55 (G3) - 57 (A3) - 58 (A♯3) - 55 (G3) - 56 (G♯3)...
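The two ways of fixing the first note can be sketched as follows. Both functions are illustrative and their names are hypothetical. The nearest-note lookup assumes equal temperament with A4 = 440 Hz at note number 69, which is a simplification of the just-intonation matching mentioned in the text.

```python
import math

def corrected_first_note(offsets, default=60, low=0, high=127):
    """Shift the default first note so that the whole sequence fits low..high.

    offsets : pitch offsets of every note relative to the first note
    """
    if max(offsets) - min(offsets) > high - low:
        raise ValueError("sequence is wider than the available note range")
    lowest = default + min(offsets)
    highest = default + max(offsets)
    if lowest < low:                 # e.g. minimum swing -64 -> shift up by 4
        return default + (low - lowest)
    if highest > high:               # symmetric correction on the high side
        return default - (highest - high)
    return default

def nearest_note_number(freq_hz, a4_hz=440.0):
    """Closest MIDI note number to a frequency (equal temperament assumed)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

print(corrected_first_note([0, -10, -64]))
print(nearest_note_number(172.38))
```

A minimum swing of −64 moves the first note from 60 to 64, and 172.38 Hz maps to note number 53 (F3), matching the two examples in the description.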

[0044] Next, pitch sequence determination process 2, the second example of the pitch sequence determination processing, is described with reference to the flowchart of FIG. 6. Steps 61 and 62 of pitch sequence determination process 2 are the same as steps 51 and 52 of FIG. 5 described above, so the description here starts from step 63.
Step 63: Using the calculated note distances, the note assignment errors that result from rounding to the pitches of each of several scales are accumulated. That is, in this embodiment, the goodness of fit after rounding is calculated for each of three scales: the natural scale, the harmonic scale, and the melodic scale. In the natural scale, as shown in FIG. 21, the assignable notes are arranged in the order whole tone, semitone, whole tone, whole tone, semitone, whole tone, whole tone. In the harmonic scale they are arranged in the order whole tone, semitone, whole tone, whole tone, semitone, three semitones (whole tone + semitone), semitone. In the melodic scale the order is whole tone, semitone, whole tone, whole tone, whole tone, whole tone, semitone when ascending, and the same as the natural scale when descending. In FIG. 21, a circle (○) marks a note that can be used as a scale tone, and a cross (×) marks a note that cannot. For each scale in FIG. 21, the first note is assumed to start from each circled pitch in turn, and the note of each steady section number is assigned in order so that no crossed position is selected. At each assignment, the pitch difference between the note of the steady section and the note it is assigned to, that is, the note assignment error, is calculated and accumulated. As an example, with pitch sequence data whose note distances from steady section number [0] are as in FIG. 20C, consider assigning the sections up to steady section number [5] to the natural scale of FIG. 21. First, the note of steady section number [0] in FIG. 20C is the first note, so it is assigned to note position (0) of the natural scale in FIG. 21. Next, the note of steady section number [1] has a note distance of 1.7158 from the note of steady section number [0], so a note distance of either one semitone or two semitones (a whole tone) should be selected. In the natural scale, the note one semitone away, note position (1), is marked with a cross and is therefore not selected; the note is assigned to the position two semitones (a whole tone) away, note position (2). Because the note with a note distance of 1.7158 has been assigned to note position (2), the note assignment error for steady section number [1] is the difference between the note distance of 2 to the assigned note position (2) and the note distance of 1.7158 between steady section numbers [0] and [1], which is 0.2842. Next, the note of steady section number [2] has a note distance of 2.1557 from the note of steady section number [1], so a note distance of either a whole tone or three semitones (whole tone + semitone) is selected. Since the note of steady section number [1] was assigned to note position (2) in the previous step, the note of steady section number [2] is assigned to note position (4) or (5), a whole tone or three semitones away. In the natural scale, note position (4) is marked with a cross, so the note of steady section number [2] is assigned to note position (5). The note with a note distance of 2.1557 is thus assigned to note position (5), which corresponds to a note distance of 3, and the note assignment error for steady section number [2] is 0.8443. The cumulative total for steady section numbers [1] and [2] is 0.2842 + 0.8443 = 1.1285. Carrying out the same calculation for the remaining notes from steady section number [3] to steady section number [5] gives a cumulative note assignment error of 2.233. This value is for the case where note position (0) of the natural scale is the starting note. The cumulative note assignment error is likewise calculated with note positions (2), (3), (5), (7), (8), and (10) of the natural scale as the starting note, and in the same way for each note position of the harmonic scale and the melodic scale. FIG. 22A shows the cumulative note assignment error obtained in this way for each scale with each note position as the starting note.
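The scale-constrained assignment of step 63 can be sketched as follows for the natural scale. The candidate rule (try the two integer note distances on either side of the measured distance, nearer one first, and widen the search only if neither lands on a scale tone) is an interpretation of the worked example rather than a rule stated in the text, and the names are hypothetical.

```python
import math

# Note positions marked with a circle for the natural scale, as listed in the text.
NATURAL_SCALE = frozenset({0, 2, 3, 5, 7, 8, 10})

def assign_on_scale(distances, start, scale=NATURAL_SCALE):
    """Assign successive notes to scale tones and record each assignment error.

    distances : note distance of each note from the note before it
    start     : note position of the first note
    Returns (positions, errors).
    """
    positions, errors = [start], []
    for d in distances:
        base = positions[-1]
        lo, hi = math.floor(d), math.ceil(d)
        candidates = sorted({lo, hi}, key=lambda k: abs(k - d))
        step = 1
        while not any((base + k) % 12 in scale for k in candidates):
            candidates = sorted({lo - step, hi + step}, key=lambda k: abs(k - d))
            step += 1
        chosen = next(k for k in candidates if (base + k) % 12 in scale)
        positions.append(base + chosen)
        errors.append(abs(chosen - d))
    return positions, errors

positions, errors = assign_on_scale([1.7158, 2.1557], start=0)
print(positions, round(sum(errors), 4))
```

Starting from note position (0), the two distances given in the text land on positions (2) and (5) with a cumulative error of 1.1285, as in the worked example. Repeating the call for every circled start position and every scale would fill in a table like FIG. 22A.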

[0045] Step 64: For the rounding to each pitch of each scale, the cumulative total of only those note assignment errors that are 0.5 or more is calculated. In step 63 the cumulative total of all note assignment errors was calculated. In the case of FIG. 20C, it would be ideal, with small note assignment errors, for steady section number [1] to be rounded to the position of note distance (2) and steady section number [2] to the position of note distance (3); as described above, however, depending on the scale a position may be marked with a cross and cannot be assigned. In such a case, the assignment is changed to a pitch other than the one closest to the note distance. Such cases, that is, cases where the note assignment error is 0.5 or more, are treated as note correction errors, and their cumulative total is calculated. FIG. 22B shows the cumulative note correction errors and corresponds to FIG. 22A.
Step 65: The number of notes whose note assignment error was 0.5 or more in step 64, that is, the number of notes used in calculating the cumulative total in step 64, is counted. FIG. 22C shows these note counts and corresponds to FIG. 22B.
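Steps 64 and 65 reduce to a filtered sum and a count over the per-note assignment errors. A minimal sketch, with a hypothetical function name:

```python
def correction_stats(errors, threshold=0.5):
    """Return (sum, count) of the assignment errors that are threshold or more."""
    large = [e for e in errors if e >= threshold]
    return sum(large), len(large)

# The two errors worked out in the text for the natural scale from position (0).
print(correction_stats([0.2842, 0.8443]))
```

Of the two errors worked out in the text, only 0.8443 reaches the 0.5 threshold, so it alone contributes to the note correction error and to the note count.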

[0046] Step 66: Using the results of steps 63 to 65, that is, the values calculated in FIGS. 22A to 22C, the most suitable scale and its starting note are determined. Possible criteria include the smallest cumulative note assignment error in FIG. 22A, the smallest cumulative total of note assignment errors of 0.5 or more in FIG. 22B, the smallest number of notes with a note assignment error of 0.5 or more in FIG. 22C, and suitable combinations of these. Because of closely related keys and the like, however, the result does not always narrow down to a single candidate. In that case the melody transition is the same whichever candidate is used, so any of them may be selected. In FIG. 22A, the minimum cumulative note assignment error is 1.688, and it occurs in four places: two in the natural scale, one in the harmonic scale, and one in the melodic scale. The minimum value in FIG. 22B is then sought among these four places, and all four have the same value, 0.891. Any of the four could therefore be selected, but here the upper scale and the leftmost note position are given priority. As a result, the natural scale is selected as the scale and the third degree as the starting note. Since FIG. 22 shows the scales as minor scales in order to unify the notation, a note starting on the third degree of the natural scale is the first degree (tonic) of the natural (major) scale.
Step 67: Based on the scale and starting note determined in step 66, the same processing as the note assignment error calculation is performed again to determine the pitch sequence. FIG. 22D shows the pitch sequence assigned in this way to the steady section numbers of FIG. 20C. In the case of FIG. 20C the pitch sequence was 0-2-4-5-2-3..., whereas the pitch sequence here is 0-2-4-5-2-4... The pitch for steady section number [5] is now "4", which differs from the "3" of FIG. 20C.
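One way to combine the criteria of step 66 is a lexicographic comparison: cumulative error first, then correction error, then note count, with the scale order and the leftmost start position as final tie-breakers. The sketch below assumes that ordering. The table entries are illustrative: only the values 1.688, 0.891, and 2.233 come from the text, while the remaining numbers and the start positions of the tied candidates are hypothetical stand-ins for FIG. 22.

```python
def choose_scale(candidates):
    """Pick the best (scale, start) candidate by lexicographic comparison."""
    best = min(
        candidates,
        key=lambda c: (c["total"], c["correction"], c["count"],
                       c["scale_rank"], c["start"]),
    )
    return best["scale"], best["start"]

# scale_rank encodes "upper scale first": natural, then harmonic, then melodic.
table = [
    {"scale": "natural",  "scale_rank": 0, "start": 0,
     "total": 2.233, "correction": 1.2, "count": 2},    # correction, count assumed
    {"scale": "natural",  "scale_rank": 0, "start": 3,
     "total": 1.688, "correction": 0.891, "count": 1},
    {"scale": "natural",  "scale_rank": 0, "start": 10,  # start position assumed
     "total": 1.688, "correction": 0.891, "count": 1},
    {"scale": "harmonic", "scale_rank": 1, "start": 3,   # start position assumed
     "total": 1.688, "correction": 0.891, "count": 1},
    {"scale": "melodic",  "scale_rank": 2, "start": 3,   # start position assumed
     "total": 1.688, "correction": 0.891, "count": 1},
]
print(choose_scale(table))
```

With four candidates tied on every error measure, the tie-breakers select the natural scale starting at position (3), the third degree, which is the outcome described for step 66.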

[0047] In pitch sequence determination process 2 of FIG. 6, rounding notes with the aid of a scale pulls unstable notes onto scale tones, so the result approaches a comparatively stable melody and is very likely to approach the notes the user intended. However, when a user with a good ear deliberately shifts notes off the scale tones when inputting a melody, rounding the notes onto scale tones in this way is inappropriate. Ultimately it might be necessary to judge whether the melody is good or bad, but that is impossible with current technology. Instead, notes can be rounded with the scale in mind as in pitch sequence determination process 2 of FIG. 6, while notes other than scale tones are accepted in special cases only. Specifically, when the note distance to a non-scale tone is at or below a certain value, the note is rounded to that tone even though it is not a scale tone. This is an intermediate rounding process that lies between pitch sequence determination process 1 of FIG. 5 and pitch sequence determination process 2 of FIG. 6. After pitch sequence determination process 2 of FIG. 6 has determined that the starting note is the third degree of the natural minor scale (that is, after the processing of step 66), processing to determine the allowable note error range is performed. This can be done either by setting a constant such as 0.2 in advance as the allowable note error range, or by calculation. One conceivable way to calculate it is, when rounding to scale tones in step 63 of FIG. 6, to compute the average of the note distances for the notes that were rounded by 0.5 or less, thereby capturing the user's (singer's) tendency toward pitch deviation, and to take a constant multiple of that average as the allowable note error range. For this embodiment, the following describes which pitch sequence is assigned to the steady sections of FIG. 20C when the constant value 0.2 is adopted as the allowable note error range. First, step 66 of FIG. 6 determines that the starting note is the third degree of the natural minor scale. The note distance of steady section number [1] is 1.7158, so the closest pitch is the fourth degree, that is, the note at note position (5). In this case there is no problem in simply rounding to the closest pitch, so the note at note position (5) is selected. For steady section numbers [2], [3], and [4], note positions (7), (8), and (5) are then determined one after another in the same way as notes within the scale. For steady section number [5], however, the note distance is 1.1093 and steady section number [4] has been fixed at note position (5), so the closest pitch is note position (6). The pitch at note position (6) is not a scale tone, so in this case it is judged whether the note is within the allowable note error range. Since the note distance is 1.1093, the error is 0.1093, which is no more than the allowable note error range of 0.2, so note position (6) is adopted even though it is not a scale tone. If the note distance of steady section number [5] were, for example, 1.2093, the error would be 0.2093, which exceeds the allowable note error range of 0.2, so the note would not be placed at note position (6) but at note position (7), the scale tone one step above. Setting an allowable note error range in this way, so that notes other than scale tones can also enter the pitch sequence, makes it possible to use a scale while still allowing for non-scale tones, and to determine a pitch sequence close to the melody the person had in mind when singing.
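The intermediate rounding with an allowable note error range can be sketched as follows. The fallback rule used when a non-scale tone is rejected (take the scale tone nearest to the measured pitch) is an assumption that reproduces the example in the text, and the function name is hypothetical.

```python
import math

NATURAL_SCALE = frozenset({0, 2, 3, 5, 7, 8, 10})

def next_position(prev, distance, tolerance=0.2, scale=NATURAL_SCALE):
    """Round a note to the nearest pitch, accepting a non-scale tone only
    when the rounding error is within the allowable note error range."""
    target = prev + distance
    nearest = math.floor(target + 0.5)
    if nearest % 12 in scale or abs(target - nearest) <= tolerance:
        return nearest
    k = 1
    while True:  # otherwise search outward for the closest scale tone
        found = [p for p in (nearest - k, nearest + k) if p % 12 in scale]
        if found:
            return min(found, key=lambda p: abs(p - target))
        k += 1

print(next_position(5, 1.1093))  # error 0.1093 is within 0.2
print(next_position(5, 1.2093))  # error 0.2093 exceeds 0.2
```

From note position (5), a distance of 1.1093 is accepted at the non-scale position (6), while 1.2093 falls outside the tolerance and moves to the scale tone at position (7), as in the description.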

[0048] Next, pitch sequence determination process 3, the third example of the pitch sequence determination processing, is described with reference to the flowchart of FIG. 7. Pitch sequence determination processes 1 and 2 above were described for the case where the pitch sequence is determined from the note distance to the immediately preceding note. Within a phrase, however, the next pitch is not determined solely by the note distance from the immediately preceding note: the flow of the phrase matters, in that the notes making up the pitch sequence are influenced by the first note of the phrase. Here, therefore, phrases are detected, and the pitch sequence is determined taking into account the pitch difference from the first note of each phrase.
Step 71: It is determined how long each steady section detected by the steady section detection processing of FIG. 4 is when expressed as a number of grids in the note value sequence (duration sequence). When the steady sections are as shown in FIG. 23A, note value references such as those in FIG. 23B are created by treating the span from the start of each steady section to the start of the next steady section as one section. With one grid set to roughly one several-hundredth of a second, it is determined how many grids each of these note value references occupies. FIG. 23C shows the grid used to determine the note value sequence. To fit the note value references of FIG. 23B to the grid of FIG. 23C, the position of each note value reference is adjusted. For example, when a boundary of a note value reference falls between two grid lines, the boundary is moved to the nearest grid line. When it falls exactly midway between two grid lines, it is moved to the earlier grid line. The note value sections of FIG. 23D are the result of adjusting the boundaries of the note value references in this way. Above each note value section in FIG. 23D is a note value section number, which is the same as the steady section number, and below it is the number of grids in that section. The sequence of grid counts is 4, 4, 5, 6, 3, 6, 11, 4, 7, 3, 5, 3, 10, ...
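The grid fitting of step 71 can be sketched with integer arithmetic. Boundaries and the grid length are assumed to be expressed in the same integer unit (for example, samples), which avoids floating-point ties. The function names and the example boundaries are hypothetical.

```python
def snap_to_grid(boundary, grid):
    """Index of the grid line nearest to a boundary; an exact midpoint
    goes to the earlier grid line, as described in step 71."""
    q, r = divmod(boundary, grid)
    return q + 1 if 2 * r > grid else q

def grid_counts(boundaries, grid):
    """Number of grids in each note value section after snapping."""
    snapped = [snap_to_grid(b, grid) for b in boundaries]
    return [b - a for a, b in zip(snapped, snapped[1:])]

# Hypothetical section start positions with a grid length of 10 units.
print(grid_counts([0, 38, 85, 125], grid=10))
```

Here 38 snaps up to grid line 4, while 85 and 125 sit exactly midway between grid lines and therefore snap to the earlier lines 8 and 12.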

【００４９】ステップ７２：このように各音価区間の長
さがグリッド数で規定されたので、今度は、そのグリッ
ド数に基づいて複数の音価区間を纏めて１つのフレーズ
を構成する。フレーズを構成する手法は、本願の出願人
が先に出願した特願平７−１２３１０５号に記載してあ
るので、ここでは簡単に説明する。まず、一つの音価区
間が一つの音符に対応するので、各音価区間の長さに基
づいて音価区間の平均の長さ（平均音価区間長）を算出
する。算出した平均音価区間長に所定係数Ｋ（１以上の
値であり、例えば２）を乗じることによって乗算値を得
る。このようにして得られた乗算値以上の値の音価区間
長を検出する。検出された音価区間の後にフレーズの区
切りを示す区切りデータを挿入する。区切りデータによ
って区切られた音価区間が一つのフレーズを構成するこ
とになる。今度は、このようにして纏められた各フレー
ズ毎に平均音価区間長を算出する。算出された平均音価
区間長に所定係数Ｌ（１以上の値であり、例えば２）を
乗じる。各フレーズの最後の音価区間の長さすなわち終
端音価区間長がこの乗算値よりも小さい場合には、その
フレーズの最終音価区間の後に挿入されているフレーズ
区切りデータを削除する。終端音価区間長が乗算値以上
の場合には何もしない。このようなフレーズ削除処理を
全フレーズに対して行う。例えば、図２３（Ｄ）の場合
Step 72: Since the length of each note value section is defined by a number of grids, a plurality of note value sections are grouped into one phrase based on the number of grids. The method of constructing a phrase is described in Japanese Patent Application No. 7-123105, previously filed by the present applicant, so it is only outlined here. First, since one note value section corresponds to one note, the average length of the note value sections (the average note value section length) is calculated from the length of each note value section. The calculated average note value section length is multiplied by a predetermined coefficient K (a value of 1 or more, for example 2). Every note value section whose length is equal to or greater than this product is detected, and delimiter data indicating a phrase break is inserted after each detected note value section. The note value sections delimited by the delimiter data constitute one phrase. Next, the average note value section length is calculated for each of the phrases formed in this way and multiplied by a predetermined coefficient L (a value of 1 or more, for example 2). If the length of the last note value section of a phrase, that is, the end note value section length, is smaller than this product, the delimiter data inserted after that last note value section is deleted. If the end note value section length is not smaller than the product, nothing is done. This delimiter deletion process is performed for all phrases.

For example, in the case of FIG. 23(D), the total number of grids is 4 + 4 + 5 + 6 + 3 + 6 + 11 + 4 + 7 + 3 + 5 + 3 + 10 = 71. Dividing this by the number of sections, 13, gives 71 ÷ 13 = 5.46. Rounding off at the first decimal place gives an average note value section length of 5. Multiplying this 5 by the predetermined coefficient 2 gives 10. The note value sections whose length is 10 or more are therefore note value section [6] and note value section [12]. Since delimiter data is inserted after note value sections [6] and [12], the first phrase consists of the seven note value sections [0] to [6] and the second phrase consists of the six note value sections [7] to [12], as shown in FIG. 23(E).
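The first pass of the phrase construction in step 72 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_phrases` and the parameter `k` (standing in for coefficient K) are hypothetical, rounding is assumed to be round-half-up to a whole number of grids, and the second pass that deletes delimiters using coefficient L is deliberately omitted.

```python
def split_phrases(lengths, k=2):
    """Group note value sections into phrases (first pass only).

    lengths: length of each note value section, in grids.
    k: stand-in for the predetermined coefficient K (assumed to be 2).
    Returns a list of phrases, each a list of section indices.
    """
    # Average section length, rounded half up to a whole number of grids.
    average = int(sum(lengths) / len(lengths) + 0.5)
    threshold = average * k

    phrases, current = [], []
    for index, length in enumerate(lengths):
        current.append(index)
        # A section at least as long as the threshold closes the phrase,
        # which corresponds to inserting delimiter data after it.
        if length >= threshold:
            phrases.append(current)
            current = []
    if current:  # trailing sections with no delimiter after them
        phrases.append(current)
    return phrases


# Grid lengths of note value sections [0] to [12] from the FIG. 23(D) example.
grid_lengths = [4, 4, 5, 6, 3, 6, 11, 4, 7, 3, 5, 3, 10]
phrases = split_phrases(grid_lengths)
print(phrases)
```

With the grid lengths of the example, the sketch yields one phrase covering sections [0] to [6] and a second covering sections [7] to [12].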

[0050] Step 73: The representative frequency of each note value section of the phrases determined in step 72 is determined. Since a note value section corresponds to a stationary section in pitch sequence determination process 1 of FIG. 5 described above, the representative frequency of each note value section is determined by the same method as in step 51.

Step 74: The note distance from the head sound of the phrase, that is, from the representative frequency of the first note value section of the phrase, is determined. In pitch sequence determination processes 1 and 2, the note distance was calculated only with respect to the immediately preceding stationary section. Here, the note distance of each note value section of a phrase is determined with the representative frequency of the first note value section [0] of that phrase as the reference. FIG. 24 shows the representative frequencies of note value sections [0] to [6] of the first phrase, together with the note distances between the first note value section [0] of the first phrase and each of the note value sections [1] to [6] that make up the phrase.

Step 75: Based on the note distances calculated in step 74, the processing of steps 53 to 55 of pitch sequence determination process 1, or the processing of steps 63 to 66 of pitch sequence determination process 2, is performed to determine the pitch sequence.
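The reference-to-phrase-head idea of step 74 can be sketched as follows. This is only an illustration under a stated assumption: the note distance is taken to be the interval in cents divided by 100, that is, 12·log2(f / f_ref) semitones, which is consistent with the values quoted in this section but is not restated here. The function names are hypothetical.

```python
import math


def note_distance(reference_hz, target_hz):
    """Interval from reference to target in semitones (cents / 100).

    Assumption: the note distance is measured on an equal-tempered
    logarithmic scale, so one octave corresponds to 12.
    """
    return 12.0 * math.log2(target_hz / reference_hz)


def distances_from_head(representative_hz):
    """Note distance of every later section, measured from the phrase head.

    representative_hz: representative frequency of each note value
    section of one phrase, in order; index 0 is the head sound.
    """
    head = representative_hz[0]
    return [note_distance(head, f) for f in representative_hz[1:]]
```

Measuring every section against the head sound, rather than against its immediate predecessor as in processes 1 and 2, keeps an error in one section from propagating to the sections that follow it.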

[0051] Next, pitch sequence determination process 4, which is a fourth embodiment of the pitch sequence determination process, is described with reference to the flowchart of FIG. 8. In pitch sequence determination process 3 described above, the pitch sequence was determined by calculating the note distance from the head sound of the phrase, that is, from the sound of the stationary section located at the head of the phrase. Within a phrase, however, a sound may be influenced not only by the head sound but also by every sound produced before it. In this process, therefore, phrases are detected and the pitch sequence is determined by taking into account, within each phrase, the relationship between each sound and the sounds produced before it. The processing of steps 81 to 83 is the same as that of steps 71 to 73, so its description is omitted.

Step 84: The note distance between each note value section and each of its preceding sounds within the phrase is determined. These note distances are obtained here for the first phrase of FIG. 23(E). Note value section [0] of the first phrase has no preceding sound, so it has no note distance. Note value section [1] has note value section [0] as its preceding sound, and the note distance is 1.7158. Note value section [2] has note value sections [0] and [1] as preceding sounds, and the respective note distances are 3.8715 and 2.1557. Calculating the note distances between each remaining note value section and its preceding sounds in the same way gives the values shown in FIG. 25(A).

Step 85: Weighting is performed based on the time distance between each note value section and each of its preceding sounds within the phrase. First, the time difference between each note value section and each preceding sound is expressed as a number of grids. For the first phrase of FIG. 23(E), the values are as shown in FIG. 25(B). Note value section [0] has no preceding sound, so it has no time distance. Note value section [1] has note value section [0] as its preceding sound, and the time distance is 4 grids. Note value section [2] has note value sections [0] and [1] as preceding sounds, and the respective time distances are 8 grids and 4 grids. Calculating the time distances between each remaining note value section and its preceding sounds in the same way gives the values shown in FIG. 25(B).

The weights are calculated from the time distances obtained in this way. Each time distance of a note value section is divided by the sum of that section's time distances, the reciprocal of each quotient is taken, and the reciprocals are normalized so that they sum to 100. The normalized values are the time-distance weights of the note value section. For example, for note value section [2] in FIG. 25(B), the time distances are 8 grids and 4 grids, and their sum is 12. Dividing the time distance 8 by 12 gives 8/12 = 2/3, and dividing the time distance 4 by 12 gives 4/12 = 1/3. For the reciprocals of these quotients, 3/2 and 3/1, to sum to 100, each must be multiplied by 200/9. The weight of note value section [2] with respect to section [0] is therefore 33.3, and its weight with respect to section [1] is 66.7. Each note value section is weighted by time distance in this way. FIG. 25(C) shows the weights derived from the time distances of FIG. 25(B).
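The weight calculation of step 85 can be sketched as below. The function name `time_distance_weights` is hypothetical; the arithmetic follows the description above: divide each time distance by the section's total, take reciprocals, and normalize them to sum to 100.

```python
def time_distance_weights(time_distances):
    """Weights of the preceding sounds of one note value section.

    time_distances: distance in grids to each preceding sound,
    ordered from the phrase head to the immediately preceding sound.
    Returns weights that sum to 100; nearer sounds weigh more.
    """
    total = sum(time_distances)
    # Reciprocal of (distance / total) for each preceding sound.
    reciprocals = [total / d for d in time_distances]
    # Scale factor that makes the reciprocals sum to 100.
    scale = 100.0 / sum(reciprocals)
    return [r * scale for r in reciprocals]


# Note value section [2] of the example: 8 grids to [0], 4 grids to [1].
weights = time_distance_weights([8, 4])
print([round(w, 1) for w in weights])  # [33.3, 66.7]
```

For a section with a single preceding sound, such as note value section [1], the sketch returns a single weight of 100, so that sound's note distance is used unchanged.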

[0052] Step 86: Based on the weights calculated in step 85, the sound of each note value section is rounded to a sound on the 12-tone scale or on a predetermined scale. Rounding to a sound on the 12-tone scale uses the processing of steps 53 to 55 of FIG. 5. Rounding to a sound on a scale uses the processing of steps 63 to 66 of FIG. 6. In either case, the time-distance weights are taken into account when determining the note distance.

For example, in FIG. 25(C), the only preceding sound of note value section [1] is the sound of note value section [0], so the note distance 1.7158 is used as it is. As the pitch closest to the note distance 1.7158, a sound one whole tone higher than the sound of note value section [0] is selected. Next, the sound of note value section [2] is influenced by note value section [0] with a weight of 33.3 percent and by note value section [1] with a weight of 66.7 percent. At this point, the note distance between note value section [1] and note value section [0] has already been determined as "2", so this "2" is subtracted from the note distance 3.8715 between note value sections [0] and [2], giving 1.8715. The note distance between note value section [2] and note value section [1] is 2.1557. Taking the weights into account, the note distance of note value section [2] is therefore calculated as follows.

(1.8715 × 33.3 + 2.1557 × 66.7) / 100 = 2.06

The note distance of note value section [2] is thus 2.06. Using this note distance of 2.06, either the rounding to a sound on the 12-tone scale (steps 53 to 55 of FIG. 5) or the rounding to a sound on a scale (steps 63 to 66 of FIG. 6) is performed.
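The weighted combination of step 86 can be sketched as follows. The source works through only note value section [2]; the generalization to any number of preceding sounds, as well as the function and parameter names, are assumptions made for illustration. Each measured note distance is first re-expressed relative to the immediately preceding section by subtracting the interval already assigned between that section and the earlier one.

```python
def weighted_note_distance(raw_distances, assigned, weights):
    """Weighted note distance of a section from its immediate predecessor.

    raw_distances: measured note distance from each preceding section
        to the current section, ordered from the phrase head.
    assigned: pitch already assigned to each preceding section, in
        semitones above the phrase head.
    weights: time-distance weights of the preceding sections (sum 100).
    """
    previous = assigned[-1]  # pitch of the immediately preceding section
    total = 0.0
    for raw, pitch, weight in zip(raw_distances, assigned, weights):
        # Remove the interval already fixed between this preceding
        # section and the immediately preceding one.
        total += (raw - (previous - pitch)) * weight
    return total / 100.0


# Note value section [2]: distances 3.8715 to [0] and 2.1557 to [1];
# [0] is the reference (0) and [1] was assigned 2 semitones above it.
distance = weighted_note_distance([3.8715, 2.1557], [0, 2], [33.3, 66.7])
print(round(distance, 2))  # 2.06
```

The result would then be passed to whichever rounding process is in use, and the nearest pitch becomes the assigned pitch that later sections in the phrase refer back to.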

[0053]

[Effects of the Invention] According to the sound signal analyzer of the present invention, the section in which a musical sound exists (the effective section) can be analyzed easily even when the pitch or level of the sound input from a microphone or the like fluctuates slightly. According to the sound signal analyzer of another invention, even when the pitch or level of the sound input from a microphone or the like fluctuates slightly, the stationary part of the musical sound outside the fluctuating part, that is, the part corresponding to one note, can be analyzed. According to the performance information generator of yet another invention, note information for the pitch can be generated reliably even when the pitch or level of the sound input from a microphone or the like fluctuates slightly.

[Brief description of the drawings]

FIG. 1 is a diagram showing the main flow when the electronic musical instrument of FIG. 2 operates as a performance information generator.

FIG. 2 is a hardware block diagram showing the configuration of an electronic musical instrument incorporating the musical sound information analyzer and the performance information generator according to the present invention.

FIG. 3 is a diagram showing details of the effective section detection process in step 13 of FIG. 1.

FIG. 4 is a diagram showing details of the stationary section detection process in step 14 of FIG. 1.

FIG. 5 is a diagram showing details of the pitch sequence determination process in step 15 of FIG. 1.

FIG. 6 is a diagram showing details of pitch sequence determination process 2, which is a second embodiment of the pitch sequence determination process in step 15 of FIG. 1.

FIG. 7 is a diagram showing details of pitch sequence determination process 3, which is a third embodiment of the pitch sequence determination process in step 15 of FIG. 1.

FIG. 8 is a diagram showing details of pitch sequence determination process 4, which is a fourth embodiment of the pitch sequence determination process in step 15 of FIG. 1.

FIG. 9 is a diagram showing an example of the waveform values of an audio signal sampled at a sampling frequency of 44.1 kHz, that is, a digital sample signal.

FIG. 10 is a diagram showing the concept of an operation example of the effective section detection process in step 13 of FIG. 1.

FIG. 11 is a diagram showing the concept of an operation example of the peak position detection process for a musical sound waveform within an effective section, which is an example of the period reference position detection process in step 41 of FIG. 4.

FIG. 12 is a diagram showing, using two comparison waves, a specific example of how the error rate is calculated in the waveform comparison process in step 42 of FIG. 4.

FIG. 13 is a diagram showing a specific example of how the error rate is calculated from the two comparison waves of FIG. 12 by the waveform comparison process in step 42 of FIG. 4.

FIG. 14 is a diagram showing how the peak reference positions of FIG. 11(B) are corrected by the waveform comparison process in step 42 of FIG. 4 so that regular peak positions are detected.

FIG. 15 is a diagram showing, for a different musical sound waveform, the concept of an operation example of the peak position detection process for a musical sound waveform within an effective section, which is an example of the period reference position detection process in step 41 of FIG. 4.

FIG. 16 is a diagram showing an operation example of the stationary section extension process in step 45 of FIG. 4.

FIG. 17 is a diagram showing an operation example of the stationary part overlapping process in step 46 of FIG. 4.

FIG. 18 is a diagram showing an operation example of the subdivision process based on changes in pitch and sound pressure in step 47 of FIG. 4.

FIG. 19 is a diagram showing the concept of the stationary section detection process of FIG. 4, that is, how stationary sections are detected from the effective sections obtained in step 13.

FIG. 20 is a diagram showing the concept of an operation example of pitch sequence determination process 1 of FIG. 5.

FIG. 21 is a diagram showing an example of the plurality of scales used in pitch sequence determination process 2 of FIG. 6.

FIG. 22 is a diagram showing the concept of an operation example of pitch sequence determination process 2 of FIG. 6.

FIG. 23 is a diagram showing the concept of an operation example of pitch sequence determination process 3 of FIG. 7.

FIG. 24 is a diagram showing a specific example of the process in step 74 of FIG. 7 that determines the note distance from the head sound of a phrase.

FIG. 25 is a diagram showing the concept of an operation example of pitch sequence determination process 4 of FIG. 8.

[Explanation of symbols]

1: CPU, 2: program memory, 3: working memory, 4: performance data memory, 5: key press detection circuit, 6: microphone interface, 7: switch detection circuit, 8: display circuit, 9: sound source circuit, 10: keyboard, 1A: microphone, 1B: numeric keypad and various switches, 1C: display, 1D: sound system, 1E: data and address bus

Claims

[Claims]

1. A sound signal analyzer comprising: input means for inputting an arbitrary sound signal from the outside; calculating means for obtaining, over a predetermined number of samples, the average of the sample amplitude values of the signal sequentially input from the input means, and outputting it as average sound pressure level information; section determining means for setting a section in which the average sound pressure level information is equal to or greater than a first predetermined value as a valid section in which a musical sound is present, and a section in which it is less than the first predetermined value as an invalid section in which no musical sound is present; valid section forming means for, when the time length of an invalid section sandwiched between valid sections on both sides is shorter than a first predetermined length, changing that invalid section to a valid section and combining the changed section with the valid sections on both sides to form a new valid section; first invalid section conversion means for, once the processing by the valid section forming means is completed, changing a valid section sandwiched between invalid sections on both sides to an invalid section when its time length is shorter than a second predetermined length, and combining the changed section with the invalid sections on both sides to form a new invalid section; and second invalid section conversion means for calculating, for each valid section remaining when the processing by the first invalid section conversion means is completed, the mean of the average sound pressure level, and changing the valid section to an invalid section when the mean is less than a second predetermined value.

2. A sound signal analyzer comprising: input means for inputting an arbitrary sound signal from the outside; calculating means for obtaining, over a predetermined number of samples, the average of the sample amplitude values of the signal sequentially input from the input means, and outputting it as average sound pressure level information; section determining means for setting a section in which the average sound pressure level information is equal to or greater than a first predetermined value as a valid section, a section that is less than the first predetermined value and is sandwiched between valid sections on both sides as an invalid section, and the remaining sections at both ends of the average sound pressure level information as undetermined sections; valid section forming means for, when the time length of an invalid section sandwiched between valid sections on both sides is shorter than a first predetermined length, changing that invalid section to a valid section and combining the changed section with the valid sections on both sides to form a new valid section; first invalid section conversion means for, once the processing by the valid section forming means is completed, changing a valid section sandwiched between invalid sections on both sides to an invalid section when its time length is shorter than a second predetermined length and combining the changed section with the invalid sections on both sides to form a new invalid section, and, when the time length of a valid section adjacent to an undetermined section is shorter than the second predetermined length, combining that valid section with the invalid section and the undetermined section adjacent to it to form a new undetermined section; and second invalid section conversion means for calculating, when the processing by the first invalid section conversion means is completed, the mean of the average sound pressure level for each valid section and each undetermined section, changing the valid section or the undetermined section to an invalid section when the mean is less than a second predetermined value, and changing the undetermined section to a valid section when the mean is equal to or greater than the second predetermined value.

3. The sound signal analyzer according to claim 1, further comprising extension means for extending the valid section, once the processing by the second invalid section conversion means is completed, based on the average sound pressure level calculated by the calculating means and a second predetermined value smaller than the first predetermined value.

4. A sound signal analyzer comprising: input means for inputting an arbitrary sound signal from the outside; cycle reference detecting means for detecting a plurality of candidate positions serving as a cycle reference for the sound signal input from the input means; section detecting means for calculating, for the sound signal, the degree of coincidence between the waveforms of adjacent sections delimited by the candidate positions, and connecting sections with a high degree of coincidence to detect a same-waveform section; and stationary section determining means for detecting a stationary section based on the same-waveform section detected by the section detecting means.

5. A sound signal analyzer comprising: input means for inputting an arbitrary sound signal from the outside; first cycle reference detecting means for detecting a plurality of temporary candidate positions serving as a cycle reference for the sound signal input from the input means; frequency band detecting means for detecting a maximum frequency and a minimum frequency of the sound signal based on the temporary candidate positions detected by the first cycle reference detecting means; filter processing means for performing, on the sound signal input from the input means, band-pass filter processing that uses the maximum frequency and the minimum frequency detected by the frequency band detecting means as cutoff frequencies; second cycle reference detecting means for detecting a plurality of candidate positions serving as a cycle reference for the sound signal output from the filter processing means; section detecting means for calculating, for the sound signal, the degree of coincidence between the waveforms of adjacent sections delimited by the candidate positions, and connecting sections with a high degree of coincidence to detect a same-waveform section; and stationary section determining means for detecting a stationary section based on the same-waveform section detected by the section detecting means.

6. A sound signal analyzer comprising: input means for inputting an arbitrary sound signal from the outside; effective section analyzing means for analyzing, from the sound signal input from the input means, an effective section in which a musical sound is considered to be present; cycle reference detecting means for detecting a plurality of candidate positions serving as a cycle reference on each of the positive and negative sides of the sound signal forming the effective section; section detecting means for calculating, on each of the positive and negative sides of the sound signal, the degree of coincidence between the waveforms of adjacent sections delimited by the candidate positions, and connecting sections with a high degree of coincidence to detect a same-waveform section; timbre section determining means for setting, as a same-timbre section, the section formed by superimposing the same-waveform sections on the positive and negative sides detected by the section detecting means; and stationary section determining means for detecting a stationary section based on the same-timbre section determined by the timbre section determining means.

7. A sound signal analyzer comprising: input means for inputting an arbitrary sound signal from the outside; effective section analyzing means for analyzing, from the sound signal input from the input means, an effective section in which a musical sound is considered to be present; first cycle reference detecting means for detecting a plurality of temporary candidate positions serving as a cycle reference for the sound signal forming the effective section; frequency band detecting means for detecting a maximum frequency and a minimum frequency for the entire section or the effective section of the sound signal, based on the temporary candidate positions detected by the first cycle reference detecting means; filter processing means for performing, on the entire section or the effective section of the sound signal input from the input means, band-pass filter processing that uses the maximum frequency and the minimum frequency detected by the frequency band detecting means as cutoff frequencies; second cycle reference detecting means for detecting a plurality of candidate positions serving as a cycle reference for the sound signal output from the filter processing means; section detecting means for calculating, for the sound signal, the degree of coincidence between the waveforms of adjacent sections delimited by the candidate positions, and connecting sections with a high degree of coincidence to detect a same-waveform section; and stationary section determining means for detecting a stationary section based on the same-waveform section detected by the section detecting means.

8. The method according to claim 1, wherein said effective section detecting means comprises:
The sound signal analyzer according to claim 6, wherein the valid section is detected by the sound signal analyzer according to claim 3.

9. A performance information generator comprising: input means for inputting an arbitrary sound signal from the outside; stationary section analyzing means for analyzing, from the sound signal input from the input means, stationary sections each corresponding to one note; frequency determining means for determining a representative frequency for each stationary section analyzed by the stationary section analyzing means; cent value conversion means for converting, based on the representative frequencies determined by the frequency determining means, the difference between the representative frequencies of two successive stationary sections into a value in cents; pitch difference calculating means for calculating relative pitch difference data between the two stationary sections based on the value in cents obtained by the cent value conversion means; and pitch assigning means for assigning a pitch on a predetermined scale to each stationary section based on the pitch difference data calculated by the pitch difference calculating means.

10. A performance information generator comprising: input means for inputting an arbitrary sound signal from the outside; stationary section analyzing means for analyzing, from the sound signal input from the input means, stationary sections each corresponding to one note; frequency determining means for determining a representative frequency for each stationary section analyzed by the stationary section analyzing means; phrase detecting means for grouping a plurality of the stationary sections analyzed by the stationary section analyzing means and detecting them as one phrase; cent value conversion means for converting, for each stationary section, the differences between its representative frequency and those of all stationary sections preceding it within the one phrase detected by the phrase detecting means into values in cents; weight calculating means for calculating weights based on the relative time distances to all stationary sections preceding that stationary section within the one phrase detected by the phrase detecting means; pitch difference calculating means for calculating relative pitch difference data between two stationary sections based on the values in cents obtained by the cent value conversion means and the weights calculated by the weight calculating means; and pitch assigning means for assigning a pitch on a predetermined scale to each stationary section based on the pitch difference data calculated by the pitch difference calculating means.

11. A performance information generator comprising: input means for inputting an arbitrary sound signal from the outside; stationary section analyzing means for analyzing, from the sound signal input from the input means, stationary sections each corresponding to one note; frequency determining means for determining a representative frequency for each stationary section analyzed by the stationary section analyzing means; phrase detecting means for grouping a plurality of the stationary sections analyzed by the stationary section analyzing means and detecting them as one phrase; cent value conversion means for converting the difference between the representative frequency of the first stationary section in the one phrase detected by the phrase detecting means and the representative frequency of each other stationary section in that phrase into a value in cents; pitch difference calculating means for calculating relative pitch difference data between the two stationary sections based on the value in cents obtained by the cent value conversion means; and pitch assigning means for assigning a pitch on a predetermined scale to each stationary section based on the pitch difference data calculated by the pitch difference calculating means.

12. The performance information generator according to claim 9, 10 or 11, wherein, when assigning a pitch on the predetermined scale to each stationary section, the pitch assigning means assigns a predetermined pitch to the first stationary section and then assigns pitches on the predetermined scale to the remaining stationary sections in order.

13. The pitch assigning means, when assigning a pitch on a predetermined scale to each stationary section, analyzes a sound signal of a first stationary section and detects an average frequency of the stationary section,
The pitch based on the detected average frequency is assigned as the pitch of the first stationary section, and then the pitch on a predetermined scale is sequentially assigned to the remaining stationary sections. 12. The performance information generator according to item 11.

14. The performance information generator according to claim 9, 10 or 11, wherein the pitch assigning means assigns pitches on a plurality of scales to each stationary section while shifting the note position, calculates the total of the note assignment errors at each note position on each scale, determines the optimum scale according to the totals, and assigns pitches on the determined scale in order as the pitches of the stationary sections.

15. The pitch assigning means assigns a pitch outside the scale according to a value of a note allowable error range when assigning the determined pitch on the scale in order as a pitch of the stationary section. The method according to claim 9, 10 or 11, wherein
A performance information generator according to any one of the preceding claims.

16. The method according to claim 5, wherein the stationary section analyzing means includes:
9. The sound signal analyzer according to claim 6, 7 or 8, wherein the stationary section is analyzed.
2. The performance information generator according to 1.