JP2008268369A

JP2008268369A - Vibrato detecting device, vibrato evaluating device, vibrato detecting method, and vibrato evaluating method, and program

Info

Publication number: JP2008268369A
Application number: JP2007108526A
Authority: JP
Inventors: Ryuichi Nariyama; Tatsuya Iriyama; Takuya Takahashi
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-04-17
Filing date: 2007-04-17
Publication date: 2008-11-06
Anticipated expiration: 2027-04-17
Also published as: JP4900017B2

Abstract

PROBLEM TO BE SOLVED: To detect a section in which a vibrato technique is used and, further, to evaluate the skill of the vibrato technique in that section.
SOLUTION: A control unit 11 detects pitches from singer voice data for each frame of a predetermined time length, generates pitch data representing the detected pitches, and performs filter processing that extracts a specified frequency component. The control unit 11 then computes, for each note of melody pitch data, the pitch of the filtered pitch data relative to the pitch of the melody pitch data taken as a zero reference value, and extracts various parameters from the temporal variation in pitch. Further, the control unit 11 specifies a section to be sung with the vibrato technique according to the extracted parameters, and evaluates the skill of the vibrato technique of the singer voice data in that section using the various parameters extracted from the singer voice data.
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a technique for identifying, from sound data, the places where a vibrato technique is used, and to a technique for evaluating the skill of the vibrato technique in sound data.

Karaoke apparatuses have been proposed that score how skillfully a singer sings, or that give singing instruction by displaying messages indicating the content of the instruction. When such a karaoke apparatus scores or gives instruction on, for example, the vibrato technique, one conceivable method is first to identify, from a model singing voice, the sections in which the vibrato technique is used (hereinafter, vibrato sections), and then to score or give instruction on those sections. The vibrato technique is a singing technique that gives a sound a rich resonance by slightly raising and lowering the pitch while sustaining the note, producing a trembling tone.
As a method for specifying a vibrato section, Patent Document 1, for example, proposes specifying a section in which the pitch of input data oscillates as a vibrato section.
Patent Document 1: JP 2006-195449 A

With the technique described in Patent Document 1, however, specifying vibrato sections using pitch oscillation as the indicator has the problem that sections containing a technique other than vibrato, such as tremolo, are also specified as vibrato sections. In tremolo, the range of pitch change is generally larger than in vibrato, and the pattern of pitch variation is generally unstable.
Moreover, although Patent Document 1 discloses a technique for specifying vibrato sections, it does not disclose a technique for evaluating the skill of the vibrato technique in the specified sections.

The present invention has been made against the background described above, and its object is to provide a technique for detecting, from sound data, a section in which the vibrato technique is used and, further, for evaluating the skill of the vibrato technique in the detected section.

A vibrato detection device according to a preferred aspect of the present invention comprises: storage means for storing melody pitch data representing the pitch of a melody; pitch detection means for detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch; filter processing means for applying, to the pitch data generated by the pitch detection means, filter processing that extracts a specific frequency component; zero-cross point specifying means for taking the pitch represented by the melody pitch data stored in the storage means as a zero reference value, calculating the values of pitch difference data representing the value of the pitch data filtered by the filter processing means relative to the zero reference value, and specifying as zero-cross points the points where the pitch difference data changes from negative to positive or from positive to negative; specifying means for specifying a section in which the zero-cross points specified by the zero-cross point specifying means are in a predetermined mode; pitch vibration width data generating means for specifying, in the pitch difference data, the times at which the pitch value reaches local maxima and local minima, and generating pitch vibration width data from the differences between adjacent local maxima and minima; and output means for outputting information indicating the section specified by the specifying means when the pitch vibration width data is in a predetermined mode in that section.
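The pitch vibration width data described above (differences between adjacent local maxima and minima of the pitch difference data) could be computed roughly as follows. This is a minimal sketch, not the patented implementation: the function name `vibration_widths`, the list-of-floats input, and the handling of flat runs are assumptions made for illustration; the 10 ms frame length is the example value given later in the description.

```python
def vibration_widths(values, frame_sec=0.010):
    """Return (time, width) pairs from adjacent local extrema.

    values: pitch difference data, one value per frame (hypothetical layout).
    A local extremum is taken to be a frame where the direction of change
    flips; frames with no change are skipped (an assumption).
    """
    extrema = []      # (time_sec, value) of each local maximum or minimum
    direction = 0     # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(values)):
        step = values[i] - values[i - 1]
        if step == 0:
            continue
        new_direction = 1 if step > 0 else -1
        if direction and new_direction != direction:
            extrema.append(((i - 1) * frame_sec, values[i - 1]))
        direction = new_direction
    # Width = absolute difference between each extremum and the previous one,
    # stamped with the time of the later extremum.
    return [(t2, abs(v2 - v1))
            for (_, v1), (t2, v2) in zip(extrema, extrema[1:])]
```

For a steady vibrato the resulting widths stay roughly constant over time, which is the property the later checks rely on.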

In the vibrato detection device according to a preferred aspect of the present invention, the specifying means may measure the time intervals at which the zero-cross points specified by the zero-cross point specifying means appear, and specify a section in which the measured time interval is within a predetermined range and that state is detected a predetermined number of times or more in succession.

In the vibrato detection device according to a preferred aspect of the present invention, the output means may calculate a straight line approximating the temporal change of the pitch fluctuation width data in the section specified by the specifying means, and output information indicating the section specified by the specifying means when the absolute value of the slope of the calculated line is smaller than a predetermined threshold.
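The straight-line check described here could be sketched as follows. The text does not say how the line is fitted or what the threshold is, so this sketch assumes an ordinary least-squares fit; `fit_slope`, `is_steady`, and the default `max_abs_slope` are hypothetical names and values chosen only for illustration.

```python
def fit_slope(points):
    """Least-squares slope of width against time.

    points: list of (time_sec, width) pairs; needs at least two
    distinct times, otherwise the denominator is zero.
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_w = sum(w for _, w in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (w - mean_w) for t, w in points)
    return sxy / sxx


def is_steady(points, max_abs_slope=50.0):
    """True when the fitted line is nearly flat (threshold is an assumption)."""
    return abs(fit_slope(points)) < max_abs_slope
```

A slope near zero means the width of the pitch swing neither grows nor shrinks across the section.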

In the vibrato detection device according to a preferred aspect of the present invention, the output means may calculate the time average of the pitch fluctuation width data in the section specified by the specifying means, and output information indicating the section specified by the specifying means when the calculated time average is within a predetermined range.

A vibrato evaluation device according to a preferred aspect of the present invention comprises: storage means for storing melody pitch data representing the pitch of a melody; pitch detection means for detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch; filter processing means for applying, to the pitch data generated by the pitch detection means, filter processing that extracts a specific frequency component; receiving means for receiving data indicating a vibrato section; pitch vibration width data generating means for, in the vibrato section indicated by the data received by the receiving means, taking the pitch represented by the melody pitch data stored in the storage means as a zero reference value, calculating the pitch values of pitch difference data representing the value of the pitch data filtered by the filter processing means relative to the zero reference value, specifying the times at which the pitch difference data reaches local maxima and local minima, and generating pitch vibration width data from the differences between adjacent local maxima and minima; and output means for calculating a high evaluation when the pitch vibration width data is in a predetermined mode in the vibrato section indicated by the data received by the receiving means, and outputting that evaluation.

In the vibrato evaluation device according to a preferred aspect of the present invention, the output means may calculate a straight line approximating the temporal change of the pitch fluctuation width data, and calculate a higher evaluation the smaller the absolute value of the slope of the calculated line.

In the vibrato evaluation device according to a preferred aspect of the present invention, the output means may calculate the standard deviation or standard error of the values of the pitch fluctuation width data, and calculate a higher evaluation the smaller the calculated standard deviation or standard error.

The vibrato evaluation device according to a preferred aspect of the present invention may further comprise: zero-cross point specifying means for specifying as zero-cross points the points in the vibrato section where the value of the pitch difference data changes from negative to positive or from positive to negative; and calculating means for measuring the time intervals at which the zero-cross points specified by the zero-cross point specifying means appear and calculating the standard deviation or standard error of the measured time intervals. In this case, the output means may calculate a higher evaluation the smaller the standard deviation or standard error calculated by the calculating means.
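The regularity measure described here, the spread of the intervals between zero-cross points, could be sketched as follows. The function name `interval_spread` is hypothetical, and the choice of the sample (n - 1) standard deviation is an assumption, since the text does not say which form is used. How the spread is mapped to an evaluation score is not specified in this passage and is deliberately left out.

```python
import math
import statistics


def interval_spread(cross_times):
    """Return (standard deviation, standard error) of zero-cross intervals.

    cross_times: zero-cross times in seconds, in ascending order.
    Needs at least three times (two intervals) for the sample
    standard deviation to be defined.
    """
    gaps = [b - a for a, b in zip(cross_times, cross_times[1:])]
    sd = statistics.stdev(gaps)       # sample standard deviation
    se = sd / math.sqrt(len(gaps))    # standard error of the mean interval
    return sd, se
```

Smaller values indicate more evenly spaced zero-cross points, which under the scheme above corresponds to a higher evaluation.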

A vibrato detection method according to a preferred aspect of the present invention comprises: a storage step of storing, in a storage device, melody pitch data representing the pitch of a melody; a pitch detection step of detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch; a filter processing step of applying, to the pitch data generated in the pitch detection step, filter processing that extracts a specific frequency component; a zero-cross point specifying step of taking the pitch represented by the melody pitch data stored in the storage device as a zero reference value, calculating the values of pitch difference data representing the value of the pitch data filtered in the filter processing step relative to the zero reference value, and specifying as zero-cross points the points where the pitch difference data changes from negative to positive or from positive to negative; a specifying step of specifying a section in which the zero-cross points specified in the zero-cross point specifying step are in a predetermined mode; a pitch vibration width data generating step of specifying, in the pitch difference data, the times at which the pitch value reaches local maxima and local minima, and generating pitch vibration width data from the differences between adjacent local maxima and minima; and an output step of outputting information indicating the section specified in the specifying step when the pitch vibration width data is in a predetermined mode in that section.

A vibrato evaluation method according to a preferred aspect of the present invention comprises: a storage step of storing, in a storage device, melody pitch data representing the pitch of a melody; a pitch detection step of detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch; a filter processing step of applying, to the pitch data generated in the pitch detection step, filter processing that extracts a specific frequency component; a receiving step of receiving data indicating a vibrato section; a pitch vibration width data generating step of, in the vibrato section indicated by the data received in the receiving step, taking the pitch represented by the melody pitch data stored in the storage device as a zero reference value, calculating the pitch values of pitch difference data representing the value of the pitch data filtered in the filter processing step relative to the zero reference value, specifying the times at which the pitch difference data reaches local maxima and local minima, and generating pitch vibration width data from the differences between adjacent local maxima and minima; and an output step of calculating a high evaluation when the pitch vibration width data is in a predetermined mode in the vibrato section indicated by the data received in the receiving step, and outputting that evaluation.

A program according to a preferred aspect of the present invention causes a computer to function as: storage means for storing melody pitch data representing the pitch of a melody; pitch detection means for detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch; filter processing means for applying, to the pitch data generated by the pitch detection means, filter processing that extracts a specific frequency component; zero-cross point specifying means for taking the pitch represented by the melody pitch data stored in the storage means as a zero reference value, calculating the values of pitch difference data representing the value of the pitch data filtered by the filter processing means relative to the zero reference value, and specifying as zero-cross points the points where the pitch difference data changes from negative to positive or from positive to negative; specifying means for specifying a section in which the zero-cross points specified by the zero-cross point specifying means are in a predetermined mode; pitch vibration width data generating means for specifying, in the pitch difference data, the times at which the pitch value reaches local maxima and local minima, and generating pitch vibration width data from the differences between adjacent local maxima and minima; and output means for outputting information indicating the section specified by the specifying means when the pitch vibration width data is in a predetermined mode in that section.

A program according to another preferred aspect of the present invention causes a computer to function as: storage means for storing melody pitch data representing the pitch of a melody; pitch detection means for detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch; filter processing means for applying, to the pitch data generated by the pitch detection means, filter processing that extracts a specific frequency component; receiving means for receiving data indicating a vibrato section; pitch vibration width data generating means for, in the vibrato section indicated by the data received by the receiving means, taking the pitch represented by the melody pitch data stored in the storage means as a zero reference value, calculating the pitch values of pitch difference data representing the value of the pitch data filtered by the filter processing means relative to the zero reference value, specifying the times at which the pitch difference data reaches local maxima and local minima, and generating pitch vibration width data from the differences between adjacent local maxima and minima; and output means for calculating a high evaluation when the pitch vibration width data is in a predetermined mode in the vibrato section indicated by the data received by the receiving means, and outputting that evaluation.

According to the present invention, a section in which the vibrato technique is used can be detected from sound data, and the skill of the vibrato technique in that section can further be evaluated.

The karaoke apparatus 1 according to the present invention has a function of specifying, based on a model singing voice (hereinafter, model voice), the sections of a piece of music that should be sung using the vibrato technique. The karaoke apparatus 1 also has a function of evaluating the singing of a person practicing singing (hereinafter, practitioner), in particular the skill of the vibrato technique.
Embodiments of the present invention will be described below with reference to the drawings.

(A: Configuration)
FIG. 1 is a block diagram showing an example of the hardware configuration of the karaoke apparatus 1 according to the present embodiment.
In the figure, the control unit 11 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory), and controls each part of the karaoke apparatus 1 by reading out and executing a control program stored in the ROM.

The display unit 13 includes a liquid crystal panel and, under the control of the control unit 11, displays various images such as karaoke lyric captions.
The operation unit 14 includes various controls and outputs to the control unit 11 an operation signal representing the operation performed by the operator.

The microphone 15 outputs an analog audio signal representing the sound it picks up.
The audio processing unit 16 has an A/D converter; it converts the audio signal output by the microphone 15 into digital audio data and outputs that data to the control unit 11. The audio processing unit 16 also has a D/A converter; it converts digital audio data received from the control unit 11 into an analog audio signal and outputs that signal to the speaker 17.
The speaker 17 emits the audio signal received from the audio processing unit 16 as sound.

The storage unit 12 is storage means for storing various data, for example a hard disk device. The storage unit 12 includes the storage areas described below.

The accompaniment data storage area 121 stores accompaniment data representing the accompaniment sounds of each piece of music. The accompaniment data is in a data format such as MIDI (Musical Instrument Digital Interface) and records, following the progress of the piece, information indicating the pitches of the various instruments that accompany it. The accompaniment data also includes melody pitch data indicating the pitch of each note of the guide melody of the piece.
The lyrics data storage area 122 stores lyrics data indicating the lyrics of each piece. During karaoke singing, the lyrics data is displayed on the display unit 13 as lyric captions as the piece progresses.
The model voice data storage area 123 stores model voice data representing voices recorded in advance as models for singing each piece. The model voice data is audio data in a format such as WAVE or MP3 (MPEG1 Audio Layer-3).

The practitioner voice data storage area 124 stores, for each piece, audio data (hereinafter, practitioner voice data) generated when the practitioner's singing is picked up by the microphone 15 and converted into digital data by the audio processing unit 16. The practitioner voice data is likewise audio data in a format such as WAVE or MP3.
The vibrato section data storage area 125 stores data (hereinafter, vibrato section data) indicating the sections in which the vibrato technique is used in the model voice data of each piece. The vibrato section data is generated by the control unit 11, as described later.
The parameter storage area 126 stores, for each piece, the pitches extracted from the model voice data or the practitioner voice data, and the various parameters extracted from those pitches.

(B: Operation of the embodiment)
Next, the operation of the karaoke apparatus 1 according to the present invention will be described. The following description covers the operation in which the control unit 11 of the karaoke apparatus 1 specifies vibrato sections from the model voice and evaluates the skill of the vibrato technique in those sections of the practitioner voice data.

(B-1; Karaoke accompaniment)
When the practitioner operates the operation unit 14 of the karaoke apparatus 1 to select a piece to sing, an operation signal identifying the selected piece is output from the operation unit 14 to the control unit 11. The control unit 11 selects the piece according to the operation signal supplied from the operation unit 14, and performs the vibrato section specifying process shown in the flowchart of FIG. 2 on the selected piece.

In step SA100, the control unit 11 causes the display unit 13 to display the lyric captions and starts the karaoke accompaniment. That is, the control unit 11 reads the accompaniment data from the accompaniment data storage area 121 and supplies it to the audio processing unit 16. The audio processing unit 16 converts the accompaniment data into an analog audio signal and outputs it to the speaker 17. The speaker 17 emits the audio signal received from the audio processing unit 16 as sound. The control unit 11 also reads the lyrics data from the lyrics data storage area 122 and causes the display unit 13 to display lyric captions according to the lyrics data.
The practitioner sings along with the karaoke accompaniment emitted from the speaker 17 while watching the lyric captions displayed on the display unit 13. The practitioner's singing is picked up and practitioner voice data is generated (step SA110). The generated practitioner voice data is written to the practitioner voice data storage area 124 of the storage unit 12.

As the piece progresses, the control unit 11 analyzes the pitches of the model voice data and the practitioner voice data, and generates model pitch data and practitioner pitch data representing the analysis results (step SA120). That is, the control unit 11 reads the model voice data corresponding to the selected piece from the model voice data storage area 123, and detects the pitch from the read model voice data in units of frames of a predetermined time length (for example, 10 msec). The control unit 11 then calculates the value of each detected pitch relative to the pitch of the note contained in the melody pitch data, taken as the zero reference value, and generates the model pitch data. The generated model pitch data is written to the parameter storage area 126 of the storage unit 12.
The control unit 11 likewise detects the pitch, in units of frames of a predetermined time length (for example, 10 msec), from the practitioner voice data written to the practitioner voice data storage area 124. The control unit 11 then calculates the value of each detected pitch relative to the pitch of the note contained in the melody pitch data, taken as the zero reference value, and generates the practitioner pitch data. The generated practitioner pitch data is written to the parameter storage area 126 of the storage unit 12.
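The relative-pitch calculation in step SA120 could be sketched as follows. The description does not state the unit of the relative value or how notes are laid out in the melody pitch data, so this sketch assumes pitches in Hz, a relative value in cents, and notes given as hypothetical `(start_sec, end_sec, note_hz)` tuples; the function name `relative_pitch` is also an assumption. Pitch detection itself is outside the sketch.

```python
import math

FRAME_SEC = 0.010  # 10 ms frames, the example value given in the text


def relative_pitch(frame_hz, notes):
    """Express each frame's pitch relative to the current melody note.

    frame_hz: detected pitch per frame in Hz, or None where no pitch was found.
    notes: list of (start_sec, end_sec, note_hz) tuples (assumed layout).
    Returns cents above (+) or below (-) the note, or None when there is
    no detected pitch or no note at that time.
    """
    result = []
    for i, hz in enumerate(frame_hz):
        t = i * FRAME_SEC
        ref = next((n_hz for start, end, n_hz in notes if start <= t < end), None)
        if hz is None or ref is None:
            result.append(None)
        else:
            # The note's own pitch maps to 0, the zero reference value.
            result.append(1200.0 * math.log2(hz / ref))
    return result
```

With this convention a frame sung exactly on the note yields 0, so vibrato appears as an oscillation around zero.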

FIG. 3 shows, as graph A1, part of the model pitch data generated as described above (from time 12 s 000 ms to 18 s 000 ms). In FIG. 3, the horizontal axis represents time (the time elapsed since the piece started), and the vertical axis shows the value of the model pitch data at each time. Practitioner pitch data is generated in the same way as the model pitch data, but it is not illustrated here.

In step SA130, the control unit 11 determines whether the performance of the whole piece has finished. If the determination result in step SA130 is "Yes", the processing from step SA140 onward is performed. If the determination result is "No", the processing of steps SA100 to SA120 is performed for the remainder of the piece.

(B-2; Specifying the vibrato section)
When the performance of the piece has finished, the control unit 11 specifies vibrato sections using the model pitch data.
In step SA140, the control unit 11 applies, to the model pitch data generated in step SA120, filter processing that extracts a specific frequency component. That is, the control unit 11 processes the model pitch data with a low-pass filter that extracts components at frequencies lower than 6 Hz, generating new pitch data (hereinafter, filtered model pitch data). Graph A2 in FIG. 3 shows the filtered model pitch data obtained by processing the model pitch data with this low-pass filter.
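The low-pass filtering in step SA140 could be sketched as follows. The description gives only the 6 Hz cutoff, not the filter type, so this sketch assumes a windowed-sinc FIR filter with a Hamming window, a 100 Hz sample rate (one pitch value per 10 ms frame), and 51 taps; the function name `lowpass_fir` and all of these parameters are illustrative choices.

```python
import math


def lowpass_fir(samples, cutoff_hz=6.0, rate_hz=100.0, taps=51):
    """Low-pass filter a pitch contour with a Hamming-windowed sinc kernel."""
    fc = cutoff_hz / rate_hz          # cutoff in cycles per sample
    mid = (taps - 1) / 2
    kernel = []
    for n in range(taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        kernel.append(ideal * window)
    total = sum(kernel)
    kernel = [k / total for k in kernel]   # gain of exactly 1 at 0 Hz

    half = taps // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - half, 0), last)  # repeat edge values
            acc += k * samples[idx]
        out.append(acc)
    return out
```

Because the kernel is symmetric and applied centered, the filtered contour is not shifted in time relative to the input. With these settings the filter rolls off gradually around 6 Hz rather than cutting sharply; a longer kernel would sharpen the transition.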

As FIG. 3 shows, the pitch data before filtering (for example, the model pitch data, A1) has fine disturbances in its waveform. Such disturbances are caused by, for example, reverb: when pitch is detected from audio data to which reverb has been applied, the detection result is not a sine wave but a disturbed waveform. It has therefore been difficult to detect vibrato sections accurately from a model voice with reverb. It has also been difficult to determine from the model voice data whether reverb has been applied to the model voice at all. In the pitch data processed by the low-pass filter, by contrast, the influence of the reverb applied to the voice has been removed, so this processing makes it possible to specify vibrato sections appropriately in the processing described below.

In step SA150, the control unit 11 identifies sections of the model voice data that exhibit the characteristics of a vibrato section (hereinafter, vibrato candidate sections) under the following conditions. First, the control unit 11 identifies as zero-cross points the points at which the pitch represented by the filtered model pitch data generated by the low-pass filtering in step SA140 changes from negative to positive or from positive to negative (that is, crosses zero). In the example shown in FIG. 3, the times at which graph A2, representing the filtered model pitch data, crosses zero, namely times P1, P2, P3, P4, and so on, are identified as zero-cross points.

Next, the control unit 11 measures the time intervals at which the zero-cross points appear, and identifies as a vibrato candidate section any section in which the measured time interval falls within a predetermined range and such intervals are detected consecutively a predetermined number of times or more. In the example shown in FIG. 3, this processing identifies section A3, in which the zero-cross points appear at nearly equal intervals, as a vibrato candidate section. Although only one vibrato candidate section is identified within the portion shown in FIG. 3, vibrato candidate sections are also identified in portions of the song not shown in FIG. 3.
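The two parts of step SA150 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the linear interpolation of the crossing time, and the parameters `min_gap`, `max_gap`, and `min_count` (standing in for the "predetermined range" and "predetermined number of times") are assumptions.

```python
def zero_cross_times(pitch, frame_sec):
    """Times (in seconds) at which the pitch deviation changes sign.

    The crossing time is linearly interpolated between adjacent frames.
    """
    times = []
    for i in range(1, len(pitch)):
        a, b = pitch[i - 1], pitch[i]
        if (a < 0 <= b) or (a >= 0 > b):
            frac = a / (a - b)   # position of the crossing between the frames
            times.append((i - 1 + frac) * frame_sec)
    return times

def candidate_sections(times, min_gap, max_gap, min_count):
    """(start, end) spans with at least min_count consecutive zero-cross
    intervals inside [min_gap, max_gap]."""
    sections = []
    run_start, run_end, count = None, None, 0
    for prev, cur in zip(times, times[1:]):
        if min_gap <= cur - prev <= max_gap:
            if run_start is None:
                run_start = prev
            run_end = cur
            count += 1
        else:
            if run_start is not None and count >= min_count:
                sections.append((run_start, run_end))
            run_start, run_end, count = None, None, 0
    if run_start is not None and count >= min_count:
        sections.append((run_start, run_end))
    return sections
```

For example, `candidate_sections(zero_cross_times(pitch, 0.01), 0.05, 0.15, 3)` would keep only spans in which at least three consecutive crossings are 0.05 to 0.15 seconds apart; these numbers are placeholders, since the source gives no values.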

Then, for each vibrato candidate section identified in step SA150, the control unit 11 extracts the following parameters in order to analyze more rigorously whether the vibrato technique is actually used in that section (step SA160). In the following description, when the value of the filtered model pitch data fluctuates periodically, as in section A3 of FIG. 3, the number of oscillations per unit time is called the "vibrato frequency".

(1) Average vibrato frequency (Af; Average of frequency)
The parameter Af is the average vibrato frequency in each vibrato candidate section, calculated as the mean of the reciprocals of the time intervals at which the filtered model pitch data crosses zero on the horizontal axis.
(2) Standard deviation of the vibrato frequency (Df; Deviation of frequency)
The parameter Df is calculated as the standard deviation of the distribution of the reciprocals of the time intervals at which the filtered model pitch data crosses zero on the horizontal axis. This parameter gives an estimate of how much the vibrato frequency varies: the closer its value is to 0, the more uniform the frequency and the better the vibrato.
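As a sketch of how Af and Df might be computed from a list of zero-cross times: the function name is hypothetical, and the use of the population standard deviation is an assumption, since the source does not say whether the population or the sample form is meant. The code follows the text literally and takes the reciprocal of each interval between consecutive zero crossings; because consecutive crossings of a sinusoid are half a cycle apart, this value is about twice the number of full oscillations per second.

```python
import statistics

def vibrato_frequency_stats(zero_cross_times):
    """Return (Af, Df) from the zero-cross times of one candidate section."""
    intervals = [b - a for a, b in zip(zero_cross_times, zero_cross_times[1:])]
    rates = [1.0 / dt for dt in intervals if dt > 0]   # reciprocal of each interval
    if not rates:
        raise ValueError("need at least two distinct zero-cross times")
    af = statistics.mean(rates)      # Af: mean of the reciprocals
    df = statistics.pstdev(rates)    # Df: population standard deviation (assumed)
    return af, df
```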

The "pitch vibration width" used in the description of the following parameters is explained here. FIG. 4 is a graph showing only the filtered model pitch data (A2) from FIG. 3. The control unit 11 calculates the pitch vibration width of the vibrato candidate section as follows. First, the control unit 11 differentiates the filtered model pitch data with respect to time to identify the local maxima and local minima of the graph. In FIG. 4, for example, Q2, Q4, Q6, Q8, and Q10 are local maxima, and Q1, Q3, Q5, Q7, and Q9 are local minima. The control unit 11 takes the difference between one identified local minimum and the local maximum that immediately follows it in time as a pitch vibration width, and places that pitch vibration width at the time midway between the local minimum and the local maximum used to calculate it. For example, the pitch vibration width W1 is generated from the local minimum Q1 and the local maximum Q2. The pitch vibration widths W1 to W5 generated in this way are marked in FIG. 4.
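The width calculation can be sketched as follows, assuming the pitch series is sampled at a fixed frame length `frame_sec`. The function name and the use of a frame-to-frame difference in place of the time derivative are illustrative assumptions.

```python
def pitch_vibration_widths(pitch, frame_sec):
    """Return (time, width) pairs for one candidate section.

    Each local minimum is paired with the local maximum that follows it;
    the width is placed midway between the two in time.
    """
    extrema = []   # (frame index, value, kind)
    for i in range(1, len(pitch) - 1):
        d_prev = pitch[i] - pitch[i - 1]   # discrete derivative before frame i
        d_next = pitch[i + 1] - pitch[i]   # discrete derivative after frame i
        if d_prev > 0 and d_next <= 0:
            extrema.append((i, pitch[i], "max"))
        elif d_prev < 0 and d_next >= 0:
            extrema.append((i, pitch[i], "min"))
    widths = []
    for (i0, v0, k0), (i1, v1, k1) in zip(extrema, extrema[1:]):
        if k0 == "min" and k1 == "max":
            widths.append(((i0 + i1) / 2 * frame_sec, v1 - v0))
    return widths
```

The averages and standard deviations of the resulting widths can then be taken with ordinary statistics routines.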

The description now returns to the parameters extracted in step SA160.
(3) Average pitch vibration width (Ap; Average of pitch)
The parameter Ap is the average of the pitch vibration widths found in each vibrato candidate section. This parameter gives an estimate of the width of the pitch oscillation in the vibrato section.
(4) Standard deviation of the pitch vibration width (Dp; Deviation of pitch)
The parameter Dp is the standard deviation of the pitch vibration widths found in each vibrato candidate section. This parameter gives an estimate of how much the width of the pitch oscillation varies in the vibrato section: the closer its value is to 0, the more uniform the width with which the pitch oscillates and the better the vibrato.

(5) Slope of the linear approximation line of the pitch vibration width (Sp; Slope of pitch)
The parameter Sp is the slope of the linear approximation line in the graph of the pitch vibration width. FIG. 5 shows the graph of the pitch vibration width on its own. The control unit 11 determines a linear approximation line for the pitch vibration width points in the vibrato candidate section (A3 in the figure). In section A3 shown in FIG. 5, for example, the linear approximation line is determined as the straight line L1, which is expressed by (Equation 1).
(Equation 1) P = 15t + 150
Calculating the linear approximation line in this way determines its slope Sp. In the above example, the slope Sp of the linear approximation line of the pitch vibration width is 15.
This parameter gives an estimate of the stability of the pitch oscillation width over the vibrato section: the closer Sp is to 0, the more uniformly the pitch fluctuation width is maintained throughout the vibrato section and the better the vibrato.
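The source does not say how the linear approximation line is fitted. The sketch below assumes an ordinary least-squares fit, which is one common choice; the function name `fit_line` is hypothetical.

```python
def fit_line(times, widths):
    """Least-squares line through (time, width) points.

    Returns (slope, intercept); the slope plays the role of Sp.
    Least squares is an assumption; the source only says "linear approximation".
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_w = sum(widths) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (w - mean_w) for t, w in zip(times, widths))
    slope = sxy / sxx
    intercept = mean_w - slope * mean_t
    return slope, intercept
```

Points lying exactly on the line of Equation 1, such as widths 150, 165, 180, and 195 at t = 0, 1, 2, and 3, give a slope of 15 and an intercept of 150.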

In step SA170, the control unit 11 determines, according to the following criteria, whether each vibrato candidate section identified in step SA150 is finally to be determined a vibrato section:
(1) Df is smaller than a predetermined threshold
(2) Ap is within a predetermined range
(3) Dp is smaller than a predetermined threshold
(4) The absolute value of Sp is smaller than a predetermined threshold
A section that satisfies all of conditions (1) to (4) above is finally determined to be a vibrato section.
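The four criteria amount to a single conjunction, as the following sketch shows. The function and keyword names are hypothetical, and the threshold values must be supplied by the caller because the source gives none.

```python
def is_vibrato_section(df, ap, dp, sp, *, df_max, ap_range, dp_max, sp_max):
    """Apply conditions (1) to (4); every limit is a caller-supplied assumption."""
    ap_low, ap_high = ap_range
    return (df < df_max                    # (1) frequency variation is small
            and ap_low <= ap <= ap_high    # (2) average width is within range
            and dp < dp_max                # (3) width variation is small
            and abs(sp) < sp_max)          # (4) width trend is nearly flat
```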

Conditions (1) to (4) above are the conditions best suited to identifying vibrato sections. This is because, in an ideal vibrato, the vibrato frequency and the pitch vibration width are generally uniform without variation, the vibration width lies within a range of a predetermined size (for example, within 500 cents), and the pitch fluctuation width stays constant throughout the vibrato section. A "cent" is a unit expressing a relative interval between pitches; for example, a pitch of +100 cents is one semitone above the reference pitch. The control unit 11 stores vibrato section data representing the identified sections in the vibrato section data storage area 125. This completes the vibrato section identification process.
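For reference, the interval in cents between a frequency and a reference frequency follows from the standard definition of the cent (1200 cents per octave, 100 cents per equal-tempered semitone). The function name below is hypothetical.

```python
import math

def cents_between(freq_hz, ref_hz):
    """Interval from ref_hz to freq_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)
```

With this definition, a frequency one equal-tempered semitone above the reference yields +100 cents, and a frequency one octave above yields +1200 cents.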

As described above, in step SA150 the candidates for vibrato sections are first narrowed down on the condition that the time interval of the pitch oscillation falls within a predetermined range and that such intervals are detected consecutively a predetermined number of times or more. Then, based on the parameters extracted in step SA160, each vibrato candidate section is rigorously checked for whether it qualifies as a vibrato section. In this way, vibrato sections can be determined accurately.

(B-3; Evaluation of singing)
The following describes a method for judging the skill of the vibrato technique in the corresponding practitioner voice data for the vibrato sections identified from the model voice data as described above.

FIG. 6 is a flowchart showing the flow of the vibrato evaluation process for the practitioner's singing.
In step SB100, the control unit 11 reads the practitioner pitch data from the parameter storage area 126 and, in the same way as in the procedure for generating the filtered model pitch data described above, applies a filter process that extracts a specific frequency component. Specifically, the control unit 11 passes the practitioner pitch data through a low-pass filter that extracts components in the frequency range below 6 Hz, generating new pitch data (hereinafter, filtered practitioner pitch data).

Next, the control unit 11 reads the vibrato section data generated by the vibrato section identification process from the vibrato section data storage area 125 of the storage unit 12 (step SB110).
The control unit 11 extracts the parameters listed below in each of the vibrato sections of the filtered practitioner pitch data (step SB120). These parameters are calculated by the same procedure as the parameters described for the vibrato section identification process, so a detailed description is omitted.

(1) Average vibrato frequency Af
(2) Standard deviation Df of the vibrato frequency
(3) Average pitch vibration width Ap
(4) Standard deviation Dp of the pitch vibration width
(5) Slope Sp of the linear approximation line of the pitch vibration width

The control unit 11 extracts the five parameters listed above for each vibrato section read in step SB110 and writes the extracted parameters to the parameter storage area 126.
In step SB130, the control unit 11 evaluates the skill of the vibrato technique in each vibrato section based on the parameters written to the parameter storage area 126. In this embodiment, the evaluation uses Df, Dp, and Sp. The case described here is one in which the evaluation result is obtained by deducting points from a perfect score (100 points) according to the values of these parameters.

The control unit 11 deducts more points from the singing evaluation the larger the value of Df. This is because Df represents the variation in the vibrato frequency, and the closer Df is to 0, the more uniform the frequency and the better the vibrato is considered to be.
Likewise, the control unit 11 deducts more points the larger the value of Dp. Dp represents the variation in the pitch fluctuation width in the vibrato section, and the closer Dp is to 0, the more uniform the width with which the pitch oscillates and the better the vibrato.
The control unit 11 also deducts more points the larger the absolute value of Sp. Sp represents the stability of the pitch oscillation width over the vibrato section, and the closer the absolute value of Sp is to 0, the more uniformly the pitch oscillation width is maintained throughout the vibrato section and the better the vibrato.
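One way to realize such a deduction scheme is sketched below. The source states only that larger values of Df, Dp, and the absolute value of Sp lead to larger deductions; the linear form, the weights `w_df`, `w_dp`, and `w_sp`, and the floor at 0 points are all assumptions.

```python
def vibrato_score(df, dp, sp, w_df=1.0, w_dp=1.0, w_sp=1.0):
    """Deduct points from a perfect score of 100.

    The linear penalty and the weights are illustrative assumptions.
    """
    penalty = w_df * df + w_dp * dp + w_sp * abs(sp)
    return max(0.0, 100.0 - penalty)   # never report a negative score
```

In practice the weights would have to be tuned so that the three parameters, which have different units, contribute on comparable scales.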

In this way, the control unit 11 evaluates the skill of the vibrato technique in the practitioner voice data for each of the identified vibrato sections. It is known that, in the vibrato technique, a sound is perceived as rich and pleasing when the pitch oscillation width is regular and within a certain range. The skill of the vibrato technique can therefore be evaluated appropriately on the basis of Df, which reflects the uniformity of the vibrato frequency; Dp, which reflects the uniformity of the pitch fluctuation width; and Sp, which reflects the uniformity of the pitch oscillation width over the vibrato section.

(C: Modifications)
An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment and can be implemented in various other forms. Examples are given below.

(1) In the embodiment described above, pitch is detected from the model voice data and the practitioner voice data as the karaoke performance progresses, and the model pitch data and the practitioner pitch data are generated. This processing, however, does not have to accompany the karaoke performance. For example, the practitioner voice data may first be accumulated from the beginning to the end of the song, and pitch may be detected from the model voice data and the practitioner voice data once the karaoke performance has ended.

(2) In the above embodiment, the vibrato sections are identified from the model voice data. Alternatively, the vibrato sections may be identified from the model voice data in advance by the method described in the above embodiment, and karaoke song data containing data indicating the identified sections may be created. The present invention can therefore also be implemented as a device for generating karaoke song data that identifies vibrato sections using the above method.

(3) In the embodiment described above, the control unit 11 measures the time intervals at which zero-cross points appear in the filtered model pitch data, and identifies as a vibrato candidate section any section in which the measured time interval falls within a predetermined range and such intervals are detected consecutively a predetermined number of times or more. Instead of the time intervals at which the zero-cross points appear, however, the vibrato candidate sections may be identified based on the time intervals at which the pitch reaches a local maximum or a local minimum. In short, any method capable of detecting the period of the pitch oscillation may be used.

(4) In the embodiment described above, the control unit 11 of the karaoke apparatus 1 identifies the vibrato sections after the karaoke accompaniment ends and evaluates the singing in the identified sections. Alternatively, the control unit 11 may identify the vibrato sections by the method described in the above embodiment before the karaoke accompaniment starts, output information indicating the identified sections to the display unit 13 or the speaker 17, and present a visual or audible signal to the practitioner through the display unit 13 or the speaker 17 to announce a vibrato section. This allows the practitioner to sing with greater care as a vibrato section approaches.

(5) In the embodiment described above, vibrato candidate sections are identified in step SA150, and vibrato sections are identified in step SA170 based on the parameters of each identified vibrato candidate section. Instead of narrowing down the vibrato sections in two stages in this way, however, the conditions used in steps SA150 to SA170 may be applied all at once to identify the vibrato sections.

(6) In the embodiment described above, melody pitch data indicating the pitch of each note of the guide melody of the song is used. The melody pitch data, however, is not limited to data representing the guide melody; it may be, for example, melody pitch data prepared in advance for scoring singing skill. In short, any data representing the pitches of the song's melody may be used.

(7) In the embodiment described above, the control unit 11 performs low-pass filter processing that extracts frequency components at or below a specific frequency. The filter processing performed by the control unit 11 is not limited to this, however; for example, a filter that extracts the frequency components within a predetermined frequency band may be used. In short, any filter processing that extracts a specific frequency component may be used.

(8) In the embodiment described above, vibrato sections are identified and singing is evaluated for model voice data representing the model voice and practitioner voice data representing the practitioner's singing. The audio data to be processed, however, is not limited to data representing a singing voice; it may be audio data representing the performance sound of a musical instrument such as a violin or a flute.

(9) The program executed by the control unit 11 of the karaoke apparatus 1 in the above embodiment can be provided recorded on a recording medium such as a magnetic tape, a magnetic disk, a flexible disk, an optical recording medium, a magneto-optical recording medium, a RAM, or a ROM. It can also be downloaded to the karaoke apparatus 1 via a network such as the Internet.

(10) In the above embodiment, the vibrato sections of a song are identified using the model voice data. This is because, depending on the practitioner's skill at vibrato, vibrato sections might not be identified from the practitioner voice data. The vibrato sections may nevertheless be identified from the practitioner voice data rather than from the model voice data. Doing so has the effect that singing is evaluated only for the parts that the practitioner intentionally sang with the vibrato technique.

(11) In the above embodiment, the vibrato sections are identified from a model voice with reverb, but the control unit 11 can of course also identify vibrato sections from a model voice without reverb. That is, according to the embodiment described above, vibrato sections can be identified from the model voice regardless of whether reverb has been applied to it, and without any need to determine whether reverb has been applied.

(12) In the above embodiment, the parameters Af, Df, Ap, Dp, and Sp are calculated to identify vibrato sections and evaluate singing. Further parameters such as the following may also be calculated and reflected in the processing described above. For example, the ratio of the maximum pitch value to the minimum pitch value (Max/Min) in the section from which the parameters are extracted may be obtained. It is then determined whether the calculated ratio is smaller than a predetermined threshold (k), that is, whether
1 < Max/Min < k
holds. In the vibrato section identification process described above, the section may be determined to satisfy the conditions for a vibrato section when the determination result is affirmative. In the vibrato evaluation process described above, points may be added to the vibrato evaluation for the section when the determination result is affirmative, or deducted from it when the determination result is negative.
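The ratio test can be sketched as follows. The sketch assumes that the pitch values are positive quantities such as frequencies in hertz, because a ratio is not meaningful for deviations that cross zero; the source does not state the unit. The function name and the treatment of empty or non-positive input are also assumptions.

```python
def max_min_ratio_ok(pitch_hz, k):
    """True when 1 < Max/Min < k for the pitch values of one section.

    Assumes positive pitch values (for example, in Hz); k is a
    caller-supplied threshold.
    """
    if not pitch_hz:
        return False
    lowest, highest = min(pitch_hz), max(pitch_hz)
    if lowest <= 0:
        return False   # ratio undefined for non-positive values
    return 1.0 < highest / lowest < k
```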

(13) In the above embodiment, the skill of the vibrato technique is evaluated by deducting points from a perfect score (100 points) based on the various parameters extracted from the filtered practitioner pitch data. The evaluation may instead be made by adding points according to the values of those parameters.

(14) It is known that, in the vibrato technique, the desirable frequency depends on the pitch of the part of the song where the vibrato is applied. For example, about 3 to 5 cycles per second is said to be desirable for low notes, and about 6 to 10 cycles per second for high notes. Where the desirable frequency varies with pitch in this way, a table associating pitches with desirable vibrato frequencies may be provided, and singing may be evaluated based on the difference between the frequency that the table associates with the absolute pitch extracted from the practitioner voice data and the vibrato frequency of the practitioner voice data.
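A table lookup of this kind might look like the sketch below. The table contents are hypothetical: the 300 Hz boundary between "low" and "high" notes is invented for illustration, and the two rates are simply the midpoints of the ranges quoted in the text.

```python
# Hypothetical table: (upper pitch bound in Hz, desirable vibrato frequency in Hz).
DESIRED_RATE_TABLE = [
    (300.0, 4.0),           # "low" notes: midpoint of the 3 to 5 range
    (float("inf"), 8.0),    # "high" notes: midpoint of the 6 to 10 range
]

def desired_vibrato_rate(pitch_hz):
    """Look up the desirable vibrato frequency for an absolute pitch."""
    for upper_bound, rate in DESIRED_RATE_TABLE:
        if pitch_hz <= upper_bound:
            return rate
    raise ValueError("pitch is not covered by the table")

def vibrato_rate_deviation(pitch_hz, measured_rate_hz):
    """Difference between the measured and the desirable vibrato frequency."""
    return abs(measured_rate_hz - desired_vibrato_rate(pitch_hz))
```

The resulting deviation could then feed into the score, for example as one more term to deduct.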

(15) In the above embodiment, various parameters are calculated from the filtered practitioner pitch data, and the calculated parameters are used in the combinations described above to perform the vibrato section identification process and the vibrato evaluation process. The way the parameters are combined for each process is merely an example, however, and the combinations of the above embodiment need not be used. For example, the processing may be performed without one or more of the parameters used in the above embodiment, or other parameters may additionally be used in combination.

(16) In the above embodiment, the standard deviation Df of the vibrato frequency and the standard deviation Dp of the pitch vibration width are calculated as parameters. These parameters, however, need only reflect the degree of variation in the vibrato frequency and the degree of variation in the pitch vibration width, respectively. A standard error may therefore be used in place of the standard deviation when calculating them.
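For comparison, the standard error of the mean is the sample standard deviation divided by the square root of the number of values. The sketch below uses the Python standard library; the function name is hypothetical, and at least two values are required.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))
```

Unlike the standard deviation, this measure shrinks as more oscillations are observed, so longer vibrato sections would tend to receive smaller values for the same amount of variation.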

(17) In the above embodiment, the vibrato sections in each song are identified and singing is evaluated for the identified vibrato sections. Where data indicating the vibrato sections is available separately, however, the vibrato sections need not be identified. Such data is available separately when, for example, each piece of song data already contains data indicating the vibrato sections. In that case, the vibrato sections indicated by that data may be written to the vibrato section data storage area 125, and the written vibrato section data may be read and used during the vibrato evaluation process.

(18) In the above embodiment, as shown in the flowchart of FIG. 2, the vibrato sections are identified (steps SA140 to SA170) after the karaoke accompaniment and singing (steps SA100 to SA120) have ended. The vibrato sections may instead be identified in real time as the karaoke accompaniment and singing progress. The processing procedure of the control unit 11 in that case is as follows. The control unit 11 reads the model voice data as the karaoke accompaniment progresses and generates model pitch data for the portion of the model voice data that has been read. By performing the processing of steps SA140 to SA170 on the model pitch data generated sequentially in this way, the control unit 11 identifies the vibrato sections in the portion of the song whose accompaniment has already been played. As a result, vibrato sections have been identified for every part of the song by the time the karaoke accompaniment ends.

FIG. 1 is a block diagram showing the hardware configuration of the karaoke apparatus 1.
FIG. 2 is a flowchart showing the flow of the vibrato section identification process.
FIG. 3 is a diagram showing an example of the content of the pitch data.
FIG. 4 is a diagram for explaining how the pitch vibration width is calculated.
FIG. 5 is a diagram for explaining how the linear approximation line of the pitch vibration width is calculated.
FIG. 6 is a flowchart showing the flow of the vibrato evaluation process.

Explanation of symbols

1: karaoke apparatus, 11: control unit, 12: storage unit, 13: display unit, 14: operation unit, 15: microphone, 16: audio processing unit, 17: speaker, 121: accompaniment data storage area, 122: lyrics data storage area, 123: model voice data storage area, 124: practitioner voice data storage area, 125: vibrato section data storage area, 126: parameter storage area.

Claims

A vibrato detection device comprising:
storage means for storing melody pitch data representing the pitch of a melody;
pitch detection means for detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch;
filter processing means for applying, to the pitch data generated by the pitch detection means, filter processing that extracts a specific frequency component;
zero-cross location specifying means for calculating values of pitch difference data representing the value of the pitch data filtered by the filter processing means relative to a zero reference value, the zero reference value being the pitch represented by the melody pitch data stored in the storage means, and for specifying, as a zero-cross location, each location where the pitch difference data changes from negative to positive or from positive to negative;
specifying means for specifying a section in which the zero-cross locations specified by the zero-cross location specifying means are in a predetermined mode;
pitch vibration width data generating means for specifying, in the pitch difference data, the times at which the pitch value reaches a maximum and a minimum, and for generating pitch vibration width data from the difference between adjacent maximum and minimum values; and
output means for outputting information indicating the section specified by the specifying means when the pitch vibration width data is in a predetermined mode in that section.
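As an illustration of the zero-cross and pitch vibration width elements recited above, the following is a minimal sketch. It assumes the pitch difference data is already available as a list of per-frame values (for example in cents relative to the melody pitch, which is an assumption here), that maxima and minima mean strict local extrema, and that the data has no flat plateaus. The function names are hypothetical.

```python
from typing import List


def zero_crossings(diff: List[float]) -> List[int]:
    """Frame indices where the sign of diff flips (negative to positive
    or positive to negative). Exact zeros are skipped, so a flip is
    registered at the first frame carrying the new sign."""
    crossings: List[int] = []
    prev_sign = 0
    for i, v in enumerate(diff):
        sign = (v > 0) - (v < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                crossings.append(i)
            prev_sign = sign
    return crossings


def local_extrema(diff: List[float]) -> List[int]:
    """Indices of strict local maxima and minima (interior frames only)."""
    idx: List[int] = []
    for i in range(1, len(diff) - 1):
        is_max = diff[i] > diff[i - 1] and diff[i] > diff[i + 1]
        is_min = diff[i] < diff[i - 1] and diff[i] < diff[i + 1]
        if is_max or is_min:
            idx.append(i)
    return idx


def vibration_widths(diff: List[float]) -> List[float]:
    """Absolute difference between each pair of adjacent extrema."""
    ext = local_extrema(diff)
    return [abs(diff[b] - diff[a]) for a, b in zip(ext, ext[1:])]


# Hypothetical pitch difference data oscillating around the melody pitch.
sample = [0, 10, 0, -10, 0, 10, 0, -10, 0]
crossings = zero_crossings(sample)
widths = vibration_widths(sample)
```

For this sample the sign flips at frames 3, 5, and 7, and every adjacent maximum/minimum pair differs by 20.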

The vibrato detection device according to claim 1, wherein the specifying means measures the time intervals at which the zero-cross locations specified by the zero-cross location specifying means appear, and specifies a section in which a measured time interval within a predetermined range is repeated consecutively a predetermined number of times or more.
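The interval condition recited above might be checked along the following lines. This is a sketch under assumptions: zero-cross locations are given as sorted frame indices, and the names find_regular_runs, min_gap, max_gap, and min_repeats are hypothetical stand-ins for the predetermined range and the predetermined number of times, whose actual values are not given here.

```python
from typing import List, Tuple


def find_regular_runs(
    crossings: List[int], min_gap: int, max_gap: int, min_repeats: int
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) for every run in which at least
    min_repeats consecutive zero-cross intervals fall in [min_gap, max_gap]."""
    sections: List[Tuple[int, int]] = []
    run_start = 0  # index into crossings of the current run's first crossing
    count = 0      # number of consecutive in-range intervals so far
    for k in range(1, len(crossings)):
        gap = crossings[k] - crossings[k - 1]
        if min_gap <= gap <= max_gap:
            if count == 0:
                run_start = k - 1
            count += 1
        else:
            if count >= min_repeats:
                sections.append((crossings[run_start], crossings[k - 1]))
            count = 0
    # Close a run that extends to the last crossing.
    if count >= min_repeats:
        sections.append((crossings[run_start], crossings[-1]))
    return sections


# Hypothetical zero-cross frames: two regular runs separated by a long gap.
frames = [0, 10, 20, 30, 80, 90, 100, 110, 120]
runs = find_regular_runs(frames, min_gap=8, max_gap=12, min_repeats=3)
```

Raising min_repeats to 4 in this example would keep only the second run, since the first contains just three in-range intervals.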

The vibrato detection device according to claim 1, wherein the output means calculates a straight line approximating the temporal change of the pitch vibration width data in the section specified by the specifying means, and outputs information indicating that section when the absolute value of the slope of the calculated straight line is smaller than a predetermined threshold.
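The slope test recited above can be illustrated with an ordinary least-squares fit. The claim does not say how the approximating line is obtained, so least squares is an assumption here, as are the names fit_slope, is_steady, and slope_threshold.

```python
from typing import List


def fit_slope(times: List[float], values: List[float]) -> float:
    """Slope of the least-squares line through (times[i], values[i])."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all time values are identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def is_steady(times: List[float], widths: List[float], slope_threshold: float) -> bool:
    """True when the vibration width neither grows nor shrinks noticeably,
    i.e. the absolute slope is below the (hypothetical) threshold."""
    return abs(fit_slope(times, widths)) < slope_threshold


steady = is_steady([0, 1, 2, 3], [20, 20, 20, 20], slope_threshold=1.0)
growing = is_steady([0, 1, 2, 3], [10, 20, 30, 40], slope_threshold=1.0)
```

A constant width gives a slope of zero and passes the test, while a width that grows by 10 per time unit does not.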

The vibrato detection device according to claim 1, wherein the output means calculates the time average of the pitch vibration width data in the section specified by the specifying means, and outputs information indicating that section when the calculated time average is within a predetermined range.

A vibrato evaluation device comprising:
storage means for storing melody pitch data representing the pitch of a melody;
pitch detection means for detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch;
filter processing means for applying, to the pitch data generated by the pitch detection means, filter processing that extracts a specific frequency component;
receiving means for receiving data indicating a vibrato section;
pitch vibration width data generating means for calculating, in the vibrato section indicated by the data received by the receiving means, values of pitch difference data representing the value of the pitch data filtered by the filter processing means relative to a zero reference value, the zero reference value being the pitch represented by the melody pitch data stored in the storage means, for specifying the times at which the pitch difference data reaches a maximum and a minimum, and for generating pitch vibration width data from the difference between adjacent maximum and minimum values; and
output means for calculating a high evaluation when the pitch vibration width data is in a predetermined mode in the vibrato section indicated by the data received by the receiving means, and for outputting the evaluation.

The vibrato evaluation device, wherein the output means calculates a straight line approximating the temporal change of the pitch vibration width data, and calculates a higher evaluation as the absolute value of the slope of the calculated straight line is smaller.

The vibrato evaluation device, wherein the output means calculates a standard deviation or standard error of the values of the pitch vibration width data, and calculates a higher evaluation as the calculated standard deviation or standard error is smaller.

The vibrato evaluation device according to claim 5, further comprising:
zero-cross location specifying means for specifying, in the vibrato section, each location where the value of the pitch difference data changes from negative to positive or from positive to negative as a zero-cross location; and
measuring means for measuring the time intervals at which the zero-cross locations specified by the zero-cross location specifying means appear, and calculating a standard deviation or standard error of the measured time intervals,
wherein the output means calculates a higher evaluation as the standard deviation or standard error calculated by the measuring means is smaller.
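The interval-regularity evaluation recited above could be sketched as follows. The claim only states that a smaller standard deviation or standard error yields a higher evaluation; the specific mapping 100 / (1 + spread / scale), the use of the population standard deviation, and the names interval_regularity_score and scale are assumptions made purely for illustration.

```python
from statistics import pstdev
from typing import List


def interval_regularity_score(crossings: List[int], scale: float = 1.0) -> float:
    """Score in (0, 100] that increases as the zero-cross intervals
    become more uniform. The formula is a hypothetical choice; it only
    guarantees that a smaller standard deviation gives a higher score."""
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    if not gaps:
        raise ValueError("need at least two zero-cross locations")
    spread = pstdev(gaps)  # population standard deviation of the intervals
    return 100.0 / (1.0 + spread / scale)


# Hypothetical zero-cross frames for an even and an uneven vibrato.
regular = interval_regularity_score([0, 10, 20, 30])
irregular = interval_regularity_score([0, 5, 20, 25])
```

Perfectly even intervals give the maximum score of 100, and any irregularity lowers it.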

A vibrato detection method comprising:
a step of storing melody pitch data representing the pitch of a melody in a storage device;
a pitch detection step of detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch;
a filter processing step of applying, to the pitch data generated in the pitch detection step, filter processing that extracts a specific frequency component;
a zero-cross location specifying step of calculating values of pitch difference data representing the value of the pitch data filtered in the filter processing step relative to a zero reference value, the zero reference value being the pitch represented by the melody pitch data stored in the storage device, and of specifying, as a zero-cross location, each location where the pitch difference data changes from negative to positive or from positive to negative;
a specifying step of specifying a section in which the zero-cross locations specified in the zero-cross location specifying step are in a predetermined mode;
a pitch vibration width data generation step of specifying, in the pitch difference data, the times at which the pitch value reaches a maximum and a minimum, and of generating pitch vibration width data from the difference between adjacent maximum and minimum values; and
an output step of outputting information indicating the section specified in the specifying step when the pitch vibration width data is in a predetermined mode in that section.

A vibrato evaluation method comprising:
a step of storing melody pitch data representing the pitch of a melody in a storage device;
a pitch detection step of detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch;
a filter processing step of applying, to the pitch data generated in the pitch detection step, filter processing that extracts a specific frequency component;
a receiving step of receiving data indicating a vibrato section;
a pitch vibration width data generation step of calculating, in the vibrato section indicated by the data received in the receiving step, values of pitch difference data representing the value of the pitch data filtered in the filter processing step relative to a zero reference value, the zero reference value being the pitch represented by the melody pitch data stored in the storage device, of specifying the times at which the pitch difference data reaches a maximum and a minimum, and of generating pitch vibration width data from the difference between adjacent maximum and minimum values; and
an output step of calculating a high evaluation when the pitch vibration width data is in a predetermined mode in the vibrato section indicated by the data received in the receiving step, and of outputting the evaluation.

A program for causing a computer to function as:
storage means for storing melody pitch data representing the pitch of a melody;
pitch detection means for detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch;
filter processing means for applying, to the pitch data generated by the pitch detection means, filter processing that extracts a specific frequency component;
zero-cross location specifying means for calculating values of pitch difference data representing the value of the pitch data filtered by the filter processing means relative to a zero reference value, the zero reference value being the pitch represented by the melody pitch data stored in the storage means, and for specifying, as a zero-cross location, each location where the pitch difference data changes from negative to positive or from positive to negative;
specifying means for specifying a section in which the zero-cross locations specified by the zero-cross location specifying means are in a predetermined mode;
pitch vibration width data generating means for specifying, in the pitch difference data, the times at which the pitch value reaches a maximum and a minimum, and for generating pitch vibration width data from the difference between adjacent maximum and minimum values; and
output means for outputting information indicating the section specified by the specifying means when the pitch vibration width data is in a predetermined mode in that section.

A program for causing a computer to function as:
storage means for storing melody pitch data representing the pitch of a melody;
pitch detection means for detecting pitch from sound data in units of frames of a predetermined time length and generating pitch data representing the detected pitch;
filter processing means for applying, to the pitch data generated by the pitch detection means, filter processing that extracts a specific frequency component;
receiving means for receiving data indicating a vibrato section;
pitch vibration width data generating means for calculating, in the vibrato section indicated by the data received by the receiving means, values of pitch difference data representing the value of the pitch data filtered by the filter processing means relative to a zero reference value, the zero reference value being the pitch represented by the melody pitch data stored in the storage means, for specifying the times at which the pitch difference data reaches a maximum and a minimum, and for generating pitch vibration width data from the difference between adjacent maximum and minimum values; and
output means for calculating a high evaluation when the pitch vibration width data is in a predetermined mode in the vibrato section indicated by the data received by the receiving means, and for outputting the evaluation.